Abstract
As a new era of data processing approaches, marked by Big Data, Cloud Computing, and the Internet of Things, the amount of data collected in databases far exceeds what can be reduced and analyzed without automated analysis techniques, that is, data mining. As the importance of data mining has grown, a critical issue has emerged: how to scale data mining techniques to larger and more complex databases. This is particularly pressing for computationally intensive data mining tasks such as identifying natural clusters of instances. In this paper, we propose an optimized combinatorial clustering algorithm for noisy performance, which is essential for large data sets handled through random sampling. The algorithm outperforms conventional approaches on various numerical and qualitative criteria, such as the mean and standard deviation of accuracy and computation speed.
Original language | English |
---|---|
Pages (from-to) | 1135-1148 |
Number of pages | 14 |
Journal | Cluster Computing |
Volume | 20 |
Issue number | 2 |
DOIs | |
State | Published - 1 Jun 2017 |
Bibliographical note
Publisher Copyright: © 2017, The Author(s).
Keywords
- Data clustering
- Nested partitions method
- Optimized combinatorial clustering algorithm
- Stochastic process