Clustering Based Feature Data Selection Technique Algorithm for High Dimensional Data: A Novel Approach

Feature selection identifies a subset of the most useful features that yields results comparable to those obtained with the full feature set. A feature selection method can be evaluated for both efficiency and effectiveness: efficiency concerns the time required to find a subset of features, while effectiveness concerns the quality of that subset. Based on these criteria, this study presents and empirically evaluates FAST, a fast clustering-based feature selection algorithm. FAST works in two steps. In the first step, features are divided into clusters using graph-theoretic clustering methods. In the second step, the most representative feature from each cluster, that is, the one most strongly related to the target classes, is selected to form the feature subset. Because features in different clusters are relatively independent, FAST’s clustering-based strategy is likely to produce a subset of useful and independent features. To ensure efficiency, FAST adopts the efficient minimum spanning tree (MST) clustering method. The efficiency and effectiveness of FAST are evaluated through an empirical study. FAST is compared with several representative feature selection algorithms, namely FCBF, ReliefF, CFS, Consist, and FOCUS-SF, with respect to four types of well-known classifiers, both before and after feature selection: the probability-based Naive Bayes, the tree-based C4.5, the instance-based IB1, and the rule-based RIPPER. The results, obtained on 35 publicly available real-world high-dimensional image, microarray, and text data sets, show that FAST not only produces smaller subsets of features but also improves the performance of the four types of classifiers.
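The two-step idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the correlation measure, the edge weights, or the rule for splitting the tree, so everything below is an assumption made for illustration: symmetric uncertainty (SU) on discrete values as the correlation measure, `1 - SU` as the edge weight for Prim's minimum spanning tree, and a cut rule that removes an edge when the two features are less correlated with each other than each is with the class. The names `select_features`, `columns`, and `target` are hypothetical, and the toy data is invented.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2 * mutual_info / (hx + hy)


def select_features(columns, target):
    """columns: dict mapping feature name -> list of discrete values.
    target: list of class labels. Returns one representative per cluster."""
    names = list(columns)
    if not names:
        return []
    # Relevance of each feature to the class (assumed measure: SU).
    rel = {f: symmetric_uncertainty(columns[f], target) for f in names}
    su = {(a, b): symmetric_uncertainty(columns[a], columns[b])
          for a in names for b in names if a != b}

    # Step 1a: Prim's algorithm on the complete feature graph,
    # using 1 - SU as distance so strongly correlated features stay linked.
    in_tree, edges = {names[0]}, []
    while len(in_tree) < len(names):
        a, b = min(((a, b) for a in in_tree for b in names if b not in in_tree),
                   key=lambda e: 1 - su[e])
        in_tree.add(b)
        edges.append((a, b))

    # Step 1b (assumed cut rule): drop an edge when the two features are less
    # correlated with each other than each of them is with the class.
    kept = [(a, b) for a, b in edges
            if not (su[a, b] < rel[a] and su[a, b] < rel[b])]

    # The remaining edges form a forest; each tree is one cluster (union-find).
    parent = {f: f for f in names}

    def find(f):
        while parent[f] != f:
            f = parent[f]
        return f

    for a, b in kept:
        parent[find(a)] = find(b)
    clusters = {}
    for f in names:
        clusters.setdefault(find(f), []).append(f)

    # Step 2: keep the feature most related to the class from each cluster.
    return sorted(max(group, key=rel.get) for group in clusters.values())


# Toy data: "a_noisy" is a corrupted copy of "a"; "b" is independent of "a".
columns = {
    "a":       [0, 0, 0, 0, 1, 1, 1, 1],
    "a_noisy": [0, 0, 0, 1, 1, 1, 1, 1],
    "b":       [0, 0, 1, 1, 0, 0, 1, 1],
}
target = [0, 0, 1, 1, 2, 2, 3, 3]  # the class depends on both "a" and "b"
selected = select_features(columns, target)
print(selected)
```

On this toy data the redundant feature `a_noisy` ends up in the same cluster as `a` and is discarded, while `b` forms its own cluster and is kept. The sketch handles only discrete features and compares all feature pairs, so it is meant to convey the structure of the approach rather than its efficiency on high-dimensional data.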

Author(s) Details

Amos R
Department of MCA, MIT Mysore, India.

Kowshik N
Department of MCA, MIT Mysore, India.

Suraksha M. S
Department of MCA, MIT Mysore, India.

