More Videos...
 

Enhanced SMOTE algorithm for classification of imbalanced big-data using Random Forest

Enhanced SMOTE algorithm for classification of imbalanced big-data using Random Forest In the era of big data, the applications generating tremendous amount of data are becoming the main focus of attention as the wide increment of data generation and storage that has taken place in the last few years. This scenario is challenging for data mining techniques which are not arrogated to the new space and time requirements. In many of the real world applications, classification of imbalanced data-sets is the point of attraction. Most of the classification methods focused on two-class imbalanced problem. So, it is necessary to solve multi-class imbalanced problem, which exist in real-world domains. In the proposed work, we introduced a methodology for classification of multi-class imbalanced data. This methodology consists of two steps: In first step we used Binarization techniques (OVA and OVO) for decomposing original dataset into subsets of binary classes. In second step, the SMOTE algorithm is applied against each subset of imbalanced binary class in order to get balanced data. Finally, to achieve classification goal Random Forest (RF) classifier is used. Specifically, oversampling technique is adapted to big data using MapReduce so that this technique is able to handle as large data-set as needed. An experimental study is carried out to evaluate the performance of proposed method. For experimental analysis, we have used different datasets from UCI repository and the proposed system is implemented on Apache Hadoop and Apache Spark platform. The results obtained shows that proposed method outperforms over other methods.

Recent Projects

More +