Design of a Hybrid Ensemble Feature Selection Framework for Big Data Text Mining

  • Smah Smari Computer Science Department, Laboratory of Computer science of Oran (LIO), Oran 1 University, Ahmed Ben Bella, Oran, Algeria. https://orcid.org/0000-0002-5787-5380
  • Barigou Fatiha Computer Science Department, Laboratory of Computer science of Oran (LIO), Oran 1 University, Ahmed Ben Bella, Oran, Algeria. https://orcid.org/0000-0001-5444-4000
  • Belalem Ghalem Computer Science Department, Laboratory of Computer science of Oran (LIO), Oran 1 University, Ahmed Ben Bella, Oran, Algeria. https://orcid.org/0000-0002-9694-7586

Abstract

The volume of textual data increasingly exceeds the capacity of available computing resources, and conventional machine learning algorithms struggle to scale. Data quality now matters more than raw quantity, so massive data must be transformed into intelligent data through appropriate pre-processing, in which feature selection plays a key role. In this work, we propose the design of a hybrid ensemble feature selection framework for large-scale textual data. The approach combines the MFD-AFSA algorithm with several feature evaluation functions, applied to multiple data subsets. To improve scalability, we also outline a distributed strategy for Apache Spark based on the Random Sample Partitioning model. Finally, we introduce an automatic approximation mechanism, which we call auto-approximation, that builds the selection sets dynamically via an approximation technique. This paper is a methodological design contribution; experimental validation and practical evaluation are left to future work.
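
The ensemble idea summarized in the abstract (several evaluation functions scored on several data subsets, then combined) can be illustrated with a minimal single-process sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the two simple filter scores (a document-frequency gap and a chi-square statistic) stand in for MFD-AFSA and for the evaluation functions of the framework, which the abstract does not specify, and plain Python list slicing stands in for Random Sample Partitioning on Apache Spark.

```python
import random
from collections import Counter

def random_sample_partition(n_docs, n_blocks, seed=0):
    """Shuffle document indices, then split them into disjoint blocks."""
    idx = list(range(n_docs))
    random.Random(seed).shuffle(idx)
    return [idx[b::n_blocks] for b in range(n_blocks)]

def doc_freq_gap(docs, labels, term):
    """Absolute gap in the term's document frequency between the two classes."""
    pos = [d for d, y in zip(docs, labels) if y == 1]
    neg = [d for d, y in zip(docs, labels) if y == 0]
    if not pos or not neg:
        return 0.0  # a block holding a single class carries no signal
    p = sum(term in d for d in pos) / len(pos)
    q = sum(term in d for d in neg) / len(neg)
    return abs(p - q)

def chi_square(docs, labels, term):
    """Chi-square statistic of the 2x2 term/class contingency table."""
    a = sum(1 for d, y in zip(docs, labels) if term in d and y == 1)
    b = sum(1 for d, y in zip(docs, labels) if term in d and y == 0)
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y == 1)
    e = sum(1 for d, y in zip(docs, labels) if term not in d and y == 0)
    n = a + b + c + e
    denom = (a + b) * (c + e) * (a + c) * (b + e)
    return n * (a * e - b * c) ** 2 / denom if denom else 0.0

def ensemble_select(docs, labels, scorers, n_blocks=2, top_k=2, seed=0):
    """Count one vote per term for each (block, scorer) pair that ranks it in its top_k."""
    votes = Counter()
    for block in random_sample_partition(len(docs), n_blocks, seed):
        sub_docs = [docs[i] for i in block]
        sub_labels = [labels[i] for i in block]
        vocab = sorted(set().union(*sub_docs))
        for scorer in scorers:
            # Highest score first; ties are broken alphabetically.
            ranked = sorted(vocab, key=lambda t: (-scorer(sub_docs, sub_labels, t), t))
            votes.update(ranked[:top_k])
    return votes

# Toy corpus (illustrative only): each document is a set of tokens.
docs = [{"spark", "the"}] * 4 + [{"cat", "the"}] * 4
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scorers = [doc_freq_gap, chi_square]
votes = ensemble_select(docs, labels, scorers, n_blocks=1, top_k=2)
```

With a single block, both scorers rank the class-specific terms "cat" and "spark" above the uninformative "the", so each receives two votes. With more blocks, the vote counts show how stable a term's ranking is across subsets, which is the property an ensemble selector exploits.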

Published
2026-03-25
How to Cite
Smari, S., Fatiha, B., & Ghalem, B. (2026). Design of a Hybrid Ensemble Feature Selection Framework for Big Data Text Mining. ITEGAM-JETIA, 12(58), 589-601. https://doi.org/10.5935/jetia.v12i58.3192
Section
Articles