WIT Press


A Multi-criteria Decision Making Approach In Feature Selection For Enhancing Text Categorization

Price

Free (open access)

Volume

35

Pages

11

Published

2005

Size

420 kb

Paper DOI

10.2495/DATA050081

Copyright

WIT Press

Author(s)

S. Doan & S. Horiguchi

Abstract

This paper considers the problem of feature selection in text categorization. Previous work on feature selection has often used a filter model, in which features are ranked by a measure and then selected according to a given threshold. In this paper, we present a novel approach to feature selection based on multi-criteria decision making for each feature. Instead of a single criterion, multiple criteria are used for each feature, and a selection procedure based on the threshold of each criterion is proposed. This framework appears to be suitable for text data and can be applied to feature selection in text categorization. Experimental results on the Reuters-21578 benchmark data show that our approach is promising and enhances the performance of a text categorization system.

1 Introduction

Feature selection has recently attracted interest in the machine learning and data mining communities [1, 2, 3]. To date, the two most common approaches have been the filter and the wrapper [4, 5]. Both approaches use prior knowledge as a heuristic indicator for selecting the optimal feature subset. The filter approach uses measurements of features as the criterion. In [6], Huang listed four kinds of feature measurement belonging to the filter approach: information, distance, dependence and consistency measures. Based on a measurement, the optimal subset of features is chosen by filtering out the “noise” or “irrelevant” features. Apart from the filter model, the wrapper approach based on the criterion of
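As a rough illustration of the difference between the single-criterion filter model and a multi-criteria selection, the sketch below scores terms in a toy corpus by two criteria and applies one threshold per criterion. It is a minimal sketch under stated assumptions, not the procedure proposed in the paper: the choice of criteria (document frequency and the chi-square statistic), the conjunctive rule (a term is kept only if it passes every threshold), the threshold values, the toy documents, and all function names are hypothetical.

```python
from collections import Counter

def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens, _label in docs:
        df.update(set(tokens))
    return df

def chi_square(docs, category):
    """Chi-square statistic of each term with respect to one category."""
    n = len(docs)
    n_pos = sum(1 for _tokens, label in docs if label == category)
    df = document_frequency(docs)
    df_pos = Counter()
    for tokens, label in docs:
        if label == category:
            df_pos.update(set(tokens))
    scores = {}
    for term, total in df.items():
        a = df_pos[term]       # in category, contains term
        b = total - a          # outside category, contains term
        c = n_pos - a          # in category, lacks term
        d = n - n_pos - b      # outside category, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores

def filter_select(scores, threshold):
    """Single-criterion filter: keep terms scoring at or above the threshold."""
    return {term for term, score in scores.items() if score >= threshold}

def multi_criteria_select(criteria, thresholds):
    """Keep terms that pass the threshold of every criterion.

    The conjunctive combination is an assumption made for illustration.
    """
    selected = None
    for name, scores in criteria.items():
        passed = filter_select(scores, thresholds[name])
        selected = passed if selected is None else selected & passed
    return selected or set()

# Hypothetical toy corpus: (tokens, category label).
docs = [
    (["wheat", "grain", "export"], "grain"),
    (["wheat", "price", "the"], "grain"),
    (["oil", "price", "the"], "crude"),
    (["oil", "barrel", "the"], "crude"),
]

criteria = {
    "df": document_frequency(docs),
    "chi2": chi_square(docs, "grain"),
}
thresholds = {"df": 2, "chi2": 2.0}  # illustrative values only

single = filter_select(criteria["df"], thresholds["df"])
multi = multi_criteria_select(criteria, thresholds)
print(sorted(single))  # ['oil', 'price', 'the', 'wheat']
print(sorted(multi))   # ['oil', 'wheat']
```

On this toy data, document frequency alone keeps the uninformative terms "the" and "price", while requiring the chi-square threshold as well removes them. This only shows why more than one criterion can help; the paper's actual decision procedure may combine criteria differently.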

Keywords