Abstract
Due to the explosive growth of information available on the World Wide Web (WWW), users suffer from information overload. To alleviate this problem, an intelligent tool is needed to help users screen and filter for interesting and useful information. In this paper, we propose a method for automatically identifying the topics of Web documents via a classification technique. Topic identification can serve as a filtering tool for recommender systems, pruning the set of candidate documents down to those within particular topics. We adopt the fuzzy association concept as a machine learning technique to classify documents into predefined categories, or topics. Our approach is compared with the vector space model using the cosine coefficient on data sets collected from three Web portals: Yahoo!, Open Directory Project, and Excite. The results show that our approach yields higher classification accuracy than the vector space model.
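For orientation, the baseline named above, a vector space model scored with the cosine coefficient, can be sketched as follows. This is a minimal illustration only and not the implementation evaluated in this paper: the whitespace tokenization, the raw term-frequency weighting, the function names, and the toy topic profiles are all assumptions made for the example. It does not implement the fuzzy association approach.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a sparse term-frequency vector from lowercased whitespace tokens.

    Assumption: raw term counts with no stemming, stop-word removal, or
    TF-IDF weighting; a real system would likely use a richer weighting.
    """
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine coefficient between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def classify(document, topic_profiles):
    """Return the topic whose profile vector is most similar to the document."""
    doc_vec = term_vector(document)
    return max(topic_profiles, key=lambda topic: cosine(doc_vec, topic_profiles[topic]))


# Hypothetical toy topic profiles, invented for illustration only.
topic_profiles = {
    "sports": term_vector("football match team score league player"),
    "finance": term_vector("stock market bank investment interest rate"),
}

print(classify("the team won the football match", topic_profiles))
```

Each topic is represented by a single profile vector, and a document is assigned to the topic with the highest cosine coefficient. Documents that share no terms with any profile score zero for every topic, so a practical system would need a fallback or rejection rule for that case.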