Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/726

Title: Concept-Based Text Document Clustering
Authors: Hamzah, Amir
Susanto, Adhi
Soesianto, F.
Keywords: Text Document Clustering
intensively
Issue Date: 17-Jun-2007
Publisher: Proceedings of the International Conference on Electrical Engineering and Informatics
Series/Report no.: C-06;
Abstract: Text document clustering has been studied intensively because of its important role in text mining and information retrieval. Clustering in the vector space model always suffers from high dimensionality caused by the large number of words. However, a text document is not only a collection of words (a “bag of words”) but also a collection of concepts. Therefore, if the term-document matrix can be transformed into a concept-document matrix, the reduction in dimensionality will be significant. This paper reports on experiments with this matrix transformation and on the performance of concept-based clustering. The concept-document matrix was constructed using cluster centres. Three clustering models were chosen: hierarchical, partitional, and hybrid. Four similarity techniques (GroupAverage, CompleteLink, SingleLink, and ClusterCenter) were tried for hierarchical clustering, K-Means and Bisecting K-Means for partitional clustering, and Buckshot for hybrid clustering. Collections of 500 to 800 manually categorized news texts were used to test these algorithms, with the F-measure as the criterion of clustering performance. The results show that concept-based clustering significantly improves clustering performance, from 80% to 90%, compared with word-based clustering.
URI: http://hdl.handle.net/123456789/726
ISBN: 978-979-16338-0-2
Appears in Collections: E-Journal Komputer
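
Illustrative sketch (not taken from the paper): the abstract says the concept-document matrix was built from cluster centres but gives no details. One possible reading is that terms are first grouped into clusters and each concept row is the centre (mean) of the term rows in its group. The Python below assumes that reading. The function name `concept_document_matrix`, the `term_to_concept` grouping, and the toy counts are all hypothetical.

```python
from collections import defaultdict


def concept_document_matrix(term_doc, term_to_concept):
    """Collapse a term-document matrix into a concept-document matrix.

    term_doc: dict mapping term -> list of weights, one per document.
    term_to_concept: dict mapping term -> concept label. This grouping is
        an assumption of the sketch (e.g. the output of clustering the
        term vectors); the abstract does not say how it is obtained.
    Each concept row is the centre (mean) of the term rows assigned to it.
    """
    groups = defaultdict(list)
    for term, row in term_doc.items():
        groups[term_to_concept[term]].append(row)

    concept_doc = {}
    for concept, rows in groups.items():
        n_docs = len(rows[0])
        # Column-wise mean over all term rows in this concept.
        concept_doc[concept] = [
            sum(row[j] for row in rows) / len(rows) for j in range(n_docs)
        ]
    return concept_doc


# Toy data: 4 terms x 3 documents (made-up counts, not from the paper).
term_doc = {
    "goal": [2, 0, 1],
    "match": [4, 0, 1],
    "election": [0, 3, 0],
    "vote": [0, 1, 2],
}
term_to_concept = {
    "goal": "sport",
    "match": "sport",
    "election": "politics",
    "vote": "politics",
}

concepts = concept_document_matrix(term_doc, term_to_concept)
```

In this toy case four term rows shrink to two concept rows, which is the kind of dimensionality reduction the abstract describes. Any clustering algorithm could then be run on the smaller matrix.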

Files in This Item:

File / Description | Size | Format
T Technology (General) C-06.PDF | 733.89 kB | Adobe PDF

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.

 
