Repository Universitas Gunadarma

| Title: | Concept-Based Text Document Clustering |
| Authors: | Hamzah, Amir; Susanto, Adhi; Soesianto, F. |
| Keywords: | Text Document Clustering; intensively |
| Issue Date: | 17-Jun-2007 |
| Publisher: | Proceedings of the International Conference on Electrical Engineering and Informatics |
| Series/Report no.: | C-06; |
| Abstract: | Text document clustering has been studied intensively because of its important role in text mining and information retrieval. Vector space model clustering always suffers from a high-dimensionality problem caused by the large number of words. On the other hand, a text document is not only a collection of words (a “bag of words”) but also a collection of concepts. Therefore, if the term-document matrix can be transformed into a concept-document matrix, the dimensionality will be reduced significantly. This paper reports on experiments with this matrix transformation and on the performance of concept-based clustering. The concept-document matrix was constructed using cluster centres. Three clustering models were chosen: hierarchical, partitional, and hybrid. Four similarity techniques (GroupAverage, CompleteLink, SingleLink, and ClusterCenter) were tried for hierarchical clustering, K-Means and Bisecting K-Means for partitional clustering, and Buckshot for hybrid clustering. Collections of 500 to 800 manually categorized news texts were used to test these algorithms, with the F-measure as the criterion of clustering performance. Results show that concept-based clustering can significantly improve clustering performance, from 80% to 90%, compared with word-based clustering. |
| URI: | http://hdl.handle.net/123456789/726 |
| ISBN: | 978-979-16338-0-2 |
| Appears in Collections: | E-Journal Komputer |
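
The abstract describes turning a term-document matrix into a lower-dimensional concept-document matrix built from cluster centres, then scoring clusterings with the F-measure. The Python sketch below shows one plausible reading of that pipeline, not the paper's actual implementation. It assumes that the concepts are unit-length centroids from a spherical (cosine) K-Means run over the term vectors, that each document is represented by its cosine similarity to each concept, and that the evaluation uses the class-size-weighted clustering F-measure. The function names, the farthest-first initialization, and the six-term toy data are illustrative assumptions.

```python
import math
from collections import Counter


def normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(docs, k, iters=20):
    """Cluster unit-length vectors by cosine similarity.

    Initial centres are chosen farthest-first (a deterministic assumption),
    then refined. Returns (labels, centres); each centre is a unit vector.
    """
    centres = [list(docs[0])]
    while len(centres) < k:
        # Next centre: the document least similar to every existing centre.
        far = min(docs, key=lambda d: max(dot(d, c) for c in centres))
        centres.append(list(far))
    labels = []
    for _ in range(iters):
        # Assign each document to its most similar centre.
        labels = [max(range(k), key=lambda j: dot(d, centres[j])) for d in docs]
        # Re-estimate each centre as the normalized sum of its members.
        for j in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == j]
            if members:
                centres[j] = normalize([sum(col) for col in zip(*members)])
    return labels, centres


def to_concept_space(docs, centres):
    """Assumed construction: entry j of document i's vector (column i of the
    concept-document matrix) is its cosine similarity to concept centre j,
    giving k values per document instead of one per vocabulary term."""
    return [[dot(d, c) for c in centres] for d in docs]


def clustering_f_measure(classes, clusters):
    """Class-size-weighted best F-score of each class over all clusters."""
    n = len(classes)
    class_sizes, cluster_sizes = Counter(classes), Counter(clusters)
    joint = Counter(zip(classes, clusters))
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint[(cls, clu)]
            if n_ij:
                p, r = n_ij / n_j, n_ij / n_i  # precision, recall
                best = max(best, 2 * p * r / (p + r))
        total += n_i / n * best
    return total


# Hypothetical toy term-document data: rows are documents, columns are six terms.
raw = [[2, 1, 1, 0, 0, 0], [1, 2, 1, 0, 0, 0], [1, 1, 2, 0, 0, 0],
       [0, 0, 0, 2, 1, 1], [0, 0, 0, 1, 2, 1], [0, 0, 0, 1, 1, 2]]
truth = ["sport"] * 3 + ["politics"] * 3
docs = [normalize(row) for row in raw]

_, concepts = spherical_kmeans(docs, k=2)
concept_docs = [normalize(v) for v in to_concept_space(docs, concepts)]
labels, _ = spherical_kmeans(concept_docs, k=2)
print(len(concept_docs[0]), labels, clustering_f_measure(truth, labels))
# prints: 2 [0, 0, 0, 1, 1, 1] 1.0
```

Replacing each vocabulary-sized term vector with k concept similarities is where the dimensionality reduction comes from in this sketch. On real news collections, the number of concepts, the term weighting, and the clustering algorithm applied in concept space (hierarchical, Bisecting K-Means, or Buckshot, as listed in the abstract) would all have to be chosen experimentally.
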
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.