Support Vector Machine and Back Propagation Neural Network Approach for Text Classification


Yaqeen Saad

Department of Computer Science,

University of Anbar

Anbar, Iraq



Khaled Shaker

Department of Computer Science,

University of Anbar

Anbar, Iraq



Abstract

Text classification is the process of assigning a text to one or more categories. Text categorization has many significant applications, mostly in organizing large collections of documents and in browsing within them. It is sometimes accomplished by means of machine learning. Because such a system is built on a wide range of document features, feature selection is an important step in this process: there are typically several thousand candidate feature terms. In text categorization, the goal of feature selection is to improve both the efficiency of the procedure and the reliability of the classification by removing irrelevant and non-essential terms while keeping the terms that carry enough information to support the classification task. The goal of this work is to improve the efficiency of text categorization models. In text-mining algorithms, a document is represented as a vector whose dimension equals the number of distinct keywords, which can be very large, so classical document categorization may be computationally costly. Therefore, feature extraction through singular value decomposition (SVD) is employed to reduce the dimensionality of the documents, and classification algorithms based on back-propagation and support vector machine (SVM) methods are then applied. Before classification, principal component analysis (PCA) is applied in order to improve the accuracy of the results. Finally, the performance of the two algorithms is compared by computing standard precision and recall over the document collection.
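The abstract describes each document as a keyword vector and feature selection as pruning the terms that carry little information. A minimal plain-Python sketch of both ideas follows, assuming a simple regular-expression tokenizer (no stemming or stop-word list) and document-frequency thresholding as the selection criterion. The helper names `tokenize`, `select_vocabulary`, and `to_vectors`, the `min_df` parameter, and the toy documents are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def select_vocabulary(docs, min_df=2):
    """Feature selection by document frequency (an assumed criterion):
    keep only terms that occur in at least `min_df` documents."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    return sorted(term for term, count in df.items() if count >= min_df)


def to_vectors(docs, vocab):
    """Represent each document as a term-frequency vector over `vocab`."""
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))  # missing terms count as 0
        rows.append([counts[term] for term in vocab])
    return rows


docs = [
    "oil prices rise as crude supply falls",
    "crude oil exports fall",
    "wheat and corn harvest grows",
    "corn prices rise after harvest",
]
vocab = select_vocabulary(docs, min_df=2)
matrix = to_vectors(docs, vocab)
print(vocab)   # ['corn', 'crude', 'harvest', 'oil', 'prices', 'rise']
print(matrix)  # [[0, 1, 0, 1, 1, 1], [0, 1, 0, 1, 0, 0], [1, 0, 1, 0, 0, 0], [1, 0, 1, 0, 1, 1]]
```

On this toy corpus, thresholding at `min_df=2` keeps 6 of the 15 distinct terms. Other term-scoring criteria could replace the document-frequency test inside `select_vocabulary` without changing the rest of the sketch.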

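The SVD and PCA steps mentioned in the abstract both project the document vectors onto a few dominant directions. The following minimal standard-library sketch finds the leading right singular vectors of the document-term matrix A as eigenvectors of A^T A, using power iteration with deflation. This works for toy matrices but is an illustrative assumption rather than the authors' implementation; a real system would call an optimized SVD routine. Setting `center=True` subtracts the column means first, which turns the same projection into PCA. The function names, iteration count, and small matrix are hypothetical.

```python
import math
import random


def gram(a):
    """Return A^T A for a matrix A given as a list of rows."""
    n = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]


def top_eigenvectors(c, k, iters=500, seed=0):
    """Leading k eigenvectors of a symmetric PSD matrix via power iteration + deflation."""
    rng = random.Random(seed)
    n = len(c)
    c = [row[:] for row in c]  # copy: deflation modifies the matrix
    vecs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iters):
            w = [sum(c[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue of the converged unit vector.
        lam = sum(v[i] * c[i][j] * v[j] for i in range(n) for j in range(n))
        # Deflate so the next pass converges to the next eigenvector.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
        vecs.append(v)
    return vecs


def reduce_dimensions(a, k, center=False):
    """Project each row of `a` onto the top-k right singular vectors.

    center=False: truncated-SVD (LSI-style) features.
    center=True: subtract column means first, i.e. PCA scores.
    """
    if center:
        means = [sum(col) / len(a) for col in zip(*a)]
        a = [[x - m for x, m in zip(row, means)] for row in a]
    basis = top_eigenvectors(gram(a), k)
    return [[sum(x * b for x, b in zip(row, vec)) for vec in basis] for row in a]


matrix = [[0, 1, 0, 1, 1, 1], [0, 1, 0, 1, 0, 0], [1, 0, 1, 0, 0, 0], [1, 0, 1, 0, 1, 1]]
svd_features = reduce_dimensions(matrix, k=2)
pca_features = reduce_dimensions(matrix, k=2, center=True)
print(svd_features)
print(pca_features)
```

Each document shrinks from six term counts to two coordinates. As a sanity check, the squared norm of each projected column equals the corresponding eigenvalue of A^T A, which for this matrix is 4 + 2*sqrt(2) for the first column and 4 for the second.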
 
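For the SVM classifier, the following minimal sketch shows a linear, binary SVM trained by stochastic sub-gradient descent on the regularized hinge loss (the Pegasos update). This training method was swapped in for brevity and is not necessarily the solver used in the paper. The sketch makes three assumptions: labels are +1/-1, the bias is handled by appending a constant feature, and the two-dimensional points stand in for hypothetical reduced document features. The names `train_linear_svm` and `predict_svm` and the values of `lam` and `epochs` are illustrative.

```python
import random


def train_linear_svm(xs, ys, lam=0.01, epochs=200, seed=0):
    """Linear SVM via Pegasos-style stochastic sub-gradient descent.

    Minimizes lam/2 * ||w||^2 + mean hinge loss max(0, 1 - y * w.x).
    Labels must be +1/-1; a constant 1.0 feature is appended so the bias
    is learned (and regularized) like any other weight, a simplification.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Gradient step on the L2 regularizer shrinks the weights...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ...and the hinge-loss sub-gradient is non-zero only inside the margin.
            if margin < 1:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict_svm(w, x):
    """Return +1 or -1 according to the sign of the decision function."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


xs = [[2.0, 0.1], [1.8, -0.2], [-1.9, 0.3], [-2.1, -0.1]]
ys = [1, 1, -1, -1]
w = train_linear_svm(xs, ys)
print([predict_svm(w, x) for x in xs])  # [1, 1, -1, -1]
```

With several categories, one binary classifier per category (one-vs-rest) is a common arrangement. That extension is not shown here.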

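The back-propagation classifier can be illustrated with a small network that has one hidden layer of sigmoid units, trained by online gradient descent on a cross-entropy loss. This is a minimal sketch: the layer sizes, learning rate, number of epochs, 0/1 labels, and the class name `BackpropNet` are assumptions, since the abstract does not specify the network configuration. The toy points again stand in for reduced document features.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class BackpropNet:
    """One hidden layer of sigmoid units trained by back-propagation (online gradient descent)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def train(self, xs, ys, lr=0.5, epochs=2000):
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                hidden, out = self.forward(x)
                # Output delta for a sigmoid output with cross-entropy loss.
                d_out = out - y
                # Back-propagate through w2 and the sigmoid derivative h * (1 - h),
                # using the output weights from before this update.
                d_hidden = [d_out * w * h * (1 - h) for w, h in zip(self.w2, hidden)]
                self.w2 = [w - lr * d_out * h for w, h in zip(self.w2, hidden)]
                self.b2 -= lr * d_out
                for j, dj in enumerate(d_hidden):
                    self.w1[j] = [w - lr * dj * xi for w, xi in zip(self.w1[j], x)]
                    self.b1[j] -= lr * dj

    def predict(self, x):
        return 1 if self.forward(x)[1] >= 0.5 else 0


xs = [[2.0, 0.1], [1.8, -0.2], [-1.9, 0.3], [-2.1, -0.1]]
ys = [1, 1, 0, 0]
net = BackpropNet(n_in=2, n_hidden=3)
net.train(xs, ys)
print([net.predict(x) for x in xs])
```

The hidden-layer deltas are computed before the output weights change. This keeps the update a true gradient step for the current example.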

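The abstract compares the two classifiers using standard precision and recall. For a given category, precision = TP / (TP + FP) and recall = TP / (TP + FN), where TP, FP, and FN count true positives, false positives, and false negatives. The sketch below computes these per category and as an unweighted (macro) average. The macro-averaging choice, the convention that 0/0 counts as 0, and the example labels are assumptions; the paper may aggregate its results differently.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one category, treating `positive` as the relevant class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # define 0/0 as 0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def macro_average(y_true, y_pred):
    """Unweighted mean of per-category precision and recall."""
    labels = sorted(set(y_true))
    scores = [precision_recall(y_true, y_pred, c) for c in labels]
    return (
        sum(p for p, _ in scores) / len(scores),
        sum(r for _, r in scores) / len(scores),
    )


y_true = ["oil", "oil", "oil", "grain", "grain"]
y_pred = ["oil", "oil", "grain", "grain", "grain"]
print(precision_recall(y_true, y_pred, "oil"))    # precision 1.0, recall 2/3
print(precision_recall(y_true, y_pred, "grain"))  # precision 2/3, recall 1.0
print(macro_average(y_true, y_pred))              # both averages 5/6
```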

DOI : 10.21928/juhd.20170610.40


 
