Implementation of Text Mining for Sentiment Analysis on Twitter Using the Support Vector Machine Algorithm
DOI:
https://doi.org/10.23887/jstundiksha.v12i1.52358
Keywords:
Sentiment Analysis, Support Vector Machine, Twitter
Abstract
The number of social media users grows every year along with the number of internet users. This growth is accompanied by an increase in the amount of information on the internet, which becomes valuable once it is analyzed. Text mining techniques can be used to analyze such large volumes of data, because text mining extracts high-quality information from text. It can also quickly analyze information such as the sentiment of a sentence, which makes high-quality information easier to obtain. The data processed in this study come from Twitter, a text-based social media platform; tweets are collected through an Application Programming Interface using a keyword, which is either a word or a hashtag. Each sentence is then processed with text mining using the Support Vector Machine algorithm, which classifies its sentiment as positive, neutral, or negative. The accuracy obtained by this process is 73% on the available sentiment data. Text mining accuracy is strongly influenced by the pre-processing stage, because many words require further processing.
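The abstract attributes much of the accuracy to pre-processing but does not list the individual steps. The sketch below shows one plausible cleaning pass for a raw tweet, using only the Python standard library. The function name `preprocess`, the regular expressions, and the short `STOPWORDS` set are illustrative assumptions, not the authors' implementation; stemming and slang normalization are omitted.

```python
import re

# Hypothetical stopword list for illustration only; the abstract does not
# say which stopwords (if any) the authors removed.
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
MENTION_RE = re.compile(r"@\w+")                # @username mentions
NON_LETTER_RE = re.compile(r"[^a-z\s]")         # digits, punctuation, emoji


def preprocess(tweet: str) -> list[str]:
    """Return a list of cleaned, lowercased tokens from a raw tweet."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")        # keep the hashtag word, drop the symbol
    text = NON_LETTER_RE.sub(" ", text)
    # Split on whitespace and drop the assumed stopwords.
    return [tok for tok in text.split() if tok not in STOPWORDS]


if __name__ == "__main__":
    raw = "@budi Pelayanan di toko ini BAGUS banget!! https://t.co/abc #puas"
    print(preprocess(raw))
```

Token lists like these would typically be turned into numeric features (for example term counts or TF-IDF weights) before being passed to a Support Vector Machine classifier; the abstract does not specify which feature representation was used.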
License
Copyright (c) 2023 Aditiya Hermawan, Indrico Jowensen, Junaedi Junaedi; Edy
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with the Jurnal Sains dan Teknologi (JST) agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike License (CC BY-SA 4.0) that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work. (See The Effect of Open Access)