Optimizing Sentiment Analysis of Electric Vehicles Through Oversampling Techniques on YouTube Comments

Authors

  • Jessica Crisfin Lapendy, Universitas Negeri Makassar
  • Andi Aulia Cahyana Resky, Universitas Negeri Makassar
  • Andi Tenriola, Universitas Negeri Makassar
  • Dewi Fatmarani Surianto, Universitas Negeri Makassar, https://orcid.org/0009-0003-3169-9993
  • Udin Sidik Sidin, Universitas Negeri Makassar

DOI:

https://doi.org/10.23887/janapati.v14i1.88205

Keywords:

Classification, Electric Vehicles, Random Oversampling, SMOTE, ADASYN

Abstract

Air pollution from fuel-powered motor vehicles harms the environment and human health, driving the need for more sustainable alternatives such as electric vehicles (EVs). However, the transition to electric vehicles often draws mixed public responses, with sentiment split between positive and negative. This research investigates that sentiment by analyzing YouTube comments, which are classified using two algorithms, Support Vector Machine (SVM) and Naïve Bayes, in combination with three oversampling techniques: Random Oversampling, SMOTE, and ADASYN. A comparative evaluation is conducted to determine the most effective algorithm and oversampling strategy for handling imbalanced sentiment data in which negative comments dominate. In initial experiments, Naïve Bayes with SMOTE achieved the best result among the baseline models, with 64% accuracy; traditional oversampling alone was not sufficient to significantly improve classification quality. To address this, the study proposes a hybrid method that combines Easy Data Augmentation (EDA), specifically Synonym Replacement (SR), with oversampling. The proposed method substantially improved performance: Naïve Bayes combined with SR and either SMOTE or Random Oversampling achieved 88% accuracy, with F1-scores of 0.84–0.85 for the positive class. The best result was obtained using SVM with SR and Random Oversampling, reaching 97% accuracy and F1-scores of 0.97 (negative) and 0.96 (positive). These findings demonstrate the effectiveness of combining augmentation and oversampling for sentiment classification and provide insights for stakeholders promoting EV adoption.
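
The hybrid idea in the abstract, Synonym Replacement followed by Random Oversampling, can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption made for illustration: the toy `SYNONYMS` table, the function names, the sample comments, and the choice to augment only the minority class. It uses only the Python standard library. SMOTE and ADASYN are not shown because they interpolate between numeric feature vectors rather than raw text; the `imbalanced-learn` package provides them as `SMOTE` and `ADASYN` in `imblearn.over_sampling`.

```python
import random
from collections import Counter

# Hypothetical toy synonym table (assumption); a real experiment would need
# a thesaurus that matches the language of the comments.
SYNONYMS = {
    "good": ["great", "nice"],
    "cheap": ["affordable"],
    "quiet": ["silent"],
}


def synonym_replacement(text, n, rng):
    """Replace up to n words that have an entry in SYNONYMS (EDA-style SR)."""
    words = text.split()
    candidates = [i for i, w in enumerate(words) if w.lower() in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        words[i] = rng.choice(SYNONYMS[words[i].lower()])
    return " ".join(words)


def random_oversample(texts, labels, rng):
    """Duplicate randomly chosen samples until every class matches the largest."""
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        pool = [t for t, y in zip(texts, labels) if y == label]
        for _ in range(target - count):
            out_texts.append(rng.choice(pool))
            out_labels.append(label)
    return out_texts, out_labels


def augment_then_oversample(texts, labels, minority, seed=0):
    """Add one SR variant per minority sample, then balance by duplication."""
    rng = random.Random(seed)
    aug_texts, aug_labels = list(texts), list(labels)
    for t, y in zip(texts, labels):
        if y == minority:
            variant = synonym_replacement(t, n=1, rng=rng)
            if variant != t:  # keep only variants that actually changed
                aug_texts.append(variant)
                aug_labels.append(y)
    return random_oversample(aug_texts, aug_labels, rng)


# Made-up comments: negative dominates, as in the imbalance the abstract describes.
texts = [
    "good and quiet car",
    "cheap to run",
    "too expensive",
    "battery dies fast",
    "charging takes forever",
    "no charging stations here",
    "range is too short",
]
labels = ["positive", "positive"] + ["negative"] * 5

bal_texts, bal_labels = augment_then_oversample(texts, labels, minority="positive")
print(sorted(Counter(bal_labels).items()))  # [('negative', 5), ('positive', 5)]
```

In a full experiment, resampling of this kind is normally applied to the training split only, so that duplicated or augmented comments do not leak into the test set.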

Published

2025-03-31

How to Cite

Lapendy, J. C., Resky, A. A. C., Tenriola, A., Surianto, D. F., & Sidin, U. S. (2025). Optimizing Sentiment Analysis of Electric Vehicles Through Oversampling Techniques on YouTube Comments. Jurnal Nasional Pendidikan Teknik Informatika : JANAPATI, 14(1). https://doi.org/10.23887/janapati.v14i1.88205

Section

Articles