UTILIZATION OF A DEEP LEARNING MODEL (CHATGPT) IN DETECTING ERRORS IN MATHEMATICAL PROBLEM SOLVING: A COMPARATIVE STUDY OF AUTOMATED AND MANUAL ASSESSMENT
DOI: https://doi.org/10.23969/jp.v10i04.38579
Keywords: deep learning, ChatGPT, mathematical errors, automated assessment
Abstract
This study aims to analyze the effectiveness of deep learning models, particularly ChatGPT, in detecting errors in students’ mathematical problem-solving processes and to compare automated assessment results with manual assessment conducted by lecturers. The research employed a quantitative descriptive-comparative approach involving 37 first-semester university students in a mathematics education program. Data were collected through written essay tests on plane geometry topics, questionnaires, and documentation. The written responses were assessed manually by lecturers and automatically by ChatGPT, focusing on conceptual, procedural, and computational errors. Data analysis used descriptive statistics and comparative analysis to examine score differences and consistency between the two assessment methods. The results show that the average score differences between manual assessment and ChatGPT assessment were relatively small, ranging from 0.4 to 4.5 points, indicating a high level of accuracy and consistency of the automated system. ChatGPT demonstrated advantages in efficiency, objectivity, and speed of assessment, while manual assessment remained superior in interpreting implicit reasoning and contextual understanding. These findings suggest that ChatGPT has strong potential as an automated assessment tool to support mathematics educators, particularly in identifying student error patterns systematically, although human judgment is still necessary for comprehensive pedagogical interpretation.
Copyright (c) 2026 Pendas : Jurnal Ilmiah Pendidikan Dasar

This work is licensed under a Creative Commons Attribution 4.0 International License.
















