The Development of an Automated Scoring System for Assessing Writing Tests in the Thai Language Subject at the Lower Secondary Level
Abstract
This study aimed to develop a Thai constructed-response test and an automated scoring system for assessing Thai writing at the lower secondary level, and to compare scoring outcomes between human raters and the automated system. The research was conducted in three phases. In Phase 1, a six-item, four-scenario constructed-response test totaling 140 points was developed, along with analytic rubrics covering comprehension, summarization, and paraphrasing. The test showed appropriate item difficulty (p = .43–.70), acceptable discrimination (r = .20–.34), perfect content validity (IOC = 1.00), and satisfactory internal consistency (Cronbach's α = .731). In Phase 2, the Automated Scoring System for Writing Test (ASSWT) was developed as a web-based application with three modules: a teacher interface, an examinee interface, and a backend scoring engine. The system was built with C#, JavaScript, HTML, CSS, and ASP.NET Core MVC (.NET 9) on SQL Server 2022 Express, and it used OpenAI's o3-mini language model for scoring. In Phase 3, the system was evaluated against human scoring. The automated system yielded more consistent scores (lower standard deviations) and, in some cases, higher discrimination. Pearson correlations between the system and human raters ranged from moderate to very high (r = .496–.819, p < .001), and intraclass correlation coefficients fell in the good-to-excellent range (.815–.945). A generalizability theory analysis under a persons × items × raters (p × i × r) design indicated improved reliability when using the system: the relative generalizability coefficient ρ²(δ) rose from .26 to .62, and the absolute coefficient ρ²(Abs) from .17 to .51. The results suggest that the automated system enhances scoring reliability and reduces unwanted error variance.
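As a point of reference for the Phase 2 architecture, the following is a minimal C# sketch (not the authors' ASSWT code) of how a backend scoring engine could submit an examinee's answer and an analytic rubric to o3-mini through OpenAI's Chat Completions REST API. The endpoint, request shape, and model identifier come from public API documentation and the abstract; the class name, prompt wording, and score format are illustrative assumptions.

using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class EssayScorer
{
    private static readonly HttpClient Http = new HttpClient();

    // Sends one examinee answer plus one rubric to the model and returns
    // the model's raw reply (here, prompted to be a score only).
    public static async Task<string> ScoreAsync(string apiKey, string rubric, string answer)
    {
        var payload = new
        {
            model = "o3-mini",  // model used by the study, per the abstract
            messages = new object[]
            {
                new { role = "system",
                      content = "You are a rater for Thai writing tests. " +
                                "Score the answer against the rubric and reply with the score only." },
                new { role = "user", content = $"Rubric:\n{rubric}\n\nAnswer:\n{answer}" }
            }
        };

        using var request = new HttpRequestMessage(
            HttpMethod.Post, "https://api.openai.com/v1/chat/completions");
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
        request.Content = new StringContent(
            JsonSerializer.Serialize(payload), Encoding.UTF8, "application/json");

        using var response = await Http.SendAsync(request);
        response.EnsureSuccessStatusCode();

        // Extract choices[0].message.content from the JSON response.
        using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
        return doc.RootElement.GetProperty("choices")[0]
                  .GetProperty("message").GetProperty("content").GetString() ?? "";
    }
}

In a production system such as ASSWT, a call like this would sit behind the MVC backend, with scores persisted to SQL Server alongside the examinee's response.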
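For readers less familiar with generalizability theory, the two coefficients reported above have standard definitions under a fully crossed p × i × r (persons × items × raters) design. The following is the conventional formulation, assumed here rather than quoted from the article:

\rho^2_{\delta} = \frac{\sigma^2_{p}}{\sigma^2_{p} + \sigma^2_{\delta}},
\qquad
\rho^2_{\mathrm{Abs}} = \frac{\sigma^2_{p}}{\sigma^2_{p} + \sigma^2_{\Delta}},

\text{where}\quad
\sigma^2_{\delta} = \frac{\sigma^2_{pi}}{n_i} + \frac{\sigma^2_{pr}}{n_r} + \frac{\sigma^2_{pir,e}}{n_i n_r},
\qquad
\sigma^2_{\Delta} = \sigma^2_{\delta} + \frac{\sigma^2_{i}}{n_i} + \frac{\sigma^2_{r}}{n_r} + \frac{\sigma^2_{ir}}{n_i n_r}.

Because rater-related variance components (σ²r, σ²pr, σ²ir) enter the error terms, replacing variable human raters with a more consistent automated rater shrinks σ²δ and σ²Δ, which is what lifts the coefficients from .26 to .62 and from .17 to .51.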
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Authors who urgently need their manuscript published should submit it to another journal instead. The editorial board will not accept a manuscript if the author does not strictly follow the journal's stated conditions and procedures. The content of published articles is copyrighted by the Journal of Inclusive and Innovative Education, Faculty of Education, Chiang Mai University.