Benchmarking of Machine Learning for Predictive Model for Faculty Selection
DOI:
https://doi.org/10.55164/ecbajournal.v17i1.273896
Keywords:
faculty selection, predictive modelling, gradient boosting, higher education
Abstract
This study benchmarked machine learning techniques, chiefly the Gradient Boosted Trees algorithm, for predicting faculty selection among students in Southern Thailand. The dataset comprised 12,125 students, with variables including high school GPA, blood group, district, province, and parental background. The factors that most influenced model performance were academic history, province of residence, and parental attributes. The Gradient Boosted Trees model achieved a reported accuracy of 85% and precision of 87% in identifying chosen faculties. Precision and recall were also reported as 0.594 and 0.460, respectively, with an F1 score of 0.518, underscoring the model's robustness in predicting student choices. In the SVM model, features such as "BEFOREGPA" and "BLOODGROUP" had significant coefficients that influenced predictions positively or negatively. The SVM model achieved an F1 score of 0.33, indicating moderate performance in predicting student choices. The Gradient Boosting results demonstrate the model's effectiveness in predictive tasks: it builds trees iteratively, with each new tree correcting the errors of the trees before it. Model performance must nevertheless be monitored carefully, particularly when significant errors occur, to mitigate issues such as overfitting. Our analysis also shows that students' faculty choices are shaped by a complex interplay of factors, among which province of origin and grade point average (GPA) stand out as the pivotal determinants.
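The reported F1 score can be checked against the reported precision and recall, since F1 is the harmonic mean of the two. The following is a minimal sketch in Python; the function name `f1_from_precision_recall` is illustrative and does not come from the study.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1 score, i.e. the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Convention: F1 is taken as 0 when both inputs are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for Gradient Boosted Trees.
f1 = f1_from_precision_recall(0.594, 0.460)
print(round(f1, 3))  # 0.518
```

The result agrees with the stated F1 score of 0.518, so these three figures are mutually consistent. The abstract does not say how the separately quoted 87% precision was computed.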
License
Copyright (c) 2025 Faculty of Economics and Business Administration, Thaksin University

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
