Developing Data Handling Guidelines for Open-Source LLM Training in Compliance with Section 37 under Thailand’s PDPA and Related Legal Provisions

Nattakrit Kaewjiboon; Peerapat Chokesuwattanaskul

Authors

Nattakrit Kaewjiboon Master of Laws in Business Law (International Program), Faculty of Law, Chulalongkorn University
Peerapat Chokesuwattanaskul Master of Laws in Business Law (International Program), Faculty of Law, Chulalongkorn University

Keywords:

Open-source LLM, PDPA Section 37, data handling, data security

Abstract

This study examines the application of Section 37 of Thailand’s Personal Data Protection Act (PDPA) to the training of open-source Large Language Models (LLMs), a context often characterized by decentralization and limited institutional oversight. Beyond doctrinal and comparative legal analysis, the research incorporates semi-structured interviews with open-source AI developers in Thailand to ground the legal findings in real-world practice. Drawing on structural risk analysis and practitioner feedback, the study proposes a conceptual framework for data handling guidelines for LLM training. The framework takes a modular, role-sensitive approach and is accompanied by practical tables that help data controllers operationalize Section 37. Designed for resource-constrained environments, the proposed guidelines aim to support the legally compliant, scalable, and responsible development of open-source LLMs in Thailand.

