Developing Data Handling Guidelines for Open-Source LLM Training in Compliance with Section 37 under Thailand’s PDPA and Related Legal Provisions

ณัฐกฤตย์ แก้วใจบุญ; พีรพัฒ โชคสุวัฒนสกุล

Authors

ณัฐกฤตย์ แก้วใจบุญ, Master of Laws Program in Business Law (International Program), Faculty of Law, Chulalongkorn University
พีรพัฒ โชคสุวัฒนสกุล, Master of Laws Program in Business Law (International Program), Faculty of Law, Chulalongkorn University

Keywords:

Open-source language models, PDPA Section 37, data handling guidelines, data security

Abstract

This research examines how Section 37 of the Personal Data Protection Act (PDPA) applies to the training of open-source large language models (LLMs), which is often carried out by individuals or groups of independent developers with no central body formally overseeing them. It analyzes the structural risks and legal compliance problems at each stage of data handling through documentary analysis, comparison with international standards, and qualitative interviews with open-source language model developers in Thailand, so that the findings reflect actual practice. Based on this analysis, the researchers propose a conceptual framework for data handling guidelines that are consistent with Section 37 and can be applied in practice. The guidelines are not intended to be fixed and unchangeable. Rather, they are a supporting tool that helps data controllers interpret and comply with the law effectively and in line with the technical realities of open-source language model development in Thailand.
