Forecasting health insurance premium using machine learning approaches

Shawni Dutta
Payal Bose
Samir K. Bandyopadhyay

Abstract

A medical emergency can strike anyone at any time and can have significant psychological and economic consequences. Health insurance covers a range of costs, including hospital charges, medicines, and physician consultation fees. Given the steep rise in healthcare expenses, the importance of health insurance should not be underestimated. This research attempts to forecast the cost of health insurance premiums. To address this problem, an automated system can be built that analyzes an individual's health complications and forecasts the associated costs, and machine learning techniques can be used to design such a system. This work uses ensemble-based machine learning approaches to evaluate an individual's risk and anticipate premium costs, which would allow insurance firms to set premiums that are low enough to attract more policyholders while remaining profitable. Six popular ensemble-based algorithms (Random Forest, CatBoost, Extra Trees, AdaBoost, Extreme Gradient Boost, and Gradient Boost) are applied to individuals' health information to estimate the premium price. The comparative study of these models favors Random Forest, which achieves a Mean Squared Error (MSE) of 0.01 and a Mean Absolute Error (MAE) of 0.0394, so the automated system can be built on that model. Figures 1A and 1B present the comparative analysis graphically, based on MAE and MSE respectively. The research also ranks the factors that contribute to an individual's medical cost. According to the Random Forest model's findings, a person's age has the greatest impact on the premium.
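The abstract compares models by MAE and MSE. As a minimal sketch of how such a comparison can be scored, the Python below computes both metrics and ranks candidate models by error. The function names, the model labels `model_a` and `model_b`, and all numbers are hypothetical illustrations, not the paper's data or code, and ranking by MAE first is an arbitrary choice made here.

```python
from statistics import fmean


def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted| over all samples."""
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def mean_squared_error(y_true, y_pred):
    """Average of (actual - predicted)^2 over all samples."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def rank_models(y_true, predictions):
    """Return (name, mae, mse) tuples sorted by MAE, then MSE (lower is better)."""
    scores = [
        (name, mean_absolute_error(y_true, y_pred), mean_squared_error(y_true, y_pred))
        for name, y_pred in predictions.items()
    ]
    return sorted(scores, key=lambda s: (s[1], s[2]))


# Toy, made-up premium values (hypothetical; NOT taken from the paper).
y_true = [0.20, 0.50, 0.80, 0.40]
predictions = {
    "model_a": [0.25, 0.45, 0.70, 0.40],
    "model_b": [0.10, 0.60, 0.60, 0.50],
}

ranking = rank_models(y_true, predictions)
for name, mae, mse in ranking:
    print(f"{name}: MAE={mae:.4f} MSE={mse:.4f}")
```

In practice the prediction lists would come from fitted regressors evaluated on a held-out test set. MSE penalizes large errors more heavily than MAE, so the two metrics can rank models differently.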

Article Details

How to Cite
Dutta, S., Bose, P., & Bandyopadhyay, S. K. (2023). Forecasting health insurance premium using machine learning approaches. Asia-Pacific Journal of Science and Technology, 28(06), APST–28. https://doi.org/10.14456/apst.2023.96
Section
Research Articles
