Decision of feature importance for blood test analysis by using SHAP value

Serap Ergün

Abstract


Machine learning methods are used to make sense of data and to determine which data should be considered. Providing interpretability is essential when developing a prediction model using machine learning. The SHAP value, an index for assessing the contribution of input features to model learning, has been developed and has garnered interest. Using decision tree-based models, which are frequently applied to tabular data, this study demonstrates that the SHAP value can estimate the contribution of features to model learning relatively accurately. Game theory-based importance judgments are used to identify significant test items from blood test data. The stepwise procedure used to choose the test items allocates consistent weights regardless of the sequence in which the items appear; these weights are therefore not always appropriate in terms of importance. In this research, a game-theoretic importance selection approach is proposed for weighting test items chosen using the stepwise method. This approach is also applied to actual blood test data to extract the test results that are deemed crucial.
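The game-theoretic weighting described above rests on the Shapley value, which averages each feature's marginal contribution over all coalitions of the other features. The following is a minimal sketch of that calculation, not the method used in the study: the test item names (`ALT`, `AST`, `GGT`) and the payoff table `SCORES` are hypothetical values chosen only for illustration, and `shapley_values` is an illustrative helper rather than part of any library.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a small cooperative game.

    players: list of hashable player names (here: hypothetical test items).
    value:   function mapping a frozenset of players to a real payoff.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Shapley weight for a coalition of size k that excludes p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of p when it joins coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical payoff table (assumption): the score a model might reach
# when trained on each subset of three blood test items.
SCORES = {
    frozenset(): 0.0,
    frozenset({"ALT"}): 0.4,
    frozenset({"AST"}): 0.3,
    frozenset({"GGT"}): 0.1,
    frozenset({"ALT", "AST"}): 0.5,
    frozenset({"ALT", "GGT"}): 0.6,
    frozenset({"AST", "GGT"}): 0.4,
    frozenset({"ALT", "AST", "GGT"}): 0.7,
}

items = ["ALT", "AST", "GGT"]
phi = shapley_values(items, lambda s: SCORES[frozenset(s)])

for name in items:
    print(f"{name}: {phi[name]:.3f}")
```

In this toy game the weights sum to the payoff of the full item set, and they stay the same if the items are listed in a different order. Exact enumeration grows exponentially with the number of items, so it is only practical for a handful of features; tree-based SHAP implementations use specialised algorithms instead.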


Keywords


Game theory; Feature importance; Machine learning; SHAP value






Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Selcuk University Journal of Engineering Sciences (SUJES) ISSN:2757-8828
