Abstract

This study presents a machine learning methodology for predicting subsurface lithology and groundwater quality by applying the K-Nearest Neighbors (KNN) algorithm to well data collected in the study region. The well data set includes spatial coordinates, well depth, lithology, and key water quality parameters: Total Dissolved Solids (TDS), calcium, magnesium, turbidity, chloride, and pH. The KNN models achieved a prediction accuracy of 92.4% for lithology classification and 89.1% for groundwater quality parameter classification. Absolute predicted values were also obtained for the water quality parameters, with TDS ranging from 420 to 980 mg/L and pH from 7.1 to 8.2, in good agreement with the observed values. Comparison with the WHO and BIS drinking water standards showed that some wells exceeded the maximum permissible limits for TDS, calcium, and magnesium, which the study attributes to spatial variation in groundwater quality. The results indicate that KNN is effective in capturing spatial and feature-based similarities, making it useful for hydrogeological prediction problems in relatively homogeneous regions.
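The KNN idea summarized above can be sketched in a few lines. This is a minimal illustration only, not the study's implementation: the well records, the feature layout (easting, northing, scaled depth), the value of k, and all function names are hypothetical, and Euclidean distance is assumed because it appears in the keywords.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(train_X, query, k):
    """Indices of the k training rows closest to the query point."""
    order = sorted(range(len(train_X)),
                   key=lambda i: euclidean(train_X[i], query))
    return order[:k]

def knn_classify(train_X, labels, query, k=3):
    """Majority vote over the labels of the k nearest wells."""
    idx = k_nearest(train_X, query, k)
    return Counter(labels[i] for i in idx).most_common(1)[0][0]

def knn_regress(train_X, values, query, k=3):
    """Mean of the target values of the k nearest wells."""
    idx = k_nearest(train_X, query, k)
    return sum(values[i] for i in idx) / len(idx)

# Hypothetical wells: (easting_km, northing_km, scaled_depth).
# These numbers are invented for illustration, not taken from the study.
wells = [
    (0.0, 0.0, 0.30), (0.1, 0.2, 0.35), (0.2, 0.1, 0.30),
    (5.0, 5.0, 0.80), (5.1, 4.9, 0.90), (4.8, 5.2, 0.85),
]
lithology = ["clay", "clay", "sand", "gravel", "gravel", "gravel"]
tds_mg_per_l = [450.0, 480.0, 510.0, 900.0, 940.0, 920.0]

query = (0.1, 0.1, 0.32)  # an unsampled location near the first cluster
predicted_lithology = knn_classify(wells, lithology, query, k=3)
predicted_tds = knn_regress(wells, tds_mg_per_l, query, k=3)
print(predicted_lithology, predicted_tds)
```

With real data the features would normally be rescaled to comparable ranges first, since otherwise the variable with the largest numeric range dominates the distance, and k would be chosen by validation rather than fixed in advance.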

Keywords

Machine Learning, Regression, Classification Modes, K-Nearest Neighbors (KNN), Euclidean Distance, Soil Profile, Water Table Level
