Combining Knowledge Representation and Machine Learning for Improved Healthcare Claims Fraud Detection
Abstract
Healthcare claims fraud detection has become increasingly important as healthcare systems grow in scale and complexity. The volume of claims, together with the heterogeneity of clinical, administrative, and billing records, makes the problem difficult for conventional rule-based or purely data-driven methods. This paper introduces an integrated framework that combines knowledge representation structures with machine learning techniques to improve fraud detection. By encoding domain knowledge in a structured format, our approach captures the semantic relationships and logical constraints inherent in healthcare claims. These representations guide data preprocessing and feature engineering, and support explanation of the outputs of predictive models that target fraud patterns ranging from upcoding and unbundling to fictitious billing. We discuss how domain-specific ontologies, rule-based inference engines, and first-order logic statements can interact with supervised and unsupervised learning to capture both explicit and implicit indicators of fraudulent behavior. We report an empirical evaluation of the integrated system on a large-scale claims dataset, comparing its performance against baseline methods. The results indicate that the knowledge-infused pipeline achieves higher detection accuracy, fewer false positives, and more interpretable fraud flags than the baselines. By bridging symbolic knowledge representation and machine learning, the proposed method offers a more reliable and comprehensible approach to healthcare claims fraud detection.
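As a rough illustration of how an explicit domain rule and a data-driven signal might be combined, the following minimal sketch pairs an unbundling rule with a simple statistical outlier score. It is not the paper's implementation: the procedure codes (PANEL, TEST_A, TEST_B), the BUNDLES table, the Claim fields, and the rule_weight parameter are all hypothetical, and a z-score on billed amount stands in for whatever learned model a real system would use.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Hypothetical bundling knowledge: a comprehensive code mapped to the
# component codes it already covers. A real system might derive this
# from a domain ontology rather than a hard-coded table.
BUNDLES = {"PANEL": {"TEST_A", "TEST_B"}}


@dataclass(frozen=True)
class Claim:
    claim_id: str
    provider: str
    codes: frozenset
    amount: float


def unbundling_flag(claim):
    """Rule: every component of a bundle is billed, but the bundle code is not."""
    return any(
        parts <= claim.codes and bundle not in claim.codes
        for bundle, parts in BUNDLES.items()
    )


def amount_zscores(claims):
    """Data-driven signal: z-score of each billed amount across all claims."""
    amounts = [c.amount for c in claims]
    mu, sigma = mean(amounts), pstdev(amounts)
    return {
        c.claim_id: 0.0 if sigma == 0 else (c.amount - mu) / sigma
        for c in claims
    }


def score_claims(claims, rule_weight=2.0):
    """Combine both signals: |z-score| plus a fixed bonus when the rule fires."""
    z = amount_zscores(claims)
    return {
        c.claim_id: abs(z[c.claim_id]) + rule_weight * unbundling_flag(c)
        for c in claims
    }


claims = [
    Claim("c1", "p1", frozenset({"PANEL"}), 100.0),
    Claim("c2", "p1", frozenset({"TEST_A", "TEST_B"}), 100.0),
    Claim("c3", "p2", frozenset({"TEST_A"}), 100.0),
]
scores = score_claims(claims)
for claim_id, score in sorted(scores.items()):
    print(claim_id, score)
```

Because the rule contributes a named, separate term to the score, a flagged claim can be explained by pointing to the rule that fired, which is one simple way symbolic knowledge can add interpretability to a numeric score.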
License
Copyright (c) 2025 author

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.