What Should Data Scientists Do to Save on Your Healthcare Costs?
When I was a little child, I frequently saw my grandmother pausing over her shoemaking work, looking for a needle that fell down from her hand to the floor. There was only a tiny window in the big wooden house and no electricity, so the room was very dark. Even now, with computers and other advanced technologies available for finding fraud in the healthcare field, I still feel like my grandmother looking for a needle in the dark.
Is it because healthcare fraud is a rare event? No! About four billion health insurance claims are processed in the United States, and the National Health Care Anti-Fraud Association estimates a loss of $56.7 to $170 billion annually, leading to higher health care costs and taxes. There are many reasons why fraud is so hard to find, and when fraud analysts are new to the field, they may not be fully aware of the billing intricacies that make this data challenging to analyze.
The most challenging task for fraud analysts is defining an outlier. From a statistical perspective, it is relatively easy: outliers are extreme observations that are very dissimilar to the rest of the population. Statisticians calculate the minimum and maximum values, and review tools like histograms, box plots, and z-scores, all of which focus on univariate outliers. But the approach to identifying outliers can also be a little broader, beyond relying only on a univariate analysis.
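The two univariate checks mentioned above, z-scores and the box plot (interquartile range) rule, can be sketched in a few lines. This is a minimal illustration, not a production screen: the billed amounts are made up, and the function names and the thresholds of 2.5 and 1.5 are assumptions chosen for the example.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    if sd == 0:
        return []
    return [v for v in values if abs(v - mean) / sd > threshold]

def iqr_outliers(values, k=1.5):
    """Return the values outside the box plot fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Hypothetical billed amounts for one procedure code (illustrative only).
billed = [120, 135, 128, 140, 131, 125, 138, 129, 133, 2500]

print(zscore_outliers(billed, threshold=2.5))
print(iqr_outliers(billed))
```

One design point worth noting: with only ten observations, a z-score computed from the sample standard deviation cannot exceed (n - 1) / sqrt(n), about 2.85, so the conventional cutoff of 3 would flag nothing in this sample. The box plot rule is less affected because quartiles are not pulled toward the extreme value.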
Multivariate outliers can be detected by fitting regression lines and inspecting the observations with large errors, but this step is skipped in many modeling exercises because its impact on model performance is usually marginal. One can also perform other analyses, such as peer group analysis and social network analysis, to assess other contributing variables.
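As a rough sketch of the regression idea, the snippet below fits a one-variable least-squares line and flags observations whose residual is large relative to the residual standard deviation. The data (visits versus billed amount), the function name, and the cutoff of 2 are all assumptions for illustration. The scaling is a simple one that ignores leverage, so it is not a full studentized residual.

```python
import math

def large_residual_indices(x, y, threshold=2.0):
    """Fit y = intercept + slope * x by least squares and return the
    indices whose scaled residual exceeds the threshold in absolute value."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard deviation with n - 2 degrees of freedom.
    resid_sd = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    if resid_sd == 0:
        return []
    return [i for i, r in enumerate(residuals) if abs(r) / resid_sd > threshold]

# Hypothetical data: number of visits and total billed amount per patient.
visits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
billed = [100, 200, 300, 400, 1500, 600, 700, 800, 900, 1000]

flagged = large_residual_indices(visits, billed)
print(flagged)
```

Here the fifth patient's billed amount is unremarkable on its own (it is not far above the maximum of the others), but it is far from what the number of visits would predict, which is the kind of case a univariate check can miss.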
For healthcare fraud detection, all these analytical approaches apply and allow for increased identification of vulnerabilities. As is widely known, Florida has the reputation for the highest rate of fraud and identity theft. Why is this? Home to the largest percentage of retirees, Florida remains the hotspot for identity theft and has long been known for Medicare fraud. Combine that with the targeting of older adults for fraud, and we have an alarming situation where more analysis should be done.
Nowadays, machine learning techniques such as decision trees, neural networks, and SVMs can be used and are fairly robust with respect to outliers. Other techniques, such as linear and logistic regression, are more sensitive. Clinician consultation is also important to fraud detection and analysis, but it can be costly. The outliers you find might exist only because the providers have just developed new treatment strategies or conducted clinical trials. An intermediary step may be to search for their clinical trial information on their websites and their publications through PubMed, to make sure additional tests or other services reflect medical necessity and are justified by the treatment and clinical rationale.
The requirements for a fraud data scientist include solid quantitative skills and strong programming, communication, and visualization skills. The best candidates can tell good stories and explain the accuracy and relevance of their models to those who may not know science or precision scoring.
Analysts with clinical backgrounds are desirable when it comes to working with unstructured data, which requires knowledge of and experience with electronic medical record (EMR), pharmacy, and claim data. Hiring such analysts may allow you to save on costs compared with consulting a clinician, and to control the false positive and false negative rates. A true number for healthcare fraud is somewhat difficult to estimate, because if the fraudster is really successful, they won’t be detectable. False negatives can also be boring for daily work and give the feeling of just spinning wheels. False positives for healthcare, on the other hand, might cost you a job if there are too many complaints about an unacceptable false positive rate (i.e., too many accusations that prove to be untrue).
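For readers who want the two error rates pinned down, here is a small sketch that computes them from investigated cases. The labels are invented for illustration, and the function name is an assumption. In practice the "actual" labels are themselves incomplete, for the reason given above: undetected fraud never gets labeled as fraud.

```python
def error_rates(actual, predicted):
    """Return (false_positive_rate, false_negative_rate).

    actual / predicted are sequences of 0 (legitimate) and 1 (fraud).
    FPR = flagged legitimate claims / all legitimate claims.
    FNR = missed fraudulent claims / all fraudulent claims.
    """
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    negatives = sum(1 for a in actual if a == 0)
    positives = sum(1 for a in actual if a == 1)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr

# Hypothetical outcomes for ten investigated claims (illustrative only).
actual    = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
predicted = [1, 0, 1, 0, 0, 0, 0, 1, 0, 1]

fpr, fnr = error_rates(actual, predicted)
print(f"false positive rate: {fpr:.2f}, false negative rate: {fnr:.2f}")
```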
The fraud data scientist should also have a solid understanding of the business, in addition to a clinical background, which would allow them to understand the claim data, such as whether a test, a treatment, or a medicine is medically necessary. This background will also help the data scientist identify a false diagnosis. For example, it is uncommon but entirely possible to find a man with breast cancer, whereas it should raise a flag to find a Medicare beneficiary in her eighties receiving medications for nausea associated with pregnancy. The analyst should have knowledge of proper coding and be aware of the criteria for the various coding levels in order to detect “upcoding” (indicating that a more complex visit took place than what actually occurred).
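The pregnancy-medication example can be expressed as a simple plausibility rule. The sketch below is only an illustration: the field names, the drug category label, and the age cutoff of 55 are hypothetical, and a real rule would be built from actual drug codes with clinical input.

```python
# Hypothetical drug category labels tied to pregnancy (assumption for the example).
PREGNANCY_RELATED = {"pregnancy_nausea"}

def implausible_claims(claims, max_plausible_age=55):
    """Return claim IDs where a pregnancy-related drug is billed for a
    beneficiary older than max_plausible_age (an assumed cutoff)."""
    return [
        c["claim_id"]
        for c in claims
        if c["drug_category"] in PREGNANCY_RELATED and c["age"] > max_plausible_age
    ]

# Made-up claims for illustration.
claims = [
    {"claim_id": "C1", "age": 84, "drug_category": "pregnancy_nausea"},
    {"claim_id": "C2", "age": 29, "drug_category": "pregnancy_nausea"},
    {"claim_id": "C3", "age": 81, "drug_category": "statin"},
]

suspect = implausible_claims(claims)
print(suspect)
```

A flag from a rule like this is a prompt for review rather than proof of fraud, since data entry errors can produce the same pattern.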
In addition, there are many rule-based models that need background knowledge in policies from county to state levels in order to identify fraud and abuse. These include any practices that are inconsistent with acceptable medical practices and unnecessarily increase costs. Fraudsters might also compare the policies between different insurance companies and take advantage of loopholes. Since Medicare began covering home sleep testing in 2009, one company has received nearly $9 million from Medicare, almost all of it the result of fraud and kickbacks. They submitted claims for Medicare recipients’ second and third nights of home sleep testing when they knew that only a single night of testing was needed to effectively diagnose obstructive sleep apnea. They routinely tested and claimed only the one night for non-Medicare beneficiaries. Data scientists can, after identifying problems, provide feedback to policy makers.
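The sleep testing case suggests one kind of rule-based check: compare how many nights a provider bills per patient for Medicare beneficiaries versus everyone else. The sketch below assumes a simplified claim record of (provider, payer, patient, nights), made-up data, and an arbitrary ratio of 1.5. None of these come from the actual case.

```python
from collections import defaultdict

def average_nights(claims):
    """Map (provider, payer) to the average nights billed per patient."""
    nights = defaultdict(lambda: defaultdict(int))
    for provider, payer, patient, n in claims:
        nights[(provider, payer)][patient] += n
    return {key: sum(p.values()) / len(p) for key, p in nights.items()}

def flag_payer_gap(claims, ratio=1.5):
    """Return providers whose Medicare average is at least `ratio` times
    their average for other payers (the ratio is an assumed cutoff)."""
    avg = average_nights(claims)
    flagged = []
    for provider in sorted({prov for prov, _payer in avg}):
        medicare = avg.get((provider, "medicare"))
        other = avg.get((provider, "other"))
        if medicare is not None and other is not None and medicare >= ratio * other:
            flagged.append(provider)
    return flagged

# Hypothetical claims: (provider, payer, patient_id, nights_billed).
claims = [
    ("A", "medicare", "p1", 3),
    ("A", "medicare", "p2", 2),
    ("A", "other", "p3", 1),
    ("A", "other", "p4", 1),
    ("B", "medicare", "p5", 1),
    ("B", "other", "p6", 1),
]

providers = flag_payer_gap(claims)
print(providers)
```

Comparing a provider against its own non-Medicare billing is a form of the peer comparison idea: the provider's behavior toward other payers serves as the baseline.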
A strong fraud data scientist should be creative, skilled, and curious. This creativity must be a continuous process: once a fraud model is developed, it may work for some time before fraudsters change their behavior to avoid getting caught, at which point the model must be retired or modified. The data scientist should detect and intercept new fraud in a relatively short period of time, and have the flexibility to apply analytic scrutiny to different fraud types.
Ideally, an analyst should be fluent in different programming languages (e.g., R, SAS, Matlab, Python, SPSS) but also understand the economic determinants of further investigation. Analysts need to make the connection between statistical relevance and usefulness in the market. For example, there are many reasons why healthcare costs are high in the United States, and this has been a serious problem. Israel’s healthcare outpaces that of the United States, with high-quality service and good outcomes, yet the percentage of the country’s gross domestic product going to healthcare is less than half that of the United States, and coverage is universal. Understanding the economics that underlie this helps in understanding the statistics.
Scientific fraud detection is a challenging career, but also very rewarding. Besides the feeling of accomplishing something for the greater good, the Centers for Medicare and Medicaid Services (CMS) says every dollar that was invested in the agency’s Medicare “integrity efforts” (fraud analysis) saved $12.40 for the Medicare program.
(with Marquis Payne and Cal Zemelman)