Can Machine Learning Improve Tax Enforcement Without Human Teachers?

TaxVox

Janet Holtzblatt

July 14, 2022

When it comes to tax enforcement, I used to believe the Internal Revenue Service could never have too much information. But the agency already collects an incredible amount of information that it underuses or does not use at all. Blame decade-long budget cuts, decades-long underinvestment in technology, and an overly complex tax code.

Last year, Treasury offered a beacon of hope, assuming Congress approves the Biden Administration’s dormant proposal to boost the IRS’s budget. Buried in its tax compliance plan is a pledge that a portion of the additional money would be invested in developing machine learning capabilities, a set of statistical tools that can detect complex relationships in Big Data. The goal: to leverage existing information to better identify tax returns for compliance review.

The IRS is not waiting for the money to begin research and development. At last month’s IRS-TPC joint research conference on tax administration, researchers described how machine learning could be used to tackle two of the most complicated areas of the tax code—partnerships and profit-shifting by businesses.

The research is promising but still at a very early stage. The following week, TPC hosted a panel of experts from the IRS, the Massachusetts Institute of Technology, and the Brookings Institution to discuss some of the benefits and the challenges of using machine learning to improve tax enforcement.

There were two important takeaways: First, compared to current methods, machine learning could make better use of data by finding subtle patterns that can indicate noncompliance. Second, the name “machine learning” is misleading. Machines can’t do it alone. Humans teach the machines, and humans grade their output. And then it’s up to humans to correct the machines if their predictions are wrong.

Information overload?

Every fall, the IRS matches information returns, such as W-2s and 1099s, to individual income tax returns. In 2018, the IRS received about 3 billion information returns and detected 22.3 million discrepancies. But it selected only 2.9 million for further review.
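The matching step itself is conceptually simple. Here is a toy Python version, offered only as a sketch: the taxpayer IDs, dollar amounts, the $100 tolerance, and the function names are all hypothetical, not the IRS's actual matching logic. It sums third-party amounts by taxpayer and flags anyone who reports less.

```python
# Toy document matching: compare third-party amounts (W-2s, 1099s) with the
# income each taxpayer reports. IDs, amounts, and the tolerance are
# hypothetical; this is not the IRS's actual matching logic.
from collections import defaultdict


def total_third_party_income(info_returns):
    """Sum information-return amounts by taxpayer ID."""
    totals = defaultdict(float)
    for taxpayer_id, amount in info_returns:
        totals[taxpayer_id] += amount
    return dict(totals)


def find_discrepancies(info_returns, reported_income, tolerance=100.0):
    """Return {taxpayer_id: gap} for taxpayers whose third-party income
    exceeds their reported income by more than `tolerance` dollars."""
    gaps = {}
    for taxpayer_id, expected in total_third_party_income(info_returns).items():
        gap = expected - reported_income.get(taxpayer_id, 0.0)
        if gap > tolerance:
            gaps[taxpayer_id] = gap
    return gaps


# (taxpayer ID, amount) pairs from information returns
info_returns = [("A", 50_000.0), ("A", 2_000.0), ("B", 30_000.0), ("C", 12_000.0)]
# Income reported on individual returns; "C" filed nothing in this toy data
reported_income = {"A": 52_000.0, "B": 25_000.0}

print(find_discrepancies(info_returns, reported_income))  # {'B': 5000.0, 'C': 12000.0}
```

Flagging is the easy part. In 2018 the IRS selected about 2.9 million of 22.3 million discrepancies, roughly 13 percent, for further review; deciding which flags are worth pursuing is the harder problem.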

Maybe that’s good enough. Taxpayers are more likely to comply with the tax code when they know the IRS has reliable information about their income from third parties. The evidence? They accurately report about 99 percent of wages and salaries on their tax returns.

Let’s face it, though: W-2 wage reporting sets a very high bar. Often, third-party information has holes, and taxpayers figure out how to game those gaps. For example, the newish 1099-K shows payments received by independent contractors but not their deductible expenses, such as gas purchased by ridesharing drivers. Some taxpayers have responded by switching from understating gross receipts to overstating expenses on their returns.

Machine learning for dummies (like me)

To help non-experts understand how machine learning works, Alex Engler from Brookings and I prepared a primer on the tool and its potential for tax enforcement.

Here are a few quick lessons.

Two prominent types of machine learning are supervised and unsupervised learning. In supervised learning, computers would be fed examples of noncompliant taxpayers who were identified in past audits. The computer would then observe relationships between noncompliance and other variables, such as high losses or excessive expenses. The results would become the basis for a predictive model that could kick out suspicious incoming returns for more review by IRS staff.
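To make the supervised idea concrete, here is a deliberately tiny Python sketch. It assumes past audits are labeled 1 (noncompliant) or 0 (compliant) and that each return is reduced to two hypothetical features, an expense ratio and a loss ratio. It fits a one-feature "decision stump," one of the simplest supervised learners; nothing here reflects the IRS's actual features or methods.

```python
# Toy supervised learning: fit a one-feature "decision stump" to labeled
# past audits (1 = noncompliant, 0 = compliant). Feature names and data are
# hypothetical; real compliance models are far richer.


def fit_stump(rows, labels):
    """Find the (feature, threshold) pair whose rule 'flag if value > threshold'
    misclassifies the fewest labeled examples."""
    best = None  # (errors, feature, threshold)
    for feature in rows[0]:
        for threshold in sorted({row[feature] for row in rows}):
            errors = sum(
                int(row[feature] > threshold) != label
                for row, label in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, feature, threshold)
    return best[1], best[2]


def predict(model, row):
    """Return 1 (refer for review) or 0 for a new return."""
    feature, threshold = model
    return int(row[feature] > threshold)


# Hypothetical past audits: expenses and losses as shares of gross receipts
past_audits = [
    {"expense_ratio": 0.30, "loss_ratio": 0.00},
    {"expense_ratio": 0.45, "loss_ratio": 0.10},
    {"expense_ratio": 0.85, "loss_ratio": 0.05},
    {"expense_ratio": 0.95, "loss_ratio": 0.40},
]
audit_findings = [0, 0, 1, 1]

model = fit_stump(past_audits, audit_findings)
print(model)  # ('expense_ratio', 0.45)
print(predict(model, {"expense_ratio": 0.90, "loss_ratio": 0.02}))  # 1
```

The learned rule is only as good as the labeled audits behind it: change the labels and the threshold moves with them.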

In contrast, unsupervised learning does not require examples. The computer would pick up anomalies among newly filed returns to identify emerging trouble areas. For example, it might point to a return that, relative to others with similar characteristics, shows much less taxable income. Here too, IRS staff would check out those trends.
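An unsupervised check needs no audit labels at all. The Python sketch below uses made-up data and hypothetical peer groups: it compares each return's taxable income only with its peers', using the modified z-score (based on the median and the median absolute deviation), a common outlier rule. The -3.5 cutoff is a conventional rule of thumb, not an IRS standard.

```python
# Toy unsupervised anomaly check: no audit labels. Returns are compared only
# with hypothetical peers using a modified z-score (median and median absolute
# deviation). The -3.5 cutoff is a common rule of thumb, not an IRS standard.
import statistics
from collections import defaultdict


def flag_low_income_outliers(returns, cutoff=-3.5):
    """Return IDs of returns whose taxable income is far below their peer group's."""
    groups = defaultdict(list)
    for r in returns:
        groups[r["peer_group"]].append(r)

    flagged = []
    for members in groups.values():
        incomes = [r["taxable_income"] for r in members]
        med = statistics.median(incomes)
        mad = statistics.median([abs(x - med) for x in incomes])
        if mad == 0:
            continue  # no spread in this group, so the score is undefined
        for r in members:
            # 0.6745 scales MAD so scores resemble standard z-scores for normal data
            score = 0.6745 * (r["taxable_income"] - med) / mad
            if score < cutoff:
                flagged.append(r["id"])
    return flagged


returns = [
    {"id": 1, "peer_group": "rideshare", "taxable_income": 40_000},
    {"id": 2, "peer_group": "rideshare", "taxable_income": 42_000},
    {"id": 3, "peer_group": "rideshare", "taxable_income": 38_000},
    {"id": 4, "peer_group": "rideshare", "taxable_income": 41_000},
    {"id": 5, "peer_group": "rideshare", "taxable_income": 5_000},
    {"id": 6, "peer_group": "retail", "taxable_income": 90_000},
    {"id": 7, "peer_group": "retail", "taxable_income": 95_000},
    {"id": 8, "peer_group": "retail", "taxable_income": 88_000},
]

print(flag_low_income_outliers(returns))  # [5]
```

The flag says only that return 5 looks unusual next to its peers, not that it is noncompliant; that judgment still falls to IRS staff.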

Under either approach, computers could absorb and analyze more data than conventional approaches can. Because they identify relationships and patterns, they don’t need a precise link between a specific data point and a line on a tax return.

But machine learning won’t solve every problem. Because it builds on historical data, supervised learning might be unduly influenced by the outcomes of past audits. Some may have been settled in noncompliant taxpayers’ favor solely because the IRS lacked the resources to pursue the case. Others may have been settled in the IRS’s favor solely because compliant taxpayers were too scared to respond to an audit notice. Neither outcome reflects true compliance patterns.
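A small numeric example shows how such outcomes can shift what a supervised model learns. This Python sketch assumes a single hypothetical feature and a one-threshold classifier. The "true" labels are what a fully resourced audit would find; the "recorded" labels mimic the two situations above, with two noncompliant cases closed in the taxpayers' favor and one compliant taxpayer who gave in.

```python
# Toy illustration of noisy audit labels, assuming one hypothetical feature
# (an expense ratio) and a single-threshold classifier.


def fit_threshold(values, labels):
    """Return the threshold t with the fewest errors for 'flag if value > t'.
    Ties go to the smallest t because candidates are scanned in ascending order."""
    candidates = sorted(set(values))
    return min(
        candidates,
        key=lambda t: sum(int(v > t) != y for v, y in zip(values, labels)),
    )


values = [0.20, 0.30, 0.40, 0.50, 0.70, 0.80, 0.90, 0.95]
true_labels = [0, 0, 0, 0, 1, 1, 1, 1]
# 0.40: compliant taxpayer conceded; 0.70 and 0.80: noncompliant cases dropped
recorded_labels = [0, 0, 1, 0, 0, 0, 1, 1]

t_true = fit_threshold(values, true_labels)  # 0.5
t_recorded = fit_threshold(values, recorded_labels)  # 0.8

new_return = 0.75
print(new_return > t_true, new_return > t_recorded)  # True False
```

Trained on the true labels, the rule flags returns above 0.50; trained on the recorded outcomes, it moves to 0.80 and would wave through a return at 0.75, much like the cases the dropped audits never resolved.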

A second challenge: the complexity of the law and of the computer models themselves. Will computers make any more sense of the gray areas of the tax code than talented lawyers? Machine learning can deploy very complicated models based on relationships between many variables, making it difficult to isolate which factors contribute to noncompliant behavior—important to know when designing strategies to improve compliance.

In the end, machines still need humans to help them make sense of our complex world. And that will require the expertise of an experienced and skilled IRS staff to feed the computers the right evidence, wisely evaluate their output, and, when necessary, fine-tune the predictive models.