Brief Description of the Tax Model
Updated on April 4, 2017.
The Urban-Brookings Tax Policy Center’s large-scale microsimulation model produces revenue and distribution estimates of the U.S. federal tax system. The model is similar to those used by the Congressional Budget Office (CBO), the Joint Committee on Taxation (JCT), and the Treasury's Office of Tax Analysis (OTA).
The TPC model produces estimates for each individual year from 2011 through 2027—the end of the ten-year budget window. We recently expanded our model by developing a long-run module that produces revenue and distribution estimates at ten-year intervals for the period from 2030 through 2090.
I. Tax Model Database
The model’s primary data source is the 2006 public-use file (PUF) produced by the Statistics of Income (SOI) Division of the Internal Revenue Service (IRS). The PUF contains 145,858 records with detailed information from federal individual income tax returns filed in the 2006 calendar year.[i] Beginning with the 2006 data, we employ a two-step process to create a file that is representative of the tax filing population for the 2011 tax year.[ii] In the first step of the process, we use published tax data to calculate per-return average growth rates for income, deduction, and other items between 2006 and 2011 by adjusted gross income (AGI) class. We then use these growth rates to adjust the dollar amounts on each PUF record. In the second step of the process, we use a constrained optimization algorithm to reweight the records in order to match an extensive set of about 100 targets for both return counts and dollar amounts. We refer to the resulting file as the 2011 “Look-Alike Public Use File” or LAPUF.
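As a rough illustration of the first step, the sketch below scales the dollar amounts on each record by a growth factor that depends on the record's AGI class. The class boundary, item names, and growth factors are hypothetical placeholders chosen for the example, not values from the model, and the reweighting in the second step is not shown.

```python
# Illustrative sketch only: the AGI class boundary and growth factors are
# made-up placeholders, not values used in the TPC model.
GROWTH_2006_TO_2011 = {
    "under_50k": {"wages": 1.08, "interest": 0.70},
    "50k_or_more": {"wages": 1.12, "interest": 0.65},
}


def agi_class(agi):
    """Assign a record to a (hypothetical) AGI class."""
    return "under_50k" if agi < 50_000 else "50k_or_more"


def grow_record(record):
    """Scale each dollar item by the growth factor for the record's AGI class."""
    factors = GROWTH_2006_TO_2011[agi_class(record["agi"])]
    grown = dict(record)  # copy so the base-year record is left untouched
    for item, factor in factors.items():
        grown[item] = record[item] * factor
    return grown


records_2006 = [
    {"agi": 30_000, "wages": 29_000, "interest": 1_000, "weight": 1_200.0},
    {"agi": 90_000, "wages": 85_000, "interest": 5_000, "weight": 800.0},
]
records_2011 = [grow_record(r) for r in records_2006]
```

In this sketch the weights are carried through unchanged; matching return counts and dollar totals is left to the reweighting step.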
We next use cross-tabulations of age, filing status, and income sources provided to us by SOI and implement a raking algorithm to impute the ages of taxpayers and their dependents onto the LAPUF. We add information on other demographic characteristics and sources of income that are not reported on tax returns through a constrained statistical match of the LAPUF with data for 2011 from the March 2012 Current Population Survey (CPS) of the U.S. Census Bureau. That match also generates a sample of individuals who do not file individual income tax returns (“non-filers”). The data set combining filers from the LAPUF (augmented by demographic and other information from the CPS) and non-filers from the CPS provides us with a representative sample of the entire population rather than just the segment that files income tax returns. This allows us to estimate the revenue and distributional impact of tax proposals that would potentially affect current non-filers.
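A raking algorithm (also called iterative proportional fitting) can be sketched as follows. The example adjusts a seed table of return counts by filing status and age group until its row and column totals match target margins. The seed table and the targets are invented for illustration; the actual SOI cross-tabulations have more dimensions and are not reproduced here.

```python
def rake(seed, row_targets, col_targets, max_iterations=100, tolerance=1e-9):
    """Scale rows, then columns, repeatedly until both sets of margins match."""
    table = [row[:] for row in seed]
    for _ in range(max_iterations):
        # Scale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [value * target / total for value in table[i]]
        # Scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Columns now match exactly; stop once the rows also match.
        row_gap = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if row_gap < tolerance:
            break
    return table


# Hypothetical seed counts: rows are filing statuses, columns are age groups.
seed = [[40.0, 30.0, 30.0],
        [20.0, 50.0, 30.0]]
row_targets = [120.0, 80.0]        # returns by filing status
col_targets = [70.0, 90.0, 40.0]   # returns by age group
raked = rake(seed, row_targets, col_targets)
```

The row and column targets must sum to the same total for the procedure to converge on both sets of margins.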
We then augment the tax model database by imputing wealth, education, consumption, health, and retirement-related variables for each record in the matched LAPUF-CPS file.
A. Wealth Imputations
Because the income tax data in our model contain no direct information about wealth holdings, we rely on information from the Survey of Consumer Finances (SCF) to develop imputations of assets and liabilities. Specifically, we impute assets and liabilities to each record in the income-tax file based on probits and ordinary least squares (OLS) regressions of those wealth components against explanatory variables that exist on both the SCF and SOI data sets. To mitigate the problem of the SCF’s small sample size (it contains fewer than 5,000 observations), we pool data from the 2010 and 2013 surveys. In addition to roughly doubling the sample size, combining data from the two years mitigates some of the temporal variation in asset values. We then calibrate the imputed number of individuals owning each type of asset (and liability) and their aggregate values to match SCF totals, augmented by the net worth of the Forbes 400.[iii] We further adjust the imputed distribution of each asset and liability by income class to more closely resemble those reported in the SCF.
B. Education Imputations
In order to model tax incentives for education and their interaction with Pell Grants, we impute student characteristics to the tax model. First, we use data from the 2011-2012 National Postsecondary Student Aid Study (NPSAS)[iv] combined with an indicator from the PUF as to whether a particular tax unit reported education tax incentives (such as the Lifetime Learning Credit or the above-the-line deduction for education expenses) to impute the presence of post-secondary students to each record in the database. We then use the NPSAS to impute student characteristics, such as enrollment intensity, class year, and institution type, as well as education expenses, including tuition and fees, books, room and board, and transportation. We use these imputed characteristics to calculate potential education tax incentives and Pell Grants and assign take-up rates in order to match actual tabulations by income from SOI and the Department of Education. Imputing receipt of both education tax incentives and Pell Grants allows us to examine in detail all federal assistance to post-secondary students.
C. Consumption Imputations
In order to model the distributional impact of federal excise taxes and a variety of other indirect taxes, including broad-based consumption taxes (e.g., a value-added tax, or VAT) and environmental taxes, we impute consumption spending to each record in the tax model database. We use data from the Consumer Expenditure Survey (CEX) to produce estimates of consumption expenditures across 16 categories of goods and services for each household in our model. We also use the Urban Institute’s Dynamic Simulation of Income Model (DYNASIM) to estimate the amount of future consumption financed out of current wealth, which allows us to analyze transitional issues for options that move the tax system from an income base to a consumption base. This allows us to estimate the distributional impact of hybrid income-consumption tax systems and other comprehensive reform options, such as the plans endorsed by the President’s Advisory Panel for Federal Tax Reform in 2005 and, more recently, by the Bipartisan Policy Center’s Debt Reduction Task Force.
D. Health Imputations
In order to analyze tax subsidies for health insurance and medical expenses, we impute health insurance status and employer-provided health benefits to each record in our database. We begin by assigning initial health insurance status using data from our statistical match with the 2012 CPS. We then modify coverage to be consistent with CBO projections of coverage after implementation of the Affordable Care Act (ACA). We impute employer-provided health benefits by statistically matching tax units with employer-sponsored health insurance to employers offering health coverage in the 2010 and 2011 Kaiser/HRET employer surveys.[v] The health benefits we impute include premiums for health insurance, dental insurance, and vision insurance, as well as contributions to Health Savings Accounts, Health Reimbursement Arrangements, and Medical Flexible Spending Accounts. These imputations allow us to analyze tax expenditures for employer-provided health benefits and ACA tax provisions including the high premium excise tax, the penalty on individuals with insufficient coverage, the penalty on employers offering insufficient coverage, and tax credits for non-group insurance purchased through health insurance exchanges.
E. Retirement Imputations
In order to analyze the revenue and distributional implications of tax measures related to retirement savings, we impute a comprehensive set of pension and savings variables for each household in our database. These variables include each taxpayer’s eligibility for a defined benefit pension, a defined contribution pension, and an Individual Retirement Arrangement (IRA) as well as contribution amounts, accrued benefits, and asset balances. We rely on information from the SCF to impute pension characteristics as well as pension and IRA asset balances. We use SOI data to impute IRA characteristics. We supplement and calibrate these imputations to match publicly available administrative data by incorporating information from other sources such as the Department of Labor, Treasury, Census, and DYNASIM.
F. Other Imputations
To complete the tax model database, we perform a number of other imputations. First, we use tabulations from the Urban Institute’s TRIM3 microsimulation model to adjust the reported values of certain non-taxable transfer payments.[vi] The reported values obtained through our statistical match with the CPS generally undercount both the number of recipients and total dollar amounts for food stamps (SNAP), Temporary Assistance for Needy Families (TANF), and Supplemental Security Income (SSI). We therefore adjust our counts and amounts to match the TRIM3 reported values more closely.
This latest version of the tax model also includes improved imputations of mortgage interest on second homes and of deductible interest on home equity loans. The model also contains imputations for all itemizable deductions—including charitable contributions, medical expenses, and home mortgage interest—for “non-itemizers,” people who claim only the standard deduction on their tax return. These imputations allow us to model the distribution and revenue implications of proposals to replace certain deductions with credits that would be available to all taxpayers regardless of itemization status.
II. Aging and Extrapolation Process
The full tax model database is a representative national sample of the population for calendar year 2011. In order to carry out revenue and distribution analysis for future years, we “extrapolate” or age the 2011 data.
For the years from 2012 to 2027, we “age” the data based on CBO forecasts and projections for growth of various types of income; CBO and JCT baseline revenue projections; IRS estimates of future growth in the number of tax returns; JCT estimates of the distribution of tax units by income; and Census data on the size and age-composition of the population. We use the actual 2012 through 2014 tax data available at the time we developed the database (early 2017). A two-step process produces a representative sample of the filing and non-filing population in years beyond 2011. We first inflate the dollar amounts of income, adjustments, deductions, and credits on each record by their appropriate forecasted per capita growth rates. We use the CBO’s forecast for per capita growth of each major income source, such as wages, capital gains, and non-wage income (interest, dividends, Social Security benefits, and others). We assume that most other items grow at CBO’s projected growth rate for per capita personal income. In the second stage of the extrapolation, we use a linear programming algorithm to adjust the weights on each record so that the major income items, adjustments, and deductions match aggregate targets. We also attempt to adjust the overall distribution of income to match published information from the Statistics of Income (SOI) division of the Internal Revenue Service (IRS) for 2012 and published estimates of the 2016 distribution from JCT. We extrapolate recent trends to obtain projected distributions for other years beyond 2016 and modify those distributions in order to hit CBO's published forecasts for baseline individual income tax revenue.
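The idea behind the second-stage reweighting can be illustrated with a deliberately simplified stand-in. The sketch below does not use linear programming; it substitutes a closed-form calibration that minimizes the weighted squared change in weights subject to hitting two aggregate targets exactly. It assumes the dollar amounts have already been inflated in the first stage, and all records, weights, and targets are hypothetical.

```python
def calibrate_weights(records, items, targets):
    """Adjust weights to hit two aggregate targets.

    Minimizes sum((w - w0)**2 / w0) subject to the weighted totals of the two
    items equaling their targets. This is a simplified substitute for the
    linear programming algorithm described in the text.
    """
    a, b = items
    gap_a = targets[a] - sum(r["weight"] * r[a] for r in records)
    gap_b = targets[b] - sum(r["weight"] * r[b] for r in records)
    m_aa = sum(r["weight"] * r[a] * r[a] for r in records)
    m_ab = sum(r["weight"] * r[a] * r[b] for r in records)
    m_bb = sum(r["weight"] * r[b] * r[b] for r in records)
    # Solve the 2x2 system for the two Lagrange multipliers.
    det = m_aa * m_bb - m_ab * m_ab
    lam_a = (gap_a * m_bb - m_ab * gap_b) / det
    lam_b = (m_aa * gap_b - m_ab * gap_a) / det
    return [r["weight"] * (1.0 + lam_a * r[a] + lam_b * r[b]) for r in records]


# Hypothetical records after first-stage growth of dollar amounts.
records = [
    {"wages": 30_000.0, "gains": 0.0, "weight": 1_000.0},
    {"wages": 60_000.0, "gains": 2_000.0, "weight": 800.0},
    {"wages": 150_000.0, "gains": 40_000.0, "weight": 200.0},
]
targets = {"wages": 110_000_000.0, "gains": 10_500_000.0}
new_weights = calibrate_weights(records, ("wages", "gains"), targets)
new_wages = sum(w * r["wages"] for w, r in zip(new_weights, records))
new_gains = sum(w * r["gains"] for w, r in zip(new_weights, records))
```

Unlike a linear program with bounds on the weights, this closed form can produce negative weights when the targets are far from the starting totals, which is one reason a production model would use a constrained algorithm.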
We use a similar two-stage technique in the long-run module to age the data for 2030, 2040, and each ten-year increment through 2090. For 2030 and beyond, we rely primarily on projections from CBO and from the Urban Institute’s DYNASIM3 model. DYNASIM3 is a dynamic microsimulation model that is designed specifically to project the population and analyze the long-run distributional effects of retirement and other aging issues.[vii]
In the first stage of the long-run aging process, we use CBO’s long-run inflation assumptions together with DYNASIM3 projections for the real growth in major income items, such as wages and salaries, business income, capital income, pension income, and Social Security benefits, to grow the dollar amounts on each record in the tax model database.
In the second stage of the long-run extrapolation, we use our linear programming algorithm to adjust the weights on each record so that the major income amounts and certain other items match aggregate targets derived from the DYNASIM3 and CBO forecasts. For example, we determine long-run targets for health insurance coverage and the number of post-secondary students by applying the demographic trends from DYNASIM to the health insurance status and student counts generated by the tax model for 2027. Similarly, we derive long-run targets for retirement coverage and contributions from a combination of DYNASIM and CBO projections and the baseline imputations in the tax model.
We also use the second-stage reweighting algorithm to hit DYNASIM3 targets for the age distribution of the population and other demographic characteristics, including the number of married and single tax units.
Finally, we use the reweighting process to target the distribution of tax return units by income as projected by DYNASIM3, adjusted in order to hit CBO’s projected individual income tax revenue through 2040. For years after 2040, we rely exclusively on DYNASIM3’s projection of changes in the income distribution. For years after 2060, we employ a further across-the-board weight adjustment primarily to correct for a population undercount and to ensure that our model generates the level of individual income tax revenue implied by the CBO long-run projections.
III. Tax Calculators
The tax model consists of a set of detailed tax calculators that: (a) compute individual income tax liability for all filers in the sample under current law and under alternative policy proposals; (b) compute the employee and employer shares of payroll taxes for Social Security and Medicare; (c) assign the burden of the corporate income tax and excise taxes to tax units; and (d) determine the expected value of estate tax liability for each tax unit in the sample using an estate tax calculator in combination with age-specific mortality rates.
A. Individual Income Tax Calculator
Based on the extrapolated data set, we can simulate policy options using a detailed tax calculator that captures most features of the federal individual income tax system, including the alternative minimum tax (AMT). The model's current law baseline reflects major income tax legislation enacted through January of 2017, including the Protecting Americans from Tax Hikes Act of 2015 (PATH). The PATH Act made permanent many "tax extenders" including the deduction for state and local sales taxes and the Research and Experimentation tax credit. It also extended bonus depreciation for businesses and made permanent the enhancements to several individual income tax credits originally enacted in the 2009 stimulus legislation. Our model also includes the provisions in the American Taxpayer Relief Act of 2012 (ATRA) signed into law in January of 2013. ATRA made permanent most of the provisions enacted in the 2001 and 2003 tax acts (EGTRRA and JGTRRA) and permanently patched the alternative minimum tax.[viii]
In our distribution tables, we assume that the burden of the individual income tax falls on the payer. CBO, JCT, and Treasury all use the same assumption.
B. Payroll Tax Calculator
Using the extrapolated data set, we also calculate federal payroll taxes for Social Security and Medicare. One complication is that for married couples, our tax return data only provide information on combined earnings whereas payroll taxes are based on individual earnings. This is important because the amount of earnings subject to the Social Security portion of payroll taxes is capped at $127,200 for 2017, a limit that is indexed annually based on wage growth. For married couples, we therefore assign earnings to each individual based on the split in wages observed on the CPS record to which the LAPUF record was matched.
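The following sketch shows why the split of a couple's earnings matters. It applies the $127,200 cap for 2017 from the text to each spouse separately. The rates are the statutory 6.2 percent (Social Security) and 1.45 percent (Medicare) on each of the employee and employer sides, which the text does not state; the Additional Medicare Tax and self-employment earnings are ignored for simplicity, and the earnings figures are hypothetical.

```python
SS_WAGE_CAP_2017 = 127_200        # cap stated in the text
SS_RATE_EACH_SIDE = 0.062         # statutory rate, not stated in the text
MEDICARE_RATE_EACH_SIDE = 0.0145  # statutory rate, not stated in the text


def social_security_tax(earnings):
    """Employee plus employer Social Security tax for one worker."""
    return 2 * SS_RATE_EACH_SIDE * min(earnings, SS_WAGE_CAP_2017)


def medicare_tax(earnings):
    """Employee plus employer Medicare tax (no cap; surtax ignored)."""
    return 2 * MEDICARE_RATE_EACH_SIDE * earnings


def couple_payroll_tax(earnings_spouse_1, earnings_spouse_2):
    """Payroll tax for a couple, applying the cap to each spouse separately."""
    return sum(
        social_security_tax(e) + medicare_tax(e)
        for e in (earnings_spouse_1, earnings_spouse_2)
    )


# Same combined earnings of $200,000 under two hypothetical splits.
even_split = couple_payroll_tax(100_000, 100_000)
single_earner = couple_payroll_tax(200_000, 0)
```

With an even split, neither spouse reaches the cap, so all $200,000 is subject to Social Security tax; with a single earner, only the first $127,200 is.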
In our distribution tables, we assume that the worker bears the burden of both the employer and employee portions of payroll taxes. This premise is widely accepted among economists. CBO, JCT, and Treasury all make the same assumption for their distributional analyses.
C. Assigning Corporate Tax Burden to Individuals
Although firms pay the corporate income tax, the economic incidence of the tax falls on individuals. TPC’s tax model therefore distributes the burden of the tax to individuals. The incidence of the corporate tax, however, is an unsettled theoretical issue. The tax could be borne by the owners of corporate stock, or passed on in part to labor in the form of lower real wages, to consumers in the form of higher prices, or to the owners of some or all capital in the form of lower real rates of return.
In September 2012, we updated the assumptions we use to distribute the corporate income tax: we now estimate that 60 percent is borne by shareholders, 20 percent by all capital owners, and 20 percent by labor. Based on our review of research on the issue, we do not assign any of the burden to consumers. Previously, we assumed that the entire burden fell on all owners of capital. Our current assumptions are similar to those now made by CBO, Treasury, and JCT.
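A minimal sketch of how the 60/20/20 split might be applied to individual tax units appears below. It assumes that each share of the total tax is allocated in proportion to a tax unit's share of the corresponding income base. The field names, the choice of allocation bases, and all dollar figures are assumptions made for illustration.

```python
# Incidence shares from the text, keyed by a hypothetical allocation base.
INCIDENCE_SHARES = {
    "corporate_equity_income": 0.60,  # borne by shareholders
    "capital_income": 0.20,           # borne by all capital owners
    "labor_income": 0.20,             # borne by labor
}


def distribute_corporate_tax(total_tax, tax_units):
    """Return the per-unit burden for each tax unit, in input order."""
    weighted_totals = {
        base: sum(u["weight"] * u[base] for u in tax_units)
        for base in INCIDENCE_SHARES
    }
    burdens = []
    for unit in tax_units:
        burden = sum(
            total_tax * share * unit[base] / weighted_totals[base]
            for base, share in INCIDENCE_SHARES.items()
        )
        burdens.append(burden)
    return burdens


# Two hypothetical tax units with sample weights.
tax_units = [
    {"weight": 100.0, "corporate_equity_income": 0.0,
     "capital_income": 500.0, "labor_income": 40_000.0},
    {"weight": 10.0, "corporate_equity_income": 20_000.0,
     "capital_income": 50_000.0, "labor_income": 200_000.0},
]
total_corporate_tax = 1_000_000.0
burdens = distribute_corporate_tax(total_corporate_tax, tax_units)
distributed = sum(u["weight"] * b for u, b in zip(tax_units, burdens))
```

Because the three shares sum to one, the weighted burdens add back up to the total tax being distributed.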
We rely on CBO for our projections of baseline corporate tax liability and, when available, on JCT estimates of changes in corporate tax liability that would result from tax proposals.
D. Estate Tax
Our modeling of the estate tax begins with our SCF-based wealth imputations, which we adjust using SOI data so that they align more closely with the assets and liabilities actually reported on estate tax returns. We then assign values for most estate tax deductions and credits based on averages calculated from SOI estate tax data. Our estate tax calculator then determines potential estate tax liability for each record in the database, based on the values for gross estate, deductions, and credits and the relevant estate tax rates and brackets. Finally, we multiply the calculated tax liabilities by age-specific mortality rates to estimate each record's expected value of gross estate and net estate tax liability. We employ a linear programming algorithm to reweight the records to ensure that our baseline estimates of the distribution and aggregate values for gross estate and its components match the most recent published estate tax data from SOI.[ix]
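The expected-value step can be illustrated with a toy calculation. The sketch below replaces the actual rate schedule, deductions, and credits with a single flat rate above an exemption; both parameters, the mortality rates, and the estate values are placeholders chosen for the example.

```python
# Placeholder parameters for a simplified flat-rate estate tax.
EXEMPTION = 5_000_000.0
FLAT_RATE = 0.40


def potential_estate_tax(gross_estate, deductions):
    """Tax that would be due if the person died this year (toy schedule)."""
    taxable_estate = max(0.0, gross_estate - deductions - EXEMPTION)
    return FLAT_RATE * taxable_estate


def expected_estate_tax(record):
    """Potential liability multiplied by the probability of death this year."""
    liability = potential_estate_tax(record["gross_estate"], record["deductions"])
    return record["mortality_rate"] * liability


# Hypothetical records: mortality rates depend on age in the model.
wealthy_older = {"gross_estate": 12_000_000.0, "deductions": 2_000_000.0,
                 "mortality_rate": 0.02}
modest_younger = {"gross_estate": 800_000.0, "deductions": 50_000.0,
                  "mortality_rate": 0.001}
expected_wealthy = expected_estate_tax(wealthy_older)
expected_modest = expected_estate_tax(modest_younger)
```

The same multiplication by the mortality rate applies to gross estate when estimating its expected value.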
In our distribution tables, we assume the estate tax is borne by decedents, the same assumption that Treasury used in the past when it distributed the burden of estate taxes. Neither CBO nor JCT includes the estate tax in their incidence analyses.
E. Excise Taxes
Beginning in 2015, TPC includes federal excise taxes in its distribution tables. We include all federal excise taxes, the largest of which are those assessed on motor fuels, alcohol, tobacco, air transportation, certain health insurance providers and prescription drug manufacturers, and, effective in 2020, certain high-cost employer-sponsored health insurance plans (the so-called “Cadillac tax”).[x] We also include the excise taxes on individuals without essential health insurance coverage (“individual mandate”) and employers that fail to meet minimum essential coverage (“employer mandate”) associated with the Affordable Care Act.
We rely on CBO for our projections of baseline excise tax revenues and assume excise taxes are borne by individuals following the methodology of Toder, Nunns, and Rosenberg (2011). That is, we assume excise taxes lower real incomes in proportion to each tax unit’s share of labor income plus the portion of capital income that exceeds the normal rate of return. In addition, we assume that excise taxes paid or passed through to the retail level change the relative prices consumers face (i.e., raise the cost of taxed goods and services relative to others). We assign this burden to tax units based on our consumption imputations from the CEX. The exception to this methodology is that we estimate three of the health insurance related excise taxes—the individual mandate, employer mandate, and the tax on high-cost employer plans—using the TPC model’s health module. We assume the burden of these taxes is borne by the individual and/or employee.
F. Income Classifier
In 2013, TPC developed an income concept called “expanded cash income” (ECI) for the purpose of distributional analysis. We construct ECI to be a broad measure of pre-tax income, and we use it both to rank tax units in our distribution tables and to calculate effective tax rates. We define ECI to be adjusted gross income (AGI) plus: above-the-line adjustments (e.g., IRA deductions, student loan interest, self-employed health insurance deduction, etc.), employer-paid health insurance and other nontaxable fringe benefits, employee and employer contributions to tax-deferred retirement savings plans, tax-exempt interest, nontaxable Social Security benefits, nontaxable pension and retirement income, accruals within defined benefit pension plans, inside buildup within defined contribution retirement accounts, cash and cash-like (e.g., SNAP) transfer income, employer’s share of payroll taxes, and imputed corporate income tax liability.[xi]
IV. Estimating Revenue, Distributional, and Incentive Effects of Tax Proposals
We use the TPC tax model to estimate the revenue, distributional, and incentive effects of tax policy proposals. We measure the incentive effects of a policy proposal by calculating the effective marginal individual income tax rate on various forms of income.
A. Revenue Estimates
TPC incorporates several forms of microdynamic behavioral responses in its revenue estimates.[xii] First, we assume that reported taxable income on individual tax returns responds to changes in the statutory marginal income tax rate. Based on estimates in the academic literature, we generally assume an elasticity of taxable income (with respect to the net of tax rate) equal to 0.25.[xiii] For proposals that expand the tax base significantly—such as proposals that repeal, or significantly limit, itemized deductions—we adjust the elasticity downward. Second, we assume that sales of capital assets respond to changes in the tax rate on capital gains. For long-term capital gains realizations, our elasticity varies with the tax rate and is approximately -0.7 at a tax rate of 20 percent. We use a higher elasticity for the first two years after a change in the capital gains rate; the short-term elasticity is approximately -1.1 at a tax rate of 20 percent. These elasticities match those that JCT describes in an early publication outlining its estimating methodology.[xiv] Although JCT has not published the specific taxable income or capital gains elasticities that it now uses, TPC's behavioral assumptions appear broadly similar to those that JCT currently uses.[xv] In the case of certain policy proposals, different behavioral assumptions would be a source of difference between TPC and JCT revenue estimates.
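To show how an elasticity of taxable income of 0.25 translates into a behavioral response, the sketch below uses a constant-elasticity functional form and a toy taxpayer facing a single flat rate. The functional form, the flat-rate simplification, and the rates and income in the example are assumptions; the text specifies only the elasticity value.

```python
import math

ETI = 0.25  # elasticity with respect to the net-of-tax rate, from the text


def taxable_income_after_response(base_income, old_rate, new_rate, elasticity=ETI):
    """Constant-elasticity response of taxable income to a rate change."""
    net_of_tax_ratio = (1.0 - new_rate) / (1.0 - old_rate)
    return base_income * net_of_tax_ratio ** elasticity


# Hypothetical taxpayer: flat rate rises from 35 percent to 40 percent.
base_income = 1_000_000.0
old_rate, new_rate = 0.35, 0.40
adjusted_income = taxable_income_after_response(base_income, old_rate, new_rate)

# Revenue change ignoring behavior versus including the income response.
static_change = (new_rate - old_rate) * base_income
behavioral_change = new_rate * adjusted_income - old_rate * base_income

# Sanity check: recover the elasticity from the simulated response.
implied_elasticity = (
    math.log(adjusted_income / base_income)
    / math.log((1.0 - new_rate) / (1.0 - old_rate))
)
```

In this toy case the rate increase lowers the net-of-tax rate, reported taxable income falls, and the revenue gain including behavior is smaller than the static gain.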
B. Distribution Estimates
There is no perfect measure of the distributional impact of a tax policy change. The TPC model, therefore, calculates several different distributional measures. Of all the metrics we calculate and report in our standard distribution tables, we believe the most informative may be the percentage change in after-tax income. A tax cut that gives everyone the same percentage increase in after-tax income leaves the relative distribution of after-tax income unchanged. A tax cut that increases after-tax income proportionately more for lower- than for higher-income taxpayers will make the tax system more progressive (or less regressive). One that increases after-tax income more for higher-income taxpayers than for lower-income taxpayers will make the tax system less progressive (or more regressive). Our distribution tables also show the share of the total tax change, the average size of the tax change in dollars and as a percentage of tax paid, and the average tax rate before and after incorporating the proposal.[xvi]
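A small worked example may help show why the percentage change in after-tax income is informative. The two tax units below are hypothetical. They receive very different tax cuts in dollars, but the same percentage increase in after-tax income, so the relative distribution of after-tax income is unchanged.

```python
def percent_change_after_tax_income(pre_tax_income, baseline_tax, tax_change):
    """Percentage change in after-tax income caused by a tax change."""
    baseline_after_tax = pre_tax_income - baseline_tax
    return -tax_change / baseline_after_tax * 100.0


# Hypothetical tax units; a negative tax_change is a tax cut.
units = [
    {"name": "lower", "income": 30_000.0, "baseline_tax": 3_000.0,
     "tax_change": -270.0},
    {"name": "higher", "income": 300_000.0, "baseline_tax": 90_000.0,
     "tax_change": -2_100.0},
]

total_change = sum(u["tax_change"] for u in units)
for u in units:
    u["pct_change_after_tax"] = percent_change_after_tax_income(
        u["income"], u["baseline_tax"], u["tax_change"]
    )
    u["share_of_tax_change"] = u["tax_change"] / total_change
    u["pct_change_in_tax"] = u["tax_change"] / u["baseline_tax"] * 100.0
```

Here the higher-income unit receives most of the total tax cut in dollars, while the lower-income unit receives the larger cut as a percentage of tax paid, which illustrates how the different measures in a distribution table can point in different directions.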
Most TPC distribution tables include federal individual and corporate income taxes, payroll taxes for Social Security and Medicare, federal excise taxes, and the estate tax. TPC also has the capability to include the distributional impact of broad-based consumption taxes such as a value added tax (VAT). Note, however, that the distribution tables produced before June 2015 did not include excise taxes and those produced before March 2004 generally included only the individual income tax.
By convention, TPC distributes only the static impacts of tax changes. The issue of including behavioral responses to tax changes is particularly important when dealing with changes to tax rates on realized capital gains. A reduction in the marginal rate on capital gains causes increased realizations and could lead to an increase in taxes paid. But higher realizations and the consequent increase in taxes paid are voluntary and therefore do not indicate an actual increase in tax burden—investors would not have realized the gains if doing so made them worse off. Because of this, TPC distributes only the change in taxes paid on the realizations that would have occurred in the absence of the rate change.
TPC's distribution tables do allow for what tax economists refer to as "tax-form behavior." For example, a proposal to repeal certain itemized deductions could cause taxpayers who were itemizing under current law to take the standard deduction instead. We would include the impact of such a switch in our distributional analysis.
C. Effective Marginal Tax Rate Estimates
A taxpayer’s effective marginal tax rate (EMTR) is the percentage of an additional dollar of income that he or she would pay in tax. Individuals might alter their behavior in response to changes in their EMTR because marginal tax rates measure the additional taxes or benefits of working, saving, engaging in tax avoidance, and realizing capital gains. A higher EMTR on wages reduces the after-tax reward for working more hours and therefore might encourage people to work less. It also raises the reward for engaging in tax avoidance, such as the restructuring of compensation packages away from taxable wages and salaries and into untaxed fringe benefits. Both the reduction in hours worked and additional tax avoidance resulting from a higher EMTR on wages would reduce taxable income and government revenues, and have the potential to reduce economic output. A higher EMTR on capital gains could discourage individuals from selling assets, possibly reducing market liquidity (the “lock-in” effect) and reducing economic output if capital is allocated less efficiently.
We typically use the TPC tax model to calculate the EMTR on wages and salaries as well as several forms of capital income (realized capital gains, interest income, and qualified dividends). We generally restrict our analysis to the effective marginal individual income tax rate. For the households in the tax model database, we determine the EMTR on an income source by first calculating the household’s individual income tax based on the household’s actual income. We then add $1,000 to the income source (for example, wages and salaries) and recalculate the household’s individual income tax liability. We calculate the effective marginal tax rate to be the resulting change in tax divided by the $1,000 increase in income. These estimates are static in the sense that we do not allow the higher wages to affect any other form of reported income or deduction. When we calculate the average effective marginal tax rate across income classes, we weight each household’s EMTR by the original amount of the income source that the household reported.
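The calculation described above can be sketched with a toy tax calculator. The two-bracket schedule, the households, and the use of sample weights alongside the income amounts in the average are assumptions made for illustration; only the $1,000 increment and the weighting by the original amount of the income source come from the text.

```python
INCREMENT = 1_000.0  # dollar increment stated in the text


def toy_income_tax(wages):
    """Hypothetical two-bracket schedule: 10% up to $20,000, 25% above."""
    return 0.10 * min(wages, 20_000.0) + 0.25 * max(wages - 20_000.0, 0.0)


def emtr_on_wages(wages, tax_function=toy_income_tax):
    """Change in tax from an extra $1,000 of wages, divided by $1,000."""
    return (tax_function(wages + INCREMENT) - tax_function(wages)) / INCREMENT


def average_emtr(households):
    """Average EMTR weighted by sample weight times original wages."""
    numerator = sum(h["weight"] * h["wages"] * emtr_on_wages(h["wages"])
                    for h in households)
    denominator = sum(h["weight"] * h["wages"] for h in households)
    return numerator / denominator


# Hypothetical households; the second one straddles the bracket threshold.
households = [
    {"wages": 10_000.0, "weight": 500.0},
    {"wages": 19_500.0, "weight": 300.0},
    {"wages": 50_000.0, "weight": 200.0},
]
rates = [emtr_on_wages(h["wages"]) for h in households]
overall = average_emtr(households)
```

For the household with $19,500 in wages, half of the added $1,000 is taxed at 10 percent and half at 25 percent, so its EMTR falls between the two statutory rates.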
[i] We choose to use the 2006 PUF because it is more representative of a steady-state economy than the “boom” year of 2007 or the recession year of 2008.
[ii] The information for 2011 was the latest available data at the time we began the model update process.
[iii] The SCF specifically omits data on the Forbes 400. We need to add them to the file to account for the substantial share of assets that they own. For 2011, we add approximately $1.7 trillion in net worth to the $90 trillion implied by the SCF.
[iv] The NPSAS is produced by the National Center for Education Statistics.
[v] The Kaiser/HRET annual survey of employer sponsored health benefits is sponsored by the Kaiser Family Foundation and Health Research & Educational Trust.
[vi] TRIM3 is maintained and developed by the Urban Institute, under primary funding from the Department of Health and Human Services, Office of the Assistant Secretary for Planning and Evaluation (HHS/ASPE). Information presented here is derived in part from the Transfer Income Model, Version 3 (TRIM3) and associated databases. TRIM3 requires users to input assumptions and/or interpretations about economic behavior and the rules governing federal programs. Therefore, the conclusions presented here are attributable only to the authors of this report.
[vii] For a detailed description of the projection methods employed by DYNASIM, see Smith, Karen E. (2012). “Projection Methods Used in the Dynamic Simulation of Income Model (DYNASIM3).”
[viii] EGTRRA is the Economic Growth and Tax Relief Reconciliation Act of 2001; JGTRRA is the Jobs and Growth Tax Relief Reconciliation Act of 2003.
[ix] For a detailed description of TPC's estate tax methodology, see Burman, Lim, and Rohaly (2008). "Back from the Grave: Revenue and Distributional Effects of Reforming the Federal Estate Tax."
[x] Our model incorporates the provisions in the Consolidated Appropriations Act of 2016 that delayed the Cadillac tax for two years and made the tax deductible for employers.
[xi] For further information about ECI, see "Income Measure Used in Distributional Analyses by the Tax Policy Center."
[xii] Prior to 2008, almost all TPC revenue estimates showed only the static impact on tax liability.
[xiii] For a summary of the academic literature, see Emmanuel Saez, Joel Slemrod, and Seth H. Giertz, “The Elasticity of Taxable Income with Respect to Marginal Tax Rates: A Critical Review,” Journal of Economic Literature, 50 (1): 3-50, 2012.
[xiv] See “Explanation Of Methodology Used To Estimate Proposals Affecting The Taxation Of Income From Capital Gains,” available at https://www.jct.gov/publications.html?func=startdown&id=3157.
[xv] For a detailed description of JCT’s modeling methodology, see “Estimating Changes in the Federal Individual Income Tax: Description Of The Individual Tax Model”, available at https://www.jct.gov/publications.html?func=startdown&id=4776. JCT states that they assume the taxable income elasticity varies with income but they do not report their actual elasticities.