TPC's Microsimulation Model FAQ
Answers to common questions about TPC's tax simulation model, the estimates it generates, and the organization of tables in our database.
What is the TPC model's primary data source?
The TPC microsimulation model’s primary data source is the 2006 public-use file (PUF) produced by the Statistics of Income (SOI) Division of the Internal Revenue Service (IRS). The 2006 PUF contains 145,858 records with detailed information from federal individual income tax returns from tax years 2003 to 2006 filed in the calendar year 2007.
Beginning with the 2006 data, we employ a two-step process to create a file that is representative of the tax filing population, both nationally and by state, for the 2011 tax year, the latest year for which information was available at the time we began our most recent comprehensive model update. In the first step of this process, we use published tax data to calculate per-return average growth rates for income, deductions, and other items between 2006 and 2011 by adjusted gross income (AGI) class. We then use these growth rates to adjust the dollar amounts on each PUF record.
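As a rough illustration of the first step, the sketch below scales the dollar amounts on a single record by growth factors looked up by AGI class. The class boundaries, growth factors, and field names are hypothetical placeholders, not values or names from the TPC model.

```python
from bisect import bisect_right

# Hypothetical lower bounds (in dollars) of three AGI classes.
AGI_CLASS_FLOORS = [0, 50_000, 200_000]

# Hypothetical 2006-to-2011 per-return growth factors, one per AGI class.
GROWTH_BY_ITEM = {
    "wages": [1.10, 1.12, 1.05],
    "interest": [0.90, 0.95, 1.00],
}

def agi_class(agi):
    """Return the index of the AGI class containing `agi`.

    Negative AGI is placed in the lowest class (an assumption of this sketch).
    """
    return max(bisect_right(AGI_CLASS_FLOORS, agi) - 1, 0)

def grow_record(record, growth_by_item):
    """Return a copy of `record` with each listed item scaled by its class factor."""
    cls = agi_class(record["agi"])
    grown = dict(record)
    for item, factors in growth_by_item.items():
        grown[item] = record[item] * factors[cls]
    return grown

record_2006 = {"agi": 60_000, "wages": 50_000, "interest": 1_000}
record_2011 = grow_record(record_2006, GROWTH_BY_ITEM)
```

Here the record falls in the middle class, so its wages and interest are scaled by that class's factors. The AGI used for classification is left unchanged in this sketch.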
In the second step of the process, we use a constrained optimization algorithm to reweight the records in order to match an extensive set of about 100 national targets and 39 to 51 state targets for both return counts and dollar amounts. We refer to the resulting file as the 2011 “Look-Alike Public Use File” or LAPUF.
What other data sources do you use in the TPC tax model?
We use cross-tabulations of age, filing status, and income sources provided to us by SOI and implement a raking algorithm to impute the ages of taxpayers and their dependents onto the 2006 PUF. We add information on other demographic characteristics and sources of income that are not reported on tax returns through a constrained statistical match with 2011 data from the US Census Bureau’s March 2012 Current Population Survey (CPS).
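Raking, also known as iterative proportional fitting, can be sketched in a few lines. The example below is a generic two-dimensional version with made-up numbers. It is not TPC's implementation, which works with the SOI cross-tabulations described above.

```python
def rake(table, row_targets, col_targets, iterations=100, tol=1e-9):
    """Iterative proportional fitting on a 2-D table of weighted counts.

    Rows and columns are rescaled in turn until the row and column
    sums match their targets. The targets must share the same grand total.
    """
    cells = [row[:] for row in table]
    for _ in range(iterations):
        # Scale each row to its target total.
        for i, target in enumerate(row_targets):
            total = sum(cells[i])
            if total > 0:
                cells[i] = [c * target / total for c in cells[i]]
        # Scale each column to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in cells)
            if total > 0:
                for row in cells:
                    row[j] *= target / total
        # Columns now match exactly; stop once the rows match as well.
        if all(abs(sum(cells[i]) - t) <= tol for i, t in enumerate(row_targets)):
            break
    return cells

# Hypothetical seed counts: rows are age groups, columns are filing statuses.
seed = [[10.0, 20.0], [30.0, 40.0]]
fitted = rake(seed, row_targets=[40.0, 60.0], col_targets=[45.0, 55.0])
```

The fitted table keeps the association pattern of the seed while matching both sets of marginal targets.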
Because the income tax data in our model contain no direct information about wealth holdings, we rely on information from the 2016 Survey of Consumer Finances (SCF) to develop imputations for 18 categories of assets and debt. In certain cases, such as farm assets and debt, we rely on alternative data sources such as the Department of Agriculture. Because the SCF, by design, excludes the 400 wealthiest individuals – the “Forbes 400” – we impute their wealth using published information for the 2016 Forbes 400.
Our education module relies on data from the 2011-12 National Postsecondary Student Aid Study (NPSAS) as well as administrative data on Pell Grants. We impute various categories of consumption to our tax model households using data from the Consumer Expenditure Survey (CEX). We also use the 2015, 2017, and 2018 Kaiser/HRET employer surveys in our imputations of employer-sponsored health insurance premiums and certain other health-related variables.
In addition, we impute a comprehensive set of pension and retirement savings variables to each tax unit in the tax model database, relying on information from the SCF to impute pension characteristics as well as pension and IRA asset balances, and we use SOI data to impute IRA characteristics. We supplement and calibrate these retirement imputations based on information from the Bureau of Economic Analysis, Bureau of Labor Statistics, Census Bureau, Congressional Budget Office, Department of Defense, Department of Treasury’s Office of Tax Analysis, the Federal Reserve Board, the Internal Revenue Service’s Statistics of Income program, Joint Committee on Taxation, Office of Personnel Management, and Thrift Savings Fund.
The model's primary data source is a data file from 2006 that you have adjusted to be consistent with 2011 tax data. How does TPC model the impact of policies in years after 2011?
For the years from 2012 to 2031, we age the data based on:
- CBO forecasts and projections for growth of various types of income;
- CBO and JCT baseline revenue projections;
- IRS estimates of future growth in the number of tax returns;
- CBO and JCT estimates of the distribution of tax units by income;
- Census data on the size and age-composition of the population;
- Department of Education projections for growth in tuition and the number of post-secondary students;
- and CBO projections for growth in health care costs and personal consumption expenditures.
A two-step process produces a representative sample of the filing and non-filing populations in years beyond 2011. We first inflate the dollar amounts of income, adjustments, deductions, and credits on each record by their appropriate forecasted per capita growth rates. We use CBO’s forecast for per capita growth of each major income source, such as wages, capital gains, and other forms of non-wage income (interest and dividends, business income, taxable pensions, Social Security benefits, and others). We assume that most other items grow at CBO’s projected growth rate for per capita personal income.
In the second stage of the extrapolation, we use a linear programming algorithm to adjust the weights on each record so that the major income items, adjustments, and deductions match aggregate targets. We also attempt to adjust the overall distribution of adjusted gross income (AGI) to match published information from the Statistics of Income (SOI) division of the Internal Revenue Service (IRS) for 2012 through 2018, and projections from CBO for years from 2019 through 2031.
We use a similar two-stage technique in the long-run module to age the data for each ten-year increment between 2040 and 2090. For 2040 and beyond, we rely primarily on projections from CBO and from DYNASIM3, a long-run dynamic simulation model maintained by The Urban Institute’s Income and Benefits Policy Center.
Does the TPC model include the effects on the entire population or just those who file federal individual income tax returns?
TPC's distribution tables show the impact of policies on the entire population, both those who file federal individual income tax returns and "non-filers." After a constrained statistical match between the LAPUF and the CPS, some low-income CPS records remain that are not matched to PUF records. We use these records to create a sample of non-filers.
By combining the dataset of filers from the LAPUF (augmented by demographic and other information from the CPS) with the dataset of non-filers generated by the statistical match with the CPS, we are able to carry out distribution analysis on the entire population rather than just the subset of the population that files individual income tax returns. Including the non-filing population is required for analyses of tax proposals that might affect members of that group.
Does the TPC model produce analysis at the state level?
TPC can produce distributional estimates of federal tax changes by state, including the distributional effects by income groups within a state. Because the IRS does not release a microdata file that is representative at the state level, TPC creates state weights for the tax model database for each state and the District of Columbia using a method that guarantees each state’s weighted totals of chosen observed characteristics match state targets.
For years in which published IRS tax return data are available by state, targets come from these publications. When IRS data are not available by state, targets are detailed projections based on available IRS data and macroeconomic assumptions. Because of data limitations, however, TPC cannot model effects at the state level for all tax proposals.
How does TPC decide which proposals and policy options to simulate?
TPC’s goal is to analyze all major individual income tax bills, particularly those that are considered by the tax-writing committees, as well as other proposals that are of particular interest or garner substantial attention. During election years, TPC strives to work with representatives from the major campaigns to obtain enough detail about the candidates' proposals to analyze their impact. Occasionally, TPC provides estimates to outside individuals or groups. Finally, TPC scholars use the model to pursue their independent research.
TPC's distribution tables show the impact on "tax units." What is a tax unit? Is it the same as a family or a household?
A tax unit is an individual, or a married couple, that files a tax return or would file a tax return if their income were high enough, along with all dependents of that individual or married couple. A tax unit is therefore different from a family or a household in certain situations.
For example, a cohabiting couple constitutes one household, but if the individuals are not legally married, they would file separate tax returns and thus be considered two tax units. Similarly, a family could consist of a married couple and the wife's elderly mother who lives with them. That family would be considered two tax units since, if the elderly mother had a large enough income, she would be required to file a federal income tax return on her own.
In general, the number of tax units tends to be larger than the number of families or households reported elsewhere.
What is TPC's preferred measure of the distributional impact of a tax proposal?
There is no perfect measure of distributional impact. TPC therefore reports several different measures in its distributional tables. The most informative may be the percentage change in after-tax income. A tax cut that gives everyone the same percentage increase in after-tax income leaves the relative distribution of after-tax income unchanged. A tax cut that increases after-tax income proportionately more for lower- than for higher-income taxpayers will make the tax system more progressive (or less regressive). One that increases after-tax income more for higher-income taxpayers than for lower-income taxpayers will make the tax system less progressive (or more regressive). Tables also show the share of the total tax change, the average size of the tax change in dollars and as a percentage of tax paid, and the average tax rate before and after incorporating the proposal. See Measuring the Distribution of Tax Changes for more information.
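The measures described above can be computed from group totals. The sketch below uses two hypothetical income groups with invented aggregates. The field names are illustrative and do not come from TPC's tables.

```python
def distribution_measures(groups):
    """Compute common distributional measures from per-group aggregates.

    Each group is a dict with pre-tax `income`, `tax_baseline`, and `tax_proposal`.
    """
    total_change = sum(g["tax_proposal"] - g["tax_baseline"] for g in groups)
    rows = []
    for g in groups:
        change = g["tax_proposal"] - g["tax_baseline"]
        after_tax_baseline = g["income"] - g["tax_baseline"]
        rows.append({
            "group": g["name"],
            # A tax cut (negative change) raises after-tax income.
            "pct_change_after_tax_income": 100 * -change / after_tax_baseline,
            "share_of_total_change": 100 * change / total_change if total_change else 0.0,
            "avg_rate_baseline": 100 * g["tax_baseline"] / g["income"],
            "avg_rate_proposal": 100 * g["tax_proposal"] / g["income"],
        })
    return rows

# Hypothetical aggregates, in billions of dollars.
groups = [
    {"name": "lower", "income": 100.0, "tax_baseline": 10.0, "tax_proposal": 9.0},
    {"name": "higher", "income": 1000.0, "tax_baseline": 300.0, "tax_proposal": 293.0},
]
measures = distribution_measures(groups)
```

In this example the higher-income group receives most of the total tax cut, yet the lower-income group sees the larger percentage increase in after-tax income. By the percentage-change measure, the change is therefore progressive.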
What taxes are included in TPC's distribution tables?
Most TPC tables include federal individual and corporate income taxes, payroll taxes for Social Security and Medicare, federal excise taxes, and the estate tax. TPC also has the capability to include the distributional impact of broad-based consumption taxes such as a value added tax (VAT). Note, however, that the distribution tables produced before June 2015 did not include excise taxes and those produced before March 2004 generally included only the individual income tax.
What incidence assumptions does TPC use in its distribution tables?
A key insight from economics is that taxes are not always borne by the individual or business writing the check to the IRS. Sometimes taxes are shifted. For example, most economists believe that the employer portion of payroll taxes translates into lower wages and is thus ultimately borne by workers. There is less agreement, however, on the economic incidence of other taxes, such as the corporate income tax.
The Tax Policy Center's incidence assumptions generally follow those adopted by the Congressional Budget Office and the Department of the Treasury. In particular, our tables assume that: 1) the individual income tax is borne directly by individual income taxpayers; 2) both the employee and employer shares of payroll taxes are borne by the employee; and 3) the estate tax is borne by decedents (as opposed to heirs).
In September 2012, we implemented a new methodology for distributing the corporate income tax. We now assume that 60 percent of the tax is borne by corporate shareholders in proportion to their shares of reported dividends and the portion of capital gains attributable to corporate equity; 20 percent is borne by all capital owners in proportion to their shares of capital income (interest, dividends, capital gains, a portion of flow-through business income, and investment returns earned within tax-preferred retirement accounts and defined benefit pension plans); and 20 percent is borne by labor in proportion to their shares of earnings. We previously assumed that the corporate income tax was borne entirely by owners of capital in proportion to their shares of total reported interest, dividends, capital gains, and rents. See How TPC Distributes the Corporate Income Tax for more information.
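The 60/20/20 split can be illustrated with a small allocation routine. This is a simplified sketch: the three income fields are hypothetical stand-ins for the corporate equity, capital income, and earnings bases described above, and the numbers are invented.

```python
# Shares of the corporate income tax assigned to each base, from the text above.
SHARES = {"equity_income": 0.60, "capital_income": 0.20, "earnings": 0.20}

def distribute_corporate_tax(total_tax, units):
    """Allocate `total_tax` across tax units in proportion to each base.

    Each unit is a dict holding its amount of each base named in SHARES.
    """
    totals = {base: sum(u[base] for u in units) for base in SHARES}
    burdens = []
    for u in units:
        burden = sum(
            share * total_tax * u[base] / totals[base]
            for base, share in SHARES.items()
            if totals[base] > 0
        )
        burdens.append(burden)
    return burdens

# Two hypothetical tax units: an investor with no earnings and a worker with some capital income.
units = [
    {"equity_income": 30.0, "capital_income": 50.0, "earnings": 0.0},
    {"equity_income": 10.0, "capital_income": 50.0, "earnings": 100.0},
]
burdens = distribute_corporate_tax(100.0, units)
```

With a total tax of 100, the investor bears about 55 (45 as a shareholder plus 10 as a capital owner) and the worker about 45 (15 plus 10 plus 20 as labor). The burdens sum to the total tax.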
We assume excise taxes lower real incomes in proportion to each tax unit’s share of burdened income sources. Burdened income sources include labor income, plus the portion of capital income that exceeds the normal rate of return, and wage-indexed cash transfer payments. In addition, we assume that excise taxes paid or passed through to the retail level change the relative prices consumers face (i.e., raise the cost of taxed goods and services relative to other goods). We assign this burden to tax units based on their consumption as imputed from the CEX. The exception to this methodology is that we assume the burden of the Affordable Care Act health insurance employer mandate falls exclusively on employees of firms offering inadequate health insurance coverage.
Do TPC's revenue estimates include behavioral responses to tax changes?
For almost all tables produced after mid-2008, the answer is yes.
First, based on estimates in the academic literature and JCT’s published methodology, we generally assume the elasticity of taxable income with respect to the net-of-tax rate (ETI) rises with income and equals 0.25 for those in the top 1/10th of one percent of the income distribution. For proposals that expand the tax base significantly, such as proposals that repeal or significantly limit itemized deductions, we adjust the elasticity downward. For example, because of the base-broadening in TCJA, we reduce our elasticities by one-fifth so that the ETI for those at the top of the income distribution equals 0.2 for years in which TCJA is in effect.
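To show how an ETI is applied, the sketch below uses the common first-order approximation in which the percentage change in taxable income equals the elasticity times the percentage change in the net-of-tax rate (one minus the marginal rate). The functional form, the tax rates, and the income amount are assumptions chosen for illustration. The text above does not specify the exact formula the model uses.

```python
def taxable_income_after_response(taxable_income, old_rate, new_rate, eti):
    """Approximate taxable income after a marginal rate change.

    Uses the linear approximation: %change in income = ETI * %change in (1 - rate).
    """
    net_old = 1.0 - old_rate
    net_new = 1.0 - new_rate
    pct_change_net_of_tax = (net_new - net_old) / net_old
    return taxable_income * (1.0 + eti * pct_change_net_of_tax)

ETI_TOP = 0.25                     # top 1/10th of one percent, from the text
ETI_TOP_TCJA = ETI_TOP * (1 - 1 / 5)  # reduced by one-fifth while TCJA is in effect

# Hypothetical example: marginal rate rises from 37 to 40 percent.
responded = taxable_income_after_response(1_000_000, 0.37, 0.40, ETI_TOP)
```

In this hypothetical case the net-of-tax rate falls by about 4.8 percent, so an ETI of 0.25 implies taxable income falls by about 1.2 percent.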
Second, we assume that sales of capital assets respond to changes in the tax rate on capital gains. For long-term capital gains realizations, our elasticity varies with the tax rate and is approximately -0.7 at a tax rate of 20 percent. We use a higher elasticity for the first two years after a change in the capital gains rate; the short-term elasticity is approximately -1.1 at a tax rate of 20 percent. These elasticities match those that JCT describes in an early publication outlining its estimating methodology. Although JCT has not published the specific taxable income or capital gains elasticities that it now uses, TPC's behavioral assumptions appear broadly similar to those that JCT currently uses. In the case of certain policy proposals, different behavioral assumptions would be a source of difference between TPC and JCT revenue estimates.
Prior to 2008, almost all TPC revenue estimates showed the static impact on tax liability.
Do TPC's distribution tables include behavioral responses to tax changes?
No. By convention, we distribute only the static impacts of tax changes.
Whether or not to include behavioral responses to tax changes is particularly important when dealing with changes to tax rates on realized capital gains. A reduction in the marginal rate on capital gains causes increased realizations and could lead to higher taxes being paid. But higher realizations and the consequent increase in taxes paid are voluntary and therefore do not indicate an actual increase in tax burden—investors would not have realized the gains if doing so made them worse off. Because of this, TPC distributes only the change in taxes paid on the realizations that would have occurred in the absence of the rate change.
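The static convention for capital gains can be written as a one-line calculation: the rate change is applied only to baseline realizations. The function name and dollar amounts below are hypothetical.

```python
def static_capital_gains_tax_change(baseline_realizations, old_rate, new_rate):
    """Change in tax on the gains that would have been realized without the rate change."""
    return (new_rate - old_rate) * baseline_realizations

# Hypothetical tax unit with $100,000 of baseline realizations and a rate cut from 20 to 15 percent.
change = static_capital_gains_tax_change(100_000, 0.20, 0.15)
```

Any additional gains realized in response to the lower rate are left out of this calculation, so the tax unit shows a tax cut of about $5,000 however much more it chooses to realize.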
TPC's distribution tables do allow for what tax economists refer to as "tax-form behavior." For example, a proposal to repeal certain itemized deductions could cause taxpayers who were itemizing under current law to take the standard deduction instead. Our distributional analysis would include the impact of such a switch.
Are the amounts in TPC tables in current or constant dollars?
To be consistent with revenue estimators at JCT, Treasury, and CBO, TPC reports revenue estimates in current dollars. We also report the average tax cut in our distribution tables in current dollars, unless otherwise noted. The income classifier in distribution tables produced by the latest version of our tax model is in 2020 dollars, however. The use of constant dollars for our classifier ensures that users of our tables can more directly compare the tax units in the $40,000 - $50,000 income range in a 2031 distribution table, for example, to those in that same income class in a 2022 distribution table.
How does TPC define income for distributional analysis?
As of July 2013, TPC classifies tax units by an income concept we call "expanded cash income" (ECI) for the purpose of distributional analysis. We construct ECI to be a broad measure of pre-tax income, and we use it both to rank tax units in our distribution tables and to calculate effective tax rates.
We define ECI as adjusted gross income (AGI) plus: above-the-line adjustments (e.g., IRA deductions, student loan interest, self-employed health insurance deduction, etc.), employer-paid health insurance and other nontaxable fringe benefits, employee and employer contributions to tax-deferred retirement savings plans, tax-exempt interest, nontaxable Social Security benefits, nontaxable pension and retirement income, accruals within defined benefit pension plans, inside buildup within defined contribution retirement accounts, cash and cash-like (e.g., SNAP) transfer income, employer's share of payroll taxes, and imputed corporate income tax liability. See Measuring Income for Distributional Analysis for more detailed information on the definition and construction of ECI.
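Because ECI is AGI plus a fixed list of additions, its construction can be sketched as a simple sum. The field names below are hypothetical labels for the components listed above, not variable names from the TPC database.

```python
# Components added to AGI to form ECI, following the list in the text.
ECI_ADDITIONS = (
    "above_the_line_adjustments",
    "nontaxable_fringe_benefits",       # includes employer-paid health insurance
    "retirement_plan_contributions",    # employee and employer, tax-deferred plans
    "tax_exempt_interest",
    "nontaxable_social_security",
    "nontaxable_pension_income",
    "defined_benefit_accruals",
    "defined_contribution_inside_buildup",
    "cash_and_cash_like_transfers",     # includes SNAP
    "employer_payroll_tax",
    "imputed_corporate_tax",
)

def expanded_cash_income(unit):
    """Return AGI plus every ECI addition present on the record (missing items count as zero)."""
    return unit.get("agi", 0.0) + sum(unit.get(name, 0.0) for name in ECI_ADDITIONS)

# Hypothetical tax unit with only a few of the components.
unit = {"agi": 50_000.0, "tax_exempt_interest": 500.0, "employer_payroll_tax": 3_825.0}
eci = expanded_cash_income(unit)
```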
Most distribution tables TPC produced between 2004 and July 2013 used "cash income" as a classifier. Cash income is equal to ECI minus: employer paid health insurance and other nontaxable fringe benefits, accruals within defined benefit pension plans, inside buildup within defined contribution retirement accounts, and SNAP benefits. Prior to 2004, TPC used AGI as the income classifier in our distribution tables.
TPC reports some distribution tables "by percentile." What are percentiles?
In its distribution tables, TPC groups tax units into categories either by their dollar income (for example, all tax units with incomes between $30,000 and $40,000) or by where their income ranks relative to the income of all other tax units. For example, tax units in the "Top 1 Percent" have incomes that are higher than those of at least 99 percent of the population.
In the tables that rank tax units by percentiles, we sort the tax units by their income from lowest to highest, determine the number of people in each tax unit, and then split the population into five groups—or "quintiles"— that contain equal numbers of people. Each quintile thus contains 20 percent of the total population. For example, the "Top Quintile" contains the tax units consisting of the 20 percent of the population with the highest incomes. We exclude tax units with negative adjusted gross income or negative expanded cash income from the bottom income class although we do include these tax units in the totals. Prior to May of 2008, we placed equal numbers of tax units, rather than people, in each quintile.
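The people-based ranking can be sketched as follows. This is a simplified illustration under stated assumptions: each record carries a single income measure and a (weighted) count of people, units with negative income are set aside before the breakpoints are computed, and a unit that straddles a breakpoint is classified by the midpoint of its position in the ranking. None of these details is taken from TPC's code.

```python
def assign_quintiles(units):
    """Assign each tax unit to a quintile holding an equal share of people.

    Returns a list parallel to `units`, with 1 (lowest) to 5 (highest),
    or None for units with negative income.
    """
    ranked = sorted(
        (i for i, u in enumerate(units) if u["income"] >= 0),
        key=lambda i: units[i]["income"],
    )
    total_people = sum(units[i]["people"] for i in ranked)
    quintiles = [None] * len(units)
    cumulative = 0.0
    for i in ranked:
        # Position of the unit's midpoint in the cumulative count of people.
        midpoint = cumulative + units[i]["people"] / 2
        quintiles[i] = min(int(5 * midpoint / total_people) + 1, 5)
        cumulative += units[i]["people"]
    return quintiles

# Hypothetical population: eight singles, two couples, and two four-person families.
units = (
    [{"income": 1_000 * k, "people": 1} for k in range(1, 9)]
    + [{"income": 9_000, "people": 2}, {"income": 10_000, "people": 2}]
    + [{"income": 11_000, "people": 4}, {"income": 12_000, "people": 4}]
)
quintiles = assign_quintiles(units)
```

Each quintile in this example holds four people, but the bottom two quintiles contain four tax units each while the top two contain one each.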
We further subdivide the top income quintile into four groups: those in the 80th to 90th percentile; those in the 90th to 95th percentile; those in the 95th to 99th percentile; and those in the top 1 percent. We separately identify tax units in the top one-tenth of one percent (i.e., the richest 1 in 1,000).
Why are there different numbers of tax units in each quintile?
Quintiles contain equal numbers of people, not equal numbers of tax units. In particular, lower quintiles contain more tax units than higher quintiles because smaller tax units generally have lower income than larger ones and thus are more common in the lower quintiles. For example, our 2022 baseline shows 47.4 million tax units in the lowest quintile, 36.6 million in the middle quintile, and just 24.9 million in the top quintile.
Does TPC adjust the income classifier in its distribution tables for family size?
TPC’s standard tables rank people by the total amount of income in a tax unit, regardless of the number of people the unit represents. We thus place an individual with $50,000 of income in the same income group as, for example, a family of four with income of $50,000.
In the tables in which the classifier is dollar income level (for example, all tax units with incomes between $30,000 and $40,000), we always rank people by total income of the unit and never adjust the classifier for family size. In the tables in which the classifier is income percentile, we produce alternative tables that adjust the classifier for family size.
When TPC adjusts for family size, we use the same methodology as CBO: divide the income of the tax unit by the square root of the number of members of the tax unit. Thus, we classify a four-person family composed of a married couple with two children and income of $100,000 the same as a single individual with no dependents and an income of $50,000 ($100,000 divided by the square root of 4 equals $50,000).
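The square-root adjustment is a one-line calculation. The function name below is illustrative.

```python
import math

def size_adjusted_income(income, members):
    """Divide tax unit income by the square root of the number of members."""
    return income / math.sqrt(members)

family_of_four = size_adjusted_income(100_000, 4)
single_person = size_adjusted_income(50_000, 1)
```

Both calls return 50,000, matching the example in the text: the four-person family and the single individual land in the same adjusted income class.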
Adjusting incomes for family size moves larger families down the income distribution and smaller families up. That movement results in fewer tax units in the lower quintiles and more in the higher quintiles, relative to the distribution based on unadjusted income. Thus, for example, the lowest quintile defined using adjusted family income contains 39.4 million tax units in 2022, compared with 47.4 million when the incomes are not adjusted. In contrast, the top quintile contains 31.4 million tax units based on adjusted income but only 24.9 million tax units when income is not adjusted.
What are the definitions of the tax terms used in the tables?
What is a tax table number?
We assign a tax table number to every TPC estimate. Each number is composed of a T followed by a two-digit year and a four-digit number assigned sequentially through the year. For example, we designated the first table we produced in 2021 to be T21-0001.
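For readers who handle table numbers programmatically, the format described above can be parsed and generated with a short helper. The function names are hypothetical.

```python
import re

# "T", a two-digit year, a hyphen, and a four-digit sequence number.
TABLE_NUMBER = re.compile(r"T(\d{2})-(\d{4})")

def parse_table_number(text):
    """Split a table number such as 'T21-0001' into (two-digit year, sequence)."""
    match = TABLE_NUMBER.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid table number: {text!r}")
    return int(match.group(1)), int(match.group(2))

def format_table_number(year, sequence):
    """Build a table number from a calendar year and a sequence number."""
    return f"T{year % 100:02d}-{sequence:04d}"

first_2021 = format_table_number(2021, 1)
year, sequence = parse_table_number(first_2021)
```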
How does TPC organize the tables on its website?
We divide tables into several types: by dollar income level, income percentile, and, in certain cases, the size of tax change for a tax unit. The table displayed on our website shows a summary of the distributional effect of the proposal for all tax units. The downloadable Excel and PDF files contain that table as well as tables that provide supplementary distributional information and information about the baseline. The Excel and PDF files also include the distributional impact on subgroups of the population including separate tables for each filing status, for all tax units with children, and for all tax units in which the head (or spouse) is age 65 or over.
In addition to our distribution tables, we post other tables, including those showing the impact of policies on federal tax revenue and on effective marginal tax rates. We organize the tables chronologically and allow users to filter by tax topics (e.g., alternative minimum tax or estate tax), and by the date we produced the table.