Saturday, April 23, 2016

Should we reconsider how we collect income distribution data?

Should we change the way we collect inequality data and the way we measure inequality? For indeed we do not collect the data and use methodologies independently of what we are interested in and what our view of the world is.
            To make this understandable let me explain the three ways that have historically characterized collection of income distribution data. They emphasized alternatively (1) horizontal inequality, (2) the middle class and the distribution of all incomes but the top, and (3) the top of income distribution.
            The first approach (horizontal inequality) consists of being interested in the mean incomes of various groups, checking how they differ and how they evolve over time. Distributions around these means are considered of secondary importance: they are collected but not much worried about. Historically, this approach originated in the Soviet Union, where household surveys, conducted by the state at regular intervals, began in the early 1920s. The Soviet concern was not with distributions but with how a “typical”, “average” worker was doing, compared to an “average” peasant, and later to an “average” collective farmer. So, the focus of surveys was on the “averages”: a worker of average skill, with a partner (wife) of equally average (“normal”, “usual”) skill, with kids of “typical” ages etc.
            Obviously, the data were not collected only for such average households but also for those that were a bit less common. Yet the surveys were uninterested in the extremes: neither the poor, nor the atypical, nor the rich were included. By the 1960s this had led, in all socialist countries, to a plethora of (published) data contrasting mean incomes of workers’ vs. farmers’ households, or of employees vs. pensioners. But such an approach left both ends of the distribution (the top and the bottom) under-represented and the distribution itself truncated. It was not very useful for either inequality or poverty statistics.
            Before you think how limited this approach was, think twice. Because it is exactly the same approach that is today urged by many who care only (or almost only) about gender or racial or other “type-against-type” inequality. They focus on whether women on average are paid less than men, and while this concern is, like the old-fashioned Soviet concern of how workers are doing compared to farmers, legitimate, it leaves the entire distribution out.
I have criticized this approach in my recent book by pointing out that even if we were to equalize the means, the job would be unfinished: a great deal of inequality could remain among women or among men. The distributions can be vastly unequal while their means are the same. Thus getting the means equal is only the beginning.
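To make the point about equal means concrete, here is a minimal sketch in Python. The two income lists are invented for illustration (they are not survey data), and the Gini function is a standard textbook formula rather than anything taken from the book.

```python
from statistics import mean


def gini(incomes):
    """Gini coefficient of non-negative incomes (0 means full equality)."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    # Rank-weighted sum over incomes ordered from poorest to richest.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical incomes (in thousands), made up for this example only.
women = [10, 15, 25, 70]
men = [28, 30, 30, 32]

print("means:", mean(women), mean(men))
print("Ginis:", gini(women), gini(men))
```

Both groups have the same mean, so a comparison of averages would report no gap at all, yet inequality within the first group is far higher than within the second.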
            Enter the second approach, which has characterized the collection of income distribution statistics since many countries started collecting standardized data in the 1950s and 1960s. We are concerned here with entire distributions, with the poor, the less poor, the middle class, the upper middle class etc. But we are mostly concerned with the “dominant”, large groups, the middle classes, and not much with the top of the income distribution. This is for two reasons.

The first is confidentiality. Since the time when many Western countries started allowing access to micro data, their statistical offices have been concerned that very rich people, who are few in number, could be identified if researchers had access to their age, education, number of children and place of residence. Thus anonymity, in principle guaranteed by the surveys, and on which surveys depend if they wish to ensure people’s participation, would be severely compromised. Accordingly, the rich were undersurveyed.
The second reason is that extremely rich people are very rare, and if they (one or two of them) happened to be included in this year’s survey (remember, surveys are samples), it could push inequality statistics unusually high, and make the results look out of line with historical data. When a researcher then studies inequality, or newspapers publish the results, it would appear that inequality went up for some fundamental reason while it happened simply because a few rich people were included in the sample. Many statistical offices thus decided to censor the top of the income distribution by using the so-called “top coding” which sets the maximum incomes that could be reported either by category or in total. So, if you for example told the enumerator that your capital income was $5 million, and the top code for that category was $1 million, the survey would register your income as $1 million.  
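The mechanics of top coding are simple enough to sketch in a few lines of Python. The function name and the two lower incomes are made up for illustration; only the $5 million report and the $1 million top code come from the example above.

```python
def top_code(incomes, cap):
    """Record each reported income, but never more than the top code."""
    return [min(income, cap) for income in incomes]


# Hypothetical reported capital incomes; the last one is the $5 million case.
reported = [40_000, 60_000, 5_000_000]
recorded = top_code(reported, cap=1_000_000)

print(recorded)
print("income missing from the survey:", sum(reported) - sum(recorded))
```

Everything below the cap is recorded as reported, while the income above the cap simply disappears from the data, which is why top-coded surveys understate the top of the distribution.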
            Here we come to the third approach. As the rich economies have become more unequal, and the gap between the top of the income distribution and everybody else has grown, popular as well as research interest has shifted toward the top. Notice the progression, which responds to the progression in societal interest: from how a typical worker is doing compared to a typical farmer, to how unequal a society is and how large the middle class is, to how rich the top 1% are.
The use of fiscal data, popularized by Piketty first on the French data, and later by others, responded precisely to that interest (or perhaps contributed to creating it). Even the statistical measures used changed: instead of an overall distributional statistic like the Gini coefficient, the focus was on top income shares. The fiscal data indeed give a better picture of top incomes than household surveys. The IRS, in the sample it gives to researchers, still does some intentional “blurring” at the top, but we surely have much better data on household pre-tax income of the top 1% than household surveys provide. (For a recent comparison of US survey and tax data, see here.)
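A top income share is easy to compute once incomes at the top are actually observed. The sketch below uses an invented population of 100 people and a hypothetical helper name; it illustrates the statistic only, not the procedure applied to fiscal data.

```python
def top_share(incomes, fraction=0.01):
    """Share of total income received by the richest `fraction` of people."""
    ranked = sorted(incomes, reverse=True)
    # Number of people in the top group, at least one person.
    k = max(1, round(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical population: 99 people with income 1 and one with income 101.
population = [1] * 99 + [101]

print("top 1% share:", top_share(population))
print("top 10% share:", top_share(population, fraction=0.10))
```

The measure depends entirely on what is recorded for the few people at the very top, which is exactly where household surveys are weakest.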
Still, there are at least two problems. First, the rich especially, but everybody else as well, have a clear interest in minimizing their reported incomes to reduce the taxes they pay. Second, the rich engage, as we have seen in the Panama Papers, in massive schemes to hide their assets and income. Thus, despite our best efforts to uncover the full extent of top incomes, we are only at the beginning of a long road.
So it is perhaps the right time to think about how fiscal data should be improved, how fiscal and household survey data should be made more compatible, and, most ambitiously, whether better administrative data (like the world register of wealth proposed by Piketty and Zucman) should be created, both to tax wealth and to combat fiscal evasion. We are already moving to the next stage of methodological development, where the concern with incomes of the rich, partly because they have become so much richer than the others, partly because they wield huge political power, and partly because they are hiding their assets, may take center stage.
