Should we change the way we collect inequality
data and the way we measure inequality? After all, we do not collect data
or choose methodologies independently of what we are interested in and
of how we view the world.
To make this
clearer, let me explain the three approaches that have historically characterized the collection of income distribution data.
They emphasized, in turn, (1) horizontal inequality, (2) the middle class and the distribution
of all incomes except the top, and (3) the top of the income distribution.
The first
approach (horizontal inequality) consists of looking at
the mean incomes of various groups, checking how they differ and how they
evolve over time. Distributions around these means are considered of secondary importance:
they are collected but not much worried about. Historically, this approach originated in
the Soviet Union, where household surveys, conducted by the state at regular
intervals, began in the early 1920s. The Soviet concern was not with distributions
but with how a “typical”, “average” worker was doing compared to an
“average” peasant, and later to an “average” collective farmer. So the focus of the
surveys was on the “averages”: a worker of average skill, with a partner
(wife) of equally average (“normal”, “usual”) skill, with kids of “typical”
ages, etc.
Obviously, the
data were not collected only for such average households but also for those
that were a bit less common. Yet the surveys were uninterested in the extremes:
neither the poor, nor the atypical, nor the rich were included. By the 1960s this had led, in all socialist
countries, to a plethora of published data contrasting mean
incomes of workers’ vs. farmers’ households, or of employees vs. pensioners.
But such an approach left both ends of the distribution (the top and the bottom) under-represented
and the distribution itself truncated. It was not very useful for either inequality or
poverty statistics.
Before you
conclude that this approach was hopelessly limited, think twice. It is exactly the
same approach that is urged today by many who care only (or almost only) about gender,
racial, or other “type-against-type” inequality. They focus on
whether women on average are paid less than men. This concern is legitimate, just as the old-fashioned Soviet concern with how workers were doing compared to
farmers was legitimate, but it leaves the entire distribution out.
I have criticized this approach in my recent book by pointing out that even if we were to equalize the means, that
would still leave unaddressed the inequality that could remain among women or among men. The distributions can be vastly unequal while their
means are the same. Thus getting the means equal is only the beginning.
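To make the point about equal means concrete, here is a minimal sketch in Python. The two groups and their incomes are hypothetical, invented purely for illustration, and the Gini coefficient is used simply as one convenient summary measure of within-group inequality.

```python
def gini(incomes):
    """Gini coefficient of a list of non-negative incomes (0 = perfect equality)."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # where i = 1..n is the rank of each income in ascending order.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical incomes (assumed numbers, not real data).
group_a = [40, 45, 50, 55, 60]   # tightly clustered around 50
group_b = [5, 10, 20, 65, 150]   # widely spread, but with the same mean of 50

mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)
gini_a = gini(group_a)
gini_b = gini(group_b)

print(f"Group A: mean={mean_a:.1f}, Gini={gini_a:.3f}")  # mean=50.0, Gini=0.080
print(f"Group B: mean={mean_b:.1f}, Gini={gini_b:.3f}")  # mean=50.0, Gini=0.552
```

A comparison of means alone would report no gap between the two groups, even though inequality within the second group is several times higher.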
Enter the
second approach, which has characterized the collection of income distribution statistics
since many countries started collecting standardized data in the 1950s and
1960s. We are concerned here with entire distributions: with the poor, the less
poor, the middle class, the upper middle class, etc. But we are mostly concerned
with the “dominant”, large groups, the middle classes, and not much with the top
of the income distribution. This is for two reasons.
The first is confidentiality. Ever since
many Western countries started allowing access to micro data, their statistical offices
have been concerned that very rich people, who are few in number, could be identified if researchers had access to their age, education, number of children, and place of residence. Anonymity, which the surveys guarantee in principle
and on which they depend to secure people’s
participation, would then be severely compromised. Accordingly, the rich were undersurveyed.
The second reason is that extremely rich
people are very rare, and if one or two of them happened to be included
in this year’s survey (remember, surveys are samples), their inclusion could push inequality
statistics unusually high and make the results look out of line with historical data. When
a researcher then studies inequality, or newspapers publish the results, it
would appear that inequality went up for some fundamental reason, while it
happened simply because a few rich people were included in the sample. Many statistical
offices thus decided to censor the top of the income distribution by using so-called
“top coding”, which sets the maximum income that can be reported either by
category or in total. So if, for example, you told the enumerator that your capital income was
$5 million, and the top code for that category was $1 million, the survey would
register your income as $1 million.
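The mechanics can be sketched in a few lines of Python. The sample below is entirely hypothetical (99 households reporting $10,000 of capital income plus one reporting $5 million), and `top_code` and `top_share` are illustrative helper names, not functions from any statistical office's toolkit. Only the $5 million income and the $1 million top code come from the example above.

```python
def top_code(incomes, cap):
    """Replace every income above `cap` with `cap` (a simple top-coding rule)."""
    return [min(x, cap) for x in incomes]


def top_share(incomes, fraction):
    """Share of total income received by the richest `fraction` of units."""
    xs = sorted(incomes, reverse=True)
    k = max(1, round(len(xs) * fraction))  # number of units counted as the top
    return sum(xs[:k]) / sum(xs)


# Hypothetical samples (assumed numbers, not real survey data).
without_rich = [10_000] * 100               # a year when no very rich household is drawn
with_rich = [10_000] * 99 + [5_000_000]     # a year when one happens to be drawn
coded = top_code(with_rich, cap=1_000_000)  # the same year after top coding

share_without = top_share(without_rich, 0.01)
share_raw = top_share(with_rich, 0.01)
share_coded = top_share(coded, 0.01)

print(f"Top 1% share, no rich household sampled: {share_without:.1%}")
print(f"Top 1% share, one rich household, raw:   {share_raw:.1%}")
print(f"Top 1% share, one rich household, coded: {share_coded:.1%}")
```

In this toy sample a single household moves the top 1% share from 1% to more than 80%, which is the kind of jump statistical offices wanted to avoid. Top coding dampens the jump, but only by recording an income that is known to be too low.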
Here we come
to the third approach. As the rich economies have become more unequal, and the
gap between the top of the income distribution and everybody else has grown,
popular as well as research interest has shifted toward the top. Notice the progression, which mirrors the progression in
societal interest: from how a typical worker is doing compared to a typical farmer,
to how unequal a society is and how large the middle class is, to how rich
the top 1% are.
The use of fiscal data, popularized
by Piketty, first on French data, and later by others, responded precisely
to that interest (or perhaps helped create it). Even the statistical measures changed: instead of
an overall distributional statistic like the Gini coefficient, the focus moved to top income shares. Fiscal data indeed give a better picture of top
incomes than household surveys. The IRS still does some intentional “blurring” at the top
in the sample it gives to researchers, but we surely have much better data on household pre-tax income of the top 1% from tax records than
from household surveys. (For a recent comparison of US survey and tax data, see here.)
Still, there are at least two
problems. First, the rich especially, but everybody else as well, have a clear interest
in minimizing their reported incomes to reduce the taxes they pay. Second, the rich engage,
as we have seen in the Panama Papers, in massive schemes to hide their assets
and income. Thus, despite our best
efforts to uncover the full extent of top incomes, we are only at the beginning
of a long road.
So it is perhaps the right time to
think about how fiscal data should be improved, how fiscal and household survey data should be made more compatible, and, most ambitiously, whether better administrative
data (like the world register of wealth proposed by Piketty and Zucman) should be created, both to tax wealth and to
combat tax evasion. We are already moving to the next stage of methodological
development, where the concern with the incomes of the rich, partly because they
have become so much richer than the others, partly because they wield huge political
power, and partly because they are hiding their assets, may take center stage.