Sunday, January 8, 2017

Pareto, Taleb and the tails of income distributions

I am reading Nassim Taleb’s Antifragile, and it sent me back to rereading parts of his extraordinary Black Swan. Daniel Kahneman’s blurb for The Black Swan, “The Black Swan changed my view of how the world works”, is fully justified. It will remain one of the absolutely indispensable books, a huge epistemological advance. Antifragile is, perhaps, an even more ambitious book because it aims to make systems (including people) antifragile, that is, thriving in conditions of (what Taleb calls) “opaque randomness”. So it is broader in scope and has a prescriptive part that The Black Swan does not.
Here I would like to address the first of Taleb’s two themes that resonate immediately with people who work on income inequality and globalization: inequality. I leave globalization for another post.
Taleb’s contribution here is absolutely crucial. Distributions with long right tails, such as the distributions of income and especially of wealth, have both their means and their inequality measures heavily dependent on extremes. Moreover, for extreme events, as Taleb keeps on writing, standard deviations are all but irrelevant. So such distributions cannot be fully described by the mean and the variance, as we generally tend to do in inequality work. (If the variance, as Taleb writes, reflects lack of knowledge about the mean, then the irrelevance of the variance implies lack of knowledge about the lack of knowledge: Rumsfeld’s “unknown unknown”.)
Statisticians who work on inequality have long known, heuristically, how much our results, from Gini to Theil to income shares, depend on the extreme values, and they tried to “solve” this by getting rid of the extremes, truncating all values above some maximum. This so-called “top coding” was done until a few years ago by the US Census Bureau in its Current Population Surveys, the prime source of income distribution data for the United States and a methodological primer for many other countries. Since household surveys are random samples, the idea was not to let extreme values that may be sampled in one year, but not in many others, “unreasonably” affect both the mean and the inequality statistics. For example, if you surveyed Bill Gates this year but never before and never again (which is very likely since people with his wealth are extremely rare), this year’s US mean income and inequality would turn out extremely high. You would then have hundreds of research papers and doctoral dissertations written about what special economic policy made US inequality shoot up while, of course, the true reason was the sampling.
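To see how strongly a single sampled extreme can move the statistics, here is a minimal sketch. Everything in it is an assumption made for illustration: the lognormal "survey", the 10 billion outlier, the 1 million top-code ceiling, and the helper names `gini`, `top_code` and `mean`. None of it reproduces the Census Bureau's actual procedure.

```python
import math
import random

def gini(incomes):
    """Gini coefficient of a list of non-negative incomes."""
    y = sorted(incomes)
    n = len(y)
    # Rank-weighted sum over incomes sorted in ascending order.
    weighted = sum(rank * value for rank, value in enumerate(y, start=1))
    return 2.0 * weighted / (n * sum(y)) - (n + 1.0) / n

def top_code(incomes, ceiling):
    """Replace every income above `ceiling` with `ceiling`."""
    return [min(value, ceiling) for value in incomes]

def mean(incomes):
    return sum(incomes) / len(incomes)

random.seed(0)
# Hypothetical survey: 10,000 lognormal incomes with a median of 50,000.
sample = [random.lognormvariate(math.log(50_000), 0.7) for _ in range(10_000)]
# The same survey in a year when one hypothetical 10-billion income is drawn.
with_outlier = sample + [10_000_000_000.0]
ceiling = 1_000_000.0  # assumed top-code ceiling

for label, data in [
    ("ordinary year", sample),
    ("year with the outlier", with_outlier),
    ("outlier year, top-coded", top_code(with_outlier, ceiling)),
]:
    print(f"{label:26s} mean = {mean(data):14,.0f}  Gini = {gini(data):.3f}")
```

In this setup the one extra observation dominates both the mean and the Gini, while the top-coded series stays close to the ordinary year. That is the stabilizing effect top coding was meant to have, bought at the price of discarding the information in the tail.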
Nassim is a big fan of power laws that apply to distributions beyond a certain point (the “crossover point”). This too has been used in practice, by combining a lognormal distribution that holds up to a certain (high) income level with a Pareto (power law) distribution that holds above it. (A distribution follows the Pareto law if for every x percent increase in income there is an αx percent decrease in the number of recipients of such high incomes. α is the Pareto “constant”.) Alternatively, some people (most notably Viktor Yakovenko) have combined exponential and Pareto distributions: the first applies to wage earners (the bottom 95% of the population), the second to capitalists (the top 5%). Anwar Shaikh in his monumental Capitalism used that “combined” distribution to discuss the relationship between income inequality and financial crises.
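The Pareto law in the parenthesis can be made concrete with a short sketch. It assumes the standard textbook form of the law, where the share of recipients with income of at least y is (y_min / y)^α. The function names and the numbers (y_min = 50,000, α = 2) are illustrative assumptions, not values from any data set.

```python
def pareto_survival(y, y_min, alpha):
    """Share of recipients with income of at least y under a Pareto law."""
    if y < y_min:
        return 1.0
    return (y_min / y) ** alpha

def percent_drop(y, pct_increase, y_min, alpha):
    """Percent of recipients lost when the threshold y is raised by pct_increase."""
    before = pareto_survival(y, y_min, alpha)
    after = pareto_survival(y * (1.0 + pct_increase / 100.0), y_min, alpha)
    return 100.0 * (before - after) / before

# With alpha = 2, a 1% higher threshold removes about 2% of recipients,
# and it does so at every income level above y_min.
for income in (100_000, 1_000_000, 10_000_000):
    print(income, round(percent_drop(income, 1.0, 50_000, 2.0), 3))
```

The "αx" rule holds exactly only for infinitesimal changes; for a 1% step it is a close approximation. The point to retain is that under a true Pareto law the same percentage drop applies no matter how high the starting income.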
But where I part ways with Nassim is on the constancy of α, which is implied in Mandelbrot’s fractal approach on which Nassim builds. I am not writing here about Pareto’s idea of the constancy of α across places and time. This has been disproved enough (to see how empirical distributions look, please check my earlier post on Pareto). I am writing about the empirical fact that when we look at income distributions, α changes depending on what part of the top we consider. I was interested in this several years ago and did quite a number of runs on empirical distributions, but never followed it up. What I did was to estimate the relationship ln H(y) = A - α ln y, where H is the inverse cumulative distribution (the share of recipients with income of at least y), y = income, and α = the Pareto “constant” or “guillotine” (because it “cuts” the number of recipients as income is raised), over gradually smaller parts of the top: first for the top 20%, then for the top 19%, then for the top 18%, etc., all the way to the top 1%. I do not expect the guillotine to stay equally steep across these slices.
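The procedure just described can be sketched as follows. This is not the original code behind the runs, only a minimal reconstruction under stated assumptions: α is estimated by ordinary least squares of ln H(y) on ln y, H is measured as a share of the whole sample, and the data are a synthetic Pareto sample with α = 2 (hypothetical, standing in for survey microdata). The names `ols_slope` and `tail_alpha` are mine.

```python
import math
import random

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def tail_alpha(incomes, top_share):
    """Estimate alpha from ln H(y) = A - alpha ln y over the top `top_share`."""
    y = sorted(incomes)
    n = len(y)
    k = max(2, round(top_share * n))   # number of observations in the tail
    tail = y[n - k:]
    log_income = [math.log(value) for value in tail]
    # H(y): share of the whole sample with income at or above each tail value.
    log_h = [math.log((k - j) / n) for j in range(k)]
    return -ols_slope(log_income, log_h)

random.seed(1)
# Synthetic incomes that follow a Pareto law exactly (alpha = 2, minimum 50,000).
synthetic = sorted(50_000 * random.paretovariate(2.0) for _ in range(100_000))

for pct in range(20, 0, -1):
    print(f"top {pct:2d}%: alpha = {tail_alpha(synthetic, pct / 100):.2f}")
```

On a sample that really is Pareto, the estimates should hover around the same value for every slice, up to sampling noise that grows as the slice shrinks. A systematic drift in α as the slice narrows is what signals that the power law is not working with equal intensity.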
In some cases the slope of the line may be negative (= α increases in absolute terms) if the distribution near the very top tends to be more “rarefied” than the distribution across the entire top 20%. As shown in the figure below, this is the case of the United States. If you have only 100 people with incomes above several hundred million dollars, then increasing the threshold by (say) 10% would make perhaps 20 or 30 of them fall below the new threshold. In the middle, very “dense”, part of the income distribution, you can increase the threshold by 10% and very few people, percentage-wise, will drop out: perhaps only 1%. So around the top the guillotine would be -2 or -3; in the middle it would be -0.1.
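The arithmetic behind these numbers is a simple ratio of percentage changes. A minimal sketch, using only the illustrative figures from the paragraph above (the function name `guillotine` is my own label):

```python
def guillotine(pct_recipients_lost, pct_threshold_increase):
    """Local guillotine: percent change in recipients per percent rise in the threshold."""
    return -pct_recipients_lost / pct_threshold_increase

print(guillotine(20, 10))  # near the top, 20 of 100 drop out: -2.0
print(guillotine(30, 10))  # near the top, 30 of 100 drop out: -3.0
print(guillotine(1, 10))   # dense middle, 1% drop out: -0.1
```

This uses plain percentage ratios, as in the text, rather than logarithmic changes, so for a step as large as 10% it is only an approximation to the slope of ln H(y) against ln y.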

But in the cases of Germany and the UK, whose distributions look almost identical except for the top 1%, the Pareto guillotine becomes smaller in absolute value as we approach the top. The distribution around the top is denser.

In other cases, like Egypt, shown in the graph below with Italy and the same data for Germany, the Pareto constant is lower (in absolute value) almost throughout the entire top 20%. Egypt’s distribution is “denser” than the German and Italian ones (almost) throughout. Weirdly, Italy comes closest to what Pareto would have believed.

The key point is that the power law works with unequal intensity even over the portion of the distribution where we believe it should apply (that is, above the “crossover point”). Thus in the equation above, we should write α(y) rather than a plain α.
The difference in the levels of the guillotines between Egypt and Germany/Italy illustrates another point: less sharp guillotines like Egypt’s (if they were to hold throughout the entire distribution) will be associated with higher overall inequality because they imply thicker tails. Sharper guillotines imply thinner tails and thus less inequality. This is expressed in the fact that the Gini of a Pareto distribution is equal to 1/(2α-1), so that as α increases in absolute value (i.e., the distribution gets more rarefied), the Gini goes down.
Where does this leave us? With a modified fractal approach: as we slice income and wealth distributions further and further, the same phenomenon does not repeat itself with equal intensity but, depending on the distribution, with greater or lesser intensity. The Pareto constant varies so much that one wonders how the term “constant” can be applied to it at all.
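A quick numerical check of the Gini formula, as a sketch. It takes α as a positive number (the absolute value of the guillotine) and assumes α > 1, since the mean of a Pareto distribution, and hence its Gini, is not finite otherwise. The function name is illustrative.

```python
def pareto_gini(alpha):
    """Gini coefficient of a Pareto distribution with shape alpha > 1."""
    if alpha <= 1:
        raise ValueError("the formula requires alpha > 1 (finite mean)")
    return 1.0 / (2.0 * alpha - 1.0)

# A sharper guillotine (larger alpha) means a thinner tail and a lower Gini.
for alpha in (1.5, 2.0, 3.0):
    print(alpha, round(pareto_gini(alpha), 3))  # 0.5, 0.333, 0.2
```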
