[This was originally posted on May 23, 2014 on a different Website and is being reposted here]
I think the FT case is blown out of proportion. It is well known that wealth data are uncertain. I for one do not know where Piketty's wealth data come from, and I am sure very few people do. There are also myriad choices you have to make in wealth estimation (e.g. capitalization or not; forward-looking or backward-looking) which you do not have to make when you use income.
(Although with income too there are many issues and many choices. If one were to go through my data, point by point, one could also detect a number of problems or inconsistencies: treatment of zero and negative incomes, imputation for housing, imputation of home consumption, what prices to use for home consumption, how to get correct self-employment income, etc. And many of these decisions vary from survey to survey and are not well documented, or the documentation is so immense that you cannot go through it or figure it out.)
The situation with wealth data, that much I know, is much worse. I was a referee twice for Davies et al.'s global wealth inequality papers: there were many assumptions used in their papers, and there are many more things you have no idea about, e.g. how wealth is defined in India, who is covered or not, how reliable the data are, what prices are used, etc. You just have to accept the numbers they (the Indian statistical office or Davies et al.) come up with. People may not realize that behind one such summary number there are thousands, or even hundreds of thousands, of household-level data points, and no one can go through hundreds of surveys and thousands of individual records to verify them all.
And if you create (as Piketty did) a bunch of data for a bunch of countries, there are bound to be issues. The question is: was there intentional data manipulation to get the answer one desires? I do not know, but it strikes me as unlikely that anyone who wanted to do that would have posted all the data, complete with formulas, on the Internet. And Thomas's data did not appear there only when the book was published; they had been there for months or even years.
Now, consider the FT's points one by one:
"One apparent example of straightforward transcription error in Prof Piketty’s spreadsheet is the Swedish entry for 1920. The economist appears to have incorrectly copied the data from the 1908 line in the original source."
Okay, quite likely. When you transcribe hundreds of data, transcribing some wrongly is very likely. They give only one example. Are there more?
"A second class (sic!) of problems relates to unexplained alterations of the original source data. Prof Piketty adjusts his own French data on wealth inequality at death to obtain inequality among the living. However, he used a larger adjustment scale for 1910 than for all the other years, without explaining why."
Piketty has to explain why he used a different adjustment scale. Let's wait to hear from him.
"In the UK data, instead of using his source for the wealth of the top 10 per cent population during the 19th century, Prof Piketty inexplicably adds 26 percentage points to the wealth share of the top 1 per cent for 1870 and 28 percentage points for 1810."
Same thing.
"A third problem is that when averaging different countries to estimate wealth in Europe, Prof Piketty gives the same weight to Sweden as to France and the UK – even though it only has one-seventh of the population."
This is neither here nor there. Perhaps the weights should be country wealth shares, not population shares. At times, you want to have unweighted averages and at times population- or income- or wealth-weighted. The question is whether one or another averaging makes more sense for the issue at hand and whether you stick to whatever you have chosen.
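To make the point about weights concrete, here is a small sketch of how much the choice can move a three-country average. All the numbers are made up for illustration: the wealth shares are not Piketty's, and the populations are round figures chosen only so that Sweden has one-seventh of the population of each of the other two.

```python
def weighted_mean(values, weights):
    """Weighted average of values; equal weights give the plain mean."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical top-10% wealth shares (per cent); NOT taken from any source.
shares = {"France": 60.0, "UK": 70.0, "Sweden": 50.0}

# Illustrative populations (millions), rounded so Sweden is one-seventh.
population = {"France": 63.0, "UK": 63.0, "Sweden": 9.0}

countries = list(shares)
values = [shares[c] for c in countries]

# Same weight for every country (the averaging the FT objects to).
unweighted = weighted_mean(values, [1.0] * len(countries))

# Population weights (what the FT implicitly suggests instead).
pop_weighted = weighted_mean(values, [population[c] for c in countries])

print(f"unweighted average:          {unweighted:.1f}")
print(f"population-weighted average: {pop_weighted:.1f}")
```

With these invented inputs the unweighted average is 60.0 and the population-weighted one is 64.0. Neither is "the" right number; swapping in wealth or income weights would give yet another figure, which is why the choice has to be justified by the question being asked and then applied consistently.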
"There are also inconsistencies with the years chosen for comparison. For Sweden, the academic uses data from 2004 to represent those from 2000, even though the source data itself includes an estimate for 2000."
I do not understand this well. I have sometimes used (say) a 2003 survey to stand for the benchmark year 2000, sometimes for the benchmark year 2005. It just depends on which countries you have data for, and from when. My data for (say) benchmark year 2011 improve as time goes by and I get more countries and more recent surveys. So if you compare my global inequality estimate for a given year in the first draft of a paper with the one in the final version, they will often differ a bit.
In conclusion, the only real issue is why Piketty adjusted the data for several years differently, whether it is explained in the files, whether that explanation is reasonable, and if it is not explained, whether he can provide one. Out of the three "classes" of issues raised by FT, only the second has some validity. So far.