Wednesday, December 15, 2010

Hans Rosling's Chart is SERIOUSLY MISLEADING

I guess I am the last person with an internet connection to see this video.

Yes, he's an entertaining and enthusiastic guy. Yes, the rise of China and India is amazing and fantastic. Yes, there is a large "middle class" of countries.

But people, have you looked at the horizontal axis of his chart? The distance between $400 and $4000 is the same as the distance between $4000 and $40000.

That is incredibly misleading.

Properly plotted on a linear scale, it would be clear that there was way way way LESS income inequality in 1810 or that magic year of 1948 than there is in 2010.

We are NOT living in an "age of convergence" with respect to per-capita incomes.

It is NOT only Sub-Saharan African countries stuck at the bottom.

Latin America is NOT catching up to the USA.

The whole thing is pretty much horse hockey.

Then the press grabs ahold and starts to exaggerate the exaggerations. Here's David Brooks:

"Then, over the last few decades, the social structure of the world changes. The Asian and Latin American countries begin to catch up. With the exception of the African nations, living standards start to converge. Now most countries are clumped toward the top end of the chart..."

Just keep repeating this to yourself, people: "The left half of the chart covers a range of $3,600, but the right half of the chart covers a range of $36,000."

Holy Crap!
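To put numbers on the axis complaint, here is a minimal Python sketch. It assumes the axis runs from $400 to $40,000 with $4,000 labeled in between (the figures quoted in the post, not values read off Rosling's actual chart), and the helper names are made up for illustration.

```python
import math

def linear_position(value, lo, hi):
    """Fraction of the axis width at which `value` sits on a linear scale."""
    return (value - lo) / (hi - lo)

def log_position(value, lo, hi):
    """Fraction of the axis width at which `value` sits on a log10 scale."""
    return (math.log10(value) - math.log10(lo)) / (math.log10(hi) - math.log10(lo))

# Assumed axis endpoints and middle label, taken from the figures in the post.
LO, MID, HI = 400, 4_000, 40_000

print(f"log scale:    $4,000 sits {log_position(MID, LO, HI):.0%} of the way across")
print(f"linear scale: $4,000 sits {linear_position(MID, LO, HI):.0%} of the way across")
print(f"dollars covered left of $4,000:  {MID - LO:,}")
print(f"dollars covered right of $4,000: {HI - MID:,}")
```

On the log axis $4,000 lands at the midpoint; on a linear axis with the same endpoints it would sit less than a tenth of the way across.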

18 comments:

siredge said...

Generally I live your posts. please keep it up. But I do have a bone to pick with this one.

For this video, the creator selected a logarithmic scale. It makes much more sense than a linear scale when looking at items like median income where the person is looking for how significant the changes are at the margins. A $200 per year raise makes a huge difference when you only make $400 per year, but is barely noticeable at $40k per year. Using this scale makes such changes obvious in ways that a linear scale doesn't, where a $200 change would hardly be noticed at either end of the scale.
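The $200 example can be checked with a few lines of Python. This is only a sketch of the arithmetic in the comment above; the function names are hypothetical and nothing here comes from Gapminder's data or code.

```python
import math

def relative_raise(income, raise_amount):
    """Raise as a fraction of current income."""
    return raise_amount / income

def log10_shift(income, raise_amount):
    """Horizontal movement on a log10 axis, measured in decades."""
    return math.log10((income + raise_amount) / income)

RAISE = 200  # the raise used in the comment's example

for income in (400, 40_000):
    print(
        f"income ${income:>6,}: raise is {relative_raise(income, RAISE):.1%} of income, "
        f"moves {log10_shift(income, RAISE):.4f} decades on a log axis, "
        f"and ${RAISE} on a linear axis"
    )
```

On a linear axis both earners move the same $200; on the log axis the person starting at $400 moves far more than the person starting at $40,000, which mirrors the difference in relative terms.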

siredge said...

*like rather than live. Sorry, it's early yet out west.

David said...

I agree with your point on the relative importance, but not in terms of actual income convergence, which is the inference many people are making and what Angus seems to be responding to.

This hearkens to Ed Leamer's brilliant JEL review of The World is Flat, Figure 7 in particular:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.146.1226&rep=rep1&type=pdf

Dr. Tufte said...

I noticed this too Angus ...

But I didn't post about it ... and there's a decent reason (actually two - perhaps not good, but merely decent).

1) Income inequality really wasn't the point of the chart. Health outcomes equality is. I see this chart like a comparison of per capita income to per capita consumption: we really shouldn't be worried as much about inequality in the former as in the latter.

2) I think the vertical axis is far more likely to have an upper bound, and therefore a declining growth rate, than the horizontal axis. In our lingo, longevity is unlikely to be I(1), while per capita income is likely to be I(1). And, if it's I(1) in a log-linear form, then we'd probably do a log transform first. The base 10 log isn't what we'd normally use in macro, but it still works. Now, it would also be pretty standard to difference the logged I(1) variable when plotting two series like that: it's why we plot inflation - instead of the price level - against interest rates. In this case, Rosling might have done per capita real income growth rates against longevity, and I don't think he would have ended up with the just-so story that he did. If he did though, I think many people would have walked away with the impression that low growth rates are a good thing. Too many people make that assumption casually for me to advocate encouraging it.
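For readers outside macro, "differencing the logged variable" just means taking the change in log income from one period to the next, which is approximately the growth rate. A minimal sketch follows, using a made-up income series that grows at exactly 3% a year (not real data) and natural logs, as is usual in macro:

```python
import math

# Hypothetical per capita income series growing 3% a year (illustration only).
incomes = [1_000 * 1.03 ** t for t in range(6)]

# First difference of the logged series.
log_diffs = [math.log(b) - math.log(a) for a, b in zip(incomes, incomes[1:])]

# Ordinary percentage growth, for comparison.
pct_growth = [b / a - 1 for a, b in zip(incomes, incomes[1:])]

for year, (d, g) in enumerate(zip(log_diffs, pct_growth), start=1):
    print(f"year {year}: log difference = {d:.4f}, percentage growth = {g:.4f}")
```

The log difference is constant whenever the growth rate is constant, which is why a steadily growing series becomes a straight line once logged.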

Angus said...

I agree there has been real convergence in life expectancy, mostly because of the upper bound.

I disagree that a log transform is the way to go.

He's explicitly talking about convergence while his graph misrepresents the amount of convergence by an order of magnitude.

His scale makes divergence look like convergence and that is just plain dishonest.

PeeDub said...

I also disagree with your quibble. I do think a semi-log plot (I'd say a log-log plot would probably be better, but with regards to the scale in life expectancy, probably isn't that important) is the right way to go. The convergence that you see is not that the width is decreasing. It's that the overlap of the groups has increased (relative to post-industrial revolution).

One quibble I would have with the plot is that he uses dollars rather than some other time-independent value, such as 2010 dollars or percent of world gdp or something similar, because the great shift to the right as time goes on is the artificial thing in *my* mind.

Todd S. said...

You're suggesting the inequality of 2009 is visually misleading b/c the area between 4k & 40k is perhaps compressed. But did they compress just the horizontal labeling or both the labeling & data?

If only the labels are compressed, the $4k label needs to move left and we'd find the visual wasn't too misleading after all since the data were properly plotted in relation to each other to begin with.

Although, if the labels AND data are compressed then the 4k label & much of the data would need to move left. This would have resulted in what you're suggesting and also make more visually obvious the huge advancements in wealth by those ending on the right.

...if I understand correctly. :)

mobile said...

You can find the applet that Rosling used for his video at http://www.gapminder.org/world . You can then recreate the video using a linear income scale.

Anonymous said...

Saw this same graphic on TED over a year ago here: http://www.ted.com/talks/lang/eng/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html

It's much longer and goes into much more detail. I didn't notice then but as I re-watch it he continually skews the data in this same exact way that Angus mentions.

He does make the point toward the end that stats folks always seem to have a problem with his visualizations but that they are meant to be hypothesis generators.

He has a number of talks using all kinds of interesting ways of depicting data including storage tubs. I'm not kidding...

http://www.ted.com/talks/lang/eng/hans_rosling_on_global_population_growth.html

Matt said...

Errr....I have no idea why you think the chart is misleading. Is this the first time you've ever seen a log scale?

I also think your claim that a linear scale would be more enlightening is incredibly misguided. The effect of increasing income on human happiness is much closer to logarithmic than linear, and growth is much more meaningful in percentage terms than it is in absolute terms (because growth is an exponential process).

Anonymous said...

Essentially you're saying that if Tom and Harry are earning $200 and $400 respectively and a year later they are earning $3,000 and $4,000, then their income inequality has increased. I think most people would disagree with that. A log scale is the way to go.

Anonymous said...

Exactly. If a country moves from $400 to $4000 over 50 years, and you project the same growth rate over the next 50 years, then they would end up at $40,000. Using a linear scale would make a country whose income is growing at a constant rate look like it is accelerating across the screen - that would be misleading.

Or, if two countries start at $400 and $440 and both grow at 10% per year, is the poorer country getting further behind? Sure, in terms of dollars. But the poorer country also is always exactly one year behind the richer one. Seems like their difference on the chart shouldn't change, since time is the independent variable here.
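Both examples in this comment are easy to verify. The sketch below assumes simple annual compounding; the helper names are invented for illustration. Going from $400 to $4,000 in 50 years works out to roughly 4.7% a year.

```python
import math

def annual_rate(start, end, years):
    """Constant annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1 / years) - 1

def grow(start, rate, years):
    """Income after compounding `rate` for `years`."""
    return start * (1 + rate) ** years

# First example: $400 -> $4,000 in 50 years, then the same rate for 50 more.
rate = annual_rate(400, 4_000, 50)
after_100 = grow(400, rate, 100)
print(f"implied rate: {rate:.2%}, income after 100 years: ${after_100:,.0f}")

# Second example: two countries at $400 and $440, both growing 10% a year.
for year in (0, 10, 25, 50):
    poor, rich = grow(400, 0.10, year), grow(440, 0.10, year)
    print(f"year {year:>2}: dollar gap = {rich - poor:>10,.2f}, ratio = {rich / poor:.2f}")

# The poorer country reaches the richer one's income exactly one year later.
assert math.isclose(grow(400, 0.10, 11), grow(440, 0.10, 10))
```

The dollar gap keeps widening while the ratio stays at 1.10, so the two countries stay the same distance apart on a log axis and drift apart on a linear one.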

Ethan Dennison said...

Hans is far smarter than people who can't interpret a logarithmic scale and understand when it is appropriate to use one.

What's interesting is that when this was presented at TED, albeit in less dramatic fashion, the point he was making was that data visualization could transform the way people understand the world - the data was just an example. What your post makes clear is that "the world" is not ready for data - visualized or otherwise - because it will be wildly misunderstood.

PLW said...

Another way of making the commenters' point is to say there are a lot of ways to define "Converge". It looks like you want to say that two series converge if the absolute difference between them approaches zero. It sounds like the commenters want to say that two series converge when the ratio of one to the other approaches one.

We need to talk about what concept of convergence is appropriate to the question at hand.

I'd support the commenters' definition, because I think it makes sense to say that incomes of n+100 and n converge when n gets big.

To be fair, I think a lot of people take the log of the LHS variables without having any idea why they are doing it. So it's absolutely right to argue about whether it is appropriate. Here, I think it is.
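The two notions of convergence can be set side by side in a short Python sketch. The function names are hypothetical; the inputs are the n+100 versus n case from this comment and the Tom and Harry figures from an earlier one.

```python
def absolute_gap(a, b):
    """Convergence in levels: the dollar difference between two incomes."""
    return abs(a - b)

def income_ratio(a, b):
    """Convergence in ratios: the larger income divided by the smaller."""
    return max(a, b) / min(a, b)

# Incomes of n + 100 and n as n gets big.
for n in (100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7,}: gap = {absolute_gap(n + 100, n)}, "
          f"ratio = {income_ratio(n + 100, n):.3f}")

# Tom and Harry: $200 and $400, then $3,000 and $4,000 a year later.
for tom, harry in ((200, 400), (3_000, 4_000)):
    print(f"Tom ${tom:,}, Harry ${harry:,}: gap = {absolute_gap(tom, harry):,}, "
          f"ratio = {income_ratio(tom, harry):.2f}")
```

In the first case the gap never moves while the ratio heads toward one; in the second the gap grows while the ratio shrinks. Which of those counts as convergence is exactly the disagreement in this thread.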

Anonymous said...

It's completely standard form to treat income as non-linear in these kinds of statistics. There are multiple reasons for this, but the intuition is that what you are interested in is not absolute changes in income, but relative. For instance, if a country's per capita income grows by $1,000, that is a huge change if its current per capita income is $2,000. Not as impressive if it's $20,000. So, by making the bottom of the scale non-linear, you are making it much easier to compare what's going on at the top of the chart to what's happening at the bottom.

Moreover, income growth simply isn't linear. Once you have wealth, it's much easier to build wealth. That is as true in the middle of the pack as it is at the top; so once a middling country like, say, Pakistan, starts to build wealth, it will grow at roughly the same (non-linear) rate as the countries that came before it. So while in absolute terms, Pakistan may be much further behind, the key isn't how far behind it is, but how quickly it is catching up. By making the income scale exponential, the chart is actually helping people focus on the rate without having to intuit the calculus behind "rate of change".

Anonymous said...

You don't have to choose between them! Rosling's Trendalyzer/Gapminder version of the same chart lets you switch between log and linear scale as often as you like...

It's here: http://www.gapminder.org/world/#$majorMode=chart$is;shi=t;ly=2003;lb=f;il=t;fs=11;al=30;stl=t;st=t;nsl=t;se=t$wst;tts=C$ts;sp=5.59290322580644;ti=2009$zpv;v=0$inc_x;mmid=XCOORDS;iid=phAwcNAVuyj1jiMAkmq1iMg;by=ind$inc_y;mmid=YCOORDS;iid=phAwcNAVuyj2tPLxKvvnNPA;by=ind$inc_s;uniValue=8.21;iid=phAwcNAVuyj0XOoBL_n5tAQ;by=ind$inc_c;uniValue=255;gid=CATID0;by=grp$map_x;scale=lin;dataMin=295;dataMax=79210$map_y;scale=lin;dataMin=19;dataMax=86$map_s;sma=49;smi=2.65$cd;bd=0$inds=;modified=75

Anonymous said...

I have so many issues with this one, it's hard to know where to begin.

From my view, it seems apparent the income scale cannot have been adjusted for real income (inflation and other adjustments). A quick and dirty input of the rough 1810 upper and lower bounds into current dollars shows this to have not changed a great deal over the interval in question. Of course our quality of life has dramatically improved. And here is why.

Income distributions are far more important comparisons than medians and averages and they are local. That is where there had been vast improvement in the West, and with a vast acceleration after WW2. That is, the middle class ranks grew. That was a very big, very different change from the past. Emerging and developing economies are going through a similar transformation now.

Beyond quibbling about the basics of the data, my real problem is with the comparison at all. Life span improvements have no doubt been helped by access to medical care. But we have to fully appreciate the leap that antiseptics, antibiotics, and modern surgical techniques have made in improving life span in this very period. Along with, again in the West, a shift from dangerous labor professions (farm, factory) to white collar which has reduced accidental deaths and thus improved the statistics.

And all income brackets have benefited from these (relatively) shared and in-common improvements of technology. Of course, not absolutely equally. But even the worst care today is still vastly better than the best care in 1810. That's science, not income. So why are we even looking at them in this manner in the first place?

As an aside, another danger in looking at income averaging over time is that it can conceal a host of real issues. If you are curious, this analysis at Oregon State on minimum wages, nominal and real, makes this apparent.

http://oregonstate.edu/instruct/anth484/minwage.html

For the poor in the US, the post-WW2 leap in real wages is in the distant past, having peaked in the 1960s. If we looked only at nominal wages, we would miss that point entirely and draw very spurious conclusions about life at the bottom.

Not signing this one as the recent Gawker hacks have made me wary.

Cheers

J. A. Ginsburg said...

I have a somewhat different question concerning Rosling's selective choice of data. During the time-span that the collective health of the human species has been on the upswing, many other species have gone extinct, a large number the result of human action. In fact, so many species have bitten the dust over the last few hundred years, it has been dubbed the "Sixth Great Extinction." I would like to see a chart that also includes extinction data as well as data on resource depletion. It would provide both a clearer picture of the true cost of human progress, and clues as to whether that progress can continue.