Sunday, July 15, 2012

All dressed up but nowhere to go

There are a lot of things that drive me crazy about the current practice of econometrics. People who think over-identification tests validate their identifying assumptions. People who think that if you fail to reject the null at the 0.05 level, it's fine to proceed in your analysis as if the null were true (i.e. people who don't believe in type II error).

But one of the biggest is the practice of thinking we do no harm by using estimators we know to be inappropriate for the data at hand and thinking we somehow fully fix that issue by using robust standard errors.

I annually beat my head against the wall trying to get my students to appreciate these issues (only to often have my work undone by their reading papers/books that make these mistakes), but now on this last point, I have some help!


There is a new working paper by King & Roberts that spells out and gives examples of the kinds of problems that arise when we think robust standard errors somehow fix specification problems. Here's a fun quote:

According to the contrary philosophy of science implied by the most common use of robust standard errors, if it looks like a duck and smells like a duck, it could well be delicious peanut butter sandwich.

This paper is gold, and will be required reading in my 'metrics class.
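To make the point concrete, here is a minimal simulation sketch (mine, not taken from the King & Roberts paper). It assumes a true model that is quadratic in x with homoskedastic noise, fits a straight line anyway, and compares classical standard errors with White-style (HC0) robust ones. The data-generating process, the sample size, and the helper name `ols_with_ses` are all illustrative assumptions.

```python
import numpy as np

def ols_with_ses(X, y):
    """OLS coefficients with classical and HC0 (White) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Classical: sigma^2 (X'X)^-1, assumes the model and error variance are right
    sigma2 = resid @ resid / (n - k)
    classical = np.sqrt(np.diag(sigma2 * XtX_inv))
    # HC0 sandwich: (X'X)^-1 X' diag(e^2) X (X'X)^-1
    meat = X.T @ (X * resid[:, None] ** 2)
    robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, classical, robust

rng = np.random.default_rng(0)
n = 5000
x = rng.exponential(1.0, n)                      # skewed regressor (assumption)
y = 1.0 + x + 0.5 * x**2 + rng.normal(0, 1, n)   # true model is quadratic

X_lin = np.column_stack([np.ones(n), x])           # misspecified: omits x^2
X_quad = np.column_stack([np.ones(n), x, x**2])    # correctly specified

b_lin, se_c_lin, se_r_lin = ols_with_ses(X_lin, y)
b_quad, se_c_quad, se_r_quad = ols_with_ses(X_quad, y)

# Ratio of robust to classical SE for the coefficient on x
ratio_lin = se_r_lin[1] / se_c_lin[1]
ratio_quad = se_r_quad[1] / se_c_quad[1]

print("linear fit:    slope on x =", round(b_lin[1], 3), " robust/classical SE =", round(ratio_lin, 2))
print("quadratic fit: coef on x  =", round(b_quad[1], 3), " robust/classical SE =", round(ratio_quad, 2))
```

In this setup the two sets of standard errors diverge sharply for the straight-line fit and roughly agree once the quadratic term is included. The robust errors do not move the straight-line slope back toward the coefficient in the true model; they only flag, by disagreeing with the classical ones, that something about the specification is off.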

The other highly relevant paper here is Leamer's short, classic 2010 piece in the Journal of Economic Perspectives (ungated version here).

Here's the most relevant snippet:

An earlier generation of econometricians corrected the heteroskedasticity problems with weighted least squares using weights suggested by an explicit heteroskedasticity model. These earlier econometricians understood that reweighting the observations can have dramatic effects on the actual estimates, but they treated the effect on the standard errors as a secondary matter. A “robust standard” error completely turns this around, leaving the estimates the same but changing the size of the confidence interval. Why should one worry about the length of the confidence interval, but not the location? This mistaken advice relies on asymptotic properties of estimators. I call it “White-washing.” Best to remember that no matter how far we travel, we remain always in the Land of the Finite Sample, infinitely far from Asymptopia. Rather than mathematical musings about life in Asymptopia, we should be doing the hard work of modeling the heteroskedasticity and the time dependence to determine if sensible reweighting of the observations materially changes the locations of the estimates of interest as well as the widths of the confidence intervals.

Amen, Ed!
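Leamer's point about location can be illustrated with a small Monte Carlo sketch. This is my own toy example, not his: it assumes a linear model whose error standard deviation is proportional to x, a sample of only 40 observations, and weights 1/x^2 that happen to match the true variance. In real work those weights would have to come from actually modeling the heteroskedasticity. The helper names `fit_ols` and `fit_wls` are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def fit_wls(X, y, w):
    """Weighted least squares: rescale each row by sqrt(weight), then run OLS."""
    sw = np.sqrt(w)
    return fit_ols(X * sw[:, None], y * sw)

rng = np.random.default_rng(42)
n, reps = 40, 2000        # a small sample, far from Asymptopia
true_slope = 2.0

ols_slopes = np.empty(reps)
wls_slopes = np.empty(reps)
for r in range(reps):
    x = rng.uniform(1.0, 10.0, n)
    # Error standard deviation proportional to x (assumed form of heteroskedasticity)
    y = 1.0 + true_slope * x + rng.normal(0.0, 1.0, n) * x
    X = np.column_stack([np.ones(n), x])
    ols_slopes[r] = fit_ols(X, y)[1]
    wls_slopes[r] = fit_wls(X, y, 1.0 / x**2)[1]   # weights = 1 / variance

ols_rmse = np.sqrt(np.mean((ols_slopes - true_slope) ** 2))
wls_rmse = np.sqrt(np.mean((wls_slopes - true_slope) ** 2))
mean_abs_gap = np.mean(np.abs(ols_slopes - wls_slopes))

print("RMSE of OLS slope:", round(ols_rmse, 3))
print("RMSE of WLS slope:", round(wls_rmse, 3))
print("average |OLS - WLS| slope gap:", round(mean_abs_gap, 3))
```

A robust standard error would leave each OLS slope exactly where it is and only resize the interval around it. Reweighting moves the point estimate itself, and in this setup it lands closer to the truth on average.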


Unknown said...

"Best to remember that no matter how far we travel, we remain always in the Land of the Finite Sample, infinitely far from Asymptopia."


Brad said...

Neal Beck recently posted a similar paper to Pol Meth, specifically on "cluster" robust errors:

Dr. Tufte said...

Awww ... c'mon Angus ... everyone has to use robust standard errors because there's a button to click on the menu that will do the work for you. All that weighted least squares sounds like work.

P.S. I blame EViews for starting this trend!

Luis Enrique said...

people are going to find that aid example very hard to swallow, and it might just serve to cast doubt on their arguments/technique. You can just look at the data and see smaller countries get more aid (per capita / relative to GDP)

a lot hangs on whether you include countries like India and China in your dataset (they are often excluded) and what you have done with big recipients like Egypt, Iraq, Afghanistan etc. Presumably their transformation places more weight on certain observations than others - something weird is going on to give a very hard-to-believe result - there's no way that if you take two countries of similar characteristics but different populations, the right model will predict the larger one will get more aid - just look at the data.

Anonymous said...

I have been reading the King & Roberts paper, and I noticed that they appear to advocate choosing a model and then testing whether the robust standard errors (RSE) and the classical standard errors are similar. If they are, then all is good, but if not, they seem to suggest that one should go back and find a new model specification.

I was wondering though about the degrees of freedom problem this could entail. If you're using this method to derive your model, aren't you implicitly using up your degrees of freedom, requiring one to consider a more stringent significance test?

While I understand it would be on a case by case basis, would you say that in general it might be a better choice to perform this model searching than to accept the problems of RSE and move on? Of course this is assuming FGLS isn't an option for whatever reason.