Choosing the Right Regressors

Talk at the PIDE Nurturing Minds Seminar on 29th Nov 2017, based on “Lessons in Econometric Methodology: Axiom of Correct Specification”, International Econometric Review, Vol 9, Iss 2. Serious mistakes in modern econometrics arise from the adoption of a wrong methodology. For more discussion of this aspect, see this same post, with a preface, on my new blog: An Islamic WorldView.

Conventional econometric methodology, as taught in textbooks, creates serious misunderstandings about applied econometrics. Econometricians try out various models, select one according to different criteria, and then interpret the results. Textbooks do not highlight the significance of the fact that these interpretations are valid only if the model is CORRECT. The result is that everyone presents and interprets their models as if the model were correct. This relaxed assumption – that any model we put down on paper can be assumed correct, subject to minor checks like a high R-squared and significant t-stats – leads to dramatically defective inferences. In particular, ten different authors may present ten different specifications for the same dependent variable, and each may provide an interpretation based on the assumption that his model is correctly specified. What is not realized is that there is only one correct specification, which must include all the determinants as regressors, and also exclude all irrelevant variables (though this is not so important). This means that out of millions of regressions based on different possible choices of regressors, only one is correct, while all the rest are wrong. Thus all ten authors with ten different specifications cannot be right – at most one of them can be. In this case, at least 90% of the authors must be wrong. The same applies generally to models published in journals – the vast majority of the different specifications must be wrong.
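A quick count shows why the number of candidate specifications runs into the millions. The Python sketch below assumes, for simplicity, that a specification is defined only by which of k candidate regressors it includes; allowing for lags, transformations or alternative functional forms would make the count larger still. The function names are illustrative, not taken from the paper.

```python
from math import comb


def count_specifications(k: int) -> int:
    """Number of distinct regressor sets when each of k candidates is either in or out."""
    return 2 ** k


def count_by_size(k: int) -> list[int]:
    """How many specifications use exactly j of the k candidates, for j = 0..k."""
    return [comb(k, j) for j in range(k + 1)]


for k in (10, 20, 30):
    # Only one of these subsets matches the true set of determinants.
    print(f"{k} candidate regressors -> {count_specifications(k):,} specifications")
```

With 20 candidate regressors this already gives 1,048,576 specifications, and with 30 it exceeds a billion, of which only one matches the true set of determinants.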
Now the question arises as to how much difference this Axiom of Correct Specification makes. If we can get approximately correct results, then perhaps the current relaxed methodology is good enough as a starting point. Here the talk/paper demonstrates that if one major variable is omitted from the regression model, then anything can happen. Typically, completely meaningless regressors will appear to be significant. For instance, if we regress the consumption of Australia on the GDP of China, we find a very strong regression relationship with R-squared above 90%. Does this mean that China’s GDP determines 90% of the variation in Australian consumption? Absolutely not. This is a nonsense regression, also known as a spurious regression. The nonsense regression is caused by the OMISSION of an important variable – namely Australian GDP, which is the primary determinant of Australian consumption. Because China’s GDP moves closely with the omitted Australian GDP, it acts as a stand-in for that variable and appears highly significant. A major and important assertion of the paper is that the idea that nonsense regressions are caused by INTEGRATED regressors is wrong. This means that the whole theory of integration and co-integration, developed to resolve the problem of nonsense regression, is searching for solutions in the wrong direction. If we focus on solving the problem of selecting the right regressors – ensuring inclusion of all major determinants – then we can resolve the problem of nonsense or meaningless regressions.
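The omitted-variable mechanism can be reproduced with synthetic data. The Python sketch below is only an illustration, not the paper's empirical exercise: the series are simulated rather than actual Australian or Chinese data, and every number in the data-generating process (trend slopes, the 0.8 propensity to consume, noise levels, the seed) is an arbitrary assumption. Consumption is generated from "Australian GDP" alone, while "Chinese GDP" has no effect but shares an upward trend. The simulated series are trend-stationary, not integrated, so any nonsense result here cannot be blamed on unit roots. A small OLS helper is written out so the block runs with only the standard library.

```python
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment each row with the matching row of the identity matrix.
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [u - f * w for u, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols(y, cols):
    """OLS with an intercept; returns (coefficients, standard errors, R-squared)."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(bj * xj for bj, xj in zip(b, row)) for row, yi in zip(X, y)]
    sse = sum(e * e for e in resid)
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    se = [(sse / (n - k) * inv[i][i]) ** 0.5 for i in range(k)]
    return b, se, 1.0 - sse / sst


rng = random.Random(2017)  # fixed seed so the run is reproducible
n = 60
periods = range(n)
# Hypothetical data-generating process; every number here is an illustrative assumption.
china_gdp = [100 + 5 * s + rng.gauss(0, 5) for s in periods]  # trends upward, no effect on consumption
aus_gdp = [50 + 2 * s + rng.gauss(0, 3) for s in periods]     # the true determinant
aus_cons = [10 + 0.8 * g + rng.gauss(0, 0.5) for g in aus_gdp]

# Misspecified model: the true determinant (Australian GDP) is omitted.
b_short, se_short, r2_short = ols(aus_cons, [china_gdp])
# Model that includes the true determinant alongside Chinese GDP.
b_long, se_long, r2_long = ols(aus_cons, [china_gdp, aus_gdp])

print(f"omitted:  R2={r2_short:.3f}, China coef={b_short[1]:.3f}, t={b_short[1] / se_short[1]:.1f}")
print(f"included: R2={r2_long:.3f}, China coef={b_long[1]:.3f}, t={b_long[1] / se_long[1]:.1f}")
```

If the mechanism described above is at work, the first line should show a high R-squared and a large t-statistic for Chinese GDP, while in the second line its coefficient should collapse toward zero once Australian GDP is included. Changing the seed alters the exact numbers but should not change this qualitative pattern.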
Next we discuss how we can ensure the inclusion of all major determinants in the regression equation. Several strategies currently in use are discussed and rejected. One of these is Leamer’s extreme bounds analysis, along with some variants of it. These do not succeed in finding the right regressors. Bayesian strategies are also discussed. These work very well for forecasting, because they use a large collection of models which have high probabilities of being right. This works by diversifying risk – instead of betting on any one model being correct, we look at a large collection. However, it does not work well for identifying the one true model that we are looking for.
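For readers unfamiliar with extreme bounds analysis, the Python sketch below shows a common simplified form of it, assuming one focus variable and a small pool of "doubtful" controls: the focus coefficient is re-estimated in every regression that adds some subset of the doubtful controls, the lower bound is the smallest estimate minus two standard errors, and the upper bound is the largest estimate plus two standard errors. If both bounds share a sign the variable is called robust, otherwise fragile. Leamer's original formulation and later variants differ in details, and the helper names and simulated data are illustrative assumptions. Note that the output describes the fragility of one coefficient; it does not identify which regressor set is correct.

```python
import itertools
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [u - f * w for u, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols(y, cols):
    """OLS with an intercept; returns (coefficients, standard errors, R-squared)."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(bj * xj for bj, xj in zip(b, row)) for row, yi in zip(X, y)]
    sse = sum(e * e for e in resid)
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    se = [(sse / (n - k) * inv[i][i]) ** 0.5 for i in range(k)]
    return b, se, 1.0 - sse / sst


def extreme_bounds(y, focus, doubtful):
    """Extreme bounds for the coefficient on `focus` over all subsets of `doubtful`.

    Returns (lower_bound, upper_bound, number_of_regressions).
    """
    lower, upper, count = float("inf"), float("-inf"), 0
    for size in range(len(doubtful) + 1):
        for subset in itertools.combinations(doubtful, size):
            b, se, _ = ols(y, [focus, *subset])
            lower = min(lower, b[1] - 2 * se[1])  # b[1] is the focus coefficient
            upper = max(upper, b[1] + 2 * se[1])
            count += 1
    return lower, upper, count


rng = random.Random(7)
n = 80
x = [rng.gauss(0, 1) for _ in range(n)]                      # focus variable
z = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(4)]  # four hypothetical doubtful controls
# Illustrative truth: y depends on x and on the first doubtful control only.
y = [1.0 + 0.5 * xi + 0.3 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z[0])]

lower, upper, runs = extreme_bounds(y, x, z)
verdict = "robust" if lower > 0 or upper < 0 else "fragile"
print(f"{runs} regressions, extreme bounds [{lower:.3f}, {upper:.3f}]: {verdict}")
```

Whether the focus variable is labelled robust depends on the simulated data and the chosen pool of controls; in either case the procedure says nothing about which of the 16 regressions is the correct specification.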
The best strategy currently in existence for finding the right regressors is the General-to-Specific (GeTS) modeling strategy of David Hendry. This is the opposite of the simple-to-general strategy advocated and used in conventional econometric methodology. Several complications make this strategy difficult to apply, and it is because of these complications that it was considered and rejected by econometricians. For one thing, if we include a large number of regressors, as GeTS requires, multicollinearity emerges, which makes all of our estimates extremely imprecise. Hendry’s methodology has resolved this and many other difficulties which arise in the estimation of very large models. The methodology has been implemented in the Autometrics package within the PcGive software for econometrics. This is the state of the art in automatic model selection based purely on statistical properties. However, it is well established that human guidance, where the importance of variables is decided by human judgment about real-world causal factors, can substantially improve upon automatic procedures. It is quite possible, and happens often in real-world data sets, that a regressor which is statistically inferior, but is known to be relevant from either empirical or theoretical considerations, will outperform a statistically superior regressor which makes no sense from a theoretical perspective.
A 70-minute video lecture of the talk is available on YouTube. PPT slides for the talk, which provide a convenient outline, are available from SlideShare: Choosing the Right Regressors. The paper itself can be downloaded from “Lessons in Econometric Methodology: The Axiom of Correct Specification”.
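The core idea of general-to-specific modelling can be conveyed with a deliberately simplified Python sketch: start from the general model containing every candidate regressor, then repeatedly delete the least significant one until every remaining regressor has |t| above a threshold. This single-path backward elimination is a rough caricature, not Autometrics: Hendry's procedure also explores multiple deletion paths, applies mis-specification tests at each reduction, and checks the final model against the general one. The simulated data, the 1.96 threshold and the helper names are illustrative assumptions.

```python
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [u - f * w for u, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols(y, cols):
    """OLS with an intercept; returns (coefficients, standard errors, R-squared)."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(bj * xj for bj, xj in zip(b, row)) for row, yi in zip(X, y)]
    sse = sum(e * e for e in resid)
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    se = [(sse / (n - k) * inv[i][i]) ** 0.5 for i in range(k)]
    return b, se, 1.0 - sse / sst


def general_to_specific(y, candidates, names, t_crit=1.96):
    """Simplified GeTS: single-path backward elimination from the general model."""
    keep = list(range(len(candidates)))
    while keep:
        b, se, _ = ols(y, [candidates[i] for i in keep])
        # |t| for each slope coefficient; index 0 of b and se is the intercept.
        tstats = [abs(b[j + 1] / se[j + 1]) for j in range(len(keep))]
        weakest = min(range(len(keep)), key=lambda j: tstats[j])
        if tstats[weakest] >= t_crit:
            break  # every remaining regressor clears the threshold
        del keep[weakest]
    return [names[i] for i in keep]


rng = random.Random(29)
n = 120
names = [f"x{i}" for i in range(8)]
X = [[rng.gauss(0, 1) for _ in range(n)] for _ in names]  # candidate regressors
# Hypothetical truth: only x0 and x1 matter; x2..x7 are irrelevant.
y = [2.0 + 1.5 * u + 1.0 * v + rng.gauss(0, 1) for u, v in zip(X[0], X[1])]

selected = general_to_specific(y, X, names)
print("retained:", selected)
```

Because x0 and x1 have large effects relative to the noise, they should survive the reduction. An irrelevant candidate can still be retained by chance; with a 1.96 threshold each such test has roughly a 5% false-retention rate, so the threshold governs how many irrelevant variables slip through.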

Dr. Asad Zaman | Choosing the Right Regressors