Monday, September 21, 2015

Whose Expectations Augment the Phillips Curve?

My first economics journal publication is now available online in Economics Letters. This link provides free access until November 9, 2015. It is a brief piece (hence the letter format) titled "Whose Expectations Augment the Phillips Curve?" The short answer:
"The inflation expectations of high-income, college-educated, male, and working-age people play a larger role in inflation dynamics than do the expectations of other groups of consumers or of professional forecasters."

Sunday, September 6, 2015

Which Measure of Inflation Should a Central Bank Target?

"Various monetary proposals can be viewed as inflation targeting with a nonstandard price index: The gold standard uses only the price of gold, and a fixed exchange rate uses only the price of a foreign currency."
That's from a 2003 paper by Greg Mankiw and Ricardo Reis called "What Measure of Inflation Should a Central Bank Target?" At the time, the Federal Reserve had not explicitly announced its inflation target, though an emphasis on core inflation, which excludes volatile food and energy prices (the blue line in the figure below), arose under Alan Greenspan's chairmanship. Other central banks, including the Bank of England and the European Central Bank, instead focus on headline inflation. In 2012, the Fed formally announced a 2 percent target for PCE inflation, but it closely monitors core inflation as a key indicator of underlying inflation trends. Some at the Fed, including St. Louis Fed President James Bullard, have argued that the Fed should focus more on headline inflation (the red line) and less on core inflation (the blue line).

Source: FRED
Mankiw and Reis frame the issue more generally than just a choice between core and headline inflation. A price index assigns weights to prices in each sector of the economy. A core price index would put zero weight on food and energy prices, for example, but you could also construct a price index that put a weight of 0.75 on hamburgers and 0.25 on milkshakes, if that struck your fancy. Mankiw and Reis ask how a central bank can optimally choose the weights of the price index it targets in order to best achieve some objective.
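As a minimal sketch of the mechanics (the weights and inflation rates below are made up, echoing the hamburger/milkshake example rather than anything in the paper):

```python
# Hypothetical weights and inflation rates, echoing the hamburger/
# milkshake example in the text; nothing here comes from the paper.
weights = {"hamburgers": 0.75, "milkshakes": 0.25}
inflation = {"hamburgers": 2.0, "milkshakes": 6.0}  # percent, one period

# Index inflation is the weighted average of sectoral inflation rates.
index_inflation = sum(w * inflation[s] for s, w in weights.items())
print(f"Custom index inflation: {index_inflation:.2f}%")  # 3.00%
```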

In particular, they suppose the central bank's objective is to minimize the volatility of the output gap. They explain, "We are interested in finding the price index that, if kept on an assigned target, would lead to the greatest stability in economic activity. This concept might be called the stability price index." They model how the weight on each sector in this stability price index depends on certain sectoral properties: the sector's cyclical sensitivity, its proclivity to experience idiosyncratic shocks, and the speed with which its prices can adjust.
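In rough notation--my paraphrase of their setup, not the paper's exact formulation--the problem is to pick sector weights that keep the target index on track (normalized to zero here) while minimizing output gap volatility:

```latex
% Paraphrase of the stability price index problem, not the paper's
% exact notation: x_t is the output gap, p_{k,t} is sector k's price,
% and w_k is the weight on sector k in the target index.
\min_{w_1, \dots, w_K} \; \operatorname{Var}(x_t)
\quad \text{subject to} \quad
\sum_{k=1}^{K} w_k \, p_{k,t} = 0, \qquad \sum_{k=1}^{K} w_k = 1.
```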

The findings are mostly intuitive. If a particular sector's prices are very procyclical, do not experience large idiosyncratic shocks, or are very sticky, that sector should receive a relatively large weight in the stability price index. Each of these characteristics makes the sector more useful, from a signal-extraction perspective, as an indicator of economic activity.

Next, Mankiw and Reis do a "back-of-the-envelope" exercise to calculate the weights for a stability price index for the United States using data from 1957 to 2001. They consider four sectors: food, energy, other goods and services, and nominal wages. They stick to four sectors for simplicity, but it would also be possible to include other prices, like gold and other asset prices. To the extent that these are relatively flexible and volatile, they would probably receive little weight. The inclusion of nominal wages is interesting, because wages are the price of labor, not of a consumption good, so they get a weight of zero in the consumer price index. But nominal wages are procyclical, not prone to idiosyncratic shocks, and sticky, so the stability price index weight on nominal wages turns out to be near one, while the other sectors get weights near zero. This finding is in line with other results, even those derived from very different models, about the optimality of including nominal wages in the monetary policy target.

More recently, Josh Bivens and others have proposed nominal wage targets for monetary policy, but they frame this as an alternative to unemployment as an indicator of labor market slack for the full employment component of the Fed's mandate. In Mankiw and Reis' paper, even a strict inflation-targeting central bank with no full employment goal may want to make nominal wages a big part of its preferred measure of inflation. (Since productivity growth drives a wedge between wage growth and price inflation, a nominal wage target and a standard price index target imply different average inflation rates.)

If we leave nominal wages out of the picture, the results provide some justification for a focus on core, rather than headline, inflation. Namely, food and energy prices are very volatile and not very sticky. Note, however, that the paper assumes that the central bank has perfect credibility, and can thus achieve whatever inflation target it commits to. In Bullard's argument against a focus on core inflation, he implicitly challenges this assumption:
"One immediate benefit of dropping the emphasis on core inflation would be to reconnect the Fed with households and businesses who know price changes when they see them. With trips to the gas station and the grocery store being some of the most frequent shopping experiences for many Americans, it is hardly helpful for Fed credibility to appear to exclude all those prices from consideration in the formation of monetary policy."
Bullard's concern is that since food and energy prices are so visible and salient for consumers, they might play an outsized role in perceptions and expectations of inflation. If the Fed holds core inflation steady while gas prices and headline inflation rise, inflation expectations may rise substantially, become unanchored, and feed back into core inflation. There is mixed evidence on whether this is a real concern in recent years. I'm doing some work on this topic myself, and hope to share results soon.

As an aside to my students in Senior Research Seminar, I highly recommend the Mankiw and Reis paper as an example of how to write well, especially if you plan to do a theoretical thesis.

Note: The description of the Federal Reserve's inflation target in the first paragraph of this post was edited for accuracy on September 25.

Sunday, August 30, 2015

False Discoveries and the ROC Curves of Social Science

Diagnostic tests for diseases can suffer from two types of errors. A type I error is a false positive, and a type II error is a false negative. The sensitivity or true positive rate is the probability that a test result will be positive when the disease is actually present. The specificity or true negative rate is the probability that a test result will be negative when the disease is not actually present. Different choices of diagnostic criteria correspond to different combinations of sensitivity and specificity. A more sensitive diagnostic test could reduce false negatives, but might increase the false positive rate. Receiver operating characteristic (ROC) curves are a way to visually present this tradeoff by plotting true positive rates or sensitivity on the y-axis and false positive rates (100%-specificity) on the x-axis.


As the figure shows, ROC curves are upward sloping-- diagnosing more true positives typically means also increasing the rate of false positives. The curve goes through (0,0) and (100,100), because it is possible to either diagnose nobody as having the disease and get a 0% true positive rate and 0% false positive rate, or to diagnose everyone as having the disease and get a 100% true positive rate and 100% false positive rate. The further an ROC is above the 45 degree line, the better the diagnostic test is, because for any level of false positives, you get a higher level of true positives.
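To make the tradeoff concrete, here is a small simulation sketch (the data are simulated; nothing here comes from the figure) that traces out points on an ROC curve by sweeping the diagnostic threshold:

```python
# Simulated diagnostic scores: diseased patients score higher on
# average, but the distributions overlap, forcing a tradeoff.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
disease = rng.random(n) < 0.3
score = np.where(disease, rng.normal(2.0, 1.0, n), rng.normal(0.0, 1.0, n))

# Sweep the diagnostic threshold; each threshold is one point on the ROC.
for threshold in (-2.0, 0.0, 1.0, 2.0, 4.0):
    positive = score >= threshold
    tpr = (positive & disease).mean() / disease.mean()      # sensitivity
    fpr = (positive & ~disease).mean() / (~disease).mean()  # 1 - specificity
    print(f"threshold {threshold:4.1f}: TPR {tpr:.2f}, FPR {fpr:.2f}")
```

Lowering the threshold diagnoses more true positives but also more false positives, which is exactly why the curve slopes upward.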

Rafa Irizarry at the Simply Statistics blog makes a really interesting analogy between diagnosing disease and making scientific discoveries. Scientific findings can be true or false, and if we imagine that increasing the rate of important true discoveries also increases the rate of false positive discoveries, we can plot ROC curves for scientific disciplines. Irizarry imagines the ROC curves for biomedical science and physics (see the figure below). Fields of research vary both in the position and shape of their ROC curves--what you can think of as the production possibilities frontier for knowledge in that discipline--and in where they sit along the curve.

In Irizarry's opinion, physicists make fewer important discoveries per decade and also fewer false positives per decade than biomedical scientists. Given the slopes of the curves he has drawn, biomedical scientists could make fewer false positives, but at a cost of far fewer important discoveries.

Source: Rafa Irizarry
A particular scientific field could move along its ROC curve by changing the field's standards regarding peer review and replication, changing norms regarding significance testing, etc. More critical review standards for publication would be represented by a shift down and to the left along the ROC curve, reducing the number of false findings that would be published, but also potentially reducing the number of true discoveries being published. A field could shift its ROC curve outward (good) or inward (bad) by changing the "discovery production technology" of the field.

The importance of discoveries is subjective, and we don't really know the number of "false positives" in any field of science; some are never detected. But lately, evidence of fraudulent or otherwise irreplicable findings in political science and psychology points to potentially high false positive rates in the social sciences. A few days ago, Science published an article on "Estimating the Reproducibility of Psychological Science." From the abstract:
We conducted replications of 100 experimental and correlational studies published in three psychology journals using high-powered designs and original materials when available. Replication effects were half the magnitude of original effects, representing a substantial decline. Ninety-seven percent of original studies had statistically significant results. Thirty-six percent of replications had statistically significant results; 47% of original effect sizes were in the 95% confidence interval of the replication effect size; 39% of effects were subjectively rated to have replicated the original result; and if no bias in original results is assumed, combining original and replication results left 68% with statistically significant effects.
As studies of this type hint that the social sciences may be far to the right along an ROC curve, it is interesting to try to visualize the shape of the curve. The physics ROC curve that Irizarry drew is very steep near the origin, so an attempt to reduce false positives further would, in his view, sharply reduce the number of important discoveries. Contrast that to his curve for biomedical science. He indicates that biomedical scientists are on a relatively flat portion of the curve, so reducing the false positive rate would not reduce the number of important discoveries by very much.

What does the shape of the economics ROC curve look like in comparison to those of other sciences, and where along the curve are we? What about macroeconomics in particular? Hypothetically, if we have one study that discovers that the fiscal multiplier is smaller than one, and another study that discovers that the fiscal multiplier is greater than one, then one study is an "important discovery" and one is a false positive. If these were our only two macroeconomic studies, we would be exactly on the 45 degree line with perfect sensitivity but zero specificity.
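Spelling out that arithmetic in a couple of lines (a toy calculation, nothing more):

```python
# Toy calculation for the two-study example: one true effect, one null,
# both "discovered." Sensitivity and the false positive rate are both 100%.
true_positives, actual_true = 1, 1
false_positives, actual_false = 1, 1

tpr = true_positives / actual_true    # 1.0: every true effect is found
fpr = false_positives / actual_false  # 1.0: every null is "found" too
print(f"TPR {tpr:.0%}, FPR {fpr:.0%}")  # (100%, 100%): on the 45 degree line
```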

Thursday, August 6, 2015

Macroeconomics Research at Liberal Arts Colleges

I spent the last two days at the 11th annual Workshop on Macroeconomics Research at Liberal Arts Colleges at Union College. The workshop reflects the growing emphasis that liberal arts colleges place on faculty research. There were four two-hour sessions of research presentations--international, banking, information and expectations, and theory--in addition to breakout sessions on pedagogy. I presented my research in the information and expectations session.

I definitely recommend this workshop to other liberal arts macro professors. The end of summer timing was great. I got to think about how to prioritize my research goals before the semester starts and to hear advice on teaching and course planning from a lot of really passionate teachers. It was very encouraging to witness how many liberal arts college professors at all stages of their careers have maintained very active research agendas while also continually improving in their roles as teachers and advisors.

After dinner on the first day of the workshop, there was a panel discussion about publishing with undergraduates. I also attended a pedagogy session on advising undergraduate research. Many of the liberal arts colleges represented at the workshop have some form of a senior thesis requirement. A big part of the discussion was how to balance the emphasis on "product vs. process" for undergraduate research. In other words, how active a role should a faculty member take in trying to ensure a high-quality final product versus ensuring that various learning goals are met? What should those learning goals be? Some possibilities include helping students decide whether they want to go to grad school, teaching independence, writing skills, and econometric techniques, and developing the ability to form an economic argument. And relatedly, how should grades or honors designations reflect the final product and the learning goals that are emphasized?

We also discussed the relative merits of helping students publish their research, either in an undergraduate journal or a professional journal. There was considerable uncertainty about how very low-ranked publications with undergraduate coauthors affect an assistant professor's tenure case, and a general desire for more explicit guidelines about whether such work is considered a valuable contribution.

These discussions of research by or with undergraduates left me really curious to hear about others' experiences doing or supervising undergraduate research. I'd be very happy to feature some examples of research with or by undergraduates as guest posts. Send me an email if you're interested.

At least two other conference participants have blogs, and they are definitely worth checking out. Joseph Joyce of Wellesley blogs about international finance at "Capital Ebbs and Flows." Bill Craighead of Wesleyan blogs at "Twenty-Cent Paradigms." Both have recent thoughtful commentary on Greece.

Friday, July 31, 2015

Surveys in Crisis

In "Household Surveys in Crisis," Bruce D. Meyer, Wallace K.C. Mok, and James X. Sullivan describe household surveys as "one of the main innovations in social science research of the last century." Large, nationally representative household surveys are the source of official rates of unemployment, poverty, and health insurance coverage, and are used to allocate government funds. But the quality of survey data is declining on at least three counts.

The first and most commonly studied problem is the rise in unit nonresponse, meaning fewer people are willing to take a survey when asked. Two other growing problems are item nonresponse-- when someone agrees to take the survey but refuses to answer particular questions-- and inaccurate responses. Of course, the three problems can be related. For example, attempts to reduce unit nonresponse by persuading reluctant households to take a survey could raise item nonresponse and inaccurate responses if these reluctant participants rush through a survey they didn't really want to take in the first place.

Unit nonresponse, item nonresponse, and inaccurate responses would not be too troublesome if they were random enough that survey statistics were unbiased, but that is unlikely to be the case. Nonresponse and misreporting may be systematically correlated with relevant characteristics such as income or receipt of government funds. Meyer, Mok, and Sullivan look at survey data about government transfer programs for which corresponding administrative data is also available, so they can compare survey results to presumably more accurate administrative data. In this case, the survey data understates incomes at the bottom of the distribution, understates the rate of program receipt and the poverty-reducing effects of government programs, and overstates measures of poverty and of inequality. For other surveys that cannot be linked to administrative data, it is difficult to say which direction biases will go.
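Here is a stylized simulation of that mechanism (all the numbers are invented; only the direction of the bias mirrors what Meyer, Mok, and Sullivan document): if transfer recipients underreport their transfers, measured poverty is mechanically overstated.

```python
# Stylized simulation of misreporting bias: all numbers are invented,
# only the direction of the bias mirrors Meyer, Mok, and Sullivan.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
earnings = rng.lognormal(mean=10, sigma=0.8, size=n)  # annual earnings
transfers = np.where(earnings < 15_000, 8_000, 0)     # transfers to low earners
true_income = earnings + transfers

# Suppose half of transfer recipients fail to report the transfer.
reported_transfers = transfers * (rng.random(n) < 0.5)
survey_income = earnings + reported_transfers

poverty_line = 14_000
print(f"True poverty rate:   {(true_income < poverty_line).mean():.1%}")
print(f"Survey poverty rate: {(survey_income < poverty_line).mean():.1%}")
```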

Why has survey quality declined? The authors discuss many of the traditional explanations:
"Among the traditional reasons proposed include increasing urbanization, a decline in public spirit, increasing time pressure, rising crime (this pattern reversed long ago), increasing concerns about privacy and confidentiality, and declining cooperation due to 'over-surveyed' households (Groves and Couper 1998; Presser and McCullogh 2011; Brick and Williams 2013). The continuing increase in survey nonresponse as urbanization has slowed and crime has fallen make these less likely explanations for present trends. Tests of the remaining hypotheses are weak, based largely on national time-series analyses with a handful of observations. Several of the hypotheses require measuring societal conditions that can be difficult to capture: the degree of public spirit, concern about confidentiality, and time pressure...We are unaware of strong evidence to support or refute a steady decline in public spirit or a rise in confidentiality concerns as a cause for declines in survey quality."
They find it most likely that the sharp rise in the number of government surveys administered in the US since 1984 has resulted in declining cooperation by "over-surveyed" households. "We suspect that talking with an interviewer, which once was a rare chance to tell someone about your life, now is crowded out by an annoying press of telemarketers and commercial surveyors."

Personally, I have not received any requests to participate in government surveys and rarely receive commercial survey requests. Is this just because I moved around so much as a student? Am I about to be flooded with requests? I think I would actually find it fun to take some surveys after working with the data so much. Please leave a comment about your experience with taking (or declining to take) surveys.

The authors also note that since there is a trend toward greater leisure time, it is unlikely that increased time pressure is resulting in declining survey quality. However, while people have more leisure time, they may also have more things to do with their leisure time (I'm looking at you, Internet) that they prefer to taking surveys. Intuitively I would guess that as people have grown more accustomed to doing everything online, they are less comfortable talking to an interviewer in person or on the phone. Since I almost never have occasion to go to the post office, I can imagine forgetting to mail in a paper survey. Switching surveys to online format could result in a new set of biases, but may eventually be the way to go.

I would also guess that the Internet has changed people's relationship with information, even information about themselves. When you can look up anything easily, that can change what you decide to remember and what facts you feel comfortable reporting off the top of your head to an interviewer.

Wednesday, July 8, 2015

Trading on Leaked Macroeconomic Data

The official release times of U.S. macroeconomic data are big deals in financial markets. A new paper finds evidence of substantial informed trading before the official release time of certain macroeconomic variables, suggesting that information is often leaked. Alexander Kurov, Alessio Sancetta, Georg H. Strasser, and Marketa Halova Wolfe examine high-frequency stock index and Treasury futures markets data around releases of U.S. macroeconomic announcements:
These announcements are quintessential updates to public information on the economy and fundamental inputs to asset pricing. More than a half of the cumulative annual equity risk premium is earned on announcement days (Savor & Wilson, 2013) and the information is almost instantaneously reflected in prices once released (Hu, Pan, & Wang, 2013). To ensure fairness, no market participant should have access to this information until the official release time. Yet, in this paper we find strong evidence of informed trading before several key macroeconomic news announcements....Prices start to move about 30 minutes before the official release time and the price move during this pre-announcement window accounts on average for about a half of the total price adjustment.
They consider the 30 macroeconomic announcements that other authors have shown tend to move markets, and find evidence of:

  • Significant pre-announcement price drift for: CB consumer confidence index, existing home sales, GDP preliminary, industrial production, ISM manufacturing index, ISM non-manufacturing index, and pending home sales.
  • Some pre-announcement drift for: advance retail sales, consumer price index, GDP advance, housing starts, and initial jobless claims.
  • No pre-announcement drift for: ADP employment, durable goods orders, new home sales, non-farm employment, producer price index, and UM consumer sentiment.
The figure below shows cumulative average returns in the E-mini S&P 500 Futures market from 60 minutes before to 60 minutes after the official release time for the series with significant evidence of pre-announcement drift.

Source: Kurov et al. 2015, Figure A1, panel c. Cumulative average returns in the E-mini S&P 500 Futures market.
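For readers curious how a figure like this is constructed, here is a generic event-study sketch (the file and column names are hypothetical, and this is not the authors' code):

```python
# Sketch of computing cumulative average returns around release times.
# File and column names are hypothetical; this is not the authors'
# code, just the generic event-study calculation.
import pandas as pd

# Assume one-minute log returns indexed by timestamp, plus a list of
# announcement release timestamps for a given indicator.
returns = pd.read_csv("eminisp500_1min_returns.csv",
                      index_col="timestamp", parse_dates=True)["ret"]
releases = pd.read_csv("release_times.csv", parse_dates=["release"])["release"]

window = range(-60, 61)  # minutes relative to the release time
paths = []
for t0 in releases:
    # Grab the return at each minute around this release.
    paths.append([returns.get(t0 + pd.Timedelta(minutes=m), 0.0)
                  for m in window])

event_panel = pd.DataFrame(paths, columns=list(window))
car = event_panel.mean().cumsum()  # cumulative average return path
print(car.loc[[-30, 0, 30]])       # drift before vs. after release
```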
Why do prices start to move before release time? It could be that some traders are superior forecasters, making better use of publicly-available information, and waiting until a few minutes before the announcement to make their trades. Alternatively, information might be leaked before the official release. Kurov et al. note that, while the first possibility cannot be ruled out entirely, the leaked information explanation appears highly likely. The authors conducted a phone and email survey of the organizations responsible for the macroeconomic data in their study to find out about data release procedures:
The release procedures fall into one of three categories. The first category involves posting the announcement on the organization’s website at the official release time, so that all market participants can access the information at the same time. The second category involves pre-releasing the information to selected journalists in “lock-up rooms” adding a risk of leakage if the lock-up is imperfectly guarded. The third category, previously not documented in academic literature, involves an unusual pre-release procedure used in three announcements: Instead of being pre-released in lock-up rooms, these announcements are electronically transmitted to journalists who are asked not to share the information with others. These three announcements are among the seven announcements with strong drift.
I wish I had a better sense of who was obtaining the leaked information and how much they were making from it.

Wednesday, June 24, 2015

Forecasting in Unstable Environments

I recently returned from the International Symposium on Forecasting "Frontiers in Forecasting" conference in Riverside. I presented some of my work on inflation uncertainty in a session devoted to uncertainty and the real economy. A highlight was the talk by Barbara Rossi, a featured presenter from Universitat Pompeu Fabra, on "Forecasting in Unstable Environments: What Works and What Doesn't." (This post will be a bit more technical than my usual.)

Rossi spoke about instabilities in reduced form models and gave an overview of the evidence on what works and what doesn't in guarding against these instabilities. The basic issue is that the predictive ability of different models and variables changes over time. For example, the term spread was a pretty good predictor of GDP growth until the 1990s, and the credit spread was not. But in the 90s the situation reversed, and the credit spread became a better predictor of GDP growth while the term spread got worse.

Rossi noted that break tests and time varying parameter models, two common ways to protect against instabilities in forecasting relationships, do involve tradeoffs. For example, it is common to test for a break in an empirical relationship, then estimate a model in which the coefficients before and after the break differ. Including a break point reduces the bias of your estimates, but also reduces the precision. The more break points you add, the shorter are the time samples you use to estimate the coefficients. This is similar to what happens if you start adding tons of control variables to a regression when your number of observations is small.
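Here is a minimal sketch of the split-sample approach on simulated data (in practice the break date would come from a formal test, such as a Chow or Bai-Perron test, rather than being known in advance):

```python
# Sketch: estimate a predictive relationship separately before and
# after a candidate break date. Data are simulated; in practice the
# break date would come from a formal test (e.g., Chow, Bai-Perron).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
T, break_point = 200, 100
x = rng.normal(size=T)
beta = np.where(np.arange(T) < break_point, 0.8, -0.2)  # coefficient shifts
y = beta * x + rng.normal(scale=0.5, size=T)

for name, sl in [("pre-break", slice(0, break_point)),
                 ("post-break", slice(break_point, T))]:
    fit = sm.OLS(y[sl], sm.add_constant(x[sl])).fit()
    # Shorter subsamples mean less precise estimates: note the std errors.
    print(f"{name}: beta = {fit.params[1]:.2f} (se {fit.bse[1]:.2f})")
```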

Rossi also discussed rolling window estimation. Choosing the optimal window size is a challenge, with a similar bias/precision trade-off. The standard practice of reporting results from only a single window size is problematic, because the window size may have been selected based on "data snooping" to obtain the most desirable results. In work with Atsushi Inoue, Rossi develops out-of-sample forecast tests that are robust to window size. Many of the basic tools and tests from macroeconomic forecasting--Granger causality tests, forecast comparison tests, and forecast optimality tests--can be made more robust to instabilities. For details, see Raffaella Giacomini and Rossi's chapter in the Handbook of Research Methods and Applications on Empirical Macroeconomics and references therein.
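In that spirit, here is a small sketch (simulated data; Rossi and Inoue's tests formalize the idea) of reporting out-of-sample forecast errors for several window sizes rather than just one:

```python
# Sketch: compare out-of-sample forecast errors across several rolling
# window sizes rather than reporting a single, possibly snooped, choice.
# Data are simulated; Rossi and Inoue's tests formalize this robustness.
import numpy as np

rng = np.random.default_rng(3)
T = 300
x = rng.normal(size=T)
y = 0.5 * x + rng.normal(scale=0.7, size=T)  # stable relationship here

for window in (40, 80, 160):
    errors = []
    for t in range(window, T - 1):
        xs, ys = x[t - window:t], y[t - window:t]
        beta = np.dot(xs, ys) / np.dot(xs, xs)  # rolling OLS slope (no constant)
        errors.append(y[t + 1] - beta * x[t + 1])
    rmse = np.sqrt(np.mean(np.square(errors)))
    print(f"window {window:3d}: out-of-sample RMSE {rmse:.3f}")
```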

A bit of practical advice from Rossi was to maintain large-dimensional datasets as a guard against instability. In unstable environments, variables that are not useful now may be useful later, and it is increasingly computationally feasible to store and work with big datasets.