Robert W. Walker
My discussion of two-way fixed effects and the Mundlak estimator on Friday merits some backfilling. I want to make clear some things I said or noted over the last week, and to some degree over the last two weeks, that I should have been more explicit in combining.
The focus of the first week was stationarity; weak stationarity is sufficient. We noticed that virtually all panel data studies ignore it, in part because it is complicated. Panel unit root tests are one viable option for assessing it.
Economists generally adopt a different view of the underlying problem, though it is subtly hidden in many discussions. Suppose that I have a process, ignoring unit heterogeneity for now, of the form
\[ y_{it} = \alpha_{t} + X_{it}\beta + \epsilon_{it} \]
No matter how the mean of \(y_{it}\) varies with respect to time, we can approximate that process with a series of time fixed effects, or time dummies, whichever language you prefer, so that, conditional on \(t\), \(y_{it}\) has a constant mean. NB: With a constant, it is \(T-1\) fixed effects that measure deviations from the reference period, the omitted dummy.
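A small base R simulation can make the NB concrete. The panel dimensions, the time-varying means in `alpha_t`, and the object names are illustrative assumptions, not data from the text. With a constant, `lm()` reports \(T-1\) time dummies; without one, it reports an effect for each of the \(T\) periods, and the two parameterizations differ only by the reference period.

```r
# Sketch with simulated data: dimensions and time means are hypothetical.
set.seed(1)
n_units <- 20
n_periods <- 5
panel <- data.frame(unit = rep(1:n_units, each = n_periods),
                    time = rep(1:n_periods, times = n_units))
panel$x <- rnorm(nrow(panel))
alpha_t <- c(0, 0.5, 1.5, 1.0, 3.0)   # hypothetical time-varying mean of y
panel$y <- alpha_t[panel$time] + 2 * panel$x + rnorm(nrow(panel))

# With a constant: intercept = reference period (time 1), plus T - 1 dummies
fit_const <- lm(y ~ x + factor(time), data = panel)
b_const <- coef(fit_const)
time_dummies <- b_const[grep("^factor\\(time\\)", names(b_const))]
length(time_dummies)   # 4, i.e. n_periods - 1

# Without a constant: one effect for each of the T periods
fit_noconst <- lm(y ~ x + factor(time) - 1, data = panel)
b_noconst <- coef(fit_noconst)
period_effects <- b_noconst[grep("^factor\\(time\\)", names(b_noconst))]

# The T - 1 dummies are deviations from the omitted reference period
all.equal(unname(time_dummies), unname(period_effects[-1] - period_effects[1]))
```

The slope on `x` is identical in both fits; only the bookkeeping of the time means changes.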
Stepping back: in the context of a single time series, there is no way to identify these time dummies, or fixed effects, because there are no degrees of freedom. Time smooths and similar devices can reduce the number of parameters required (as lagged dependent variables can approximate them, with all the implications therein), but only when we can pool information from two or more time series can we use this method to render a series stationary, at least in mean.
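The degrees-of-freedom point can be checked directly. This is a minimal sketch on simulated data (series length, coefficients, and names are assumptions): with one series, an intercept plus \(T-1\) dummies saturate all \(T\) observations, so `lm()` leaves no residual degrees of freedom and cannot estimate a slope on `x`; pooling a second series restores them.

```r
# Sketch with simulated data; the series length and coefficients are hypothetical.
set.seed(2)
n_periods <- 8

# One time series: intercept + (T - 1) time dummies use all T observations
one_series <- data.frame(time = 1:n_periods, x = rnorm(n_periods))
one_series$y <- 1 + 0.5 * one_series$x + rnorm(n_periods)
fit_one <- lm(y ~ factor(time) + x, data = one_series)
df.residual(fit_one)   # 0: no degrees of freedom remain
coef(fit_one)["x"]     # NA: x is collinear with the time dummies

# Two pooled series: the same time dummies now leave information for x
two_series <- data.frame(unit = rep(1:2, each = n_periods),
                         time = rep(1:n_periods, times = 2))
two_series$x <- rnorm(nrow(two_series))
two_series$y <- 1 + 0.5 * two_series$x + rnorm(nrow(two_series))
fit_two <- lm(y ~ factor(time) + x, data = two_series)
df.residual(fit_two)   # 16 observations - 9 coefficients = 7
```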
Recall that, in sum-of-squares language, the Total is the sum of the Within and the Between. We now need to apply this along the time dimension \(T\); algebraically, the total variation in \(y_{it}\) decomposes in exactly the same way.
There are two parts. The first is the within-time deviations, a centered cross-section denoted \(C(t)\): \(y^{C(t)} = y_{it} - \overline{y}_{t}\). These are the deviations of each unit from the time mean; applied to every \(t\) in the same fashion, they must be mean zero. The second is the between-time construct, the variation in those time averages, measured as deviations from the overall mean of \(y\): \(y^{B(t)} = \overline{y}_{t} - \overline{y}\). These are also mean zero.
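The decomposition can be verified numerically. The sketch below uses simulated data (dimensions and the trending mean are assumptions) and base R's `ave()` to attach each period's mean to every row. It checks that each observation splits exactly into the overall mean, the between-time part, and the within-time part, that both parts are mean zero, and that the sums of squares add up.

```r
# Sketch with simulated data; dimensions and the trending mean are hypothetical.
set.seed(3)
n_units <- 6
n_periods <- 4
panel <- data.frame(unit = rep(1:n_units, each = n_periods),
                    time = rep(1:n_periods, times = n_units))
panel$y <- rnorm(nrow(panel), mean = panel$time)

y_bar   <- mean(panel$y)                # overall mean
y_bar_t <- ave(panel$y, panel$time)     # time mean, repeated on each row
y_C     <- panel$y - y_bar_t            # within-time: centered cross-section C(t)
y_B     <- y_bar_t - y_bar              # between-time: B(t)

all.equal(panel$y, y_bar + y_B + y_C)   # exact decomposition of each observation
c(mean(y_C), mean(y_B))                 # both zero up to rounding error
tapply(y_C, panel$time, sum)            # y_C also sums to zero within every period

# Sum of squares: Total = Within (time) + Between (time)
all.equal(sum((panel$y - y_bar)^2), sum(y_C^2) + sum(y_B^2))
```

The cross-product term vanishes because \(y^{C(t)}\) sums to zero within each period while \(y^{B(t)}\) is constant within each period.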
The criticism is that over-time changes (the between, in this case) are aggregated away, so that only static cross-sectional variance is explicitly modeled. The other problem is that the dynamic implications are ignored: what happens to the time-averaged data to which our estimates are applied, given that those averages may themselves be affected? This is accounting. It interacts with the first-day arguments about random sampling versus convenience sampling; there are only so many countries. If one country undergoes some permanent change, we perhaps should account for its \(\frac{1}{N}\) impact on the mean. With enough countries, that impact is negligible, or is thought to be anyway.
The one-way Mundlak model applied to the unit-heterogeneity case establishes that
\[ y_{it} = \alpha + X_{it}\beta + \beta^{B_i}\overline{x}^{B(i)}_{i} + (\nu_{i} + \epsilon_{it}) \] can be estimated using either OLS or GLS. The only problem is the standard errors, which will differ because of the induced correlation structure if the variance of \(\nu_{i}\) is non-zero.
The same argument applies in exactly the same way when we reverse the indices.
\[ y_{it} = \alpha + X_{it}\beta + \beta^{B_{t}}\overline{x}_{t}^{B(t)} + (\nu_{t} + \epsilon_{it}) \] The coefficients \(\beta^{B_{t}}\) describe how the mean of \(y_{it}\) varies over time as a function of the (unit-) averaged variables X.
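Both one-way versions can be checked with pooled OLS in base R. In this sketch on simulated data (dimensions, effects, and names are assumptions), the slope on `x` from a regression that adds the unit averages matches the unit fixed-effects slope, and reversing the indices gives the same match for time averages and time dummies. Only point estimates are compared; as noted above, the standard errors are a separate matter.

```r
# Sketch with simulated data; all dimensions and coefficients are hypothetical.
set.seed(4)
n_units <- 10
n_periods <- 6
panel <- data.frame(unit = rep(1:n_units, each = n_periods),
                    time = rep(1:n_periods, times = n_units))
unit_effect <- rnorm(n_units)
time_effect <- rnorm(n_periods)
# x is correlated with both effects, so naive pooled OLS on x alone would be biased
panel$x <- unit_effect[panel$unit] + time_effect[panel$time] + rnorm(nrow(panel))
panel$y <- 1.5 * panel$x + unit_effect[panel$unit] + time_effect[panel$time] +
  rnorm(nrow(panel))

panel$x_bar_i <- ave(panel$x, panel$unit)   # unit averages of x
panel$x_bar_t <- ave(panel$x, panel$time)   # time (cross-unit) averages of x

# Unit version: Mundlak OLS slope on x equals the unit fixed-effects slope
b_mundlak_i <- coef(lm(y ~ x + x_bar_i, data = panel))["x"]
b_fe_i      <- coef(lm(y ~ x + factor(unit), data = panel))["x"]
all.equal(b_mundlak_i, b_fe_i)

# Reversed indices: time averages versus time dummies
b_mundlak_t <- coef(lm(y ~ x + x_bar_t, data = panel))["x"]
b_fe_t      <- coef(lm(y ~ x + factor(time), data = panel))["x"]
all.equal(b_mundlak_t, b_fe_t)
```

The identity is algebraic: after partialling out a constant and the group averages, what remains of `x` is exactly its deviation from those averages, which is what the dummies also leave behind.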
The model is instructive. Suppose a counterfactual change in some variable \(x\). If the change is permanent over time, it has a combined impact of \(\beta + \beta^{B_{t}}\), whereas a temporary one-unit shock has an impact of \(\beta\).
The most common formulation and estimated functional form of the two-way Mundlak model is given by Baltagi.
Wooldridge showed that, in a balanced panel, once all the time-averaged regressors and all the individual-averaged regressors are included in the regression along with the original regressors, the resulting OLS estimator is the two-way fixed effects, or Within, estimator.
The within effect we are estimating is net of the over-time and across-unit averages.
\[ y_{it} = \alpha + \beta^{B(t)}\overline{x}_{t} + \beta^{B(i)}\overline{x}_{i} + \beta^{W}x_{it} + (\nu_{i} + \nu_{t} + \epsilon_{it}) \]
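Wooldridge's equivalence can be checked on a simulated balanced panel (dimensions, effects, and names are assumptions). The sketch fits the two-way Mundlak regression by pooled OLS and compares its slope on `x` with the slope from a regression on unit and time dummies; the coefficients on `x_bar_t` and `x_bar_i` are the sample counterparts of \(\beta^{B(t)}\) and \(\beta^{B(i)}\) in the equation above.

```r
# Sketch with simulated, balanced data; dimensions and coefficients are hypothetical.
set.seed(5)
n_units <- 8
n_periods <- 5
panel <- data.frame(unit = rep(1:n_units, each = n_periods),
                    time = rep(1:n_periods, times = n_units))
unit_effect <- rnorm(n_units)
time_effect <- rnorm(n_periods)
panel$x <- unit_effect[panel$unit] + time_effect[panel$time] + rnorm(nrow(panel))
panel$y <- 2 * panel$x + unit_effect[panel$unit] + time_effect[panel$time] +
  rnorm(nrow(panel))

panel$x_bar_i <- ave(panel$x, panel$unit)   # over-time average for each unit
panel$x_bar_t <- ave(panel$x, panel$time)   # across-unit average for each period

# Two-way Mundlak: pooled OLS with both sets of averages
fit_mundlak2 <- lm(y ~ x_bar_t + x_bar_i + x, data = panel)
# Two-way fixed effects via unit and time dummies
fit_twfe <- lm(y ~ x + factor(unit) + factor(time), data = panel)

all.equal(unname(coef(fit_mundlak2)["x"]), unname(coef(fit_twfe)["x"]))
coef(fit_mundlak2)[c("x_bar_t", "x_bar_i")]   # estimates of beta^{B(t)}, beta^{B(i)}
```

Balance matters here: with an unbalanced panel the simple averages no longer partial out the two sets of effects exactly, and the two slopes will generally differ.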
The very tricky part of this model is thinking about the counterfactual that we wish the model to estimate. The reason is that we must think about precisely how it manifests in \(y_{it}\): counterfactuals surrounding \(x_{it}\) induce variation in both averages as well as in the remaining within component.
If I want to examine how changes in \(x\) matter, I must be precise about what is happening. For example, suppose that one country (in a country-year data structure) experiences a permanent one-unit change in \(x\), whereas another country experiences a one-unit change in a given period that reverts immediately thereafter. Beneath the surface, stationarity-type considerations are still paramount: because we have a direct measure of \(\beta^{W}\), the question is what is going to happen to the over-time and over-country averages. If we suppose that the data are generically stationary across units and time, then the impact for the first country is given by \(\beta^{W}\) plus \(\beta^{B(i)}\), because by stipulation there is a permanent change in \(\overline{x}_{i}\). Algebraically, though, there is also a consequence induced by \(\beta^{B(t)}\).
In the counterfactual scenario for my first country, \(\overline{x}_{t}\) changes by \(\frac{1}{N}\) in every period, and the impact of this change is \(\beta^{B(t)}\). For the second country, \(\overline{x}_{t}\) changes by \(\frac{1}{NT}\) on average across the \(T\) periods, again with impact \(\beta^{B(t)}\). \((T-1)*\beta^{B(t)}\) is the net difference. A complete analysis is explicit about this second effect, as well as the first, in accounting for what we wish to estimate from data.
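To see where the \(\frac{1}{N}\) and \(\frac{1}{NT}\) come from, the sketch below builds a hypothetical balanced grid of \(N = 5\) countries and \(T = 4\) periods with \(x = 0\) everywhere, applies a permanent one-unit change to unit 1 and a one-period change to unit 2, and reports how the sample unit and time means move. The grid sizes and names are assumptions, and reading \(\frac{1}{NT}\) as the time mean's change averaged over the \(T\) periods is an interpretation.

```r
# Hypothetical balanced grid: N = 5 countries, T = 4 periods, x = 0 at baseline
n_units <- 5
n_periods <- 4
grid <- data.frame(unit = rep(1:n_units, each = n_periods),
                   time = rep(1:n_periods, times = n_units))
x0 <- rep(0, nrow(grid))

x_perm <- x0
x_perm[grid$unit == 1] <- 1                      # unit 1: permanent +1
x_temp <- x0
x_temp[grid$unit == 2 & grid$time == 1] <- 1     # unit 2: +1 in period 1 only

# Change in each period's cross-unit mean (x_bar_t)
d_time_perm <- tapply(x_perm - x0, grid$time, mean)   # 1/N in every period
d_time_temp <- tapply(x_temp - x0, grid$time, mean)   # 1/N in period 1, then 0
# Change in the affected unit's over-time mean (x_bar_i)
d_unit_perm <- as.vector(tapply(x_perm - x0, grid$unit, mean))[1]   # 1
d_unit_temp <- as.vector(tapply(x_temp - x0, grid$unit, mean))[2]   # 1/T

mean(d_time_perm)   # 1/N
mean(d_time_temp)   # 1/(N*T)
```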
Equations 13-15 of Baltagi make this clear. The sixth edition of his text contains this information: Baltagi, B.H., 2021, Econometric Analysis of Panel Data, 6th edition (Springer, Switzerland).
The two-way within estimator is a regression of the transformed variables
\(\tilde{y}_{it} = y_{it} - \overline{y}_{i} - \overline{y}_{t} + \overline{y}\) and \(\tilde{x}_{it} = x_{it} - \overline{x}_{i} - \overline{x}_{t} + \overline{x}\),
and changes in \(x\) have to be accounted for precisely. We will typically (though implicitly) invoke stationarity or limit-type arguments to ignore the \((T-1)*\beta^{B(t)}\) or the \((N-1)*\beta^{B(i)}\) that counterfactuals induce. Most importantly, these are forms of temporal or spatial dependence that we have largely controlled for rather than modeled explicitly. There are temporal distributed lag models, spatial lags, and STADL! The question should almost surely determine which are of greatest interest.
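The two-way within transformation can be written out directly. In this sketch on a simulated balanced panel, `two_way_demean()` is a hypothetical helper (not from any package) that subtracts the unit and time means and adds back the grand mean; regressing the transformed \(y\) on the transformed \(x\) reproduces the slope from explicit unit and time dummies.

```r
# Sketch with simulated, balanced data; names and values are hypothetical.
set.seed(7)
n_units <- 8
n_periods <- 5
panel <- data.frame(unit = rep(1:n_units, each = n_periods),
                    time = rep(1:n_periods, times = n_units))
panel$x <- rnorm(nrow(panel)) + 0.3 * panel$time
panel$y <- 2 * panel$x + panel$unit / n_units + 0.5 * panel$time + rnorm(nrow(panel))

# Hypothetical helper: two-way within transformation v - v_bar_i - v_bar_t + v_bar
two_way_demean <- function(v, unit, time) {
  v - ave(v, unit) - ave(v, time) + mean(v)
}
panel$y_tilde <- two_way_demean(panel$y, panel$unit, panel$time)
panel$x_tilde <- two_way_demean(panel$x, panel$unit, panel$time)

# Regression on transformed data (no intercept needed: both are mean zero)
b_within  <- coef(lm(y_tilde ~ x_tilde - 1, data = panel))["x_tilde"]
# Same slope from explicit unit and time dummies
b_dummies <- coef(lm(y ~ x + factor(unit) + factor(time), data = panel))["x"]
all.equal(unname(b_within), unname(b_dummies))
```

The point estimates agree, but the naive standard errors from the transformed regression ignore the \(N + T - 1\) means absorbed by the transformation, so they need a degrees-of-freedom correction.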
```r
Counterfactual <- data.frame(Case=c(1,1,2,2), Time=c(1,2,1,2), x0=c(0,0,0,0), xT=c(1,1,1,0))
Counterfactual
#>   Case Time x0 xT
#> 1    1    1  0  1
#> 2    1    2  0  1
#> 3    2    1  0  1
#> 4    2    2  0  0
```
Suppose a two-way Mundlak regression, or two-way fixed effects, with all regression parameters set to 1; the expected value in the first case is 0. When the counterfactual is implemented, we can account for it as follows. What changes? The pooled regression coefficient will increase \(y\) by 1. That’s really the easy part.
The new unit mean (the expected value of \(x_i\)) should be 1 because of the change for Case 1. As an algebraic matter, Case 2 is just a shock, so perhaps there is no change in its expectation. In the period of the change, the time mean (the expected value of \(x\) in that period) increases by the same \(\frac{1}{N}\) for each case. The change continues for Case 1 but vanishes for Case 2.
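The accounting for the `Counterfactual` table can be tallied mechanically. This sketch repeats the table as `cf`, computes sample unit and time means of `x` under each scenario with `ave()`, and evaluates a hypothetical helper `expected_y()` for the two-way Mundlak mean under the assumption that the intercept is 0 and every slope is 1 (so the baseline expected value is 0, matching the text).

```r
# The counterfactual table from above, plus sample unit and time means of x
cf <- data.frame(Case = c(1, 1, 2, 2), Time = c(1, 2, 1, 2),
                 x0 = c(0, 0, 0, 0), xT = c(1, 1, 1, 0))
cf$x0_bar_i <- ave(cf$x0, cf$Case)
cf$x0_bar_t <- ave(cf$x0, cf$Time)
cf$xT_bar_i <- ave(cf$xT, cf$Case)
cf$xT_bar_t <- ave(cf$xT, cf$Time)

# Assumption: intercept 0 and every slope set to 1 in
# E[y] = alpha + b_Bt * x_bar_t + b_Bi * x_bar_i + b_W * x
expected_y <- function(x, x_bar_i, x_bar_t, alpha = 0, b_Bt = 1, b_Bi = 1, b_W = 1) {
  alpha + b_Bt * x_bar_t + b_Bi * x_bar_i + b_W * x
}
cf$Ey0 <- with(cf, expected_y(x0, x0_bar_i, x0_bar_t))
cf$EyT <- with(cf, expected_y(xT, xT_bar_i, xT_bar_t))
cf$change <- cf$EyT - cf$Ey0
cf[, c("Case", "Time", "xT", "xT_bar_i", "xT_bar_t", "change")]
```

Under these assumptions the change in \(E[y]\) is 3 and 2.5 for Case 1 and 2.5 and 1 for Case 2. The within term contributes the pooled 1 wherever \(x\) changed, the time mean adds \(\frac{1}{N}\) for each case changed in that period, and the unit mean adds 1 for Case 1 but \(\frac{1}{T} = 0.5\) for Case 2, because the sketch uses sample means. If, as the text suggests, Case 2's expected unit mean is unchanged, that term would drop out.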
I am not suggesting that adjustment by the \(\frac{1}{N}\) is justified, justifiable, or otherwise, but am pointing out that the identities have implications and it is worthwhile to think them through.