use "https://github.com/robertwwalker/Essex-Data/raw/main/ISQ99-Essex.dta"
xtset
drop if AINEW==.
xtset
* Give consideration to what we should do about the Lagged DV
drop if AILAG==.
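* A note about mlong versus flong: mlong adds copies of only the incomplete
* observations; flong stores a full copy of the data for each imputation.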
mi set flong
* Note that it adds some variables; what are they?
mi describe
mi misstable summarize
* There are some logical restrictions that we are going to want.
* For example, we have some changes that need to be consistent
* with levels. We can deal with that; just calculate the changes
* after imputation.
mi register imputed DEMOC3
mi impute regress DEMOC3 PERCHPCG PERCHPOP LPOP CWARCOW IWARCOW2, add(10) dots force
* mi impute regress PCGTHOU DEMOC3 PERCHPOP LPOP CWARCOW IWARCOW2, add(20)
mi estimate: xtreg AINEW AILAG DEMOC3, fe
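The mi estimate prefix fits the model in each completed dataset and pools the results with Rubin's rules. As a rough sketch of that arithmetic in R: the pooled estimate is the mean across imputations, and its variance adds the within-imputation and between-imputation parts. The helper name pool_rubin is hypothetical and the input numbers are made up for illustration; they are not results from these data.

```r
# Hypothetical helper: pool one coefficient across m imputations (Rubin's rules).
pool_rubin <- function(est, se) {
  m    <- length(est)
  qbar <- mean(est)              # pooled point estimate
  W    <- mean(se^2)             # within-imputation variance
  B    <- var(est)               # between-imputation variance
  Tvar <- W + (1 + 1 / m) * B    # total variance
  c(estimate = qbar, se = sqrt(Tvar))
}

# Made-up coefficient values and standard errors, for illustration only.
pooled <- pool_rubin(est = c(1, 2, 3), se = c(1, 1, 1))
print(pooled)
```

The between-imputation term is what makes the pooled standard error larger than any single-imputation standard error when the imputations disagree.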
The Data: Poe, Tate, Keith 1999
Two objectives: load the data and, for now, transform it into a pdata.frame.
This lets R answer many of the questions that Stata's xt commands address. First, some basic summaries to assess balance.
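A minimal sketch of those two steps, assuming the haven and plm packages are installed. The URL is the one used in the Stata code; treating IDORIGIN as the unit identifier and YEAR as the time index is an assumption, as is the object name pISQ.

```r
library(haven)  # read_dta() for Stata files
library(plm)    # pdata.frame(), pdim(), pvar()

url <- "https://github.com/robertwwalker/Essex-Data/raw/main/ISQ99-Essex.dta"
# Drop Stata labels so the columns are plain numeric vectors.
ISQ99_Essex <- as.data.frame(zap_labels(read_dta(url)))

# Assumption: IDORIGIN is the unit id and YEAR is the time index.
pISQ <- pdata.frame(ISQ99_Essex, index = c("IDORIGIN", "YEAR"))

pdim(pISQ)  # balanced or unbalanced, with counts of units and periods
pvar(pISQ)  # variables lacking time or individual variation
```

pdim() plays roughly the role of Stata's xtset/xtdescribe report, and pvar() flags variables that are constant within units or within years.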
summary(ISQ99_Essex)
    IDORIGIN         YEAR             AI              SD            POLRT
 Min.   :  2.0   Min.   :1976   Min.   :1.000   Min.   :1.000   Min.   :1.00
 1st Qu.:290.0   1st Qu.:1980   1st Qu.:2.000   1st Qu.:1.000   1st Qu.:2.00
 Median :435.0   Median :1984   Median :3.000   Median :2.000   Median :3.00
 Mean   :446.7   Mean   :1984   Mean   :2.753   Mean   :2.241   Mean   :3.81
 3rd Qu.:640.0   3rd Qu.:1989   3rd Qu.:3.000   3rd Qu.:3.000   3rd Qu.:6.00
 Max.   :990.0   Max.   :1993   Max.   :5.000   Max.   :5.000   Max.   :7.00
                                NA's   :1061    NA's   :587     NA's   :382
      MIL2             LEFT             BRIT            PCGNP
 Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :   52
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:  390
 Median :0.0000   Median :0.0000   Median :0.0000   Median : 1112
 Mean   :0.2725   Mean   :0.1764   Mean   :0.3554   Mean   : 3592
 3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.: 3510
 Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :36670
 NA's   :382      NA's   :393      NA's   :290      NA's   :443
     AINEW           SDNEW          IDGURR          AILAG           SDLAG
 Min.   :1.000   Min.   :1.000   Min.   :  2.0   Min.   :1.00   Min.   :1.000
 1st Qu.:1.000   1st Qu.:1.000   1st Qu.:290.0   1st Qu.:1.00   1st Qu.:1.000
 Median :2.000   Median :2.000   Median :450.0   Median :2.00   Median :2.000
 Mean   :2.443   Mean   :2.262   Mean   :455.8   Mean   :2.45   Mean   :2.247
 3rd Qu.:3.000   3rd Qu.:3.000   3rd Qu.:663.0   3rd Qu.:3.00   3rd Qu.:3.000
 Max.   :5.000   Max.   :5.000   Max.   :990.0   Max.   :5.00   Max.   :5.000
 NA's   :468     NA's   :468                     NA's   :644    NA's   :644
     PERCHPCG          PERCHPOP           LPOP          PCGTHOU
 Min.   :-95.500   Min.   :-48.450   Min.   :11.00   Min.   : 0.050
 1st Qu.: -2.545   1st Qu.:  0.910   1st Qu.:14.51   1st Qu.: 0.390
 Median :  4.615   Median :  2.220   Median :15.59   Median : 1.110
 Mean   :  4.614   Mean   :  2.193   Mean   :15.48   Mean   : 3.592
 3rd Qu.: 11.760   3rd Qu.:  2.940   3rd Qu.:16.64   3rd Qu.: 3.510
 Max.   :128.570   Max.   :126.010   Max.   :20.89   Max.   :36.670
 NA's   :618       NA's   :293       NA's   :115     NA's   :443
     DEMOC3           CWARCOW          IWARCOW2
 Min.   : 0.000   Min.   :0.00000   Min.   :0.00000
 1st Qu.: 0.000   1st Qu.:0.00000   1st Qu.:0.00000
 Median : 0.000   Median :0.00000   Median :0.00000
 Mean   : 3.682   Mean   :0.09201   Mean   :0.08621
 3rd Qu.: 9.000   3rd Qu.:0.00000   3rd Qu.:0.00000
 Max.   :10.000   Max.   :1.00000   Max.   :1.00000
 NA's   :793      NA's   :407       NA's   :380
no time variation: IDORIGIN BRIT IDGURR AILAG SDLAG PERCHPCG PERCHPOP
no individual variation: YEAR AI SD POLRT MIL2 LEFT BRIT PCGNP AINEW SDNEW AILAG SDLAG PERCHPCG PERCHPOP LPOP PCGTHOU DEMOC3 CWARCOW IWARCOW2
all NA in time dimension for at least one individual: AI SD POLRT MIL2 LEFT BRIT PCGNP AINEW SDNEW AILAG SDLAG PERCHPCG PERCHPOP LPOP PCGTHOU DEMOC3 CWARCOW IWARCOW2
all NA in ind. dimension for at least one time period: AI SD POLRT MIL2 LEFT BRIT PCGNP AINEW SDNEW AILAG SDLAG PERCHPCG PERCHPOP LPOP PCGTHOU DEMOC3 CWARCOW IWARCOW2
library(ggplot2)
# Standard scatterplot: rows with a missing value are silently dropped
mplot <- ggplot(ISQ99_Essex, aes(x = DEMOC3, y = POLRT)) +
  geom_point()
mplot
library(naniar)
mplot <- ggplot(ISQ99_Essex, aes(x = DEMOC3, y = POLRT)) +
  geom_miss_point()
mplot
Multiple Imputation: Amelia II
A simple multivariate normal imputation model is easy to fit as long as the data are well behaved. NB: this uses none of the time-series or cross-sectional structure of the data for identification.
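A call of roughly this shape fits such a model; it is a sketch, not necessarily the command behind these notes. The variable subset (the identifiers plus the variables in the Stata model), the object names small and a.out, and the seed are assumptions. Listing IDORIGIN and YEAR in idvars keeps them in the data but out of the imputation model, so no panel structure is used.

```r
library(haven)
library(Amelia)

url <- "https://github.com/robertwwalker/Essex-Data/raw/main/ISQ99-Essex.dta"
ISQ99_Essex <- as.data.frame(zap_labels(read_dta(url)))

# Assumed subset: identifiers plus the variables used in the Stata model.
keep <- c("IDORIGIN", "YEAR", "AINEW", "AILAG", "DEMOC3",
          "PERCHPCG", "PERCHPOP", "LPOP", "CWARCOW", "IWARCOW2")
small <- ISQ99_Essex[, keep]

set.seed(1999)  # imputations are random draws; fix the seed to reproduce them
# idvars are carried along but excluded from the multivariate normal model.
a.out <- amelia(small, m = 5, idvars = c("IDORIGIN", "YEAR"))
summary(a.out)
```

The completed datasets are stored in a.out$imputations, one data frame per imputation.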
Warning: There are observations in the data that are completely missing.
These observations will remain unimputed in the final datasets.
-- Imputation 1 --
1 2 3 4 5 6 7
-- Imputation 2 --
1 2 3 4 5 6 7 8 9
-- Imputation 3 --
1 2 3 4 5 6 7 8
-- Imputation 4 --
1 2 3 4 5 6 7 8 9 10
-- Imputation 5 --
1 2 3 4 5 6
Without simplifying, Amelia crashes: the data contain id variables of several sorts and other redundant columns, so perfect multicollinearities exist. Even after simplifying, we need a bit more work.
Transforms are Key
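Amelia can use the cross-section (cs) and time-series (ts) identifiers: polytime or splinetime adds smooth functions of time to the imputation model, and intercs lets those time effects differ across cross-sectional units.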
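A hedged sketch of a call that uses the panel structure. The variable subset, the quadratic time trend (polytime = 2), and the object names are assumptions chosen for illustration, not settings taken from these notes.

```r
library(haven)
library(Amelia)

url <- "https://github.com/robertwwalker/Essex-Data/raw/main/ISQ99-Essex.dta"
ISQ99_Essex <- as.data.frame(zap_labels(read_dta(url)))

# Assumed subset: identifiers plus the variables used in the Stata model.
keep <- c("IDORIGIN", "YEAR", "AINEW", "AILAG", "DEMOC3",
          "PERCHPCG", "PERCHPOP", "LPOP", "CWARCOW", "IWARCOW2")
small <- ISQ99_Essex[, keep]

set.seed(1999)
# ts and cs name the time and unit identifiers; polytime = 2 adds a
# quadratic time trend shared by all units (intercs = FALSE).
a.tscs <- amelia(small, m = 5, ts = "YEAR", cs = "IDORIGIN",
                 polytime = 2, intercs = FALSE)
summary(a.tscs)
```

Setting intercs = TRUE would estimate a separate trend for every country, which adds many parameters; in that case a ridge prior through the empri argument may be needed to keep the model stable.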