A Factor Analysis of Gender Role Attitudes in the NLSY79
Summary
The measurement properties of the gender role attitude questions in the NLSY79 are examined. There is a clear single-factor structure that is (arguably) stable across administrations of the questions.
Introduction
I used the gender role attitude questions from the NLSY79 for my thesis, but always meant to do a more thorough analysis. These variables seem to have good measurement properties, but I always wondered how they held up across various waves of data.
The questions are four-point scales:

We are interested in your opinion about the employment of wives. I will read a series of statements and after each one I would like to know whether you strongly agree, agree, disagree, or strongly disagree:
1. A woman's place is in the home, not in the office or shop.
2. A wife who carries out her full family responsibilities doesn't have time for outside employment.
3. A working wife feels more useful than one who doesn't hold a job.
4. The employment of wives leads to more juvenile delinquency.
5. Employment of both parents is necessary to keep up with the high cost of living.
6. It is much better for everyone concerned if the man is the achiever outside the home and the woman takes care of the home and family.
7. Men should share the work around the house with women, such as doing dishes, cleaning, and so forth.
8. Women are much happier if they stay at home and take care of their children.
These questions were asked of the same individuals in 1979, 1982, 1987, and again in 2004. In preparation for some other analyses, I wanted to evaluate whether the factor loadings were consistent over time; that is, did the items mean essentially the same thing in 2004 as they did in 1979? A SEM seems the way to go. These variables are coded 1-4, with 4 = strongly agree (items 3, 5, and 7 need to be recoded so that higher values are more patriarchal). They are actually ordinal rather than continuous, but for this quick-and-dirty analysis, treating them as continuous seems reasonable, and is how they have been treated in the literature in the past. They all have moderate skewness and kurtosis, but compared to many variables used in sociology and related fields, they are pretty good.
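To make the recoding concrete, here is a minimal Python sketch of reflecting the three reverse-worded items so that higher values are more patriarchal. The item numbers come from the list above; the respondent's answers are made up for illustration, not actual NLSY79 data:

```python
def recode_reverse(responses, reverse_items=frozenset({3, 5, 7})):
    """Reflect reverse-worded items: on a 1-4 scale, 5 - x flips the direction."""
    return {item: (5 - score if item in reverse_items else score)
            for item, score in responses.items()}

# Hypothetical respondent: keys are item numbers, values are 1-4 responses.
answers = {1: 4, 2: 3, 3: 1, 4: 2, 5: 2, 6: 4, 7: 1, 8: 3}
recoded = recode_reverse(answers)  # items 3, 5, 7 become 4, 3, 4
```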
In preparation, I downloaded the raw data and did some exploring and recoding in Stata (see output file here for full results). In brief, I discovered that items three and five simply did not hang together with the other six items. This story was consistent across perusing raw correlation matrices, principal-factor EFAs, and computing alphas on scales with items left out. The pattern was also consistent across years. Below, as an example, is the table of alphas for the 1987 wave, which is probably more interpretable at a glance than the EFAs or raw correlations. These two items were discarded for further analyses. Item seven was a bit iffy as well, but it did contribute to the overall reliability assuming a single factor.
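The "alphas with items left out" check can be sketched in a few lines. This is a generic Cronbach's alpha in Python, not the Stata code actually used; run it on your own data to reproduce the logic of the table below:

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of responses per item, all the same length."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_vars = sum(variance(it) for it in items)
    return (k / (k - 1)) * (1 - item_vars / variance(totals))

def alpha_if_deleted(items):
    """Scale alpha recomputed with each item dropped in turn,
    like the rightmost column of Stata's `alpha ..., item` output."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]
```

An item whose removal raises alpha above the full-scale value is pulling the scale's internal consistency down, which is exactly how useful87 and employ87 show up in the table.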
As to why these two (or three) items did not perform well, one might argue that most of the items are normative statements about how things should be, while the two that don't fit seem to tap into something else related to employment realities.
An interesting supposition is also that, since these three items are coded in the opposite direction from the others, respondents may simply have made mistakes in their answers.
Reliability Analysis, 1987 Administration

|   | Item     | Obs    | Sign | Item-test corr. | Item-rest corr. | Avg. interitem cov. | Alpha (item excluded) |
|---|----------|--------|------|-----------------|-----------------|---------------------|-----------------------|
| 1 | home87   | 10,329 | +    | 0.7500          | 0.6287          | .11812              | 0.6873                |
| 2 | notime87 | 10,311 | +    | 0.7159          | 0.5864          | .1226325            | 0.6966                |
| 3 | useful87 | 10,050 | +    | 0.3789          | 0.1467          | .1620956            | 0.7838                |
| 4 | juvdel87 | 10,130 | +    | 0.6516          | 0.5071          | .1307341            | 0.7124                |
| 5 | employ87 | 10,313 | +    | 0.4584          | 0.2675          | .1523274            | 0.7559                |
| 6 | trade87  | 10,227 | +    | 0.7379          | 0.6069          | .1181095            | 0.6908                |
| 7 | share87  | 10,423 | +    | 0.5176          | 0.3616          | .147459             | 0.7383                |
| 8 | happy87  | 9,754  | +    | 0.6543          | 0.5127          | .1305566            | 0.7121                |
|   | Test scale |      |      |                 |                 | .1352586            | 0.7502                |
Bringing the data into LISREL, I first fit a simple model with four latent variables, one for each administration of the scale. Next, given that the same question asked multiple times should have autocorrelated error terms, I added these (see path diagram). Finally, the full model placed equality constraints on the loadings, e.g., the loading for home79 was set equal to the loadings for home82, home87, and home2004. You can look at the full output for the respective models here:
The fit for these models was quite reasonable. The first was not actually expected to fit well, since it ignored the known autocorrelations. The second fit the best, of course, but the full, final model had a reasonable enough fit that we can be comfortable considering the loadings comparable across administrations of the scale. Based on the change in chi-square (182 with 15 df), the lambdas-equal model fits significantly worse than the unconstrained one, but with a sample size of over 4,000, even the slightest difference is likely to be significant. The confidence intervals for the RMSEAs of the two models overlap, and overall the constrained model has a good fit to the data.
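That chi-square difference test can be reproduced from the table of fit indices. This sketch uses SciPy and the minimum fit function chi-squares; the exact p-value is not in the original output:

```python
from scipy.stats import chi2

# Minimum fit function chi-squares and dfs from the fit-indices table.
chi2_free, df_free = 951.19, 210   # autocorrelated errors, loadings free
chi2_eq, df_eq = 1132.93, 225      # loadings constrained equal

delta = chi2_eq - chi2_free        # ~182
ddf = df_eq - df_free              # 15
p = chi2.sf(delta, ddf)            # vanishingly small
```

With N over 4,000, this test will flag almost any constraint as significant, which is why the overlapping RMSEA intervals are the more informative comparison here.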
Examining the modification indices, there were two major places where the model's fit could be improved. First, there is a relationship between HOME and NOTIME that is not captured by the latent variable; given that the substantive meaning of these two questions is so similar (see wording above), in retrospect this is not surprising. Second, the loadings for the 2004 administration differ more from the previous waves than those waves do from each other. Again, this is unsurprising given the much longer gap before that administration of the scale. A model with these areas of misfit freed up (i.e., freeing the error covariances between HOME and NOTIME within each year, and allowing the loadings for the 2004 administration to vary) has a chi-square of 763 with 216 degrees of freedom, and the other fit indices are also better. Since this was an exploratory model driven by the modification indices, the results are somewhat suspect, but upon examination of the original raw correlations, the relationship between HOME and NOTIME is much stronger than the others (see Stata output). Researchers might consider combining these two measures or dropping one of them if parsimony is preferred.
All loadings were highly significant (the smallest t-value was 40.7!), which is to be expected with a decent model and a large sample size. Standardized lambda coefficients (sometimes called validity coefficients, though they are more indicators of reliability than validity) are mostly quite good, in the .6 to .8 range, except for the variable on sharing housework, which had values as low as .31. So the story of the CFA is remarkably similar to that told by the EFA: the items all hang together well, except for that one item.
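For reference, a completely standardized loading is just the raw loading rescaled by the latent and observed standard deviations. The variances in this sketch are hypothetical, chosen only to illustrate the arithmetic, not taken from the model output:

```python
from math import sqrt

def completely_standardized(raw_lambda, latent_var, observed_var):
    """lambda* = lambda * sd(latent) / sd(observed)."""
    return raw_lambda * sqrt(latent_var) / sqrt(observed_var)

# e.g., a raw loading of 1.0 with latent variance 0.49 on a unit-variance item:
value = completely_standardized(1.0, 0.49, 1.0)  # 0.7
```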
Fit Indices, Three Models

| Index                                            | No Auto Corr       | W/ Auto Corr     | Lambdas Equal     |
|--------------------------------------------------|--------------------|------------------|-------------------|
| Degrees of Freedom                               | 246                | 210              | 225               |
| Minimum Fit Function Chi-Square                  | 3011.80            | 951.19           | 1132.93           |
| Normal Theory Weighted Least Squares Chi-Square  | 3368.84            | 968.69           | 1156.77           |
| Estimated Noncentrality Parameter (NCP)          | 3122.84            | 758.69           | 931.77            |
| 90% Confidence Interval for NCP                  | (2939.02; 3313.99) | (665.74; 859.17) | (829.18; 1041.87) |
| Minimum Fit Function Value                       | 0.64               | 0.20             | 0.24              |
| Population Discrepancy Function Value (F0)       | 0.67               | 0.16             | 0.20              |
| 90% Confidence Interval for F0                   | (0.63; 0.71)       | (0.14; 0.18)     | (0.18; 0.22)      |
| Root Mean Square Error of Approximation (RMSEA)  | 0.052              | 0.028            | 0.030             |
| 90% Confidence Interval for RMSEA                | (0.050; 0.054)     | (0.026; 0.030)   | (0.028; 0.031)    |
| P-Value for Test of Close Fit (RMSEA < 0.05)     | 0.015              | 1.00             | 1.00              |
| Expected Cross-Validation Index (ECVI)           | 0.74               | 0.25             | 0.28              |
| 90% Confidence Interval for ECVI                 | (0.70; 0.78)       | (0.23; 0.27)     | (0.26; 0.30)      |
| ECVI for Saturated Model                         | 0.13               | 0.13             | 0.13              |
| ECVI for Independence Model                      | 20.55              | 20.55            | 20.55             |
| Chi-Square for Independence Model (276 df)       | 96232.7            | 96232.7          | 96232.7           |
| Independence AIC                                 | 96280.7            | 96280.7          | 96280.7           |
| Model AIC                                        | 3476.84            | 1148.69          | 1306.77           |
| Saturated AIC                                    | 600                | 600              | 600               |
| Independence CAIC                                | 96459.5            | 96459.5          | 96459.5           |
| Model CAIC                                       | 3879.26            | 1819.4           | 1865.7            |
| Saturated CAIC                                   | 2835.7             | 2835.7           | 2835.7            |
| Normed Fit Index (NFI)                           | 0.97               | 0.99             | 0.99              |
| Non-Normed Fit Index (NNFI)                      | 0.97               | 0.99             | 0.99              |
| Parsimony Normed Fit Index (PNFI)                | 0.86               | 0.75             | 0.81              |
| Comparative Fit Index (CFI)                      | 0.97               | 0.99             | 0.99              |
| Incremental Fit Index (IFI)                      | 0.97               | 0.99             | 0.99              |
| Relative Fit Index (RFI)                         | 0.96               | 0.99             | 0.99              |
| Critical N (CN)                                  | 468.48             | 1284.54          | 1147.6            |
| Root Mean Square Residual (RMR)                  | 0.017              | 0.011            | 0.016             |
| Standardized RMR                                 | 0.036              | 0.023            | 0.031             |
| Goodness of Fit Index (GFI)                      | 0.94               | 0.98             | 0.98              |
| Adjusted Goodness of Fit Index (AGFI)            | 0.93               | 0.98             | 0.97              |
| Parsimony Goodness of Fit Index (PGFI)           | 0.77               | 0.69             | 0.73              |
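As a sanity check on the fit indices, RMSEA can be recomputed from the chi-square, degrees of freedom, and sample size. This assumes the usual formula matches LISREL's; the N of 4,686 complete cases is taken from the Limitations section:

```python
from math import sqrt

def rmsea(chi2_stat, df, n):
    """Standard RMSEA formula: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return sqrt(max(chi2_stat - df, 0.0) / (df * (n - 1)))

# Normal-theory chi-square for the autocorrelated-errors model:
value = rmsea(968.69, 210, 4686)  # ~0.028, matching the table
```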
Loadings (Lambdas) from Final Model
Raw loadings (with standard errors) are constrained equal across the four administrations; the completely standardized loading for each wave is shown in parentheses. The loading for the home item carries no standard error because it is fixed at 1 to set the scale of each latent variable.

| Item   | Raw loading (SE) | 1979  | 1982  | 1987  | 2004  |
|--------|------------------|-------|-------|-------|-------|
| home   | 1 (fixed)        | (.69) | (.75) | (.76) | (.76) |
| notime | 0.92 (0.01)      | (.66) | (.74) | (.72) | (.71) |
| juvdel | 0.73 (0.01)      | (.53) | (.59) | (.59) | (.55) |
| trade  | 0.98 (0.01)      | (.68) | (.74) | (.73) | (.72) |
| share  | 0.42 (0.01)      | (.31) | (.36) | (.38) | (.38) |
| happy  | 0.78 (0.01)      | (.61) | (.65) | (.62) | (.60) |
Here is my needlessly complicated path diagram for the model.
Limitations
First, the normality assumptions, which were violated with abandon, should be examined more closely, with models that take into account the ordinal nature of the data. Given the large sample size, and that the data were not too non-normal, I would expect similar results, but this should be checked.
Second, the sample left with full data across all waves is only 4,686 individuals out of the original 12,686 in 1979, with 7,496 remaining in 2004. An examination of missing data patterns, with possible imputation, might get at this. Relatedly, the sample should be weighted.
Third, while we have reason to claim factor invariance across time within persons, this analysis does not allow one to claim that the factors are invariant across groups (or groups by time). Analyses broken out by race, gender, or other empirically or theoretically interesting variables should be done.
