A Factor Analysis of Gender Role Attitudes in the NLSY-79


Summary

The measurement properties of the gender role attitude questions in the NLSY-79 are examined. There appears to be a clear single-factor structure that is (arguably) stable across administrations of the questions.

Introduction

I used the gender role attitude questions from the NLSY-79 for my thesis, but always meant to do a more thorough analysis. These variables seem to have good measurement properties, but I always wondered how they held up across various waves of data.

The questions are four point scales:

We are interested in your opinion about the employment of wives. I will read a series of statements and after each one I would like to know whether you strongly agree, agree, disagree, or strongly disagree:

  1. A woman's place is in the home, not in the office or shop.
  2. A wife who carries out her full family responsibilities doesn't have time for outside employment.
  3. A working wife feels more useful than one who doesn't hold a job.
  4. The employment of wives leads to more juvenile delinquency.
  5. Employment of both parents is necessary to keep up with the high cost of living.
  6. It is much better for everyone concerned if the man is the achiever outside the home and the woman takes care of the home and family.
  7. Men should share the work around the house with women, such as doing dishes, cleaning, and so forth.
  8. Women are much happier if they stay at home and take care of their children.

These questions were asked of the same individuals in 1979, 1982, 1987, and again in 2004. In preparation for some other analyses, I wanted to evaluate whether the factor loadings were consistent over time; that is, did the items pretty much mean the same thing in 2004 as they did in 1979? A SEM seems the way to go. These variables are coded 1-4, with 4 = strongly agree (items 3, 5, and 7 need to be reverse-coded so that higher values are more patriarchal). They are actually ordinal rather than continuous, but for this quick-and-dirty analysis, treating them as continuous seems reasonable, and is how they have been treated in the literature in the past. They all have moderate skewness and kurtosis, but compared to many variables used in sociology and related fields, they are pretty good.
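As a hypothetical sketch (not the author's actual Stata recode), the reverse-coding of the egalitarian-worded items might look like this in Python:

```python
# Illustrative recode: items 3, 5, and 7 are worded in the egalitarian
# direction, so flip them so that 4 = most patriarchal for every item.
REVERSED_ITEMS = {3, 5, 7}

def recode(item_number, response):
    """Return a 1-4 response oriented so that higher = more patriarchal."""
    if response not in (1, 2, 3, 4):
        raise ValueError("responses are on a 1-4 scale")
    return 5 - response if item_number in REVERSED_ITEMS else response
```

After this, "strongly agree" on item 3 ("a working wife feels more useful") scores 1, while "strongly agree" on item 1 ("a woman's place is in the home") scores 4.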

In preparation, I downloaded the raw data and did some exploring and re-coding in Stata. See the output file here for full results. In brief, I discovered that items three and five simply did not hang well with the other six items. This story was consistent across perusals of the raw correlation matrices, principal-factor EFAs, and alphas computed on scales with items left out. The pattern was also consistent across years. Below, as an example, is the table of alphas for the 1987 wave, which is probably more interpretable at a glance than the EFAs or raw correlations. These two items were discarded from further analyses. Item seven was a bit iffy as well, but it did contribute to the overall reliability assuming a single factor.

As to why these two (or three) items did not perform well, one might argue that most of the items seem to be normative statements, about how things should be, while the two that don't fit seem to tap into something else related to employment realities.

An interesting supposition is that since these three variables are coded in the opposite direction from the others, respondents may simply have made mistakes in their answers.
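The alpha-with-an-item-left-out check described above can be sketched in a few lines. This is an illustrative re-implementation on toy data, not the original Stata `alpha` run:

```python
# Cronbach's alpha and alpha-if-item-deleted, from scratch.
def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(items):
    """items: list of per-item response lists, one list per scale item."""
    k = len(items)
    totals = [sum(vals) for vals in zip(*items)]  # scale score per respondent
    return k / (k - 1) * (1 - sum(variance(it) for it in items) / variance(totals))

def alpha_if_deleted(items):
    """Alpha recomputed with each item removed in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]
```

An item that does not hang with the rest shows up exactly as in the table below: dropping it raises the overall alpha.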

Reliability Analysis, 1987 Administration

   Item        Obs     Sign   Item-test     Item-rest     Avg. inter-item   Alpha
                              correlation   correlation   covariance
1  home87      10329   +      0.7500        0.6287        0.11812           0.6873
2  notime87    10311   +      0.7159        0.5864        0.1226325         0.6966
3  useful87    10050   +      0.3789        0.1467        0.1620956         0.7838
4  juvdel87    10130   +      0.6516        0.5071        0.1307341         0.7124
5  employ87    10313   +      0.4584        0.2675        0.1523274         0.7559
6  trade87     10227   +      0.7379        0.6069        0.1181095         0.6908
7  share87     10423   +      0.5176        0.3616        0.147459          0.7383
8  happy87     9754    +      0.6543        0.5127        0.1305566         0.7121

   Test scale                                             0.1352586         0.7502

Bringing the data into LISREL, I first fit a simple model with four latent variables, one for each administration of the scale. Next, given that the same question asked multiple times should have auto-correlated error terms, I added these (see path diagram). Finally, the full model placed equality constraints on the loadings; e.g., the loading for home79 is constrained equal to the loadings for home82, home87, and home2004. You can look at the full output for the respective models here:

The fit for these models was quite reasonable. The first was not actually expected to fit well, since it ignored the known auto-correlations. The second fit best, of course, but the full, final model had a reasonable enough fit that we can be comfortable considering the loadings comparable across administrations of the scale. Based on the change in chi-square (182 with 15 df), the lambdas-equal model fits worse than the unconstrained one, but with a sample size of over 4,000, even the slightest change is likely to be significant. The confidence intervals for the RMSEAs of the two models overlap, and overall the constrained model fits the data well.
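The noncentrality-based fit measures reported in the table below can be reproduced from the chi-square, degrees of freedom, and N alone. A sketch, assuming N = 4,686 (the complete-case sample size reported under Limitations) and using the Normal Theory WLS chi-square values:

```python
import math

def ncp(chi_square, df):
    """Estimated noncentrality parameter, floored at zero."""
    return max(chi_square - df, 0.0)

def rmsea(chi_square, df, n):
    """Root Mean Square Error of Approximation."""
    return math.sqrt(ncp(chi_square, df) / (df * (n - 1)))

# rmsea(3368.84, 246, 4686) -> ~0.052  (no auto-correlations)
# rmsea(968.69, 210, 4686)  -> ~0.028  (with auto-correlations)
# rmsea(1156.77, 225, 4686) -> ~0.030  (lambdas equal)

# Chi-square difference for the equality constraints (minimum fit function
# values): ~182 on 225 - 210 = 15 df.
delta_chi2 = 1132.93 - 951.19
delta_df = 225 - 210
```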

Examining the modification indices, there were two major places where fit could be improved. First, there is a relationship between HOME and NOTIME that is not captured by the latent variable; given that the substantive meaning of these two questions is so similar (see the wording above), in retrospect this is not surprising. Second, the loadings for the 2004 administration differ more from the previous waves than those waves do from each other. Again, this is unsurprising given the much longer gap before that administration of the scale. A model with these areas of mal-fit freed up (i.e., freeing the error covariances between HOME and NOTIME within each year, and allowing the loadings for the 2004 administration to vary) has a chi-square of 763 with 216 degrees of freedom, and the other fit indices are also better. Since this was an exploratory model driven by the modification indices, the results are somewhat suspect, but upon examination of the original raw correlations, the relationship between HOME and NOTIME is much stronger than the others (see Stata output). Researchers might consider combining or dropping one of these two measures if parsimony is preferred.

All loadings were highly significant (the smallest T-value was 40.7!), which is to be expected with a decent model and a large sample size. Standardized lambda coefficients (sometimes called validity coefficients, though they are more indicators of reliability than validity) are mostly quite good, in the .6 to .8 range, except for the sharing-housework variable, which had a value as low as .31. So the story of the CFA is remarkably similar to that told by the EFA: the items all hang together well, except for that one.
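One way to read the completely standardized loadings: squared, they give the proportion of each indicator's variance explained by the factor. A quick illustration with the 1987 values from the loadings table below:

```python
# Squared standardized loadings = share of indicator variance explained
# by the factor (1987 values from the final model).
std_loadings_87 = {"home87": 0.76, "notime87": 0.72, "juvdel87": 0.59,
                   "trade87": 0.73, "share87": 0.38, "happy87": 0.62}
variance_explained = {var: round(lam ** 2, 2)
                      for var, lam in std_loadings_87.items()}
# home87 -> 0.58, but share87 -> only 0.14
```

This is why the housework-sharing item stands out: the factor accounts for barely a seventh of its variance.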

 

Fit Indices, Three Models

                                                  No Auto Corr        W/Auto Corr         Lambdas Equal
 Degrees of Freedom                               246                 210                 225
 Minimum Fit Function Chi-Square                  3011.80             951.19              1132.93
 Normal Theory Weighted Least Squares Chi-Square  3368.84             968.69              1156.77
 Estimated Non-centrality Parameter (NCP)         3122.84             758.69              931.77
 90 Percent Confidence Interval for NCP           (2939.02 ; 3313.99) (665.74 ; 859.17)   (829.18 ; 1041.87)

 Minimum Fit Function Value                       0.64                0.20                0.24
 Population Discrepancy Function Value (F0)       0.67                0.16                0.20
 90 Percent Confidence Interval for F0            (0.63 ; 0.71)       (0.14 ; 0.18)       (0.18 ; 0.22)
 Root Mean Square Error of Approximation (RMSEA)  0.052               0.028               0.030
 90 Percent Confidence Interval for RMSEA         (0.050 ; 0.054)     (0.026 ; 0.030)     (0.028 ; 0.031)
 P-Value for Test of Close Fit (RMSEA < 0.05)     0.015               1.00                1.00

 Expected Cross-Validation Index (ECVI)           0.74                0.25                0.28
 90 Percent Confidence Interval for ECVI          (0.70 ; 0.78)       (0.23 ; 0.27)       (0.26 ; 0.30)
 ECVI for Saturated Model                         0.13                0.13                0.13
 ECVI for Independence Model                      20.55               20.55               20.55

 Chi-Square for Indep. Model with 276 DF          96232.7             96232.7             96232.7
 Independence AIC                                 96280.7             96280.7             96280.7
 Model AIC                                        3476.84             1148.69             1306.77
 Saturated AIC                                    600                 600                 600
 Independence CAIC                                96459.5             96459.5             96459.5
 Model CAIC                                       3879.26             1819.4              1865.7
 Saturated CAIC                                   2835.7              2835.7              2835.7

 Normed Fit Index (NFI)                           0.97                0.99                0.99
 Non-Normed Fit Index (NNFI)                      0.97                0.99                0.99
 Parsimony Normed Fit Index (PNFI)                0.86                0.75                0.81
 Comparative Fit Index (CFI)                      0.97                0.99                0.99
 Incremental Fit Index (IFI)                      0.97                0.99                0.99
 Relative Fit Index (RFI)                         0.96                0.99                0.99

 Critical N (CN)                                  468.48              1284.54             1147.6

 Root Mean Square Residual (RMR)                  0.017               0.011               0.016
 Standardized RMR                                 0.036               0.023               0.031
 Goodness of Fit Index (GFI)                      0.94                0.98                0.98
 Adjusted Goodness of Fit Index (AGFI)            0.93                0.98                0.97
 Parsimony Goodness of Fit Index (PGFI)           0.77                0.69                0.73
 

Loadings (Lambdas) from final model

Raw loading, standard error, and completely standardized loading in parentheses. Each item loads on its own wave's factor; the raw loadings are constrained equal across waves, with the loading for home fixed at 1 to set the scale.

Variable     Raw loading   Std. error   Std. loading
home79       1.00          --           (0.69)
notime79     0.92          -0.01        (0.66)
juvdel79     0.73          -0.01        (0.53)
trade79      0.98          -0.01        (0.68)
share79      0.42          -0.01        (0.31)
happy79      0.78          -0.01        (0.61)

home82       1.00          --           (0.75)
notime82     0.92          -0.01        (0.74)
juvdel82     0.73          -0.01        (0.59)
trade82      0.98          -0.01        (0.74)
share82      0.42          -0.01        (0.36)
happy82      0.78          -0.01        (0.65)

home87       1.00          --           (0.76)
notime87     0.92          -0.01        (0.72)
juvdel87     0.73          -0.01        (0.59)
trade87      0.98          -0.01        (0.73)
share87      0.42          -0.01        (0.38)
happy87      0.78          -0.01        (0.62)

home2004     1.00          --           (0.76)
notime20     0.92          -0.01        (0.71)
juvdel20     0.73          -0.01        (0.55)
trade200     0.98          -0.01        (0.72)
share200     0.42          -0.01        (0.38)
happy200     0.78          -0.01        (0.60)

Here is my needlessly complicated path diagram for the model.

path diagram

 

 

Limitations

First, the normality assumptions that were violated with abandon should be examined more closely, with models that take into account the ordinal nature of the data. Given the large sample size, and that the data were not too severely non-normal, I would expect similar results, but this should be checked.

Second, the sample remaining in 2004 with full data across all waves is only 4,686 individuals, out of an original 12,686 in 1979 and 7,496 still in the study in 2004. An examination of missing-data patterns, with possible imputation, might get at this. Relatedly, the sample should be weighted.

Third, while we have reason to claim factor invariance across time within persons, this analysis does not allow one to claim that the factors are invariant across groups (or groups by time). Analyses broken out by race, gender, or other empirically/theoretically interesting variables should be done.

 


Contact Information

ben.earnhart@gmail.com