Differential Item Functioning, Part 3: Example Measures

 

Differential item functioning (DIF) has received increasing attention over the past few decades. Previously, I described how measures of DIF can be categorized based on whether they focus on observed scores vs. latent variables, whether they use parametric vs. nonparametric methods, and whether they are designed for dichotomous vs. polytomous items.

One common measure of DIF is the Mantel-Haenszel chi-square, a nonparametric test for bias in observed scores. After test takers are matched (typically on overall score), it compares the observed counts of passes and failures on an item in each group with the counts that would be expected if the item functioned the same way for both groups. It is popular because of its relative simplicity, the ease of collecting data for it, and its lenient sample size requirements. The only data needed are simple categorical counts, which in some ways makes the method easier to explain to a non-statistical audience. However, it has difficulty detecting non-uniform DIF, that is, DIF whose size or direction changes across ability levels. Differences that favor one group at some score levels and the other group elsewhere can cancel out in the overall statistic, which makes the results harder to interpret.
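To make the "observed versus expected" comparison concrete, here is a minimal sketch of the Mantel-Haenszel computation in plain Python. It assumes the data have already been split into score strata, and that each stratum is a 2x2 table given as a tuple `(a, b, c, d)`: reference group pass, reference group fail, focal group pass, focal group fail. The function name, the tuple layout, and the counts in `example_tables` are all made up for illustration; a real analysis would more likely use a statistics package.

```python
import math

def mantel_haenszel(tables, continuity=True):
    """Mantel-Haenszel chi-square (1 df) over a set of 2x2 tables.

    Each table is (a, b, c, d):
      a = reference group, pass    b = reference group, fail
      c = focal group, pass        d = focal group, fail
    Returns (chi_square, p_value, common_odds_ratio).
    """
    sum_a = sum_expected = sum_var = 0.0
    or_num = or_den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        sum_a += a
        # Expected reference-group passes if the item works the same for both groups
        sum_expected += (a + b) * (a + c) / n
        # Hypergeometric variance of a within the stratum
        sum_var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        or_num += a * d / n
        or_den += b * c / n
    if sum_var == 0:
        raise ValueError("no stratum has variation in both group and outcome")
    diff = abs(sum_a - sum_expected)
    if continuity:
        diff = max(diff - 0.5, 0.0)  # usual 0.5 continuity correction
    chi_square = diff ** 2 / sum_var
    # Upper-tail probability of a chi-square with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    odds_ratio = or_num / or_den if or_den > 0 else float("inf")
    return chi_square, p_value, odds_ratio

# Hypothetical counts for three score strata (low, middle, high scorers)
example_tables = [(12, 18, 8, 22), (25, 15, 18, 22), (30, 5, 26, 9)]
chi_square, p_value, odds_ratio = mantel_haenszel(example_tables)
print(f"MH chi-square = {chi_square:.2f}, p = {p_value:.3f}, odds ratio = {odds_ratio:.2f}")
```

A common odds ratio near 1 suggests the item behaves similarly for both groups once they are matched on score. Because the statistic pools the strata into a single signed difference, effects that point in opposite directions at different score levels can offset each other, which is the weakness with non-uniform DIF described above.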

Another common approach is logistic regression, which can be applied as a parametric test of observed score bias (in its standard form, for dichotomous items only). It can be used to examine how group membership (e.g., male or female) influences the probability of a correct answer, while controlling for any number of potential confounds, such as performance ratings, education, or tenure. In addition, logistic regression is often considered more useful because it can detect non-uniform DIF, typically through an interaction term between group membership and the matching variable.
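As a rough illustration of the logistic regression approach, the sketch below fits three nested models (score only, score plus group, score plus group plus their interaction) and compares them with likelihood-ratio tests: the first comparison looks for uniform DIF, the second for non-uniform DIF. Several things here are assumptions made for the example: the data are simulated, the matching variable is a single standardized score rather than the covariates mentioned above, and the model is fitted with a small hand-written Newton-Raphson routine so the code has no dependencies. The names `fit_logit` and `logistic_dif` are hypothetical; in practice a statistics package would do the fitting.

```python
import math
import random

def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_logit(X, y, max_iter=50, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson.

    X is a list of rows (include a leading 1.0 for the intercept).
    Returns (coefficients, log_likelihood).
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            eta = sum(bj * v for bj, v in zip(beta, xi))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for j in range(k):
                grad[j] += (yi - p) * xi[j]
                for m in range(k):
                    hess[j][m] += w * xi[j] * xi[m]
        step = _solve(hess, grad)
        beta = [bj + s for bj, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = 0.0
    for xi, yi in zip(X, y):
        eta = sum(bj * v for bj, v in zip(beta, xi))
        loglik += yi * eta - math.log1p(math.exp(eta))
    return beta, loglik

def lr_test(loglik_small, loglik_big):
    """Likelihood-ratio statistic and p-value for one added parameter (1 df)."""
    stat = max(2.0 * (loglik_big - loglik_small), 0.0)
    return stat, math.erfc(math.sqrt(stat / 2.0))

def logistic_dif(score, group, correct):
    """Compare nested models to test for uniform and non-uniform DIF."""
    base = [[1.0, s] for s in score]
    with_group = [[1.0, s, g] for s, g in zip(score, group)]
    full = [[1.0, s, g, s * g] for s, g in zip(score, group)]
    ll_base = fit_logit(base, correct)[1]
    ll_group = fit_logit(with_group, correct)[1]
    beta_full, ll_full = fit_logit(full, correct)
    return {
        "uniform": lr_test(ll_base, ll_group),      # effect of group
        "nonuniform": lr_test(ll_group, ll_full),   # effect of score x group
        "coefficients": beta_full,                  # intercept, score, group, interaction
    }

# Simulated data with uniform DIF built in: at the same score, members of
# group 1 have lower log-odds of answering correctly (shift of -0.8).
random.seed(0)
n = 2000
score = [random.gauss(0.0, 1.0) for _ in range(n)]
group = [i % 2 for i in range(n)]
correct = [
    1 if random.random() < 1.0 / (1.0 + math.exp(-(s - 0.8 * g))) else 0
    for s, g in zip(score, group)
]
result = logistic_dif(score, group, correct)
print("uniform DIF (LR, p):", result["uniform"])
print("non-uniform DIF (LR, p):", result["nonuniform"])
```

Additional controls would simply be extra columns in each design matrix. The interaction column is what lets this approach pick up DIF that changes with score level, which the Mantel-Haenszel statistic tends to miss.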