Differential Item Functioning, Part 1: Definition
Differential item functioning (DIF) is something that all business owners and HR managers should understand: if present in an assessment, it can lead to improper hiring decisions and legal issues. For this reason, it has received increasing attention in both research and the workplace. A test item is said to have DIF if it behaves differently for examinees in different demographic groups (e.g., sex, ethnicity, or age). Here, "behaving differently" means that among people with the same level of the measured characteristic (e.g., a personality trait or IQ), the item is more difficult (i.e., less likely to be answered correctly) for members of one group than for members of another. These differences are problematic because they suggest that the item is measuring something in addition to the characteristic it is intended to measure (referred to as the “latent trait”). Additionally, too much DIF can result in systematic differences in test scores between demographic groups, which can in turn lead to adverse impact and litigation.
DIF comes in two varieties: uniform and non-uniform. Uniform DIF is conceptually the simpler of the two: among examinees with the same trait level, one group has a higher expected score than the other, and the item favors that same group at every trait level. If the item were free of DIF, matched examinees from the two groups would have the same expected score. Groups in this case can be defined by something like age, ethnicity, or gender. In non-uniform DIF, by contrast, the group difference changes with trait level and may even reverse direction. For instance, a high-IQ examinee from group A might be expected to score higher than an equivalent examinee from group B, whereas a low-IQ examinee from group A would be expected to score lower than an equivalent examinee from group B.
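The contrast between the two varieties can be made concrete with a small numerical sketch. The example below assumes a two-parameter logistic (2PL) item response model, which is one common way to describe an item but is not something this post commits to. All parameter values and names (REFERENCE, UNIFORM_FOCAL, NONUNIFORM_FOCAL, gap) are hypothetical and chosen only for illustration.

```python
import math


def p_correct(theta, a, b):
    """Assumed 2PL model: probability of a correct answer at trait level theta.

    a is the item's discrimination (slope), b is its difficulty (location).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item parameters for each group (illustration only).
REFERENCE = {"a": 1.0, "b": 0.0}
UNIFORM_FOCAL = {"a": 1.0, "b": 0.5}     # same slope, but the item is harder
NONUNIFORM_FOCAL = {"a": 2.0, "b": 0.0}  # different slope, curves cross at theta = 0


def gap(theta, focal):
    """Reference-group minus focal-group probability at the same trait level."""
    return p_correct(theta, **REFERENCE) - p_correct(theta, **focal)


for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(
        f"theta={theta:+.1f}  "
        f"uniform gap={gap(theta, UNIFORM_FOCAL):+.3f}  "
        f"non-uniform gap={gap(theta, NONUNIFORM_FOCAL):+.3f}"
    )
```

Under these assumed parameters, the uniform gap keeps the same sign at every trait level (the item is always harder for the focal group), while the non-uniform gap changes sign as theta moves from low to high, mirroring the group A versus group B example above.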
Both uniform and non-uniform DIF can be tricky to evaluate, as they must be separated from any legitimate group differences in the trait itself. This post is the first in a series that describes the classes of DIF measures and discusses the strengths and weaknesses of several example measures.