Abstract
Measurement bias poses a significant challenge to valid inference in education and psychology, particularly when measures are used to compare groups or evaluate intervention effects. Traditional approaches to detecting measurement non-invariance, such as multigroup confirmatory factor analysis (MGCFA), are often limited in their ability to capture complex, intersectional sources of bias. This dissertation addresses these limitations by examining how measurement bias arises, how it is currently diagnosed, and how it can be more effectively modeled across applied, simulation, and intervention contexts. The first study applies moderated non-linear factor analysis (MNLFA) to data from the 2015 Programme for International Student Assessment (PISA) to investigate multiple sources of non-invariance simultaneously, including language, socioeconomic status (SES), immigration status, and gender. Results show widespread non-invariance across items and meaningful changes in individual scores after accounting for these factors. The second study uses Monte Carlo simulation to evaluate the performance of MGCFA under model misspecification, demonstrating that partial invariance models can mask fundamental structural problems and produce misleadingly acceptable model fit. The third study revisits item-level heterogeneous treatment effects (IL-HTE) in intervention research, showing that observed item-level heterogeneity is sensitive to modeling choices and is often attenuated once regularization, covariates, clustering, and intersectional interactions are accounted for. Taken together, these studies show that measurement bias is often complex, multidimensional, and context-dependent. The findings underscore the importance of flexible modeling approaches and careful diagnostic practices, and they provide guidance for improving the validity and interpretability of measurement in empirical research.