Recovering Overlooked Information in Categorical Variables with LLMs: An Application to Labor Market Mismatch

Yi Chen; Hanming Fang; Yi Zhao; Zibo Zhao

Categorical variables have no intrinsic ordering, and researchers often adopt a fixed-effect (FE) approach in empirical analysis. However, this approach has two significant limitations: it overlooks textual labels associated with the categorical variables; and it produces unstable results when there are only limited observations in a category. In this paper, we propose a novel method that utilizes recent advances in large language models (LLMs) to recover overlooked information in categorical variables. We apply this method to investigate labor market mismatch. Specifically, we task LLMs with simulating the role of a human resources specialist to assess the suitability of an applicant with specific characteristics for a given job. Our main findings can be summarized in three parts. First, using comprehensive administrative data from an online job posting platform, we show that our new match quality measure is positively correlated with several traditional measures in the literature, and at the same time, we highlight the LLM's capability to provide additional information conditional on the traditional measures. Second, we demonstrate the broad applicability of the new method with a survey data containing significantly less information than the administrative data, which makes it impossible to compute most of the traditional match quality measures. Our LLM measure successfully replicates most of the salient patterns observed in a hard-to-access administrative dataset using easily accessible survey data. Third, we investigate the gender gap in match quality and explore whether there exists gender stereotypes in the hiring process. We simulate an audit study, examining whether revealing gender information to LLMs influences their assessment. We show that when gender information is disclosed to the GPT, the model deems females better suited for traditionally female-dominated roles.

econometrics public economics labor compensation estimation methods labor economics labor studies labor supply and demand demography and aging

Authors

Yi Chen, Hanming Fang, Yi Zhao, Zibo Zhao

Acknowledgements & Disclosure: This paper was presented at Shanghai University of Finance and Economics, Jinan University, Shanghai Jiaotong University, and the 25th Quarterly Forum of China Labor Economists Forum. We are grateful for the feedback from all the participants. All remaining errors are our own. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research.

DOI: https://doi.org/10.3386/w32327

Published in: United States of America

Recovering Overlooked Information in Categorical Variables with LLMs: An Application to Labor Market Mismatch

Authors

Related Topics

Share artifact

Add to list

Citation

Full-page Screenshot