20.500.12592/2ygxhg3

Patent Text and Long-Run Innovation Dynamics: The Critical Role of Model Selection

12 Sep 2024

As distorted maps may mislead, Natural Language Processing (NLP) models may misrepresent. How do we know which NLP model to trust? We provide comprehensive guidance for selecting and applying NLP representations of patent text. We develop novel validation tasks to evaluate several leading NLP models. These tasks assess how well candidate models align with both expert and non-expert judgments of patent similarity. State-of-the-art language models significantly outperform traditional approaches such as TF-IDF. Using our validated representations, we measure a secular decline in contemporaneous patent similarity: inventors are “spreading out” over an expanding knowledge frontier. This finding is corroborated by declining rates of multiple invention from newly-digitized historical patent interference records. In contrast, selecting another single representation without validating alternatives yields an ambiguous or even opposing trend. Thus, our framework addresses a fundamental challenge of selecting among different black-box NLP models that produce varying economic measurements. To facilitate future research, we plan to provide our validation task data and embeddings for all US patents from 1836–2023.
data collection; econometrics; industrial organization; market structure and firm performance; development and growth; productivity, innovation, and entrepreneurship; development of the American economy; innovation and R&D

Authors

Ina Ganguli, Jeffrey Lin, Vitaly Meursault, Nicholas F. Reynolds

Acknowledgements & Disclosure
The views expressed in this paper are solely those of the authors and do not necessarily reflect the views of the Federal Reserve Bank of Philadelphia, the Federal Reserve System, or the National Bureau of Economic Research. No statements here should be treated as legal advice. Any errors or omissions are the responsibility of the authors. Acknowledgements: We gratefully acknowledge support from an NBER Innovation Policy Grant. We also received excellent RA support from Aaron Rosenbaum, Joseph Huang, Cameron Fen, Annette Gailliot, Jake Moore, and Isaac Rand. Finally, we received useful feedback from Matt Clancy, Darya Davydova, Gaétan de Rassenfosse, Luise Eisfeld, Deanna James, Semyon Malamud, Roxana Mihet, participants of the seminar at EPFL, and participants of the NBER Innovation Information Initiative Technical Working Group Meeting and TADA 2023. First version: December 21, 2023.
DOI
https://doi.org/10.3386/w32934
Pages
68
Published in
United States of America
