Authors
Ina Ganguli, Jeffrey Lin, Vitaly Meursault, Nicholas F. Reynolds
- Acknowledgements & Disclosure
- The views expressed in this paper are solely those of the authors and do not necessarily reflect the views of the Federal Reserve Bank of Philadelphia, the Federal Reserve System, or the National Bureau of Economic Research. Nothing in this paper should be construed as legal advice. Any errors or omissions are the responsibility of the authors. Acknowledgements: We gratefully acknowledge support from an NBER Innovation Policy Grant. We thank Aaron Rosenbaum, Joseph Huang, Cameron Fen, Annette Gailliot, Jake Moore, and Isaac Rand for excellent research assistance, and Matt Clancy, Darya Davydova, Gaétan de Rassenfosse, Luise Eisfeld, Deanna James, Semyon Malamud, Roxana Mihet, seminar participants at EPFL, and participants at the NBER Innovation Information Initiative Technical Working Group Meeting and TADA 2023 for useful feedback. First version: December 21, 2023.
- DOI
- https://doi.org/10.3386/w32934
- Pages
- 68
Table of Contents
- Introduction 3
- Framework and Pipeline 9
- Data 10
- Representation: Mapping Patents to Idea Space 10
- Visualizing Idea Spaces 14
- Measuring Concepts from Patent Text 14
- Validation-Based Selection 16
- Validation Task Results 19
- Interferences 19
- Interference Data 19
- Interference Results 20
- Non-Expert Human Judgment 23
- Human Annotation Results 25
- Exploring LLMs for Scalable Patent Similarity Validation 26
- Patent Office Classifications 26
- Model Selection is Critical for Downstream Economic Measurement 29
- Similarity Dynamics Within and Across Patent Office Technology Classifications 32
- Declines in Interferences 33
- Conclusion 34
- Visualization of Embedding Spaces 40
- Methodology 40
- Plotting 40
- Instructions for the Non-Expert Human Judgment Task 43
- LLM prompt for patent similarity assessment 47
- LLMs for patent similarity assessment 48
- LLM-based Results 49
- Revisiting Breakthrough Patents with Validated Patent Representations 51
- Why are Deep Learning Models Better? An In-Depth Look at Why S-BERT is Better than TF-IDF 55
- Example: Bicycle versus Velocipede 55
- TF-IDF Overweights Period-Specific Words versus Universal Synonyms 57
- Google Ngrams Analysis 59
- Synonyms Analysis 61
- Why is S-BERT Better? Conclusion 64
- Miscellanea 67
- Photograph of the Register of Interferences 67