Benchmark Early and Red Team Often

UC Berkeley

14 May 2024

Executive Summary

A critical question has emerged for AI model developers, policymakers, and others aiming to improve the safety and security of cutting-edge or "frontier" AI foundation models. [...]

(The accuracy of both benchmarking and red teaming can be limited by factors such as the presence of guardrails that cause the AI system to refuse to answer questions or prevent it from answering questions in a way that shows its underlying capabilities.) We use the term hazard to refer to a potential for harm from misuse of a model's capability, or a rating of the magnitude of that potential for harm. [...]

Much of the public discussion about the potential for the weaponization of foundation models for more effective CBRN and cyber attacks has been overly theoretical, not well grounded in existing examples, and not tied to a structured framework for context-specific assessment. [...]

For example, we recommend the use of benchmarks, red teaming, and other evaluations for issues outside the focus of this paper, including for better understanding a broader set of risks, such as how models can support evasion, deception, and influence operations, as well as other kinds of events that can have high-magnitude adverse impact. [...]

Figure 4 outlines the main steps of the BRACE Framework approach that we recommend for foundation model developers and evaluators, as part of pre-release evaluations of a model.


Pages
62
Published in
United States of America