Safeguarded AI: constructing safety by design
attributed to: David "davidad" Dalrymple
Imagine a future where advanced AI powers breakthroughs in science, technology, engineering, and medicine, enhancing global prosperity and safeguarding humanity from disasters—but all with rigorous engineering safety measures, like society has come to expect from our critical infrastructure. This programme shall prototype and demonstrate a toolkit for building such safety measures, designed to channel any frontier AI’s raw potential not only responsibly, but legibly and verifiably so.
This programme envisions a pathway to leverage frontier AI itself to collaborate with humans to construct a “gatekeeper”: a targeted AI whose job is to fully understand the real-world interactions and consequences of an autonomous AI agent, and to ensure the agent only operates within agreed-upon safety guardrails and specifications for a given application. These safeguards would not only reduce the risks of frontier AI and enable its use in safety-critical applications; they would also unlock the upside of frontier AI in business-critical applications and commercial activities where reliability is key[...].
At the end of the programme, we aim to show a compelling proof-of-concept demonstration, in at least one narrow domain, where AI decision-support tools or autonomous control systems improve on existing operations in both performance and robustness, in a context where the net present value attainable by full deployment is estimated to be billions of pounds. Examples of potential early demonstration areas include: balancing electricity grids, supply chain management, clinical trial optimisation, and 5G beamforming/subchannel allocation for mobile telecommunications networks.
If successful, this would in turn build scientific consensus that “AI with quantitative safety guarantees” is a viable R&D pathway, one that yields key superhuman capabilities for managing cyber-physical systems and unlocks positive economic rewards, while also building up large-scale civilisational resilience, thereby reducing the risks arising from humanity’s vulnerability to potential future “rogue AIs” to an acceptable level within an acceptable time frame.
Vulnerabilities & Strengths