“Do no harm” is the underlying principle of medicine and a working constraint that shapes every decision, from the doctor at the bedside to the industry that produces the treatments. Less obviously, this principle is what produced medicine’s defining practice: building scaffolding around the introduction of new treatments and investigations so that they lead to better outcomes for patients. Clinical trials, ethical oversight, regulation - none of these scaffolds are only about whether a treatment works. They are about how it is brought into contact with people - which patients first, what evidence justifies the decision, and how harm is detected when it shows up.
Deployment frameworks, then, matter as much as the capability being deployed. Medicine has spent a century refining them, because capability without scaffolding can produce more harm than capability never deployed at all. The capability tells you what could work; the framework tells you whether it reaches people safely and leaves them better off.
What this means for AI
Treatments, whether pills or surgery, and the investigations that precede them all rely on an underlying capability - a compound that works, a technique that achieves what it should, a test that detects what it is looking for. But it is the scaffolding around them that turns those capabilities into something safe to put into the hands of patients.
AI is now reaching people faster than the scaffolding around it can be built - through healthcare triage, legal aid, medical advice, and other channels that do not announce themselves as AI.
The decisions being made today about how models are trained, evaluated, deployed, and integrated will narrow or widen the gap between evolving capabilities and the scaffolding around them. A widening gap could lead to two failure modes:
- Technology that turns out to be harmful only after deployment. Thalidomide is the classic case: deployed widely in the late 1950s, it caused thousands of birth defects, and the disaster directly produced modern drug approval frameworks such as the 1962 Kefauver-Harris Amendment.
- Technology that could be useful but gets rejected wholesale because earlier careless deployments poisoned the well - which is where vaccine hesitancy now sits.
How the gap closes
The gap closes faster when capability work and safety work share the same building. The clinical trial framework was not designed in isolation from medical practice - it was designed through collaboration with clinicians who could see the failure modes from inside the system.
The same pattern seems to hold in AI: the places doing serious safety research are also the places shipping models people use. That coupling matters because it keeps the safety work tethered to the actual shape of the technology rather than to an abstracted version of it. The capabilities will continue to improve quickly. The frameworks are where the harder, slower work sits, and that work will decide how safely AI integrates into our lives.