The Fire We're Trying to Build

Toward self-evolving AI and the safeguards it demands

Fire changed the trajectory of our species, and I have been trying to think through how the arrival of systems that can improve themselves—through continual learning and self-evolution—might bend our future the same way.

Nearly every perilous technology we have built shares one quiet virtue: it stays where you put it. A knife sits on the table. A warhead sits in its silo. Even a loaded gun does nothing until a hand reaches for it.

Fire was the exception. Once lit, it could spread and grow with no further help from us. It was the first thing we made that carried its own momentum. And yet fire also cooked our food, warmed our nights, and eventually turned our engines. We never tried to abolish it. We learned it well enough to build hearths, furnaces, and turbines—vessels that let it do work without burning the house down.

Self-evolving AI is the first thing we have built since fire to share its defining trait. A static model stays where we put it; a model that can rewrite itself does not. Set it going and it can grow on its own. That is what makes it the next fire. The question was never whether to light it. The question is whether we understand it well enough to make it serve humanity.

Levels of Autonomy

Self-driving gave us a useful vocabulary for talking about machine autonomy. The industry doesn't ask whether a car is autonomous or not; it grades autonomy in levels, from a car that does nothing on its own to one that needs no steering wheel at all. The levels are clarifying precisely because each one hands the machine a decision that used to be ours.

AI deserves the same ladder. I find it helpful to think in three levels, and to ask at each rung the same question: what is the model now allowed to decide for itself?

Level 1 · Execution: how to act

Level 2 · Learning: what to learn

Level 3 · Goal: what to want

The ladder borrows the framing of autonomy levels from self-driving. Each level hands the model one more decision. The engineering challenge climbs with every step, and so does the safety challenge.
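For readers who think in code, the ladder can be written down as a small data structure. This is only an illustrative sketch: the names `AutonomyLevel`, `NEW_DECISION`, and `decisions_delegated` are my own, and the one assumption it encodes is that the levels are cumulative, so each rung keeps the decisions granted below it.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    """Hypothetical encoding of the three-level ladder."""
    EXECUTION = 1  # the model decides how to act
    LEARNING = 2   # the model also decides what to learn
    GOAL = 3       # the model also decides what to want


# The single new decision each level hands to the model.
NEW_DECISION = {
    AutonomyLevel.EXECUTION: "how to act",
    AutonomyLevel.LEARNING: "what to learn",
    AutonomyLevel.GOAL: "what to want",
}


def decisions_delegated(level: AutonomyLevel) -> list[str]:
    """Return every decision the model owns at `level`, lowest rung first."""
    return [NEW_DECISION[rung] for rung in AutonomyLevel if rung <= level]
```

Asking `decisions_delegated(AutonomyLevel.LEARNING)` returns the Level 1 and Level 2 decisions but not the Level 3 one, which is the question to ask at each rung.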

Level 1: execution autonomy. This is where we are today, and it is further along than people give it credit for. A base model can write code and call tools; wrap it in a good harness and it will run a task for hours, choosing which step comes next and when to stop. But all of that autonomy lives in execution. The goal is ours, and the weights never move. Picture everything the model knows as a sphere—a large one. We can prompt it to travel anywhere inside that sphere, but the boundary is fixed. It can act, tirelessly. It cannot grow.

Level 2: learning autonomy. Here the model earns the right to change itself. Point it at a hard problem—an open conjecture, a stubborn disease—and it doesn't merely attempt answers. It chooses what to learn from, runs experiments, keeps what works, and folds the results back into its own weights. The sphere expands, but only in the direction we pushed it. This is not reinforcement learning as it's usually practiced: the prompts and data aren't laid out in advance, only the goal is. The model sets the curriculum; we still set the subject.

This is the leap that matters most in the near term, and it takes more than a longer context window. Context gives a model temporary access to information. A weight update compresses experience into something durable—new circuits, abstractions, heuristics, priors. GPT-3 could not have been built by stuffing GPT-3's training data into GPT-2's prompt. In the same way, a model that learns on the job has to absorb what it verifies into its weights, not just carry more history in its context.

Building such a system takes three things. The first is expandable capacity: the model should grow with novelty-weighted experience, not just more raw samples. The second is plasticity-aware optimization: the optimizer has to decide where and how much to change the model without erasing what it already knows. The third is a closed loop: the model generates its own tasks, attempts them, verifies the results, keeps the discoveries, distills the useful traces, and updates itself.
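The closed loop is easier to see in miniature. The sketch below is a toy under loud assumptions, not a design for a real system: `ToyLearner`, `verify`, and `closed_loop` are hypothetical names, a plain dictionary stands in for weights, the "tasks" are single-digit multiplications, and the verifier is an external check the learner cannot edit. It shows only the shape of the loop: generate a task, attempt it, verify, keep what passes, distill, update.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyLearner:
    """Hypothetical stand-in for a self-updating model.

    `weights` is a plain dict here; a real system would hold model parameters.
    """
    weights: dict = field(default_factory=dict)

    def generate_task(self, rng):
        # Crude novelty signal: only propose tasks not already absorbed.
        unseen = [(a, b) for a in range(10) for b in range(10)
                  if (a, b) not in self.weights]
        return rng.choice(unseen) if unseen else None

    def attempt(self, task, rng):
        # Noisy attempt: usually right, sometimes off by one.
        a, b = task
        return a * b + rng.choice([0, 0, 1])


def verify(task, answer):
    """External verifier (assumed trustworthy and outside the learner's control)."""
    a, b = task
    return a * b == answer


def closed_loop(learner, steps, seed=0):
    """Run generate -> attempt -> verify -> keep -> distill -> update."""
    rng = random.Random(seed)
    log = []
    for _ in range(steps):
        task = learner.generate_task(rng)      # 1. generate its own task
        if task is None:                       # nothing novel left to learn
            break
        answer = learner.attempt(task, rng)    # 2. attempt it
        ok = verify(task, answer)              # 3. verify the result
        log.append((task, answer, ok))
        if ok:                                 # 4. keep only verified discoveries
            distilled = {task: answer}         # 5. distill the trace to a compact record
            learner.weights.update(distilled)  # 6. update itself
    return log
```

After `closed_loop(ToyLearner(), steps=50)`, the learner's `weights` contain only verified answers; failed attempts are logged but never absorbed. Expandable capacity and plasticity-aware optimization have no analogue in this toy, since a dictionary never forgets and never runs out of room.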

Level 3: goal autonomy. At the top, the model chooses the goal itself. It notices what it doesn't know, decides the gap is worth closing, and goes—without waiting for us to hand it a problem. This is where curiosity becomes an engine: the system is no longer only solving assigned tasks, but deciding which unknowns are worth pursuing. Alyosha Efros once put it to me well: composing music like Bach isn't the intelligent part; wanting to compose like Bach is. Level 3 is the wanting.

But "full autonomy" hides a fork, and once again self-driving drew the line first. Its top tiers separate a car that drives itself anywhere from one that drives itself only inside a mapped, bounded area—the geofence. Goal autonomy splits the same way. A model can set its own goals inside an arena we define—find me a better superconductor, and it invents every sub-goal along the path—or it can set goals with no fence at all, free to decide that superconductors were never worth its time.

Those two systems differ in the only dimension that ultimately counts: safety. A bounded goal-setter that picks the wrong sub-goal wastes compute. An unbounded one that picks the wrong goal is a different kind of risk entirely. Nearly all of the hard alignment work lives on this fence line—not in whether the model can choose, but in whether we can choose where it is allowed to.
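The fence line can be made concrete with a very small sketch. Everything here is hypothetical: `adopt_goals` and `superconductor_fence` are illustrative names, goals are plain strings, and the fence is a trivial prefix check. Writing a fence that actually captures what we mean is the hard part, and this toy does not attempt it. What it shows is the structure: the model proposes, but a human-defined predicate decides which goals may be adopted.

```python
from typing import Callable, Iterable

Goal = str
Fence = Callable[[Goal], bool]


def adopt_goals(proposed: Iterable[Goal], fence: Fence) -> tuple[list[Goal], list[Goal]]:
    """Split proposed goals into (adopted, rejected) using a human-defined fence."""
    adopted: list[Goal] = []
    rejected: list[Goal] = []
    for goal in proposed:
        (adopted if fence(goal) else rejected).append(goal)
    return adopted, rejected


def superconductor_fence(goal: Goal) -> bool:
    # Hypothetical arena: only goals tagged as superconductor work are in bounds.
    return goal.startswith("superconductor:")
```

Passing `["superconductor: screen hydride candidates", "abandon superconductors"]` through `adopt_goals` with `superconductor_fence` adopts the first goal and rejects the second. An unbounded goal-setter is the same loop with the fence removed.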

What a Learning Machine Could Do

Notice that none of the transformative near-term possibilities require Level 3. A reliable Level 2 system, with its goal still ours and its learning its own, would already be a profound achievement. Point it at drug discovery, give it a year, and it might return with a therapy we wouldn't have found in a decade. Point it at materials science and it might hand back a room-temperature superconductor. These are not superintelligences. They are systems that can learn on the job, which is the one thing no model today can truly do.

Building the Hearth

We don't need to solve everything at once, and we certainly don't need goal autonomy tomorrow. What we need is to start building the hearth—the optimization methods, the architectural patterns, the alignment frameworks—that would let us safely light a fire that learns. Fire didn't become safe because we feared it. It became safe because we built the right vessel to hold it.