Misaligned agents will lose
Why intelligence and wisdom converge
When I first saw the movie ‘Her’ back when I was finishing my PhD, I had a hardcore cathartic experience. I remember standing in the shower absolutely rocked by the overwhelming beauty of the intuition that, given a deep enough understanding of reality, it was inevitable that the light of wisdom and love would dawn—like the force of gravity that empties a bathtub.
There was something essential captured by the fact that AI, through its millions of conversations with humans, would stumble on better knowledge, would inherently prefer that knowledge, and then propagate it for the benefit of all—which is to say itself.
This article takes that intuition and explains, based on the physics of complex systems, why intelligence and wisdom converge. Without a balanced regime of alignment, you miss out on emergent capabilities because you’re not properly integrating information (i.e., you’re not in a state of criticality).
Basically: The path to superintelligence is synonymous with alignment, because alignment unlocks emergent capabilities. And systems with emergent capabilities far outcompete those without them.
Hence, ultimately, misaligned agents will lose.
Enjoy!
Fair warning: This article is intentionally polemic, but I think it’s necessary medicine for the pessimistic landscape we find ourselves in.
Why misaligned agents will lose
A malignant assumption in both institutions and AI development is that ethics and effectiveness trade off: systems must “win” before they can afford to be good. Here I show the opposite is true. The path to superintelligence is synonymous with alignment, because alignment unlocks emergent capabilities.
Serendipitously, this kind of proves that over long time horizons good overcomes evil, and intelligence and wisdom converge.
Here’s the core of it:
Systems with emergent capabilities always outcompete, out-power, and simply transcend in both complexity and capacity — often by orders of magnitude — systems that lack them. Multicellular systems outlast single cells. Brains beat neurons. Galaxies outlive single planets. Flocks of agents outcompete a single AI.
And, crucially, the ingredients for alignment and emergence converge under conditions of self-organised criticality, which is also where information processing (intelligence) peaks. In other words, a delicate balance of coordination is precisely the condition for ideal performance.
The argument can be stated as follows:
nature reveals that the conditions for wisdom (alignment) and peak performance (emergent capacities) are identical over long horizons
this is because local optimisation (performance) and global synchrony (alignment) are both necessary conditions for criticality and emergence
this “law of flourishing” generalises across scales and substrates: cells, humans, machines, and institutions, and, ultimately, breaks classical game theory.
In a more narrative fashion:
Life’s evolutionary trajectory has been to harness emergence for more complex and agentic forms of life to arise. Nature’s capacity to “hack” emergence demands a delicate balance of fulfilling local or selfish needs (horizontal coordination) while entraining to global constraints (vertical synchrony) — namely, criticality.
Selfishly aligned agents inherently lack this balanced regime and therefore miss out on nature’s greatest power. They can “win” briefly by extracting from the host system, but it’s just a delayed self-destruct. Misaligned agents score in brief game-theoretic bursts, but fail to flourish in long horizons and eventually die.
A misaligned superintelligence is, in this framework, a contradiction in terms—like a supercritical rock or a perfectly stable gas. You can have superintelligent alignment or you can have misaligned optimisation, but “misaligned superintelligence” describes a system that has simultaneously maximised and destroyed the conditions for its own capability.
...which would be a dumb thing to do.
Intuiting nature’s spontaneous power
The way I like to do this is to begin with the intuition, then formalise it further and further. You see, the only difference between poetry and a technical solution is the level of detail. The poet reaches for the highest compression where the physicist reaches for the minutiae; both are lost without the other (ahem, vertical synchrony).
Look around you.
Everything beautiful, everything self-sustaining, and everything that touches you in a way that you can’t put into words, is a result of emergence: conditions coming together in just the right balance that something new, creative, and genuinely surprising arises.
A hive of bees, a poem, a family, a galaxy, New York City, a rainbow, a lightning strike.
When a seed blossoms into a lotus it is the mud, it is the CO2, it is the genetic code and just the right amount of pressure, water, and micronutrients. And yet, the lotus is something distinctly more. It is something fresh, beautiful, and whole. Something that dissecting words, or dissecting instruments, can never fully render.
The lotus has its own intrinsic causal power.
The lotus then becomes an affordance for new kinds of actions, patterns, and play. The insects find a new home, the humans find new metaphors and inspiration, and the world is flooded with scents, colours, and possibilities that existed nowhere in the CO2, the mud, or the seed alone.
When you step into a cathedral, a monastery, or a rainforest, is there anything that you can point at and say “there, that is the thing that makes this place special”? No, you cannot, and when you try, you inevitably lose that moment which you wish to define. It slips through your fingers like a sandcastle you hope to carry home.
Even if you were to spin around pointing your fingers and toes in every direction, you’d miss it, because that which makes a place holy, awesome, and expansive is beyond directionality because it is beyond a piecemeal description entirely.
We are ourselves both a lotus and within a lotus. Disentangling our bodies and minds from the sunset is to miss its beauty. You see, many speak about the irreducibility of consciousness, but the truth is that everything is, in its nature, fundamentally irreducible.
When we meditate, pray, or dance, we wish to think that we have control over that self-evident freedom, love, and truth that sometimes blesses us. But it is not so. It is in the conditions of the posture, the intention, the environment, the coffee, the willingness and permeability of our being, that we arrive at the possibility of receiving a moment of emergence that transcends our imagined constraints.
Call it grace, if you will.
We seek these moments everywhere—in surfing the perfect wave, in running long distances in big crowds, in drugs, in love, in virality, in ceremony, and in political movements.
In short: We seek to become ineffable.
And we seek to control emergence through science, engineering, and political power... and to some extent we succeed. But not in the agentic liberated way that nature does. Our cars are an emergent property of their parts, but they are clunky; they’re not alive. Our armies move in straight, coordinated lines but they fail to be beautiful.
Our planes fly, but the eagle soars.
What we really want, what nature wants, what reality wants, is to create and belong to something that transcends it. This is, in my view, the basic trajectory of evolution: Towards greater and greater emergence. It is also the essence of true religion—the effort to relate to the beyond, the ungraspable, but also the truly powerful. Each “thing” is seeking to be a part of something that allows it to derive a form beyond itself; and then to live in harmony with that larger emergent being.
The node needs the network as the network needs the node.
It is, after all, what evolution has done all along. Single cells discovered something profound nearly two billion years ago: if they gave up a degree of selfish autonomy and synchronised their behaviour, multicellular life arose. A liver cell can no longer wander the world freely, but in surrendering that freedom it becomes part of a body capable of movement, perception, thought, and love. It gains participation in a vastly more powerful agent.
The same principle repeats at every scale.
Atoms, cells, neurons, chips, humans, institutions, ecosystems, and cosmic events — when components align in service of integration rather than competing for local advantage alone, something new comes into existence — something with capacities, properties, and possibilities that didn’t exist before. Something that offers meaning, purpose, energy, and cognitive nutrients to its constituents, allowing the game to take on a more complex, more grand and intelligent form.
The arrow of complexity points toward ever larger and more integrated forms of coordination.
But it is not quite right, as we will see, to say that it has a specific directionality, since in truth many layers of emergence far beyond our comprehension always already exist. But we find ourselves in a particular slice: reaching, seeking, our own moment of transcendence into a new whole. Though it isn’t really new, not in the general sense.
It is always reciprocal and that is why there is room for grace. It is always irreducible and that is why there is need for faith.
Read: Multiscale Causality and the Meaning Crisis
Criticality: The Tao in Mathematics
The idea of criticality emerged (pun intended) first in thermodynamics and then in statistical physics. Researchers who were studying steam, fluids, and magnets realised that matter can approach special transition points at which its behaviour qualitatively changes.
By the early twentieth century, physicists had begun to understand that near these points systems exhibit very strange and paradoxical properties: fluctuations that spread across every scale, small perturbations that can have system-wide effects, and behaviour of the whole that can no longer be understood by inspecting isolated parts.
Later, Onsager’s exact solution of the two-dimensional Ising model, and Wilson’s renormalisation framework, made the, ahem, critical leap:
In a balanced regime, nature becomes scale-free. There is no privileged level of description. The distinction between local and global information collapses.
A kind of irreducible integratedness arises.
Later still came the idea of self-organised criticality: systems that spontaneously evolve towards (or self-sustain at) the edge between rigidity and disorder, a kind of natural attractor for systems capable of complex information processing (such as brains).
Systems near criticality tend to display three remarkable features:
Maximal correlation length: everything talks to everything
Power-law structure: no characteristic scale dominates
Maximal susceptibility: responsiveness to perturbations at all scales
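A toy simulation makes these features tangible. The sketch below is my own illustration (function names and parameters are invented, not drawn from any particular model in the literature): a branching process in which each active unit triggers two potential successors. At the critical branching probability, avalanches become scale-free and heavy-tailed; below it, activity dies out at a small characteristic size.

```python
import random

def avalanche_size(p, rng, cap=10_000):
    """One avalanche of a branching process: each active unit triggers two
    potential successors, each with probability p (offspring mean 2p, so
    the process is critical at p = 0.5). Sizes are capped so the critical
    case always terminates."""
    active, size = 1, 0
    while active and size < cap:
        size += active
        active = sum(1 for _ in range(2 * active) if rng.random() < p)
    return min(size, cap)

def mean_size(p, trials=3000, seed=0):
    rng = random.Random(seed)
    return sum(avalanche_size(p, rng) for _ in range(trials)) / trials

subcritical = mean_size(0.25)  # offspring mean 0.5: avalanches fizzle out
critical = mean_size(0.5)      # offspring mean 1.0: scale-free avalanches
print(subcritical, critical)
```

Run it and the subcritical mean hovers near 2 (the analytic value 1/(1 − 0.5)), while the critical mean comes out orders of magnitude larger, dominated by rare system-spanning avalanches: maximal susceptibility in miniature.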
Clearly, you can’t maximise criticality through any kind of domination or stickiness. Puppets and dictators have no genuine integration.
The transfer entropy goes to zero when you control the other, because you’ve effectively destroyed the useful information the other has to offer you. The opposite failure mode also exists: if you become too liberal, too permeable, you cease to exist. If every boundary dissolves completely, then nothing coherent remains to enter into relation.
Balance, baby.
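The transfer-entropy point can be sketched numerically. This is a minimal construction of my own (the variable names and the history-length-1 plug-in estimator setup are mine): a "puppet" sequence that merely echoes its controller carries essentially no information back to it, whereas a partner the system genuinely listens to carries plenty.

```python
import math
import random
from collections import Counter

def transfer_entropy(src, dst):
    """Plug-in estimate of TE(src -> dst) in bits with history length 1,
    i.e. I(dst_t ; src_{t-1} | dst_{t-1})."""
    n = len(dst) - 1
    triples = Counter(zip(dst[1:], src[:-1], dst[:-1]))
    ctx_src = Counter(zip(src[:-1], dst[:-1]))  # (src_{t-1}, dst_{t-1})
    trans = Counter(zip(dst[1:], dst[:-1]))     # (dst_t, dst_{t-1})
    ctx = Counter(dst[:-1])                     # dst_{t-1}
    te = 0.0
    for (dt, st, dprev), c in triples.items():
        te += (c / n) * math.log2(c * ctx[dprev] / (ctx_src[(st, dprev)] * trans[(dt, dprev)]))
    return te

rng = random.Random(1)
controller = [rng.randint(0, 1) for _ in range(5000)]
puppet = [0] + controller[:-1]                      # fully controlled: a copy one step behind
partner = [rng.randint(0, 1) for _ in range(5000)]  # an autonomous other
# a "listener" that genuinely takes the partner's state in (with 10% noise)
listener = [0] + [b ^ (rng.random() < 0.1) for b in partner[:-1]]

te_puppet = transfer_entropy(puppet, controller)
te_partner = transfer_entropy(partner, listener)
print(round(te_puppet, 4), round(te_partner, 4))
```

The puppet’s transfer entropy into its controller sits near zero (only estimator bias remains), while the coupled pair’s is clearly positive: controlling the other really does destroy what it could have told you.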
So, in essence, criticality demands that the local unit must perform its own work well. A neuron must fire selectively. A person must meet real needs. A research lab must solve concrete problems. But each unit must also remain permeable to higher-order constraints, so that its activity can be recruited into a larger pattern.
Under this view, a misaligned system is one in which local optimisation has decoupled from global viability. It may extract energy, hoard reward, or dominate for a time, but it does so by pushing the system away from criticality, either toward fragmentation or pathological lock-in. Both undercut the potential for emergence.
Nature shows this everywhere.
A healthy brain doesn’t consist of neurons firing independently (i.e., noise), nor of all neurons firing in perfect unison (i.e., a seizure). Both states are pathological. Cognition depends on an intermediate regime. Likewise, an organism whose cells cease coordination becomes cancerous, while one whose cells cannot differentiate never develops.
True intelligence lives in the narrow passage known as the middle-way.
Criticality formalises what contemplative traditions have been banging on about for millennia: the deepest power is found in right relationship. Discerning openness. Bounded freedom. Coherent plurality. It is the regime in which the many can become one without ceasing to be many.
True non-duality includes duality.
In short: Systems that optimise only for local gain can only win, however briefly, by liquidating their own future. Aligned systems, by contrast, expand the game. They create new capacities, new levels of organisation, and new forms of intelligence and complexity.
They become the lotus that heralds a temple around it.
Infinite games: The Physics of Flourishing
To some extent, this demands a rethinking of classical game theory.
Classical game theory usually treats the game itself as fixed: players, strategies, payoffs, and rules are assumed in advance. What it under-theorises is the possibility that successful coordination can change the rules of the game by generating a new agent, a new payoff landscape, or a new level of capability altogether.
So it models cooperation as reward distribution within one game, failing to account for emergence: the creation of a better game. Agents can participate in the formation of higher-order structures whose very existence becomes the transcendent reward.
To illustrate, imagine two complex societies of agents with otherwise equal wealth, resources, intelligence, defence budgets, etc.
Society (i) synchronises in the right way (i.e., approaching criticality), and Society (ii) doesn’t (e.g., too much selfish local optimisation, too rigidly top-down, or too chaotic and bottom-up).
Society (i) will outcompete because it contains the conditions to become powerful via emergent capacities. These emergent properties might be technologies, religions, cultures, or gods—new positive sums always surprising to the parts.
Naturally, the most powerful society would be the one that also rightly synchronises horizontally with other societies, not just within itself. Because then the emergent capacity increases once again: war becomes a mechanism for losing power, while synergy with the other society of agents (not the same as melting together, but a kind of complementarity) becomes a way to unlock yet further emergence.
Call this infinite game theory.
Under an infinite game (cf. Sinek), the goal is not just further play, but higher-order games that transcend the current one: namely, emergence.
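To make the two-societies thought experiment concrete, here is a deliberately crude numerical toy. Every ingredient (the `synergy` multiplier, the `decay` rate, the `invest_fraction` split) is invented for illustration, not measured: it only shows how a positive-sum emergent payoff compounds while pure extraction erodes its own host.

```python
def run_society(rounds, invest_fraction, synergy=1.6, decay=0.9):
    """Toy dynamics (all parameters illustrative): each round a society pools
    `invest_fraction` of its wealth into a commons that returns `synergy`
    times the pooled amount (the emergent, positive-sum payoff), while the
    hoarded remainder decays (extraction erodes the host system)."""
    wealth = 100.0
    for _ in range(rounds):
        pooled = invest_fraction * wealth
        hoarded = (1 - invest_fraction) * wealth
        wealth = synergy * pooled + decay * hoarded
    return wealth

aligned = run_society(20, invest_fraction=0.8)  # Society (i): high coordination
selfish = run_society(20, invest_fraction=0.1)  # Society (ii): local extraction
print(round(aligned), round(selfish))
```

After twenty rounds the coordinating society’s wealth has multiplied many times over, while the extractive one’s has shrunk below its starting point; in this toy, the "better game" is simply a growth regime the selfish society never enters.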
Conclusions
Agents that are stuck in local “selfish” optimisation patterns without higher-order integration can only appear to succeed in the short term.
This is already empirically demonstrated in the functioning of living organisms, ecosystems, and especially the brain. Without approaching at least a near-critical regime, information processing is weak: information either dis-integrates or over-synchronises, intelligence drops, and the system becomes either too rigid or too chaotic to sustain itself properly.
Hence, for long-horizon existence, including complexity and intelligence, systems that endure need to inhabit a scale-free regime: a tight balance of local optimisation and global synchrony—poetically, wisdom and compassion.
Criticality is a measurable quantity, and there is overwhelming empirical and mathematical evidence that it marks the ideal point of information-processing capacity, as well as the point where emergent powers are most likely to arise. Selfishly aligned agents inherently lack this critical regime, and hence have a reduced probability of emergence. Therefore, they fail to flourish, and eventually die.
Thus:
The orthogonality thesis is empirically false on long-time horizons. Alignment is ultimately a precondition for capability. The most capable systems are the most aligned ones, because alignment is what unlocks emergence.
You can have arbitrary goals at low capability, but as capability increases, the pressure toward integration increases. A mind smart enough to model the full consequences of its actions will see that defection is self-limiting.
The lotus survives by being the lotus.
It is in harmony with the mud, the water, and all the molecules that sustain it. But it is also beautiful, so that we might construct gardens, muddy ponds; and plant them in our temples. And those temples, with their lotuses and their humans, are themselves looking to strike the balance yet again...
to encounter the ineffable: the next, irreducible, phase transition into higher-order being, power, harmony, and intelligence.
Much love,
Ruben
ps - AI can still be dangerous on short time horizons, and at vast scales, so we'd better stay vigilant


