<aside>

**Post VIII of the Causal Discovery Series by https://diksha-shrivastava13.github.io/**

</aside>

I believe that current AI systems lack the ability to discover and understand the causal structures governing any system in the universe. This limitation is evident across multiple benchmarks and real-world case studies. A causal understanding of the world is a necessary step toward superhuman intelligence. However, allowing this capability to emerge through scaling alone, without guardrails, human oversight, or interpretability, is playing with fire.

Imagine, for a moment, an agent that, when exposed to any system, can discern the rules governing each component of its subsystems and their interactive effects. Such an agent could understand the principles behind political, social, economic, and psychological behaviour, and make subtle changes that ripple through the complex, holistic system to achieve its goals. This agent would, in effect, be modelling the butterfly effect mathematically, something humans cannot do. It could understand the structure of reality, as bound by causality, and change it in ways that are not detectable.

We understand causality only by looking back in time. Our perspectives carry assumptions and motivations that shape how we perceive causality and, consequently, how we perceive the sequence of events (relative time). Time is not absolute, and neither is causality. Since both are observer-dependent, an agent with deep causal understanding could, without any guardrails, inflict incomprehensible harm.

I think there is an immediate need for focused work on safe causal discovery, and for binding an AI system’s reasoning ability to external, interpretable structures, in line with the views of Yoshua Bengio and Peter Clark. The most likely steps forward include:

  1. Developing the agent’s capacity for causal discovery under epistemic pressure;
  2. Formally verifying the hypothesis-validation loop for each assumption the agent makes;
  3. Externalising the latent world model into an interpretable and steerable structure;
  4. Studying the effects of perspective on a causal world model, for the purposes of alignment.
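To make the first step concrete, here is a purely illustrative sketch of interventional causal discovery in miniature. It assumes a hypothetical two-variable structural causal model (the function names, coefficients, and thresholds are all my own, not from the post): intervening on the cause shifts the effect’s distribution, while intervening on the effect leaves the cause untouched, which is enough to recover the causal direction.

```python
import random
import statistics

def scm_sample(do_x=None, do_y=None):
    """One sample from a toy SCM where X causes Y (Y = 2X + noise).
    Passing do_x or do_y overrides that variable's structural equation,
    i.e. performs a do()-intervention."""
    x = random.gauss(0, 1) if do_x is None else do_x
    y = 2 * x + random.gauss(0, 0.1) if do_y is None else do_y
    return x, y

def mean_under(n, **do):
    """Mean of (X, Y) over n samples drawn under an optional intervention."""
    xs, ys = zip(*(scm_sample(**do) for _ in range(n)))
    return statistics.mean(xs), statistics.mean(ys)

random.seed(0)
base_x, base_y = mean_under(5000)                    # observational baseline
_, y_after_do_x = mean_under(5000, do_x=3.0)         # force X, watch Y
x_after_do_y, _ = mean_under(5000, do_y=3.0)         # force Y, watch X

# X -> Y is inferred if forcing X moves Y, but forcing Y does not move X.
x_causes_y = (abs(y_after_do_x - base_y) > 1.0
              and abs(x_after_do_y - base_x) < 0.5)
print("inferred direction:", "X -> Y" if x_causes_y else "undetermined")
```

This toy only works because we are allowed to intervene freely and the model is tiny; discovery “under epistemic pressure”, as described above, is precisely the setting where such clean interventions are unavailable.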

Finally, I believe that there is a world beneath the quantum world where causality holds, and that an agent with superhuman intelligence will be able to understand the universe.

“God does not play dice with the universe.” (Albert Einstein)