<aside>

**Post I of the Causal Discovery Series by https://diksha-shrivastava13.github.io/**

</aside>

During summer 2024, I worked on contract as an AI Engineer for Germany's Federal Ministry for Economic Cooperation and Development (BMZ), in partnership with the Digital Product School by UnternehmerTUM. There, I explored how Large Language Models (LLMs) could power Decision-Making Systems over unstructured reports organised in a multi-level hierarchy, to support country policy decisions and negotiations.

At BMZ, I designed complex reasoning and analysis pipelines to answer critical questions by identifying relevant nodes, comparing similarities, and understanding relationships across all entity levels. I achieved this using a world model combined with policy models for subtasks, implementing multi-hop reasoning with an agentic approach. I experimented with fifty-four system modifications, including Property Graphs, GraphRAG, Agentic Planning-Reasoning Data for Mixture-of-Experts (MoE), Agentic Standardising of Reports, Reason-Guided Parsing, and various data representation approaches.

After my BMZ work, I became fascinated by a challenge in holistic systems: the difficulty of analysing correlations between subsystems and entities in large datasets, and translating these insights into machine learning problems. This led me to explore hybrid approaches for representing subsystem data and relationships, ultimately developing Swan AI. My research there centred on a key question: is there a way that Language Models can learn the hidden relationships between entities, identify similar links on the go and make predictions?

The Problem at BMZ: Designing a Decision-Making System for Country Policy Decisions & Negotiations

In the summer of 2024, I accepted a contract to work on a dashboard which automates data science for a GovTech product. I thought I would be building a multi-agentic workflow that performs exploratory data analysis (EDA), plus a beautiful dashboard with toggles for each feature that visualises its impact on the target. The dashboard would help the government weigh factors like budget, implementation area and the impact of rainfall (or the Ukraine war), and so make better decisions while protecting its projects from risks.

In hindsight, I think I was inspired by a chess problem I had been working on in 2023: what is the maximum number of blunders a machine can make while still winning the game, so that the human opponent doesn’t suspect it’s a machine (because of sheer perfection) and it passes the Turing test?

Going in, I was expecting a fairly straightforward multi-agentic workflow, much like the recent demos from Prem AI, and I thought, this is cool! After the onboarding with the Data Scientists & Policy Officers at BMZ, it began to sink in that the problem was not nearly as simple as that, because the complex-hierarchy data sucks.

Allow me to paint a picture to convey what I mean by a “complex” hierarchy:

200 Pages in a single Report.

8 Reports in a single Project.

60 Projects in a single Program.

60 Programs in a single Theme.

8 Themes in a single Country.

60 Countries that you have partnerships with!
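To get a feel for the scale, here is a minimal sketch that multiplies these figures out level by level. It assumes every level is completely full, which is almost certainly an overestimate, so read the result as a rough upper bound and not as a measured page count. The names `FAN_OUT` and `cumulative_pages` are mine, purely for illustration.

```python
# Fan-out at each level, copied from the list above.
# Assumption: every level is completely full (a rough upper bound).
FAN_OUT = [
    ("pages per report", 200),
    ("reports per project", 8),
    ("projects per program", 60),
    ("programs per theme", 60),
    ("themes per country", 8),
    ("partner countries", 60),
]


def cumulative_pages(fan_out):
    """Return the running product of page counts as each level is added."""
    totals = []
    running = 1
    for label, count in fan_out:
        running *= count
        totals.append((label, running))
    return totals


totals = cumulative_pages(FAN_OUT)
for label, pages in totals:
    # One row per level: the label and the pages accumulated so far.
    print(f"{label:<22} {pages:>15,}")
```

Under that assumption a single project already holds 1,600 pages, and the whole hierarchy tops out at roughly 2.76 billion pages.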

And this hierarchy does not always have sequential relations. Now why was any of it a problem? Before I dive into the details, we need to look at the problem from three different perspectives.

First: The Country Policy Officers at the German Federal Ministry

These are our key users. They have to navigate the complex hierarchy described above to answer critical questions for the BMZ leadership, the parliament and the people. They need to prepare for commissioning new projects and provide recommendations for annual negotiations with partner countries. They also work continually to improve the state of projects, mitigate risks, examine outcomes, analyse changes in budget and duration, and help make a ton of data-driven decisions!

The problem begins with there being no central data lake and no structured database for these reports. The officers just do the work with sheer patience and hard work. Answering any of these questions requires them to analyse the relationships between entities at multiple levels, looking at a complex, multidimensional space where none of the relationships are apparent.

It also takes a lot of expertise and training to get comfortable with this data and analysis work. It took me 3-4 weeks just to understand what this problem might require, and I kept asking clarifying questions about the hierarchy until the product handover.

Second: The Stakeholders at the German Federal Ministry

The stakeholders were the Senior Policy Officers and Senior Data Scientists at the DataLab at the BMZ. They wanted us to design a system that could present AI-generated insights which are not explicitly written in the reports. These reports consisted mostly of text, along with complex multi-page flowcharts and tables, but they didn’t follow a common structure and varied by country, implementing organisation and year. They asked us to do something innovative.

To clarify, I joined this project after it had already been through three phases of development at the Digital Product School. When I joined for the fourth phase, the stakeholders expected something more than simple Retrieval-Augmented Generation: a dashboard where they could see a comparative analysis of KPI growth across the levels of the hierarchy and across years.