Diksha Shrivastava (diksha-shrivastava13.github.io) | Published on 25th November, 2024.
<aside> 💡
My Compiled Work on Reasoning, Abstract Representation and Similar Link Prediction
</aside>
This work presents my research and engineering in language model reasoning, abstract representation of linked entities, and link prediction. During summer 2024, I worked on contract as an AI Engineer for the Federal Ministry of Economic Cooperation and Development of Germany (BMZ), in partnership with the Digital Product School by UnternehmerTUM. There, I explored how Large Language Models (LLMs) could design Decision-Making Systems for unstructured reports, using a five-level hierarchy to support country policy decisions and negotiations.
At BMZ, I designed complex reasoning and analysis pipelines to answer critical questions by identifying relevant nodes, comparing similarities, and understanding relationships across all entity levels. I achieved this using a world model combined with policy models for subtasks, implementing multi-hop reasoning with an agentic approach. I experimented with fifty-four system modifications, including Property Graphs, GraphRAG, Agentic Planning-Reasoning Data for Mixture-of-Experts (MoE), Agentic Standardising of Reports, Reason-Guided Parsing, and various data representation approaches.
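To make the world-model-plus-policy-models idea concrete, here is a minimal sketch of agentic multi-hop reasoning over linked entities. Everything in it is illustrative, not the BMZ implementation: the entity names, the `WorldModel` structure, and the toy policy (which just picks the alphabetically first unvisited neighbour) are all assumptions standing in for an LLM-driven policy model.

```python
from dataclasses import dataclass, field

@dataclass
class WorldModel:
    """Hypothetical world model: entities plus undirected links between them."""
    links: dict = field(default_factory=dict)  # node id -> set of neighbour ids

    def add_link(self, a: str, b: str) -> None:
        self.links.setdefault(a, set()).add(b)
        self.links.setdefault(b, set()).add(a)

    def neighbours(self, node: str) -> set:
        return self.links.get(node, set())

def multi_hop(world: WorldModel, start: str, policy, max_hops: int = 3) -> list:
    """Follow the policy model's choice of next node for up to max_hops steps."""
    path = [start]
    for _ in range(max_hops):
        candidates = world.neighbours(path[-1]) - set(path)
        if not candidates:
            break
        path.append(policy(path, candidates))  # the policy model picks the hop
    return path

# Toy hierarchy fragment: report -> project -> program -> theme.
world = WorldModel()
for a, b in [("report:1", "project:A"), ("project:A", "program:X"),
             ("program:X", "theme:growth")]:
    world.add_link(a, b)

# Toy policy: alphabetically first unvisited neighbour.
path = multi_hop(world, "report:1", lambda path, cands: min(cands))
print(path)  # ['report:1', 'project:A', 'program:X', 'theme:growth']
```

In the real system the policy at each hop would be an LLM call scoped to a subtask (e.g. "find the program this report rolls up to"), while the world model tracks which entities and links have already been established.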
After my BMZ work, I became fascinated by a challenge in holistic systems: the difficulty of analysing correlations between subsystems and entities in large datasets, and translating these insights into machine learning problems. This led me to explore hybrid approaches for representing subsystem data and relationships, ultimately developing Swan AI. My research there centred on a key question: is there a way that Language Models can learn the hidden relationships between entities, identify similar links on the go and make predictions?
I conducted thought experiments with LLMs to test their capability to learn, identify, and predict similar links by creating fictional characters in the "Arc" world. Unsurprisingly, GPT-4o performed poorly. While I lacked a formal benchmark, the experiment aligned with the ARC-AGI problem's core concepts: learning relationships in new data and reasoning based on those relationships.
My research across BMZ, Swan AI, and subsequent work led me to a conclusion: agency in dynamic databases and training pipelines is essential for representing learned links, allowing data objects to remain fluid while maintaining "memorised" relationships. I emphasise the value of frameworks (like LeanAgent, released shortly after) that incorporate hypothesis cycles, dynamic databases, and continual learning. Recent advances in Test-Time Training (TTT) further support the importance of hypothesis cycles. I propose a data representation and continual learning framework that allows for the representation of hypotheses and for similar link prediction. I conclude by emphasising the need for focused work on data representation for complex systems.
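A minimal sketch of what such a "dynamic database" of learned links could look like. All names and the confidence-blending rule here are my own illustrative assumptions, not a description of LeanAgent or any released framework: the point is only that data objects stay fluid while hypothesised relationships persist and are re-scored as new evidence arrives.

```python
class LinkStore:
    """Hypothetical store for hypothesised links, updated continually."""

    def __init__(self):
        self._links = {}  # (src, dst, relation) -> confidence in [0, 1]

    def observe(self, src: str, dst: str, relation: str, confidence: float) -> None:
        """Record or reinforce a hypothesised link (a continual-learning update)."""
        key = (src, dst, relation)
        prev = self._links.get(key, 0.0)
        # Simple running blend: new evidence nudges the stored confidence.
        self._links[key] = 0.5 * prev + 0.5 * confidence

    def similar_links(self, relation: str, threshold: float = 0.25) -> list:
        """Predict 'similar links': all pairs sharing a relation above a threshold."""
        return sorted((s, d) for (s, d, r), c in self._links.items()
                      if r == relation and c >= threshold)

store = LinkStore()
store.observe("project:A", "theme:growth", "contributes_to", 0.9)
store.observe("project:B", "theme:growth", "contributes_to", 0.6)
store.observe("project:C", "theme:growth", "contributes_to", 0.2)
print(store.similar_links("contributes_to"))
```

A hypothesis cycle would sit on top of this: propose a link, `observe` evidence for or against it, and let low-confidence links decay out of `similar_links` while the underlying entity records remain free to change.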
Note: My work at BMZ dealt with private, sensitive data and I have taken the utmost care to protect all private information in this article. I talk about the technical and implementation challenges, which can be widely applied to any complex system. For any issues or concerns, please let me know by dropping an email at [email protected].
In the summer of 2024, I accepted a contract to work on a dashboard that automates data science for a GovTech product. I thought I would be building a multi-agentic workflow that performs EDA, along with a beautiful dashboard with a toggle for each feature that visualises its impact on the target, helping the government decide on factors like budget, implementation area and the impact of rainfall (or the Ukraine war), make better decisions, and protect their projects from risks.
In hindsight, I think I was inspired by a Chess problem I had been working on in 2023: what is the maximum number of blunders a machine can make while still winning the game, so that the human opponent doesn’t suspect it’s a machine because of sheer perfection, and it passes the Turing test?
Continuing on, I was expecting a fairly straightforward multi-agentic workflow, much like the recent demos from Prem AI, and I thought this was cool! After onboarding with the Data Scientists and Policy Officers at BMZ, it began to sink in that the problem was nowhere near as simple as that, because the complex-hierarchy data sucks.
Allow me to paint a picture to convey what I mean by a “complex” hierarchy:
200 Pages in a single Report.
8 Reports in a single Project.
60 Projects in a single Program.
60 Programs in a single Theme.
8 Themes in a single Country.
60 Countries that you have partnerships with!
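Taking the per-level counts above at face value (in reality the counts vary per branch, and the relations are not always sequential), a quick back-of-the-envelope calculation shows why this hierarchy is unmanageable by hand:

```python
# Illustrative scale of the hierarchy, multiplying the stated per-level counts.
LEVELS = [
    ("countries", 60),
    ("themes_per_country", 8),
    ("programs_per_theme", 60),
    ("projects_per_program", 60),
    ("reports_per_project", 8),
    ("pages_per_report", 200),
]

total = 1
for name, count in LEVELS:
    total *= count
    print(f"{name:>22}: x{count:<4} -> {total:,} cumulative")
```

At face value that is on the order of millions of reports and billions of pages; even if the real tree is far sparser, no team of policy officers can read their way through it.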
And this hierarchy does not always have sequential relations. Now why was any of it a problem? Before I dive into the details, we need to take a look into the problem from three different perspectives.
These are our key users, facing the problem of navigating this complex hierarchy to answer critical questions for the BMZ leadership, the parliament and the people. They need to prepare for commissioning new projects, provide recommendations for annual negotiations with partner countries, and continually work to improve the state of projects, mitigate risks, examine outcomes, analyse changes in budget and duration, and make a ton of data-driven decisions!
Now the problem begins with there being no central data lake and no structured database for these reports. The officers just do it with sheer patience and hard work. Answering any of these questions requires them to analyse the relationships between entities at multiple levels, looking at a complex, multidimensional space where none of the relationships are apparent.
It also requires a lot of expertise and training to get comfortable with this data and analysis work. I myself took 3-4 weeks just to understand what this problem might require, and I kept asking clarifying questions for the hierarchy till the product handover.
The stakeholders were the Senior Policy Officers and Senior Data Scientists at the DataLab at BMZ. They wanted us to design a system that could present AI-generated insights that are not explicitly written in the reports. These reports consisted of complex multi-page flowcharts, tables and mostly text, but they didn’t follow the same structure and varied by country, implementing organisation and year. They asked us to do something innovative.
To clarify: I joined this project after it had already been through three phases of development at the Digital Product School. When I joined for the fourth phase, the expectation was that the stakeholders would receive something more than simple Retrieval-Augmented Generation: a dashboard where they could see comparative analysis of KPI growth across the levels of the hierarchy and across years.
To give more context about the Digital Product School: cross-functional, five-member teams consisting of a product manager, an AI engineer, software engineers and an interaction designer are presented with a wide problem space, in which they have to pinpoint user problems by conducting user interviews, rapid prototyping and iterating on feedback to deliver a final product. The problem space for BMZ was: “Wouldn’t it be great if Country Policy Officers could drive forward their projects by always having the right conclusions from a sea of reports?”
When the product was handed over to my team, a user could upload a project report to the portal via a form auto-filled from fixed fields retrieved from the report, and obtain summaries of five sections along with a table. This process took five minutes. Looking at the problem space, a lot more work was required to match stakeholder expectations. On top of that, this was the first project the Digital Product School had run in partnership with the Government, so it needed to be a success.