<aside>

**Post V of the Causal Discovery Series by https://diksha-shrivastava13.github.io/**

</aside>

[Conducted on 2nd October, 2024]

Following my work on implementing hybrid vector-graph databases and looking at industry use-cases, I wanted to test how good current LLMs are at Similar Link Prediction. Now, why the focus on “Similar Link Prediction”?

To explain that, it’s important to first take a look at the ARC-AGI problem and François Chollet’s paper **https://arxiv.org/abs/1911.01547**, which introduced the ARC benchmark in 2019. Chollet goes into great detail on defining intelligence, generalisation and adaptability. For our interests here, I will only touch on a select few points, although I do strongly recommend reading the whole paper.

Generalisations can be categorised in two ways:

  1. System-Centric Generalisation: Defining generalisation this way means we look at the system’s ability to adapt to situations it has not encountered before, without taking into account the prior knowledge of the developer of the system. For example, when training a machine learning algorithm, the generalisation of the system is its ability to perform on a test set it has not seen before; however, any developer designing such a system has knowledge of the “known unknowns” the system will need to handle. Thus, the generality itself (in my opinion) is limited.
  2. Developer-Aware Generalisation: This is the ability of the system to adapt to and handle situations that were not previously encountered by either the system or the developer.

Now, the degree of generalisation can be broadly defined as:

  1. Absence of Generalisation: When there is no uncertainty about the tasks to be executed, there is no generalisation required.
  2. Local Generalisation or Robustness: This is the ability of a system to handle new points from a known distribution for a single task or a well-scoped set of known tasks, given a sufficiently dense sampling of examples from the distribution; in other words, adaptation to known unknowns within a single task or a well-defined set of tasks.
  3. Broad Generalisation or Flexibility: This is the ability of a system to handle a broad category of tasks and environments without further human intervention; in other words, adaptation to unknown unknowns across a broad category of related tasks.
  4. Extreme Generalisation: This describes open-ended systems with the ability to handle entirely new tasks that share only abstract commonalities with previously encountered situations, applicable to any task and domain within a wide scope; in other words, adaptation to unknown unknowns across an unknown range of tasks and domains.

For my purpose of handling reasoning in holistic systems, I wanted broad generalisation, which would enable the system to handle a vast number of ML problems by first understanding the hidden relationships and implications between entities of different subsystems.

Now, let’s take a quick look at an ARC-AGI example before we dive into the “Arc” World.

[Image: an example ARC-AGI task, showing demonstration input-output grid pairs and a test input.]

The Arc Challenge provides a few example input-output pairs, from which you have to predict the output corresponding to the input of the test pair. The Arc Prize 2024 competition ended with a SOTA score of 55% on the public leaderboard and generated significant interest in the field of Test-Time Training (TTT).
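For readers who haven’t looked at how a task is actually packaged, here is a minimal sketch, assuming the public JSON format of the ARC dataset (each task file holds `train` and `test` lists of input/output grids of integers 0-9); the file name and the `solve` call are hypothetical placeholders:

```python
import json


def load_task(path: str) -> dict:
    """Load a single ARC task from a JSON file ({"train": [...], "test": [...]})."""
    with open(path) as f:
        return json.load(f)


def is_correct(predicted: list[list[int]], expected: list[list[int]]) -> bool:
    """A prediction only counts if every cell of the output grid matches exactly."""
    return predicted == expected


if __name__ == "__main__":
    task = load_task("data/training/0a1d4ef5.json")  # hypothetical file name

    # Demonstration pairs: these are what the solver gets to "learn" from.
    for pair in task["train"]:
        print("input:", pair["input"], "-> output:", pair["output"])

    # Held-out test input: the solver must produce the matching output grid.
    test_input = task["test"][0]["input"]
    # `solve` stands in for whatever solver (LLM, program search, TTT) is being evaluated.
    # predicted = solve(task["train"], test_input)
    # print(is_correct(predicted, task["test"][0]["output"]))
```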

My attempt here is to achieve comparably good results that are much cheaper to run in production. Just like Retrieval-Augmented Generation (RAG) allows anyone to leverage the power of LLMs for a large number of general use-cases without having to fine-tune and host their own models, there can be a data representation which allows organisations to leverage reasoning and generality without having to scale compute at inference time.

The question is, what should this data representation look like? (see next section)