A Hybrid Neurosymbolic and Reinforcement Learning Approach to Dynamic Decision-Making in Quest-Driven Narrative Worlds
— *Authored by **Diksha Shrivastava** (diksha-shrivastava13.github.io)*
This preliminary proposal, written in response to the project idea by Red Hen Labs, details my approach to modelling dynamic decision-making by combining neurosymbolic AI with reinforcement learning (RL). My aim is to create a framework that embodies the “wayfinding” model—one that integrates ambiguity, uncertainty, and dynamic goal setting into a richly detailed narrative environment. This proposal is strongly influenced by insights from McCubbins and Turner’s work, which emphasises the fluid, context-dependent construction of selves and the trade-offs between cognitive processing and action.
I aim to model a text-based, quest-driven fictional world—similar to those in Dune, The Three-Body Problem, Lord of the Rings, or Harry Potter—to provide a dynamic, evolving environment in which agents can make decisions, act, and learn from reward models. A quest-driven fictional world allows for dynamic goal setting, with both a higher-level objective and multiple immediate priorities. Unlike a rigid environment, a fictional world lets agents form hypotheses, blend their perceptions, and have their decisions play out over the long term; such delayed rewards are a typical challenge in reinforcement learning.
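As a minimal illustration of the delayed-reward setting (a hedged sketch for intuition only, not part of the proposed framework), the discounted return below shows how a reward that arrives many steps after a decision still shapes the value of that decision:

```python
def discounted_return(rewards, gamma=0.99):
    """Compute the discounted return G = sum_k gamma^k * r_k.

    A reward that arrives many steps after an action still contributes
    to that action's value, which is why quest outcomes that resolve
    late in the narrative can shape early decisions.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Example: a quest that pays off only at the final step.
print(discounted_return([0.0] * 20 + [1.0]))  # ~0.82 with gamma=0.99
```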
At every point, the choices available to the characters depend on their position in the fictional world’s timeline. This is strictly governed by the neurosymbolic module, which sets the priors, rules, and available actions at any given time. The module also shapes each agent’s decision-making process, depending on whether the agent can afford to think or to act.
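One possible shape for this module is sketched below (a hypothetical sketch; the class and method names are my own assumptions, not a committed design). The symbolic layer maps a position in the timeline to the rules, priors, and admissible actions an agent sees at that moment:

```python
from dataclasses import dataclass, field

@dataclass
class WorldState:
    """Position in the fictional world's timeline plus the facts known so far."""
    timestep: int
    facts: set = field(default_factory=set)

@dataclass
class Rule:
    """A symbolic rule: if all preconditions hold, the action is allowed."""
    action: str
    preconditions: frozenset

class SymbolicModule:
    """Hypothetical neurosymbolic layer: derives priors, rules, and
    admissible actions from the current timeline position."""

    def __init__(self, rules, priors):
        self.rules = rules      # list[Rule]
        self.priors = priors    # dict: action -> prior preference

    def available_actions(self, state: WorldState):
        """Return actions whose preconditions are satisfied by the known facts."""
        return [r.action for r in self.rules if r.preconditions <= state.facts]

    def action_priors(self, state: WorldState):
        """Attach priors to the admissible actions only."""
        return {a: self.priors.get(a, 0.0) for a in self.available_actions(state)}

# Example: early in the timeline, only some quest actions are admissible.
rules = [
    Rule("enter_forbidden_forest", frozenset({"has_permission"})),
    Rule("ask_mentor_for_help", frozenset()),
]
module = SymbolicModule(rules, priors={"ask_mentor_for_help": 0.7})
state = WorldState(timestep=3, facts=set())
print(module.available_actions(state))  # ['ask_mentor_for_help']
```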
Using LLMs as agents or “characters” in the fictional world would allow regulation of the cost of thinking and acting, enable the agent to simulate various scenarios before taking action, and support the blending of selves—from past memories, perceptions of other characters, and the environment at the moment of decision. Developing an explainable and interpretable model is thus necessary for the success of this project.
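A minimal sketch of how an LLM-backed character could weigh thinking against acting under a budget follows; every name, the budget mechanism, and the placeholder policy are illustrative assumptions rather than the proposal’s final design:

```python
import random

class LLMCharacter:
    """Hypothetical LLM-backed agent that pays a cost to deliberate
    (simulate scenarios) and a cost to act in the world."""

    def __init__(self, llm, think_cost=1.0, act_cost=2.0, budget=10.0):
        self.llm = llm              # any callable: prompt -> text
        self.think_cost = think_cost
        self.act_cost = act_cost
        self.budget = budget
        self.memory = []            # past events; one source of the blended self

    def blended_context(self, observation):
        """Blend past memories with the perception of the current scene."""
        return {"memories": self.memory[-5:], "observation": observation}

    def step(self, observation, available_actions):
        context = self.blended_context(observation)
        # Deliberate only while the budget allows it; otherwise act immediately.
        if self.budget >= self.think_cost + self.act_cost:
            self.budget -= self.think_cost
            hypothesis = self.llm(
                f"Simulate outcomes of {available_actions} given {context}"
            )
            self.memory.append(("hypothesis", hypothesis))
        self.budget -= self.act_cost
        action = random.choice(available_actions)  # placeholder policy
        self.memory.append(("action", action))
        return action
```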
I present pseudocode in the next section to illustrate how these modules interact.
The project’s foundation will be a simulated narrative world with a novel-like setting (imagine a universe similar to Harry Potter or a broader epic like Lord of the Rings). This environment will:
To bridge the gap between raw sensory input and high-level reasoning:
To capture the dynamic “blending of selves” and foster hypothesis generation: