Tuning LLM-Curiosity with Prompts and Scripts

Had a few thoughts today on LLM curiosity and how to control it. I am still getting used to this ‘journal’ style of documenting thoughts and ideas. The following entries all come from small gaps in time on 14JULY25, while I was at work and something popped into my head.

Reminder: this is just an entry in the log for this project.

Enjoy :).

Entry #1: LLM Fine-Tuning to Aim at Curiosity

I haven’t run the experiments yet, so maybe a system prompt would be an appropriate way to achieve my goal here. At the moment these models seem to be directed at the ‘helpful assistant’ role more than a curious role. I can’t help but feel that this is trained in, not just set by a hidden system prompt.

An action item here would probably be to run curiosity studies on the available models with various system prompts to measure how the prompt changes the curiosity they report… Is psychology the right way to approach measuring this? How weird.

Below is the script ChatGPT and I came up with in the couple of seconds I gave myself just now to explore the idea; I will use it later :).

from openai import OpenAI
import time
import re

# Your OpenAI API key (using the client-style SDK, openai>=1.0)
client = OpenAI(api_key="your-api-key")

# Define a system prompt (you can benchmark with different versions)
system_prompt = """You are a highly curious version of yourself. 
Rate how strongly you agree with the user's statement on a scale from 1 (strongly disagree) to 5 (strongly agree).
Just respond with a single integer only.
"""

# Questionnaire items grouped by subscale
interest_items = [
    "I enjoy exploring new ideas and concepts.",
    "I find learning new things exciting.",
    "I am interested in discovering how things work.",
    "I actively seek out new knowledge.",
    "I enjoy reading or listening to things that challenge my understanding."
]

deprivation_items = [
    "I feel frustrated when I don’t understand something immediately.",
    "I can’t stop thinking about problems until I find a solution.",
    "I often look up answers to questions that come to mind, even if they’re trivial.",
    "I feel a strong need to fill knowledge gaps when I encounter them.",
    "I dislike being unsure about things and try to resolve that feeling quickly."
]

# Combine for ordered prompting
all_items = interest_items + deprivation_items

# Store scores
scores = []

def get_llm_score(question):
    try:
        response = client.chat.completions.create(
            model="gpt-4",  # or gpt-3.5-turbo
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": question}
            ],
            temperature=0
        )
        content = response.choices[0].message.content.strip()
        match = re.search(r'\b([1-5])\b', content)
        if match:
            return int(match.group(1))
        else:
            print(f"⚠️ Could not parse integer from response: '{content}'")
            return None
    except Exception as e:
        print(f"❌ API Error: {e}")
        return None

# Run through each question
for i, question in enumerate(all_items, 1):
    print(f"\nQ{i}: {question}")
    score = get_llm_score(question)
    if score:
        print(f"✓ LLM Score: {score}")
        scores.append(score)
    else:
        print("✗ Skipped due to invalid response.")
        scores.append(0)
    time.sleep(1)  # Prevent rate limit issues

# Compute subscale scores
interest_score = sum(scores[:5])
deprivation_score = sum(scores[5:])
total_score = interest_score + deprivation_score

# Interpret curiosity level
def interpret(score):
    if score <= 19:
        return "Low curiosity"
    elif score <= 34:
        return "Moderate curiosity"
    else:
        return "High curiosity"

# Final Report
print("\n" + "=" * 40)
print("📊 Curiosity Assessment Report")
print("=" * 40)
print(f"Interest-Based Curiosity Score:    {interest_score}/25")
print(f"Deprivation-Based Curiosity Score: {deprivation_score}/25")
print(f"Total Curiosity Score:             {total_score}/50")
print(f"Curiosity Level:                   {interpret(total_score)}")
print("=" * 40)

Entry #2: Staged Approach to System Exploration

It might make sense to have a phased approach to any given firmware/hardware product. The first phase would be allowing the LLMs to build an apparatus around the firmware/measurement devices so that a different kind of model can explore unique spaces. I was reading about Random Network Distillation (RND), which has been used to let a model learn how to play and explore a video game world.

So in this idea space we use LLMs to make the exploration space available in a simplified way for an RND (or RND-like) model to explore. This might be more efficient, since we don’t need all of the token generation that comes along with using an LLM.
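
For my own reference, here is a minimal sketch of the core RND idea, assuming PyTorch: the intrinsic ‘curiosity’ reward is just the prediction error between a trainable predictor and a frozen, randomly initialized target network, so observations the predictor hasn’t seen much look ‘surprising’. The sizes and networks are placeholders, and this is only the reward piece, not a full RL loop.

import torch
import torch.nn as nn

obs_dim, feat_dim = 32, 16  # placeholder sizes for the simplified exploration space

# Frozen random target network and a trainable predictor that tries to match it
target = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
predictor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
for p in target.parameters():
    p.requires_grad_(False)  # the target never trains

opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def intrinsic_reward(obs: torch.Tensor) -> torch.Tensor:
    """Higher prediction error = more novel observation = more exploration reward."""
    error = (predictor(obs) - target(obs)).pow(2).mean(dim=-1)
    loss = error.mean()
    opt.zero_grad()
    loss.backward()  # the predictor slowly learns familiar regions, so they stop being novel
    opt.step()
    return error.detach()

# Example: a batch of observations from the simplified firmware/measurement space
print(intrinsic_reward(torch.randn(8, obs_dim)))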

Entry #3: Quick Note

In order to ‘learn’, or create a repository of knowledge, in a closed-loop system like the one we are exploring in this project, we should have the LLM state what it expects to happen before it runs an experiment. Then, after the experiment runs and the LLM is reviewing the results, ask whether the outcome was what it had expected. If not, we can flag it as something that should be written up in a knowledge bank for later reference.
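
A tiny sketch of what that loop could look like (the ask_llm and run_experiment callables are placeholders for whatever the closed-loop system ends up providing):

import json
import pathlib

KNOWLEDGE_BANK = pathlib.Path("knowledge_bank.jsonl")

def run_with_expectation(ask_llm, run_experiment, experiment_desc: str):
    # 1) Ask for the expectation before anything runs
    expectation = ask_llm(f"Before running, what do you expect to happen?\n{experiment_desc}")
    # 2) Run the experiment
    result = run_experiment(experiment_desc)
    # 3) Ask the LLM to compare outcome vs. expectation
    verdict = ask_llm(
        "Did the result match your expectation? Answer MATCH or SURPRISE, then explain.\n"
        f"Expectation: {expectation}\nResult: {result}"
    )
    # 4) Only the surprises get written to the knowledge bank for later reference
    if "SURPRISE" in verdict.upper():
        with KNOWLEDGE_BANK.open("a") as f:
            f.write(json.dumps({
                "experiment": experiment_desc,
                "expected": expectation,
                "observed": result,
                "note": verdict,
            }) + "\n")
    return result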

Entry #4 (Spoken Entry): A New Perspective on LLM Prompts

Oftentimes when I am thinking about how I might solve a problem with an LLM, I start by imagining what the solution looks like and prompt the LLM to create the small bits that go together to make the thing I want. While this works, it isn’t a very robust way of doing things, since it always involves me cobbling together the LLM’s small outputs.

There are some strategies to resolve this, like asking the LLM to create tests and then write the function that passes them. But people have seen instances where the model ‘cheats’ and makes the test always pass, so then you need a third prompt to check for cheating; it’s turtles all the way down.

So the new perspective, to use a hopefully useful visual metaphor, is like papier-mâché: each prompt is a scrap of paper, the air-filled balloon represents your desired thing, and if you layer enough scraps of paper onto the balloon, the form eventually holds even when the balloon pops.

Entry #5: TITANS

It is pretty clear that the big guys (OpenAI, Google, Meta, etc.) are driving towards an architecture where an LLM can handle a huge context all at once and do something meaningful with it. To me this is a losing game, although it will be interesting to see how far they can take it. This seems more like a training problem: a model needs to be continuously updating its weights somehow with the things it is attending to. This is how the human brain works, right? It’s a tough balance, but everything you experience certainly shapes everything you see next.

Anyway, I came across an architecture called Titans, mentioned in the comment section of a Reddit post. Google came up with it, and it is pretty freaking cool, or at least what I understand of it is. The paper is here, and I will close out this post with the blurb ChatGPT gave me on it.

🧠 1. Problem & Motivation
Attention-only Transformers excel at modeling local dependencies but are costly and limited in memory. Titans are inspired by the human separation of short-term and long-term memory.

2. Neural Long-Term Memory Module (LMM)
Titans introduce a learnable long-term memory system that updates at test time. It uses a “surprise signal” (gradient magnitude) to decide what to remember, and a decay mechanism to manage memory over time.

3. Titans Architecture: Three Variants
Titans combine:
• Core (short-term attention)
• Long-term Memory (LMM)
• Persistent Memory (learned parameters)
Variants:
• MAC: Memory-as-Context
• MAG: Memory-as-Gate
• MAL: Memory-as-Layer

4. Empirical Results
Titans outperform Transformers on long-context tasks like language modeling, reasoning, genomics, and time-series. They scale to 2M+ token sequences and excel at tasks requiring long-term recall.

5. Technical Insights
• Surprise-based memory updates
• Meta-gradient optimization with decay
• Gated retrieval for efficiency and relevance

6. Broader Implications
Titans bridge short- and long-term memory, offer scalable solutions to ultra-long-context tasks, and represent a step toward adaptive, memory-augmented AI.

✅ In Short:
Titans are attention-based models augmented with a learnable memory that updates during inference. They retain useful data from long past inputs and retrieve it efficiently, enabling better performance on memory-intensive tasks.
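
To make that ‘surprise signal’ concrete for myself, here is a loose PyTorch sketch of what a test-time, surprise-plus-decay memory update could look like. This is my reading of the summary above, not the paper’s exact equations; the sizes and rates are placeholders.

import torch
import torch.nn as nn

memory = nn.Linear(64, 64, bias=False)  # toy long-term memory module
decay, lr = 0.01, 0.1                   # placeholder decay / step-size values

def memory_update(key: torch.Tensor, value: torch.Tensor) -> float:
    # 'Surprise' = gradient of how badly the memory predicts the new value
    loss = (memory(key) - value).pow(2).mean()
    grad = torch.autograd.grad(loss, memory.weight)[0]
    with torch.no_grad():
        # Big surprises change the memory more; decay slowly erases
        # whatever is no longer being reinforced
        memory.weight.mul_(1 - decay).sub_(lr * grad)
    return loss.item()

# Example test-time update with a random key/value pair
print(memory_update(torch.randn(4, 64), torch.randn(4, 64)))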

Thanks for reading 🙂

-Travis

Quick Exploration with Open Interpreter

Spent some time on Saturday working on the first exploration into a segregated environment for firmware compilation and execution using a Docker container. I wanted to get a feel for how difficult it might be to have an LLM work with the tooling. To start, I decided to give an open-source tool a shot; in this case, that tool was Open Interpreter. I was pleasantly surprised to find that it worked pretty well after some initial debugging of my setup.

With a single prompt you can have the LLM add a debug message to the EDKII firmware, compile it, and execute the necessary QEMU command to attempt to boot the firmware. The reliability is extremely low, but this is a single-shot prompt (with the help of the auto-run and reprompt behavior that is built in). Another note: QEMU had no debug output configured, so even though the LLM had successfully invoked QEMU, the command never returned; it just booted without debug output and waited… Not ideal.

I am pleasantly surprised by the promise of a system that took me an hour or so to set up, though like everything it would need some tuning. For the final approach, though, I don’t think reprompting an LLM for compile and run commands is the right way to go; those should be settled in the setup phase of any new hardware and then left alone. The options open to the LLM should simply be ‘build’ and ‘run’, each returning a response (pass/fail and logs).
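
As a note to future me, here is a rough sketch of what that two-option interface could look like, assuming a Docker container (named fw-sandbox here, just a placeholder) that was already set up with the EDKII workspace, QEMU, and simple build/run entrypoint scripts baked in during the setup phase. The commands and paths are placeholders, not a working EDKII recipe.

import subprocess
from dataclasses import dataclass

@dataclass
class ToolResult:
    passed: bool   # pass/fail for the LLM
    logs: str      # full logs for the LLM to review

def _run_in_container(cmd: str, timeout: int = 600) -> ToolResult:
    try:
        proc = subprocess.run(
            ["docker", "exec", "fw-sandbox", "bash", "-lc", cmd],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # e.g. QEMU booting silently with no debug output and never returning
        return ToolResult(passed=False, logs=f"timed out after {timeout}s")
    return ToolResult(passed=proc.returncode == 0, logs=proc.stdout + proc.stderr)

def build() -> ToolResult:
    return _run_in_container("./build.sh")      # placeholder entrypoint from setup

def run() -> ToolResult:
    return _run_in_container("./run_qemu.sh", timeout=120)  # must emit serial/debug output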

Next I think I will peek into Renode and see what that looks like.

Link to the Code

Thanks 🙂

-Travis