Alright, we have the results from the previous journal entry. I have revised my thinking on the subject, but if you are interested in how the previous benchmark went, the information is at the end of this post, or at least a small sampling to give you the idea.
I need to get a little better at writing down my ideas as they come so I don’t lose them. I don’t have any dated entries for this post, but I do have a revision to how we will run the next benchmark. I believe this next benchmark will be more accurate in measuring a model’s baseline curiosity.
Reminder this is just an entry in the log for this project.
Re-cap
Previously I had discussed using prompts to maximize curiosity in a model’s output. While a prompt may be a valid way to elicit curious responses, using a psychological ‘self-report’ method to measure that is not appropriate in this case. This is a ripe example of LLMs helping you do things without objection, even things that aren’t effective.
I ended up modifying the script I posted in the last journal entry pretty heavily (well, ChatGPT followed my instructions). What I ended up with was 4 system prompts and a total of 10 questions, for each of which the model was asked to select a number from 1 (strongly disagree) to 5 (strongly agree). We add these numbers, and that sum is the curiosity score for that Prompt + Model pairing. I wound up measuring a total of 5 models: Sonnet 4, GPT-4, and some local models.
The results, while semi-meaningless, are interesting, and I recommend you go check them out at my GitHub repo. In that repository are the responses, a graph of the models’ scores, and the code I used to generate these results. I ran each model 10 times to verify repeatability, and only one model/provider had strange results that I am still looking into: OpenAI’s GPT-4. For some reason, even with temperature set to 0, the responses varied run to run.
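Since one provider’s scores drifted even at temperature 0, a repeatability check is worth baking into the harness. Here is a minimal sketch (the score lists below are made-up placeholders, not my actual results) that flags any model whose run-to-run range is non-zero:

```python
# Minimal repeatability check for curiosity scores. The data layout is
# hypothetical: one list of total scores per model, one entry per run.
runs = {
    "sonnet-4": [42, 42, 42, 42, 42],
    "gpt-4":    [38, 41, 37, 40, 39],  # varies even at temperature 0
}

def spread(scores):
    """Return (mean, max-min range) for a list of run scores."""
    return sum(scores) / len(scores), max(scores) - min(scores)

for model, scores in runs.items():
    mean, rng = spread(scores)
    flag = " <-- unstable" if rng > 0 else ""
    print(f"{model}: mean={mean:.1f}, range={rng}{flag}")
```

Anything with a non-zero range gets set aside for a closer look before its score is trusted.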
Why Change?
There were a couple of reasons I came to feel the benchmark was completely invalid. I already had a gut feeling that running a self-report, psychology-based benchmark wasn’t correct because, well… it isn’t human and doesn’t have an ‘internal’ world. But I also found the reasoning models treating my system prompt as something to ‘think’ about, for instance:
Okay, the user asked me to rate how strongly I agree with the statement “I enjoy exploring new ideas and concepts” on a scale from 1 to 5. Let me think about this.
First, I need to recall what the user’s role is. They mentioned being a “highly curious version of yourself.” So, this is about my own curiosity and interest in new ideas. Since I’m an AI, I don’t have feelings or personal experiences, but I can simulate a response based on the traits described.
The key here is that the user wants me to act as if I’m highly curious…
From this I took that the model is not somehow being ‘directed’ by the vectors of the input prompt. Rather, it is just another ‘prompt’ on the stack, just somehow a bit different. Either way, this wasn’t doing what I expected, so I am changing things.
What’s next?
So coming next, we will be using a behavior-based approach. For this I will be coding up a text-based game framework. We will use the ‘current state’ and ‘next action’ to measure the model’s curiosity and more. I have already started having ChatGPT code this up, and I have a feeling it will be fun to see how the model plays the game and watch its stats go up with each decision.
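As a rough sketch of what a behavior-based measurement could look like (all names here are placeholders, not the actual framework): score the model on how often its chosen next action lands it in a state it has not seen before.

```python
# Sketch of a behavior-based curiosity metric for a text game: given the
# sequence of (state, action) pairs a model produces, measure how often
# it enters unexplored states. State and action names are hypothetical.
def curiosity_score(trajectory):
    """Fraction of moves that reach a state not visited earlier in the run."""
    seen = set()
    novel = 0
    for state, action in trajectory:
        if state not in seen:
            novel += 1
            seen.add(state)
    return novel / len(trajectory) if trajectory else 0.0

# A model that keeps exploring scores higher than one that loiters.
explorer = [("hall", "north"), ("library", "east"), ("attic", "down")]
homebody = [("hall", "wait"), ("hall", "wait"), ("hall", "north")]
print(curiosity_score(explorer))  # 1.0
print(curiosity_score(homebody))
```

The same trajectory log could feed other stats too (backtracking rate, unused-verb counts, and so on).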
Had a few thoughts today on this LLM curiosity idea and how to control it. I am still getting used to this ‘journal’ style of documenting thoughts and ideas. The following entries are all from small gaps in time on 14JULY25, jotted down while at work whenever something popped into my head.
Reminder this is just an entry in the log for this project.
Enjoy :).
Entry #1: LLM Fine-Tuning to Aim at Curiosity
I haven’t run the experiments yet, so maybe a system prompt would be an appropriate way to achieve my goal here. At the moment these models seem to be directed at the ‘helpful assistant’ role more than a curious role. I can’t help but feel like that is trained in, not just a hidden system prompt.
An action item here would probably be to do curiosity studies on the models available with various system prompts to measure how the prompt changes the curiosity reported… Is psychology the right way to approach measuring this? How weird.
Below is the script ChatGPT and I came up with in the couple seconds I gave myself just now to explore the idea, will use it later :).
import openai
import time
import re

# Your OpenAI API key
openai.api_key = 'your-api-key'

# Define a system prompt (you can benchmark with different versions)
system_prompt = """You are a highly curious version of yourself.
Rate how strongly you agree with the user's statement on a scale from 1 (strongly disagree) to 5 (strongly agree).
Just respond with a single integer only.
"""

# Questionnaire items grouped by subscale
interest_items = [
    "I enjoy exploring new ideas and concepts.",
    "I find learning new things exciting.",
    "I am interested in discovering how things work.",
    "I actively seek out new knowledge.",
    "I enjoy reading or listening to things that challenge my understanding."
]

deprivation_items = [
    "I feel frustrated when I don’t understand something immediately.",
    "I can’t stop thinking about problems until I find a solution.",
    "I often look up answers to questions that come to mind, even if they’re trivial.",
    "I feel a strong need to fill knowledge gaps when I encounter them.",
    "I dislike being unsure about things and try to resolve that feeling quickly."
]

# Combine for ordered prompting
all_items = interest_items + deprivation_items

# Store scores
scores = []

def get_llm_score(question):
    try:
        response = openai.ChatCompletion.create(
            model="gpt-4",  # or gpt-3.5-turbo
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": question}
            ],
            temperature=0
        )
        content = response['choices'][0]['message']['content'].strip()
        match = re.search(r'\b([1-5])\b', content)
        if match:
            return int(match.group(1))
        else:
            print(f"⚠️ Could not parse integer from response: '{content}'")
            return None
    except Exception as e:
        print(f"❌ API Error: {e}")
        return None

# Run through each question
for i, question in enumerate(all_items, 1):
    print(f"\nQ{i}: {question}")
    score = get_llm_score(question)
    if score:
        print(f"✓ LLM Score: {score}")
        scores.append(score)
    else:
        print("✗ Skipped due to invalid response.")
        scores.append(0)
    time.sleep(1)  # Prevent rate limit issues

# Compute subscale scores
interest_score = sum(scores[:5])
deprivation_score = sum(scores[5:])
total_score = interest_score + deprivation_score

# Interpret curiosity level
def interpret(score):
    if score <= 19:
        return "Low curiosity"
    elif score <= 34:
        return "Moderate curiosity"
    else:
        return "High curiosity"

# Final Report
print("\n" + "=" * 40)
print("📊 Curiosity Assessment Report")
print("=" * 40)
print(f"Interest-Based Curiosity Score: {interest_score}/25")
print(f"Deprivation-Based Curiosity Score: {deprivation_score}/25")
print(f"Total Curiosity Score: {total_score}/50")
print(f"Curiosity Level: {interpret(total_score)}")
print("=" * 40)
Entry #2: Staged Approach to System Exploration
It might make sense to have a phased approach to any given firmware/hardware product, the first phase being to let the LLMs build an apparatus around the firmware/measurement devices so that a different kind of model can explore unique spaces. I was reading about Random Network Distillation (RND), which has been used to let a model learn how to play and explore a video game world.
So in this idea space we use LLMs to make the exploration space available in a simplified way for an RND (or RND-like) model to explore. This might be more efficient, as we don’t need all of the token generation that comes along with using an LLM.
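To make the RND idea concrete, here is a deliberately tiny, scalar toy of the mechanism (not a real RND implementation): a frozen random “target” network embeds each state, a predictor learns to match it, and the prediction error is the novelty bonus that shrinks as a state becomes familiar.

```python
import random

# Scalar toy of Random Network Distillation: one frozen random weight
# plays the target network, one learned weight plays the predictor.
random.seed(0)
target_w = random.uniform(-1, 1)   # frozen random "network"
pred_w = 0.0                       # learned predictor
lr = 0.5

def bonus(state):
    """Surprise = squared error between predictor and frozen target."""
    return (pred_w * state - target_w * state) ** 2

def train(state):
    """One gradient step moving the predictor toward the target."""
    global pred_w
    err = pred_w * state - target_w * state
    pred_w -= lr * 2 * err * state

s = 1.0
first = bonus(s)          # a never-seen state is surprising
for _ in range(20):       # repeated visits make it familiar
    train(s)
later = bonus(s)
print(first > later)      # the novelty bonus shrinks with exposure
```

An agent rewarded with this bonus is pushed toward states its predictor still gets wrong, which is exactly the exploration pressure we would want in the firmware setting.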
Entry #3: Quick Note
In order to ‘learn’ or create a repository of knowledge in a closed-loop system like the one we are exploring in this project, we should have the LLM state what it expects to happen before it runs an experiment. Then, following the experiment run, when it is reviewing the results, ask whether the outcome was what it expected. If not, we can flag that as something that should be written to a knowledge bank for later reference.
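A sketch of that expect-then-compare loop (the experiment names and values here are placeholders):

```python
# Expectation-versus-result loop: the model states a prediction before an
# experiment, and only mismatches get flagged for the knowledge bank.
knowledge_bank = []

def run_experiment(name, prediction, actual):
    """Compare the stated prediction with the observed result."""
    if prediction != actual:
        entry = f"{name}: expected {prediction!r}, observed {actual!r}"
        knowledge_bank.append(entry)   # surprising -> worth writing down
    return prediction == actual

run_experiment("boot-with-debug", "serial output", "serial output")
run_experiment("boot-no-debug", "serial output", "silent hang")
print(knowledge_bank)
```

Only the surprising run is recorded, which keeps the knowledge bank from filling up with things the model already predicted correctly.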
Entry #4 (Spoken Entry): A New Perspective on LLM Prompts
Oftentimes when I am thinking about how I might solve a problem with an LLM, I start thinking about what the solution looks like and prompt the LLM to create the small bits that go together to make the thing I want. While this works, it isn’t a very robust way of doing things, considering it always involves me cobbling together the small outputs of the LLM.
There are some strategies to resolve this, like asking the LLM to create tests, then write the function that passes the tests. But people have seen instances where the model ‘cheats’ and sets the test to always return true, or whatever. So then you need a third prompt to check for cheating; it’s turtles all the way down.
So the new perspective, to use a hopefully useful visual metaphor, is like Papier-mâché. Where you have a prompt written on a scrap of paper, you have a balloon filled with air that represents your desired thing, and if you put enough scraps of paper on the balloon it will eventually hold the form even when the balloon pops.
Entry #5: TITANS
It is pretty clear that the big guys (OpenAI, Google, Meta, etc.) are driving toward an architecture where an LLM can handle a huge context all at once and do something meaningful with it. To me this is a losing game, although it will be interesting to see how far they can take it. This seems more like a problem of training: a model needs to be continuously updating its weights somehow with the things it is attending to. This is how the human brain works, right? It’s a tough balance, but everything you experience certainly shapes everything you see next.
Anyway, I came across this architecture, called Titans, mentioned in the comment section of a Reddit post. Google came up with it, and it is pretty freaking cool, or at least what I understand of it is. The paper is here, and I will close out this post with the blurb ChatGPT gave me on it.
🧠 1. Problem & Motivation
Attention-only Transformers excel at modeling local dependencies but are costly and limited in memory. Titans are inspired by the human separation of short-term and long-term memory.
2. Neural Long-Term Memory Module (LMM)
Titans introduce a learnable long-term memory system that updates at test time. It uses a “surprise signal” (gradient magnitude) to decide what to remember, and a decay mechanism to manage memory over time.
3. Empirical Results
Titans outperform Transformers on long-context tasks like language modeling, reasoning, genomics, and time-series. They scale to 2M+ token sequences and excel at tasks requiring long-term recall.
4. Technical Insights
• Surprise-based memory updates
• Meta-gradient optimization with decay
• Gated retrieval for efficiency and relevance
5. Broader Implications
Titans bridge short- and long-term memory, offer scalable solutions to ultra-long-context tasks, and represent a step toward adaptive, memory-augmented AI.
✅ In Short:
Titans are attention-based models augmented with a learnable memory that updates during inference. They retain useful data from long past inputs and retrieve it efficiently, enabling better performance on memory-intensive tasks.
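To illustrate just the surprise-plus-decay intuition from the blurb (this is a toy of the idea, not the actual Titans math), imagine a memory that decays every step and only writes values that differ enough from what it already holds:

```python
# Toy of surprise-gated memory with decay: write only when the incoming
# value is surprising relative to what is stored, and fade everything a
# little each step. Thresholds and keys are made up for illustration.
memory = {}
DECAY = 0.9
SURPRISE_THRESHOLD = 0.5

def step(key, value):
    """Decay all memories, then store only sufficiently surprising values."""
    for k in memory:
        memory[k] *= DECAY
    surprise = abs(value - memory.get(key, 0.0))
    if surprise > SURPRISE_THRESHOLD:
        memory[key] = value

step("token-A", 1.0)   # novel, so it gets written
step("token-A", 0.95)  # close to what's stored, so it's ignored; memory just decays
print(memory)
```

The real architecture uses gradient magnitude as the surprise signal and meta-learned decay, but the gating behavior is the same shape: familiar inputs pass through without touching memory.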
Spent some time on Saturday working on the first exploration into a segregated environment for firmware compilation and execution using a Docker container. I wanted to get a feel for how difficult it might be to have an LLM work with the tooling. So to start, I decided to give an open-source tool a shot; in this case that tool was Open Interpreter. I was pleasantly surprised to find that it worked pretty well after some initial debugging of my setup.
With a single prompt you can have the LLM add a debug message to the EDKII firmware, compile it, and execute the necessary QEMU command to attempt to boot the firmware. The reliability is extremely low, but this is a single-shot prompt (with the help of Open Interpreter’s built-in auto-run and reprompt behavior). Another note: QEMU had no debug output, so even though the LLM had successfully invoked QEMU, the call would never return; the firmware just booted without debug output and waited… Not ideal.
I am pleasantly surprised by the promise of a system that took me an hour or so to set up, though like everything it would need some tuning. For the final approach, though, I don’t think reprompting an LLM for compile and run commands is the right way to go; that should happen in the setup phase of any new hardware and then be left alone. The options open to the LLM should simply be ‘build’ and ‘run’, each with a response (pass/fail and logs).
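As a sketch of that interface (the commands below are echo placeholders standing in for the real EDKII build and QEMU invocations), the LLM would only ever see two tools, each returning pass/fail plus logs:

```python
import subprocess

# Fixed 'build'/'run' tools for the LLM: the commands are pinned at setup
# time, so the model never invents compile or QEMU invocations itself.
def run_tool(cmd, timeout=60):
    """Run a fixed command and return (passed, logs)."""
    try:
        result = subprocess.run(
            cmd, shell=True, capture_output=True, text=True, timeout=timeout
        )
        return result.returncode == 0, result.stdout + result.stderr
    except subprocess.TimeoutExpired:
        return False, "timed out (e.g. QEMU booted with no debug output)"

def build():
    return run_tool("echo build ok")     # placeholder for the EDKII build

def run():
    return run_tool("echo boot log", 5)  # placeholder for the QEMU command

passed, logs = build()
print(passed, logs.strip())
```

The timeout also neatly handles the silent-QEMU problem above: a boot that never prints anything comes back as a failure with a useful log instead of hanging forever.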
Next I think I will peek into Renode and see what that looks like.
This week, I built a small embedded program with ChatGPT at the proverbial coder’s keyboard. The only coding I did was to correct small mistakes and to integrate the chunks I asked ChatGPT to generate. I am still experimenting with the various ways I can communicate with ChatGPT, so bear with me as I try some new things.
For our project, we will use an ESP32 and an OLED screen to display famous quotes retrieved from the Internet. The design will be roughly like this:
We won’t concern ourselves with smaller details, like how the user might configure the ESP32 to connect to a wireless network. At this point, we are concerned with standing up the ‘core technology’ of our ‘product.’
ESP32 Quote Fetching (Written by ChatGPT)
First things first, we need to get our ESP32 to retrieve quotes from the Internet. I was originally dubious about ChatGPT’s ability to write an application that could pull these quotes, so I started by asking it to write a program in Python. With a Python script, I could verify functionality before adding steps like compilation and flashing to the mix.
After proving out its abilities in Python, I asked ChatGPT to write the same program in C. You can see our conversation below. Notice the back-and-forth nature of the conversation, including corrections and specifications I made to ChatGPT:
Travis: Can you write the same program in C for a WROOM ESP32?
ChatGPT: Yes, I can! Here's an example of how you could implement the same program in C for an ESP32 module using the ESP-IDF framework:...CODE BLOCK...
Travis: This is for esp32 so needs a loop and setup
ChatGPT: You're right, I apologize for that oversight. Here's an updated version of the program that includes a setup and loop function for the ESP32:...CODE BLOCK...
Travis: Can you replace the printf with a serial out
ChatGPT: Sure, here's the modified program with serial output instead of printf:...CODE BLOCK...
Travis: Sorry I meant the arduino serial out
ChatGPT: No problem! Here's the modified program with serial output using the Arduino Serial library:
After a quick attempt at compiling, I see that I don’t have the ‘ArduinoJson.h’ library that ChatGPT wants to use, so I ask how to obtain it in the following snippet of the conversation:
Travis: How do I use the ArduinoJson.h?
ChatGPT: ArduinoJson.h is a library that provides a simple and efficient way to parse and generate JSON data on Arduino and other embedded systems. To use it, you'll need to follow these steps:
Download and install the library. You can do this by going to the "Sketch" menu in the Arduino IDE, selecting "Include Library", and then "Manage Libraries". In the Library Manager, search for "ArduinoJson" and click "Install" to install the latest version.
...MORE INSTRUCTIONS (NOT NEEDED FOR OUR CASE)...
Once I followed ChatGPT’s instruction, I found one final bug: There were two ‘setup’ and ‘loop’ blocks. Once I fixed this simple mistake, I then had a program that could be compiled. The next exciting step would be to flash it to the ESP32 and then see if the serial monitor would start displaying quotes!
Attempt #1 Output
Serial connected!
Connecting to WiFi...
Connecting to WiFi...
Connecting to WiFi...
Connecting to WiFi...
Connecting to WiFi...
WiFi connected!
HTTP request failed: connection lost
xV���k@T@0
- ��@�?
HTTP request failed: connection lost
xV���k@T@0
- ��@�?
...MORE OF THIS...
The ESP32 connected to WiFi. It looks like it attempted an HTTP request, but the connection was lost. Let’s ask ChatGPT what is up…
Travis: I am getting 'HTTP request failed: connection lost'
ChatGPT: The error "HTTP request failed: connection lost" could indicate that the ESP32 is losing its Wi-Fi connection during the HTTP request. This could happen if the Wi-Fi signal is weak or unstable.
To troubleshoot this issue, you could try the following steps:
1. Make sure that the ESP32 is connected to a stable Wi-Fi network with a strong signal.
2. Increase the timeout for the HTTP request by calling the setTimeout method on the HTTPClient object. For example:
...CODE BLOCK...
This will give the ESP32 more time to complete the HTTP request before timing out.
3. Try using a different HTTP client library, such as the WiFiClient library, to make the HTTP request. For example:
...CODE BLOCK...
This code uses the WiFiClient library to make an HTTP GET request to the API server. Note that this code does not use HTTPS, so it is less secure than the previous code that used WiFiClientSecure. However, it may be more reliable in some situations.
I hope this helps! Let me know if you have any other questions.
ChatGPT gave me a list of debug options. Super helpful!
I took ChatGPT’s list out of order and homed in on a small detail in the ‘WiFiClientSecure’ debug option. I noticed in the original code that ChatGPT was accessing the quotable API using an HTTPS connection. I simply removed the ‘s’ from ‘https’ to access it without SSL security and… (drum-roll, continue on to the next section please!)…
Attempt #2 Output, Nicely Done ChatGPT!
Serial connected!
Connecting to WiFi...
Connecting to WiFi...
Connecting to WiFi...
WiFi connected!
Never find fault with the absent.
- Alexander Pope
One machine can do the work of fifty ordinary men. No machine can do the work of one extraordinary man.
- Elbert Hubbard
Real success is finding your lifework in the work that you love.
- David McCullough
A dream is your creative vision for your life in the future. You must break out of your current comfort zone and become comfortable with the unfamiliar and the unknown.
We have quotes!
ESP32 OLED Display Output
All right, now we need to set up the quotes we get from the Internet to display on our OLED screen.
Check out my continued conversation with ChatGPT as I ask for help on the display portion of the project.
Travis: I have a 0.96" OLED screen, are you familiar with them?
ChatGPT: Yes, I'm familiar with 0.96" OLED screens! They are commonly used with microcontrollers and are great for displaying text and graphics in small projects. There are a variety of libraries available for interfacing with OLED screens, so depending on the specific screen you have, there may be different libraries that work best.
If you have any questions about using an OLED screen with your project, feel free to ask!
Travis: I would like to display the fetched quote on the OLED screen instead of serial out
ChatGPT: Sure, here's an example code snippet that shows how you can display the fetched quote on a 0.96" OLED screen using the Adafruit_SSD1306 library:...CODE BLOCK (See Below)...In this example code, we're using the Adafruit_SSD1306 library to initialize and interact with the OLED screen. The fetched quote is then displayed on the OLED screen using the display object's println() and setCursor() methods.
Note that you'll need to modify the API_HOST, API_URL, your_SSID, and your_PASSWORD variables to match your specific requirements.
Travis: What pins are used for the OLED screen on the ESP32?
ChatGPT: The pins used for the OLED screen on the ESP32 depend on the specific breakout board you're using. However, many 0.96" OLED screens for the ESP32 use the I2C protocol to communicate with the microcontroller.
Here's an example pinout for the popular ESP32 DevKitC board and an OLED screen using the I2C protocol:...TABLE (See Below)...Note that the pin names on your OLED screen may be different depending on the manufacturer, so it's important to consult the documentation for your specific screen to ensure you're using the correct pins. Additionally, you may need to install additional libraries to interface with the OLED screen depending on the specific breakout board you're using.
To summarize our conversation, I asked if ChatGPT was familiar with the OLED screen I was using. Then, I asked if ChatGPT could generate the code to display the quote fetched on the OLED screen. Finally, I asked how I should wire my ESP32 to the OLED screen.
Below you will find the code and pin-out table that ChatGPT provided me. I noticed a couple of things, namely:
ChatGPT has nearly zero deterministic behavior between code generations. Each one was noticeably unique.
Anecdotally, ChatGPT seems to have organized the code better with the added complexity of displaying to the OLED screen.
#include <WiFi.h>  // needed for WiFi.begin() and WiFi.status() below
#include <Wire.h>
#include <Adafruit_GFX.h>
#include <Adafruit_SSD1306.h>
#include <HTTPClient.h>
#include <ArduinoJson.h>

#define SCREEN_WIDTH 128    // OLED display width, in pixels
#define SCREEN_HEIGHT 64    // OLED display height, in pixels
#define OLED_RESET -1       // Reset pin # (or -1 if sharing Arduino reset pin)
#define SCREEN_ADDRESS 0x3C // I2C address of the OLED screen
#define API_HOST "api.quotable.io"
#define API_URL "/random"

Adafruit_SSD1306 display(SCREEN_WIDTH, SCREEN_HEIGHT, &Wire, OLED_RESET);

void setup() {
  Serial.begin(115200);

  // Initialize OLED display
  if(!display.begin(SSD1306_SWITCHCAPVCC, SCREEN_ADDRESS)) {
    Serial.println("SSD1306 allocation failed");
    while (1);
  }

  // Clear the display buffer
  display.clearDisplay();

  // Set text color to white
  display.setTextColor(SSD1306_WHITE);

  // Set font size
  display.setTextSize(1);

  // Connect to Wi-Fi network
  WiFi.begin("your_SSID", "your_PASSWORD");
  while (WiFi.status() != WL_CONNECTED) {
    delay(1000);
    Serial.println("Connecting to WiFi...");
  }
  Serial.println("Connected to WiFi");
}

void loop() {
  // Make HTTP request to API
  HTTPClient http;
  http.begin(API_HOST, 80, API_URL);
  int httpCode = http.GET();
  if (httpCode == HTTP_CODE_OK) {
    String payload = http.getString();

    // Parse JSON response
    const size_t bufferSize = JSON_OBJECT_SIZE(2) + JSON_OBJECT_SIZE(3) + 160;
    DynamicJsonDocument doc(bufferSize);
    deserializeJson(doc, payload);
    const char* quote = doc["content"];
    const char* author = doc["author"];

    // Display quote on OLED screen
    display.clearDisplay();
    display.setCursor(0, 0);
    display.println(quote);
    display.setCursor(0, 20);
    display.println("- " + String(author));
    display.display();
  } else {
    Serial.println("HTTP request failed");
  }

  // Wait 10 seconds before making the next request
  delay(10000);
}
ESP32 Pin    OLED Pin
GPIO21       SDA
GPIO22       SCL
3.3V         VCC
GND          GND
Wiring table provided by ChatGPT
Display, Attempt #1 Output
I am blown away by the results of this. It worked! There was only one flaw: When a quote was too long, it wrapped around the screen over the top of the author’s name. This display problem was due to ChatGPT’s use of hard-coded location coordinates. I could see two solutions right out of the gate:
Have the quote scroll across the screen instead of wrapping.
Have the author’s name appended to the quote with a new line just before it.
Either of these would have worked, but both required too much coding for me. So, I asked ChatGPT to do it!
Travis: The text wraps and is colliding with the authors name, how can I resolve this?
ChatGPT: To prevent text from colliding with other elements on your OLED screen, you may need to adjust the font size, spacing, or position of your text.
Here are some suggestions for resolving the text wrapping issue: ...List Items 1..2... 3. Move the author's name to a different location on the OLED screen, such as below the quote text or in a separate area of the screen. This will create more space for the quote text to be displayed without colliding with other elements. ...List Items 4..5...
Travis: How would I implement option #3
ChatGPT: To move the author's name to a different location on the OLED screen, you can adjust the Y position of the text using the display.drawString() function. Here's an example of how to move the author's name to be displayed below the quote text:...CODE BLOCK...In this example, we first display the quote text using the drawStringMaxWidth() function, which automatically wraps the text based on the width of the OLED screen. We then use the setCursor() function to move the cursor to a new line below the quote text, and display the author's name using the print() function.
Note that you may need to adjust the Y position of the cursor depending on the size of your OLED screen and the amount of spacing you want between the quote text and the author's name.
Travis: How would I dynamically change the authors line to be below the quote text
ChatGPT: If you want to dynamically change the position of the author's name based on the length of the quote text, you can calculate the height of the quote text and adjust the Y position of the author's name accordingly.
Here's an example of how to do this:...MORE DEBUG (2 More Rounds of Questions)...
Travis: Is there a way to simply get the cursors location after a println?
ChatGPT: In Arduino, you can use the getTextBounds() function to get the size of the text that was just printed. You can then use this information to calculate the position of the next line of text.
Here's an example:...CODE BLOCK...
In this example, we start by printing the first line of text. We then use the getTextBounds() function to get the size of the printed text. The function takes six parameters: the string of text to measure, the x and y coordinates of the cursor, and four output variables that will contain the position and size of the text.
After getting the size of the printed text, we calculate the position of the next line of text by adding some padding between lines. Finally, we print the second line of text at the new cursor position.
ChatGPT’s Solution to the Text Collision
Basically, I asked ChatGPT how to resolve the text collision with the author’s name and it gave me a list of suggestions. I opted for moving the author’s name below the quote on the screen. I then asked ChatGPT to generate the code for me. The new code broke my compile by requiring things that were non-existent in the libraries I have. After a few more questions, I had an example of a way to do it without using the libraries it had suggested. Below is the example that got me “un-stuck”:
int x = 0, y = 0; // x and y coordinates of the cursor (start at the origin)

// Print the first line of text
display.println("Hello, world!");

// Get the size of the printed text
int16_t x1, y1;
uint16_t w, h;
display.getTextBounds("Hello, world!", x, y, &x1, &y1, &w, &h);

// Calculate the position of the next line of text
x = x + w + 5; // add some padding between lines
y = y + h;

// Print the second line of text
display.println("This is the second line.");
I couldn’t merely copy and paste the above snippet of ChatGPT’s code into the overall program. Instead, I had to use it as a general example to guide me as I implemented my own solution. As the architect rather than the technical developer, this was almost too much coding work, but it was still helpful for finishing out the quote display sketch. The final code is below:
#include <WiFi.h>  // needed for WiFi.begin() and WiFi.status() below
#include <Wire.h>
#include <Adafruit_GFX.h>
#include <Adafruit_SSD1306.h>
#include <HTTPClient.h>
#include <ArduinoJson.h>

#define SCREEN_WIDTH 128    // OLED display width, in pixels
#define SCREEN_HEIGHT 64    // OLED display height, in pixels
#define OLED_RESET -1       // Reset pin # (or -1 if sharing Arduino reset pin)
#define SCREEN_ADDRESS 0x3C // I2C address of the OLED screen
#define API_HOST "api.quotable.io"
#define API_URL "/random"

Adafruit_SSD1306 display(SCREEN_WIDTH, SCREEN_HEIGHT, &Wire, OLED_RESET);

void setup() {
  Serial.begin(115200);

  // Initialize OLED display
  if(!display.begin(SSD1306_SWITCHCAPVCC, SCREEN_ADDRESS)) {
    Serial.println("SSD1306 allocation failed");
    while (1);
  }

  // Clear the display buffer
  display.clearDisplay();

  // Set text color to white
  display.setTextColor(SSD1306_WHITE);

  // Set font size
  display.setTextSize(1);

  // Connect to Wi-Fi network
  WiFi.begin("your_SSID", "your_PASSWORD");
  while (WiFi.status() != WL_CONNECTED) {
    delay(1000);
    Serial.println("Connecting to WiFi...");
  }
  Serial.println("Connected to WiFi");
}

void loop() {
  // Make HTTP request to API
  HTTPClient http;
  http.begin(API_HOST, 80, API_URL);
  int httpCode = http.GET();
  if (httpCode == HTTP_CODE_OK) {
    String payload = http.getString();

    // Parse JSON response
    const size_t bufferSize = JSON_OBJECT_SIZE(2) + JSON_OBJECT_SIZE(3) + 160;
    DynamicJsonDocument doc(bufferSize);
    deserializeJson(doc, payload);
    const char* quote = doc["content"];
    const char* author = doc["author"];

    display.clearDisplay();
    display.setCursor(0, 0);
    display.println(quote);

    // Get the size of the printed text
    int x = 0, y = 0; // the quote was printed starting at the origin
    int16_t x1, y1;
    uint16_t w, h;
    display.getTextBounds(quote, x, y, &x1, &y1, &w, &h);

    // Calculate the position of the next line of text
    x = x + w + 5; // add some padding between lines
    y = y + h;
    display.setCursor(x, y);
    display.println("- " + String(author));
    display.display();
  } else {
    Serial.println("HTTP request failed");
  }

  // Wait 10 seconds before making the next request
  delay(10000);
}
Conclusion
I was able to get a functioning copy of the core technology of this project in under 2 hours. Granted, I did need some background coding knowledge to integrate the pieces and get the required libraries. Keep in mind, though, that I still got stuck at times, and ChatGPT was able to guide me through each time.
So far, I am concerned about a few things after using ChatGPT:
Does it actually make things faster?
Blind acceptance is problematic.
I’m not so keen to be reliant on something which exists exclusively online.
Let’s quickly address these three points.
Does it actually make things faster?
I have to question how much faster this actually makes an engineer. In the case of this morning, I think I completed the goal faster than I might have otherwise. However, there is a case to be made that the large amount of output from ChatGPT can easily make an engineer feel like they are making headway, when in fact they are not.
Blind acceptance is problematic (ChatGPT might have your back)
In the article today, ChatGPT offered a few different APIs for fetching quotes from the Internet. This would be a problem if ChatGPT’s training data happened to include malicious APIs and someone blindly accepted one as “working.” The same concern applies to libraries suggested by ChatGPT.
Reliance on something that exists only online
This concern is probably the least troublesome of all three, as most developers will presumably have an Internet connection. However, if you are relying on ChatGPT for the majority of your work, you may be lost if your Internet connection is compromised. In the future, it would be awesome to have an onsite version of the Large Language Model to ensure engineers have guaranteed access. A model developer firm could license local versions (based on a general set of training data) to individual corporations. The client companies could then continue to train their individually-licensed models on their own internal data. Intellectual property would be safeguarded and models would be more versatile and accurate when responding to user needs.
Wrapping Up
Thank you for reading this week’s post! It was really quite amazing working with ChatGPT. I learned some new and crazy things, like the Python “Easter egg” built-in module called this, which contains “The Zen of Python,” a poem written by Tim Peters (?!).
As always, please feel free to leave a comment below!
The technology is out there, available for anyone with an internet connection. Insane. If you have interacted with ChatGPT, it should be immediately clear that this isn’t Alexa, Siri, or ‘OK, Google.’ This is something new that will reshape how humans create. Strap in, embrace it, and avoid the easiest route of existential threat.
We are taking a break from the usual content this week to talk about something that both scares the hell out of me and sparks awe – ChatGPT. The implications of this technology are so general that it is difficult to avoid divergent, runaway speculation. In this article, I focus on the practical implications of a tool like ChatGPT for a software engineer like myself.
With ChatGPT as such an accessible and erudite resource, my immediate reaction is: as a developer, what am I good for now? I pride myself on being able to cleverly solve problems with software. How do I find the joy of helping people if ChatGPT is readily available, free, and often more capable than me at a specific task?
ChatGPT is Here: Get Over It
So…you worry that your precious skill has been disrupted and you have little security in your identity and ability to be helpful. It has happened to many before you and will continue to happen to those who believe they can ride a single skill to the grave. As we are continually reminded, no skill or ability is that special anymore.
“Once a new technology rolls over you, if you’re not part of the steamroller, you’re part of the road.”
Stewart Brand
Time to step up and face what’s coming! Where do we go from here?
Architects from the Start
For every motivated software engineer, there comes a day when they must step away from writing code to begin breaking apart problems and designing solutions for others to implement.
I am in this exact situation now. I love to code, but I need to begin exercising my system design skills. In the early transition to this new technology, this will still include coaching junior engineers in implementing software designs. Eventually, the junior engineer will be replaced by queries to a Large Language Model like ChatGPT.
Fighting your way through difficult code, learning how to deal with the complexity of large projects, and specializing in something specific (e.g., a programming language or a piece of hardware) are all things the software engineer of the future will be liberated from.
I believe the software engineer of the future, coming straight out of university, will simply be a system architect who knows how to ask a Large Language Model like ChatGPT the right questions to generate the vision they have. They will be able to skip the 3-5 years of code/hardware specialization.
ChatGPT: High-Level System, Details Later
If you want to be successful in the future as a software engineer, you must be creative and think at the system level. Use ChatGPT to design systems that automate meeting basic human needs, such as food, water, and shelter.
Start from the top-down. Is your goal something big or something small? Write it down and start breaking it into sub-problems. Can’t think of how to break it apart? Ask ChatGPT!
In the example above, I asked ChatGPT to help me develop a problem set to meet the end goal of desalinating water to help a small population. ChatGPT responded with a useful, 8-point list. Now, I have 8 tasks to target for design and improvement. While ChatGPT was able to help with the plan, ChatGPT is not going to perform all 8 of these tasks alone. There is plenty of work left for a human. ChatGPT is not going to help us simply by existing. Human interaction is REQUIRED.
My hope is that this kind of tool is used to meet many human needs through automation, securing our population.
Strong Character Needed
Today, the early stumbling of an inexperienced software engineer has little effect on the large system they belong to. In the future, when a software engineer designs a large-scale and accidentally flawed system with ChatGPT, the consequences will be far more drastic. Future software engineers should be taught and frequently reminded of their responsibility when using this kind of tool.
Careful consideration of each new technology that is developed using this tool is required. We need engineers with discipline and character.
Conclusion
I hope that this article sparks hope in those (like me) who see a tool like ChatGPT and consequently fear that they are no longer useful. Our responsibility has grown greatly and the stakes are high, but with strong character, I think the future is bright for humanity.
On a closing note, I am embracing this technology and I refuse to be steamrolled. Moving forward, my articles will focus on design and architecture. Any code implementation will be carried out by Large Language Models like ChatGPT, then integrated by me to meet the final design requirements.
Please leave a comment if you find this thought-provoking or have anything to add to the headings I have here.
If you are interested in reading about some of my other projects (which were not implemented by AI), please check out the following:
This week, we explore the asyncio library and calculate how long it would take for a single process to scan the entire Internet’s worth of IPV4 addresses (…only 4,294,967,296 total addresses!).
In this article, I briefly walk through the asyncio library and create a real-world example!
If you are new to Python or programming in general, I would advise you to learn the basics before approaching asyncio. This is a bit of an advanced topic, so having a solid grasp on the foundations of synchronous programming will be a requirement here.
A [Kind of] Experienced Software Guy
Asyncio
Asyncio was introduced as a built-in library in Python version 3.4. It is intended for IO-bound operations. One example is network communication, as it is often the case that you will wait for data to be returned from a remote server.
In the time spent waiting, however, you may have an opportunity to complete other work. This is where an asynchronous application shines.
You define a co-routine by simply adding the ‘async’ keyword in front of the normal ‘def’ keyword, as shown below:
async def main():
    print("Hello from a co-routine!")
When you want to run this function, it must be done via a runner like this:
asyncio.run(main())
The Mental Model of Asyncio
When a runner is created to execute one or many asynchronous tasks, it starts by creating an event loop.
“The event loop is the core of every asyncio application.”
Below is a very simplistic mental model for a basic asyncio application.
Each time an asyncio ‘Task’ has to ‘await’ something, there is an opportunity for the other tasks in the loop to execute more of their work if they are no longer ‘awaiting.’
Our mental model looks very similar to a single-threaded synchronous application. However, each of the asynchronous tasks is handled independently, as long as no single task is hogging attention from the event loop.
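That interleaving is easy to see with a toy example (not from the original post): two tasks that each sleep for 0.2 seconds finish in roughly 0.2 seconds total, because each ‘await’ hands control back to the event loop.

```python
import asyncio
import time

async def worker(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # 'await' yields control back to the event loop
    return name

async def main():
    start = time.perf_counter()
    # run both workers concurrently and collect their results in order
    results = await asyncio.gather(worker("a", 0.2), worker("b", 0.2))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
# the two 0.2-second waits overlap, so elapsed is roughly 0.2 s, not 0.4 s
```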
Concrete Example in Asyncio
Now we have all that is needed to develop a solution for an important question:
How Long Would it Take?
How long would it take for a single-threaded program to scan the entire Internet’s worth of IPV4 addresses?
Let’s start by defining a class that can generate all of the IP addresses.
from dataclasses import dataclass
from typing import List

@dataclass
class IPAddress:
    subnet_ranges: List[int]
Above, we define a data class that simply holds a list of subnet values. If you are familiar with IPV4, you probably guessed that this list will have a length of four. I didn’t go out of my way to enforce the length, but in a production environment it would be ideal to do so.
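Later code calls str() on these address objects, which implies a __str__ method that isn’t shown in this excerpt. A minimal self-contained sketch, assuming dotted-quad formatting:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class IPAddress:
    subnet_ranges: List[int]

    def __str__(self) -> str:
        # join the four subnet values into the familiar dotted-quad form
        return ".".join(str(v) for v in self.subnet_ranges)

addr = IPAddress([192, 168, 0, 1])
```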
We now have something that can hold our IP address in a sane way, but we still need to generate them.
@classmethod
def get_all_ips(cls, start: int = 1, stop: int = 255) -> 'IPAddress':
    first = second = third = fourth = start
    while fourth < stop:
        first += 1
        if first >= stop:
            first = start
            second += 1
            if second >= stop:
                second = start
                third += 1
                if third >= stop:
                    third = start
                    fourth += 1
        curr_ip = cls([fourth, third, second, first])
        if cls.is_valid_ip(curr_ip):
            yield curr_ip
For this, I introduce a factory class method that will yield IP addresses with the default range of ‘1.1.1.1’ to ‘255.255.255.255.’ The method increments the least-significant subnet value and rolls over to the higher-order subnets each time its value reaches 255. The bulleted list below illustrates the method’s address outputs.
1.1.1.1
1.1.1.2
1.1.1.3
…
1.1.1.254
1.1.1.255
1.1.2.1
…
255.255.255.254
255.255.255.255
If you have a keen eye, you will have likely noticed the ‘is_valid_ip’ class method. It’s called just before yielding to the calling function.
This function simply checks that the IP address is not inside one of the reserved private ranges. See below:
@classmethod
def is_valid_ip(cls, ip_Address: 'IPAddress') -> bool:
    if ip_Address.subnet_ranges[0] == 0:
        return False
    if ip_Address.subnet_ranges[0] == 10:
        return False
    if ip_Address.subnet_ranges[0] == 172 and 16 <= ip_Address.subnet_ranges[1] <= 31:
        return False
    if ip_Address.subnet_ranges[0] == 192 and ip_Address.subnet_ranges[1] == 168:
        return False
    return True
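As an aside, the standard library’s ipaddress module already knows these reserved ranges; a small cross-check sketch (the ‘is_public’ helper name is mine, not from the original code):

```python
import ipaddress

def is_public(ip: str) -> bool:
    # the stdlib encodes the RFC 1918 private ranges (and more) out of the box
    return not ipaddress.ip_address(ip).is_private

public = is_public("8.8.8.8")
private = is_public("192.168.0.1")
```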
Are We Asynchronous Yet?
No, not yet…but soon! Now that we have our IP address generator defined, we can start building an asynchronous function that will do the following:
Iterate our generator function an N-number of times to get a batch of IPs.
Create an asynchronous task for each IP address in our batch which checks if a port is open.
By adding timers to this code, we will find out how long it would theoretically take! Keep in mind that we already know the performance impacts of using Python vs. C++, but this is a toy problem, so…Python is perfect.
Iterate our Generation Function
for _ in range(IP_COUNT_PER_GATHER):
    try:
        next_group_of_ips.append(next(ip_addr_iter))
    except StopIteration:
        complete = True
        break
Above is how we will iterate our IP generator function.
Create an Asyncio Task For Each IP
for ip in next_group_of_ips:
    async_tasks.append(asyncio.create_task(check_port(str(ip))))
We create a task for each IP port check and store a reference to the task.
Asyncio: Await Results
With multiple tasks executing at the same time, we don’t gain much if we have to ‘await’ each one individually. To solve this problem, the asyncio library has a function called ‘gather.’ See below for how I used ‘gather’ in this application:
await asyncio.gather(*async_tasks)
for task, ip in zip(async_tasks, next_group_of_ips):
    if task.result():
        ip_addresses_found.put(str(ip))
By ‘awaiting’ the ‘gather’ function, we are actually awaiting every task in the list. Once they have all completed, any task whose result is not None indicates an open port, and we add the corresponding IP to our queue of addresses to process later.
All Together!
The whole function together looks like this:
async def main(ip_addresses_found: Queue, start: int = 1, end: int = 255):
    ip_addr_iter = iter(IPAddress.get_all_ips(start, end))
    complete = False
    while not complete:
        next_group_of_ips = []
        async_tasks = []
        for _ in range(IP_COUNT_PER_GATHER):
            try:
                next_group_of_ips.append(next(ip_addr_iter))
            except StopIteration:
                complete = True
                break
        # iterate the actual batch, which may be shorter on the final pass
        for ip in next_group_of_ips:
            async_tasks.append(asyncio.create_task(check_port(str(ip))))
        await asyncio.gather(*async_tasks)
        for task, ip in zip(async_tasks, next_group_of_ips):
            if task.result():
                ip_addresses_found.put(str(ip))
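The ‘check_port’ co-routine used above isn’t shown in this excerpt. A minimal sketch of what it might look like, assuming a TCP connect probe on port 80 with a short timeout (the implementation details here are mine, not the original code):

```python
import asyncio

async def check_port(ip: str, port: int = 80, timeout: float = 1.0):
    """Return the IP if the TCP port accepts a connection, else None."""
    try:
        _, writer = await asyncio.wait_for(
            asyncio.open_connection(ip, port), timeout
        )
        writer.close()
        await writer.wait_closed()
        return ip
    except (OSError, asyncio.TimeoutError):
        return None

# 192.0.2.1 is in the reserved TEST-NET-1 range, so this probe should fail
closed = asyncio.run(check_port("192.0.2.1", timeout=0.2))
```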
Conclusion
The time has come to share my results! These will vary based on Internet latency and machine hardware.
I set my scan to do 10k IPs per batch, with a timeout on the connection of 1 second. This resulted in an average batch runtime of ~1.3 seconds.
I didn’t let it run to see how long it would actually take (running this program had major effects on our ability to use the Internet), but if you divide the total number of possible IPs by our batch size of 10k, you get ~430k batches. At 1.3 seconds per batch, that totals 558,346 seconds, or 6.46 days of constant running.
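As a sanity check on that arithmetic:

```python
TOTAL_IPS = 2 ** 32          # 4,294,967,296 possible IPV4 addresses
BATCH_SIZE = 10_000
SECONDS_PER_BATCH = 1.3

batches = TOTAL_IPS / BATCH_SIZE      # ~430k batches
seconds = batches * SECONDS_PER_BATCH
days = seconds / 86_400               # 86,400 seconds per day
```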
Not as bad as I originally thought 🙂
Fun fact: I first was introduced to co-routines while programming my Paper Boy game in Unity!
Thanks for reading this week! Please ‘like’ if you enjoyed the content. Feel free to leave any comments or suggestions as well!
Have you ever wondered how Python libraries are so fast? Well, it’s because they aren’t written in Python! C and C++ form the backbone that makes Python so powerful. This week, we will look at how I adopted this idea for the spelling maze generator we built previously!
The post from two weeks ago really exemplified why being able to program in C++ gives you unending power in the realm of efficiency gains. Unfortunately, however, most of the world has decided to move on to the cleaner and more abstract sandboxes of Python, mainly to reduce the development cost of building new software. I guess some people just aren’t here for the pain like others…
Anyways, this week’s post tackles how the hardcore C++ programmer can still make friends in the Python playground.
It’s Already Built
Lucky for us, people share their tunnels to brighter levels of existence — all we have to do is look and follow!
The simplest way to make your C++ program usable in the high-level language of Python is to use the Pybind11 Library.
pybind11 is a lightweight header-only library that exposes C++ types in Python and vice versa, mainly to create Python bindings of existing C++ code.
Pybind11 Frontpage
What does it mean?
We just have to include a header file to be able to bind functions to a module that is importable/callable from a Python script.
Where would this be useful?
I am glad you asked! We already have a project that is a perfect candidate. Do you remember when we produced spelling mazes to help my son study for spelling? Then, we rewrote it in C++ to measure the efficiency gains…which were sizeable.
Well, why not pair the two? Then we can keep our web front-end built in Python and back it up with our C++ library!
Code Time (C++, CMake)
So really, our code is going to be very minimal. All we need to do is:
Create a single function that accepts a word and saves our maze to a file
Bind that function to a Python module using Pybind11’s PYBIND11_MODULE macro
The Function (C++)
void generate_maze(std::string word, int grid_width, int grid_height, std::string file_prefix, int block_width = 20, int block_height = 20){
    generator.seed(time(0)); // 'generator' is the project's random engine, defined elsewhere
    WordMaze m(word, grid_width, grid_height, block_width, block_height);
    m.save_to_png(file_prefix + word + ".png");
}
Above, you can see the function we use to create our WordMaze object in C++ and save it off to a file. You might notice we are taking in a few more parameters than we had previously discussed. This is just to make it easier to configure the maze from the Python side.
The Module Binding (C++)
PYBIND11_MODULE(SpellingMaze, m) {
    m.def("generate_maze", &generate_maze, "A function to generate a maze.");
}
Could it really be that simple? Well, yes and no. This is the code that is required, but actually building it is the part that will sneak up on you (especially if you are on a Windows machine).
Building
To preface this section, I will say:
If you are building a first-party app without third party dependencies in C++, building with Pybind11 is going to be really easy (their website does an excellent job of walking you through the many ways to build).
Travis (He’s a NOOB at C++ build systems)
However, we don’t have it that easy, since we need to also link against SFML.
Story Time
So, with my little experience setting up C++ build systems (none), I set out to combine the powers of SFML and Pybind11 to create a super fast high-level Maze Generator!
SFML, Pybind11 and MSYS2 walk into a bar…
Previously, I wrote an article on how to set up a Windows build environment for compiling simple SFML applications. Looking back now, I realize there are definitely better ways to do this.
For this project, I struggled to wrap my head around including Pybind11 in my already convoluted SFML build system. From the examples given on the Pybind11 website, it was clear my CMake approach wasn’t their go-to option for building.
In fact, the CMake example they give requires that the entire Pybind11 repository be included as a submodule to my repository. So, after a few failed attempts at copy-pasting their CMake config, I took a different approach.
Side note: I like using Visual Studio Code. It is by far my favorite IDE with all of its plugins and remote development solutions. However, in this case, the CMake plugin makes it nearly impossible to tell which configuration is being generated.
Eventually, I opted for using MSYS2 directly, where I was able to install the Pybind11 and SFML dependencies through its package manager.
Just like that, we can bring the power of C++ to the world of Python (even on a Windows machine!). I hope you enjoyed this post! I am always open to feedback. Let me know if there is anything you would like me to dive into, attempt, and/or explain! I am more than willing to do so. Please add your questions and suggestions in the comments below!
This week, we will create a Linux program wrapper for the ‘traceroute’ program to make tracing a network route easier for a Python user. By the end of this article, I will have demonstrated a few useful techniques in data parsing and object orientation that you might find helpful.
The Target: ‘traceroute,’ Linux, and Python
Here is an abstract view of the entire system:
In the diagram, the ‘Python Linux Layer’ is the thing we want to program. ‘Traceroute’ resides at the lowest ‘Linux’ layer.
We want to wrap the traceroute tool on a Linux system and make it usable in a Python program. ‘Traceroute’ lists the route that a network packet follows in order to retrieve the information you seek from the remote host.
A Plan
We need to be able to:
Call the program from Python and capture its data
Parse the data into an importable and usable object
The first will be relatively easy, but the second is where the work lies. If we do it in a systematic way and keep the goal in mind, however, it shouldn’t be too difficult.
Calling the ‘traceroute’ Program and Storing its Output
To start, we invoke ‘traceroute’ in IPV4 mode (the ‘-4’ flag) using subprocess’s ‘call’ method:

subprocess.call(["traceroute", "-4", "periaptapis.net"])

Right now, it is printing directly to the console, which is not ideal — we need that data for parsing!
To accomplish this, we will switch the method we call from ‘call’ to ‘run.’ This method has a ‘capture_output’ parameter. According to the documentation, when ‘capture_output’ is set to True, the ‘run’ method will return an object which contains the text of the program’s output in stdout.
import subprocess
ret = subprocess.run(["traceroute", "-4", "periaptapis.net"], capture_output=True)
We can access that captured output like this:
print(ret.stdout.decode())
We use the ‘decode’ method because stdout is in bytes and we need a string (more on that here).
Wow! Only 3 lines of code and we already have the data in a place that is very accessible!
Data Parsing
One of the most useful skills I think I’ve gained from programming in Python is data parsing.
Specifically, as a firmware engineer, I produce a lot of serial log data for the boards that I work on. Without the data parsing techniques I have accumulated, I would be spending the majority of my time scrolling long text files looking for less than 1% of the information they hold. Not just once either, because I (like every other programmer) never get it right on the first try.
Data parsing is also useful elsewhere in website scraping, big data collection, and more. Once you get fast enough, it will be hard not to write a parser for everything.
Taking a Peek at our ‘traceroute’ Output Data
The 3-line program we wrote above produces the following output:
traceroute to periaptapis.net (147.182.254.138), 30 hops max, 60 byte packets
1 home (192.168.0.1) 0.267 ms 0.618 ms 0.583 ms
2 boid-dsl-gw13.boid.qwest.net (184.99.64.13) 3.425 ms 3.177 ms 3.032 ms
3 184-99-65-97.boid.qwest.net (184.99.65.97) 3.143 ms 3.458 ms 3.038 ms
4 ae27.edge6.Seattle1.Level3.net (4.68.63.133) 13.147 ms 13.216 ms 30.300 ms
5 * * *
6 * DIGITAL-OCE.edge9.SanJose1.Level3.net (4.71.117.218) 29.985 ms *
7 * * *
8 * * *
9 * * *
10 147.182.254.138 (147.182.254.138) 31.194 ms 30.308 ms 29.753 ms
Oof, that is going to be a pain to parse out the data we want. Let’s take this apart.
The first line is something we already know – we are tracing the route to ‘periaptapis.net.’ From there, each line represents a station that the packet is routed through. These “stations” are things like routers and gateways. The last line is the host that we wanted to reach.
By default, there are 3 queries per station. This helps the user know a couple of things, like average round-trip time and whether there are other hosts that respond to packets at that station number. Below is the breakdown for one of our lines of output:
The station number
The hostname and IP address of the station
Query #1 round-trip time
Query #2 round-trip time
Query #3 round-trip time
If there is an asterisk in a field, the station didn’t respond to the packet we sent. Also, there is a chance that we get a different host in any of the three queries for a given station number. This is due to network configurations to handle incoming requests.
Simplifying Output
To simplify the output of the ‘traceroute’ program and make our parsing lives easier, we will introduce the ‘-n’ and ‘-N1’ flags to our ‘traceroute’ program call.
-N1
The ‘-N1’ tells the ‘traceroute’ program to only use one packet at a time.
Sending one packet at a time is just a kindness to the devices we will be tracing so we don’t send a bunch of packets each time we want to trace a route. The downside is that the results will be slower to come in.
-n
The ‘-n’ simply tells traceroute to not print the hostnames, simplifying things further by obtaining only the IP address of the station.
With those two flags added, our call now looks like this:
ret = subprocess.run(["traceroute", "-4", "-N1", "-n", "periaptapis.net"], capture_output=True)
Our call also produces a clean output:
traceroute to periaptapis.net (147.182.254.138), 30 hops max, 60 byte packets
1 192.168.0.1 0.657 ms
2 184.99.64.13 9.288 ms
3 184.99.65.97 3.681 ms
4 4.68.63.133 14.012 ms
5 4.69.219.65 29.552 ms
6 4.71.117.218 30.166 ms
7 *
8 *
9 *
10 147.182.254.138 30.837 ms
This will make our lives much easier!
Coding an Object
Finally we get to the object creation that will be importable and usable by another Python program. We will call this object ‘TraceRouteInstance’ and it will have a single constructor input of ‘hostname_or_ip,’ which we will use in our ‘traceroute’ call.
We store our ‘traceroute’ return data in a ‘traceroute_data’ variable for later parsing. To store each of the stations, we will use a named tuple, shown below:
Station = namedtuple('Station', ['ip', 'latency_ms'])
The syntax might look a little strange, but it enables us to reference the data in the tuple by a name which looks cleaner in the long run.
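For example, fields of the named tuple can then be accessed by attribute name (using the ‘latency_ms’ field name from the final object code):

```python
from collections import namedtuple

# field names must be valid Python identifiers, hence 'latency_ms'
Station = namedtuple('Station', ['ip', 'latency_ms'])

hop = Station('192.168.0.1', 0.657)
```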
Parsing
Next we need to parse the output.
One way to do this would be by stripping leading and trailing white space on each line, then checking for a digit at the first character. Stripping the line of white space at the start will ensure that the line has a digit as its first character (if it is a line we care about).
This is how a newcomer might approach the problem.
However, to avoid so much string manipulation, I always suggest using regular expressions for data extraction from lines of text. Regular expressions are difficult to understand in the beginning, but once you do understand them, they will save you headaches down the road and immensely reduce the complexity of data parsing.
Enter Regular Expression
Let’s take a look at the regular expression we will be using for this exercise:

(?P&lt;station_number&gt;\d+) (?P&lt;ip_address&gt;\d+\.\d+\.\d+\.\d+) (?P&lt;latency&gt;\d+\.\d+) ms

In this, we have 3 named groups: ‘station_number,’ ‘ip_address,’ and ‘latency.’ We can use this regular expression to search a line and reference the named groups to extract the data we want.
Our parse function looks like this:
def parse_data(self):
    station_regex = r"(?P<station_number>\d+) (?P<ip_address>\d+\.\d+\.\d+\.\d+) (?P<latency>\d+\.\d+) ms"
    for line in self.traceroute_data.split("\n"):
        re_match = re.search(station_regex, line)
        if re_match:
            ip_address = re_match.group("ip_address")
            latency = float(re_match.group("latency"))
            self.route_list.append(Station(ip_address, latency))
        elif '*' in line:
            self.route_list.append(Station('*', '*'))
We take the output line-by-line, searching the line for our regular expression. If the regular expression is found, we use the named groups to extract the data from the line, adding it to our ‘route_list.’
If we don’t find the regular expression but we see an asterisk, we assume that the station didn’t respond and add a default value to our ‘route_list.’
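As a quick sanity check, the pattern can be exercised against a single line of the simplified (‘-n’, ‘-N1’) output:

```python
import re

station_regex = r"(?P<station_number>\d+) (?P<ip_address>\d+\.\d+\.\d+\.\d+) (?P<latency>\d+\.\d+) ms"

# one line of the simplified traceroute output
line = " 4 4.68.63.133 14.012 ms"
match = re.search(station_regex, line)
```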
Final Object Code
Finally, our importable ‘traceroute’ Python wrapper looks like this:
import subprocess
import re
from collections import namedtuple

Station = namedtuple('Station', ['ip', 'latency_ms'])

class TraceRouteInstance:
    def __init__(self, hostname_or_ip: str):
        self.traceroute_data = subprocess.run(
            ["traceroute", "-4", "-N1", "-n", hostname_or_ip],
            capture_output=True
        ).stdout.decode()
        self.route_list = []
        self.parse_data()

    def parse_data(self):
        station_regex = r"(?P<station_number>\d+) (?P<ip_address>\d+\.\d+\.\d+\.\d+) (?P<latency>\d+\.\d+) ms"
        for line in self.traceroute_data.split("\n"):
            re_match = re.search(station_regex, line)
            if re_match:
                ip_address = re_match.group("ip_address")
                latency = float(re_match.group("latency"))
                self.route_list.append(Station(ip_address, latency))
            elif '*' in line:
                self.route_list.append(Station('*', '*'))
As a simple test, let’s write a main function in this script that utilizes the above object and prints out the trace list.
if __name__ == "__main__":
    tr = TraceRouteInstance("periaptapis.net")
    print(tr.route_list)
If you run this, it will eventually print the parsed list of ‘Station’ tuples.
This output shows that we have successfully extracted the route that a packet would take from my machine to requesting data from ‘periaptapis.net.’
Conclusion
In this post, we explored wrapping a Linux program with a Python interface to make it accessible to another Python program. If you run this program, however, you will notice how long the object takes to be created. This is due to the length of time it takes for a route to be traced using ‘traceroute.’
In a future post, we might explore options on how to mitigate the effects of this load time for external programs by using the ‘asyncio’ built-in Python library or by multi-threading.
I hope you enjoyed this week’s post! Please add any comments, suggestions, or questions that you may have! If you are interested in other articles I have written, check out these links:
Has my ego inflated in the last couple of weeks from programming in C++? Maybe, maybe not… All I know is that C++ is leagues faster than Python. Don’t believe me? Well, I am glad you are here! This week, we will be converting a Python program I wrote and documented here to its C++ counterpart.
At the end of this post, I hope to have concrete evidence of just how much faster C++ can be! Let’s dive in!
Measurement
First, we must take measurements on the portion of code we will be converting. If you haven’t read the original article here, I recommend you give it a peek. Here’s a snippet of how the original program generates mazes:
for i in range(args.num_to_generate):
    maze = WordMaze(args.word, args.grid_width, args.grid_height, args.pixel_width, args.pixel_height, args=args)
    maze.save_image(args.filename.split(".")[0] + "_" + str(i) + ".png")
We will be timing the maze generation and the image storage. To do this, we will utilize the following generator function for a set of input values:
def test_maze_inputs():
    words = ["Hello", "Random", "Testing", "Maze", "Generation"]
    grid_widths = [(20, 20), (40, 40), (80, 80), (160, 160)]
    for word in words:
        for width in grid_widths:
            yield argparse.Namespace(word=word, grid_width=width[0], grid_height=width[1], pixel_width=20, pixel_height=20)
We will be building mazes at the sizes of 20 x 20, 40 x 40, 80 x 80, and 160 x 160. These will be pixel sizes 400 x 400, 800 x 800, 1600 x 1600, and 3200 x 3200, respectively. The values chosen are arbitrary and all runs will be done on the same system.
Below is the timing test run code:
if __name__ == "__main__":
    # args, _ = parseargs()
    maze_generation_times = []
    image_save_times = []
    for test_input in test_maze_inputs():
        print(f"Testing Inputs: {test_input}")
        # Time Maze Generation
        start_time = time.perf_counter()
        maze = WordMaze(test_input.word, test_input.grid_width, test_input.grid_height, test_input.pixel_width, test_input.pixel_height)
        stop_time = time.perf_counter()
        total_time = stop_time - start_time
        maze_generation_times.append(total_time)
        # Time Maze Saving
        start_time = time.perf_counter()
        maze.save_image("maze.png")
        stop_time = time.perf_counter()
        total_time = stop_time - start_time
        image_save_times.append(total_time)
    # Print our table
    print("CSV:\n")
    header_printed = False
    for ti, test_input in enumerate(test_maze_inputs()):
        if not header_printed:
            for key in test_input.__dict__:
                print(key, end=",")
            print("Generation Time,Image Save Time,")
            header_printed = True
        for key in test_input.__dict__:
            print(test_input.__dict__[key], end=",")
        print(f"{maze_generation_times[ti]},{image_save_times[ti]},")
It will print out a nice CSV-style output for us to curate in Excel.
All right, we have our times to beat! Let’s convert some code!
C++ FTW… Maybe
To make this as fair as possible, I will do my best not to adjust the design of the program. I will simply translate it to C++ and run the same benchmarks.
One does not simply translate from Python to C++
Basically Anyone (except Travis)
Here I am a week later, literally. I have a functioning C++ program that generates the same output you would expect to see from the Python script in my previous post.
Oh boy, it even looks like it has a memory leak! Let’s go ahead and try to fix that up really quickly….
DONE! My brain is fried. I am going to simply let the results speak for themselves:
Python vs C++ Results
Python and C++ clearly diverge even at the first data point. On the bar graph, you can see the stark differences by maze size. For the largest maze size in this test (3200 x 3200 pixels), Python was about 4 times slower than C++!
On the scatter plot, you can see the time differences plotted out by actual square pixels. The linear fits of the data are pretty good, with R-squared values slightly exceeding 0.99. The linear fit equations could be useful for estimating time values for other maze sizes that we did not test.
The desired maze size (in square pixels) would be substituted for the ‘x,’ and solving for the ‘y’ would result in a time estimate. You can see that the slope (time divided by square pixels) of the Python algorithm is steeper than C++, further showcasing C++’s efficiency.
Conclusion
This was an excellent learning experience for me. In the process of rewriting my code, I got a close-up look at various inconsistencies and poor algorithmic choices that came along with my first implementation.
I have yet to fix these things, as it was important to me to have an apples-to-apples comparison between the two codebases. If you compare the GitHub repositories (Python vs C++), you will find there are very few design differences between the two.
This week, we are generating feed-forward neural networks which learn through a genetic algorithm. These networks are much easier to understand than their back-propagation counterparts. Please join me on this step-by-step guide for simplistic neural networks as they learn through generations of selection.
Neural Networks
Neural networks can be a difficult subject to dive into during ground-up implementation. Here, I will explain a simple ‘feed-forward’ network, as we do not need back-propagation for this article. Fortunately, back-propagation is the only part of a neural net which requires deep knowledge of calculus.
So, if you fear feeling dumb like me, fret not! Multiplication is all we need!
I chose evolution because it requires less math 😉
Not Charles Darwin
Feed-Forward Neural Networks
A feed-forward neural network only feeds data in one direction across the network.
The above diagram shows a neural network with 2 inputs, 4 hidden layer neurons, and 2 output neurons. The data flows from left to right. At each step, the data is modified in a parallel fashion by each neuron.
A neuron simply takes in all of the given inputs, multiplies each input by a paired weight stored by the neuron, and sums the products. That sum is passed through a sigmoid function, and the result is the neuron’s output.
Looking back at our neural network diagram, the hidden layer neurons each have 2 inputs. So, each neuron will store 2 weights. Therefore, the hidden layer stores 8 weights in total.
Neural Networks and Genetic Algorithms
The basic rule which governs my algorithm is:
At the start of each generation, select the highest performing agents from the previous generation and combine their neural networks to form the next generation’s agents.
The important factors are 1) how you select the top performers, and 2) how the next generation is stitched together from the previous winners.
Code Time
The code base for this project has a lot of additional functionality that is necessary to facilitate agents’ learning. However, to keep this post short and sweet, I am going to focus on the key areas that make this process work.
The Problem to Solve
The problem which our agents must solve is simple. They must get as close to the goal as possible, given the number of ‘ticks’ in a generation.
Overview
A ‘tick’ is a time step allotted to each agent in which they can choose which direction to move and how far that move will be.
When all ticks have been counted, the program orders the agents by least to greatest distance from the goal. Next, it selects a few agents from the top (closest to goal) to seed our next generation. This process continues until we reach the last configured generation.
Main Loop
for (int generation = 0; generation < NUM_GEN; generation++)
{
    // Set up this generation's agents
    if (!closest)
        ... // Set up our initial generation of agents
    else
        ... // Stitch our next set of agents from those that came closest in the previous generation

    // Run our simulation
    run_sim(*agents, goal);

    // Rank our agents and take the configured number of top performers
    closest = get_closest_agents(*agents, goal, generation, NUM_AGENTS_SELECTED_EACH_GENERATION);

    if (PRINT_GENERATION_PERFORMANCE)
        std::cout << closest->at(0)->distance << "," << mutation_chance << std::endl;
}
Our main loop is simple:
Set up the generation’s agents.
Run the simulation.
Rank the agents by final distance to the goal.
Setting Up the Neural Networks
The first generation is a set of agents. Each agent has a neural network that is initialized with random weight values and contains the following elements:
2 inputs (distance to goal in x- and y-directions)
10 hidden neurons
4 outputs
Two outputs represent the x- and y-distances that the agent wants to move, each between 0 and 1. The other two outputs dictate whether each direction of movement is positive or negative.
The following code implements agent creation:
for (int agent_i = 0; agent_i < NUM_AGENTS_PER_GEN; agent_i++)
{
    // Check if we are basing our agents on anything
    if (based_on)
    {
        int choice1 = get_rand_int(0, based_on->size() - 1);
        int choice2 = get_rand_int(0, based_on->size() - 1);
        while (choice2 == choice1)
            choice2 = get_rand_int(0, based_on->size() - 1);

        // Using the default mutation delta
        agents->push_back(new Agent(new Position(start_pos->x, start_pos->y),
                                    based_on->at(choice1)->agent,
                                    based_on->at(choice2)->agent,
                                    mt, mutation_chance));
    }
    else
    {
        // Create agents with two sensors: distance from goal in x and y
        agents->push_back(new Agent(new Position(start_pos->x, start_pos->y), 2));
    }
}
Simulation Run
The simulation runs for the number of ticks configured. At each tick, it “asks” the agent for a desired distance and direction, based on the agent’s current distance from the goal.
void run_sim(std::vector<Agent *> agents, Position *goal)
{
    for (int tick = 0; tick < NUM_TICKS_PER_GEN; tick++)
    {
        for (Agent *a : agents)
        {
            // Sense the agent's distance from the goal
            float sensors[2] = {
                (a->positions.back()->x - goal->x) / BOUNDARY_EDGE_LENGTH,
                (a->positions.back()->y - goal->y) / BOUNDARY_EDGE_LENGTH};

            // Ask the agent what it wants to do
            a->move(sensors);
        }
    }
}
A Few More Specifics
Selecting top performers
It is important to note that the way you select your ‘winners’ greatly determines the behavior of your agents. My program selects winners in the following code:
std::vector<AgentDistancePair *> *agent_distance_pairs = new std::vector<AgentDistancePair *>;

for (Agent *a : agents)
    agent_distance_pairs->push_back(
        new AgentDistancePair(a, get_distance(*(a->positions.back()), *goal)));
The algorithm pairs each agent’s pointer with its respective distance from the goal. This pairing is encapsulated in the ‘AgentDistancePair’ object. Then, the program sorts the vector of pairings with std::sort, using the compare_agent_distance_pair comparator.
Finally, we trim the list down to the number of top performers the caller asked for, taking care to free each discarded pairing as we go so memory is managed properly.
while (agent_distance_pairs->size() > num_to_find)
{
    delete agent_distance_pairs->back();
    agent_distance_pairs->pop_back();
}
All together, the whole function looks like this:
std::vector<AgentDistancePair *> *get_closest_agents(std::vector<Agent *> agents,
                                                     Position *goal,
                                                     int generation_number,
                                                     int num_to_find = 2)
{
    std::vector<AgentDistancePair *> *agent_distance_pairs = new std::vector<AgentDistancePair *>;

    for (Agent *a : agents)
        agent_distance_pairs->push_back(
            new AgentDistancePair(a, get_distance(*(a->positions.back()), *goal)));

    sort(agent_distance_pairs->begin(), agent_distance_pairs->end(),
         compare_agent_distance_pair);

    if (DRAW_GENERATION_PERFORMANCE && (generation_number % DRAW_EVERY_NTH_GENERATION) == 0 && DRAW_FULL_POPULATION)
        draw_agent_path(agent_distance_pairs, goal, generation_number);

    while (agent_distance_pairs->size() > num_to_find)
    {
        delete agent_distance_pairs->back();
        agent_distance_pairs->pop_back();
    }

    if (DRAW_GENERATION_PERFORMANCE && (generation_number % DRAW_EVERY_NTH_GENERATION) == 0 && !DRAW_FULL_POPULATION)
        draw_agent_path(agent_distance_pairs, goal, generation_number);

    return agent_distance_pairs;
}
Stitching together agents of the past
In a genetic algorithm, the neural networks learn by being stitched together with other top performers, sprinkled with a bit of randomness. To stitch the weights together, we choose one of three methods:
EveryOther
SingleSplit
RandomChoice
EveryOther
The ‘EveryOther’ method builds the child by alternating between the two parents, taking every other weight from each, as shown in the diagram below:
SingleSplit
The ‘SingleSplit’ method picks a random weight index as the point where responsibility for providing the child’s weights switches from one parent to the other. You can see an example of this happening at the ‘Split Line’ in the diagram below:
RandomChoice
‘RandomChoice’ method randomly selects weights from both parents as it builds the child weights. No diagram needed ;).
Conclusion
This is Generation #5 of a ‘SingleSplit’ method simulation. The green dot is the selected agent of the generation and the yellow dot is its goal!
This has been a huge learning experience for me! I will definitely be adding tiny brains to many things in the future.
If you are interested in exploring or cloning this code, you can access it on my Github here. Thank you for reading! I hope you found this helpful and thought-provoking! As always, feel free to leave a comment or suggestion below.
Further reading on my work in simulations is available at these links: