ChatGPT in an Embedded World: Quickly Making A Quote Display

Image of a construction worker standing on a roof with his hands on his hips, staring out at the treeline. Quote says, "ChatGPT, Build Something Here."

This week, I built a small embedded program with ChatGPT at the proverbial coder’s keyboard. The only coding I did was to correct small mistakes and to integrate the chunks I asked ChatGPT to generate. I am still experimenting with the various ways I can communicate with ChatGPT, so bear with me as I try some new things.

For our project, we will use an ESP32 and an OLED screen to display famous quotes retrieved from the Internet. The design will be roughly like this:

We won’t concern ourselves with smaller details, like how the user might configure the ESP32 to connect to a wireless network. At this point, we are concerned with standing up the ‘core technology’ of our ‘product.’

ESP32 Quote Fetching (Written by ChatGPT)

First things first, we need to get our ESP32 to retrieve quotes from the Internet. I was originally dubious about ChatGPT’s ability to write an application that could pull these quotes, so I started by asking it to write a program in Python. With a Python script, I could verify functionality before adding steps like compilation and flashing to the mix.
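That Python prototype isn't reproduced here, but its core was just an HTTP GET plus JSON parsing. Below is a minimal, illustrative sketch of the parsing half, assuming the quotable API's response shape (the same "content" and "author" fields the later C code reads); the sample payload reuses a quote the finished device actually fetched:

```python
import json

# Illustrative payload, shaped like an api.quotable.io response.
# The "content" and "author" fields are the ones the C code parses later.
payload = '{"content": "Never find fault with the absent.", "author": "Alexander Pope"}'

def parse_quote(raw: str):
    """Pull the quote text and author out of a raw JSON response."""
    doc = json.loads(raw)
    return doc["content"], doc["author"]

quote, author = parse_quote(payload)
print(f"{quote}\n- {author}")
```

Verifying this step in plain Python first meant any later failures on the ESP32 had to be about networking or libraries, not about misreading the API.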

After proving out its abilities in Python, I asked ChatGPT to write the same program in C. You can see our conversation below. Notice the back-and-forth nature of the conversation, including corrections and specifications I made to ChatGPT:

Travis: Can you write the same program in C for a WROOM ESP32?

ChatGPT: Yes, I can! Here's an example of how you could implement the same program in C for an ESP32 module using the ESP-IDF framework:

...CODE BLOCK...

Travis: This is for esp32 so needs a loop and setup

ChatGPT: You're right, I apologize for that oversight. Here's an updated version of the program that includes a setup and loop function for the ESP32:

...CODE BLOCK...

Travis: Can you replace the printf with a serial out

ChatGPT: Sure, here's the modified program with serial output instead of printf:

...CODE BLOCK...

Travis: Sorry I meant the arduino serial out

ChatGPT: No problem! Here's the modified program with serial output using the Arduino Serial library:
#include <Arduino.h>
#include <WiFi.h>
#include <HTTPClient.h>
#include <ArduinoJson.h>
#include <stdlib.h>
#include <time.h>

const char* ssid = "your_SSID";
const char* password = "your_PASSWORD";

void fetch_quote(char *quote, char *author) {
  WiFiClient client;
  HTTPClient http;
  http.begin(client, "https://api.quotable.io/random");
  int httpCode = http.GET();
  if (httpCode != HTTP_CODE_OK) {
    Serial.printf("HTTP request failed: %s\n", http.errorToString(httpCode).c_str());
    http.end();
    return;
  }
  String payload = http.getString();
  DynamicJsonDocument doc(1024);
  deserializeJson(doc, payload);
  String content = doc["content"].as<String>();
  String author_item = doc["author"].as<String>();
  content.toCharArray(quote, 256);
  author_item.toCharArray(author, 256);
  http.end();
}

void setup() {
  Serial.begin(115200);
  while (!Serial);
  Serial.println("Serial connected!");
  WiFi.begin(ssid, password);
  while (WiFi.status() != WL_CONNECTED) {
    delay(1000);
    Serial.println("Connecting to WiFi...");
  }
  Serial.println("WiFi connected!");
  srand(time(NULL));
}

void loop() {
  char quote[256];
  char author[256];
  fetch_quote(quote, author);
  char output[512];
  sprintf(output, "%s\n- %s\n", quote, author);
  Serial.println(output);
  delay(1000);
}

void setup() {
  Serial.begin(115200);
  while (!Serial);
  Serial.println("Serial connected!");
  WiFi.begin(ssid, password);
  while (WiFi.status() != WL_CONNECTED) {
    delay(1000);
    Serial.println("Connecting to WiFi...");
  }
  Serial.println("WiFi connected!");
  srand(time(NULL));
}

void loop() {
  char quote[256];
  char author[256];
  fetch_quote(quote, author);
  char output[512];
  sprintf(output, "%s\n- %s\n", quote, author);
  Serial.println(output);
  delay(1000);
}

After a quick attempt at compiling, I see that I don’t have the ‘ArduinoJson.h’ library that ChatGPT wants to use, so I ask how to obtain it in the following snippet of the conversation:

Travis: How do I use the ArduinoJson.h?

ChatGPT: ArduinoJson.h is a library that provides a simple and efficient way to parse and generate JSON data on Arduino and other embedded systems. To use it, you'll need to follow these steps:

    Download and install the library. You can do this by going to the "Sketch" menu in the Arduino IDE, selecting "Include Library", and then "Manage Libraries". In the Library Manager, search for "ArduinoJson" and click "Install" to install the latest version.

...MORE INSTRUCTIONS (NOT NEEDED FOR OUR CASE)...

After following ChatGPT’s instructions, I found one final bug: there were two ‘setup’ and ‘loop’ blocks. Once I fixed this simple mistake, I had a program that would compile. The next exciting step would be to flash it to the ESP32 and see if the serial monitor would start displaying quotes!

Attempt #1 Output

Serial connected!
Connecting to WiFi...
Connecting to WiFi...
Connecting to WiFi...
Connecting to WiFi...
Connecting to WiFi...
WiFi connected!
HTTP request failed: connection lost
xV���k@T@0
- ��@�?

HTTP request failed: connection lost
xV���k@T@0
- ��@�?

...MORE OF THIS...

The ESP32 connected to WiFi. It looks like it attempted an HTTP request, but the connection was lost. Let’s ask ChatGPT what is up…

Travis: I am getting 'HTTP request failed: connection lost'

ChatGPT: The error "HTTP request failed: connection lost" could indicate that the ESP32 is losing its Wi-Fi connection during the HTTP request. This could happen if the Wi-Fi signal is weak or unstable.

To troubleshoot this issue, you could try the following steps:

   1. Make sure that the ESP32 is connected to a stable Wi-Fi network with a strong signal.

   2. Increase the timeout for the HTTP request by calling the setTimeout method on the HTTPClient object. For example:

...CODE BLOCK...

This will give the ESP32 more time to complete the HTTP request before timing out.

    3. Try using a different HTTP client library, such as the WiFiClient library, to make the HTTP request. For example:

...CODE BLOCK...

This code uses the WiFiClient library to make an HTTP GET request to the API server. Note that this code does not use HTTPS, so it is less secure than the previous code that used WiFiClientSecure. However, it may be more reliable in some situations.

I hope this helps! Let me know if you have any other questions.

ChatGPT gave me a list of debug options. Super helpful!

I took ChatGPT’s list out of order and homed in on a small detail in the ‘WiFiClientSecure’ debug option. I noticed in the original code that ChatGPT was accessing the quotable API over an HTTPS connection. I simply removed the ‘s’ from ‘https’ to access it without SSL security and… (drum-roll, continue onto the next section please!)…

Attempt #2 Output, Nicely Done ChatGPT!

Serial connected!
Connecting to WiFi...
Connecting to WiFi...
Connecting to WiFi...
WiFi connected!
Never find fault with the absent.
- Alexander Pope

One machine can do the work of fifty ordinary men. No machine can do the work of one extraordinary man.
- Elbert Hubbard

Real success is finding your lifework in the work that you love.
- David McCullough

A dream is your creative vision for your life in the future. You must break out of your current comfort zone and become comfortable with the unfamiliar and the unknown.

We have quotes!

ESP32 OLED Display Output

All right, now we need to set up the quotes we get from the Internet to display on our OLED screen.

Check out my continued conversation with ChatGPT as I ask for help on the display portion of the project.

Travis: I have a 0.96" OLED screen, are you familiar with them?

ChatGPT: Yes, I'm familiar with 0.96" OLED screens! They are commonly used with microcontrollers and are great for displaying text and graphics in small projects. There are a variety of libraries available for interfacing with OLED screens, so depending on the specific screen you have, there may be different libraries that work best.

If you have any questions about using an OLED screen with your project, feel free to ask!

Travis: I would like to display the fetched quote on the OLED screen instead of serial out

ChatGPT: Sure, here's an example code snippet that shows how you can display the fetched quote on a 0.96" OLED screen using the Adafruit_SSD1306 library:

...CODE BLOCK (See Below)...

In this example code, we're using the Adafruit_SSD1306 library to initialize and interact with the OLED screen. The fetched quote is then displayed on the OLED screen using the display object's println() and setCursor() methods.

Note that you'll need to modify the API_HOST, API_URL, your_SSID, and your_PASSWORD variables to match your specific requirements.

Travis: What pins are used for the OLED screen on the ESP32?

ChatGPT: The pins used for the OLED screen on the ESP32 depend on the specific breakout board you're using. However, many 0.96" OLED screens for the ESP32 use the I2C protocol to communicate with the microcontroller.

Here's an example pinout for the popular ESP32 DevKitC board and an OLED screen using the I2C protocol:

...TABLE (See Below)...

Note that the pin names on your OLED screen may be different depending on the manufacturer, so it's important to consult the documentation for your specific screen to ensure you're using the correct pins. Additionally, you may need to install additional libraries to interface with the OLED screen depending on the specific breakout board you're using.

To summarize our conversation, I asked if ChatGPT was familiar with the OLED screen I was using. Then, I asked if ChatGPT could generate the code to display the quote fetched on the OLED screen. Finally, I asked how I should wire my ESP32 to the OLED screen.

Below you will find the code and pin-out table that ChatGPT provided me. I noticed a couple of things, namely:

  • ChatGPT’s code generation is nowhere near deterministic. Each generation was noticeably unique.
  • Anecdotally, ChatGPT seems to have organized the code better with the added complexity of displaying to the OLED screen.
#include <Wire.h>
#include <Adafruit_GFX.h>
#include <Adafruit_SSD1306.h>
#include <HTTPClient.h>
#include <ArduinoJson.h>

#define SCREEN_WIDTH 128 // OLED display width, in pixels
#define SCREEN_HEIGHT 64 // OLED display height, in pixels
#define OLED_RESET -1    // Reset pin # (or -1 if sharing Arduino reset pin)
#define SCREEN_ADDRESS 0x3C // I2C address of the OLED screen
#define API_HOST "api.quotable.io"
#define API_URL "/random"

Adafruit_SSD1306 display(SCREEN_WIDTH, SCREEN_HEIGHT, &Wire, OLED_RESET);

void setup() {
  Serial.begin(115200);

  // Initialize OLED display
  if(!display.begin(SSD1306_SWITCHCAPVCC, SCREEN_ADDRESS)) {
    Serial.println("SSD1306 allocation failed");
    while (1);
  }

  // Clear the display buffer
  display.clearDisplay();

  // Set text color to white
  display.setTextColor(SSD1306_WHITE);

  // Set font size
  display.setTextSize(1);

  // Connect to Wi-Fi network
  WiFi.begin("your_SSID", "your_PASSWORD");
  while (WiFi.status() != WL_CONNECTED) {
    delay(1000);
    Serial.println("Connecting to WiFi...");
  }
  Serial.println("Connected to WiFi");
}

void loop() {
  // Make HTTP request to API
  HTTPClient http;
  http.begin(API_HOST, 80, API_URL);
  int httpCode = http.GET();
  if (httpCode == HTTP_CODE_OK) {
    String payload = http.getString();

    // Parse JSON response
    const size_t bufferSize = JSON_OBJECT_SIZE(2) + JSON_OBJECT_SIZE(3) + 160;
    DynamicJsonDocument doc(bufferSize);
    deserializeJson(doc, payload);
    const char* quote = doc["content"];
    const char* author = doc["author"];

    // Display quote on OLED screen
    display.clearDisplay();
    display.setCursor(0, 0);
    display.println(quote);
    display.setCursor(0, 20);
    display.println("- " + String(author));
    display.display();
  } else {
    Serial.println("HTTP request failed");
  }

  // Wait 10 seconds before making the next request
  delay(10000);
}
ESP32 Pin    OLED Pin
GPIO21       SDA
GPIO22       SCL
3.3V         VCC
GND          GND
Wiring table provided by ChatGPT

Display, Attempt #1 Output

I am blown away by the results of this. It worked! There was only one flaw: When a quote was too long, it wrapped around the screen over the top of the author’s name. This display problem was due to ChatGPT’s use of hard-coded location coordinates. I could see two solutions right out of the gate:

  1. Have the quote scroll across the screen instead of wrapping.
  2. Have the author’s name appended to the quote with a new line just before it.

Either of these would have worked, but both required too much coding for me. So, I asked ChatGPT to do it!

Travis: The text wraps and is colliding with the authors name, how can I resolve this?

ChatGPT: To prevent text from colliding with other elements on your OLED screen, you may need to adjust the font size, spacing, or position of your text.

Here are some suggestions for resolving the text wrapping issue:

List Items 1..2

3. Move the author's name to a different location on the OLED screen, such as below the quote text or in a separate area of the screen. This will create more space for the quote text to be displayed without colliding with other elements.

List Items 4..5

Travis: How would I implement option #3

ChatGPT: To move the author's name to a different location on the OLED screen, you can adjust the Y position of the text using the display.drawString() function. Here's an example of how to move the author's name to be displayed below the quote text:

...CODE BLOCK...

In this example, we first display the quote text using the drawStringMaxWidth() function, which automatically wraps the text based on the width of the OLED screen. We then use the setCursor() function to move the cursor to a new line below the quote text, and display the author's name using the print() function.

Note that you may need to adjust the Y position of the cursor depending on the size of your OLED screen and the amount of spacing you want between the quote text and the author's name.

Travis: How would I dynamically change the authors line to be below the quote text

ChatGPT: If you want to dynamically change the position of the author's name based on the length of the quote text, you can calculate the height of the quote text and adjust the Y position of the author's name accordingly.

Here's an example of how to do this:

...MORE DEBUG (2 More Rounds of Questions)...

Travis: Is there a way to simply get the cursors location after a println?
 
ChatGPT: In Arduino, you can use the getTextBounds() function to get the size of the text that was just printed. You can then use this information to calculate the position of the next line of text.

Here's an example:

...CODE BLOCK...

In this example, we start by printing the first line of text. We then use the getTextBounds() function to get the size of the printed text. The function takes six parameters: the string of text to measure, the x and y coordinates of the cursor, and four output variables that will contain the position and size of the text.

After getting the size of the printed text, we calculate the position of the next line of text by adding some padding between lines. Finally, we print the second line of text at the new cursor position.

ChatGPT’s Solution to the Text Collision

Basically, I asked ChatGPT how to resolve the text collision with the author’s name, and it gave me a list of suggestions. I opted to move the author’s name below the quote on the screen, and asked ChatGPT to generate the code for me. The new code broke my compile by requiring functions that don’t exist in the libraries I have. After a few more questions, I had an example of a way to do it without the libraries it had suggested. Below is the example that got me “un-stuck”:

int x, y; // x and y coordinates of the cursor

// Print the first line of text
display.println("Hello, world!");

// Get the size of the printed text
int16_t x1, y1;
uint16_t w, h;
display.getTextBounds("Hello, world!", x, y, &x1, &y1, &w, &h);

// Calculate the position of the next line of text
x = x + w + 5; // add some padding between lines
y = y + h;

// Print the second line of text
display.println("This is the second line.");

I couldn’t merely copy and paste the above snippet of ChatGPT’s code into the overall program. Instead, I had to use it as a general example to guide me as I implemented my own solution. As the architect rather than the technical developer, this was almost too much coding work, but it was still helpful for finishing out the quote display sketch. The final code is below:

#include <Wire.h>
#include <WiFi.h>      // needed for WiFi.begin() in setup()
#include <Adafruit_GFX.h>
#include <Adafruit_SSD1306.h>
#include <HTTPClient.h>
#include <ArduinoJson.h>

#define SCREEN_WIDTH 128 // OLED display width, in pixels
#define SCREEN_HEIGHT 64 // OLED display height, in pixels
#define OLED_RESET -1    // Reset pin # (or -1 if sharing Arduino reset pin)
#define SCREEN_ADDRESS 0x3C // I2C address of the OLED screen
#define API_HOST "api.quotable.io"
#define API_URL "/random"

Adafruit_SSD1306 display(SCREEN_WIDTH, SCREEN_HEIGHT, &Wire, OLED_RESET);

void setup() {
  Serial.begin(115200);

  // Initialize OLED display
  if(!display.begin(SSD1306_SWITCHCAPVCC, SCREEN_ADDRESS)) {
    Serial.println("SSD1306 allocation failed");
    while (1);
  }

  // Clear the display buffer
  display.clearDisplay();

  // Set text color to white
  display.setTextColor(SSD1306_WHITE);

  // Set font size
  display.setTextSize(1);

  // Connect to Wi-Fi network
  WiFi.begin("your_SSID", "your_PASSWORD");
  while (WiFi.status() != WL_CONNECTED) {
    delay(1000);
    Serial.println("Connecting to WiFi...");
  }
  Serial.println("Connected to WiFi");
}

void loop() {
  // Make HTTP request to API
  HTTPClient http;
  http.begin(API_HOST, 80, API_URL);
  int httpCode = http.GET();
  if (httpCode == HTTP_CODE_OK) {
    String payload = http.getString();

    // Parse JSON response
    const size_t bufferSize = JSON_OBJECT_SIZE(2) + JSON_OBJECT_SIZE(3) + 160;
    DynamicJsonDocument doc(bufferSize);
    deserializeJson(doc, payload);
    const char* quote = doc["content"];
    const char* author = doc["author"];

    display.clearDisplay();
    display.setCursor(0, 0);
    display.println(quote);

    // Get the size of the printed text
    int x = 0, y = 0; // measure the quote from the display origin
    int16_t x1, y1;
    uint16_t w, h;
    display.getTextBounds(quote, x, y, &x1, &y1, &w, &h);

    // Calculate the position of the next line of text
    x = x + w + 5; // add some padding between lines
    y = y + h;


    display.setCursor(x, y);
    display.println("- " + String(author));
    display.display();
    

  } else {
    Serial.println("HTTP request failed");
  }

  // Wait 10 seconds before making the next request
  delay(10000);
}

Conclusion

I was able to get a functioning copy of this project’s core technology in under two hours. Granted, I did need some coding background to integrate the pieces and install the required libraries. And while I still got stuck a few times, ChatGPT was able to guide me through each snag.

Image shows a fully assembled version of the system with functioning code generated by ChatGPT.

So far, I am concerned about a few things after using ChatGPT:

  • Does it actually make things faster?
  • Blind acceptance is problematic.
  • I’m not so keen to be reliant on something which exists exclusively online.

Let’s quickly address these three points.

Does it actually make things faster?

I have to question how much faster this actually makes an engineer. In this morning’s case, I think I completed the goal faster than I would have otherwise. However, there is a case to be made that the large volume of output from ChatGPT can easily make an engineer feel like they are making headway when, in fact, they are not.

Blind acceptance is problematic (ChatGPT might not have your back)

In today’s article, ChatGPT offered a few different APIs for fetching quotes from the Internet. This is a problem if ChatGPT’s training data happened to include malicious APIs and someone blindly accepted a suggested endpoint as “working.” The same concern applies to libraries suggested by ChatGPT.

Reliance on something that exists only online

This concern is probably the least troublesome of all three, as most developers will presumably have an Internet connection. However, if you are relying on ChatGPT for the majority of your work, you may be lost if your Internet connection is compromised. In the future, it would be awesome to have an onsite version of the Large Language Model to ensure engineers have guaranteed access. A model developer firm could license local versions (based on a general set of training data) to individual corporations. The client companies could then continue to train their individually-licensed models on their own internal data. Intellectual property would be safeguarded and models would be more versatile and accurate when responding to user needs.

Wrapping Up

Thank you for reading this week’s post! It was really quite amazing working with ChatGPT. I learned some new and crazy things, like the Python “Easter egg” built-in module called this, which contains a poem (“The Zen of Python”) written by Tim Peters (?!).

As always, please feel free to leave a comment below!

-Travis

My Other Work

C++ Wrapped in Python: A Daring but Beautiful Duo

Image depicting a coal miner picking away at a ceiling just below a children's playground. The coal miner represents C++ and the children's playground represents the land of Python.

Have you ever wondered how Python libraries are so fast? Well, it’s because they aren’t written in Python! C and C++ form the backbone that makes Python so powerful. This week, we will look at how I adopted this idea for the spelling maze generator we built previously!

My post from two weeks ago really exemplified why being able to program in C++ gives you unending power in the realm of efficiency gains. Unfortunately, however, most of the world has moved on to the cleaner, more abstract sandboxes of Python, mainly to reduce the development cost of building new software. I guess some people just aren’t here for the pain like others…

Anyways, this week’s post tackles how the hardcore C++ programmer can still make friends in the Python playground.

It’s Already Built

Cartoon depiction of a man standing next to a wheel with his arm resting at the peak. There is text in the upper right corner asking "Is This New?" The intention being to ask if we are introducing anything new to C++.

Lucky for us, people share their tunnels to brighter levels of existence — all we have to do is look and follow!

The simplest way to make your C++ program usable in the high-level language of Python is to use the Pybind11 Library.

pybind11 is a lightweight header-only library that exposes C++ types in Python and vice versa, mainly to create Python bindings of existing C++ code.

Pybind11 Frontpage

What does it mean?

We just have to include a header file to be able to bind functions to a module that is importable/callable from a Python script.

Where would this be useful?

I am glad you asked! We already have a project that is a perfect candidate. Do you remember when we produced spelling mazes to help my son study for spelling? Then, we rewrote it in C++ to measure the efficiency gains…which were sizeable.

Well, why not pair the two? Then we can keep our web front-end built in Python and back it up with our C++ library!

Code Time (C++, CMake)

So really, our code is going to be very minimal. All we need to do is:

  • Create a single function that accepts a word and saves our maze to a file
  • Bind that function to a Python module using Pybind11’s PYBIND11_MODULE macro

The Function (C++)

void generate_maze(std::string word, int grid_width, int grid_height, std::string file_prefix, int block_width = 20, int block_height = 20){
    generator.seed(time(0));
    WordMaze m(word, grid_width, grid_height, block_width, block_height);
    m.save_to_png(file_prefix + word + ".png");
}

Above, you can see the function we use to create our WordMaze object in C++ and save it off to a file. You might notice we are taking in a few more parameters than we had previously discussed. This is just to make it easier to configure the maze from the Python side.

The Module Binding (C++)

PYBIND11_MODULE(SpellingMaze, m) {
    m.def("generate_maze", &generate_maze, "A function to generate a maze.");
}

Could it really be that simple? Well, yes and no. This is the code that is required, but actually building it is the part that will sneak up on you (especially if you are on a Windows machine).

Building

To preface this section, I will say:

If you are building a first-party app without third party dependencies in C++, building with Pybind11 is going to be really easy (their website does an excellent job of walking you through the many ways to build).

Travis (He’s a NOOB at C++ build systems)

However, we don’t have it that easy, since we need to also link against SFML.

Story Time

So, with my little experience setting up C++ build systems (none), I set out to combine the powers of SFML and Pybind11 to create a super fast high-level Maze Generator!

SFML, Pybind11 and MSYS2 walk into a bar…

Previously, I wrote an article on how to set up a Windows build environment for compiling simple SFML applications. Looking back now, I realize there are definitely better ways to do this.

For this project, I struggled to wrap my head around including Pybind11 in my already convoluted SFML build system. From the examples given on the Pybind11 website, it was clear my CMake approach wasn’t their go-to option for building.

In fact, the CMake example they give requires that the entire Pybind11 repository be included as a submodule to my repository. So, after a few failed attempts at copy-pasting their CMake config, I took a different approach.

Side note: I like using Visual Studio Code. It is by far my favorite IDE with all of its plugins and remote development solutions. However, in this case, the CMake plugin made it nearly impossible to understand which configuration was being generated.

Eventually, I opted for using MSYS2 directly. I was able to install the Pybind11 and SFML dependencies using the following commands:

pacman -S mingw-w64-x86_64-pybind11
pacman -S mingw-w64-x86_64-sfml

This made finding them as easy as including these lines in my CMakeLists.txt:

find_package(SFML 2.5 COMPONENTS system window graphics network audio REQUIRED)
find_package(pybind11 REQUIRED)

Here is our final CMakeLists.txt file:

cmake_minimum_required(VERSION 3.12 FATAL_ERROR)

project(SpellingMaze VERSION 0.1)

find_package(SFML 2.5 COMPONENTS system window graphics network audio REQUIRED)
find_package(pybind11 REQUIRED)

pybind11_add_module(SpellingMaze src/spelling_maze.cpp)

target_link_libraries(SpellingMaze PRIVATE sfml-graphics)

Conclusion

Just like that, we can bring the power of C++ to the world of Python (even on a Windows machine!). I hope you enjoyed this post! I am always open to feedback. Let me know if there is anything you would like me to dive into, attempt, and/or explain! I am more than willing to do so. Please add your questions and suggestions in the comments below!

Thanks!

-Travis

Traceroute in Python: A Powerful Wrapper Guide Exposed

Image shows penguin wrapped by a Python. This is supposed to convey the main idea of this article, which is to create a Linux program wrapper with the traceroute program to make tracing a network route easier for a Python user.
Part of this image was generated by DALL-E 2

This week, we will create a Linux program wrapper for the ‘traceroute’ program to make tracing a network route easier for a Python user. By the end of this article, I will have demonstrated a few useful techniques in data parsing and object orientation that you might find helpful.

The Target: ‘traceroute,’ Linux, and Python

Here is an abstract view of the entire system:

Image shows entire system from a high-level view. 'Python Business Logic' is at the top. 'Python Linux Layer' is in the middle. The 'Linux' layer is at the bottom. Although not shown in the diagram, the traceroute program resides in the Linux layer.

In the diagram, the ‘Python Linux Layer’ is the thing we want to program. ‘Traceroute’ resides at the lowest ‘Linux’ layer.

We want to wrap the traceroute tool on a Linux system and make it usable in a Python program. ‘Traceroute’ lists the route that a network packet follows in order to retrieve the information you seek from the remote host.

A Plan

We need to be able to:

  • Call the program from Python and capture its data
  • Parse the data into an importable and usable object

The first will be relatively easy, but the second is where the work lies. If we do it in a systematic way and keep the goal in mind, however, it shouldn’t be too difficult.

Calling the ‘traceroute’ Program and Storing its Output

First things first, we need to call the program…

import subprocess

subprocess.call(["traceroute", "-4", "periaptapis.net"])

Wow! That was easy!

To explain this a little bit, we are using ‘traceroute’ in IPv4 mode (the ‘-4’ flag).

Right now, it is printing directly to the console, which is not ideal — we need that data for parsing!

To accomplish this, we will switch the method we call from ‘call’ to ‘run.’ This method has a ‘capture_output’ parameter. According to the documentation, when ‘capture_output’ is set to True, the ‘run’ method will return an object which contains the text of the program’s output in stdout.

import subprocess

ret = subprocess.run(["traceroute", "-4", "periaptapis.net"], capture_output=True)

We can access that captured output like this:

print(ret.stdout.decode())

We use the ‘decode’ method because stdout is in bytes and we need a string (more on that here).
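As a one-line illustration of that bytes-to-str step (the sample bytes here are made up for the example, not captured output):

```python
# subprocess hands us bytes; decode() turns them into a str (UTF-8 by default).
raw = b" 1  192.168.0.1  0.657 ms\n"
text = raw.decode()
print(type(raw).__name__, "->", type(text).__name__)  # bytes -> str
```

Once it is a str, all the usual string tools (splitlines, split, regexes) are available for parsing.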

Wow! Only 3 lines of code and we already have the data in a place that is very accessible!

Data Parsing

One of the most useful skills I think I’ve gained from programming in Python is data parsing.

Specifically, as a firmware engineer, I produce a lot of serial log data for the boards that I work on. Without the data parsing techniques I have accumulated, I would be spending the majority of my time scrolling long text files looking for less than 1% of the information they hold. Not just once either, because I (like every other programmer) never get it right on the first try.

Data parsing is also useful elsewhere in website scraping, big data collection, and more. Once you get fast enough, it will be hard not to write a parser for everything.

Taking a Peek at our ‘traceroute’ Output Data

The 3-line program we wrote above produces the following output:

traceroute to periaptapis.net (147.182.254.138), 30 hops max, 60 byte packets
 1  home (192.168.0.1)  0.267 ms  0.618 ms  0.583 ms
 2  boid-dsl-gw13.boid.qwest.net (184.99.64.13)  3.425 ms  3.177 ms  3.032 ms
 3  184-99-65-97.boid.qwest.net (184.99.65.97)  3.143 ms  3.458 ms  3.038 ms
 4  ae27.edge6.Seattle1.Level3.net (4.68.63.133)  13.147 ms  13.216 ms  30.300 ms
 5  * * *
 6  * DIGITAL-OCE.edge9.SanJose1.Level3.net (4.71.117.218)  29.985 ms *
 7  * * *
 8  * * *
 9  * * *
10  147.182.254.138 (147.182.254.138)  31.194 ms  30.308 ms  29.753 ms

Oof, that is going to be a pain to parse out the data we want. Let’s take this apart.

The first line is something we already know – we are tracing the route to ‘periaptapis.net.’ From there, each line represents a station that the packet is routed through. These “stations” are things like routers and gateways. The last line is the host that we wanted to reach.

By default, there are 3 queries per station. This helps the user know a couple of things, like average round-trip time and whether there are other hosts that respond to packets at that station number. Below is the breakdown for one of our lines of output:

Example of traceroute output, broken down by component (station number, hostname and IP address of the station, Query #1 round-trip time, Query #2 round-trip time, and Query #3 round-trip time).
  1. The station number
  2. The hostname and IP address of the station
  3. Query #1 round-trip time
  4. Query #2 round-trip time
  5. Query #3 round-trip time

If there is an asterisk in a field, the station didn’t respond to the packet we sent. Also, there is a chance that we get a different host in any of the three queries for a given station number. This can happen when a network is configured to spread incoming traffic across multiple routes.

Simplifying Output

To simplify the output of the ‘traceroute’ program and make our parsing lives easier, we will introduce the ‘-n’ and ‘-N1’ flags to our ‘traceroute’ program call.

-N1

The ‘-N1’ flag tells the ‘traceroute’ program to send only one probe packet at a time, rather than dispatching several probes simultaneously.

Sending one packet at a time is just a kindness to the devices we will be tracing so we don’t send a bunch of packets each time we want to trace a route. The downside is that the results will be slower to come in.

-n

The ‘-n’ simply tells traceroute to not print the hostnames, simplifying things further by obtaining only the IP address of the station.

With those two flags added, our call now looks like this:

ret = subprocess.run(["traceroute", "-4", "-N1", "-n", "periaptapis.net"], capture_output=True)

Our call also produces a clean output:

traceroute to periaptapis.net (147.182.254.138), 30 hops max, 60 byte packets
 1  192.168.0.1  0.657 ms
 2  184.99.64.13  9.288 ms
 3  184.99.65.97  3.681 ms
 4  4.68.63.133  14.012 ms
 5  4.69.219.65  29.552 ms
 6  4.71.117.218  30.166 ms
 7  *
 8  *
 9  *
10  147.182.254.138  30.837 ms

This will make our lives much easier!

Coding an Object

Finally we get to the object creation that will be importable and usable by another Python program. We will call this object ‘TraceRouteInstance’ and it will have a single constructor input of ‘hostname_or_ip,’ which we will use in our ‘traceroute’ call.

So far, our object looks like this:

import subprocess

class TraceRouteInstance:
    def __init__(self, hostname_or_ip: str):
        self.traceroute_data = subprocess.run(
            ["traceroute", "-4", "-N1", "-n", hostname_or_ip],
            capture_output=True
            ).stdout.decode()
        self.route_list = []

We store our ‘traceroute’ output in a ‘traceroute_data’ variable for later parsing. To store each of the stations, we will use a named tuple (from the built-in ‘collections’ module), shown below:

Station = namedtuple('Station', ['ip', 'latency_ms'])

The syntax might look a little strange, but it enables us to reference the data in the tuple by a name which looks cleaner in the long run.
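For instance, once the tuple type is defined, each field can be read by name instead of by numeric index (the values below are made up for illustration):

```python
from collections import namedtuple

Station = namedtuple('Station', ['ip', 'latency_ms'])

# Build one station record and access its fields by name
hop = Station('192.168.0.1', 0.657)
print(hop.ip)          # -> 192.168.0.1
print(hop.latency_ms)  # -> 0.657
```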

Parsing

Next we need to parse the output.

One way to do this would be to strip the leading and trailing whitespace from each line, then check whether the first character is a digit. Stripping the leading whitespace guarantees that any line we care about starts with its station number.

This is how a newcomer might approach the problem.

However, to avoid so much string manipulation, I always suggest using regular expressions for extracting data from lines of text. Regular expressions are difficult to understand in the beginning, but once they click, they will save you headaches down the road and immensely reduce the complexity of data parsing.

Enter Regular Expression

Let’s take a look at the regular expression we will be using for this exercise:

"(?P<station_number>\d+)  (?P<ip_address>\d+\.\d+\.\d+\.\d+)  (?P<latency>\d+\.\d+) ms"

In this, we have 3 named groups: ‘station_number,’ ‘ip_address,’ and ‘latency.’ We can use this regular expression to search a line and reference the named groups to extract the data we want.
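Before dropping the expression into our class, we can sanity-check it against a single line of the simplified output:

```python
import re

station_regex = r"(?P<station_number>\d+)  (?P<ip_address>\d+\.\d+\.\d+\.\d+)  (?P<latency>\d+\.\d+) ms"

# One line of the simplified traceroute output from above
line = " 1  192.168.0.1  0.657 ms"
match = re.search(station_regex, line)

# Each named group pulls its piece of the line out by name
print(match.group("station_number"))  # -> 1
print(match.group("ip_address"))      # -> 192.168.0.1
print(match.group("latency"))         # -> 0.657
```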

Our parse function looks like this:

    def parse_data(self):
        station_regex = r"(?P<station_number>\d+)  (?P<ip_address>\d+\.\d+\.\d+\.\d+)  (?P<latency>\d+\.\d+) ms"

        for line in self.traceroute_data.split("\n"):
            re_match = re.search(station_regex, line)

            if re_match:
                ip_address = re_match.group("ip_address")
                latency = float(re_match.group("latency"))

                self.route_list.append(Station(ip_address, latency))
            elif '*' in line:
                self.route_list.append(Station('*', '*'))

We take the output line-by-line, searching the line for our regular expression. If the regular expression is found, we use the named groups to extract the data from the line, adding it to our ‘route_list.’

If we don’t find the regular expression but we see an asterisk, we assume that the station didn’t respond and add a default value to our ‘route_list.’

Final Object Code

Finally, our importable ‘traceroute’ Python wrapper looks like this:

import subprocess
import re
from collections import namedtuple

Station = namedtuple('Station', ['ip', 'latency_ms'])

class TraceRouteInstance:
    def __init__(self, hostname_or_ip: str):
        self.traceroute_data = subprocess.run(
            ["traceroute", "-4", "-N1", "-n", hostname_or_ip], 
            capture_output=True
            ).stdout.decode()
        self.route_list = []

        self.parse_data()
    
    def parse_data(self):
        station_regex = r"(?P<station_number>\d+)  (?P<ip_address>\d+\.\d+\.\d+\.\d+)  (?P<latency>\d+\.\d+) ms"

        for line in self.traceroute_data.split("\n"):
            re_match = re.search(station_regex, line)

            if re_match:
                ip_address = re_match.group("ip_address")
                latency = float(re_match.group("latency"))

                self.route_list.append(Station(ip_address, latency))
            elif '*' in line:
                self.route_list.append(Station('*', '*'))

As a simple test, let’s write a main function in this script that utilizes the above object and prints out the trace list.

if __name__ == "__main__":
    tr = TraceRouteInstance("periaptapis.net")
    print(tr.route_list)

If you run this, it will eventually produce the following output:

[Station(ip='192.168.0.1', latency_ms=0.624), Station(ip='184.99.64.13', latency_ms=3.244), Station(ip='184.99.65.97', latency_ms=3.445), Station(ip='4.68.63.133', latency_ms=13.911), Station(ip='4.69.219.65', latency_ms=29.505), Station(ip='4.7.18.10', latency_ms=30.912), Station(ip='*', latency_ms='*'), Station(ip='*', latency_ms='*'), Station(ip='*', latency_ms='*'), Station(ip='147.182.254.138', latency_ms=31.139)]

This output shows that we have successfully extracted the route that a packet would take from my machine to requesting data from ‘periaptapis.net.’

Conclusion

In this post, we explored wrapping a Linux program with a Python interface to make it more accessible by another Python program. If you run this program, however, you will notice how long an object takes to be created. This is due to the length of time it takes for a route to be traced by using ‘traceroute.’

In a future post, we might explore options on how to mitigate the effects of this load time for external programs by using the ‘asyncio’ built-in Python library or by multi-threading.

I hope you enjoyed this week’s post! Please add any comments, suggestions, or questions that you may have! If you are interested in other articles I have written, check out these links:

Thank you for reading!

-Travis

Neural Networks Exposed: Beautiful Genetic Algorithms in C++

Genetic algorithm generations help neural networks learn

This week, we are generating feed-forward neural networks which learn through a genetic algorithm. These networks are much easier to understand than their back-propagation counterparts. Please join me on this step-by-step guide for simplistic neural networks as they learn through generations of selection.

Neural Networks

Neural networks can be a difficult subject to dive into during ground-up implementation. Here, I will explain a simple ‘feed-forward’ network, as we do not need back-propagation for this article. Fortunately, back-propagation is the only part of a neural net which requires deep knowledge of calculus.

So, if you fear feeling dumb like me, fret not! Multiplication is all we need!

I chose evolution because it requires less math 😉

Not Charles Darwin

Feed-Forward Neural Networks

A feed-forward neural network only feeds data in one direction across the network.

Showing how neural networks work

The above diagram shows a neural network with 2 inputs, 4 hidden layer neurons, and 2 output neurons. The data flows from left to right. At each step, the data is modified in a parallel fashion by each neuron.

Showing flow of data in neural networks

A neuron simply takes in all of its inputs, multiplies each input by a paired weight stored by the neuron, and sums the products together. That sum is passed through a sigmoid function, and the result is the neuron’s output.

Looking back at our neural network diagram, the hidden layer neurons each have 2 inputs. So, each neuron will store 2 weights. Therefore, the hidden layer stores 8 weights in total.
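Although this post’s implementation is in C++, the arithmetic a single neuron performs is simple enough to sketch in a few lines of Python (the inputs and weights below are made up purely for illustration):

```python
import math

def neuron_output(inputs, weights):
    # Multiply each input by its paired weight and sum the products
    total = sum(i * w for i, w in zip(inputs, weights))
    # Squash the sum into (0, 1) with the sigmoid function
    return 1 / (1 + math.exp(-total))

# A neuron with 2 inputs stores 2 weights
print(neuron_output([0.5, -0.25], [0.8, 0.4]))  # -> ~0.574
```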

Neural Networks and Genetic Algorithms

The basic rule which governs my algorithm is:

At the start of each generation, select the highest performing agents from the previous generation and combine their neural networks to form the next generation’s agents.

The important factors are 1) how you select the top performers, and 2) how the next generation is stitched together from the previous winners.

Code Time

The code base for this project has a lot of additional functionality that is necessary to facilitate agents’ learning. However, to keep this post short and sweet, I am going to focus on the key areas that make this process work.

The Problem to Solve

The problem which our agents must solve is simple. They must get as close to the goal as possible, given the number of ‘ticks’ in a generation.

Overview

A ‘tick’ is a time step allotted to each agent in which they can choose which direction to move and how far that move will be.

When all ticks have been counted, the program orders the agents by least to greatest distance from the goal. Next, it selects a few agents from the top (closest to goal) to seed our next generation. This process continues until we reach the last configured generation.

Main Loop

    for (int generation = 0; generation < NUM_GEN; generation++)
    {
        // Set up this generation's agents
        if (!closest)
        {
            // First generation: create agents with random weights
        }
        else
        {
            // Stitch our next set of agents together from those that
            // came closest to the goal in the previous generation
        }

        // Run our simulation
        run_sim(*agents, goal);

        // Rank our agents and take the configured number of top performers
        closest = get_closest_agents(*agents, goal, generation, NUM_AGENTS_SELECTED_EACH_GENERATION);

        if (PRINT_GENERATION_PERFORMANCE)
            std::cout << closest->at(0)->distance << "," << mutation_chance << std::endl;
    }

Our main loop is simple:

  1. Set up the generation’s agents.
  2. Run the simulation.
  3. Rank the agents by final distance to the goal.

Setting Up the Neural Networks

The first generation is a set of agents. Each agent has a neural network that is initialized with random weight values and contains the following elements:

  • 2 inputs (distance to goal in x- and y-directions)
  • 10 hidden neurons
  • 4 outputs

Two of the outputs represent the x- and y-distances that the agent wants to move, each between 0 and 1. The other two outputs dictate whether each movement direction is positive or negative.
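As a rough illustration of how those four outputs could translate into a move (a sketch only; ‘outputs_to_move’ and the 0.5 threshold are my assumptions, not the repo’s exact rule):

```python
def outputs_to_move(outputs, max_step=1.0):
    # outputs: [dist_x, dist_y, sign_x, sign_y], each in (0, 1)
    # Treat a sign output above 0.5 as "move in the positive direction"
    dx = outputs[0] * max_step * (1 if outputs[2] > 0.5 else -1)
    dy = outputs[1] * max_step * (1 if outputs[3] > 0.5 else -1)
    return dx, dy

print(outputs_to_move([0.5, 0.25, 0.9, 0.1]))  # -> (0.5, -0.25)
```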

The following code implements agent creation:

for (int agent_i = 0; agent_i < NUM_AGENTS_PER_GEN; agent_i++)
{
    // Check if we are basing our agents on anything
    if (based_on)
    {
        int choice1 = get_rand_int(0, based_on->size() - 1), choice2 = get_rand_int(0, based_on->size() - 1);
        while (choice2 == choice1)
            choice2 = get_rand_int(0, based_on->size() - 1);
        // Using default mutation delta
        agents->push_back(new Agent(new Position(start_pos->x, start_pos->y), based_on->at(choice1)->agent, based_on->at(choice2)->agent, mt, mutation_chance));
    }
    else
    {
        // Create agents with two sensors: distance from goal in x and y
        agents->push_back(new Agent(new Position(start_pos->x, start_pos->y), 2));
    }
}

Simulation Run

The simulation runs for the number of ticks configured. At each tick, it “asks” the agent for a desired distance and direction, based on the agent’s current distance from the goal.

void run_sim(std::vector<Agent *> agents, Position *goal)
{
    for (int tick = 0; tick < NUM_TICKS_PER_GEN; tick++)
    {
        for (Agent *a : agents)
        {
            // Sense the agent's distance from the goal
            float sensors[2] = {
                (a->positions.back()->x - goal->x) / BOUNDARY_EDGE_LENGTH,
                (a->positions.back()->y - goal->y) / BOUNDARY_EDGE_LENGTH};
            // Ask the Agent what it wants to do
            a->move(sensors);
        }
    }
}

A Few More Specifics

Selecting top performers

It is important to note that the way you select your ‘winners’ greatly determines the behavior of your agents. My program selects winners in the following code:

    std::vector<AgentDistancePair *> *agent_distance_pairs = new std::vector<AgentDistancePair *>;

    for (Agent *a : agents)
        agent_distance_pairs->push_back(new AgentDistancePair(a, get_distance(*(a->positions.back()), *goal)));

The algorithm pairs each agent’s pointer to its respective distance from the goal. This pairing is encapsulated in the ‘AgentDistancePair’ object. Then, the program sorts the vector filled with these pairings using the following function:

bool compare_agent_distance_pair(AgentDistancePair *first, AgentDistancePair *second)
{
    return bool(first->distance < second->distance);
}

Finally, we reduce the number in the list to the number specified by the caller. In other words, how many of the top performers do we want? At the same time, we are being careful to manage our memory properly.

while (agent_distance_pairs->size() > num_to_find)
    {
        delete agent_distance_pairs->back();
        agent_distance_pairs->pop_back();
    }

All together, the whole function looks like this:

std::vector<AgentDistancePair *> *get_closest_agents(std::vector<Agent *> agents, Position *goal, int generation_number, int num_to_find = 2)
{
    std::vector<AgentDistancePair *> *agent_distance_pairs = new std::vector<AgentDistancePair *>;

    for (Agent *a : agents)
        agent_distance_pairs->push_back(new AgentDistancePair(a, get_distance(*(a->positions.back()), *goal)));

    sort(agent_distance_pairs->begin(), agent_distance_pairs->end(), compare_agent_distance_pair);

    if (DRAW_GENERATION_PERFORMANCE && (generation_number % DRAW_EVERY_NTH_GENERATION) == 0 && DRAW_FULL_POPULATION)
        draw_agent_path(agent_distance_pairs, goal, generation_number);

    while (agent_distance_pairs->size() > num_to_find)
    {
        delete agent_distance_pairs->back();
        agent_distance_pairs->pop_back();
    }

    if (DRAW_GENERATION_PERFORMANCE && (generation_number % DRAW_EVERY_NTH_GENERATION) == 0 && !DRAW_FULL_POPULATION)
        draw_agent_path(agent_distance_pairs, goal, generation_number);

    return agent_distance_pairs;
}

Stitching together agents of the past

In a genetic algorithm, the neural networks learn by being stitched with other top performers, sprinkled with a bit of randomness. In order to stitch the weights together, we choose 1 of 3 different methods:

  • EveryOther
  • SingleSplit
  • RandomChoice

EveryOther

The ‘EveryOther’ method selects each weight of the child agent by alternating evenly between the two parents’ weights, as shown in the diagram below:

SingleSplit

The ‘SingleSplit’ method uses a randomly selected weight index as the point where responsibility for providing the child’s weights switches from one parent to the other. You can see an example of this happening at the ‘Split Line’ in the diagram below:

RandomChoice

The ‘RandomChoice’ method randomly selects each weight from one of the two parents as it builds the child’s weights. No diagram needed ;).
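To make the three methods concrete, here is a rough Python sketch of each, treating a parent’s neural network as a flat list of weights (the function names and list representation are mine, not the repo’s):

```python
import random

def every_other(parent1, parent2):
    # Alternate: even-indexed weights from parent 1, odd-indexed from parent 2
    return [parent1[i] if i % 2 == 0 else parent2[i]
            for i in range(len(parent1))]

def single_split(parent1, parent2):
    # Take parent 1's weights up to a random split point, then parent 2's
    split = random.randint(0, len(parent1))
    return parent1[:split] + parent2[split:]

def random_choice(parent1, parent2):
    # Pick each weight from either parent with equal probability
    return [random.choice(pair) for pair in zip(parent1, parent2)]
```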

Conclusion

This is Generation #5 of a ‘SingleSplit’ method simulation. The green dot is the selected agent of the generation and the yellow dot is its goal!

This has been a huge learning experience for me! I will definitely be adding tiny brains to many things in the future.

If you are interested in exploring or cloning this code, you can access it on my Github here. Thank you for reading! I hope you found this helpful and thought-provoking! As always, feel free to leave a comment or suggestion below.

Further reading on my work in simulations is available at these links:

Nginx + Gunicorn, Guaranteed Web App in an Afternoon

Nginx, Gunicorn and Flask
Funky, right? Read to the end to find out how this image was generated!

Have you ever wondered what it would take to develop a simple web app? Maybe you wanted to create one that would serve requests to merely call a command-line version of an app you made. This is what I wanted to do after I created a Word Maze Generator to help my son study for his 3rd-grade spelling tests. With Gunicorn and Nginx, I easily accomplished this task.

Creating web front-ends for simple applications is an excellent way to reach a larger audience and get feedback on your work. In my case, I wanted to make my application available to my son’s teacher and classmates.

Here, I am focusing on the ‘why’ more than the ‘how.’ For a practical guide, feel free to navigate to this Digital Ocean post that I used to implement these tools. If you are like me though, the ‘why’ is very important, so let’s dive in!

Web Request, What?

Below is the ideal model for what happens when you hit Enter on your keyboard with a URL in your web browser.

Web request

There are steps like Domain Name Resolution that enable this ideal model, but for our purposes, let’s stick to this simple construction.

When you enter in www.periaptapis.net and hit Enter, a Hyper-Text Transfer Protocol (HTTP) GET request is assembled by your browser and shipped over the network to the server (host) responsible for that domain. Once the host receives the request, it packages the content and ships it to the client who requested it.

Naturally, this leads to the question: “So all I need is a web host that is connected to the Internet and running my Flask app, right?” The answer to this question is technically “yes.” However, give it time plus a small amount of traffic, and your site is bound to run into some issues.

WSGI Time!

Let’s say there are three people requesting content from your website. If it’s a small application, you might be able to service that traffic in a semi-timely manner. However, some users are going to ‘hang’ until your website can catch up. Some might possibly be left with a page informing them that their request has timed out and to try again (the problem will only get worse if they do!).

Web Server Gateway Interface, or ‘WSGI,’ was created to solve exactly this problem.

Gunicorn added to web request

In our new model, our three requests go to a single Gunicorn application. Gunicorn translates each web request into the WSGI format and creates threads/processes to work through the queued requests. The data that your application returns is then translated into an HTTP response by Gunicorn and sent back to the client.

It helps to be able to increase the number of programs working on the requests, but there is a ceiling to the number of threads/processes that can be created on a given web host.

Let’s consider 1000 requests. Gunicorn, while mighty, cannot service this kind of demand. It is meant for WSGI translation and process/thread management to call your application. Gunicorn is better than just a single-threaded Flask application, but it is still overwhelmed by a large set of requests.
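For context, the “application” that Gunicorn manages is just a WSGI callable. A minimal, framework-free sketch (the file name and worker count below are arbitrary):

```python
# app.py -- the simplest possible WSGI application, no framework required.
# Gunicorn could serve it with something like:
#   gunicorn --workers 4 --bind 127.0.0.1:8000 app:application
def application(environ, start_response):
    # Gunicorn calls this once per request, passing the request details
    # in 'environ' and a callback for setting the status and headers
    body = b"Hello from behind Gunicorn and Nginx!"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```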

Enter Nginx

Nginx is a program that handles the interface between the world-wide Internet and Gunicorn. It can juggle many connections at once, routing them as configured. Our final setup looks like this:


It is a bit excessive to say we could handle a bajillion requests from our gaggle of Internet users, but we are much better off this way. With Nginx installed, our CPU-dependent application can be spun up and managed by Gunicorn while Nginx keeps the users happy, reassuring the client that we are definitely still working on their request and will get back to them.

An added benefit is that, with minimal extra work, we get the nice little lock symbol in the browser, which indicates an HTTPS connection with our service. To do this, we run a tool called Certbot after Nginx is configured. Certbot asks which server needs certificates and for the owner’s email address, then requests your certificate from a certificate authority and auto-configures it in your Nginx setup for HTTPS traffic. Awesome!

Final Remarks

Check out the current iteration of our spelling maze app front-end here. It employs the exact setup described in this article to create custom spelling mazes requested by users.

The practical implementation I followed to get my server up and running can be found in a Digital Ocean post here.

If you are curious about the monstrous lead photo, it came from an AI model called Stable Diffusion 2, run with the prompt: “Nginx, Gunicorn and Flask.”

Thank you for reading this post. I hope it puts the ‘why’ into the methodology I chose to serve a web application to multiple users. That being said, I am not an authority on web dev! I’m always looking for new information, so if you want to add to or correct anything in this post, please leave a comment!