C++ > Python EXPOSED: Powerful Language Makes Case with Data

C++ wrestles Python

Has my ego inflated in the last couple of weeks from programming in C++? Maybe, maybe not… All I know is that C++ is leagues faster than Python. Don’t believe me? Well, I am glad you are here! This week, we will be converting a Python program I wrote and documented here to its C++ counterpart.

At the end of this post, I hope to have concrete evidence of just how much faster C++ can be! Let’s dive in!

Measurement

First, we must measure the portion of the code we will be converting. If you haven’t read the original article here, I recommend you give it a peek. Here’s a snippet showing how the original program generates mazes:

for i in range(args.num_to_generate):
    maze = WordMaze(args.word, args.grid_width, args.grid_height, args.pixel_width, args.pixel_height, args=args)
    maze.save_image(args.filename.split(".")[0] + "_" + str(i) + ".png")

We will be timing two things: the maze generation and the image storage. To do this, we will use the following generator function, which yields one set of input values per test case:

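import argparse


def test_maze_inputs():
    """Yield one argparse.Namespace of maze settings per test case."""
    words = ["Hello", "Random", "Testing", "Maze", "Generation"]
    # (grid_width, grid_height) pairs, measured in maze cells
    grid_widths = [(20, 20), (40, 40), (80, 80), (160, 160)]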

    for word in words:
        for width in grid_widths:
            yield argparse.Namespace(word=word, grid_width=width[0], grid_height=width[1], pixel_width=20, pixel_height=20)

We will be building mazes with grid sizes of 20 x 20, 40 x 40, 80 x 80, and 160 x 160 cells. Since each cell is 20 x 20 pixels, the resulting images are 400 x 400, 800 x 800, 1600 x 1600, and 3200 x 3200 pixels, respectively. The values chosen are arbitrary, and all runs are done on the same system.
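The image sizes follow from multiplying the grid size by the cell size. Here is a minimal sketch of that arithmetic; `image_size` is a hypothetical helper written for this illustration and is not part of the original program:

```python
def image_size(grid_width, grid_height, pixel_width=20, pixel_height=20):
    """Return the (width, height) in pixels of a maze image.

    Hypothetical helper: assumes every grid cell is drawn as a
    pixel_width x pixel_height block, as in the test inputs.
    """
    return grid_width * pixel_width, grid_height * pixel_height


for side in (20, 40, 80, 160):
    width, height = image_size(side, side)
    print(f"{side} x {side} grid: {width} x {height} pixels, "
          f"{width * height} square pixels")
```

The square-pixel totals are the values that end up on the x-axis when comparing run times by image area.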

Below is the timing test run code:

import time

if __name__ == "__main__":
    # args, _ = parseargs()
    maze_generation_times = []
    image_save_times = []
    for test_input in test_maze_inputs():
        print(f"Testing Inputs: {test_input}")

        # Time Maze Generation
        start_time = time.perf_counter()
        maze = WordMaze(test_input.word, test_input.grid_width, test_input.grid_height, test_input.pixel_width, test_input.pixel_height)
        stop_time = time.perf_counter()
        total_time = stop_time - start_time
        maze_generation_times.append(total_time)

        # Time Maze Saving
        start_time = time.perf_counter()
        maze.save_image("maze.png")
        stop_time = time.perf_counter()
        total_time = stop_time - start_time
        image_save_times.append(total_time)
    # Print our table
    print("CSV:\n")
    header_printed = False
    
    for ti, test_input in enumerate(test_maze_inputs()):
        if not header_printed:
            for key in test_input.__dict__:
                print(key, end=",")
            print("Generation Time,Image Save Time,")
            header_printed = True
        
        for key in test_input.__dict__:
            print(test_input.__dict__[key], end=",")
        print(f"{maze_generation_times[ti]},{image_save_times[ti]},")

The script prints CSV-style output that we can paste into Excel and chart.
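As an aside, the same table could probably be produced with Python’s built-in csv module, which takes care of quoting and avoids the trailing commas. This is only a sketch: `write_results` is a hypothetical helper, and the timings in the example are placeholders, not measurements from this post.

```python
import argparse
import csv
import io


def write_results(out, inputs, generation_times, save_times):
    """Write one CSV row per test input, followed by its two timings.

    Hypothetical helper: expects argparse.Namespace inputs like the
    ones yielded by test_maze_inputs().
    """
    writer = csv.writer(out)
    header_written = False
    for test_input, gen_time, save_time in zip(inputs, generation_times, save_times):
        fields = vars(test_input)  # Namespace attributes as a dict
        if not header_written:
            writer.writerow([*fields, "Generation Time", "Image Save Time"])
            header_written = True
        writer.writerow([*fields.values(), gen_time, save_time])


# Placeholder timings purely for illustration, not measured values.
example_inputs = [
    argparse.Namespace(word="Hello", grid_width=20, grid_height=20,
                       pixel_width=20, pixel_height=20)
]
buffer = io.StringIO()
write_results(buffer, example_inputs, [0.5], [0.25])
print(buffer.getvalue())
```

Passing `sys.stdout` or an open file instead of the `io.StringIO` buffer would send the rows straight to the terminal or to disk.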

All right, we have our times to beat! Let’s convert some code!

C++ FTW… Maybe

To make this as fair as possible, I will do my best not to adjust the design of the program. I will simply translate it to C++ and run the same benchmarks.

One does not simply translate from Python to C++

Basically Anyone (except Travis)

Here I am a week later, literally. I have a functioning C++ program that generates the same output you would expect to see from the Python script in my previous post.

Oh boy, it even looks like it has a memory leak! Let’s go ahead and try to fix that up really quickly…

DONE! My brain is fried. I am going to simply let the results speak for themselves:

Python vs C++ Results

Python and C++ clearly diverge even at the first data point. On the bar graph, you can see the stark differences by maze size. For the largest maze size in this test (3200 x 3200 pixels), Python took about 4 times as long as C++!

On the scatter plot, you can see the run times plotted against the total image area in square pixels. The linear fits of the data are pretty good, with R-squared values slightly exceeding 0.99. The linear fit equations could be useful for estimating run times for maze sizes that we did not test.

To get an estimate, substitute the desired maze size (in square pixels) for ‘x’ in the fit equation; the resulting ‘y’ is the estimated time. You can see that the slope (time per square pixel) of the Python fit is steeper than that of the C++ fit, further showcasing C++’s efficiency.
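For anyone who wants to reproduce that kind of estimate outside of Excel, here is a minimal sketch of an ordinary least-squares line fit in plain Python. The `linear_fit` function is a hypothetical helper, and the times below are invented placeholders for illustration only; they are not the measurements from this post.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: 1 minus residual sum of squares over total sum of squares
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Image areas (square pixels) for the four tested sizes.
square_pixels = [400**2, 800**2, 1600**2, 3200**2]
# Placeholder times in seconds, invented for illustration only.
placeholder_times = [0.1, 0.4, 1.6, 6.4]

slope, intercept, r_squared = linear_fit(square_pixels, placeholder_times)

# Estimate the time for an untested size, e.g. a 2400 x 2400 pixel image.
estimate = slope * 2400**2 + intercept
print(f"slope={slope:.3e} s per square pixel, R^2={r_squared:.4f}, "
      f"estimate={estimate:.2f} s")
```

Swapping the placeholder list for the real CSV timings should give a slope, intercept, and R-squared comparable to the trendline that Excel reports.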

Conclusion

This was an excellent learning experience for me. In the process of rewriting my code, I got a close-up look at various inconsistencies and poor algorithmic choices that came along with my first implementation.

I have yet to fix these things, as it was important to me to have an apples-to-apples comparison between the two codebases. If you compare the GitHub repositories (Python vs C++), you will find there are very few design differences between the two.

Thank you for reading this week!

-Travis
