Asyncio in Python: The Super Power No One Uses

The image shows three W's superimposed on each other. The text at the top of the image says, "Let's Share What We Know." At the bottom, it says "World Wide Web," with two blue and yellow Python cross icons between the words "World Wide" and between the words "Wide Web." The graphic is here to remind us of the large scale of the Internet, which is composed of over 4 billion IPv4 addresses! In this article, I use the asyncio library to quickly scan all of these addresses with a single process.

This week, we explore the asyncio library and calculate how long it would take for a single process to scan the entire Internet’s worth of IPv4 addresses (…only 4,294,967,296 total addresses!).

In this article, I briefly walk through the asyncio library and create a real-world example!

If you are new to Python or programming in general, I would advise you to learn the basics before approaching asyncio. This is a bit of an advanced topic, so a solid grasp of the foundations of synchronous programming is a requirement here.

A [Kind of] Experienced Software Guy

Asyncio

Asyncio was introduced as a built-in library in Python version 3.4. It is intended for I/O-bound operations. One example is network communication, where you often have to wait for data to be returned from a remote server.

In the time spent waiting, however, you may have an opportunity to complete other work. This is where an asynchronous application shines.

You define a coroutine by adding the ‘async’ keyword in front of the normal ‘def’ keyword, as shown below:

async def main():
    ...  # the coroutine's body goes here

Calling this function directly only creates a coroutine object. To actually execute it, you hand it to a runner like this:

asyncio.run(main())
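To see the two pieces working together, here is a minimal sketch of a complete program. The `fetch_greeting` coroutine is a made-up stand-in for a slow network call (it is not part of the scanner built later), and `asyncio.sleep` simulates the waiting.

```python
import asyncio


async def fetch_greeting() -> str:
    # Hypothetical helper: asyncio.sleep stands in for a slow network call.
    await asyncio.sleep(0.1)
    return "hello"


async def main() -> str:
    # 'await' pauses main() until fetch_greeting() has finished.
    greeting = await fetch_greeting()
    return greeting


if __name__ == "__main__":
    # asyncio.run creates the event loop, runs main(), and closes the loop.
    print(asyncio.run(main()))
```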

The Mental Model of Asyncio

When a runner is created to execute one or many asynchronous tasks, it starts by creating an event loop.

“The event loop is the core of every asyncio application.”

Python Docs

Below is a very simplistic mental model for a basic asyncio application.

Diagram of a very simplistic mental model for a basic asyncio application.

Each time an asyncio ‘Task’ has to ‘await’ something, there is an opportunity for the other tasks in the loop to execute more of their work if they are no longer ‘awaiting.’

Our mental model looks very similar to a single-threaded synchronous application. However, each of the asynchronous tasks is handled independently, as long as no single task is hogging attention from the event loop.
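A small sketch can make this hand-off visible. The `worker` and `demo` names below are made up for illustration, and `asyncio.sleep` plays the role of whatever a task is waiting on. Each worker logs when it starts and when it finishes, so the log shows the event loop switching to the other task while the first one is still awaiting.

```python
import asyncio
from typing import List


async def worker(name: str, delay: float, log: List[str]) -> None:
    log.append(f"{name} start")
    # While this task sleeps, the event loop is free to run other tasks.
    await asyncio.sleep(delay)
    log.append(f"{name} done")


async def demo() -> List[str]:
    log: List[str] = []
    # Both workers run on the same thread, interleaved at their await points.
    await asyncio.gather(
        worker("slow", 0.2, log),
        worker("fast", 0.1, log),
    )
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The "fast" worker finishes first even though the "slow" one started first, because neither blocks the loop while it waits.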

Concrete Example in Asyncio

Now we have all that is needed to develop a solution for an important question:

How Long Would it Take?

How long would it take for a single-threaded program to scan the entire Internet’s worth of IPv4 addresses?

Let’s start by defining a class that can generate all of the IP addresses.

from dataclasses import dataclass
from typing import List

@dataclass
class IPAddress:
    subnet_ranges: List[int]

Above, we define a data class that simply holds a list of octet values (named ‘subnet_ranges’ in the code). If you are familiar with IPv4, you probably guessed that this list will have a length of four. I didn’t go out of my way to enforce the length, but in a production environment it would be ideal to do so.
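For readers curious what enforcing the length could look like, here is one possible sketch. The class name `ValidatedIPAddress` is hypothetical and not used elsewhere in this article. The `__str__` method is also an assumption: the later snippets call `str()` on an address, and a dotted-decimal rendering like this one is a natural fit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ValidatedIPAddress:  # hypothetical name, for illustration only
    subnet_ranges: List[int]

    def __post_init__(self) -> None:
        # An IPv4 address has exactly four octets.
        if len(self.subnet_ranges) != 4:
            raise ValueError(f"expected 4 octets, got {len(self.subnet_ranges)}")
        # Each octet must fit in a single byte.
        for octet in self.subnet_ranges:
            if not 0 <= octet <= 255:
                raise ValueError(f"octet out of range: {octet}")

    def __str__(self) -> str:
        # Render in dotted-decimal form, e.g. "192.0.2.1".
        return ".".join(str(octet) for octet in self.subnet_ranges)
```

With this in place, `ValidatedIPAddress([1, 2, 3])` raises a `ValueError` instead of silently producing a malformed address.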

We now have something that can hold our IP address in a sane way, but we still need to generate them.

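# Needs `from typing import Iterator` alongside the other imports.
@classmethod
def get_all_ips(cls, start: int = 1, stop: int = 255) -> Iterator['IPAddress']:
    first = second = third = fourth = start

    while fourth <= stop:
        # Yield the current address before incrementing, so that
        # start.start.start.start itself is included.
        curr_ip = cls([fourth, third, second, first])

        if cls.is_valid_ip(curr_ip):
            yield curr_ip

        # Increment the least-significant octet and carry into the
        # higher-order octets whenever one passes 'stop'.
        first += 1

        if first > stop:
            first = start
            second += 1
        if second > stop:
            second = start
            third += 1
        if third > stop:
            third = start
            fourth += 1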

For this, I introduce a generator class method that yields IP addresses with a default range of ‘1.1.1.1’ to ‘255.255.255.255.’ The method increments the least-significant octet and carries into the higher-order octets each time an octet rolls over. The bulleted list below illustrates the method’s address outputs.

  • 1.1.1.1
  • 1.1.1.2
  • 1.1.1.3
  • …
  • 1.1.1.254
  • 1.1.1.255
  • 1.1.2.1
  • …
  • 255.255.255.254
  • 255.255.255.255

If you have a keen eye, you will have likely noticed the ‘is_valid_ip’ class method. It’s called just before yielding to the calling function.

This method rejects addresses in 0.0.0.0/8 and in the three private ranges (10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16), and treats everything else as a valid public address. See below:

@classmethod
def is_valid_ip(cls, ip_address: 'IPAddress') -> bool:
    first, second = ip_address.subnet_ranges[:2]

    # 0.0.0.0/8 is reserved.
    if first == 0:
        return False

    # 10.0.0.0/8 is private.
    if first == 10:
        return False

    # 172.16.0.0/12 is private.
    if first == 172 and 16 <= second <= 31:
        return False

    # 192.168.0.0/16 is private.
    if first == 192 and second == 168:
        return False

    return True
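As an aside, Python's standard `ipaddress` module can make a similar decision. The sketch below is an alternative, not the method used in this article, and the helper name `is_public_ipv4` is made up. Note that `is_global` is stricter than the four checks above: it also rejects loopback, link-local, and other reserved ranges, so the two approaches will not always agree.

```python
import ipaddress


def is_public_ipv4(address: str) -> bool:
    # Hypothetical helper: is_global is False for private, loopback,
    # link-local, and other reserved ranges.
    return ipaddress.IPv4Address(address).is_global


if __name__ == "__main__":
    for candidate in ("8.8.8.8", "10.0.0.1", "172.16.5.4", "192.168.1.1"):
        print(candidate, is_public_ipv4(candidate))
```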

Are We Asynchronous Yet?

No, not yet…but soon! Now that we have our IP address generator defined, we can start building an asynchronous function that will do the following:

  • Advance our generator N times to get a batch of IPs.
  • Create an asynchronous task for each IP address in our batch which checks if a port is open.

By adding timers to this code, we will find out how long it would theoretically take! Keep in mind that we already know the performance impacts of using Python vs. C++, but this is a toy problem, so…Python is perfect.

Iterate our Generation Function

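for _ in range(IP_COUNT_PER_GATHER):
    try:
        # next() is the idiomatic way to advance an iterator.
        next_group_of_ips.append(next(ip_addr_iter))
    except StopIteration:
        # The generator is exhausted; this is the final (possibly short) batch.
        complete = True
        break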

The loop above pulls up to IP_COUNT_PER_GATHER addresses from our IP generator and flags completion once the generator is exhausted.
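The same batching step can also be written with `itertools.islice`, which stops early on its own when the iterator runs dry. This is only an alternative sketch; the `take_batch` helper is hypothetical, and an empty list signals that nothing is left.

```python
from itertools import islice
from typing import Iterator, List, TypeVar

T = TypeVar("T")


def take_batch(items: Iterator[T], size: int) -> List[T]:
    # Consume at most 'size' items; the result is shorter (or empty)
    # when the iterator is nearly (or fully) exhausted.
    return list(islice(items, size))


if __name__ == "__main__":
    numbers = iter(range(7))
    batch = take_batch(numbers, 3)
    while batch:
        print(batch)
        batch = take_batch(numbers, 3)
```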

Create an Asyncio Task For Each IP

# Loop over the addresses actually collected: the final batch may hold
# fewer than IP_COUNT_PER_GATHER entries.
for ip in next_group_of_ips:
    async_tasks.append(asyncio.create_task(check_port(str(ip))))

We create a task for each IP port check and store a reference to the task.
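The `check_port` coroutine itself is not listed in this article. As a rough idea of what such a check could look like, here is a sketch built on `asyncio.open_connection` with a timeout. The signature, the default port of 80, and the boolean return value are all assumptions made for illustration.

```python
import asyncio


async def check_port(ip: str, port: int = 80, timeout: float = 1.0) -> bool:
    # Assumed behavior: True if a TCP connection succeeds within 'timeout'.
    try:
        _reader, writer = await asyncio.wait_for(
            asyncio.open_connection(ip, port), timeout=timeout
        )
    except (asyncio.TimeoutError, OSError):
        # Timed out, refused, unreachable, and so on: treat as closed.
        return False

    # The connection worked; close it cleanly before reporting success.
    writer.close()
    try:
        await writer.wait_closed()
    except OSError:
        pass
    return True
```

Because the function awaits the connection attempt, thousands of these checks can be in flight at once while each one waits on the network.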

Asyncio: Await Results

With multiple tasks executing at the same time, we don’t gain much if we have to ‘await’ each one individually. To solve this problem, the asyncio library has a function called ‘gather.’ See below for how I used ‘gather’ in this application:

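await asyncio.gather(*async_tasks)

# Pair each address with its task; the final batch may be shorter than
# IP_COUNT_PER_GATHER, so we avoid indexing by the batch size.
for ip, task in zip(next_group_of_ips, async_tasks):
    if task.result():
        ip_addresses_found.put(str(ip))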

By ‘awaiting’ the ‘gather’ function, we are actually awaiting all tasks in the list. When all have completed, we take each task that returned a truthy result and add its IP to our queue of IPs that we may want to process later.
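It may also help to know that `gather` returns the results itself, in the same order as the awaitables passed in. The sketch below shows that pattern in isolation. Both `fake_check` and `scan_batch` are made-up names, and the fake check pretends that only addresses ending in ".2" have an open port.

```python
import asyncio
from typing import List


async def fake_check(ip: str) -> bool:
    # Stand-in for a real port check; no network traffic is involved.
    await asyncio.sleep(0.01)
    return ip.endswith(".2")


async def scan_batch(ips: List[str]) -> List[str]:
    # gather() returns one result per awaitable, in input order.
    results = await asyncio.gather(*(fake_check(ip) for ip in ips))
    return [ip for ip, is_open in zip(ips, results) if is_open]


if __name__ == "__main__":
    print(asyncio.run(scan_batch(["1.1.1.1", "1.1.1.2", "1.1.1.3"])))
```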

All Together!

The whole function together looks like this:

async def main(ip_addresses_found: Queue, start: int = 1, end: int = 255):
    ip_addr_iter = iter(IPAddress.get_all_ips(start, end))
    complete = False

    while not complete:
        next_group_of_ips = []
        async_tasks = []

        # Pull up to IP_COUNT_PER_GATHER addresses; the final batch may be shorter.
        for _ in range(IP_COUNT_PER_GATHER):
            try:
                next_group_of_ips.append(next(ip_addr_iter))
            except StopIteration:
                complete = True
                break

        # Start one task per address actually collected.
        for ip in next_group_of_ips:
            async_tasks.append(asyncio.create_task(check_port(str(ip))))

        await asyncio.gather(*async_tasks)

        # Record every address whose port check returned a truthy result.
        for ip, task in zip(next_group_of_ips, async_tasks):
            if task.result():
                ip_addresses_found.put(str(ip))

Conclusion

The time has come to share my results! These will vary based on Internet latency and machine hardware.

I set my scan to do 10k IPs per batch, with a timeout on the connection of 1 second. This resulted in an average batch runtime of ~1.3 seconds.
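For anyone who wants to take a similar measurement, timing a batch can be as simple as wrapping the `gather` call with `time.perf_counter`. The sketch below is not the timing code I used: `simulated_check` and `timed_batch` are hypothetical names, and the simulated check just sleeps instead of touching the network.

```python
import asyncio
import time
from typing import List, Tuple


async def simulated_check(ip: str) -> bool:
    # Stand-in for a real port check: waits briefly, finds nothing.
    await asyncio.sleep(0.05)
    return False


async def timed_batch(ips: List[str]) -> Tuple[List[bool], float]:
    started = time.perf_counter()
    results = await asyncio.gather(*(simulated_check(ip) for ip in ips))
    elapsed = time.perf_counter() - started
    return list(results), elapsed


if __name__ == "__main__":
    batch = [f"192.0.2.{i}" for i in range(100)]
    _, seconds = asyncio.run(timed_batch(batch))
    print(f"batch of {len(batch)} took {seconds:.2f} s")
```

Because the checks overlap, the whole batch takes roughly as long as one check rather than one hundred of them.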

I didn’t let it run to see how long it would actually take (running this program had major effects on our ability to use the Internet), but if you divide the total number of possible IPs by our batch size of 10k, you get ~430k batches. At 1.3 seconds per batch, that totals 558,346 seconds, or 6.46 days of constant running.
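The arithmetic behind that estimate, spelled out as a short script using the numbers quoted above:

```python
TOTAL_IPV4_ADDRESSES = 2 ** 32  # 4,294,967,296
BATCH_SIZE = 10_000             # IPs per batch
SECONDS_PER_BATCH = 1.3         # measured average batch runtime

batches = TOTAL_IPV4_ADDRESSES / BATCH_SIZE
total_seconds = batches * SECONDS_PER_BATCH
total_days = total_seconds / 86_400  # seconds in a day

print(f"{batches:,.0f} batches, {total_seconds:,.0f} s, {total_days:.2f} days")
# prints: 429,497 batches, 558,346 s, 6.46 days
```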

Not as bad as I originally thought 🙂

Fun fact: I was first introduced to coroutines while programming my Paper Boy game in Unity!

Thanks for reading this week! Please ‘like’ if you enjoyed the content. Feel free to leave any comments or suggestions as well!

-Travis
