Python 3.7 Asyncio For Hackers


Tags:
#guide #python #programming #asyncio #aiohttp #fast-http #coroutines #asynchronous #tasks #python3

Hello Luvs,

I have used Python for many years, and I feel very comfortable with it, but the issue Python has had from the beginning remains: speed! Python isn't the most performant language in the world (it's not designed to be!), so I ended up switching to Go for my scanning engine, because the last time I tried Python asyncio (a few years ago), it was a mess. A few days ago I gave it another shot, and to my surprise, it has matured and is quite usable. As you may see on my project page, I've been working on a project called hunter suite, which aims to automate all the tedious parts of penetration testing and bug bounty hunting.

 

Table of Contents

Introduction

Concurrency is hard

Sending millions of HTTP requests 

Conclusion

 

Introduction 

As a Python developer, you probably write a lot of custom scripts. Most of the time, you find yourself talking to some API or network protocol, whether for information security research, fuzzing, or delivering an exploit payload. Now imagine your scripts getting results something like 100x faster. How cool is that?

 

Concurrency is hard

No matter what programming language we use, getting concurrency right is hard. Having a clear goal before writing concurrent code helps a lot. My goals for the concurrent client part were to achieve a few things:

1- Send as many HTTP requests as possible to a single host

used for directory, password, and parameter brute forcing, and API fuzzing

2- Resolve as many different hosts as possible concurrently

used for bulk host, virtual host, and subdomain discovery, DNS brute forcing, and checking HTTP security headers (a minimal sketch of this follows below)

You can't expect 100% accuracy all the time, especially when talking to web servers. Still, we aim for as few errors as possible.
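As a taste of the second goal, here is a minimal sketch of resolving many hosts concurrently using only the standard library. The host list is a placeholder; swap in your own targets:

import asyncio

# placeholder targets; replace with your own host list
hosts = ["example.com", "example.org", "example.net"]


async def resolve(host):
    loop = asyncio.get_running_loop()
    try:
        # loop.getaddrinfo runs the lookup in the default executor,
        # so many resolutions can be in flight at once
        info = await loop.getaddrinfo(host, 443)
        return host, info[0][4][0]  # first resolved address
    except OSError:
        return host, None  # resolution failed


async def main():
    results = await asyncio.gather(*(resolve(h) for h in hosts))
    for host, addr in results:
        print(f"{host} -> {addr}")


asyncio.run(main())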

I use Python 3.7 here; the code samples may not run correctly on older versions. Here is a slightly modified version of the example from the Python docs.

import asyncio
import time


async def say_after(delay, what):
    await asyncio.sleep(delay)
    print(what)


async def main():
    task1 = asyncio.create_task(
        say_after(1, 'hello'))

    task2 = asyncio.create_task(
        say_after(20, 'world'))

    task3 = asyncio.create_task(
        say_after(4, 'more'))

    task4 = asyncio.create_task(
        say_after(6, 'words'))

    print(f"started at {time.strftime('%X')}")

    await task1
    await task2
    await task3
    await task4

    print(f"finished at {time.strftime('%X')}")


asyncio.run(main())

Result:

started at 15:11:22
hello
more
words
world
finished at 15:11:42

Process finished with exit code 0

As we can see, 'world' is printed last only because it has the longest sleep time. Here is how it works.

 

First, we import asyncio so we can talk with its API.

import asyncio

 

We create four tasks using the create_task function.

async def main():
    task1 = asyncio.create_task(
        say_after(1, 'hello'))

    task2 = asyncio.create_task(
        say_after(20, 'world'))

    task3 = asyncio.create_task(
        say_after(4, 'more'))

    task4 = asyncio.create_task(
        say_after(6, 'words'))

 

 

There are two new keywords here: async and await. Whenever we want to make a function asynchronous, we put the "async" keyword before the function definition, and we "await" the time-consuming call, in our case sleep.

async def say_after(delay, what):
    await asyncio.sleep(delay)
    print(what)

 

We print the time (with seconds) to record the execution time, and then we "await" all the tasks; in other words, we run our four tasks.

 


print(f"started at {time.strftime('%X')}")

await task1
await task2
await task3
await task4

print(f"finished at {time.strftime('%X')}")

Awaiting tasks one by one is tedious; instead, we can gather the tasks together. We use * (asterisk) before the list to unpack it.

 

# await task1
# await task2
# await task3
# await task4

await asyncio.gather(*[task1, task2, task3, task4])

 

We get the same result. Now that we know how we can create and run tasks, let's take it further by making HTTP requests asynchronously.
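One more detail worth knowing before we move on (an aside, not shown in the snippet above): gather also returns the results of the awaitables, in the same order they were passed in. A minimal sketch:

import asyncio


async def double(x):
    await asyncio.sleep(0.1)
    return x * 2


async def main():
    # gather returns the results in the same order the awaitables were passed
    results = await asyncio.gather(*[double(i) for i in range(5)])
    print(results)  # [0, 2, 4, 6, 8]


asyncio.run(main())

This comes in handy when your coroutines return data, such as HTTP response bodies.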

 

Sending millions of HTTP requests 

But before we start, let me show you another simple example.

 

import asyncio


def code():
    await asyncio.sleep(1)


def main():
    print('code')
    await code()

 

If we run this, we will get the error: SyntaxError: 'await' outside async function. So why do I show you this example? Because I want you to understand that you cannot just take any third-party Python library and make it asynchronous; the library itself has to be written with async support.
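If you are stuck with a blocking third-party library, the standard asyncio workaround is to offload its calls to a thread pool with run_in_executor. Here is a minimal sketch; fetch_blocking is a hypothetical helper name, and this is not the approach we use for the rest of the post:

import asyncio

import requests  # a synchronous, blocking library


# hypothetical helper: run a blocking requests.get() in the default thread pool
async def fetch_blocking(url):
    loop = asyncio.get_running_loop()
    response = await loop.run_in_executor(None, requests.get, url)
    return response.text


async def main():
    html = await fetch_blocking("https://example.com")
    print(len(html))


asyncio.run(main())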

 

But don't worry, Python has one of the most exceptional programming communities in the world. There is always a library. Meet aiohttp.
 

pip3 install aiohttp

and let's run a simple web request.

The code is self-explanatory, but what we do is create an async function that makes a GET request and awaits its response.

 

import aiohttp
import asyncio


async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()


async def main():
    async with aiohttp.ClientSession() as session:
        html = await fetch(session, 'http://python.org')
        print(html)


if __name__ == '__main__':
    asyncio.run(main())

 

Now, using our previous knowledge, let's write two simple scripts to compare performance: one using requests (synchronous) and one using aiohttp (asynchronous).

import requests
import time

urls = ["https://0xsha.io", "https://twitter.com", "https://google.com",
        ...]

if __name__ == '__main__':

    print('requests version')

    start = time.time()
    print(f"started at {time.strftime('%X')}")
    for url in urls:
        requests.get(url)
    end = time.time()
    print(f"Ended at {time.strftime('%X')}")
    print(end - start)

 

sync-1.py

requests version
started at 16:09:26
Ended at 16:10:36
69.44833111763

 

Now let's look at the asyncio version.


import aiohttp
import asyncio
import time

# urls is the same list as in the requests example above


async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()


async def main():
    async with aiohttp.ClientSession() as session:
        start = time.time()
        print(f"started at {time.strftime('%X')}")
        for url in urls:
            await fetch(session, url)
        end = time.time()
        print(f"finished at {time.strftime('%X')}")
        print(end - start)


if __name__ == '__main__':
    asyncio.run(main())


 

async-1.py

started at 16:06:27
finished at 16:07:17
50.67312693595886

As you can see, we are almost 20 seconds faster just by switching libraries, and with only a handful of URLs.

But you may ask: can it be faster? It's not even 10x! The answer lies in the Python docs, and to be honest with you, I think the Python documentation for asyncio still needs a lot of improvement.

 

There are three main types of awaitable objects: coroutines, Tasks, and Futures. In our async-1 example, we used coroutines, and each one waits for the previous one to finish. We never told them to run concurrently! To make them concurrent, we have to create a list of tasks and gather it. Also, please read this.

 

Here is the modified version that runs the tasks concurrently.

import aiohttp
import asyncio
import time

urls = ["https://0xsha.io", "https://twitter.com", "https://google.com",
        ...]


async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()


async def main():
    tasks = []
    async with aiohttp.ClientSession() as session:
        start = time.time()
        print(f"started at {time.strftime('%X')}")
        for url in urls:
            # await fetch(session, url)
            tasks.append(asyncio.create_task(fetch(session, url)))
        await asyncio.gather(*tasks)
        end = time.time()
        print(f"finished at {time.strftime('%X')}")
        print(end - start)


if __name__ == '__main__':
    asyncio.run(main())

 

started at 16:25:39
finished at 16:25:45
5.263195037841797

 

5 seconds! Amazing!

 

Now let's re-create a fast DirBuster. To achieve this, we can't merely loop through a large wordlist and hit the server as fast as possible; after a few requests, the server may drop our future requests, so we can use an asyncio queue to control the workers.

 

We mix all the previous examples. We read the wordlist synchronously (for large files we can do that asynchronously as well; see the sketch below), fill the queue with data, join the queue, cancel the remaining tasks, and finally gather the jobs and print any path that returns a 200 response.
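For the asynchronous file reading mentioned above, the third-party aiofiles library is the usual choice. A minimal sketch (install it with pip3 install aiofiles; fuzz.txt is the same wordlist used below):

import asyncio

import aiofiles  # pip3 install aiofiles


async def read_wordlist(path):
    # file operations are awaited instead of blocking the event loop
    async with aiofiles.open(path, mode="r") as f:
        return await f.readlines()


async def main():
    words = await read_wordlist("fuzz.txt")
    print(len(words), "entries loaded")


asyncio.run(main())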

 

Here is the code.

import asyncio
from aiohttp import ClientSession
import time

# global list to collect response bodies
lst = []


async def fetch(url, session, queue):
    while True:
        # Get a "work item" (one wordlist entry) out of the queue.
        x = await queue.get()
        print(x.strip())

        try:
            async with session.get(url + x.strip()) as response:
                if response.status == 200:
                    print(url + x.strip() + "----" + str(response.status))
                body = await response.read()
            lst.append(body)
        finally:
            # Notify the queue that the "work item" has been processed,
            # even if the request failed.
            queue.task_done()


with open("fuzz.txt", "r") as file:
    # https://github.com/Bo0oM/fuzz.txt/blob/master/fuzz.txt
    x = file.readlines()

url = "http://url.com/"


async def run(r):
    tasks = []
    start = time.time()

    queue = asyncio.Queue()
    for word in x:
        # we could also: await queue.put(word)
        queue.put_nowait(word)

    async with ClientSession() as session:

        # how many worker tasks?
        for i in range(r):
            task = asyncio.create_task(fetch(url, session, queue))
            tasks.append(task)

        # wait until every queued item has been processed
        await queue.join()

        # cancel the workers, which are now idle on queue.get()
        [task.cancel() for task in tasks]

        await asyncio.gather(*tasks, return_exceptions=True)
        # await asyncio.gather(*asyncio.all_tasks(), ).cancel()
        # https://bugs.python.org/issue29432

        # you now have all response bodies in the global lst variable
        # print(lst)
        end = time.time()
        print(end - start)


if __name__ == '__main__':
    asyncio.run(run(500))

 

Results:

http://example.com/access_log----200
http://example.com/index.php----200
http://example.com/error_log----200
http://example.com/forum/----200
http://example.com/flash/----200
http://example.com/robots.txt----200
http://example.com/roundcube/index.php----200

 

Just 20.817044019699097 seconds for ~4500 files! It would take ages using a traditional library like requests.

Here is the full code.

This script is in no way a replacement for tools like dirsearch, DirBuster, etc.
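If you do not need the full queue machinery, a simpler way to cap how many requests are in flight is asyncio.Semaphore. Here is a minimal sketch of that alternative; the target URL is a placeholder and the wordlist is the same fuzz.txt as above:

import asyncio

import aiohttp

url = "http://example.com/"  # placeholder target


async def check(session, sem, path):
    async with sem:  # wait here while the cap of concurrent requests is reached
        async with session.get(url + path) as response:
            if response.status == 200:
                print(url + path + "----" + str(response.status))


async def main():
    sem = asyncio.Semaphore(50)  # at most 50 requests in flight
    with open("fuzz.txt") as f:  # same wordlist as above
        paths = [line.strip() for line in f]
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(check(session, sem, p) for p in paths),
                             return_exceptions=True)


asyncio.run(main())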

 

Conclusion:

Now we have learned about Python asyncio. We can leverage it with any async library, or you can create your own async libraries. Check out an impressive curated list here.

If you write Python code, start using asyncio whenever you can. It will help you save a lot of time during your research.

 

Till then luvs 
