Let's Talk Async Asynchronously - The Gathering of the Tasks (Not Clouds)
You didn't think we'd limit ourselves to one Task, did you? Should the last three posts on async (including this one) have been a single post? No, in this case a trilogy is better (please note that's not always the case; I can think of one movie…that became two…I mean three). Let's resume the execution of this.
We've looked at some of the concepts and theory behind async; today we'll use them in a very short program. Don't worry, there are a lot more concepts coming soon. To illustrate where asyncio really shines, we'll be doing some basic IO: downloading something from the internet and writing it to a local file. We will do that both synchronously and asynchronously and see the difference. await ready? OK, I'll go make some tea and come back. OK, you're ready now. (terrible…I know)
OK, so in this post we will run a few tasks asynchronously. What we want to do is download a few books and write every single one of them to a file in the current directory. I found this website called textfiles.com, from which I will download a few files.
Besides the built-in asyncio package, I will also be using httpx and aiofiles. I'm doing that so that my async code has no blocking calls; I need something that can make requests and write files asynchronously. Let's import what we need.
import asyncio
from typing import List
from httpx import get, AsyncClient
import aiofiles
from time import time
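Before we get going, here is a quick, self-contained sketch (not part of the book downloader) of why blocking calls are a problem in async code. It compares a coroutine that blocks the event loop with time.sleep against one that awaits asyncio.sleep; the delays and the factor of three are just illustration values.

```python
import asyncio
import time

async def blocking_sleep() -> None:
    time.sleep(0.05)  # blocking call: the event loop can do nothing else meanwhile

async def awaited_sleep() -> None:
    await asyncio.sleep(0.05)  # yields to the loop, so other coroutines can run

async def timed(coro_factory) -> float:
    # run three copies concurrently and measure the total wall time
    start = time.perf_counter()
    await asyncio.gather(*[coro_factory() for _ in range(3)])
    return time.perf_counter() - start

blocking_total = asyncio.run(timed(blocking_sleep))  # roughly 0.15 s: runs serially
awaited_total = asyncio.run(timed(awaited_sleep))    # roughly 0.05 s: overlaps
print(f"blocking: {blocking_total:.3f}s, awaited: {awaited_total:.3f}s")
```

This is exactly why I reach for httpx and aiofiles below instead of the usual blocking request and file APIs.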
Sync Download and Write to File
Let’s first write the synchronous version of the function.
def s_download_write_book(url: str) -> None:
    book_name: str = url.split('/')[-1]
    # download it
    book = get(url)
    # write to file
    with open(book_name, 'w') as f:
        f.write(book.text)
    print(f"downloaded and wrote book {book_name}")
So what are we doing? We pass the function a string (the url) and get the book name from it. For example, if the url is ‘http://www.google.com/free_books/makeMoneyIn30DaysNotAPyramidSchemeIPromise.txt', the book_name variable will be ‘makeMoneyIn30DaysNotAPyramidSchemeIPromise.txt'. Then we make a GET request and the file is downloaded. Finally, we open a file in the current directory with book_name as the file name and write to it. Easy!
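To make the filename step concrete, here is that split on its own, using the example url from above:

```python
url = 'http://www.google.com/free_books/makeMoneyIn30DaysNotAPyramidSchemeIPromise.txt'
# everything after the last '/' becomes the local filename
book_name = url.split('/')[-1]
print(book_name)  # makeMoneyIn30DaysNotAPyramidSchemeIPromise.txt
```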
Async Download and Write to File
Now let’s write the asynchronous version of the function.
async def a_download_write_book(url: str) -> None:
    book_name: str = url.split('/')[-1]
    # download it
    async with AsyncClient() as client:
        book = await client.get(url)
    # write to file
    async with aiofiles.open(book_name, 'w') as f:
        await f.write(book.text)
    print(f"downloaded and wrote book {book_name}")
Is it exactly like the one above, but async? You bet! Note that I am also using an async context manager with AsyncClient from the httpx library.
Timing Synchronous Version
Those are the two functions we will compare. Now let's use the synchronous version in a program with a list of urls and time it. Please note that I am not including the urls I am downloading; any text files will work. I don't want everyone reading this blog to go and overload that server with downloads. If that happened, instead of the server getting 1 request, it would get….1 request. You know this writing is for me, right? It's not like the domain name is easy to remember…
urls: List[str] = [
    '####################1',
    '####################2',
    '####################3',
    '####################4',
    '####################5',
    '####################6',
    '####################7',
    '####################8',
    '####################9',
]
def s_main(urls: List[str]):
    for url in urls:
        s_download_write_book(url)
if __name__ == "__main__":
    start = time()
    s_main(urls=urls)
    end = time()
    print(end - start)
Run the program and I get approximately 5-6 seconds. OK, not bad. What if we did it asynchronously?
Timing Asynchronous Version
async def a_main(urls: List[str]):
    # create a few tasks and gather them
    await asyncio.gather(
        *[a_download_write_book(url) for url in urls]
    )
if __name__ == "__main__":
    start = time()
    asyncio.run(a_main(urls=urls))
    end = time()
    print(end - start)
So what's happening in a_main? We use the asyncio.gather() function. What does it do? It runs a sequence of awaitable objects concurrently. Rather than going one by one and using the create_task function to wrap each of our coroutines in a task and schedule its execution, we let gather do it for us.
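To see that gather really is shorthand for the create_task route, here is a small self-contained sketch. Since the real urls aren't shown, it uses asyncio.sleep as a stand-in for the downloads; the work coroutine and its squaring are just illustration.

```python
import asyncio

async def work(n: int) -> int:
    # stand-in for an IO-bound coroutine like a_download_write_book
    await asyncio.sleep(0.01)
    return n * n

async def with_create_task(ns):
    # the manual route: wrap each coroutine in a Task, then await each one
    tasks = [asyncio.create_task(work(n)) for n in ns]
    return [await t for t in tasks]

async def with_gather(ns):
    # gather wraps, schedules and awaits in a single call
    return await asyncio.gather(*[work(n) for n in ns])

print(asyncio.run(with_create_task([1, 2, 3])))  # [1, 4, 9]
print(asyncio.run(with_gather([1, 2, 3])))       # [1, 4, 9]
```

Both versions run the coroutines concurrently and return the results in the order they were passed in.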
We run the program and I get approximately 1.1-1.6 seconds. That's quite a bit faster. Keep in mind that I'm only requesting and downloading 9 files with a total size of 2 MB or so. If you were requesting lots and lots of files, you might see an even bigger improvement. Furthermore, when we ran the synchronous version, the files were downloaded in the order they appear in the list, and consequently the print statements ran in that order. In the asynchronous version, however, each print statement ran whenever its file had been downloaded and written, probably in a different order. In fact, if you run it a few times, you will get different orders.
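That out-of-order behaviour can be demonstrated without any downloads at all. In this sketch (delays chosen purely for illustration), a coroutine scheduled second still finishes first, and asyncio.as_completed hands results back in completion order:

```python
import asyncio

async def finish_after(delay: float, name: str) -> str:
    await asyncio.sleep(delay)
    return name

async def main() -> list:
    # 'slow' is scheduled first, but 'fast' completes first
    completed = []
    for fut in asyncio.as_completed([finish_after(0.05, 'slow'),
                                     finish_after(0.01, 'fast')]):
        completed.append(await fut)
    return completed

print(asyncio.run(main()))  # ['fast', 'slow']
```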
So is asyncio useful? Definitely. In the next few posts we’ll go through some more concepts and theory and then build a few more things before looking at how to test asynchronous code. Finally, I am also using async in the FastAPI backend for Web CNLearn (which may or may not be published at the time I am publishing this). See you soon!