grequests - asynchronous requests in Python

Stop waiting, get more done

Posted by Niko Montonen on Fri 18 August 2017

The problem

We all know that dreadful feeling. You're working on a personal Python project and starting from scratch, and you need to call a web API.

It ain't so bad, eh? This is Python, we've got requests!

import requests

payload = {'apikey': 1, 'locale': 'enGB'}
r = requests.get('url', params=payload)

Well, that's that done with.

You work a bit more and realize you need to hit another API endpoint. A couple of times in a row. Gah. And you've noticed that sometimes the API is a bit wonky, so you really need to implement retrying too.

Well, that's not really hard now, is it? The library provides everything we need. We'll just grab Retry from urllib3 and HTTPAdapter from requests, start a session to use them, and we'll also skip the overhead of establishing a new connection for every call we make.

import requests
from requests.packages.urllib3.util.retry import Retry
from requests.adapters import HTTPAdapter

payload = {'apikey': 1, 'locale': 'enGB'}
s = requests.Session()
retries = Retry(total=5,
                backoff_factor=0.1,
                status_forcelist=[500, 502, 503, 504])
s.mount('http://', HTTPAdapter(max_retries=retries))
s.mount('https://', HTTPAdapter(max_retries=retries))
s.get('url', params=payload)

Yay, it works. And it's fast enough to do at least a couple of calls a second.

But then you hit that point. You need to do a couple hundred API calls because the API only allows you to query one thing at a time. Why couldn't they have been smarter when designing this endpoint...

Uh oh. A couple hundred calls. Over the Internet. Sequentially. That's way too slow, especially when the API allows us to perform 100 calls a second... We're going to need to make more calls at once.

Well, we have a couple of options.

multiprocessing

multiprocessing is nice, except Windows doesn't have fork(), so some things can be a bit awful: child processes don't inherit things like the parent's file descriptors. How about we look at something that behaves similarly on all platforms?
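
For comparison's sake, here's a minimal sketch of what the multiprocessing route might look like. The fetch() helper and the URL list are placeholders of mine, not part of any real project.

from multiprocessing import Pool

import requests

def fetch(url):
    # Each worker process gets its own interpreter, so no GIL contention,
    # but also no shared sessions or connections.
    return requests.get(url).status_code

if __name__ == '__main__': # Needed on Windows, which spawns instead of forking.
    with Pool(processes=10) as pool:
        results = pool.map(fetch, ['api1', 'api2', 'api3'])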

threading

Sure, why not? This is a completely viable option, and a lot of people use it. We won't get around Python's GIL (which we could with multiprocessing, since that gives every worker its own interpreter), but the GIL is released while a thread waits on the network, so threads are a fine fit for I/O-bound work like this.
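
To be fair, the standard library's ThreadPoolExecutor takes some of the sting out of it. A rough sketch, with fetch() again standing in as a placeholder:

from concurrent.futures import ThreadPoolExecutor

import requests

def fetch(url):
    # Runs in a worker thread; the GIL is released during the network wait.
    return requests.get(url)

with ThreadPoolExecutor(max_workers=10) as executor:
    responses = list(executor.map(fetch, ['api1', 'api2', 'api3']))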

It's still a bit of work though, once you wire in sessions and retries. How about we do something easier?

grequests

grequests is a library that throws gevent and requests together, allowing us to make our calls asynchronously: whenever one request is stuck waiting on the network, gevent switches to another. Now, asynchronous is not the same as parallel, but if a 286 was good enough to run a "multitasking" operating system by switching between tasks really quickly, greenlet microthreads are good enough for us.

So, we import our libraries.

import grequests # Import grequests first: it monkey-patches via gevent,
                 # and patching should happen before sockets are touched.
import requests
from requests.packages.urllib3.util.retry import Retry
from requests.adapters import HTTPAdapter

Then we create a bunch of sessions for our workers to use, each mounted with the same retry policy as before.

payload = {'apikey': 1, 'locale': 'enGB'}
NUM_SESSIONS = 10
sessions = [requests.Session() for i in range(NUM_SESSIONS)]
retries = Retry(total=5,
                backoff_factor=0.1,
                status_forcelist=[500, 502, 503, 504])
for s in sessions:
    s.mount('http://', HTTPAdapter(max_retries=retries))
    s.mount('https://', HTTPAdapter(max_retries=retries))

After that, we can create all the requests we need to perform, and spread them out between the workers using modulo.

urls = ['api1', 'api2', 'api3'] # Insert massive list of URLs here.

reqs = []

for i, url in enumerate(urls):
    reqs.append(grequests.get(url, session=sessions[i % NUM_SESSIONS], params=payload))

Now, we can call grequests.map() to execute all of our calls asynchronously.

responses = grequests.map(reqs, size=NUM_SESSIONS)

responses will now be a list of Response objects, in the same order as the requests we passed in. One caveat: if a call ultimately failed even after its retries, grequests leaves None in that slot, so check for it while iterating.

for r in responses:
    if r is None: # A request that failed outright ends up as None.
        continue
    print(r.status_code) # The status code the server sent back: 200, 404 etc.
    print(r.json()) # Decode JSON straight into a dictionary for easy processing, just like usual.
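
If you'd rather hear about failures than just find a None in the list, grequests.map() also accepts an exception_handler callback. on_error below is just a name I picked for illustration.

def on_error(request, exception):
    # Called with the failed request and the exception it raised.
    print('Request to {} failed: {}'.format(request.url, exception))

responses = grequests.map(reqs, size=NUM_SESSIONS, exception_handler=on_error)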

In my own application, a clean first start with no saved state meant waiting around 20 minutes to fetch the data needed to actually use the program. Even if I had managed to cut that in half, we'd still be talking about 10 minutes.

With grequests and 10 workers, a clean start leaves the user waiting a little over 2 minutes. Not a bad result for how little work it took.

You may want to look at other approaches if you need even more speed, but grequests is great for how little work it asks of you.