A Quick Guide to the Python Requests Library

by Alex

Before you start, make sure you have the latest version of Requests installed. First, let’s look at some simple examples.

Creating GET and POST requests

Import the Requests module:


import requests

Let’s fetch a web page with a GET request. In this example, we’ll use GitHub’s public timeline:

r = requests.get('https://api.github.com/events')

We now have a Response object named r, and we can get all the information we need from it. Requests’ simple API means that all forms of HTTP request are equally obvious. For example, this is how you make a POST request:

r = requests.post('https://httpbin.org/post', data={'key': 'value'})

Other types of HTTP requests such as PUT, DELETE, HEAD and OPTIONS are just as easy to make:


r = requests.put('https://httpbin.org/put', data={'key': 'value'})
r = requests.delete('https://httpbin.org/delete')
r = requests.head('https://httpbin.org/get')
r = requests.options('https://httpbin.org/get')
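Each of these functions is a shortcut for requests.request(method, url, ...). To see what would be sent without touching the network, you can build a requests.Request and call .prepare(). In this sketch the httpbin.org/anything URL is only a placeholder; nothing is sent.

```python
import requests

# Build, but do not send, one request per HTTP verb; nothing goes over the network.
verbs = ['GET', 'POST', 'PUT', 'DELETE', 'HEAD', 'OPTIONS']
prepared = {
    verb: requests.Request(verb, 'https://httpbin.org/anything').prepare()
    for verb in verbs
}

for verb, req in prepared.items():
    print(req.method, req.url)
```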

Passing Parameters in a URL

Often you will want to send some data in the URL’s query string. If you were constructing the URL by hand, this data would be given as key/value pairs after a question mark, e.g. httpbin.org/get?key=val. Requests lets you provide these arguments as a dictionary, using the params keyword argument. For example, to pass key1=value1 and key2=value2 to httpbin.org/get, you would use the following code:


payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get('https://httpbin.org/get', params=payload)
print(r.url)

As you can see, the URL was generated correctly:

https://httpbin.org/get?key1=value1&key2=value2

Dictionary keys whose value is None are not added to the URL’s query string. You can also pass a list of items as a value:


>>> payload = {'key1': 'value1', 'key2': ['value2', 'value3']}
>>> r = requests.get('https://httpbin.org/get', params=payload)
>>> print(r.url)
https://httpbin.org/get?key1=value1&key2=value2&key2=value3
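You can check how params will be encoded, without sending anything, by preparing the request. This sketch adds a None value (which is dropped) and a URL that already carries a query string (the new parameters are appended to it); the page=2 parameter is an illustrative assumption.

```python
import requests

payload = {
    'key1': 'value1',
    'key2': ['value2', 'value3'],  # a list becomes a repeated key
    'key3': None,                  # None values are left out entirely
}

# Prepare (but do not send) the requests to inspect the final URLs.
plain = requests.Request('GET', 'https://httpbin.org/get', params=payload).prepare()
merged = requests.Request('GET', 'https://httpbin.org/get?page=2', params=payload).prepare()

print(plain.url)   # https://httpbin.org/get?key1=value1&key2=value2&key2=value3
print(merged.url)  # https://httpbin.org/get?page=2&key1=value1&key2=value2&key2=value3
```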

Response content

We can read the content of the server’s response. Consider the GitHub timeline again:


>>> import requests
>>> r = requests.get('https://api.github.com/events')
>>> r.text
'[{"repository":{"open_issues":0, "url": "https://github.com/...

Requests automatically decodes content from the server, and most Unicode charsets are decoded seamlessly. When you make a request, Requests makes an educated guess about the encoding of the response based on the HTTP headers, and that encoding is used when you access r.text. You can find out which encoding Requests is using, and change it, with the r.encoding property:

>>> r.encoding
'utf-8'
>>> r.encoding = 'ISO-8859-1'

If you change the encoding, Requests will use the new value of r.encoding whenever you access r.text. You might want to do this whenever you need special logic to work out the encoding of the content. For example, HTML and XML documents can declare their encoding in their body. In such situations, you should use r.content to find the encoding and then set r.encoding, which lets you read r.text with the correct encoding. Requests will also use custom encodings if you need them: if you have created your own encoding and registered it with the codecs module, simply use the codec name as the value of r.encoding.
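The HTML case described above can be sketched as follows. The helper name sniff_html_charset and its regular expression are assumptions for illustration (real pages can declare their charset in other ways), and the response body is simulated with a byte string so the example runs offline. With a live response you would write r.encoding = sniff_html_charset(r.content) and then read r.text.

```python
import re

# Hypothetical helper: find a charset declared in an HTML <meta> tag,
# looking only at the first 1 KB of the raw bytes (r.content).
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_-]+)', re.IGNORECASE)

def sniff_html_charset(content, default='utf-8'):
    match = META_CHARSET.search(content[:1024])
    return match.group(1).decode('ascii') if match else default

# Simulated r.content: a Latin-1 page that declares its own charset.
page = '<html><head><meta charset="ISO-8859-1"></head><body>café</body></html>'
content = page.encode('ISO-8859-1')

encoding = sniff_html_charset(content)             # what you would assign to r.encoding
text = content.decode(encoding, errors='replace')  # how r.text decodes r.content
wrong = content.decode('utf-8', errors='replace')  # the result of a wrong guess

print(encoding)         # ISO-8859-1
print('café' in text)   # True
print('café' in wrong)  # False
```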

Binary response content

You can also access the response body as bytes for non-text responses:


>>> r.content
b'[{"repository":{"open_issues":0, "url": "https://github.com/...

The gzip and deflate transfer-encodings are automatically decoded for you. For example, to create an image from the binary data returned by a request, you can use the following code:


from PIL import Image
from io import BytesIO
i = Image.open(BytesIO(r.content))

Response content in JSON

If you work with data in JSON format, use the built-in JSON decoder:


>>> import requests
>>> r = requests.get('https://api.github.com/events')
>>> r.json()
[{'repository': {'open_issues': 0, 'url': 'https://github.com/...

If JSON decoding fails, r.json() raises an exception. For example, if the response has a 204 (No Content) status, or if it contains invalid JSON, calling r.json() raises a ValueError (in recent versions of Requests, a requests.exceptions.JSONDecodeError, which is a subclass of ValueError). Note that a successful call to r.json() does not mean the request succeeded: some servers return a JSON object in a failed response (e.g. the error details of an HTTP 500), and that JSON will be decoded and returned. To check whether a request was successful, use r.raise_for_status() or check that r.status_code is what you expect.
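A minimal sketch of checking the status before trusting r.json(). The helpers make_fake_response and parse_json_response are hypothetical; the fake responses are built by hand (setting the private _content attribute, which is only acceptable in a demo or test) so the example runs without a network.

```python
import requests

def make_fake_response(status_code, body):
    """Build a Response by hand so the demo runs offline. Setting the private
    _content attribute is a test-only shortcut, not something to do in real code."""
    r = requests.Response()
    r.status_code = status_code
    r._content = body
    r.encoding = 'utf-8'
    return r

def parse_json_response(r):
    """Hypothetical helper: check the status first, then decode the JSON body.
    Returns None when the body is not valid JSON (e.g. 204 No Content)."""
    r.raise_for_status()  # HTTPError for 4XX/5XX, even if the body is JSON
    try:
        return r.json()
    except ValueError:  # covers requests' JSONDecodeError, a ValueError subclass
        return None

ok = parse_json_response(make_fake_response(200, b'{"open_issues": 0}'))
empty = parse_json_response(make_fake_response(204, b''))

try:
    parse_json_response(make_fake_response(500, b'{"message": "Server Error"}'))
    server_error_raised = False
except requests.exceptions.HTTPError:
    server_error_raised = True

print(ok, empty, server_error_raised)  # {'open_issues': 0} None True
```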

Raw response content

In the rare case that you want the raw socket response from the server, you can access r.raw. If you want to do this, make sure you set stream=True in your initial request. Once you do, you can do this:


>>> r = requests.get('https://api.github.com/events', stream=True)
>>> r.raw
>>> r.raw.read(10)
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'

In general, however, you should use a pattern like this to save what is being streamed to a file:


with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size=128):
        fd.write(chunk)

Using Response.iter_content handles a lot of what you would otherwise have to deal with when using Response.raw directly. When streaming a download, the pattern above is the preferred way to retrieve the content. Note that chunk_size can be freely adjusted to a number that better fits your use case. One important difference between the two: Response.iter_content automatically decodes the gzip and deflate transfer-encodings, whereas Response.raw is a raw stream of bytes and does not transform the response content. If you really need access to the bytes exactly as they were returned, use Response.raw.
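Here is a self-contained version of that pattern, wrapped in a hypothetical save_stream helper. So that it runs offline, the sketch fakes a streamed response by attaching an in-memory io.BytesIO as r.raw (an assumption for the demo; a real stream=True response wraps a urllib3 object) and writes to a temporary file.

```python
import io
import os
import tempfile
import requests

def save_stream(r, filename, chunk_size=128):
    """Write a streamed response to disk chunk by chunk; return bytes written."""
    written = 0
    with open(filename, 'wb') as fd:
        for chunk in r.iter_content(chunk_size=chunk_size):
            fd.write(chunk)
            written += len(chunk)
    return written

# Offline stand-in for requests.get(url, stream=True): a Response whose raw
# stream is an in-memory BytesIO instead of a urllib3 socket stream.
payload = b'0123456789' * 100
r = requests.Response()
r.status_code = 200
r.raw = io.BytesIO(payload)

path = os.path.join(tempfile.mkdtemp(), 'download.bin')
n = save_stream(r, path, chunk_size=128)

with open(path, 'rb') as f:
    saved = f.read()

print(n, saved == payload)  # 1000 True
```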

Custom Headers

If you’d like to add HTTP headers to a request, simply pass a dict to the headers parameter. For example, we didn’t specify our user-agent in the previous example:


url = 'https://api.github.com/some/endpoint'
headers = {'user-agent': 'my-app/0.0.1'}
r = requests.get(url, headers=headers)
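Under the hood, requests.get() creates a Session and merges your headers with the session’s default ones. The offline sketch below reproduces that step with Session.prepare_request so you can inspect the final headers without sending anything; the endpoint is the placeholder from the example above.

```python
import requests

url = 'https://api.github.com/some/endpoint'
headers = {'user-agent': 'my-app/0.0.1'}

# Session.prepare_request merges your headers with the session defaults,
# the same step requests.get() performs internally; nothing is sent here.
session = requests.Session()
prepared = session.prepare_request(requests.Request('GET', url, headers=headers))

print(prepared.headers['User-Agent'])  # my-app/0.0.1 (lookup is case-insensitive)
print(prepared.headers['Accept'])      # */* (a session default, left untouched)
```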

Custom headers are given less precedence than more specific sources of information. For example:

  • Authorization headers set with headers= will be overridden if credentials are specified in .netrc, which in turn will be overridden by the auth= parameter.
  • Authorization headers will be removed if you are redirected to a different host.
  • Proxy-Authorization headers will be overridden by proxy credentials provided in the URL.
  • Content-Length headers will be overridden when Requests can determine the length of the content.

Furthermore, Requests does not change its behavior at all based on which custom headers are specified; the headers are simply passed on into the final request. All header values must be a string, bytestring or unicode. While permitted, passing unicode header values is best avoided.
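Two of the precedence rules above can be checked offline by preparing (not sending) requests. The bearer token string is a made-up placeholder; the expected values follow from auth= writing a Basic Authorization header and from Requests measuring the body length itself.

```python
import requests

url = 'https://httpbin.org/post'

# auth= wins over a hand-written Authorization header.
with_auth = requests.Request(
    'POST', url,
    headers={'Authorization': 'Bearer hand-written-token'},  # hypothetical token
    auth=('user', 'pass'),
).prepare()
print(with_auth.headers['Authorization'])  # Basic dXNlcjpwYXNz

# A Content-Length header is replaced once Requests can measure the body.
with_length = requests.Request(
    'POST', url,
    headers={'Content-Length': '999'},
    data='hello',
).prepare()
print(with_length.headers['Content-Length'])  # 5
```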

More complex POST requests

Often you want to send form-encoded data, much as an HTML form does. To do this, simply pass a dictionary to the data argument. Your dictionary of data will automatically be form-encoded when the request is made:


>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.post("https://httpbin.org/post", data=payload)
>>> print(r.text)
{
  ...
  "form": {
    "key2": "value2",
    "key1": "value1"
  },
  ...
}
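To see exactly what gets sent, you can prepare the request without sending it. This offline sketch (the second value, containing a space and an ampersand, is an illustrative assumption) shows the Content-Type Requests sets and how the values are percent-encoded in the body.

```python
import requests

payload = {'key1': 'value1', 'key2': 'a b&c'}
prepared = requests.Request('POST', 'https://httpbin.org/post', data=payload).prepare()

print(prepared.headers['Content-Type'])  # application/x-www-form-urlencoded
print(prepared.body)                     # key1=value1&key2=a+b%26c
```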

The data argument can also hold multiple values for each key. You can do this either by passing a list of tuples, or a dictionary with lists as values. This is particularly useful when the form has multiple elements that use the same key:


>>> payload_tuples = [('key1', 'value1'), ('key1', 'value2')]
>>> r1 = requests.post('https://httpbin.org/post', data=payload_tuples)
>>> payload_dict = {'key1': ['value1', 'value2']}
>>> r2 = requests.post('https://httpbin.org/post', data=payload_dict)
>>> print(r1.text)
{
  ...
  "form": {
    "key1": [
      "value1",
      "value2"
    ]
  },
  ...
}
>>> r1.text == r2.text
True
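The equivalence of the two forms above can also be confirmed without a network round trip by comparing the prepared bodies:

```python
import requests

url = 'https://httpbin.org/post'
payload_tuples = [('key1', 'value1'), ('key1', 'value2')]
payload_dict = {'key1': ['value1', 'value2']}

# Prepare (but do not send) both requests and compare the encoded bodies.
body_from_tuples = requests.Request('POST', url, data=payload_tuples).prepare().body
body_from_dict = requests.Request('POST', url, data=payload_dict).prepare().body

print(body_from_tuples)                    # key1=value1&key1=value2
print(body_from_tuples == body_from_dict)  # True
```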

Sometimes you may want to send data that is not form-encoded. If you pass a string instead of a dictionary, that data will be posted as-is. For example, GitHub API v3 accepts JSON-encoded POST/PATCH data:


import json
import requests
url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
r = requests.post(url, data=json.dumps(payload))

Instead of encoding the dict yourself, you can pass it directly with the json parameter (added in version 2.4.2), and it will be encoded automatically:


url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
r = requests.post(url, json=payload)

Note that the json parameter is ignored if either data or files is passed. Using the json parameter also sets the Content-Type header to application/json.
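The differences between data=json.dumps(...) and json= are visible on a prepared request. This offline sketch reuses the placeholder endpoint from above and never sends anything.

```python
import json
import requests

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}

# json= serialises the dict and sets Content-Type for you.
via_json = requests.Request('POST', url, json=payload).prepare()

# data=json.dumps(...) sends the same text, but no Content-Type is added.
via_data = requests.Request('POST', url, data=json.dumps(payload)).prepare()

# When data is also given, json= is ignored and the form encoding wins.
both = requests.Request('POST', url, data={'a': '1'}, json=payload).prepare()

print(via_json.headers['Content-Type'])    # application/json
print('Content-Type' in via_data.headers)  # False
print(both.body)                           # a=1
```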

POSTing a Multipart-Encoded File

Requests makes it simple to upload Multipart-encoded files:


>>> url = 'https://httpbin.org/post'
>>> files = {'file': open('report.xls', 'rb')}
>>> r = requests.post(url, files=files)
>>> r.text
{
  ...
  "files": {
    "file": "<censored...binary...data>"
  },
  ...
}

You can set the file name, content_type and headers explicitly:


>>> url = 'https://httpbin.org/post'
>>> files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}
>>> r = requests.post(url, files=files)
>>> r.text
{
  ...
  "files": {
    "file": "<censored...binary...data>"
  },
  ...
}

You can send strings that will be accepted as files:


>>> url = 'https://httpbin.org/post'
>>> files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}
>>> r = requests.post(url, files=files)
>>> r.text
{
  ...
  "files": {
    "file": "some,data,to,send\\nanother,row,to,send\\n"
  },
  ...
}
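To peek at the multipart body Requests builds for a string "file", you can prepare the request locally. The explicit 'text/csv' content type is an illustrative assumption, and the boundary string is random, so only its prefix is predictable.

```python
import requests

files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n', 'text/csv')}
prepared = requests.Request('POST', 'https://httpbin.org/post', files=files).prepare()

print(prepared.headers['Content-Type'])  # multipart/form-data; boundary=<random>
print(prepared.body.decode('utf-8'))     # the encoded parts, including file headers
```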

If you are sending a very large file as a multipart/form-data request, you may want to stream the request. Requests does not support this out of the box, but a separate package, requests-toolbelt, does; check the toolbelt documentation for details on how to use it. To send multiple files in one request, see the extended documentation.

Warning! It is strongly recommended that you open files in binary mode. Requests may try to set the Content-Length header for you, and if it does, the value will be the number of bytes in the file. Errors may occur if you open the file in text mode.

Response Status Codes

We can check the response status code:


>>> r = requests.get('https://httpbin.org/get')
>>> r.status_code
200

Requests also comes with a built-in status code lookup object for easy reference:


>>> r.status_code == requests.codes.ok
True
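requests.codes accepts several aliases per status, including upper-case forms. A quick offline tour (these names come from the lookup table shipped with Requests):

```python
import requests

# requests.codes maps readable names, including upper-case variants, to numbers.
print(requests.codes.ok)                     # 200
print(requests.codes.not_found)              # 404
print(requests.codes.NOT_FOUND)              # 404
print(requests.codes['temporary_redirect'])  # 307
print(requests.codes.teapot)                 # 418
```

Comparing r.status_code against these names reads more clearly than a bare number.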

If we made a bad request (a 4XX client error or a 5XX server error response), we can raise it as an exception with r.raise_for_status():


>>> bad_r = requests.get('https://httpbin.org/status/404')
>>> bad_r.status_code
404
>>> bad_r.raise_for_status()
Traceback (most recent call last):
  File "requests/models.py", line 832, in raise_for_status
    raise http_error
requests.exceptions.HTTPError: 404 Client Error

But since the status_code of r was 200, raise_for_status() raises nothing and simply returns None:


>>> print(r.raise_for_status())
None
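Response.ok is a convenience property that is True exactly when raise_for_status() would not raise. The sketch below builds bare Response objects by hand so it runs offline; fake_response and describe are hypothetical helpers, and because real responses would also carry a reason and URL, those appear as None in the error messages here. Note that a 3XX status counts as ok, since only 4XX and 5XX raise.

```python
import requests

def fake_response(status_code):
    """Hypothetical helper: a bare Response with only the status code set."""
    r = requests.Response()
    r.status_code = status_code
    return r

def describe(r):
    """Hypothetical helper: report what raise_for_status() does for r."""
    try:
        r.raise_for_status()
    except requests.exceptions.HTTPError as exc:
        return f'failed: {exc}'
    return 'ok'

for code in (200, 301, 404, 503):
    r = fake_response(code)
    # r.ok is True exactly when raise_for_status() does not raise.
    print(code, r.ok, describe(r))
```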

Response headers

We can view the server’s response headers as a Python dictionary:


>>> r.headers
{
'content-encoding': 'gzip',
'transfer-encoding': 'chunked',
'connection': 'close',
'server': 'nginx/1.0.4',
'x-runtime': '148ms',
'etag': '"e1ca502697e5c9317743dc078f67693f"',
'content-type': 'application/json'
}

This dictionary is special, though: it’s made just for HTTP headers. According to RFC 7230, HTTP header names are case-insensitive, so we can access the headers using any capitalization we want:


>>> r.headers['Content-Type']
'application/json'
>>> r.headers.get('content-type')
'application/json'
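The case-insensitive dictionary is available directly as requests.structures.CaseInsensitiveDict, so its behavior can be explored without making a request:

```python
from requests.structures import CaseInsensitiveDict

headers = CaseInsensitiveDict()
headers['Content-Type'] = 'application/json'

print(headers['content-type'])    # application/json
print('CONTENT-TYPE' in headers)  # True
print(list(headers))              # ['Content-Type'] (the last-set spelling is kept)

headers['content-type'] = 'text/html'  # same key, different case: overwrites
print(len(headers), list(headers))     # 1 ['content-type']
```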

Cookies

If a response contains cookies, you can quickly access them:


>>> url = 'https://example.com/some/cookie/setting/url'
>>> r = requests.get(url)
>>> r.cookies['example_cookie_name']
'example_cookie_value'

To send your own cookies to the server, use the cookies option:


>>> url = 'https://httpbin.org/cookies'
>>> cookies = dict(cookies_are='working')
>>> r = requests.get(url, cookies=cookies)
>>> r.text
'{"cookies": {"cookies_are": "working"}}'

Cookies are returned in a RequestsCookieJar, which acts like a dict but also offers a more complete interface, suitable for use over multiple domains or paths. Cookie jars can also be passed in to requests:


>>> jar = requests.cookies.RequestsCookieJar()
>>> jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
>>> jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')
>>> url = 'https://httpbin.org/cookies'
>>> r = requests.get(url, cookies=jar)
>>> r.text
'{"cookies": {"tasty_cookie": "yum"}}'
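Why only tasty_cookie comes back is easier to see on a prepared request: Requests applies the standard cookie-jar rules, so only cookies whose domain and path match the URL are added to the Cookie header. This sketch prepares the request locally without sending it.

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')

# The jar still answers dict-style lookups...
print(jar['tasty_cookie'])  # yum

# ...and, when a request is prepared, only cookies whose domain and path
# match the URL end up in the Cookie header (nothing is sent here).
prepared = requests.Request('GET', 'https://httpbin.org/cookies', cookies=jar).prepare()
print(prepared.headers['Cookie'])  # tasty_cookie=yum
```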

Redirects and History

By default, Requests performs redirects for all HTTP verbs except HEAD. We can use the history property of the Response object to track redirects. The Response.history list contains the Response objects that were created in order to complete the request, sorted from the oldest to the most recent response. For example, GitHub redirects all HTTP requests to HTTPS:

>>> r = requests.get('http://github.com/')
>>> r.url
'https://github.com/'
>>> r.status_code
200
>>> r.history
[<Response [301]>]

If you use GET, OPTIONS, POST, PUT, PATCH or DELETE requests, you can disable redirect processing with the allow_redirects parameter:


>>> r = requests.get('http://github.com/', allow_redirects=False)
>>> r.status_code
301
>>> r.history
[]

If you use HEAD, you can also enable redirects:


>>> r = requests.head('http://github.com/', allow_redirects=True)
>>> r.url
'https://github.com/'
>>> r.history
[<Response [301]>]

Timeouts

You can tell Requests to stop waiting for a response after a given number of seconds with the timeout parameter. Nearly all production code should use this parameter in nearly all requests; failing to do so can cause your program to hang indefinitely:


>>> requests.get('http://github.com/', timeout=0.001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)

Note that timeout is not a time limit on the entire response download. Rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds). If no timeout is specified explicitly, requests do not time out.
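Requests also accepts a (connect, read) tuple for timeout, which sets the two phases separately. The sketch below triggers a read timeout without touching the internet: it assumes you can open a listening socket on 127.0.0.1 that never replies, and it sets trust_env to False so proxy environment variables are ignored.

```python
import socket
import time
import requests

# A local socket that accepts connections (via the listen backlog) but never
# replies, so the read timeout is what fires. Nothing leaves 127.0.0.1.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))
server.listen(1)
port = server.getsockname()[1]

session = requests.Session()
session.trust_env = False  # ignore any proxy settings from the environment

start = time.monotonic()
try:
    # timeout=(connect, read): 2 s to connect, 0.5 s to wait for response bytes.
    session.get(f'http://127.0.0.1:{port}/', timeout=(2, 0.5))
    caught = None
except requests.exceptions.Timeout as exc:
    caught = exc
finally:
    elapsed = time.monotonic() - start
    server.close()
    session.close()

print(type(caught).__name__, round(elapsed, 1))  # ReadTimeout, after about 0.5 s
```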

Errors and exceptions

In the event of a network problem (e.g. a DNS failure or a refused connection), Requests raises a ConnectionError exception. Response.raise_for_status() raises an HTTPError if the HTTP request returned an unsuccessful status code. If a request times out, a Timeout exception is raised. If a request exceeds the configured maximum number of redirects, a TooManyRedirects exception is raised. All exceptions that Requests explicitly raises inherit from requests.exceptions.RequestException.
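Because every error derives from RequestException, a single except clause can catch any Requests failure. In this sketch safe_get is a hypothetical helper, and the ConnectionError is provoked by connecting to a local port that was just closed (assumed to stay free in the meantime).

```python
import socket
import requests

exc = requests.exceptions
# Every error Requests raises derives from RequestException.
for cls in (exc.ConnectionError, exc.HTTPError, exc.Timeout, exc.TooManyRedirects):
    print(cls.__name__, issubclass(cls, exc.RequestException))

def safe_get(session, url):
    """Hypothetical wrapper: returns (response, None) or (None, exception)."""
    try:
        r = session.get(url, timeout=5)
        r.raise_for_status()  # turn 4XX/5XX into HTTPError
        return r, None
    except exc.RequestException as err:
        return None, err

# Grab a local port, then close it so nothing is listening there.
with socket.socket() as probe:
    probe.bind(('127.0.0.1', 0))
    free_port = probe.getsockname()[1]

with requests.Session() as session:
    session.trust_env = False  # ignore proxy environment variables
    response, error = safe_get(session, f'http://127.0.0.1:{free_port}/')

print(type(error).__name__)  # ConnectionError
```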
