JSON Python module for working with .json format

by Alex

JSON (JavaScript Object Notation), specified by RFC 7159 (which obsoletes RFC 4627) and by ECMA-404, is a lightweight text-based data interchange format inspired by JavaScript object literal syntax (although it is not a strict subset of JavaScript). The json module exposes an API familiar to users of the standard library marshal and pickle modules. Encoding basic Python objects as JSON:


>>> import json
>>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])
'["foo", {"bar": ["baz", null, 1.0, 2]}]'
>>> print(json.dumps("\"foo\bar"))
"\"foo\bar"
>>> print(json.dumps('\u1234'))
"\u1234"
>>> print(json.dumps('\\'))
"\\"
>>> print(json.dumps({"c": 0, "b": 0, "a": 0}, sort_keys=True))
{"a": 0, "b": 0, "c": 0}
>>> from io import StringIO
>>> io = StringIO()
>>> json.dump(['streaming API'], io)
>>> io.getvalue()
'["streaming API"]'

Compact encoding:


>>> import json
>>> json.dumps([1, 2, 3, {'4': 5, '6': 7}], separators=(',', ':'))
'[1,2,3,{"4":5,"6":7}]'

Pretty printing:


>>> import json
>>> print(json.dumps({'4': 5, '6': 7}, sort_keys=True, indent=4))
{
    "4": 5,
    "6": 7
}

Decoding JSON into Python objects:


>>> import json
>>> json.loads('["foo", {"bar":["baz", null, 1.0, 2]}]')
['foo', {'bar': ['baz', None, 1.0, 2]}]
>>> json.loads('"\\"foo\\bar"')
'"foo\x08ar'
>>> from io import StringIO
>>> io = StringIO('["streaming API"]')
>>> json.load(io)
['streaming API']

Specializing JSON object decoding:


>>> import json
>>> def as_complex(dct):
...     if '__complex__' in dct:
...         return complex(dct['real'], dct['imag'])
...     return dct
...
>>> json.loads('{"__complex__": true, "real": 1, "imag": 2}',
...     object_hook=as_complex)
(1+2j)
>>> import decimal
>>> json.loads('1.1', parse_float=decimal.Decimal)
Decimal('1.1')

Extending JSONEncoder:


>>> import json
>>> class ComplexEncoder(json.JSONEncoder):
...     def default(self, obj):
...         if isinstance(obj, complex):
...             return [obj.real, obj.imag]
...         # Let the base class default method raise the TypeError
...         return json.JSONEncoder.default(self, obj)
...
>>> json.dumps(2 + 1j, cls=ComplexEncoder)
'[2.0, 1.0]'
>>> ComplexEncoder().encode(2 + 1j)
'[2.0, 1.0]'
>>> list(ComplexEncoder().iterencode(2 + 1j))
['[2.0', ', 1.0', ']']

Use json.tool from the shell to validate and pretty-print JSON:

$ echo '{"json":"obj"}' | python -m json.tool
{
    "json": "obj"
}
$ echo '{1.2:3.4}' | python -m json.tool
Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

JSON is a subset of YAML 1.2. The JSON produced by this module's default settings (in particular, the default separators value) is also a subset of YAML 1.0 and 1.1, so this module can also be used as a YAML serializer. Before Python 3.7, dict was not guaranteed to preserve insertion order, so unless collections.OrderedDict was requested explicitly, the key order of the output could differ from that of the input. Since Python 3.7, the regular dict preserves insertion order, so collections.OrderedDict is no longer needed for generating or parsing JSON.
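A quick way to see the ordering guarantee for yourself (assuming Python 3.7 or later) is to round-trip an object whose keys are deliberately not in alphabetical order:

```python
import json

# Keys deliberately out of alphabetical order
text = '{"zebra": 1, "apple": 2, "mango": 3}'

data = json.loads(text)          # a plain dict, no OrderedDict needed
print(list(data))                # ['zebra', 'apple', 'mango']
print(json.dumps(data) == text)  # True: re-encoding reproduces the same text
```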

The main functions are:

The json dump method

json.dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)

Serializes obj as a JSON formatted stream to fp (a file-like object supporting .write()) using the Python-to-JSON conversion table. The json module always produces str objects, not bytes, so fp.write() must support str input.

  • skipkeys: if True (default: False), dict keys that are not of a basic type (str, int, float, bool, None) are skipped instead of raising a TypeError.
  • ensure_ascii: if True (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences; if False, they are written as is.
  • check_circular: if False (default: True), the circular reference check for container types is skipped, and a circular reference will result in a RecursionError (or worse).
  • allow_nan: if False (default: True), serializing an out-of-range float value (nan, inf, -inf) raises a ValueError, in strict compliance with the JSON specification. If True, their JavaScript equivalents (NaN, Infinity, -Infinity) are used.
  • indent: if a non-negative integer or string, JSON arrays and objects are pretty-printed with that indent level. An indent level of 0, a negative number, or "" only inserts newlines. None (the default) selects the most compact representation. A positive integer indents that many spaces per level; a string (such as "\t") is used to indent each level. Changed in version 3.2: strings are allowed for indent in addition to integers.
  • separators: if specified, must be an (item_separator, key_separator) tuple. The default is (', ', ': ') if indent is None and (',', ': ') otherwise. For the most compact JSON representation, specify (',', ':') to eliminate whitespace. Changed in version 3.4: (',', ': ') is used as the default when indent is not None.
  • default: if specified, must be a function that is called for objects that cannot otherwise be serialized. It should return a JSON-encodable version of the object or raise a TypeError. If default is not specified, TypeError is raised.
  • sort_keys: if True (default: False), dictionary output is sorted by key.

To use a custom JSONEncoder subclass (e.g. one that overrides the default() method to serialize additional types), pass it with the cls argument; otherwise JSONEncoder is used.
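As a sketch of the default parameter, the function below (the name encode_extra is hypothetical) turns sets into sorted lists and raises TypeError for anything else, while sort_keys orders the keys of the output written to an in-memory file:

```python
import json
from io import StringIO


def encode_extra(obj):
    """Hypothetical fallback for types json cannot serialize natively."""
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)  # sets have no JSON equivalent; emit a sorted array
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


buf = StringIO()
json.dump({"tags": {"b", "a"}, "id": 7}, buf, default=encode_extra, sort_keys=True)
print(buf.getvalue())  # {"id": 7, "tags": ["a", "b"]}
```

Sorting the set is a design choice here: set iteration order is arbitrary, so sorting keeps the output stable between runs.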

The json dumps method

json.dumps(obj, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)

Serializes obj to a JSON formatted str using the conversion table. The arguments have the same meaning as in dump(). Keys in JSON key/value pairs are always of type str. When a dictionary is converted to JSON, all of its keys are coerced to strings. As a result, if a dictionary is converted to JSON and then back into a dictionary, the new dictionary may not equal the original one. In other words, loads(dumps(x)) != x if x has non-string keys.
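A minimal illustration of that key coercion, using an arbitrary example dictionary with int, float and None keys:

```python
import json

original = {1: "one", 2.5: "two and a half", None: "nothing"}

text = json.dumps(original)
print(text)  # {"1": "one", "2.5": "two and a half", "null": "nothing"}

round_trip = json.loads(text)
print(round_trip == original)  # False: every key came back as a str
```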

The json load method

json.load(fp, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)

Deserializes fp (a text or binary file supporting .read() and containing a JSON document) into a Python object using the JSON-to-Python conversion table.

  • object_hook: an optional function called with the result of every decoded JSON object (a dict); its return value is used instead of the dict. This can be used to implement custom decoders (e.g. JSON-RPC class hinting).
  • object_pairs_hook: an optional function called with the result of every decoded JSON object as an ordered list of (key, value) pairs; its return value is used instead of the dict. This can be used to implement custom decoders. If object_hook is also defined, object_pairs_hook takes priority.
  • parse_float: if specified, called with the string of every JSON float to be decoded. By default this is equivalent to float(num_str). It can be used to substitute another data type or parser for JSON floats (e.g. decimal.Decimal).
  • parse_int: if specified, called with the string of every JSON int to be decoded. By default this is equivalent to int(num_str). It can be used to substitute another data type or parser for JSON integers (e.g. float).
  • parse_constant: if specified, called with one of the strings '-Infinity', 'Infinity', 'NaN'. It can be used to raise an exception when invalid JSON numbers are encountered. Changed in version 3.1: parse_constant is no longer called for null, true, or false.

To use a custom JSONDecoder subclass, pass it with the cls argument; otherwise JSONDecoder is used. Additional keyword arguments are passed to the class constructor. If the data being deserialized is not a valid JSON document, a JSONDecodeError is raised.
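As a small sketch of object_hook, the example below reads from an in-memory file and turns every JSON object into a types.SimpleNamespace so fields can be accessed as attributes (the document contents are made up for illustration):

```python
import json
from io import StringIO
from types import SimpleNamespace

fp = StringIO('{"name": "Ann", "address": {"city": "Oslo", "zip": "0150"}}')

# object_hook receives each decoded JSON object (a dict), innermost first,
# and whatever it returns replaces that dict in the result
person = json.load(fp, object_hook=lambda d: SimpleNamespace(**d))
print(person.name, person.address.city)  # Ann Oslo
```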

The json loads method

json.loads(s, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)

Deserializes s (a str, bytes, or bytearray instance containing a JSON document) into a Python object using the JSON-to-Python conversion table. The other arguments have the same meaning as in load(). The former encoding argument was ignored and deprecated, and was removed in Python 3.9. If the data being deserialized is not a valid JSON document, a JSONDecodeError is raised.
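Since Python 3.6, loads() also accepts bytes and bytearray, detecting whether the input is UTF-8, UTF-16, or UTF-32. A quick check with a sample document:

```python
import json

doc = '{"city": "Zürich"}'

print(json.loads(doc))                             # str input
print(json.loads(doc.encode("utf-8")))             # UTF-8 bytes
print(json.loads(doc.encode("utf-16")))            # UTF-16 bytes (with BOM)
print(json.loads(bytearray(doc.encode("utf-8"))))  # bytearray also works
```

All four calls print the same dictionary, {'city': 'Zürich'}.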

Encoders and decoders

JSONDecoder

class json.JSONDecoder(*, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, strict=True, object_pairs_hook=None)

Simple JSON decoder. By default it performs the following translations when decoding:

JSON            Python
object          dict
array           list
string          str
number (int)    int
number (real)   float
true            True
false           False
null            None

It also understands NaN, Infinity, and -Infinity as their corresponding float values, which is outside the JSON specification.

object_hook, if specified, is called with the result of every decoded JSON object, and its return value is used in place of the given dict. This can be used to provide custom deserializations (e.g. to support JSON-RPC class hinting).

object_pairs_hook, if specified, is called with the result of every JSON object decoded with an ordered list of pairs. Its return value is used instead of the dict. This can be used to implement custom decoders. If object_hook is also defined, object_pairs_hook takes priority.

parse_float, if specified, is called with the string of every JSON float to be decoded. By default this is equivalent to float(num_str). It can be used to use another data type or parser for JSON floats (e.g. decimal.Decimal).

parse_int, if specified, is called with the string of every JSON int to be decoded. By default this is equivalent to int(num_str). It can be used to use another data type or parser for JSON integers (e.g. float).

parse_constant, if specified, is called with one of the strings '-Infinity', 'Infinity', 'NaN'. It can be used to raise an exception when invalid JSON numbers are encountered.

If strict is False (True is the default), control characters are allowed inside strings. Control characters in this context are those with character codes in the 0–31 range, including '\t' (tab), '\n', '\r' and '\0'.

If the data being deserialized is not a valid JSON document, a JSONDecodeError is raised.

decode(s) Returns the Python representation of s (a str instance containing a JSON document). A JSONDecodeError is raised if the given JSON document is not valid.
raw_decode(s) Decodes a JSON document from s (a str beginning with a JSON document) and returns a 2-tuple of the Python representation and the index in s where the document ended. This can be used to decode a JSON document from a string that may have extraneous data at the end.
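Because raw_decode() reports where a document ended, it can be used to walk through several JSON documents packed into one string. The helper below is a hypothetical sketch that relies only on raw_decode(s); it strips whitespace itself because raw_decode() does not skip leading whitespace:

```python
import json


def iter_documents(text):
    """Hypothetical helper: yield each JSON document from a string of
    whitespace-separated documents (e.g. a JSON Lines style payload)."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        rest = text[pos:].lstrip()  # raw_decode() fails on leading whitespace
        if not rest:
            return
        obj, end = decoder.raw_decode(rest)
        yield obj
        # rest is a suffix of text, so translate `end` back to an index in text
        pos = len(text) - len(rest) + end


print(list(iter_documents('{"a": 1} [2, 3]\n"x" 4')))
# [{'a': 1}, [2, 3], 'x', 4]
```

Slicing the string on every step keeps the sketch simple but copies data repeatedly, so it suits modest inputs rather than very large streams.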

JSONEncoder

class json.JSONEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Extensible JSON encoder for Python data structures. By default it supports the following objects and types:

Python          JSON
dict            object
list, tuple     array
str             string
int, float      number
True            true
False           false
None            null

To extend it to recognize other objects, subclass it and implement a default() method that returns a serializable object for o if possible; otherwise it should call the superclass implementation (to raise TypeError).

If skipkeys is False (the default), a TypeError is raised when trying to encode keys that are not str, int, float, or None. If skipkeys is True, such items are simply skipped.

If ensure_ascii is True (the default), the output is guaranteed to have all incoming non-ASCII characters escaped with \uXXXX sequences. If ensure_ascii is False, these characters are output as is.

If check_circular is True (the default), lists, dicts, and custom encoded objects are checked for circular references during encoding to prevent an infinite recursion (which would cause a RecursionError). Otherwise, no such check takes place.

If allow_nan is True (the default), NaN, Infinity, and -Infinity are encoded as such. This behavior does not conform to the JSON specification, but is consistent with most JavaScript-based encoders and decoders. Otherwise, encoding such floats raises a ValueError.

If sort_keys is True (default: False), dictionary output is sorted by key; this is useful for regression tests, to ensure that JSON serializations can be compared on a day-to-day basis.

If indent is a non-negative integer or string, JSON array elements and object members are pretty-printed with that indent level. An indent level of 0, a negative number, or "" only inserts newlines. None (the default) selects the most compact representation. If indent is a string (such as "\t"), that string is used to indent each level.

If specified, separators should be an (item_separator, key_separator) tuple. The default is (', ', ': ') if indent is None and (',', ': ') otherwise. To get the most compact JSON representation, specify (',', ':') to eliminate whitespace.

If specified, default should be a function that is called for objects that cannot otherwise be serialized. It should return a JSON-encodable version of the object or raise a TypeError. If not specified, TypeError is raised.

default(o) Implement this method in a subclass so that it returns a serializable object for o, or calls the base implementation (to raise a TypeError). For example, to support arbitrary iterators, you could implement default() like this:


import json


class IterableEncoder(json.JSONEncoder):  # hypothetical name for the subclass
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return json.JSONEncoder.default(self, o)

encode(o) Returns a JSON string representation of a Python data structure, o. For example:


>>> json.JSONEncoder().encode({'foo': ['bar', 'baz']})
'{"foo": ["bar", "baz"]}'

iterencode(o) Encodes the given object, o, and yields each string representation as soon as it becomes available. For example:


for chunk in json.JSONEncoder().iterencode(bigobject):
    mysocket.write(chunk)
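In the snippet above, bigobject and mysocket are placeholders. A self-contained variant, assuming an io.StringIO stands in for the socket and using a made-up payload, shows that the streamed chunks add up to exactly what encode() returns:

```python
import json
from io import StringIO

big_object = {"rows": [{"id": i, "ok": i % 2 == 0} for i in range(1000)]}

sink = StringIO()  # stand-in for a socket or a file opened for writing
encoder = json.JSONEncoder()
for chunk in encoder.iterencode(big_object):
    sink.write(chunk)  # each piece is written as soon as it is produced

print(sink.getvalue() == encoder.encode(big_object))  # True
```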

JSONDecodeError exception

exception json.JSONDecodeError(msg, doc, pos)

Subclass of ValueError with the following additional attributes:

  • msg: the unformatted error message.
  • doc: the JSON document being parsed.
  • pos: the start index of doc where parsing failed.
  • lineno: the line corresponding to pos.
  • colno: the column corresponding to pos.
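A short sketch of reading these attributes from a caught error, using a deliberately broken two-line document:

```python
import json

bad = '{"a": 1,\n "b": }'  # the value for "b" is missing on line 2

try:
    json.loads(bad)
except json.JSONDecodeError as err:
    error = err  # keep a reference; `err` is unbound after the except block

print(error.msg)
print(error.pos, error.lineno, error.colno)  # 15 2 7
print(error.doc == bad)                      # True
print(isinstance(error, ValueError))         # True
```

Because JSONDecodeError subclasses ValueError, code that already catches ValueError keeps working.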

Standard compliance and interoperability

The JSON format is specified by RFC 7159 and ECMA-404. This section details this module's level of compliance with the RFC. For simplicity, JSONEncoder and JSONDecoder subclasses, and parameters other than those explicitly mentioned, are not considered. This module does not comply with the RFC in a strict fashion; it implements some extensions that are valid JavaScript but not valid JSON. In particular:

  • Infinite and NaN number values are accepted and output;
  • Repeated names within an object are accepted, and only the value of the last name-value pair is used.

Since the RFC permits RFC-compliant parsers to accept input texts that are not RFC-compliant, this module's deserializer is technically RFC-compliant under default settings.

Character encodings

The RFC requires that JSON be represented using UTF-8, UTF-16, or UTF-32, with UTF-8 being the recommended default for maximum interoperability. As permitted, though not required, by the RFC, this module's serializer sets ensure_ascii=True by default, escaping the output so that the resulting strings contain only ASCII characters. Other than the ensure_ascii parameter, this module is defined strictly in terms of conversion between Python objects and Unicode strings, and so does not otherwise directly address the issue of character encodings. The RFC prohibits adding a byte order mark (BOM) to the start of a JSON text, and this module's serializer does not add a BOM to its output. The RFC permits, but does not require, JSON deserializers to ignore an initial BOM in their input. This module's deserializer raises a ValueError when an initial BOM is present. The RFC does not explicitly forbid JSON strings that contain byte sequences which don't correspond to valid Unicode characters (e.g. unpaired UTF-16 surrogates), but it does note that they may cause interoperability problems. By default, this module accepts and outputs (when present in the original str) code points for such sequences.
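A small sketch of both points, with an arbitrary sample string: ensure_ascii controls whether non-ASCII text is escaped, and a str that still begins with a BOM is rejected unless the BOM is stripped first (here by decoding the raw bytes with Python's utf-8-sig codec):

```python
import json

print(json.dumps({"city": "Zürich"}))                      # {"city": "Z\u00fcrich"}
print(json.dumps({"city": "Zürich"}, ensure_ascii=False))  # {"city": "Zürich"}

# A str that still starts with a BOM is rejected by the deserializer
try:
    json.loads("\ufeff[1, 2]")
except ValueError as err:  # JSONDecodeError is a ValueError subclass
    print("rejected:", err)

# Decoding the raw bytes with the utf-8-sig codec strips the BOM first
raw = "\ufeff[1, 2]".encode("utf-8")
print(json.loads(raw.decode("utf-8-sig")))  # [1, 2]
```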

Infinite and NaN number values

The RFC does not permit the representation of infinite or NaN number values. Despite that, by default this module accepts and outputs Infinity, -Infinity, and NaN as if they were valid JSON number literal values:


>>> # Neither of these calls raises an exception, but the results are not valid JSON
>>> json.dumps(float('-inf'))
'-Infinity'
>>> json.dumps(float('nan'))
'NaN'
>>> # Same for deserialization
>>> json.loads('-Infinity')
-inf
>>> json.loads('NaN')
nan

In the serializer, the allow_nan parameter can be used to alter this behavior. In the deserializer, the parse_constant parameter can be used to alter this behavior.
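A minimal "strict mode" sketch for both directions; reject_constant is a hypothetical hook name, and raising ValueError from it is simply one reasonable choice:

```python
import json


def reject_constant(name):
    """Hypothetical parse_constant hook: refuse NaN/Infinity in the input."""
    raise ValueError(f"non-standard JSON constant: {name}")


# Serializer side: allow_nan=False raises instead of emitting Infinity
try:
    json.dumps([1.0, float("inf")], allow_nan=False)
except ValueError as err:
    print("dumps refused:", err)

# Deserializer side: parse_constant is called with '-Infinity', 'Infinity' or 'NaN'
try:
    json.loads("[1.0, NaN]", parse_constant=reject_constant)
except ValueError as err:
    print("loads refused:", err)

# Ordinary numbers never reach the hook
print(json.loads("[1.0, 2]", parse_constant=reject_constant))  # [1.0, 2]
```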

Repeated names within an object

The RFC specifies that the names within a JSON object should be unique, but does not mandate how repeated names in JSON objects should be handled. By default, this module does not raise an exception; instead, it ignores all but the last name-value pair for a given name:


>>> weird_json = '{"x": 1, "x": 2, "x": 3}'
>>> json.loads(weird_json)
{'x': 3}

The object_pairs_hook parameter can be used to change this.
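For instance, a hook that fails loudly on repeated names could look like this (reject_duplicates is a hypothetical name; object_pairs_hook receives each object's pairs in document order):

```python
import json


def reject_duplicates(pairs):
    """Hypothetical object_pairs_hook: raise on repeated names instead of
    silently keeping the last value."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


print(json.loads('{"x": 1, "y": 2}', object_pairs_hook=reject_duplicates))  # {'x': 1, 'y': 2}

try:
    json.loads('{"x": 1, "x": 2, "x": 3}', object_pairs_hook=reject_duplicates)
except ValueError as err:
    print(err)  # duplicate key: 'x'
```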

Top-level Non-Object, Non-Array value

The old version of JSON specified by the obsolete RFC 4627 required that the top-level value of a JSON text be either a JSON object or array (Python dict or list), and not a JSON null, boolean, number, or string value. RFC 7159 removed that restriction, and this module does not, and never did, implement that restriction in either its serializer or its deserializer. Regardless, for maximum interoperability, you may wish to voluntarily adhere to the restriction yourself.
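If you do want to enforce the old rule, a thin wrapper is enough. The helper below is a hypothetical sketch, not part of the json module:

```python
import json


def loads_object_or_array(text):
    """Hypothetical helper enforcing the RFC 4627 top-level rule."""
    value = json.loads(text)
    if not isinstance(value, (dict, list)):
        raise ValueError("top-level JSON value must be an object or an array")
    return value


print(json.loads('"just a string"'))    # the json module itself accepts this
print(json.dumps(42))                   # and happily emits a bare number: 42
print(loads_object_or_array("[1, 2]"))  # [1, 2]

try:
    loads_object_or_array("42")
except ValueError as err:
    print("rejected:", err)
```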

Implementation limitations

Some JSON deserializer implementations have limits on:

  • received JSON text size
  • maximum nesting level of JSON objects and arrays
  • range and precision of JSON numbers
  • content and maximum length of JSON strings

This module does not impose any such limits beyond those of the relevant Python data types themselves or of the Python interpreter itself. When serializing to JSON, beware of such limitations in applications that may consume your JSON. In particular, JSON numbers are commonly deserialized into IEEE 754 double precision numbers and are therefore subject to that representation's range and precision limitations. This is especially relevant when serializing Python int values of extremely large magnitude, or instances of "exotic" numeric types such as decimal.Decimal.
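A quick way to see the precision issue, assuming parse_int=float as a rough stand-in for a consumer that stores every number as a double (as JavaScript does):

```python
import json

n = 2**53 + 1  # 9007199254740993: not exactly representable as an IEEE 754 double

text = json.dumps({"id": n})
print(text)                         # {"id": 9007199254740993}
print(json.loads(text)["id"] == n)  # True: Python ints have arbitrary precision

# parse_int=float roughly mimics a double-based consumer
as_double = json.loads(text, parse_int=float)["id"]
print(as_double, as_double == n)    # 9007199254740992.0 False
```

A common workaround, which the json module does not apply for you, is to transmit such values as JSON strings.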

Command Line Interface

Source code: Lib/json/tool.py. The json.tool module provides a simple command line interface to validate and pretty-print JSON objects. If the optional infile and outfile arguments are not specified, sys.stdin and sys.stdout are used respectively:

$ echo '{"json": "obj"}' | python -m json.tool
{
    "json": "obj"
}
$ echo '{1.2:3.4}' | python -m json.tool
Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
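The same checks can be scripted from Python. This sketch pipes text through the interpreter's own json.tool with subprocess.run; run_json_tool is a hypothetical helper name:

```python
import subprocess
import sys


def run_json_tool(text):
    """Hypothetical helper: pipe text through `python -m json.tool`."""
    return subprocess.run(
        [sys.executable, "-m", "json.tool"],
        input=text,
        capture_output=True,
        text=True,
    )


ok = run_json_tool('{"json": "obj"}')
print(ok.returncode)       # 0
print(ok.stdout, end="")   # pretty-printed with a 4-space indent

bad = run_json_tool("{1.2:3.4}")
print(bad.returncode != 0)  # True: invalid JSON gives a non-zero exit status
print(bad.stderr.strip())   # the "Expecting property name..." message
```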

Command line options

infile: the JSON file to be validated or pretty-printed:

$ python -m json.tool mp_films.json
[
    {
        "title": "And Now for Something Completely Different",
        "year": 1971
    },
    {
        "title": "Monty Python and the Holy Grail",
        "year": 1975
    }
]

If infile is not specified, input is read from sys.stdin.
