Serialization in Python

Serialization is the process of converting an object or data structure into a format that can be stored or transmitted and reconstructed later.

Because the data is stored in a well-defined format, the original object can be restored from it; this reverse process is called deserialization. Depending on the format, serialization can also reduce the size of the data so that it fits the available disk space or bandwidth.

Pickle

Pickling is the process of converting a Python object hierarchy into a byte stream that can be written to a file; this is pickle’s form of serialization. Unpickling is the reverse operation, in which a byte stream is converted back into a working Python object hierarchy.

Pickle is one of the simplest ways to store a Python object. The pickle module, part of the standard library, stores objects directly in a Python-specific binary format.

What can it do?

  • Pickle can store and reproduce dictionaries and lists very easily.
  • It stores object attributes and restores them to the same state.

What can’t pickle do?

  • It doesn’t save an object’s code, only its attribute values.
  • It can’t store file handles or connection sockets.

In short, pickling is a way to store data in files and retrieve it again, where the data can be lists, class instances, and so forth.
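
The capabilities and limits listed above can be checked with a short sketch. It assumes `types.SimpleNamespace` as a stand-in for an object with attributes and `os.devnull` as a harmless file to open; both are illustrative choices rather than anything prescribed by pickle.

```python
import os
import pickle
from types import SimpleNamespace

# A simple object with attributes (SimpleNamespace is only a stand-in).
person = SimpleNamespace(name="Omkar", age=21)

# Round trip: the attribute values come back in the same state.
restored = pickle.loads(pickle.dumps(person))
print(restored.name, restored.age)

# An open file handle cannot be pickled.
with open(os.devnull, "w") as handle:
    try:
        pickle.dumps(handle)
        handle_pickled = True
    except (TypeError, pickle.PicklingError) as exc:
        handle_pickled = False
        print("cannot pickle file handle:", exc)
```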

To pickle something, first import the module:

import pickle

Then write a variable to a file, something like

pickle.dump(mystring, outfile, protocol)

where the third argument, protocol, is optional.

To unpickle something, again import the module:

import pickle

Then read a variable back from a file, something like

myString = pickle.load(inputfile)
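
As a minimal sketch of the optional protocol argument, the snippet below pickles into an in-memory buffer and reads the object back. Using `io.BytesIO` in place of a real file and choosing `pickle.HIGHEST_PROTOCOL` are assumptions made for illustration.

```python
import io
import pickle

data = {"name": "Omkar Pathak", "age": 21}

# io.BytesIO stands in for a file opened in binary mode.
buffer = io.BytesIO()

# The third argument selects the pickle protocol; omitting it uses
# pickle.DEFAULT_PROTOCOL.
pickle.dump(data, buffer, pickle.HIGHEST_PROTOCOL)

# Rewind to the start before reading the object back.
buffer.seek(0)
restored = pickle.load(buffer)
print(restored == data)
```

Higher protocols are generally more compact but may not be readable by older Python versions, so the default is usually a reasonable choice.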

Methods

The pickle interface provides four functions.

  • dump() − Serializes to an open file (file-like object).
  • dumps() − Serializes to a bytes object.
  • load() − Deserializes from an open file (file-like object).
  • loads() − Deserializes from a bytes object.
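
The in-memory pair dumps() and loads() can be sketched as follows. In Python 3, dumps() returns a bytes object rather than a text string; the variable names here are illustrative.

```python
import pickle

scores = [1, 2.5, "three", {"four": 4}]

# dumps() returns the pickled data as bytes instead of writing to a file.
payload = pickle.dumps(scores)
print(type(payload))

# loads() rebuilds an equal, but separate, object from those bytes.
copy = pickle.loads(payload)
print(copy == scores, copy is scores)
```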

Based on the above procedure, below is an example of “pickling”.

import pickle


def storeData():
    # Two sample records, keyed by name in the db dictionary below
    Omkar = {'key': 'Omkar', 'name': 'Omkar Pathak',
             'age': 21, 'pay': 40000}
    Jagdish = {'key': 'Jagdish', 'name': 'Jagdish Pathak',
               'age': 50, 'pay': 50000}

    db = {}
    db['Omkar'] = Omkar
    db['Jagdish'] = Jagdish

    # 'wb' overwrites the file; with 'ab', repeated runs would append
    # additional pickles that a single load() call never reads
    with open('examplePickle', 'wb') as dbfile:
        pickle.dump(db, dbfile)


def loadData():
    # Pickle files must be opened in binary mode
    with open('examplePickle', 'rb') as dbfile:
        db = pickle.load(dbfile)
    for keys in db:
        print(keys, '=>', db[keys])


if __name__ == '__main__':
    storeData()
    loadData()

#output
#Omkar => {'key': 'Omkar', 'name': 'Omkar Pathak', 'age': 21, 'pay': 40000}
#Jagdish => {'key': 'Jagdish', 'name': 'Jagdish Pathak', 'age': 50, 'pay': 50000}

Unpickling

Unpickling is the process that takes a byte stream and reconstructs the object hierarchy from it.

Unpickling is done with the load() function of the pickle module (or loads() for a bytes object), which returns the complete object hierarchy.

JSON

JSON (JavaScript Object Notation) is a text-based format for storing and transmitting structured data. It originates from the JavaScript language, yet it is language-independent: it works with practically any programming language. With JSON’s lightweight syntax, you can easily store and send everything from numbers and strings to arrays and objects between applications. You can also build more complex data structures by nesting arrays and objects inside one another.

Basic syntax and structure

JSON text is built on one of two structures:

  • a collection of key:value pairs (an object, also known as an associative array);
  • an ordered list of values (an array).

JSON objects are written in curly braces {}, and their key:value pairs are separated by commas. The key and the value in each pair are separated by a colon. Here is an example:

{
    "first_name": "Joseph",
    "last_name": "Mendes",
    "age": 34
}

#output
#{'first_name': 'Joseph', 'last_name': 'Mendes', 'age': 34}

Here you can see some user’s information in JSON format.

Keys in an object are always strings, while values can be any of seven kinds: a string, a number, an object, an array, true, false, or null.
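
To see how the possible JSON values look from Python, here is a small sketch that parses one object containing each kind of value with the standard json module. The key names are made up for illustration.

```python
import json

text = '''
{
    "string": "juice",
    "number": 3.5,
    "object": {"nested": 1},
    "array": [1, 2],
    "yes": true,
    "no": false,
    "nothing": null
}
'''

parsed = json.loads(text)

# Print the Python type that each JSON value was decoded into.
for key, value in parsed.items():
    print(key, "->", type(value).__name__)
```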

Arrays are written in square brackets [], and their values are separated by commas. A value in the array can, again, be of any type, including another object or array. Here is an example:

["night", "street", "false", [ 345, 23, 8, "juice"], "fruit"]

#output
#["night", "street", "false", [ 345, 23, 8, "juice"], "fruit"]

NOTE: JSON does not support comments
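
A quick way to see this is to hand a commented string to Python's json parser. The sketch below assumes a JavaScript-style `//` comment as the example; the parser rejects it.

```python
import json

valid = '["night", "street"]'
with_comment = '["night", "street"]  // a comment'

print(json.loads(valid))

# The trailing comment is not valid JSON, so parsing fails.
try:
    json.loads(with_comment)
    comment_accepted = True
except json.JSONDecodeError as exc:
    comment_accepted = False
    print("rejected:", exc.msg)
```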

Nested objects

JSON is a highly flexible format. You can nest objects inside other objects as properties:

{
  "persons": [
    {
      "first_name": "Mary",
      "last_name": "Mendes",
      "age": 25
    },
    {
      "first_name": "William",
      "last_name": "Lang",
      "age": 21
    },
    {
      "first_name": "Ronit",
      "last_name": "Singh",
      "age": 34
    }
  ]
}

#output
#{'persons': [{'first_name': 'Mary', 'last_name': 'Mendes', 'age': 25},
 #{'first_name': 'William', 'last_name': 'Lang', 'age': 21},
 #{'first_name': 'Ronit', 'last_name': 'Singh', 'age': 34}]}

If objects and arrays contain other objects or arrays, the data has a tree-like structure.
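
Once such a document is parsed, the tree can be walked with ordinary indexing. The following is a minimal sketch using a shortened, hypothetical version of the data above.

```python
import json

text = (
    '{"persons": ['
    '{"first_name": "Mary", "age": 25}, '
    '{"first_name": "William", "age": 21}'
    ']}'
)
data = json.loads(text)

# Walk down the tree: object -> array -> object -> value.
second_name = data["persons"][1]["first_name"]
print(second_name)

# Collect one value from every object in the "persons" array.
ages = [person["age"] for person in data["persons"]]
print(ages)
```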

The nested objects are fully independent and may have different properties:

{
  "persons": [
    {
      "first_name": "Mary",
      "age": 25
    },
    {
      "first_name": "William",
      "last_name": "Lang"
    }
  ]
}


#output
#{'persons': [{'first_name': 'Mary', 'age': 25},
 #{'first_name': 'William', 'last_name': 'Lang'}]}
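
Because nested objects may lack some properties, code that reads them should not assume every key is present. One possible approach, sketched here with `dict.get()` and an arbitrary default of "unknown":

```python
import json

text = (
    '{"persons": ['
    '{"first_name": "Mary", "age": 25}, '
    '{"first_name": "William", "last_name": "Lang"}'
    ']}'
)
persons = json.loads(text)["persons"]

# dict.get() returns None (or a chosen default) when a property is absent,
# whereas person["age"] would raise KeyError for William.
ages = [person.get("age") for person in persons]
last_names = [person.get("last_name", "unknown") for person in persons]
print(ages)
print(last_names)
```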

JSON module

You can see that there are many similarities between JSON notation and Python data types: both have strings and numbers, a JSON object resembles a Python dictionary, and a JSON array resembles a list. This makes conversion between JSON and Python simple and natural. Here’s the conversion table the json module uses when encoding Python data to JSON:

Python          JSON
dict            object
list, tuple     array
str             string
int, float      number
True            true
False           false
None            null
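
Several of these conversions can be observed directly by encoding one value of each basic Python type. This is a small sketch; the dictionary keys are arbitrary labels.

```python
import json

sample = {
    "a_dict": {"k": "v"},
    "a_list": [1, 2],
    "a_tuple": (3, 4),
    "a_str": "text",
    "an_int": 7,
    "a_float": 1.5,
    "true": True,
    "false": False,
    "none": None,
}

# Tuples become arrays, True/False become true/false, None becomes null.
encoded = json.dumps(sample)
print(encoded)
```

Note that the tuple is written as a JSON array, so it comes back as a list when the text is decoded again.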

Encoding to JSON

Encoding to JSON format is called serialization. The json module has two functions for serializing: json.dump() and json.dumps(). The key distinction between them is the destination: json.dump() writes to a file-like object, and json.dumps() returns a string.

Assume we have a dictionary equivalent to the JSON we’ve seen before.

details = {
  "persons": [
    {
      "first_name": "Mary",
      "age": 25
    },
    {
      "first_name": "William",
      "last_name": "Lang",
    }
  ]
}

Here’s how we can save it to the JSON file details.json:

import json
 
 
with open("details.json", "w") as json_file:
    json.dump(details, json_file)

This method has two required arguments: the data and a file-like object that you can write to. If you run this code, you’ll create a JSON file containing the data from details.

Another option is to serialize the data into a string using json.dumps(). In this case, the only required argument is the data we want to serialize:

json_str = json.dumps(details)
 
print(json_str)

#output
#{"persons": [{"first_name": "Mary", "age": 25}, {"first_name": "William", "last_name": "Lang"}]}


Careful with data types! JSON only supports strings as keys. Basic Python types like integers are converted to strings automatically, but for other key types, such as tuple, you’ll get a TypeError because the json.dump() and json.dumps() functions cannot convert them to strings.

In addition to the required parameters, the two functions have a few optional ones. You can look at them all in the official documentation; here we’ll just glance at the indent parameter. The string we got in the example above is much harder to read than the original dictionary. If we specify indent (a non-negative integer or a string), we can pretty-print the resulting JSON:
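
Both behaviours of non-string keys can be reproduced in a few lines. The dictionaries below are made-up examples.

```python
import json

# Integer keys are converted to strings automatically.
int_keys = json.dumps({1: "one", 2: "two"})
print(int_keys)

# A tuple key cannot be converted, so dumps() raises TypeError.
try:
    json.dumps({(1, 2): "pair"})
    tuple_key_ok = True
except TypeError as exc:
    tuple_key_ok = False
    print("TypeError:", exc)
```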

json_str = json.dumps(details, indent=4)
print(json_str)

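#output
#{
#    "persons": [
#        {
#            "first_name": "Mary",
#            "age": 25
#        },
#        {
#            "first_name": "William",
#            "last_name": "Lang"
#        }
#    ]
#}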

Decoding JSON

The opposite procedure is deserialization. As with serialization, the json module has two functions: json.load() and json.loads(). The difference is in the input: json.load() reads from a file-like object, and json.loads() reads from a string.

Let’s convert the JSON we’ve just created back to Python data types.

with open("details.json", "r") as json_file:
    details_from_json = json.load(json_file)
 
print(details_from_json == details)

#output
#True

You can see that the dictionary that we got as a result of json.load() equals our original dictionary. The same with json.loads():

print(details == json.loads(json_str))

#output
#True

NOTE:

If we convert a Python dictionary with non-string keys to JSON and then back to the Python object, we will not get the same dictionary.
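
The effect described in this note can be sketched with a dictionary that has integer keys. The final step, converting the keys back with int(), is only one possible repair and assumes we already know the keys were integers.

```python
import json

original = {1: "one", 2: "two"}

# Encode and decode again: the keys come back as strings.
round_tripped = json.loads(json.dumps(original))
print(round_tripped)
print(round_tripped == original)

# Hypothetical repair, valid only because we know the keys were integers.
repaired = {int(key): value for key, value in round_tripped.items()}
print(repaired == original)
```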

Conclusion

We’ve seen how to work with JSON using the built-in Python module json. We can

  • convert Python objects to JSON using either json.dump() or json.dumps();
  • convert JSON to Python objects using either json.load() or json.loads().

The conversions are done according to the conversion table and not every Python object can be converted to JSON.
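
As an illustration of an object that cannot be converted, the sketch below tries to encode a Python set. It then shows one possible workaround using the optional `default` parameter of json.dumps(); passing `sorted` there is an illustrative choice that turns the set into a list.

```python
import json

tags = {"fruit", "juice"}

# A set has no JSON counterpart, so encoding it fails.
try:
    json.dumps(tags)
    set_ok = True
except TypeError as exc:
    set_ok = False
    print("TypeError:", exc)

# default= is called for objects json cannot serialize on its own;
# sorted() returns a list, which json can encode as an array.
as_json = json.dumps(tags, default=sorted)
print(as_json)
```

With this workaround the data is decoded as a list rather than a set, so the original type is not preserved.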