The pickle module in Python handles serialization and deserialization, but it is unsafe for untrusted input (unpickling can execute arbitrary code) and comes with maintenance challenges. Deserializing previously pickled objects may break if the classes involved have changed over time (e.g., attributes have been added or removed). The copyreg module offers a way to register the functions used when pickling specific types of objects.
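To make the "classes change over time" problem concrete, here is a minimal sketch. The Account class and its attributes are hypothetical, and redefining the class in the same script only simulates shipping a newer version of the code.

```python
import pickle


class Account:
    """Hypothetical 'version 1' of a class."""

    def __init__(self, owner):
        self.owner = owner


old_blob = pickle.dumps(Account("alice"))


# Simulate a later release: the class gained a `balance` attribute.
class Account:
    def __init__(self, owner, balance=0):
        self.owner = owner
        self.balance = balance


# Unpickling restores the saved __dict__ and does not call __init__,
# so the restored object lacks the new attribute.
restored = pickle.loads(old_blob)
has_balance = hasattr(restored, "balance")
print(restored.owner, has_balance)
```

Any code that later reads restored.balance would fail with AttributeError, which is the kind of breakage a registered reduction function can help guard against.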
Suppose we have a class that connects to a database or opens a file, like the following:
class DatabaseConnection:
    def __init__(self, db_path):
        self.db_path = db_path
        # In a real scenario, establish connection here
        print(f"Connecting to {self.db_path}...")
        self.connection = open("main.py")  # Placeholder

    def get_num_lines(self):
        lines = self.connection.readlines()
        return len(lines)

    def __del__(self):
        # In a real scenario, close connection here
        print(f"Closing connection to {self.db_path}")
        self.connection.close()
and try to pickle it without any customization:
import pickle

db = DatabaseConnection("dummy.db")
pickled_db = pickle.dumps(db)  # raises TypeError
This raises TypeError: cannot pickle 'TextIOWrapper' instances (the exact wording varies between Python versions), because the open file object stored in self.connection can’t be pickled. Here’s where copyreg can fix things. It lets us register a reduction function that returns a tuple. The tuple’s first value is a callable, and the second value is a tuple of arguments for that callable. The tuple can hold more values (see the __reduce__ docs). Here’s the implementation:
import copyreg

def reducer(conn: DatabaseConnection):
    print("reducing")
    # Only the callable and its arguments are saved, not the open file
    return (expander, (conn.db_path,))

def expander(db_path):
    print("Reconstructing...")
    return DatabaseConnection(db_path)

copyreg.pickle(DatabaseConnection, reducer)  # IMPORTANT: registers reducer for the class

db = DatabaseConnection("dummy.db")
pickled_db = pickle.dumps(db)
This time we don’t get an exception, because the pickle module finds an entry for DatabaseConnection in copyreg’s dispatch_table and sees that a reducer function is registered for it. Instead of trying to automatically pickle the DatabaseConnection instance’s __dict__ (which would include self.connection, the TextIOWrapper), the pickler calls reducer.
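The lookup described above can be checked directly, along with the full round trip through pickle.loads. This is a minimal sketch: FileBackedResource, reduce_resource, rebuild_resource and reducer_calls are hypothetical names standing in for the class and functions above, and os.devnull is used only so the example has a file it can always open.

```python
import copyreg
import os
import pickle


class FileBackedResource:
    """Hypothetical stand-in for DatabaseConnection; holds an open file."""

    def __init__(self, path):
        self.path = path
        self.handle = open(path)  # file objects cannot be pickled

    def close(self):
        self.handle.close()


reducer_calls = []  # records each time the pickler consults our reducer


def rebuild_resource(path):
    # Called by the unpickler; opens a fresh handle.
    return FileBackedResource(path)


def reduce_resource(resource):
    reducer_calls.append(resource.path)
    return (rebuild_resource, (resource.path,))


copyreg.pickle(FileBackedResource, reduce_resource)

# The registration is an entry in copyreg's module-level table.
registered = copyreg.dispatch_table[FileBackedResource] is reduce_resource

original = FileBackedResource(os.devnull)
restored = pickle.loads(pickle.dumps(original))

print(registered)
print(reducer_calls == [os.devnull])
print(restored.path == original.path, restored is not original)

original.close()
restored.close()
```

The restored object is a new instance with its own freshly opened handle; only the path travelled through the pickle.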
Note
The bytes produced by pickling are a binary stream of opcodes and data, essentially a mini-program for the unpickler. In this case, the mini-program encodes the return value of reducer: a reference to the expander function and the argument tuple holding the value of conn.db_path. To unpickle, the unpickler calls expander with that argument.
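One way to look at that mini-program is the standard library's pickletools module, which disassembles a pickle into a readable opcode listing. The sketch below uses a hypothetical Handle class with its own reduce_handle and expand_handle functions rather than the DatabaseConnection example, so that it runs on its own.

```python
import copyreg
import io
import pickle
import pickletools


class Handle:
    """Hypothetical class used only for this illustration."""

    def __init__(self, path):
        self.path = path


def expand_handle(path):
    return Handle(path)


def reduce_handle(handle):
    return (expand_handle, (handle.path,))


copyreg.pickle(Handle, reduce_handle)

payload = pickle.dumps(Handle("dummy.db"))

# pickletools.dis writes a human-readable opcode listing to `out`.
listing_buffer = io.StringIO()
pickletools.dis(payload, out=listing_buffer)
listing = listing_buffer.getvalue()
print(listing)
```

The listing should mention expand_handle by name, the string 'dummy.db', and a REDUCE opcode, which tells the unpickler to call the callable with the argument tuple. The exact opcodes around them depend on the pickle protocol in use.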