May 2013, updated Aug 2013
Like any self-respecting data nerd, I find deleting database records abhorrent. What happens if I need to resurrect those records later? What if I want to run a survival analysis? Django’s ORM doesn’t offer any out-of-the-box support for soft-deletion, but it’s not difficult to preserve your data by overriding a few methods and flipping a Boolean field instead of actually deleting anything. After living with that system for a while, though, I’m convinced that it’s inadequate; in this post, I’ll explain its principal shortcoming and propose a slightly more complex, but vastly better, alternative.
At first blush, soft-deletion seems embarrassingly simple. If you’re like me, a system like this probably springs to mind:
1from django.db import models
2from django.db.models.query import QuerySet
3
4
5class SoftDeletionQuerySet(QuerySet):
6 def delete(self):
7 # Bulk delete bypasses individual objects' delete methods.
8 return super(SoftDeletionQuerySet, self).update(alive=False)
9
10 def hard_delete(self):
11 return super(SoftDeletionQuerySet, self).delete()
12
13 def alive(self):
14 return self.filter(alive=True)
15
16 def dead(self):
17 return self.exclude(alive=True)
18
19
20class SoftDeletionManager(models.Manager):
21 def __init__(self, *args, **kwargs):
22 self.alive_only = kwargs.pop('alive_only', True)
23 super(SoftDeletionManager, self).__init__(*args, **kwargs)
24
25 def get_queryset(self):
26 if self.alive_only:
27 return SoftDeletionQuerySet(self.model).filter(alive=True)
28 return SoftDeletionQuerySet(self.model)
29
30 def hard_delete(self):
31 return self.get_queryset().hard_delete()
32
33
34class SoftDeletionModel(models.Model):
35 alive = models.BooleanField(default=True)
36
37 objects = SoftDeletionManager()
38 all_objects = SoftDeletionManager(alive_only=False)
39
40 class Meta:
41 abstract = True
42
43 def delete(self):
44 self.alive = False
45 self.save()
46
47 def hard_delete(self):
48 super(SoftDeletionModel, self).delete()
This approach is straightforward and readable, and for nearly two years it worked well for us at Hearsay Social. However, it inevitably leads to data corruption.
The problem is simple: using a Boolean to store deletion status makes it
impossible to enforce uniqueness constraints in your database. Let’s say you’re
storing user records which should have unique email addresses; with this
soft-deletion scheme, you can only have one active record for
betty@smith.com. Including deletion status in your constraint lets you keep
both a soft-deleted and an active record with the same email address, but then
you’re out of luck — any attempt to delete another record for Betty will throw
an IntegrityError
. Luckily, there’s a better way.
At the database level, there’s a straightforward solution to this problem
(though I didn’t learn about it until a few months ago): store soft-deleted
records with nulls in the alive
column. As mandated by the ANSI SQL
standard, MySQL, Postgres, and SQLite treat each null as a unique snowflake.
However, creating a Django field with this behavior is a little tricky because
we want to forbid False
values in the database (allowing only True
and
NULL
). Here’s what I came up with:
1from django.db import models
2
3class LiveField(models.Field):
4 '''Similar to a BooleanField, but stores False as NULL.
5
6 '''
7 description = 'Soft-deletion status'
8 __metaclass__ = models.SubfieldBase
9
10 def __init__(self):
11 super(LiveField, self).__init__(default=True, null=True)
12
13 def get_internal_type(self):
14 # Create DB column as though for a NullBooleanField.
15 return 'NullBooleanField'
16
17 def get_prep_value(self, value):
18 # Convert in-Python value to value we'll store in DB
19 if value:
20 return 1
21 return None
22
23 def to_python(self, value):
24 # Misleading name, since type coercion also occurs when
25 # assigning a value to the field in Python.
26 return bool(value)
27
28 def get_prep_lookup(self, lookup_type, value):
29 # Filters with .alive=False won't work, so
30 # raise a helpful exception instead.
31 if lookup_type == 'exact' and not value:
32 msg = ("%(model)s doesn't support filters with "
33 "%(field)s=False. Use a filter with "
34 "%(field)s=None or an exclude with "
35 "%(field)s=True instead.")
36 raise TypeError(msg % {
37 'model': self.model.__name__,
38 'field': self.name})
39
40 return super(LiveField, self).get_prep_lookup(lookup_type, value)
This is a drop-in replacement for Django’s stock BooleanField
in the
abstract model above, but under the covers it stores falsy values as nulls. At
Hearsay, we just finished migrating all our models to use LiveField
, and
it’s been a huge help already — having the option to simultaneously support
soft-deletion and uniqueness constraints keeps our application code and our
data clean.
I’ve open-sourced this soft-deletion plug-in as django-livefield
. You can
install it from PyPI or check
out the code on GitHub.
Curious how to test your shiny new soft-deletion field? Check out my post on testing custom Django fields for some tips, or check out the actual test setup.