IntegrityError: duplicate key value violates unique constraint
A tale of how I learned about PostgreSQL sequence generators the hard way.
How it started
When I was developing a Django web application, I created a model to
store addresses. I was using these addresses for two purposes: billing
and shipping. The Address
model had the usual fields (street, postal
code, et cetera) and the Profile
model looked similar to this:
class Profile(models.Model):
...
shipping_address = models.ForeignKey(Address, related_name='shipping')
billing_address = models.ForeignKey(Address, related_name='billing')
Everything was fine until the validation for the addresses needed to be different for the billing and shipping address. (Post office boxes were allowed for the billing address but not for the shipping address.)
I decided to create two different models (Address
and BillingAddress
)
which both inherit from an
abstract base class
(BaseAddress
). The Profile
model changed to use these models:
class Profile(models.Model):
...
shipping_address = models.ForeignKey(Address, related_name='shipping')
billing_address = models.ForeignKey(BillingAddress, related_name='billing')
Migration
Unfortunately this change was made after the site had gone live, so I needed to migrate the database. I was already using South to handle database migrations,1 so I wasn’t too worried. While in most cases the migrations created by South are fine, this one needed some manual intervention.
The migration consisted of four steps:
- Create the table for the
BillingAddress
model. - Copy all the billing addresses to the
BillingAddress
model table. - Change the
Profile
model table to have references to theBillingAddress
model. - Remove the billing addresses from the
Address
model table.
To make it easier on myself, I decided that in the second step I would
just copy the ID from old Address
to the new BillingAddress
. That way
I didn’t have to update each profile to point to the new
BillingAddress
. A bit crude perhaps, but effective.
Initially everything seemed to work. The migration went okay and the application still worked. Everybody was happy and I moved on to a different task. That is… until I received an error a couple of days later:
IntegrityError: duplicate key value violates unique constraint "profile_billingaddress_pkey"
As the error states, the application was trying to create a
BillingAddress
with an ID that already existed.
What had happened?
Django has a nice
ORM which
abstracts away all kinds of stuff you don’t want to be bothered
with. For instance: unless specified differently, a model has an
AutoField
named id
which is an
auto-incrementing primary key. And,
as specified in the documentation, “[t]here’s no way to tell what the
value of an ID will be before you call save()
, because that value is
calculated by your database, not by Django.” (Emphasis added by me.)
For some reason, PostgreSQL was calculating the wrong ID now when
storing a BillingAddress
. It had to be related to our recent
migration. But why hadn’t I seen it during our tests? And why did it
only start occurring after a couple of days?
Some digging revealed the existence of PostgreSQL’s sequences. In hindsight a first clue could have been found by looking at the SQL generated by Django:
$ manage sql profile
...
CREATE TABLE "profile_billingaddress" (
"id" sequence NOT NULL PRIMARY KEY,
...
When I inspected the database, I got more information about the sequence:
$ manage dbshell
db=> \ds
List of relations
Schema | Name | Type | Owner
--------+-------------------------------------------------+----------+-------
... | ... | ... | ...
public | profile_billingaddress_id_seq | sequence | mark
db=> \d profile_billingaddress_id_seq
Sequence "public.profile_billingaddress_id_seq"
Column | Type | Value
---------------+---------+-------------------------------
sequence_name | name | profile_billingaddress_id_seq
last_value | bigint | 6
start_value | bigint | 1
increment_by | bigint | 1
max_value | bigint | 9223372036854775807
min_value | bigint | 1
cache_value | bigint | 1
log_cnt | bigint | 28
is_cycled | boolean | f
is_called | boolean | f
And there was my problem: the last_value
was 6 while there were many
more addresses in the database. Inspecting the related table
(profile_billingaddress
) right after performing the migration,
revealed that the IDs used were 7, 9, 12, 14, 16, et cetera. This was
a result of my decision to copy the IDs from the original Address
table.
So when the first six new BillingAddresses
were created, I was lucky
since those got a unique ID (1 through 6). That was why I hadn’t
discovered this problem earlier: I only confirmed that the migration
worked by adding a few (less than seven) new BillingAddresses
.
The solution
As is often the case: once you’ve discovered the cause of the problem,
the solution becomes trivial. In this case I just had to set the
last_value
of the sequence to the highest ID in the table.
I chose the quick-and-dirty solution to create a new, empty migration:
$ manage schemamigration profile fix_sequence_problem --empty
Then I added this code to the forwards
method:
if orm.BillingAddress.objects.count():
highest_number = db.execute('select id from profile_billingaddress order by id desc limit 1;')[0][0]
db.execute('alter sequence profile_billingaddress_id_seq restart with %s;' % (highest_number + 1))
I probably could have safely set the sequence to highest number but I
choose to increment by one just to make sure I didn’t have an
off-by-one error. The second line fails if there are no
BillingAddresses
so to prevent my tests from failing I check for the
existence of BillingAddresses
explicitly.
After I ran this migration on the production environment, users could
create profiles again without triggering an IntegrityError
.
-
Update (2021-07-19): South has been deprecated. From Django 1.7 upwards, migrations are built into the core of Django. ↩︎