Preamble
Customers occasionally ask me which option is best for automatically generated primary keys. I’ll discuss the options and offer suggestions in this article.
A primary key is necessary for every table. It’s crucial to be able to recognize a specific table row in a relational database. If you’re curious as to why, look up any of the countless questions requesting assistance in deleting duplicate entries from a table on the internet.
You would be wise to select a primary key that is both unique and immutable for the lifetime of a table row. That is because primary keys are frequently referenced by foreign key constraints, which makes changing a primary key that is used elsewhere difficult and time-consuming.
Occasionally a table has a natural primary key, such as a citizen’s social security number. In most cases, however, no such attribute exists, and you have to generate a synthetic primary key. Some people argue that you should use a synthetic primary key even when a natural one exists. I won’t get into that “holy war.”
Techniques for auto-generated primary keys in PostgreSQL
There are two fundamental methods:
Key generation using a sequence
A sequence is a database object whose sole function is to generate unique numbers. It does this by increasing a built-in counter.
Sequences are designed to work well with multiple users at the same time, and they will never give out the same number twice. Still, accessing a sequence from many concurrent SQL statements could become a bottleneck, so there is the CACHE option that makes the sequence hand out several values at once to database sessions.
Sequences don’t follow the usual transactional rules: when a transaction is rolled back, the counter does not get reset. That is not a bug, but necessary for good performance. If you need a gap-less series of numbers, a sequence is not the right tool; you will have to resort to less efficient and more complicated techniques.
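As a small sketch of the caching behavior described above (the sequence name here is illustrative):

```sql
-- Hand out values in blocks of 20 per session, so concurrent
-- sessions don't contend for the sequence on every call.
CREATE SEQUENCE order_id_seq CACHE 20;

-- Fetch the next value; values cached by other sessions are skipped,
-- which is one way gaps can appear in the series.
SELECT nextval('order_id_seq');
```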
To fetch the next value from a sequence, you use the nextval function like this:
SELECT nextval('sequence_name');
For more information on how to manipulate sequences, consult the documentation.
Producing UUIDs
UUID stands for “universally unique identifier.” It is a 128-bit number that is made by an algorithm that makes sure it is unique. For that, there are several widely used algorithms. There are several functions in PostgreSQL that produce UUIDs:
- The uuid-ossp extension offers functions to generate UUIDs. Note that because of the hyphen in the name, you have to quote the name of the extension (CREATE EXTENSION "uuid-ossp";).
- From PostgreSQL v13 on, you can use the core function gen_random_uuid() to generate version-4 (random) UUIDs.
Note that you should always use the PostgreSQL data type uuid for UUIDs. Don’t try to convert them to strings, or numeric — you will waste space and lose performance.
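The two generation options mentioned above can be tried directly (the extension is only needed on versions before v13 or for the non-random UUID variants):

```sql
-- uuid-ossp must be quoted because of the hyphen in its name.
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

-- Version-1 UUID (timestamp and MAC address based), from uuid-ossp:
SELECT uuid_generate_v1();

-- Version-4 (random) UUID, a core function since PostgreSQL v13:
SELECT gen_random_uuid();
```

Both return values of the uuid data type, which is what you should store.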
Defining auto-generated primary keys
A column with automatically generated values can be defined in one of four ways.
Using the DEFAULT clause
You can use this method with sequences and UUIDs. Here are some examples:
CREATE TABLE has_integer_pkey (
   id bigint DEFAULT nextval('integer_id_seq') PRIMARY KEY,
   ...
);

CREATE TABLE has_uuid_pkey (
   id uuid DEFAULT gen_random_uuid() PRIMARY KEY,
   ...
);
PostgreSQL uses the DEFAULT value whenever the INSERT statement doesn’t explicitly specify a value for that column.
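To illustrate, assuming the has_uuid_pkey table from above (the column name some_column is made up for this sketch):

```sql
-- The id column is omitted, so the DEFAULT expression fills it in:
INSERT INTO has_uuid_pkey (some_column) VALUES ('example');

-- An explicitly supplied value silently overrides the DEFAULT:
INSERT INTO has_uuid_pkey (id, some_column)
VALUES ('00000000-0000-0000-0000-000000000001', 'example');
```

That silent override is exactly what identity columns with GENERATED ALWAYS prevent.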
Using the serial and bigserial pseudo-types
This method is a shortcut for defining a sequence and setting a DEFAULT clause as above. With this technique, a table is defined as follows:
CREATE TABLE uses_serial ( id bigserial PRIMARY KEY, ... );
That amounts to the following:
CREATE TABLE uses_serial (
   id bigint PRIMARY KEY,
   ...
);

CREATE SEQUENCE uses_serial_id_seq;

ALTER TABLE uses_serial
   ALTER id SET DEFAULT nextval('uses_serial_id_seq');

ALTER SEQUENCE uses_serial_id_seq OWNED BY uses_serial.id;
The “OWNED BY” clause adds a dependency between the column and the sequence, so that dropping the column automatically drops the sequence.
Using serial will create an integer column, while bigserial will create a bigint column.
Using identity columns
This is another way to use a sequence, since PostgreSQL creates the sequence for an identity column “behind the scenes.”
CREATE TABLE uses_identity (
   id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
   ...
);
There is also “GENERATED BY DEFAULT AS IDENTITY”, which is the same except that you won’t get an error message if you try to explicitly insert a value for the column (much like with a DEFAULT clause). See below for more!
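The difference between the two variants can be seen with a short sketch (table names are illustrative):

```sql
CREATE TABLE t_always  (id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY);
CREATE TABLE t_default (id bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY);

-- Fails: GENERATED ALWAYS rejects explicitly inserted values.
INSERT INTO t_always (id) VALUES (42);

-- Succeeds, but may collide with generated values later on.
INSERT INTO t_default (id) VALUES (42);

-- To override GENERATED ALWAYS deliberately (e.g., when restoring data):
INSERT INTO t_always (id) OVERRIDING SYSTEM VALUE VALUES (42);
```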
You can specify sequence options for identity columns:
CREATE TABLE uses_identity ( id bigint GENERATED ALWAYS AS IDENTITY (MINVALUE 0 START WITH 0 CACHE 20) PRIMARY KEY, ... );
Using BEFORE INSERT triggers
This is similar to DEFAULT values, but it allows you to unconditionally override a value inserted by the user with a generated value. The big disadvantage of a trigger is the performance impact.
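A minimal sketch of this approach (all object names here are made up for illustration):

```sql
CREATE TABLE uses_trigger (id bigint PRIMARY KEY, val text);

CREATE SEQUENCE uses_trigger_id_seq;

-- The trigger function overwrites NEW.id unconditionally, so a value
-- supplied by the user is ignored.
CREATE FUNCTION set_id() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   NEW.id := nextval('uses_trigger_id_seq');
   RETURN NEW;
END;$$;

CREATE TRIGGER set_id_trig
   BEFORE INSERT ON uses_trigger
   FOR EACH ROW EXECUTE FUNCTION set_id();
```

The function call on every single row is what makes this noticeably slower than a plain DEFAULT or identity column.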
Should I use integer (serial) or bigint (bigserial) for my auto-generated primary key?
You should always use bigint.
True, an integer occupies four bytes, while a bigint needs eight. But:
- If you have a small table, where integer would suffice, the four wasted bytes won’t matter much. Also, not every table that you designed to be small will remain small!
- If you have a big table, you might exceed the maximum for integer, which is 2147483647. Note that this could also happen if your table contains fewer rows than that: you might delete rows, and some sequence values can get “lost” by transactions that are rolled back.
Now it is quite complicated to change the primary key column from integer to bigint in a big table inside an active database without causing excessive downtime, so you should save yourself that pain.
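The naïve conversion looks deceptively simple, which is part of the trap (the table name is made up):

```sql
-- Rewrites the whole table and all its indexes while holding an
-- ACCESS EXCLUSIVE lock -- on a big, busy table that means downtime.
ALTER TABLE big_table ALTER COLUMN id TYPE bigint;
```

Avoiding that rewrite on a live system requires a multi-step migration with a new column and backfilling, which is far more work than starting with bigint in the first place.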
With bigint, you are certain to never exceed the maximum of 9223372036854775807: even if you insert 10000 rows per second without any pause, you have almost 30 million years before you reach the limit.
Should I use bigserial or an identity column for my auto-generated primary key?
You should use an identity column, unless you have to support old PostgreSQL versions.
Identity columns were introduced in PostgreSQL v11, and they have two advantages over bigserial:
- They comply with the SQL standard, while bigserial is proprietary PostgreSQL syntax. This will make your code more portable.
- If you use GENERATED ALWAYS AS IDENTITY, you will get an error message if you try to override the generated value by explicitly inserting a number. This avoids the common problem that manually entered values will conflict with generated values later on, causing surprising application errors.
So unless you have to support PostgreSQL v10 or below, there is no reason to use bigserial.
Should I use bigint or uuid for an auto-generated primary key?
Use a sequence unless you use database sharding or have another reason to make primary keys in a “decentralized” way (outside of a single database).
Real differences
The advantages of bigint are clear:
- bigint uses only eight bytes, while uuid uses 16.
- It is less expensive to fetch a value from a sequence than to generate a UUID.
One drawback of a sequence is that it is a single object in a single database. So you cannot use a sequence if you use sharding, that is, if you split your data up and distribute it across several databases. UUIDs are an obvious solution in this scenario. (You could use sequences defined with an INCREMENT greater than 1 and different START values, but that might lead to problems when you add additional shards.)
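The interleaved-sequence workaround mentioned above looks like this (sequence name illustrative), and it also shows why it scales poorly:

```sql
-- On shard 1: generates 1, 3, 5, ...
CREATE SEQUENCE shard_seq INCREMENT 2 START 1;

-- On shard 2: generates 2, 4, 6, ...
CREATE SEQUENCE shard_seq INCREMENT 2 START 2;

-- Adding a third shard later would require an INCREMENT of 3
-- everywhere, i.e. renumbering all existing sequences.
```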
Of course, you will also want to use UUIDs if your primary key was made in an application that was spread across multiple application servers instead of being generated automatically by the database.
Imaginary differences
Some argue that UUIDs are superior because they distribute writes across multiple pages of the primary key index. That is supposed to reduce contention and produce a more balanced, less fragmented index. The first claim is true, but it can actually work against you: for good performance, the whole index then has to stay cached. The second claim is plainly wrong, since B-tree indexes are always balanced. Moreover, a change in PostgreSQL v11 ensured that monotonically increasing values fill an index more efficiently than random inserts ever could (though subsequent deletes will, of course, cause fragmentation). In short, any such advantages are either marginal or nonexistent, and they are more than outweighed by the fact that uuid uses twice as much storage, which makes the index bigger, causing more writes and occupying more of your cache.
Aspects of security
It has been argued that sequence-generated primary keys can leak information, because they allow you to deduce the approximate order in which rows were added to a table. That is true, although I have a hard time imagining a situation in which this is a real security risk. If that worries you, use UUIDs and put your worries to rest.
Benchmarking bigint versus uuid
My co-worker Kaarel ran a small performance test a while ago and found that uuid was slower than bigint when it came to bigger joins.
I decided to run a small insert-only benchmark with these two tables:
CREATE UNLOGGED TABLE test_bigint (
   id bigint GENERATED ALWAYS AS IDENTITY (CACHE 200) PRIMARY KEY
);

CREATE UNLOGGED TABLE test_uuid (
   id uuid DEFAULT gen_random_uuid() PRIMARY KEY
);
I performed the benchmark on my laptop (SSD, 8 cores) with a pgbench custom script that had 6 concurrent clients repeatedly run transactions of 1000 prepared INSERT statements for five minutes:
INSERT INTO test_bigint /* or test_uuid */ DEFAULT VALUES;
Performance comparison of bigint versus uuid auto-generated primary keys:

| | bigint | uuid |
|---|---|---|
| inserts per second | 107090 | 74947 |
| index growth per row | 30.5 bytes | 41.7 bytes |
Using bigint clearly wins, but the difference is not spectacular.
Conclusion
As auto-generated primary keys, you can use both UUIDs and numbers made by a sequence. Use identity columns unless you need to generate primary keys outside a single database, and make sure all your primary key columns are of type bigint.
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.
Are you interested in writing for Enteros’ Blog? Please send us a pitch!
Introduction The entertainment industry has undergone a massive digital evolution over the past decade. From streaming platforms and gaming ecosystems to live digital events and content distribution networks, organizations are leveraging advanced technologies to deliver immersive, real-time experiences to global audiences. At the center of this transformation lies a complex infrastructure powered by databases, cloud … Continue reading “How Enteros Transforms Entertainment Industry Operations with Database Management, AIOps Platform, and Cloud FinOps”