Preamble
xmax is an MVCC (Multiversion Concurrency Control) system column in PostgreSQL. The documentation is rather succinct:
The identity (transaction ID) of the deleting transaction, or zero for an undeleted row version. It is possible for this column to be nonzero in a visible row version. That usually indicates that the deleting transaction hasn’t committed yet, or that an attempted deletion was rolled back.
Even though this is true, the use of “weasel words” like “usually” implies that there is more to the story. This article takes a closer look.
The two meanings of xmax
PostgreSQL implements MVCC by storing multiple versions of each row in the table. To follow PostgreSQL convention, I’ll use the word “tuple” to mean “row version.”
xmax is actually employed by PostgreSQL for two purposes:
- It stores the transaction ID (“xid”) of the transaction that deleted the tuple, as the documentation says. Remember that UPDATE also deletes a tuple in PostgreSQL!
- It stores row locks on the tuple.
This is possible because normal locks only last as long as the transaction does, and a tuple is only deleted after the transaction that deletes it has finished. Tuples cannot be locked and deleted simultaneously.
The main benefit of putting row locks on the tuple itself is that it prevents the “lock table” from getting too full. The lock table is a fixed-size area in shared memory that is set up when the server starts up. It might not have enough space to hold all the row locks from a larger transaction. To deal with this, you’d need techniques like “lock escalation,” which are challenging to implement, have an adverse effect on concurrency, and cause a variety of unpleasant issues.
Putting row locks in the tuple has another drawback: each row lock changes the table, and the changed blocks have to be written back to persistent storage. Therefore, row locks increase the I/O load.
But this leaves a few questions unanswered:
- How can you tell which of the two meanings xmax has in a tuple?
- How can you tell whether xmax is valid or not?
- How are row locks actually stored?
In the rest of this article, we’ll talk more about this subject and answer these questions.
An example
In the following, I’ll use a simple schema for demonstration. I am using PostgreSQL v10, but this hasn’t changed in the last couple of releases.
CREATE TABLE parent (
   p_id  integer PRIMARY KEY,
   p_val text
);

CREATE TABLE child (
   c_id  integer PRIMARY KEY,
   p_id  integer REFERENCES parent (p_id),
   c_val text
);

INSERT INTO parent (p_id, p_val) VALUES (42, 'parent');
Now let’s look at the relevant system columns:
session1=# SELECT ctid, xmin, xmax, p_id, p_val FROM parent;

 ctid  | xmin  | xmax | p_id | p_val
-------+-------+------+------+--------
 (0,1) | 53163 |    0 |   42 | parent
(1 row)
This is the simple view we expect to see: ctid is the physical location of the tuple (block 0, item 1), xmin contains the ID of the inserting transaction, and xmax is zero because the row is alive.
Now let’s start a transaction in session 1 and delete the row:
session1=# BEGIN; session1=# DELETE FROM parent WHERE p_id = 42;
Then session 2 can see that xmax has changed:
session2=# SELECT ctid, xmin, xmax, p_id, p_val FROM parent;

 ctid  | xmin  | xmax  | p_id | p_val
-------+-------+-------+------+--------
 (0,1) | 53163 | 53165 |   42 | parent
(1 row)
But wait, we change our mind in session 1 and undo the change:
session1=# ROLLBACK;
To find out what xmax means in this case, let’s call in the cavalry.
pageinspect comes to the rescue
PostgreSQL comes with a “contrib” module called pageinspect that can be used to examine the actual contents of table blocks. It is installed with
CREATE EXTENSION pageinspect;
We’ll use two of its functions:
- get_raw_page: reads one 8kB block from the table’s data file
- heap_page_item_attrs: returns the metadata and data for all tuples in a data block
Needless to say, these functions are superuser only.
heap_page_item_attrs returns an integer field named t_infomask that contains several flags, some of which tell us the meaning of xmax. To get the full story, you’ll have to read the code in src/include/access/htup_details.h.
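To illustrate, the relevant flag bits from htup_details.h can be decoded with a small Python sketch (the helper function is my own, not part of PostgreSQL):

```python
# Flag bits from src/include/access/htup_details.h
HEAP_XMAX_LOCK_ONLY = 0x0080  # xmax is only a row lock, not a deletion
HEAP_XMAX_COMMITTED = 0x0400  # the transaction in xmax has committed
HEAP_XMAX_INVALID   = 0x0800  # the transaction in xmax was rolled back
HEAP_XMAX_IS_MULTI  = 0x1000  # xmax is a multixact ID, not a transaction ID

def decode_xmax_flags(t_infomask: int) -> dict:
    """Return the xmax-related hint bits from a tuple's t_infomask."""
    return {
        "xmax_is_lock":     bool(t_infomask & HEAP_XMAX_LOCK_ONLY),
        "xmax_committed":   bool(t_infomask & HEAP_XMAX_COMMITTED),
        "xmax_rolled_back": bool(t_infomask & HEAP_XMAX_INVALID),
        "xmax_multixact":   bool(t_infomask & HEAP_XMAX_IS_MULTI),
    }

# A tuple whose deleting transaction was rolled back (hint bit set):
print(decode_xmax_flags(0x0800))
```

This is exactly what the bitwise AND expressions in the query below do in SQL.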
Looking at table block 0, which contains our tuple, we see:
session2=# SELECT lp,
       t_ctid AS ctid,
       t_xmin AS xmin,
       t_xmax AS xmax,
       (t_infomask & 128)::boolean AS xmax_is_lock,
       (t_infomask & 1024)::boolean AS xmax_committed,
       (t_infomask & 2048)::boolean AS xmax_rolled_back,
       (t_infomask & 4096)::boolean AS xmax_multixact,
       t_attrs[1] AS p_id,
       t_attrs[2] AS p_val
FROM heap_page_item_attrs(
        get_raw_page('parent', 0),
        'parent'
     );

-[ RECORD 1 ]----+-----------------
lp               | 1
ctid             | (0,1)
xmin             | 53163
xmax             | 53165
xmax_is_lock     | f
xmax_committed   | f
xmax_rolled_back | f
xmax_multixact   | f
p_id             | \x2a000000
p_val            | \x0f706172656e74
The attributes p_id and p_val are displayed in binary form.
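If you are curious, the binary values can be decoded by hand: p_id is a four-byte little-endian integer, and p_val starts with a one-byte varlena length header. A Python sketch (the helper names are mine; this assumes a little-endian machine, the common case):

```python
import struct

def decode_int4(hex_bytes: str) -> int:
    """Decode a PostgreSQL integer attribute (4 bytes, little-endian)."""
    return struct.unpack("<i", bytes.fromhex(hex_bytes))[0]

def decode_short_text(hex_bytes: str) -> str:
    """Decode a text attribute with a one-byte varlena header.

    On little-endian machines the first byte is (total_length << 1) | 1,
    where total_length includes the header byte itself.
    """
    raw = bytes.fromhex(hex_bytes)
    assert raw[0] & 0x01, "not a one-byte varlena header"
    total_len = raw[0] >> 1          # 0x0f >> 1 == 7 bytes including header
    return raw[1:total_len].decode()

print(decode_int4("2a000000"))              # -> 42
print(decode_short_text("0f706172656e74"))  # -> parent
```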
The information in the tuple doesn’t tell us whether the transaction that set xmax has been committed or rolled back, so we (and PostgreSQL when it inspects the tuple) still don’t know what to make of xmax. This is because PostgreSQL doesn’t change the tuple after a transaction is finished.
We would need to look at the commit log, which records each transaction’s state, to clear up that ambiguity. The commit log is persisted in the pg_xact subdirectory of the PostgreSQL data directory (pg_clog in older versions).
A SELECT that modifies data
We cannot examine the commit log from SQL, but any transaction that reads the tuple and looks up the commit log will store the result in the tuple, so that the next reader doesn’t have to do the same work again. This is called “setting the hint bits.”
So all we have to do is read the tuple:
session2=# SELECT ctid, xmin, xmax, p_id, p_val FROM parent;

 ctid  | xmin  | xmax  | p_id | p_val
-------+-------+-------+------+--------
 (0,1) | 53163 | 53165 |   42 | parent
(1 row)
This changes the information stored in the tuple. Let’s have another look with pageinspect:
-[ RECORD 1 ]----+-----------------
lp               | 1
ctid             | (0,1)
xmin             | 53163
xmax             | 53165
xmax_is_lock     | f
xmax_committed   | f
xmax_rolled_back | t
xmax_multixact   | f
p_id             | \x2a000000
p_val            | \x0f706172656e74
The SELECT statement set the hint bits on the tuple: now we can see that xmax is from a transaction that was rolled back and should be ignored.
As a side effect, the first reader of a tuple modifies it, causing unexpected write I/O. This is inconvenient, but it is the price for fast COMMIT and ROLLBACK. It is also the reason why it is a good idea to either bulk load data with COPY (FREEZE) or to VACUUM the data after loading.
Now that we understand how to evaluate whether or not xmax comes from a valid transaction, what about row locks?
Row locks and xmax
Rows are locked when data-changing statements are used, but there is an easy way to lock a row without adding or removing tuples:
session1=# BEGIN;
session1=# SELECT * FROM parent WHERE p_id = 42 FOR UPDATE;

 p_id | p_val
------+--------
   42 | parent
(1 row)
Now what does pageinspect tell us?
-[ RECORD 1 ]----+-----------------
lp               | 1
ctid             | (0,1)
xmin             | 53163
xmax             | 53166
xmax_is_lock     | t
xmax_committed   | f
xmax_rolled_back | f
xmax_multixact   | f
p_id             | \x2a000000
p_val            | \x0f706172656e74
The row is clearly locked. In this case it is a FOR UPDATE lock, but the query does not distinguish between the lock modes for simplicity’s sake. You’ll notice that xmax is again neither committed nor rolled back, but we don’t care, because we know it is a row lock.
xmax is set to 53166, the locking transaction's transaction ID. Let’s close that transaction to continue:
session1=# COMMIT;
If xmax contains a row lock, the row is live no matter what the state of the locking transaction is, so PostgreSQL does not need to set hint bits in this case.
You’re in for a surprise if you think you’ve seen it all.
Multiple locks on a single row
We saw in the previous example that PostgreSQL stores the locking transaction’s ID in xmax. This works fine as long as only one transaction holds a lock on the tuple, which is always the case for exclusive locks like the one taken by SELECT … FOR UPDATE.
But PostgreSQL also knows other row locks, for example the FOR KEY SHARE lock that is taken on the target of a foreign key constraint to prevent concurrent modification of the key in that row. Let’s insert some rows into the child table:
session1=# BEGIN;
session1=# INSERT INTO child (c_id, p_id, c_val) VALUES (1, 42, 'first');

session2=# BEGIN;
session2=# INSERT INTO child (c_id, p_id, c_val) VALUES (2, 42, 'second');
Now let’s take a second look at our parent row:
-[ RECORD 1 ]----+-----------------
lp               | 1
ctid             | (0,1)
xmin             | 53163
xmax             | 3
xmax_is_lock     | t
xmax_committed   | f
xmax_rolled_back | f
xmax_multixact   | t
p_id             | \x2a000000
p_val            | \x0f706172656e74
That “3” in xmax cannot be a transaction ID (transaction IDs keep counting up), and the xmax_multixact flag is set.
This is the ID of a “multiple transaction object,” called “multixact” in PostgreSQL jargon for lack of a better word. Such objects are created whenever more than one transaction locks a row, and their IDs are counted off separately (as you can see, this database has only needed a few of them so far). Multixacts are persisted in the pg_multixact subdirectory of the data directory.
You can get information about the members of a multixact with the undocumented pg_get_multixact_members function:
session2=# SELECT * FROM pg_get_multixact_members('3');

  xid  | mode
-------+-------
 53167 | keysh
 53168 | keysh
(2 rows)
Now you really know what is in an xmax!
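To recap, the way a reader has to interpret xmax can be summed up in a short Python sketch (my own simplification, not actual PostgreSQL code):

```python
def interpret_xmax(xmax, is_lock, is_multi, committed, rolled_back):
    """Simplified summary of what xmax means for a tuple."""
    if xmax == 0:
        return "tuple was never deleted or locked"
    if is_lock:
        who = "a multixact of lockers" if is_multi else "a single transaction"
        return f"row lock held by {who}; the tuple is live"
    if rolled_back:
        return "deletion was rolled back; the tuple is live"
    if committed:
        return "deleting transaction committed; the tuple is dead"
    return "unknown; must consult the commit log (pg_xact) and set hint bits"

# The rolled-back DELETE from the example above:
print(interpret_xmax(53165, False, False, False, True))
# -> deletion was rolled back; the tuple is live
```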
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.