Preamble
Have you ever wondered why VACUUM does not shrink your PostgreSQL tables? Have you ever asked yourself why VACUUM does not compress data files? Then this might be the article you have been looking for. The key takeaway is understanding why space is usually not returned to the operating system after a cleanup. People frequently make wrong assumptions about how VACUUM works internally, so it is worth digging deeper and finding out what really happens.
The following post explains the most important facts about VACUUM.
Understanding tuple visibility
To understand VACUUM in PostgreSQL, it is important to understand how the database handles visibility. The whole concept is based on a set of hidden columns that are part of every row. This is how it works:
test=# CREATE TABLE t_test (id int);
CREATE TABLE
test=# INSERT INTO t_test VALUES (5), (6), (7);
INSERT 0 3
test=# INSERT INTO t_test VALUES (8), (9), (10);
INSERT 0 3
To keep things simple, the table we just created has only one column. Note that the data was loaded in two separate transactions. Each transaction inserted three rows, which the hidden columns make quite clear:
test=# SELECT ctid, xmin, xmax, cmin, cmax, * FROM t_test;
 ctid  | xmin | xmax | cmin | cmax | id
-------+------+------+------+------+----
 (0,1) |  764 |    0 |    0 |    0 |  5
 (0,2) |  764 |    0 |    0 |    0 |  6
 (0,3) |  764 |    0 |    0 |    0 |  7
 (0,4) |  765 |    0 |    0 |    0 |  8
 (0,5) |  765 |    0 |    0 |    0 |  9
 (0,6) |  765 |    0 |    0 |    0 | 10
(6 rows)
xmin and xmax are hidden columns containing transaction IDs, while cmin and cmax contain command IDs within a transaction. As you can see, the first three rows were written by transaction number 764, while the rest of the data was created by transaction number 765.
Based on these hidden columns, PostgreSQL decides whether a row is visible to a particular transaction; in other words, the hidden columns manage visibility.
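To see visibility in action, here is a minimal sketch assuming two concurrent psql sessions (best run on a scratch copy of the table so the following listings stay unchanged). A REPEATABLE READ transaction takes its snapshot at its first query, so a row committed afterwards, carrying a higher xmin, stays invisible to it:

-- session 1
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT xmin, id FROM t_test;     -- the snapshot is taken here

-- session 2 (autocommit)
INSERT INTO t_test VALUES (11);  -- committed under a newer transaction ID

-- session 1
SELECT xmin, id FROM t_test;     -- the new row is not visible to this snapshot
COMMIT;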
Running UPDATE statements reveals what is really going on:
test=# BEGIN;
BEGIN
test=*# UPDATE t_test SET id = id * 10 WHERE id > 9 RETURNING *;
 id
-----
 100
(1 row)

UPDATE 1
test=*# SELECT ctid, xmin, xmax, cmin, cmax, * FROM t_test;
 ctid  | xmin | xmax | cmin | cmax | id
-------+------+------+------+------+-----
 (0,1) |  764 |    0 |    0 |    0 |   5
 (0,2) |  764 |    0 |    0 |    0 |   6
 (0,3) |  764 |    0 |    0 |    0 |   7
 (0,4) |  765 |    0 |    0 |    0 |   8
 (0,5) |  765 |    0 |    0 |    0 |   9
 (0,7) |  766 |    0 |    0 |    0 | 100
(6 rows)
One row has changed. However, let’s focus on the ctid, which identifies the physical position of a row on disk. Notice that (0,6) is gone, because PostgreSQL had to copy the row to a new location. If we run a second UPDATE, the row is copied again:
test=*# UPDATE t_test SET id = id * 10 WHERE id > 9 RETURNING *;
  id
------
 1000
(1 row)

UPDATE 1
test=*# UPDATE t_test SET id = id * 10 WHERE id > 9 RETURNING *;
  id
-------
 10000
(1 row)

UPDATE 1
Copying these rows is essential because the previous version has to be retained; otherwise ROLLBACK could not work (see the short sketch after the next listing).
Let’s take another look at the table:
test=*# SELECT ctid, xmin, xmax, cmin, cmax, * FROM t_test;
 ctid  | xmin | xmax | cmin | cmax |  id
-------+------+------+------+------+-------
 (0,1) |  764 |    0 |    0 |    0 |     5
 (0,2) |  764 |    0 |    0 |    0 |     6
 (0,3) |  764 |    0 |    0 |    0 |     7
 (0,4) |  765 |    0 |    0 |    0 |     8
 (0,5) |  765 |    0 |    0 |    0 |     9
 (0,9) |  766 |    0 |    2 |    2 | 10000
(6 rows)

test=*# COMMIT;
COMMIT
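To make the ROLLBACK argument concrete, here is a minimal sketch (best run on a scratch copy of the table so the listings below match exactly): when an UPDATE is rolled back, the copied row version is abandoned and the original version simply remains visible at its old ctid.

BEGIN;
UPDATE t_test SET id = 500 WHERE id = 5 RETURNING ctid, xmin, id;  -- new copy at a new ctid
ROLLBACK;
SELECT ctid, xmin, id FROM t_test WHERE id = 5;                    -- the original version is still there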
The table now contains dead rows: the old row versions at ctid (0,6), (0,7) and (0,8) are no longer visible to anyone, and they need to be taken out.
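If you want to count those dead rows explicitly, the pgstattuple contrib extension can report them (a side note; the extension has to be installed, and the exact numbers depend on what you ran before):

CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT tuple_count, dead_tuple_count, free_space
FROM pgstattuple('t_test');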
VACUUM: Cleaning out rows
Keep in mind that COMMIT is not allowed to remove dead rows either, because other transactions might still need the old versions. The cleanup therefore has to happen asynchronously, and that is exactly the job of VACUUM. Let’s run it and see what happens:
test=# VACUUM VERBOSE t_test;
INFO:  vacuuming "test.public.t_test"
INFO:  finished vacuuming "test.public.t_test": index scans: 0
pages: 0 removed, 1 remain, 1 scanned (100.00% of total)
tuples: 3 removed, 6 remain, 0 are dead but not yet removable
removable cutoff: 767, which was 0 XIDs old when operation ended
new relfrozenxid: 764, which is 1 XIDs ahead of previous value
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
avg read rate: 22.790 MB/s, avg write rate: 27.348 MB/s
buffer usage: 6 hits, 5 misses, 6 dirtied
WAL usage: 3 records, 3 full page images, 14224 bytes
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
VACUUM
VACUUM actively searches for rows that are no longer visible to anyone. Those rows can sit anywhere in the data file, including the middle. VACUUM allows PostgreSQL to reuse that space; it does not, however, return it to the operating system. That is simply not possible: if a data file is 1 GB in size and the free space is in the middle of it, there is no filesystem operation that returns “the middle of a file” to the operating system. Instead, PostgreSQL has to keep track of this free space and reuse it later.
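Here is a minimal sketch of this behavior (t_bloat is a hypothetical scratch table; exact sizes will vary): after DELETE and VACUUM the file keeps its size, but newly inserted rows fill the freed space instead of growing the file.

CREATE TABLE t_bloat (id int);
INSERT INTO t_bloat SELECT generate_series(1, 100000);
SELECT pg_relation_size('t_bloat');    -- baseline size
DELETE FROM t_bloat WHERE id % 2 = 0;  -- creates dead rows all over the file
VACUUM t_bloat;
SELECT pg_relation_size('t_bloat');    -- essentially unchanged: the space is only marked reusable
INSERT INTO t_bloat SELECT generate_series(1, 50000);
SELECT pg_relation_size('t_bloat');    -- still roughly the same: new rows reuse the gaps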
Exception to the VACUUM rule
There is, however, an exception to every rule. Take a look at the following code example:
test=# SELECT pg_relation_size('t_test');
 pg_relation_size
------------------
             8192
(1 row)

test=# DELETE FROM t_test;
DELETE 6
test=# SELECT pg_relation_size('t_test');
 pg_relation_size
------------------
             8192
(1 row)
Even after the DELETE statement, the table keeps its size. Remember that cleanup happens asynchronously, so we have to run VACUUM to remove those rows:
test=# VACUUM t_test;
VACUUM
This is a bit of an exception. The rule is that VACUUM may truncate a table if, from a certain position onwards, ALL rows up to the end of the file are dead. That is exactly what happened in this case:
test=# SELECT pg_relation_size('t_test');
 pg_relation_size
------------------
                0
(1 row)
Under normal circumstances, however, a huge table almost always has a few live rows near the end of its data file, so this truncation rarely happens in practice. Because of this, don’t rely on VACUUM to compress tables.
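As a hedged illustration of both points, continuing the hypothetical t_bloat example from above: a single live row near the end of the file prevents plain VACUUM from truncating anything, while VACUUM FULL rewrites the whole table into a new, compact file, at the price of heavy locking.

DELETE FROM t_bloat WHERE id <> 99999;  -- keep one live row that sits near the end of the file
VACUUM t_bloat;
SELECT pg_relation_size('t_bloat');     -- barely shrinks: the surviving row blocks truncation
VACUUM FULL t_bloat;                    -- rewrites the table; requires an ACCESS EXCLUSIVE lock
SELECT pg_relation_size('t_bloat');     -- now down to a single page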
VACUUM FULL vs. pg_squeeze