Preamble
I’ve already covered a few specifics regarding the PostgreSQL storage engine zheap. The goal of zheap is to keep tables from growing excessively under UPDATE-heavy workloads. When a row is updated, the standard PostgreSQL heap keeps two copies of that row in the same data file. That is not harmful per se, but in some circumstances it can lead to table bloat. zheap instead stores old row versions in an undo log, similar to what Oracle does. The problem was that, until now, this zheap undo was never cleaned up. That is why we created a “discard worker” to take care of the cleanup: it removes undo logs that are no longer needed and thereby keeps UPDATE-related table bloat to a minimum.
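To see why this matters, here is a minimal sketch of the effect on a stock PostgreSQL heap table (the table name and row count are made up purely for illustration):

CREATE TABLE t_bloat (id int);
INSERT INTO t_bloat SELECT * FROM generate_series(1, 1000000);
SELECT pg_size_pretty(pg_relation_size('t_bloat'));  -- size after the initial load
UPDATE t_bloat SET id = id + 1;                       -- every row now exists twice in the file
SELECT pg_size_pretty(pg_relation_size('t_bloat'));  -- roughly twice the size until VACUUM makes the dead space reusable

zheap avoids exactly this pattern by moving the old row versions out of the table and into the undo log.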
Getting PostgreSQL and zheap
You must install a PostgreSQL version with zheap support before learning more about what the discard worker does. To make things simpler for everyone, we have created a container that lets users test zheap right away.
zheap was originally intended to be an extension. Unlike other extensions, however, you can’t simply install it on top of PostgreSQL, because it is too intrusive: zheap needs patches to the PostgreSQL core.
Additionally, I want to emphasize that this is a tech preview rather than a production-ready version. We’re delighted to make it available to the public in order to solicit comments and perhaps even some bug reports.
You can use the free Docker container by clicking here. The installation process is easy and clear-cut:
[hs@fedora ~]$ docker run --name zHeap -p 5432:5432 \
    -d cybertecpostgresql/zheap
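To double-check that the container actually came up before connecting, the standard Docker commands are enough (the container name matches the one used above):

[hs@fedora ~]$ docker ps --filter name=zHeap
[hs@fedora ~]$ docker logs zHeap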
Run psql as usual to connect to the newly created instance. postgres serves as both the username and the default password.
Username: postgres
Password: postgres
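One small convenience: if you don’t want to type the password at the interactive prompt of the psql command below, libpq’s standard PGPASSWORD environment variable can be used instead:

[hs@fedora ~]$ PGPASSWORD=postgres psql -h localhost -p 5432 -U postgres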
[hs@fedora ~]$ psql -h localhost -p 5432 -U postgres
UNDO: Cleaning up using the discard worker
Once PostgreSQL with zheap support is operational, we can look at the process table as follows:
[hs@fedora ~]$ ps axf | grep post
...
2403084 ?  Ss  0:00 \_ /home/hs/pgzheap/bin/postgres -D ./dbzheap
2403086 ?  Ss  0:00     \_ postgres: checkpointer
2403087 ?  Ss  0:00     \_ postgres: background writer
2403088 ?  Ss  0:00     \_ postgres: walwriter
2403089 ?  Ss  0:00     \_ postgres: autovacuum launcher
2403090 ?  Ss  0:00     \_ postgres: stats collector
2403091 ?  Ss  0:00     \_ postgres: undo discard worker
2403093 ?  Ss  0:00     \_ postgres: logical replication launcher
...
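If you only want to check whether the discard worker is present, a quick grep against the process list does the job (the bracket trick simply keeps grep from matching itself):

[hs@fedora ~]$ ps ax | grep "[u]ndo discard worker"
2403091 ?        Ss     0:00 postgres: undo discard worker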
The discard worker is launched automatically. Let’s take a look at what it does and how the undo log is stored.
A variable needs to be changed before we can create a table and populate it with data.
test=# SHOW default_table_access_method;
 default_table_access_method
-----------------------------
 heap
(1 row)

test=# SET default_table_access_method TO zheap;
SET
When creating a new table, this setting tells PostgreSQL which storage engine to use by default. Usually, heap is what we want; in this case, however, zheap is the preferred option. We could also add the zheap option to each individual CREATE TABLE, but in many circumstances you may want to set it for the duration of the session, as I just did:
test=# CREATE TABLE t_large (id serial, name text);
CREATE TABLE
It doesn’t really matter how the table is set up. Any layout will do.
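The per-table variant mentioned above would look like this (a sketch; t_zheap_demo is just an illustrative name):

test=# CREATE TABLE t_zheap_demo (id serial, name text) USING zheap;
CREATE TABLE

And since default_table_access_method is an ordinary configuration parameter, it could also be set in postgresql.conf if zheap should become the default for the entire instance.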
Testing zheap UNDO with INSERT load
Let’s begin loading data:
test=# BEGIN;
BEGIN
test=*# INSERT INTO t_large (name)
        SELECT 'dummy' FROM generate_series(1, 10000000);
INSERT 0 10000000
test=*# \d+
                                      List of relations
 Schema |      Name      |   Type   | Owner | Persistence | Access method |    Size    | Description
--------+----------------+----------+-------+-------------+---------------+------------+-------------
 public | t_large        | table    | hs    | permanent   | zheap         | 193 MB     |
 public | t_large_id_seq | sequence | hs    | permanent   |               | 8192 bytes |
(2 rows)
What is crucial here is that zheap must be able to roll back in case of an error, so a great deal of undo has to be written. Before we commit the transaction, let’s take a look at the undo directories:
[hs@fedora dbzheap]$ ls -l ./pg_undo/ && ls -l ./base/undo/
total 8
-rw-------. 1 hs hs      52 12. Nov 11:47 00000001A84012A0
-rw-------. 1 hs hs      52 12. Nov 11:47 00000001C9F48970
total 432128
-rw-------. 1 hs hs 1048576 12. Nov 11:47 000000.0065A00000
-rw-------. 1 hs hs 1048576 12. Nov 11:47 000000.0065B00000
-rw-------. 1 hs hs 1048576 12. Nov 11:47 000000.0065C00000
...
-rw-------. 1 hs hs 1048576 12. Nov 11:47 000000.007FD00000
-rw-------. 1 hs hs 1048576 12. Nov 11:47 000000.007FE00000
-rw-------. 1 hs hs 1048576 12. Nov 11:47 000000.007FF00000
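If you just want a quick summary rather than the full listing, the total size and the number of undo segments can be obtained like this (same paths as above):

[hs@fedora dbzheap]$ du -sh ./base/undo
[hs@fedora dbzheap]$ ls ./base/undo | wc -l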
As you can see, we’ve created a fairly large number of files in our base/undo directory. Let’s commit and see what happens:
test=*# COMMIT;
COMMIT
Once the new discard worker kicks in, PostgreSQL will recycle (= delete) these logs. Let’s take a closer look:
[hs@fedora dbzheap]$ ls -l ./pg_undo/ && ls -l ./base/undo/
total 8
-rw-------. 1 hs hs      52 12. Nov 11:47 00000001A84012A0
-rw-------. 1 hs hs      52 12. Nov 11:47 00000001C9F48970
total 2048
-rw-------. 1 hs hs 1048576 12. Nov 11:47 000000.007FF00000
-rw-------. 1 hs hs 1048576 12. Nov 11:47 000000.0080000000
We expected the files to be gone, and they are.
Testing zheap UNDO with UPDATE load
The same will happen when we run UPDATE statements:
test=# BEGIN;
BEGIN
test=*# UPDATE t_large SET id = id - 1;
UPDATE 10000000
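Because zheap writes the old row versions to the undo log rather than into the table itself, the table should grow far less than a comparable heap table would. You can verify that while the transaction is still open (the exact number will vary from run to run):

SELECT pg_size_pretty(pg_relation_size('t_large'));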
What is important here is that a great deal of undo has been produced. As you can see, the undo directory is roughly 600 MB in size:
[hs@fedora dbzheap]$ cd base/undo/
[hs@fedora undo]$ du -h
603M
Now, let’s commit the transaction:
test=*# COMMIT;
COMMIT
What follows the commit statement is particularly intriguing:
[hs@fedora undo]$ date && du -h
Fr 12. Nov 11:55:50 CET 2021
603M .
[hs@fedora undo]$ date && du -h
Fr 12. Nov 11:55:57 CET 2021
2,0M .
That’s it: the zheap UNDO logs have been deleted. The cleanup is NOT part of the commit itself; it is carried out by the discard worker, whose job it is to make sure the log is not discarded too early. Keep in mind: yours might not be the only transaction at work here, so the cleanup has to happen asynchronously (just like VACUUM does).
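Because the cleanup is asynchronous, you will typically see the directory shrink shortly after the COMMIT rather than at commit time itself, just as in the listing above. One simple way to watch it happen (assuming GNU watch is available) is:

[hs@fedora undo]$ watch -n 1 du -sh .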
About Enteros
Enteros offers a patented database performance management SaaS platform. It proactively identifies root causes of complex business-impacting database scalability and performance issues across a growing number of clouds, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.