Preamble
I recently encountered an intriguing support case that demonstrates how a problem’s root cause may occasionally be found in the most unlikely of places.
About table bloat
After an UPDATE or DELETE, PostgreSQL keeps old versions of a table row around. This way, sessions that want to read the row at the same time don’t have to wait. But eventually cleanup will be required for this “garbage.” That is what the autovacuum daemon is supposed to do.
In most cases, you won’t need to worry about that, but occasionally something will go wrong. The table then continues to expand because the old row versions are not deleted. This will not only waste storage space, but it will also make index scans take longer.
To get rid of existing bloat, you can use VACUUM (FULL) or tools like pg_squeeze. But it is important to find and fix the cause of the table bloat so that it does not reappear.
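For example, a full rewrite of the table reclaims the wasted space. Note that VACUUM (FULL) takes an ACCESS EXCLUSIVE lock, so the table is unavailable while it runs:

```sql
-- rewrite the table into a new, compact file;
-- all access to the table is blocked until this finishes
VACUUM (FULL) pg_attribute;
```

pg_squeeze achieves the same result with a much weaker lock, which makes it the better choice on busy systems.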
The problem
A client of mine called me after noticing table bloat in the pg_attribute system catalog table, which holds the metadata for table columns.
This can occur if tables are frequently created, altered, or dropped. Usually these are temporary tables, which are automatically dropped when the session or transaction ends.
The client made extensive use of temporary tables. But the problem appeared on only a few of the client's many Linux database servers.
Searching the cause
I looked over the list of typical reasons for table bloat:
- Open database transactions in state “idle in transaction”.
  As long as such a transaction stays open, autovacuum cannot remove any row version that became obsolete after the transaction started.
- A data modification rate too high for autovacuum to keep up with.
  Making autovacuum more aggressive is the appropriate response in that case.
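The first of these causes is easy to check for. A query along these lines against pg_stat_activity (the five-minute threshold is an arbitrary example) lists long-open idle transactions:

```sql
SELECT pid, usename, xact_start
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND xact_start < current_timestamp - INTERVAL '5 minutes';
```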
Neither cause applied here; the second could be ruled out anyway, because it would have produced bloat on all machines rather than just a few.
I next examined the impacted table’s usage information:
```
dbname=> SELECT * FROM pg_stat_sys_tables
dbname-> WHERE relname = 'pg_attribute';

-[ RECORD 1 ]-------+-------------
relid               | 1249
schemaname          | pg_catalog
relname             | pg_attribute
seq_scan            | 167081
seq_tup_read        | 484738
idx_scan            | 1506941
idx_tup_fetch       | 4163837
n_tup_ins           | 0
n_tup_upd           | 0
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 0
n_dead_tup          | 0
n_mod_since_analyze | 0
last_vacuum         |
last_autovacuum     |
last_analyze        |
last_autoanalyze    |
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0
```
This shows that autovacuum has never run on this table. More interesting still, the output reveals why it never ran:
PostgreSQL never tried to remove any dead tuples (row versions that could be cleaned up), because it believed there were none: n_dead_tup is 0.
The statistics collector
Suspicion was confirmed when the following message was found in the server logs:
using stale statistics instead of current ones because stats collector is not responding
The statistics collector is the PostgreSQL background process that gathers usage statistics.
After each activity, PostgreSQL backends send statistics about that activity. These statistics updates are sent through a UDP socket on localhost that is created at PostgreSQL startup time. The statistics collector reads from the socket and aggregates the collected statistics.
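The mechanism can be sketched with plain Python sockets (a simplified stand-in for PostgreSQL's C implementation, not the real code; the payload string is made up):

```python
import socket

# The collector end: a UDP socket bound to an ephemeral port on the
# loopback address, standing in for the real statistics collector socket.
collector = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
collector.bind(("127.0.0.1", 0))   # kernel picks a free port
collector.settimeout(5)
addr = collector.getsockname()

# A backend: sends a fire-and-forget statistics packet to that address.
backend = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
backend.sendto(b"n_tup_upd=1", addr)

# The collector reads and aggregates whatever arrives; UDP gives no
# delivery guarantee, so a lost packet is simply lost.
data, _ = collector.recvfrom(1024)
print(data.decode())               # n_tup_upd=1
```

The choice of UDP is deliberate: statistics are not critical enough to justify making backends block on a slow collector, so dropped packets are tolerated.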
Closing in on the problem
The statistics collector process was running:
```
  918     1 /usr/pgsql-10/bin/postmaster -D /var/lib/pgsql/10/data/
  947   918 postgres: logger process
  964   918 postgres: checkpointer process
  965   918 postgres: writer process
  966   918 postgres: wal writer process
  967   918 postgres: autovacuum launcher process
  968   918 postgres: stats collector process
  969   918 postgres: bgworker: logical replication launcher
```
I attached to the statistics collector with strace to see what it was doing and to identify any issues:
```
# strace -p 968
strace: Process 968 attached
epoll_pwait(3,
```
On the UDP socket, the statistics collector was waiting for messages, but none were getting through.
I examined the UDP socket:
```
# netstat -u -n -p
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address    Foreign Address  State        PID/Program name
udp6       0      0 ::1:59517        ::1:59517        ESTABLISHED  918/postmaster
```
At first glance, nothing seemed out of the ordinary.
However, I got a different result when I tried the same thing on a machine that wasn’t having the issue:
```
# netstat -u -n -p
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address       Foreign Address     State        PID/Program name
udp        0      0 127.0.0.1:46031     127.0.0.1:46031     ESTABLISHED  9303/postmaster
```
It turned out that on all systems that experienced table bloat, the statistics collector socket was created on the IPv6 address for localhost, while all working systems were using the IPv4 address!
However, IPv6 was turned off for the loopback interface on all machines:
```
# ifconfig lo
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 6897  bytes 2372420 (2.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 6897  bytes 2372420 (2.2 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
```
Nailing the bug to the wall
PostgreSQL uses the POSIX function getaddrinfo(3) to resolve localhost.
Since a working statistics socket is crucial, PostgreSQL loops through the addresses returned by that call, creating and testing a UDP socket for each, until it finds one that works.
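That fallback loop can be sketched in Python (a simplified illustration of the pattern, not PostgreSQL's actual C code):

```python
import socket

def make_stats_socket(host="localhost"):
    """Try each address getaddrinfo() returns for `host` until a UDP
    socket can be created and bound; the first one that works wins."""
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
            host, 0, type=socket.SOCK_DGRAM):
        try:
            s = socket.socket(family, socktype, proto)
            s.bind(sockaddr)
            return s
        except OSError:
            continue  # e.g. an IPv6 address was returned but IPv6 is unusable
    raise OSError("no usable address for %s" % host)

s = make_stats_socket()
print(s.family)  # AF_INET6 or AF_INET, depending on resolution order
```

The real code goes further: it also connect()s the socket to its own address and sends a test packet before accepting it. But the key point is the same: whichever address family works first at startup time is the one the statistics socket keeps for the life of the server.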
So IPv6 must still have been functional at the time PostgreSQL was started.
After more research, it was found that IPv6 was turned off during the boot process, but there was a race condition:
PostgreSQL would occasionally start with IPv6 disabled and other times without it. And it was on these latter machines that the statistics collector stopped working and the tables became bloated!
The problem was fixed by changing the boot order so that IPv6 is always turned off before PostgreSQL starts.
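On Linux, IPv6 can be disabled via sysctl settings along these lines (the file name is illustrative; the point is that these must take effect before the PostgreSQL service starts):

```
# /etc/sysctl.d/disable-ipv6.conf
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
```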
Conclusion
This demonstrates once more that the root cause of a computer problem might not be where you first expect it. It also shows how important solid operating system knowledge is for database administrators.
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.