Preamble
I recently encountered an intriguing support case that demonstrates how a problem’s root cause may occasionally be found in the most unlikely of places.
About table bloat
After an UPDATE or DELETE, PostgreSQL keeps the old version of a table row around, so concurrent sessions that are reading the row can still see it and don’t have to wait. But eventually this “garbage” has to be cleaned up, and that is what the autovacuum daemon is supposed to do.
In most cases, you won’t need to worry about that, but occasionally something will go wrong. The table then continues to expand because the old row versions are not deleted. This will not only waste storage space, but it will also make index scans take longer.
To get rid of the bloat, you can use VACUUM (FULL) and other tools like pg_squeeze. But it is important to find and fix the cause of the table bloat so that it does not reappear.
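To see whether a table is accumulating dead row versions in the first place, the statistics views can help. A query along these lines will list suspicious tables (a sketch; the 10000 threshold is an arbitrary example, not a recommendation):

```sql
-- Sketch: user tables with many dead row versions, and when autovacuum
-- last processed them. Adjust the threshold to your workload.
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;
```

A table whose n_dead_tup keeps climbing while last_autovacuum stays empty is a candidate for the kind of investigation described below.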
The problem
A client of mine called me after noticing table bloat in the pg_attribute system catalog, which holds the metadata for table columns.
This can happen when tables or columns are frequently created and dropped. Most of the time, the culprit is temporary tables, which are dropped automatically when a session or transaction ends.
Temporary tables were used extensively by the client. But the problem showed up on only a few of the client’s many Linux database servers.
Searching the cause
I looked over the list of typical reasons for table bloat:
- Database sessions that linger in the “idle in transaction” state. Such sessions prevent autovacuum from removing row versions that became outdated after their transaction began.
- A data modification rate too high for autovacuum to keep up with. The appropriate response in that case is to make autovacuum more aggressive.

Neither applied here; the second possibility in particular was ruled out because it would have caused bloat on all machines, not just a select few.
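For reference, the first of those causes is easy to check for. A query against pg_stat_activity along these lines lists sessions stuck in an open transaction (a sketch):

```sql
-- Sessions that are "idle in transaction" hold back the xmin horizon,
-- so autovacuum cannot remove row versions newer than their snapshot.
SELECT pid, state, xact_start, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;
```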
I next examined the impacted table’s usage information:
dbname=> SELECT * FROM pg_stat_sys_tables
dbname->   WHERE relname = 'pg_attribute';

-[ RECORD 1 ]-------+-------------
relid               | 1249
schemaname          | pg_catalog
relname             | pg_attribute
seq_scan            | 167081
seq_tup_read        | 484738
idx_scan            | 1506941
idx_tup_fetch       | 4163837
n_tup_ins           | 0
n_tup_upd           | 0
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 0
n_dead_tup          | 0
n_mod_since_analyze | 0
last_vacuum         |
last_autovacuum     |
last_analyze        |
last_autoanalyze    |
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
analyze_count       | 0
autoanalyze_count   | 0
This shows that autovacuum has never run on the table. But what’s more interesting is that we can see why it didn’t run:

PostgreSQL never tried to remove any dead tuples (outdated row versions that could be cleaned up), because according to the statistics (n_dead_tup is 0) there aren’t any.
The statistics collector
Suspicion was confirmed when the following message was found in the server logs:
using stale statistics instead of current ones because stats collector is not responding
The statistics collector is the PostgreSQL background process that gathers usage statistics.
After each activity, PostgreSQL backends send statistics about that activity through a UDP socket on localhost that is created at PostgreSQL startup time. The statistics collector reads from the socket and aggregates the collected statistics.
Closing in on the problem
The statistics collector process was running:
  918     1 /usr/pgsql-10/bin/postmaster -D /var/lib/pgsql/10/data/
  947   918 postgres: logger process
  964   918 postgres: checkpointer process
  965   918 postgres: writer process
  966   918 postgres: wal writer process
  967   918 postgres: autovacuum launcher process
  968   918 postgres: stats collector process
  969   918 postgres: bgworker: logical replication launcher
I traced the statistics collector’s execution to see what it was doing and to spot any problems:
# strace -p 968
strace: Process 968 attached
epoll_pwait(3,
The statistics collector was waiting for messages on the UDP socket, but none were coming through.
I examined the UDP socket:
# netstat -u -n -p
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address    Foreign Address   State        PID/Program name
udp6       0      0 ::1:59517        ::1:59517         ESTABLISHED  918/postmaster
Nothing odd so far.
However, I got a different result when I tried the same thing on a machine that wasn’t having the issue:
# netstat -u -n -p
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address    Foreign Address   State        PID/Program name
udp        0      0 127.0.0.1:46031  127.0.0.1:46031   ESTABLISHED  9303/postmaster
It turned out that on all systems that experienced table bloat, the statistics collector socket was created on the IPv6 address for localhost, while all working systems were using the IPv4 address!
However, IPv6 was turned off for the loopback interface on all machines:
# ifconfig lo
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 6897  bytes 2372420 (2.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 6897  bytes 2372420 (2.2 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
Nailing the bug to the wall
PostgreSQL uses the POSIX function getaddrinfo(3) to resolve localhost.
Because a working statistics collector is essential, PostgreSQL loops through the addresses returned by that call, creating and testing a UDP socket for each, until it finds one that works.
As a result, we can assume that IPv6 was functional at the time PostgreSQL was launched.
After more research, it was found that IPv6 was turned off during the boot process, but there was a race condition:
Sometimes PostgreSQL started after IPv6 had already been disabled, and sometimes before. And it was on the latter machines that the statistics collector stopped working and the tables became bloated!
The problem was fixed by changing the boot order so that IPv6 is always turned off before PostgreSQL starts.
Conclusion
This demonstrates once more that the root cause of a problem may not be where you first expect it. It also shows how valuable solid operating system knowledge is for a database administrator.
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.