Preamble
I recently encountered an intriguing support case that demonstrates how a problem’s root cause may occasionally be found in the most unlikely of places.
About table bloat
After an UPDATE or DELETE, PostgreSQL keeps old versions of a table row around. This way, sessions that want to read the row at the same time don’t have to wait. But eventually cleanup will be required for this “garbage.” That is what the autovacuum daemon is supposed to do.
In most cases, you won’t need to worry about that, but occasionally something will go wrong. The table then continues to expand because the old row versions are not deleted. This will not only waste storage space, but it will also make index scans take longer.
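If you suspect bloat, the pgstattuple contrib extension can measure it directly by scanning the table. A minimal sketch (it assumes the extension is available, requires superuser or the pg_stat_scan_tables role, and reads the whole table):

CREATE EXTENSION IF NOT EXISTS pgstattuple;
-- Reports live vs. dead tuples and free space for the table
SELECT * FROM pgstattuple('pg_catalog.pg_attribute');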
To get rid of the bloat, you can use VACUUM (FULL) and other tools like pg_squeeze. But it is important to find and fix the cause of the table bloat so that it does not reappear.
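For reference, shrinking a bloated catalog table with VACUUM (FULL) looks like this; it rewrites the table into a new, compact file and takes an ACCESS EXCLUSIVE lock, so run it during a quiet period (and as a superuser for a system catalog):

VACUUM (FULL, VERBOSE) pg_catalog.pg_attribute;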
The problem
A client of mine called me after noticing table bloat in the pg_attribute system catalog table, which holds the metadata for table columns.
This can happen if tables or table columns are created, changed, or dropped frequently. Most of the time, the culprits are temporary tables, which are dropped automatically when the session or transaction ends.
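To illustrate (a minimal sketch with a hypothetical table name): every temporary table leaves a footprint in pg_attribute, and dropping it turns those rows into dead row versions that autovacuum has to clean up.

-- One pg_attribute row per user column, plus rows for the hidden system columns
CREATE TEMPORARY TABLE tmp_demo (id integer, payload text);
SELECT count(*) FROM pg_attribute WHERE attrelid = 'tmp_demo'::regclass;
-- Dropping the table leaves the corresponding pg_attribute rows behind as dead tuples
DROP TABLE tmp_demo;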
The client made extensive use of temporary tables. Yet the problem showed up only on a handful of their many Linux database servers.
Searching for the cause
I looked over the list of typical reasons for table bloat:
- Long-running database transactions in state “idle in transaction.” These prevent autovacuum from removing row versions that became obsolete after the transaction started. (A query to spot such sessions is sketched below.)
- A data modification rate that is too high for autovacuum to keep up with. The appropriate response in that case is to make autovacuum more aggressive.
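Both conditions are easy to check. The following is only a sketch; the ten-minute cutoff, the table name busy_table, and the tuning values are illustrative:

-- Sessions that have been "idle in transaction" for more than ten minutes
SELECT pid, usename, xact_start, state, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND xact_start < current_timestamp - interval '10 minutes';

-- Make autovacuum more aggressive for one heavily modified table
ALTER TABLE busy_table SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_cost_delay = 2
);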
Neither applied here; in particular, the second possibility could be ruled out, because it would have caused bloat on all machines, not just a select few.
Next, I examined the usage statistics for the affected table:
dbname=> SELECT * FROM pg_stat_sys_tables
dbname->    WHERE relname = 'pg_attribute';
-[ RECORD 1 ]-------+-------------
relid               | 1249
schemaname          | pg_catalog
relname             | pg_attribute
seq_scan            | 167081
seq_tup_read        | 484738
idx_scan            | 1506941
idx_tup_fetch       | 4163837
n_tup_ins           | 0
n_tup_upd           | 0
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 0
n_dead_tup          | 0
n_mod_since_analyze | 0
last_vacuum         |
last_autovacuum     |
last_analyze        |
last_autoanalyze    |
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0
This shows that autovacuum has never run on the table. Even more interesting, the output tells us why it didn’t run:

PostgreSQL did not even try to remove dead tuples (row versions that could be cleaned up), because according to these statistics (n_dead_tup = 0) there aren’t any.
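For context, autovacuum only considers a table for cleanup when the dead tuple count reported by these statistics exceeds a threshold, so with n_dead_tup stuck at 0 it will never act. A rough sketch of the check and the default settings:

-- Autovacuum triggers a VACUUM on a table roughly when
--   n_dead_tup > autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples
SHOW autovacuum_vacuum_threshold;     -- 50 by default
SHOW autovacuum_vacuum_scale_factor;  -- 0.2 by default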
The statistics collector
The suspicion was confirmed when I found the following message in the server log:

using stale statistics instead of current ones because stats collector is not responding
The statistics collector is the PostgreSQL background process that gathers usage statistics.

After each activity, PostgreSQL backends send messages with statistics about that activity to a UDP socket on localhost that is created at PostgreSQL startup time. The statistics collector reads from that socket and aggregates the collected statistics.
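You can inspect this socket from the shell; for example with lsof, as in this sketch (it assumes lsof is installed, sufficient privileges, and that the postmaster process name starts with “postmaster,” as it does with the PGDG packages used here):

# Show the UDP socket(s) held by the postmaster; statistics traffic
# goes over a datagram socket bound to localhost
lsof -a -c postmaster -i UDP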
Closing in on the problem
The statistics collector process was running:
  918     1 /usr/pgsql-10/bin/postmaster -D /var/lib/pgsql/10/data/
  947   918 postgres: logger process
  964   918 postgres: checkpointer process
  965   918 postgres: writer process
  966   918 postgres: wal writer process
  967   918 postgres: autovacuum launcher process
  968   918 postgres: stats collector process
  969   918 postgres: bgworker: logical replication launcher
I traced the statistics collector’s execution to see what it was doing and whether anything was wrong:
# strace -p 968
strace: Process 968 attached
epoll_pwait(3,
The statistics collector was waiting for messages on the UDP socket, but none were arriving.
I examined the UDP socket:
# netstat -u -n -p
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address    Foreign Address  State        PID/Program name
udp6       0      0 ::1:59517        ::1:59517        ESTABLISHED  918/postmaster
Nothing looked odd so far.
However, I got a different result when I tried the same thing on a machine that wasn’t having the issue:
# netstat -u -n -p
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address    Foreign Address  State        PID/Program name
udp        0      0 127.0.0.1:46031  127.0.0.1:46031  ESTABLISHED  9303/postmaster
It turned out that on all systems that experienced table bloat, the statistics collector socket was created on the IPv6 address for localhost, while all working systems were using the IPv4 address!
However, IPv6 was turned off for the loopback interface on all machines:
# ifconfig lo
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 6897  bytes 2372420 (2.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 6897  bytes 2372420 (2.2 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
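Whether IPv6 is currently disabled on an interface can also be checked with sysctl (assuming the usual Linux IPv6 sysctls; 1 means disabled):

# 1 = IPv6 disabled on the loopback interface, 0 = enabled
sysctl net.ipv6.conf.lo.disable_ipv6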
Nailing the bug to the wall
PostgreSQL uses the POSIX function getaddrinfo(3) to resolve localhost.
Because a working statistics collector is essential, PostgreSQL loops through the addresses returned by that call, creates a UDP socket for each one, and tests it until it has one that works.
So we can conclude that IPv6 must still have been enabled on the affected machines when PostgreSQL was started.
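To see what getaddrinfo(3) returns for localhost on a particular machine, you can use getent, which resolves names through the same call (the exact addresses and their order depend on /etc/hosts and on whether IPv6 is enabled):

# Lists the addresses that getaddrinfo(3) resolves for "localhost",
# typically ::1 and/or 127.0.0.1
getent ahosts localhost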
Further research showed that IPv6 is disabled during the boot process on these machines, but that there was a race condition: sometimes PostgreSQL started after IPv6 had already been disabled, and sometimes before. It was on the latter machines that the statistics collector stopped working and the tables became bloated!
The problem was fixed by changing the boot order so that IPv6 is always turned off before PostgreSQL starts.
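On a systemd-based system, one way to express that ordering is a drop-in for the PostgreSQL unit that makes it wait until the kernel parameters have been applied. This is only a sketch: the unit name postgresql-10.service and the assumption that IPv6 is disabled via systemd-sysctl.service depend on the distribution and on how IPv6 is actually turned off.

# /etc/systemd/system/postgresql-10.service.d/ipv6-ordering.conf (hypothetical path)
[Unit]
# Start PostgreSQL only after boot-time sysctl settings
# (e.g. net.ipv6.conf.all.disable_ipv6 = 1) have been applied.
After=systemd-sysctl.service
Wants=systemd-sysctl.service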
Conclusion
This shows once more that the root cause of a computer problem can lie somewhere you would never expect. It also shows how valuable solid operating system knowledge is for a database administrator.
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.