Preamble
After you ANALYZE a PostgreSQL table to collect value distribution statistics, you will find the gathered statistics for each column in the pg_stats system view. This article will explain the meaning of the correlation column and its impact on index scans.
Physical vs. logical ordering
Most common PostgreSQL data types have an ordering: they support the operators <, <=, =, >= and >.
These data types can be used with the “standard” index type, the B-tree index.
The values in a column of this kind tell you how the table rows should be put in order. An index on this column will be sorted according to that ordering.
A PostgreSQL table is made up of one or more 8KB block files. The physical ordering refers to the arrangement of the rows as they are stored in the file.
You can examine the physical ordering of the rows by selecting the ctid system column: it contains the block number and the item number inside the block, which describe the physical location of the table row.
The correlation for a column is a value between -1 and 1. It reveals how well logical and physical ordering match each other.
- The rows are stored in the table file in ascending column order if the correlation is 1; otherwise, they are stored in descending order.
- Less ideal matching is denoted by values between -1 and 1.
- There is no relationship between the physical and logical orders when the value is 0.
Why should I care?
On your tables, you’ll create indexes for quicker access.
The effectiveness of an index scan is impacted by a column’s correlation.
A full index or a portion of it is read in index sequential order during an index scan. The table’s corresponding row is retrieved for each entry that is located (this is skipped in an “index only scan,” but that is another story).
The rows that are fetched will come from all over the table if the correlation of the indexed column is close to zero. This will lead to a large number of randomly distributed reads of various table blocks.
The next row fetched during the index scan, however, tends to be in the same or the following table block if the correlation is close to 1 or -1.
High correlation offers two benefits:
- If many of the table rows fetched during the index scan are contained in the same table block, only a small number of blocks need to be read from storage because blocks read by the database are cached in shared memory.
- Due to the proximity of the blocks that need to be read from storage, sequential I/O—which is much faster than random I/O on spinning disks—is enabled.
An example
Let’s make two identical tables with different correlations, but with the same content:
CREATE TABLE corr (id, val) AS SELECT i, 'some text ' || i FROM generate_series(1, 100000) AS i; CREATE INDEX corr_idx ON corr (id); VACUUM (ANALYZE) corr; SELECT correlation FROM pg_stats WHERE tablename = 'corr' AND attname = 'id'; correlation ------------- 1 (1 row) CREATE TABLE uncorr AS SELECT * FROM corr ORDER BY random(); CREATE INDEX uncorr_idx ON uncorr (id); VACUUM (ANALYZE) uncorr; SELECT correlation FROM pg_stats WHERE tablename = 'uncorr' AND attname = 'id'; correlation ------------- -0.00522369 (1 row)
In order to compare index scans on the two tables, we disable bitmap index scans.
Then we assess how index scans function:
SET enable_bitmapscan = off; EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM corr WHERE id BETWEEN 1001 AND 1300; QUERY PLAN --------------------------------------------------- Index Scan using corr_idx on corr (cost=0.29..15.23 rows=297 width=19) (actual time=0.108..0.732 rows=300 loops=1) Index Cond: ((id >= 1001) AND (id <= 1300)) Buffers: shared hit=6 Planning time: 0.456 ms Execution time: 1.049 ms (5 rows) EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM uncorr WHERE id BETWEEN 1001 AND 1300; QUERY PLAN --------------------------------------------------- Index Scan using uncorr_idx on uncorr (cost=0.29..978.15 rows=298 width=19) (actual time=0.105..2.352 rows=300 loops=1) Index Cond: ((id >= 1001) AND (id <= 1300)) Buffers: shared hit=303 Planning time: 0.548 ms Execution time: 2.736 ms (5 rows)
Since all blocks were already in shared buffers, 2.7 milliseconds is actually not that bad.
The 303 blocks from the second query will perform much worse than the 6 from the first if some of these blocks need to be read from disk!
Each result row for the second query was located in a different table block. This resulted in the touching of 300 blocks. Index blocks make up the final three blocks.
Only three table blocks are touched by the first query:
SELECT ctid, id FROM corr WHERE id BETWEEN 1001 AND 1300; ctid | id ---------+------ (6,58) | 1001 (6,59) | 1002 (6,60) | 1003 (6,61) | 1004 (6,62) | 1005 (6,63) | 1006 (6,64) | 1007 ... (8,37) | 1294 (8,38) | 1295 (8,39) | 1296 (8,40) | 1297 (8,41) | 1298 (8,42) | 1299 (8,43) | 1300 (300 rows)
In fact, blocks 6, 7 and 8 of the table contain every row!
Correlation and the optimizer
The PostgreSQL optimizer figures out how much it will cost to run an SQL statement in each way.
Better cost estimates for an index scan can be obtained by using the correlation, which will help you make better plan decisions.
If the correlation is close to or equal to 1, the PostgreSQL optimizer will favor index scans.
Correlation and BRIN indexes
The BRIN index (block range index) was first made available in PostgreSQL 9.5.
This index functions by keeping track of the minimum and maximum values across all ranges of table block values. Only columns with perfect correlation can benefit from it. Its size advantage over the B-tree index makes it an intriguing alternative for large tables.
How to make use of correlation?
Keep the table in index order if you need to efficiently scan larger sections of an index.
In PostgreSQL, there are no “index ordered tables.”
Still, there are two ways to keep a column’s correlation high:
- Automatically:Good examples for that are primary key columns generated by sequences or measurements with a timestamp. If the table rows are inserted in logical column order and there are no updates or deletes on the table, the physical ordering will be identical to the logical ordering.
A BRIN index might be an interesting choice because correlation is always perfect in this scenario.Table partitioning can be used to delete outdated data from a table without affecting the physical ordering.
- Clustering: The SQL statement
CLUSTERcan be used to rewrite a table so that the physical ordering is identical to the logical ordering of an index.However, subsequent modifications of the table will reduce the correlation again. Because of that, you need to re-cluster the table regularly to maintain high correlation. This is annoying, becauseCLUSTERblocks all concurrent access to the table.
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.
Are you interested in writing for Enteros’ Blog? Please send us a pitch!
RELATED POSTS
How to Optimize Retail Sector Growth with Enteros Cloud FinOps, RevOps Efficiency, Cost Attribution, Cost Estimation, and AI SQL
- 9 April 2026
- Database Performance Management
Introduction The retail sector is undergoing a rapid transformation driven by eCommerce expansion, omnichannel experiences, AI-powered personalization, and data-driven decision-making. Retailers today operate across multiple platforms—online stores, mobile apps, physical outlets, and global marketplaces—creating highly complex and data-intensive environments. As retail businesses scale, they face a critical challenge:how to drive growth while controlling costs, optimizing … Continue reading “How to Optimize Retail Sector Growth with Enteros Cloud FinOps, RevOps Efficiency, Cost Attribution, Cost Estimation, and AI SQL”
How to Optimize Healthcare Growth Management with Enteros Database Performance and AIOps Platform
Introduction The healthcare sector is rapidly evolving with the adoption of digital technologies such as electronic health records (EHRs), telemedicine, AI-driven diagnostics, and real-time patient monitoring. These innovations are transforming how healthcare providers deliver services, improve patient outcomes, and manage operations. However, as healthcare organizations scale, they face a critical challenge:how to manage growth efficiently … Continue reading “How to Optimize Healthcare Growth Management with Enteros Database Performance and AIOps Platform”
How to Drive eCommerce Revenue Growth with Enteros Growth Management, RevOps Efficiency, and Cloud FinOps
- 8 April 2026
- Database Performance Management
Introduction The eCommerce sector has witnessed explosive growth over the past decade, fueled by digital transformation, mobile shopping, AI-driven personalization, and global online marketplaces. From startups to enterprise retailers, businesses are scaling rapidly to meet rising customer expectations for speed, convenience, and seamless experiences. However, this rapid growth introduces a critical challenge:how to increase revenue … Continue reading “How to Drive eCommerce Revenue Growth with Enteros Growth Management, RevOps Efficiency, and Cloud FinOps”
How to Drive Healthcare Sector Performance Growth with Enteros Database Management and AI SQL Optimization
Introduction The healthcare sector is undergoing a significant digital transformation driven by electronic health records (EHRs), telemedicine, AI-powered diagnostics, and real-time patient monitoring systems. Healthcare organizations are increasingly relying on data to deliver better patient outcomes, improve operational efficiency, and ensure regulatory compliance. However, with the exponential growth of healthcare data comes a major challenge:how … Continue reading “How to Drive Healthcare Sector Performance Growth with Enteros Database Management and AI SQL Optimization”