Preamble
If you were looking for the perfect recipe to sabotage performance, missing indexes would be a crucial ingredient. Conversely, if you want your database to perform well, and if you generally dislike user complaints, you should keep an eye out for missing indexes and make sure all relevant tables are properly taken care of. A single PostgreSQL index can make an enormous difference to performance.
I’ve put together this short guide to help you understand how to find missing indexes, how to fix them, and how to get good database performance.
Setting up a test database
To demonstrate how to find missing indexes, I first need a test database, and one easy way to create one is pgbench:
[hs@hansmacbook ~]$ pgbench -i -s 100 test
dropping old tables...
NOTICE:  table "pgbench_accounts" does not exist, skipping
NOTICE:  table "pgbench_branches" does not exist, skipping
NOTICE:  table "pgbench_history" does not exist, skipping
NOTICE:  table "pgbench_tellers" does not exist, skipping
creating tables...
generating data (client-side)...
10000000 of 10000000 tuples (100%) done (elapsed 13.65 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 19.92 s (drop tables 0.00 s, create tables 0.00 s, client-side generate 13.70 s, vacuum 1.95 s, primary keys 4.27 s).
The largest table that pgbench has just created for us is pgbench_accounts, as the following overview shows:
test=# \d+
                                    List of relations
 Schema |       Name       | Type  | Owner | Persistence | Access method |  Size   | Description
--------+------------------+-------+-------+-------------+---------------+---------+-------------
 public | pgbench_accounts | table | hs    | permanent   | heap          | 1281 MB |
 public | pgbench_branches | table | hs    | permanent   | heap          | 40 kB   |
 public | pgbench_history  | table | hs    | permanent   | heap          | 0 bytes |
 public | pgbench_tellers  | table | hs    | permanent   | heap          | 80 kB   |
(4 rows)
The default indexing of this database is already perfect, so we need to remove an index to have something we can find and fix later:
test=# \d pgbench_accounts
              Table "public.pgbench_accounts"
  Column  |     Type      | Collation | Nullable | Default
----------+---------------+-----------+----------+---------
 aid      | integer       |           | not null |
 bid      | integer       |           |          |
 abalance | integer       |           |          |
 filler   | character(84) |           |          |
Indexes:
    "pgbench_accounts_pkey" PRIMARY KEY, btree (aid)

test=# ALTER TABLE pgbench_accounts DROP CONSTRAINT pgbench_accounts_pkey;
ALTER TABLE
We have simply removed the primary key, which is essentially just a unique index that forbids NULL entries.
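For reference, the dropped constraint could be restored later with a single statement. This is just a sketch and assumes we want to keep the default constraint name:

test=# ALTER TABLE pgbench_accounts ADD CONSTRAINT pgbench_accounts_pkey PRIMARY KEY (aid);
ALTER TABLE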
Running a demonstration benchmark
Before we run the benchmark to see just how bad performance can get, make sure that pg_stat_statements, the single most important tool for handling performance issues, is installed and enabled. Tracking down performance problems without pg_stat_statements is unnecessarily difficult.
To install pg_stat_statements, follow these steps (a short configuration sketch follows the list):
- Add “pg_stat_statements” to shared_preload_libraries in postgresql.conf
- Restart the database server so the library is loaded
- Run “CREATE EXTENSION pg_stat_statements” in your database
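For reference, here is what those three steps look like in practice. This is only a sketch: the shell prompt and data directory are placeholders, and depending on your installation you may prefer to restart PostgreSQL through your service manager instead of pg_ctl:

shared_preload_libraries = 'pg_stat_statements'   # in postgresql.conf

[hs@hansmacbook ~]$ pg_ctl restart -D $PGDATA     # restart so the library is loaded

test=# CREATE EXTENSION pg_stat_statements;
CREATE EXTENSION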
Once this is done, we are ready to run our benchmark. Let’s see what happens:
[hs@hansmacbook ~]$ pgbench -c 10 -T 60 -j 10 test
pgbench (14.1)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: simple
number of clients: 10
number of threads: 10
duration: 60 s
number of transactions actually processed: 252
latency average = 2446.148 ms
initial connection time = 8.833 ms
tps = 4.088061 (without initial connection time)
Despite opening 10 connections (-c 10) and providing pgbench with 10 threads (-j 10), we managed a mere 4 transactions per second. One might argue that the hardware is the problem, but that is not the case:
Model Name:            MacBook Pro
Model Identifier:      MacBookPro16,1
Processor Name:        8-Core Intel Core i9
Processor Speed:       2,3 GHz
Number of Processors:  1
Total Number of Cores: 8
This is a modern eight-core machine. Even with a clock speed 10 times higher, we would only have reached around 40 transactions per second, which still falls far short of what one would expect.
pg_stat_user_tables: An important monitoring view for your PostgreSQL indexes
The first indication that indexes might be missing can be found in the pg_stat_user_tables system view. These are the relevant columns:
test=# \d pg_stat_user_tables
     View "pg_catalog.pg_stat_user_tables"
       Column        |  Type  | ...
---------------------+--------+-----
 relid               | oid    |
 schemaname          | name   |
 relname             | name   |
 seq_scan            | bigint |
 seq_tup_read        | bigint |
 idx_scan            | bigint |
 ...
Here we can see the table’s name (relname) along with its schema (schemaname). We can also see how often the table has been read sequentially (seq_scan) and how often an index has been used (idx_scan). The most important piece of information comes last: seq_tup_read. What does it mean? The significance of this number can hardly be overstated: it tells us how many rows the system had to process to satisfy all of those sequential scans. If a table is read sequentially both often and in large chunks, seq_tup_read will contain an insanely high value, and that is exactly the signal we are looking for.
Now, let’s run a really important query:
test=# SELECT schemaname, relname, seq_scan, seq_tup_read,
              idx_scan, seq_tup_read / seq_scan AS avg
       FROM pg_stat_user_tables
       WHERE seq_scan > 0
       ORDER BY seq_tup_read DESC;
 schemaname |     relname      | seq_scan | seq_tup_read | idx_scan |   avg
------------+------------------+----------+--------------+----------+---------
 public     | pgbench_accounts |      954 |   5050000000 |          | 5293501
 public     | pgbench_branches |      254 |        25400 |        0 |     100
 public     | pgbench_tellers  |        1 |         1000 |      252 |    1000
(3 rows)
This query works magic. It reveals the tables that are hit hardest by sequential scans and the average number of rows such a scan has to read. For the table at the top of the list, a sequential scan has typically read more than 5 million rows, while indexes have not been used at all. That tells us unequivocally that something is wrong with this table. To confirm the suspicion, we will turn to pg_stat_statements next. But if you happen to be familiar with the application, a quick \d on the suspicious table often reveals the most glaring issue right away.
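To illustrate: in our case, after dropping the constraint, \d shows pgbench_accounts without any Indexes section at all, a glaring red flag for a table of this size that is scanned so heavily:

test=# \d pgbench_accounts
              Table "public.pgbench_accounts"
  Column  |     Type      | Collation | Nullable | Default
----------+---------------+-----------+----------+---------
 aid      | integer       |           | not null |
 bid      | integer       |           |          |
 abalance | integer       |           |          |
 filler   | character(84) |           |          |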
pg_stat_statements: Finding slow queries
As already mentioned, pg_stat_statements is the gold standard for identifying slow queries. Typically, the tables that rank high in pg_stat_user_tables also show up in some of the worst queries listed in pg_stat_statements.
The answer to the following query reveals the truth:
test=# SELECT query, total_exec_time, calls, mean_exec_time
       FROM pg_stat_statements
       ORDER BY total_exec_time DESC;
-[ RECORD 1 ]---+---------------------------------------------------------------------
query           | UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2
total_exec_time | 433708.0210760001
calls           | 252
mean_exec_time  | 1721.0635756984136
-[ RECORD 2 ]---+---------------------------------------------------------------------
query           | SELECT abalance FROM pgbench_accounts WHERE aid = $1
total_exec_time | 174819.2974120001
calls           | 252
mean_exec_time  | 693.7273706825395
…
Wow, the top query takes 1.721 seconds on average to execute! That is a lot. When you look at the query, you can see that it only has a simple WHERE clause that filters on “aid.” When we look at the table, we see that there is no index on “aid,” which is disastrous for performance.
Investigating the second query further reveals exactly the same issue.
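To double-check, a quick EXPLAIN on either statement (with a hypothetical aid value) would confirm the diagnosis: without an index on aid, the planner has no choice but to scan the entire table. The output below is only indicative; the exact plan shape and cost figures depend on your settings, and on a table this size PostgreSQL may well choose a parallel sequential scan:

test=# EXPLAIN SELECT abalance FROM pgbench_accounts WHERE aid = 42;
                      QUERY PLAN
------------------------------------------------------
 Seq Scan on pgbench_accounts  (cost=... rows=...)
   Filter: (aid = 42)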
Improve your PostgreSQL indexing and benchmarking
Let’s deploy the missing index, then reset pg_stat_statements and PostgreSQL’s standard system statistics:
test=# CREATE UNIQUE INDEX idx_accounts ON pgbench_accounts (aid);
CREATE INDEX
test=# SELECT pg_stat_statements_reset();
 pg_stat_statements_reset
--------------------------

(1 row)

test=# SELECT pg_stat_reset();
 pg_stat_reset
---------------

(1 row)
Now that the missing index has been deployed, we can rerun the test and observe the results:
[hs@hansmacbook ~]$ pgbench -c 10 -T 60 -j 10 test
pgbench (14.1)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: simple
number of clients: 10
number of threads: 10
duration: 60 s
number of transactions actually processed: 713740
latency average = 0.841 ms
initial connection time = 7.541 ms
tps = 11896.608085 (without initial connection time)
What an improvement: throughput has increased roughly 3,000-fold. No amount of “better hardware” in the world could have bought us that kind of gain. The lesson to take away is that a single missing PostgreSQL index in a crucial place can wreck an entire database, keeping the whole system busy without producing any useful performance.
What matters most is the approach we took to the problem. pg_stat_user_tables is an excellent indicator for determining where to look for issues. After that, you can search pg_stat_statements for the worst queries; the key is to sort by total_exec_time DESC.
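A slight variant of the pg_stat_statements query can also be handy, because it shows which share of the total runtime each statement is responsible for. This is only a sketch and assumes PostgreSQL 13 or later, where the timing columns are called total_exec_time and mean_exec_time:

test=# SELECT substring(query, 1, 50) AS short_query,
              round(total_exec_time::numeric, 2) AS total_ms,
              calls,
              round(mean_exec_time::numeric, 2) AS mean_ms,
              round((100 * total_exec_time /
                     sum(total_exec_time) OVER ())::numeric, 2) AS percent
       FROM pg_stat_statements
       ORDER BY total_exec_time DESC
       LIMIT 10;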
About Enteros
Enteros offers a patented database performance management SaaS platform. It proactively identifies root causes of complex business-impacting database scalability and performance issues across a growing number of clouds, RDBMS, NoSQL, and machine learning database platforms.