Preamble
Of course, there is a need for the OR operator in SQL, so if you can’t avoid it, use it. But you should be mindful of its effects on performance.
In this article I’ll discuss “good” and “bad” ORs, and what you can do to avoid the latter.
A little sample schema
We’ll illustrate using this simple setup:
CREATE TABLE a (id integer NOT NULL, a_val text NOT NULL);

INSERT INTO a
SELECT i, md5(i::text)
FROM generate_series(1, 100000) i;

CREATE TABLE b (id integer NOT NULL, b_val text NOT NULL);

INSERT INTO b
SELECT i, md5(i::text)
FROM generate_series(1, 100000) i;

ALTER TABLE a ADD PRIMARY KEY (id);
ALTER TABLE b ADD PRIMARY KEY (id);
ALTER TABLE b ADD FOREIGN KEY (id) REFERENCES a;

VACUUM (ANALYZE) a;
VACUUM (ANALYZE) b;
If, for example, we want to run queries on the text columns with equality and LIKE conditions, we need some indexes:
CREATE INDEX a_val_idx ON a (a_val text_pattern_ops);
CREATE INDEX b_val_idx ON b (b_val text_pattern_ops);
If you are not familiar with text_pattern_ops, take a look at the PostgreSQL documentation on operator classes.
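As a brief illustration (these queries are my own, not part of the original setup): an index built with text_pattern_ops supports left-anchored LIKE patterns independently of the database collation, which is exactly what we need below.

```sql
-- A left-anchored pattern like this can use a_val_idx:
SELECT id FROM a WHERE a_val LIKE 'abc0%';

-- A pattern with a leading wildcard cannot use a b-tree index at all:
SELECT id FROM a WHERE a_val LIKE '%abc0';
```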
The “good” OR
Most of the time, an OR in a SQL query is harmless: it has no detrimental effect on performance as long as it is not used to filter rows out of the query result.
Therefore, don’t be concerned if your OR occurs in a CASE expression in the SELECT list.
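For example, an OR like the following (a made-up query against our sample table) is harmless, because it only computes an output column and filters nothing:

```sql
-- OR inside a CASE expression in the SELECT list: no rows are
-- filtered out, so the access path is not affected at all.
SELECT id,
       CASE WHEN id < 10 OR a_val LIKE 'c%'
            THEN 'interesting'
            ELSE 'boring'
       END AS classification
FROM a;
```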
Unfortunately, you usually find the OR in the WHERE clause, and that is where it hurts.
The “bad” OR
Here is an example of an OR in a WHERE clause that is still rather harmless:
EXPLAIN (COSTS off)
SELECT id FROM a
WHERE id = 42 OR a_val = 'value 42';

                        QUERY PLAN
-----------------------------------------------------------
 Bitmap Heap Scan on a
   Recheck Cond: ((id = 42) OR (a_val = 'value 42'::text))
   ->  BitmapOr
         ->  Bitmap Index Scan on a_pkey
               Index Cond: (id = 42)
         ->  Bitmap Index Scan on a_val_idx
               Index Cond: (a_val = 'value 42'::text)
(7 rows)
Because PostgreSQL can use a “bitmap OR” to combine the bitmaps for both indexes, it can use an index scan to answer the query.
But keep in mind that a bitmap index scan is more expensive than a regular index scan, because the bitmaps have to be built first. Moreover, each of these bitmaps can use up to work_mem of memory, so the combined scan can consume considerably more RAM.
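As a side note (my own sketch, not part of the original example): the memory budget for each bitmap is governed by the work_mem parameter; if a bitmap would exceed it, PostgreSQL falls back to a lossy bitmap that only records heap pages. You can inspect and adjust the limit per session:

```sql
SHOW work_mem;            -- the per-bitmap memory budget

-- Raise it for the current session only (the value is an example):
SET work_mem = '256MB';
```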
There is no cheaper way to execute this query; in particular, a multi-column index on (id, a_val) won’t help at all.
IN is better than OR
Now for a more stupid variant of the above query:
EXPLAIN (COSTS off)
SELECT id FROM a
WHERE id = 42 OR id = 4711;

                 QUERY PLAN
--------------------------------------------
 Bitmap Heap Scan on a
   Recheck Cond: ((id = 42) OR (id = 4711))
   ->  BitmapOr
         ->  Bitmap Index Scan on a_pkey
               Index Cond: (id = 42)
         ->  Bitmap Index Scan on a_pkey
               Index Cond: (id = 4711)
(7 rows)
Again, a bitmap index scan is used. But there is a simple way to rewrite that query without the pesky OR:
EXPLAIN (COSTS off)
SELECT id FROM a
WHERE id IN (42, 4711);

                    QUERY PLAN
---------------------------------------------------
 Index Only Scan using a_pkey on a
   Index Cond: (id = ANY ('{42,4711}'::integer[]))
(2 rows)
You see? As soon as you get rid of the OR, an efficient index scan can be used!
You could argue that this only works for equality conditions, but what about a query like the following:
SELECT id FROM a WHERE a_val LIKE 'something%' OR a_val LIKE 'other%';
To improve that query, observe that the PostgreSQL optimizer rewrote the IN in the previous query to = ANY.
This is a case of the standard SQL “quantified comparison predicate”: <comparison operator> ANY is true if the comparison is TRUE for any of the values on the right-hand side (the standard only defines this for subqueries on the right-hand side, but PostgreSQL extends the syntax to arrays).
Now LIKE is a comparison operator as well, so we can write:
EXPLAIN (COSTS off)
SELECT id FROM a
WHERE a_val LIKE ANY (ARRAY['something%', 'other%']);

                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on a
   Filter: (a_val ~~ ANY ('{something%,other%}'::text[]))
(2 rows)
Unfortunately, this situation precludes using the index.
pg_trgm to the rescue
But we are not out of ideas yet! PostgreSQL offers many different index types; let’s try another one. For this, we need the pg_trgm extension:
CREATE EXTENSION pg_trgm;
Then we can create a GIN trigram index on the column:
CREATE INDEX a_val_trgm_idx ON a USING gin (a_val gin_trgm_ops);
Now things are looking better:
EXPLAIN (COSTS off)
SELECT id FROM a
WHERE a_val LIKE ANY (ARRAY['something%', 'other%']);

                             QUERY PLAN
--------------------------------------------------------------------
 Bitmap Heap Scan on a
   Recheck Cond: (a_val ~~ ANY ('{something%,other%}'::text[]))
   ->  Bitmap Index Scan on a_val_trgm_idx
         Index Cond: (a_val ~~ ANY ('{something%,other%}'::text[]))
(4 rows)
Trigram indices are extremely powerful!
Note 1: This index can also be used if the search pattern starts with %.
Note 2: The GIN index can grow quite large. To avoid that, you can instead use a GiST index, which is much smaller, but slower to search.
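Both notes can be illustrated briefly (these statements are my own sketch; the GiST index name is made up):

```sql
-- Note 1: a pattern with a leading wildcard can still use the
-- trigram index (a b-tree index could not help here):
SELECT id FROM a WHERE a_val LIKE '%abc0%';

-- Note 2: the GiST alternative; smaller on disk, but typically
-- slower to search than the GIN index:
CREATE INDEX a_val_trgm_gist_idx ON a USING gist (a_val gist_trgm_ops);
```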
The “ugly” OR
Things become really bad if OR combines conditions from different tables:
EXPLAIN (COSTS off)
SELECT id, a.a_val, b.b_val
FROM a JOIN b USING (id)
WHERE a.id = 42 OR b.id = 42;

                 QUERY PLAN
---------------------------------------------
 Merge Join
   Merge Cond: (a.id = b.id)
   Join Filter: ((a.id = 42) OR (b.id = 42))
   ->  Index Scan using a_pkey on a
   ->  Index Scan using b_pkey on b
(5 rows)
Here, PostgreSQL first has to compute the complete join of the two tables and only then filter out the rows that don’t match. In our example, that means joining 100,000 rows, only to discard 99,999 of them.
Avoiding the ugly OR
Thankfully, there is an equivalent query that takes more time to write but costs a lot less to run:
EXPLAIN (COSTS off)
SELECT id, a.a_val, b.b_val
FROM a JOIN b USING (id)
WHERE a.id = 42
UNION
SELECT id, a.a_val, b.b_val
FROM a JOIN b USING (id)
WHERE b.id = 42;

                        QUERY PLAN
----------------------------------------------------------
 Unique
   ->  Sort
         Sort Key: a.id, a.a_val, b.b_val
         ->  Append
               ->  Nested Loop
                     ->  Index Scan using a_pkey on a
                           Index Cond: (id = 42)
                     ->  Index Scan using b_pkey on b
                           Index Cond: (id = 42)
               ->  Nested Loop
                     ->  Index Scan using a_pkey on a a_1
                           Index Cond: (id = 42)
                     ->  Index Scan using b_pkey on b b_1
                           Index Cond: (id = 42)
(14 rows)
Both parts of the query can use efficient index scans and return one row each; since those rows happen to be identical, UNION combines them into a single row.
If you are sure that both parts of the query will return different sets, you should use UNION ALL instead of UNION. This will save you from having to do extra work to get rid of duplicates.
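In our example the two branches can return the same row (the one with id = 42), so UNION is required. If the conditions were mutually exclusive, the rewrite could look like this (a sketch with made-up, disjoint conditions):

```sql
-- With disjoint conditions, UNION ALL skips the sort/unique step:
SELECT id, a.a_val, b.b_val
FROM a JOIN b USING (id)
WHERE a.id = 42
UNION ALL
SELECT id, a.a_val, b.b_val
FROM a JOIN b USING (id)
WHERE b.id = 4711;
```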
When you use this technique, be aware that the rewrite is not always equivalent to the original query: UNION also removes duplicate rows that the original query may legitimately have returned. In our case that is no concern, since the primary key is part of the result. In my experience, this rarely becomes an issue in real life.