Preamble
Alternatives for case-insensitive search
There are three recognized techniques for case-insensitive search in PostgreSQL:
Explicit conversion with lower() or upper()
A query that uses this method would look as follows:
SELECT id, col FROM tab WHERE lower(col) = lower('search string');
This can be done quickly with a B-tree indexlower(col), but has two disadvantages:
- The remedy is put into practice at the application level, so you must modify the query for case-insensitive search
- When a database column has long values, even if only a few characters need to be compared, the entire value must be converted to lower case, which has a negative impact on performance.
Using the citext extension
The extensioncitext provides a data typecitext, which stands for “case-insensitive text”. The table is defined as
CREATE TABLE tab ( col citext, ... );
The question is straightforward:
SELECT id, col FROM tab AND'search string' as col;
That is convenient and easy, but it also has some drawbacks:
- there is no data type
civarchar, so you can only implement that with a check constraint - performance for longer values can also be bad, because
citextinternally callslower(col COLLATE "default")before comparing the values - regular expression matching is not case insensitive, and you have to use the case insensitive operator
~*explicitly
Using case-insensitive ICU collations
If you are using PostgreSQL v12 or higher and PostgreSQL is configured--with-icu, you can define a case-insensitive collation like this:
CREATE COLLATION english_ci ( PROVIDER = icu, -- 'en-US@colStrength=secondary' for old ICU versions LOCALE = 'en-US-u-ks-level2', DETERMINISTIC = FALSE );
The table is described as
CREATE TABLE tab ( col text COLLATE english_ci, ... );
And once more, the question is simple:
SELECT id, col FROM tab WHERE col = 'search string';
Please read my article on ICU collations for more information.
The trouble with pattern matching and case-insensitive collations
The best solution to the issue is case-insensitive collations. There are two issues with this strategy:
- ICU collations are still not compatible with case-insensitive collations, so you still need to explicitly specify the collation for each column definition in PostgreSQL v15 even though you can use ICU collations as the database’s default collation.
- Case-insensitive collations are not compatible with pattern matching.
The first issue is merely annoying; however, the second issue will require further consideration. Here are two instances that illustrate the issue:
SELECT id, long FROM perf_coll WHERE long LIKE "abcd%"; ERROR: nondeterministic collations are not supported for LIKE; SELECT id, long FROM perf_coll WHERE long "abcd"; ERROR: nondeterministic collations are not supported for regular expressions
Why is that not supported?
German soccer’s difficult situation
The ICU documentation explains the challenges with accurate case-insensitive pattern matching. German’s “ß,” which traditionally lacks an upper-case equivalent, serves as a good illustration. Therefore, with good German collations (the GNU C library’s collation is not good in that regard), you will obtain the following result:
Upper FUSSBALL (1 row); SELECT upper('Fußball' COLLATE "de-DE-x-icu");
Now what would be the correct result for the following query in a case-insensitive collation?
SELECT 'Fußball' LIKE 'FUS%';
You could argue that it should be TRUEbecause that’s what you’d get for upper('Fußball') LIKE 'FUS%'. On the other hand,
SELECT lower('FUSSBALL' COLLATE "de-DE-x-icu"); lower ══════════ fussball (1 row)
So you could just as well argue that the result should be FALSE. The ICU library goes with the second solution for simplicity. Either solution would be hard to implement in PostgreSQL, so we have stopped thinking about it for now. Compare this quote from the mailing list:
ICU also provides regexp matching, but not collation-aware, since
character-based patterns don’t play well with the concept of collation.
About a potential collation-aware LIKE, it looks hard to implement,
since the algorithm currently used in like_match.c seems purely
character-based. AFAICS there’s no way to plug calls to usearch_*
functions into it, it would need a separate redesign from scratch.
A semiotic aside
There is no universally accepted authority for correct German. For example, German-speaking Swiss did away with the ß in the twentieth century. On the other hand, the “Rat für deutsche Rechtschreibung” (Committee for German Spelling) introduced an upper-case letter in 2017, but this attempt to simplify the problem has been widely ignored by the German-speaking world. What a ridiculous idea—simplifying German!
The end result is even more confusing than before.
SELECT upper('ß' COLLATE "de-DE-x-icu"), lower('ẞ' COLLATE "de-DE-x-icu"); upper │ lower ═══════╪═══════ SS │ ß (1 row)
A solution for case-insensitive pattern matching
We require a workable answer to the issue. Of course we could use lower() or cast to citext, but that would re-introduce the performance problems for long strings. So we evade the problem by explicitly using a different collation. We cast to the binary collation, which compares words character for character, for performance reasons and to avoid the brain-twist of German soccer:
SELECT id, col FROM tab WHERE col COLLATE "C" ILIKE 'search%pattern'; SELECT id, col FROM tab WHERE col COLLATE "C" ~* '^search.*pattern';
Note that for this to work as expected, we have to use the case-insensitive versions ofLIKE and the regular expression matching operator~.
Obviously, this solution isn’t as easy as we’d like it to be. Again, the application has to write queries that make it clear that case-sensitive processing is to be used. Another difficulty is performance: while case-sensitive pattern matching can be supported with B-tree indexes, case-insensitive pattern matching requires a trigram index:
CREATE EXTENSION IF NOT EXISTS pg_trgm; CREATE INDEX tab_col_pattern_idx ON tab USING gin (col gin_trgm_ops);
Although it can grow large and update more slowly than a typical B-tree index, such an index can accelerate both of the aforementioned statements.
A performance test for case-insensitive comparisons
I used tables with ten million rows and a column with a randomly generated string of 320 characters for this test. The column lacked an index. In the tableperf_citext, the column is defined as citext. In perf_lower, the data type is text, and comparison is made using lower(). Table perf_coll uses text with the collation english_ci as defined above. PostgreSQL v15 with US English collations served as the database. Shared buffers were used to cache all tables.
Comparison of the effectiveness of case-insensitive search methods
WHERE ... = 'abcd' |
WHERE ... LIKE 'abcd%' |
WHERE ... COLLATE "C" ILIKE 'abcd%' |
WHERE ... COLLATE "C" ~* 'abcd%'( ~ for lower()) |
|
|---|---|---|---|---|
citext |
540 ms
|
536 ms
|
1675 ms
|
2500 ms
|
lower() |
9000 ms
|
9000 ms
|
3000 ms
|
3800 ms
|
english_ci |
830 ms
|
ERROR
|
2000 ms
|
1940 ms
|
In this test,citext comes out ahead, but case-insensitive collations are a decent runner-up. Using lower() doesn’t perform well.
Conclusion
PostgreSQL doesn’t have pattern matching for collations that don’t care about case, and it’s not clear what the right thing to do would be in this case. We have a good workaround by explicitly using pattern matching operators that don’t care about case and the binary collation, but the situation is far from ideal.
About Enteros
Enteros offers a patented database performance management SaaS platform. It proactively identifies root causes of complex business-impacting database scalability and performance issues across a growing number of clouds, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.
Are you interested in writing for Enteros’ Blog? Please send us a pitch!
RELATED POSTS
How to Enable Data-Driven Media Growth with Enteros Cost Attribution and Software Management
- 22 June 2026
- Software Engineering
Introduction The media industry is experiencing one of the most significant transformations in its history. Streaming services, digital publishing platforms, online advertising ecosystems, video-on-demand applications, and content distribution networks have fundamentally changed how audiences consume content. Modern media organizations now operate highly complex digital ecosystems that support: Streaming platforms Digital publishing systems Video content delivery … Continue reading “How to Enable Data-Driven Media Growth with Enteros Cost Attribution and Software Management”
How to Enable Intelligent Wealth Management Operations with Enteros Database Software, AIOps Platform, and Gen AI
Introduction The wealth management industry is undergoing a major transformation. As investors demand personalized financial services, real-time portfolio visibility, and digital-first experiences, wealth management firms are increasingly relying on technology to drive operational efficiency, improve client engagement, and accelerate business growth. Modern wealth management organizations now support: Portfolio management platforms Wealth advisory applications Digital client … Continue reading “How to Enable Intelligent Wealth Management Operations with Enteros Database Software, AIOps Platform, and Gen AI”
The Future of Database Observability in Hybrid Cloud Environments
As enterprises accelerate digital transformation, hybrid cloud infrastructure has become the preferred operating model for many organizations. Instead of relying solely on on-premises data centers or fully public cloud deployments, businesses increasingly combine both environments to achieve greater flexibility, scalability, performance, and cost efficiency. Hybrid cloud enables organizations to distribute workloads strategically across private infrastructure … Continue reading “The Future of Database Observability in Hybrid Cloud Environments”
How AI-Powered Database Analytics Improves Digital Customer Experience
In today’s digital-first economy, customer experience has become one of the strongest differentiators for businesses. Whether customers are shopping online, using banking apps, booking travel, streaming media, or accessing SaaS platforms, they expect fast, seamless, and reliable digital interactions at every touchpoint. Modern users have little tolerance for delays. A slow-loading webpage, failed transaction, delayed … Continue reading “How AI-Powered Database Analytics Improves Digital Customer Experience”