Preamble
Although new index methods appear relatively infrequently, a feature called Bloom indexing arrived quietly with the recent 9.6 release, perhaps because it ships as a contrib module (also called an extension) and was partly meant to show off the new generic WAL interface, which will make it easier to add such features in the future. I naturally glanced at the 9.6 release notes as soon as they were available, spent some time there, and thought, “A new index type, great!” But to be completely honest, the explanations were not very thorough, and I don’t think I fully understood what was going on; my first impression was that its only use is to tell Postgres which search values “do not exist” in a table. But when I spent some time learning more about this new feature, I was pleasantly surprised to find that it is a fully working index method that can be very helpful for everyday problems.
Working principles, and when to apply it?
First, the intended use case for Bloom filters and indexes. When should we use one? The short answer: when a table has many columns and our queries filter on seemingly random combinations of them with the equality operator (“=”), for example SELECT * FROM tbl WHERE col1 = 2 AND col9 = 'x' AND col17 = 865. And note the emphasis on “equality operator”: a Bloom index stores hashes of the values and compares them bit by bit, so range operations are simply not possible.
In essence, Postgres calculates a hash for each of your column values and then stores a few bits of each hash, together with the row’s physical location, as an index entry (as with every other index). The efficiency of this index type comes from the fact that the bits for all indexed columns are “merged” into a single index entry, called a signature in the Bloom context. In short, it can save you a lot of disk space: instead of, say, 10 separate B-tree indexes you can now have a single Bloom index. The index is lossy, meaning matched values must always be re-checked against the table, but from a probabilistic standpoint it is “good enough” to be useful, provided it is used and set up correctly, which, after reading this post, shouldn’t be a problem. So, now that we have given our application some thought and have a hunch about a real use case for Bloom, how does it operate?
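To make the examples below concrete, here is a hypothetical test table I will use throughout (the column names match the index definition in the next section; the synthetic data and the row count are just assumptions for illustration):

CREATE TABLE tbloom AS
  SELECT
      (random() * 1000)::int AS i1,
      (random() * 1000)::int AS i2,
      (random() * 1000)::int AS i3
  FROM generate_series(1, 1000000);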
Basic usage for bloom
So, in its simplest form, here’s how it works:
CREATE EXTENSION bloom; -- make sure "contrib" is installed

CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3)
  WITH (length=80, col1=2, col2=2, col3=4);
So far this looks somewhat familiar from GIN and GiST indexes: create the extension, declare that your index uses the bloom method, and start running queries in the hope that PostgreSQL will use the index. Keep in mind, though, that only equality comparisons are supported, and currently only for the integer and text data types. Since humans aren’t very good at remembering or typing floating-point numbers and millisecond-precision timestamps, this shouldn’t be a deal breaker for queries driven by human input.
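As a quick sanity check that the planner actually picks the index up (assuming the hypothetical test table from above), we can look at the plan. Bloom indexes are only used via bitmap scans, so the expected shape is a Bitmap Heap Scan with a recheck condition rather than a plain index scan:

EXPLAIN ANALYZE
SELECT * FROM tbloom WHERE i2 = 20 AND i3 = 30;
-- expected plan shape (exact numbers will vary):
--   Bitmap Heap Scan on tbloom
--     Recheck Cond: ((i2 = 20) AND (i3 = 30))
--     ->  Bitmap Index Scan on bloomidx
--           Index Cond: ((i2 = 20) AND (i3 = 30))

The “Recheck Cond” line is the lossiness at work: candidate rows returned by the signature match are verified against the real column values.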
However, what on earth do the extra parameters in the WITH part mean? The documentation says:
The index is created with a signature length of 80 bits, with attributes i1 and i2 mapped to 2 bits, and attribute i3 mapped to 4 bits. We could have omitted the length, col1, and col2 specifications since those have the default values.
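In other words, since 80, 2, and 2 are the defaults, the following statement (a hypothetical restatement, not taken from the documentation) would create an identical index:

CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3)
  WITH (col3=4);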
Here, things start to get hazy. How does it add up exactly? Out of a total of 80 bits, only 2 + 2 + 4 = 8 bits would seem to be used. I looked online for examples or explanations but couldn’t find any, so I decided to go for the truth at the source, which meant attacking the extension’s source code.
Going deeper
In the end, this stuff is pretty clever: after an hour of poking around in the code (gritting my teeth and feeling a lot of respect for the authors), I think I figured it out. In the process I also gave the algorithm a nickname, which could well be “a wicked semi-random bit-twisting modulo game.” The “aha” moment hides in the signValue() function in blutils.c, but here is a brief excerpt that shows which signature bits are set for a single column value.
/*
 * Init hash sequence to map our value into bits. the same values in
 * different columns will be mapped into different bits because of step
 * above
 */
hashVal = DatumGetInt32(FunctionCall1(&state->hashFn[attno], value));
mySrand(hashVal ^ myRand());

for (j = 0; j < state->opts.bitSize[attno]; j++)
{
    /* prevent multiple evaluation in SETBIT macro */
    nBit = myRand() % (state->opts.bloomLength * SIGNWORDBITS);
    SETBIT(sign, nBit);
}
So, what does it actually tell us? There is some randomness going on, but luckily it answers my question of how 2 + 2 + 4 bits relate to 80 bits: separate bit positions are calculated for each and every indexed column and column value (two per column by default, but user-configurable in the WITH part)! This basically ensures that the 80 bits available by default are used up evenly and efficiently. Additionally, the column index actually serves as the “seed” for the randomness. From a security perspective we normally wouldn’t want a predictable seed, but here it makes perfect sense: with a truly random seed, building the index and comparing against it wouldn’t be reproducible. All in all, a very sophisticated solution.
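One way to get a feel for the signature-length trade-off without reading C code is simply to build indexes with different length settings and compare their sizes (an exploratory sketch using the hypothetical test table from above; a longer signature means fewer false positives to re-check, at the price of a bigger index):

CREATE INDEX bloomidx_len80  ON tbloom USING bloom (i1, i2, i3) WITH (length=80);
CREATE INDEX bloomidx_len160 ON tbloom USING bloom (i1, i2, i3) WITH (length=160);

SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
  FROM pg_class
 WHERE relname LIKE 'bloomidx_len%';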
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.