Preamble
Although new index methods appear relatively infrequently, a feature called Bloom indexing arrived quietly with the recent 9.6 release, perhaps because it ships as a contrib module (also called an extension) and was partly meant to show off the new generic WAL interface, which will make it easier to add such features in the future. I naturally glanced at the 9.6 release notes as soon as they were available, spent some time there, and thought, “A new index type, great!” But to be completely honest, the explanations were not very thorough, and I don’t think I fully understood what was going on; my first impression was that its only use is to tell Postgres which search values “do not exist” in a table. But when I spent some time learning more about this new feature, I was pleasantly surprised to find that it is a fully working index method that can be very helpful for everyday problems.
Working principles, and when to apply it?
First, the intended use case for Bloom filters and indexes. When should we use one? The short answer: when a table has many columns and our queries filter on seemingly random combinations of them with the equality operator (“=”), for example SELECT * FROM tbl WHERE col1 = 2 AND col9 = 'x' AND col17 = 865. And note the emphasis on “equality operator”: a Bloom index stores hashes of the values and compares them bit by bit, so range operations are simply not possible.
In essence, Postgres calculates a hash for each of your column values and then stores a few bits of each hash, together with the row’s physical location, as an index entry (as with every other index). The efficiency of this index type comes from the fact that the bits for all indexed columns are “merged” into a single index entry, called a signature in the Bloom context. In short, it can save you a lot of disk space: instead of, say, 10 separate B-tree indexes you can now have a single Bloom index. The index is lossy, meaning matched values must always be re-checked against the table, but from a probabilistic standpoint it is “good enough” to be useful, provided it is used and set up correctly, which, after reading this post, shouldn’t be a problem. So, now that we have given our application some thought and have a hunch about a real use case for Bloom, how does it operate?
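To make the examples below concrete, here is a hypothetical test table I will use throughout (the column names match the index definition in the next section; the synthetic data and the row count are just assumptions for illustration):

CREATE TABLE tbloom AS
  SELECT
      (random() * 1000)::int AS i1,
      (random() * 1000)::int AS i2,
      (random() * 1000)::int AS i3
  FROM generate_series(1, 1000000);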
Basic usage for bloom
So, in its simplest form, here’s how it works:
CREATE EXTENSION bloom; -- make sure "contrib" is installed

CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3)
  WITH (length=80, col1=2, col2=2, col3=4);
So far this looks somewhat familiar from GIN and GiST indexes: create the extension, declare that your index uses the bloom method, and start running queries in the hope that PostgreSQL will use the index. Keep in mind, though, that only equality comparisons are supported, and currently only for the integer and text data types. Since humans aren’t very good at remembering or typing floating-point numbers and millisecond-precision timestamps, this shouldn’t be a deal breaker for queries driven by human input.
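As a quick sanity check that the planner actually picks the index up (assuming the hypothetical test table from above), we can look at the plan. Bloom indexes are only used via bitmap scans, so the expected shape is a Bitmap Heap Scan with a recheck condition rather than a plain index scan:

EXPLAIN ANALYZE
SELECT * FROM tbloom WHERE i2 = 20 AND i3 = 30;
-- expected plan shape (exact numbers will vary):
--   Bitmap Heap Scan on tbloom
--     Recheck Cond: ((i2 = 20) AND (i3 = 30))
--     ->  Bitmap Index Scan on bloomidx
--           Index Cond: ((i2 = 20) AND (i3 = 30))

The “Recheck Cond” line is the lossiness at work: candidate rows returned by the signature match are verified against the real column values.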
However, what on earth do the extra parameters in the WITH part mean? The documentation says:
The index is created with a signature length of 80 bits, with attributes i1 and i2 mapped to 2 bits, and attribute i3 mapped to 4 bits. We could have omitted the length, col1, and col2 specifications since those have the default values.
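In other words, since 80, 2, and 2 are the defaults, the following statement (a hypothetical restatement, not taken from the documentation) would create an identical index:

CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3)
  WITH (col3=4);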
Here, things start to get hazy. How does it add up exactly? Out of a total of 80 bits, only 2 + 2 + 4 = 8 bits would seem to be used. I looked online for examples or explanations but couldn’t find any, so I decided to go for the truth at the source, which meant attacking the extension’s source code.
Going deeper
In the end, this stuff is pretty clever: after an hour of poking around in the code (gritting my teeth and feeling a lot of respect for the authors), I think I figured it out. In the process I also gave the algorithm a nickname, which could well be “a wicked semi-random bit-twisting modulo game.” The “aha” moment hides in the signValue() function in blutils.c, but here is a brief excerpt that shows which signature bits are set for a single column value.
/*
 * Init hash sequence to map our value into bits. the same values in
 * different columns will be mapped into different bits because of step
 * above
 */
hashVal = DatumGetInt32(FunctionCall1(&state->hashFn[attno], value));
mySrand(hashVal ^ myRand());

for (j = 0; j < state->opts.bitSize[attno]; j++)
{
    /* prevent multiple evaluation in SETBIT macro */
    nBit = myRand() % (state->opts.bloomLength * SIGNWORDBITS);
    SETBIT(sign, nBit);
}
So, what does it actually tell us? There is some randomness going on, but luckily it answers my question of how 2 + 2 + 4 bits relate to 80 bits: separate bit positions are calculated for each and every indexed column and column value (two per column by default, but user-configurable in the WITH part)! This basically ensures that the 80 bits available by default are used up evenly and efficiently. Additionally, the column index actually serves as the “seed” for the randomness. From a security perspective we normally wouldn’t want a predictable seed, but here it makes perfect sense: with a truly random seed, building the index and comparing against it wouldn’t be reproducible. All in all, a very sophisticated solution.
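One way to get a feel for the signature-length trade-off without reading C code is simply to build indexes with different length settings and compare their sizes (an exploratory sketch using the hypothetical test table from above; a longer signature means fewer false positives to re-check, at the price of a bigger index):

CREATE INDEX bloomidx_len80  ON tbloom USING bloom (i1, i2, i3) WITH (length=80);
CREATE INDEX bloomidx_len160 ON tbloom USING bloom (i1, i2, i3) WITH (length=160);

SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
  FROM pg_class
 WHERE relname LIKE 'bloomidx_len%';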
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.