You're doing it wrong if you're not monitoring your resource pools.

Developing production-ready software in today’s world entails much more than just adding functionality. Creating a “functionally complete” software system is only half the battle. Systems must be built to considerably higher standards to compete in today’s market; gone are the days of deploying software as soon as it passes your QA team’s functional validation.

You must be ready to deal with third-party dependency failures and malicious users, scale your system as you add customers, and meet your reliability service-level objectives (SLOs), indicators (SLIs), and agreements (SLAs), among other things.

Monitoring is, of course, an essential aspect of reliability. If you don’t have visibility into the health of your system, you’ll only know something is wrong when customers call (or tweet) to complain, which is bad. And the only way you’ll figure out what’s wrong is to stumble around aimlessly, which is very bad.

But when reliability experts advise you to monitor the health of your systems, how do you know what to monitor? Throughput? Response time? Latency? These are the most obvious options, and while they can often signal that you have a problem, they don’t tell you much about what’s causing it.

You need to look at your resource pools.

Any non-trivial software system will have pools of resources ready to handle requests as they come in. Communicating with a database requires a pool of database connections. Processing tasks from a queue requires a pool of threads. The work queue itself is a pool, although one that fills rather than drains. (Keep in mind that a single “non-pooled” connection is effectively a connection pool of size one.)

Streaming systems, which can be made up of any number of services, are underpinned by a collection of resource pools. Even if your service, such as a simple windowing data aggregator, doesn’t interact with databases or make any external requests, reading from and writing to your message broker requires several threads and buffers.

The same is true for HTTP services. In an ASP.NET application running on Microsoft Internet Information Services (IIS), for example, the request queue is a pool of requests waiting to be handled by a pool of request threads.

The sizes of resource pools are simple to measure, and this information is extremely useful. When something is wrong with your system, symptoms will inevitably appear in one or more of your resource pools.
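As a rough illustration of how cheap these measurements are, the sketch below reads the utilisation of a thread pool and its work queue using only counters that java.util.concurrent already exposes. The class name PoolSnapshot and its fields and methods are hypothetical, chosen for this example; the values are approximate because the pool keeps changing while it is being read.

```java
import java.util.concurrent.ThreadPoolExecutor;

/** Hypothetical helper: a point-in-time reading of a thread pool and its work queue. */
final class PoolSnapshot {
    final int activeThreads;   // threads currently running a task (approximate)
    final int maxThreads;      // upper bound on the pool's thread count
    final int queuedTasks;     // tasks waiting in the work queue
    final long queueCapacity;  // queued tasks plus remaining space

    PoolSnapshot(int activeThreads, int maxThreads, int queuedTasks, long queueCapacity) {
        this.activeThreads = activeThreads;
        this.maxThreads = maxThreads;
        this.queuedTasks = queuedTasks;
        this.queueCapacity = queueCapacity;
    }

    /** Reads the counters that ThreadPoolExecutor already exposes. */
    static PoolSnapshot of(ThreadPoolExecutor pool) {
        int queued = pool.getQueue().size();
        // remainingCapacity() is Integer.MAX_VALUE for unbounded queues, so add as long.
        long capacity = (long) queued + pool.getQueue().remainingCapacity();
        return new PoolSnapshot(pool.getActiveCount(), pool.getMaximumPoolSize(), queued, capacity);
    }

    /** Fraction of threads that are busy, from 0.0 to 1.0. */
    double threadUtilisation() {
        return maxThreads == 0 ? 0.0 : (double) activeThreads / maxThreads;
    }

    /** Fraction of the work queue that is occupied, from 0.0 to 1.0. */
    double queueFullness() {
        return queueCapacity == 0 ? 0.0 : (double) queuedTasks / queueCapacity;
    }
}
```

A busy-thread fraction near 1.0 together with a growing queue fullness is the kind of symptom the rest of this post looks for.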

Monitoring the agent state downsampler

The agent state downsampler is a simple Apache Kafka service that reduces the amount of data flowing to our downstream consumers from the language agents our customers have installed in their applications. It consumes a large stream of agent metadata but emits only one message per agent per hour. It uses Memcached to keep track of which agents it has already emitted a message for in the last hour.
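To make the downsampling rule concrete, here is a minimal sketch of the once-per-hour-per-agent decision. It is not the service’s actual code: the class HourlyDownsampler and the method shouldForward are hypothetical, and an in-memory map stands in for Memcached.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of once-per-hour-per-agent downsampling. */
final class HourlyDownsampler {
    private static final long WINDOW_MILLIS = 60L * 60L * 1000L;

    // Stand-in for Memcached: agent ID -> time the last message was forwarded.
    private final Map<String, Long> lastForwarded = new HashMap<>();

    /** Returns true if a message for this agent should be forwarded downstream. */
    synchronized boolean shouldForward(String agentId, long nowMillis) {
        Long last = lastForwarded.get(agentId);
        if (last != null && nowMillis - last < WINDOW_MILLIS) {
            return false; // already forwarded within the last hour, so drop it
        }
        lastForwarded.put(agentId, nowMillis);
        return true;
    }
}
```

In the real service every such lookup is a network call to Memcached, which is why the pool of threads making those calls is one of the resource pools worth watching.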

So, how are we going to monitor this? Let’s start with the obvious metrics: throughput, processing time, and lag.

This is useful information. But what will happen to these graphs if the downsampler begins to lag? Throughput will drop, while processing time and lag will increase. That’s good to know, but what then? On its own, this data can’t tell us anything other than “something’s wrong,” which is fine for alerting purposes but doesn’t help us figure out what’s causing the issue. We need to dig a little deeper.

Now that we understand the service better, we can think more critically about it. “How full are our queues and buffers, and how busy are our thread pools?” should be the first question we ask whenever something goes wrong.

Here is a short list of problem scenarios that we can diagnose immediately by monitoring our resource pools:

Symptoms	Problem	Next Steps
Throughput is down and the Memcached thread pool is fully utilised	Memcached is down/slow	Investigate the health of the Memcached cluster
Throughput is down and the Kafka producer buffer is full	The destination Kafka brokers are down/slow	Investigate the health of the destination Kafka brokers
Throughput is down and the work queue is mostly empty	The source Kafka brokers are down/slow, and the consumer thread isn’t pulling messages fast enough	Investigate the health of the source Kafka cluster
Throughput is up and the Kafka producer buffer is full	An increase in traffic has caused us to hit a bottleneck in the producer	Address the bottleneck (tune the producer, possibly by increasing the buffer) or scale the service
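The table above can be read as a simple lookup from symptoms to next steps. The sketch below encodes it that way; the class PoolDiagnosis, the Throughput enum, and the boolean inputs are hypothetical names, and deciding what counts as “full” or “mostly empty” is left to the caller.

```java
/** Hypothetical encoding of the symptom table as a lookup. */
final class PoolDiagnosis {
    enum Throughput { DOWN, UP }

    /** Maps the observed symptoms to the next step suggested by the table. */
    static String diagnose(Throughput throughput,
                           boolean memcachedPoolFull,
                           boolean producerBufferFull,
                           boolean workQueueMostlyEmpty) {
        if (throughput == Throughput.DOWN && memcachedPoolFull) {
            return "Investigate the health of the Memcached cluster";
        }
        if (throughput == Throughput.DOWN && producerBufferFull) {
            return "Investigate the health of the destination Kafka brokers";
        }
        if (throughput == Throughput.DOWN && workQueueMostlyEmpty) {
            return "Investigate the health of the source Kafka cluster";
        }
        if (throughput == Throughput.UP && producerBufferFull) {
            return "Tune the producer or scale the service";
        }
        // Anything else is outside the table and needs a closer look.
        return "No known pattern; dig deeper";
    }
}
```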

A tried-and-true method for keeping track of resource pools

The first step is to gather information about your resource pools. As previously stated, this is quite simple: create a background thread in your service whose sole purpose is to periodically sample the size and fullness of each of your resource pools. ThreadPoolExecutor.getPoolSize() and ThreadPoolExecutor.getActiveCount(), for example, return the current number of threads in a thread pool and the approximate number of threads that are actively executing tasks.

Using Guava’s AbstractScheduledService and Apache’s HttpClient libraries, here’s a basic example:

```java
import com.fasterxml.jackson.databind.ObjectMapper; // assumed: Jackson's ObjectMapper
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
import com.google.common.io.CharStreams;
import com.google.common.util.concurrent.AbstractScheduledService;
import com.newrelic.api.agent.NewRelic;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.StringEntity;

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ThreadPoolReporter extends AbstractScheduledService {
    private final ObjectMapper jsonObjectMapper = new ObjectMapper();
    private final ThreadPoolExecutor threadPoolToWatch;
    private final HttpClient httpClient;

    public ThreadPoolReporter(final ThreadPoolExecutor threadPoolToWatch, final HttpClient httpClient) {
        this.threadPoolToWatch = threadPoolToWatch;
        this.httpClient = httpClient;
    }

    // Samples the pool once per scheduled run and posts the reading as one event.
    @Override
    protected void runOneIteration() {
        try {
            final int poolSize = threadPoolToWatch.getPoolSize();
            final int activeTaskCount = threadPoolToWatch.getActiveCount();
            final ImmutableMap<String, Object> attributes = ImmutableMap.of(
                    "eventType", "ServiceStatus",
                    "timestamp", System.currentTimeMillis(),
                    "poolSize", poolSize,
                    "activeTaskCount", activeTaskCount);
            final String json = jsonObjectMapper.writeValueAsString(ImmutableList.of(attributes));
            final HttpResponse response = sendRequest(json);
            handleResponse(response);
        } catch (final Exception e) {
            // Report the failure and carry on; an uncaught exception would stop the service.
            NewRelic.noticeError(e);
        }
    }

    private HttpResponse sendRequest(final String json) throws IOException {
        final HttpPost request = new HttpPost("http://example-api.net");
        request.setHeader("X-Insert-Key", "secret key value");
        request.setHeader("content-type", "application/json");
        request.setHeader("accept-encoding", "compress, gzip");
        request.setEntity(new StringEntity(json));
        return httpClient.execute(request);
    }

    private void handleResponse(final HttpResponse response) throws Exception {
        // Always close the response stream so the connection is released.
        try (final InputStream responseStream = response.getEntity().getContent()) {
            final int statusCode = response.getStatusLine().getStatusCode();
            if (statusCode != 200) {
                final String responseBody = extractResponseBody(responseStream);
                throw new Exception(String.format(
                        "Received HTTP %s response from Insights API. Response body: %s",
                        statusCode, responseBody));
            }
        }
    }

    private String extractResponseBody(final InputStream responseStream) throws Exception {
        try (final InputStreamReader responseReader =
                     new InputStreamReader(responseStream, Charset.defaultCharset())) {
            return CharStreams.toString(responseReader);
        }
    }

    // Run one second after start-up, then one second after each run completes.
    @Override
    protected Scheduler scheduler() {
        return Scheduler.newFixedDelaySchedule(1, 1, TimeUnit.SECONDS);
    }
}
```

So that you have good data granularity, you should check the thread pool’s stats fairly frequently (I recommend once per second).

The data can be viewed as a line graph. However, I prefer to display my resource pool utilisations as two-dimensional histograms (or heat maps) because they make problems easier to spot. A query such as the following can drive that chart:

```
SELECT histogram(activeTaskCount, width: 300, buckets: 30) FROM ServiceStatus SINCE 1 minute ago FACET host LIMIT 100
```

Our thread pools are mostly idle during “normal” operations, because we want plenty of headroom for traffic bursts. If the dark squares begin to migrate to the right, it’s a clear indication that something is wrong.
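To show what the bucketing behind such a heat map amounts to, the sketch below counts active-thread samples into equal-width buckets. It is an illustration only, not how the query engine is implemented: the class UtilisationHistogram and the method bucketCounts are hypothetical, and clamping out-of-range samples into the last bucket is an assumption of this example. The width and buckets parameters mirror the query’s arguments.

```java
/** Hypothetical client-side bucketing of active-thread samples. */
final class UtilisationHistogram {
    /**
     * Counts samples into equal-width buckets covering [0, width).
     * Negative samples go to the first bucket and samples at or above
     * width go to the last one (an assumption of this sketch).
     */
    static int[] bucketCounts(int[] samples, int width, int buckets) {
        if (width <= 0 || buckets <= 0) {
            throw new IllegalArgumentException("width and buckets must be positive");
        }
        int[] counts = new int[buckets];
        for (int sample : samples) {
            // Scale the sample into a bucket index; long maths avoids overflow.
            long index = (long) Math.max(sample, 0) * buckets / width;
            counts[(int) Math.min(index, buckets - 1)]++;
        }
        return counts;
    }
}
```

With a width of 300 and 30 buckets, each bucket spans 10 active threads, so a healthy and mostly idle pool puts nearly all of its samples in the first few buckets.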

Add monitoring code for each of your resource pools in the same way. If you want to limit the number of events you store, consider combining the data from all of your pools into a single Insights event.
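One possible way to combine several pools into a single event is sketched below. The class CombinedPoolEvent, the method build, and the attribute naming scheme (pool name plus a suffix) are assumptions made for this example; only the eventType and timestamp attributes come from the reporter shown earlier.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ThreadPoolExecutor;

/** Hypothetical builder that flattens several thread pools into one event. */
final class CombinedPoolEvent {
    /** Returns one attribute map containing the readings of every named pool. */
    static Map<String, Object> build(Map<String, ThreadPoolExecutor> poolsByName, long timestampMillis) {
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("eventType", "ServiceStatus");
        event.put("timestamp", timestampMillis);
        for (Map.Entry<String, ThreadPoolExecutor> entry : poolsByName.entrySet()) {
            String name = entry.getKey();
            ThreadPoolExecutor pool = entry.getValue();
            // Prefix each attribute with the pool name so readings don't collide.
            event.put(name + "PoolSize", pool.getPoolSize());
            event.put(name + "ActiveTaskCount", pool.getActiveCount());
        }
        return event;
    }
}
```

The resulting map can be serialised and posted in the same way as the single-pool attributes, at the cost of one event per sample instead of one per pool.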

Finally, to tie everything together, create an Insights dashboard. Our whole agent state downsampler dashboard is shown below—all it takes is a quick check to see whether anything is wrong with our service or resource pools.

It’s all about taking charge!

Resource pool monitoring has helped every system I’ve worked on, but high-throughput streaming services have benefited the most. We’ve diagnosed a slew of nasty problems in record time.

For example, we recently hit a devastating issue in one of our highest-throughput streaming services, which caused all processing to stop. It turned out to be a problem with the Kafka producer’s buffer space, which would have been extremely difficult to diagnose without monitoring. Instead, we could open the service’s dashboards, look at the Kafka producer charts, and see that the buffer was full. Within minutes, we had reconfigured the producer with a larger buffer and were back in business.

Monitoring also allows you to prevent problems before they occur. Look for historical trends in your dashboards, not just during incidents but regularly (once a week, for example). If you notice your thread pool utilisation slowly increasing, scale the service before it becomes sluggish and a potential incident occurs.
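One simple way to put a number on “slowly increasing” is a least-squares slope over evenly spaced utilisation readings, such as weekly averages. This is only a sketch of that idea, not a method taken from the original service; the class UtilisationTrend and the method slope are hypothetical.

```java
/** Hypothetical trend check over evenly spaced utilisation readings. */
final class UtilisationTrend {
    /** Least-squares slope of the samples, in utilisation units per sample interval. */
    static double slope(double[] samples) {
        int n = samples.length;
        if (n < 2) {
            return 0.0; // not enough data to talk about a trend
        }
        double meanX = (n - 1) / 2.0; // x values are the indices 0..n-1
        double meanY = 0.0;
        for (double y : samples) {
            meanY += y;
        }
        meanY /= n;
        double covariance = 0.0;
        double variance = 0.0;
        for (int x = 0; x < n; x++) {
            covariance += (x - meanX) * (samples[x] - meanY);
            variance += (x - meanX) * (x - meanX);
        }
        return covariance / variance;
    }
}
```

A slope that stays positive week after week suggests the pool is filling up and that it may be time to scale; what slope justifies action depends on how much headroom the service has.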

Enteros
