Preamble
Postgres is a fantastic all-around product with an unbeatable price-performance ratio, but there is one area where things could be done a little more effectively: replica validation, or, more broadly, cluster validation.
What’s the issue? There is no real “push-button” way to determine whether a freshly built replica is flawless. The standard replica-building tool, pg_basebackup, simply streams the datafiles and transaction logs over the network and writes them to disk, and that’s it. There are of course other tools and approaches, but I suspect pg_basebackup has 90% of the “market.” Then you typically start the server, check that you can connect to it, and perhaps run a few “select count(*) from my_favorite_table” queries to make sure everything looks more or less plausible, before grinning with satisfaction and going for a coffee to work on other pressing issues. But how far did that actually go toward ensuring that the cluster is intact?
What could go wrong with replicas?
What could possibly go wrong when creating a new replica? Naturally, if pg_basebackup reports success, there isn’t much cause for concern; it is highly unlikely that something is wrong. But a few things still nag at me. I may be a little paranoid here, but keep reading to see what I mean.
Starting with hard disks: disk errors should theoretically be a thing of the past, and business-grade disk systems should have error detection built in… But everything in the modern world is so virtualized and abstracted that you hardly even know what’s underneath. Perhaps there is some network storage involved? That storage needs drivers, both for the hardware and in the operating system. And every piece of software has bugs; Google suggests a typical figure of 15 to 50 bugs per 1,000 lines of shipped code. So when writing datafiles, silent data corruption can never be completely ruled out, especially since replicas frequently run on new hardware that has not yet been thoroughly burned in.
In these circumstances, Postgres’ “data-checksums” flag can be useful, but it must be enabled during cluster initialization and carries a small performance penalty, so it isn’t always used.
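For reference, checksums are enabled at initdb time, and you can check whether an existing cluster has them via SQL; the data directory path below is just an example, not a recommendation:

```shell
# Initialize a new cluster with page checksums enabled (example path).
initdb --data-checksums -D /var/lib/postgresql/9.6/main

# On a running cluster, check whether checksums are on ("on" / "off"):
psql -h /var/run/postgresql -qAtc "SHOW data_checksums;"
```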
What else? Postgres allows replicas to run on different operating systems (for instance, master on Ubuntu and replica on CentOS, which is a great feature in and of itself), and there is also the potential for minor version differences, such as the master on 9.6.2 and the replica accidentally on 9.6.0. Furthermore, there is always a chance of a bug in the replication code itself; I can think of at least one instance from a few years ago in which committed rows showed up on replicas as uncommitted. This is very rare, of course, but could technically happen again.
Status Quo
In other words, there are a few “ifs,” and the question is: what can be done about them? The only 100 percent trustworthy option is a full SQL dump/restore because, as stated, there is sadly no tool from the Postgres project for full verification (and I’m not even sure there should be one, or whether it can even be done). However, if you need to switch over to the new master quickly, this obviously won’t work, and for databases of 1TB or larger it could take hours and hours.
Therefore, simply dumping the database is a good compromise for feeling at least marginally better before making the switchover! At the very least, this confirms that there is no silent data corruption. Depending on your database size and hardware, though, this can also take hours. But there are some tricks to make it go faster; more on that below.
Speeding up dummy SQL dumps on replicas
1. Dump to “/dev/null”! Since we only need to confirm that the data can be read out, not actually store the SQL dump, this is a great performance booster. With the default plain SQL mode of pg_dumpall or pg_dump, it is as simple as:
pg_dumpall -h /var/run/postgresql >/dev/null
2. Connect to the local server via the Unix socket rather than the “localhost” TCP stack; this alone should yield a 10-15% speedup.
3. Use several processes to dump the data! When time is of the essence, this can be very helpful for larger databases; simply set the “jobs” flag to a value that puts a decent load on the IO subsystem. NB! Multiple jobs require the “directory” output format of pg_dump (-Fd/--format=directory plus -f/--file=dirname), which has some drawbacks:
- pg_dump’s “directory” format does not get along with /dev/null, as it wants to set up a directory structure first. There are workarounds, e.g. the nullfs FUSE filesystem driver, but they are tedious to say the least.
- It is not possible to obtain a consistent snapshot when dumping replicas with multiple processes (this will be fixed in the upcoming Postgres 10). For our “quick verification” goal this is not too catastrophic, and we can ignore the initial error by providing the “--no-synchronized-snapshots” flag.
- pg_dump only works “per database,” so if your cluster contains a large number of DBs, you may need a few lines of Bash around it.
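Putting the points above together, a parallel verification dump over all databases could look roughly like this; the job count of 4, the socket path, and the scratch directory are assumptions to adapt to your setup:

```shell
# Sketch: dump every non-template database with 4 parallel jobs each,
# using the directory format, then throw the output away. Assumes a
# replica listening on the default Unix socket and free space in /tmp.
DUMPDIR=/tmp/dump_verify
mkdir -p "$DUMPDIR"

for db in $(psql -h /var/run/postgresql -qAtc \
      "SELECT datname FROM pg_database WHERE NOT datistemplate"); do
    pg_dump -h /var/run/postgresql -Fd -j 4 \
        --no-synchronized-snapshots \
        -f "$DUMPDIR/$db" "$db" || echo "verification FAILED for $db"
    rm -rf "$DUMPDIR/$db"   # we only wanted to read the data, not keep it
done
```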
4. Create a quick custom script that takes care of the unfavorable side effects of approach number 3, so you don’t have to. Mine is a straightforward Python script that simply spawns a predetermined number of worker processes (by default, half of the CPUs) and dumps all tables one by one to /dev/null; all it needs is the Postgres “bindir” and the Unix socket as the host.
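The script itself isn’t reproduced here, but the core idea can be sketched in a few lines of shell; the database name, the worker count, and the schema filter are placeholders, and table names containing special characters would need extra quoting:

```shell
# Rough shell equivalent of the idea: list all user tables and stream
# each one to /dev/null, keeping 4 pg_dump workers busy at a time
# (xargs -P is a GNU extension).
HOST=/var/run/postgresql   # Unix socket directory
DB=mydb                    # database to verify

psql -h "$HOST" -d "$DB" -qAtc \
    "SELECT schemaname || '.' || tablename FROM pg_tables
     WHERE schemaname NOT IN ('pg_catalog', 'information_schema')" |
xargs -P 4 -I {} \
    pg_dump -h "$HOST" -d "$DB" --data-only -t {} -f /dev/null
```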
NB! If there is activity on the master, make sure query conflicts are taken care of before starting the dump on the replica, as it will otherwise fail quickly. Two options: enable “hot_standby_feedback,” or allow the replica to “fall behind” by setting “max_standby_streaming_delay” to -1 or some large value.
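In postgresql.conf terms, either of the following (pick one) should do; -1 here means “wait forever”:

```
# On the replica, one of the two approaches:
hot_standby_feedback = on            # master retains rows the dump still needs
# ...or let the replica lag while the dump runs:
max_standby_streaming_delay = -1     # or some generously large value
```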
That’s all, I sincerely hope you followed along and found some food for thought. Feedback is always welcome!
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.