I was recently asked how to fork PostgreSQL databases the way you can on Heroku. Since I could not find any good examples of how to do this, I decided to write up a short article.
“Forking” a database means making a copy of it that you can change without affecting the original source database. The simplest way to do this is to take a copy of the database, also known as a backup, and start a new database instance on top of it. I won’t linger on this point, because backup procedures are already well covered by plenty of articles and documentation. The problem with this obvious approach is performance: copying a lot of data takes a long time. If we use database forks to refresh our staging or testing environments (which of the two depends on whether the data needs to be kept confidential), we can’t afford to wait several hours every time.
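For reference, the obvious approach is roughly the following: take a full physical copy with pg_basebackup and start it up as an independent instance. This is only a sketch; the directory name and port numbers are made up for illustration, and the copy step is exactly what takes so long on a large database.

# Naive fork: full physical copy, then start it as an independent instance
# (the directory name and ports here are illustrative)
pg_basebackup -p 6000 -D full_copy -X stream
sed -i 's/#port = .*/port = 6001/' full_copy/postgresql.conf
pg_ctl -D full_copy -l full_copy/postgres.log start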
We need a way to copy everything without actually copying it, and the good news is that the storage layer can help. The technical term for what we are after is a copy-on-write snapshot. Specialized storage appliances can do this, and on Linux the LVM layer and the Btrfs and ZFS filesystems implement it in software.
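As a rough sketch of what that looks like outside of Btrfs, here are the equivalent copy-on-write snapshot commands for ZFS and LVM; the pool, volume group, and volume names are made up for illustration:

# ZFS: snapshot the dataset holding the data directory, then clone it into a writable copy
zfs snapshot tank/pgdata@fork1
zfs clone tank/pgdata@fork1 tank/pgdata_fork1

# LVM: create a writable copy-on-write snapshot of the logical volume
lvcreate --snapshot --name pgdata_fork1 --size 10G /dev/vg0/pgdata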
Btrfs is the easiest to set up for this demonstration, and it also happens to be what I have on hand: a RAID1 Btrfs filesystem, about 99% full, mounted at /mnt/data and backed by three Western Digital Green spinning disks (<6000 RPM). In other words, a storage system about as slow as you can make it.
First, I create a database to run our tests on: a fresh subvolume for the master database, a fresh PostgreSQL cluster initialized inside it, and the server started on port 6000.
/mnt/data/dbforking$ btrfs subvolume create master
Create subvolume './master'
/mnt/data/dbforking$ initdb master/
... snipped
/mnt/data/dbforking$ sed -i 's/#port = .*/port = 6000/' master/postgresql.conf
/mnt/data/dbforking$ pg_ctl -D master/ -l master/postgres.log start
server starting
/mnt/data/dbforking$ createdb -p 6000
/mnt/data/dbforking$ psql -p 6000 -c "SELECT 'hello world'"
  ?column?
-------------
 hello world
(1 row)
Now that the master server is up, let’s generate some test data on it with pgbench.
/mnt/data/dbforking$ time pgbench -p 6000 -i -s 1000
... few minutes of progress reports go by ...
100000000 of 100000000 tuples (100%) done (elapsed 173.58 s, remaining 0.00 s).
vacuum...
set primary keys...
done.

real    8m35.011s
user    0m21.746s
sys     0m0.739s
/mnt/data/dbforking$ du -sh master/
15G     master/
I now have a master database containing 15 GB of data. Let’s also update some data on the master so we can keep track of our forks. To do this, we set the filler column on a row of the pgbench_branches table.
/mnt/data/dbforking$ psql -p 6000 -c "UPDATE pgbench_branches
    SET filler = 'master before forking' WHERE bid = 1"
UPDATE 1
/mnt/data/dbforking$ psql -p 6000 -c "SELECT * FROM pgbench_branches WHERE bid = 1"
 bid | bbalance |         filler
-----+----------+------------------------
   1 |        0 | master before forking
(1 row)
The normal PostgreSQL hot backup procedure is to start the backup with pg_start_backup(), copy the database contents over, end the backup with pg_stop_backup(), and then copy the transaction logs (xlogs) over. However, Btrfs supports atomic snapshots. An atomic snapshot quickly gives us exactly the contents that a regular copy would have captured if PostgreSQL had been SIGKILLed at that moment, and PostgreSQL’s durability mechanisms ensure that recovering from such a copy yields a consistent state containing only the transactions that had committed by that point. In our situation, the backup management commands are therefore not necessary.
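For comparison, a minimal sketch of that traditional procedure might look like this; the backup destination directory is made up for illustration:

/mnt/data/dbforking$ psql -p 6000 -c "SELECT pg_start_backup('fork', true)"
/mnt/data/dbforking$ rsync -a --exclude=postmaster.pid --exclude=pg_xlog master/ /backup/fork_copy/
/mnt/data/dbforking$ psql -p 6000 -c "SELECT pg_stop_backup()"
/mnt/data/dbforking$ rsync -a master/pg_xlog/ /backup/fork_copy/pg_xlog/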
To add some spice, and to show that this can be done on a production database without much trouble, I’ll launch a workload against the master in a separate shell: a steady load of 50 transactions per second from 16 clients. That is not a high rate, but it is a real challenge for the slow storage system we are using here.
/mnt/data/dbforking$ pgbench -p 6000 --rate=50 --client=16 --progress=5 --time=1200
starting vacuum...end.
progress: 5.0 s, 44.6 tps, lat 129.484 ms stddev 30.302, lag 0.304 ms
Let’s fork the database now that we are prepared to do so:
/mnt/data/dbforking$ time btrfs subvolume snapshot master fork1
Create a snapshot of 'master' in './fork1'

real    0m6.295s
user    0m0.000s
sys     0m0.202s
/mnt/data/dbforking$ du -sh fork1/
15G     fork1/
And these are the performance metrics from that period:
progress: 30.0 s, 54.5 tps, lat 134.819 ms stddev 34.012, lag 0.500 ms
progress: 35.1 s, 44.4 tps, lat 199.910 ms stddev 75.235, lag 3.243 ms
progress: 40.1 s, 40.5 tps, lat 1281.642 ms stddev 791.303, lag 970.009 ms
progress: 45.0 s, 62.1 tps, lat 349.229 ms stddev 313.576, lag 145.631 ms
progress: 50.0 s, 50.0 tps, lat 146.155 ms stddev 45.599, lag 2.543 ms
progress: 55.1 s, 53.0 tps, lat 146.554 ms stddev 40.694, lag 0.562 ms
So we were able to fork a 15 GB database in about 6 seconds, with only a small hiccup in performance. Now we can start the forked database.
A couple of things need to be done before the fork can be started. First, the master’s pid file has to be removed. PostgreSQL can usually clean it up by itself, but because the master is still running on the same machine it cannot tell whether that is safe here, so we do it ourselves. Second, the fork needs to be configured to listen on a different port from the master. Once that is done, we can start the server and watch it successfully perform crash recovery.
/mnt/data/dbforking$ rm fork1/postmaster.pid
/mnt/data/dbforking$ sed -i 's/port = .*/port = 6001/' fork1/postgresql.conf
/mnt/data/dbforking$ pg_ctl -D fork1 -l fork1/postgres.log start
/mnt/data/dbforking$ tail fork1/postgres.log
LOG:  database system was interrupted; last known up at 2015-01-09 14:47:29 EET
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 0/7086E90
LOG:  record with zero length at 0/8BDD888
LOG:  redo done at 0/8BDD858
LOG:  last completed transaction was at log time 2015-01-09 14:51:44.56637+02
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
Next, let’s confirm that we really have two separate databases by changing the tracking row so that it differs between the fork and the master.

Let’s update the forked database:
/mnt/data/dbforking$ psql -p 6001 -c "UPDATE pgbench_branches
    SET filler = 'fork1 after forking' WHERE bid = 1"
UPDATE 1
/mnt/data/dbforking$ psql -p 6001 -c "SELECT * FROM pgbench_branches WHERE bid = 1"
 bid | bbalance |        filler
-----+----------+----------------------
   1 |     1025 | fork1 after forking
(1 row)
And check that the master database still has the old data (except for the bbalance column, which pgbench has been updating):
/mnt/data/dbforking$ psql -p 6000 -c "SELECT * FROM pgbench_branches WHERE bid = 1"
 bid | bbalance |         filler
-----+----------+------------------------
   1 |   -33546 | master before forking
(1 row)
With this, we have a working recipe for forking a production database without any major disruption. You probably don’t want to run your staging tests or your development environment on the same machine as the master, though. To fork onto other machines, run a streaming replication standby in your staging or development environment and take the snapshots from the standby’s data directory instead. Just make sure to replace or delete the recovery.conf file in the fork; otherwise the snapshot will keep running in standby mode.
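A minimal sketch of what that could look like on the staging or development machine, assuming the standby’s data directory is itself a Btrfs subvolume at /mnt/data/standby (all paths and the port here are illustrative):

# On the machine running the streaming standby
btrfs subvolume snapshot /mnt/data/standby /mnt/data/fork1
rm /mnt/data/fork1/postmaster.pid
rm /mnt/data/fork1/recovery.conf        # otherwise the fork keeps running as a standby
sed -i 's/port = .*/port = 6001/' /mnt/data/fork1/postgresql.conf
pg_ctl -D /mnt/data/fork1 -l /mnt/data/fork1/postgres.log start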
Additionally, it’s a good idea to move the fork onto a new timeline, so PostgreSQL is aware of the fork and can warn you about mistakes such as the fork replicating from the master (or vice versa). To do this, create a recovery.conf that contains the single line restore_command = '/bin/false'. This switches PostgreSQL into point-in-time recovery mode (as opposed to regular crash recovery), which creates a timeline switch at the end of the transaction log. The /bin/false restore command simply signals that there is no archive from which to fetch further transaction logs.
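Concretely, that just means dropping the file into the fork’s data directory before starting it for the first time, for example:

/mnt/data/dbforking$ echo "restore_command = '/bin/false'" > fork1/recovery.conf
/mnt/data/dbforking$ pg_ctl -D fork1 -l fork1/postgres.log start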
About Enteros
Enteros offers a patented database performance management SaaS platform. It finds the root causes of complex database scalability and performance problems that affect business across a growing number of cloud, RDBMS, NoSQL, and machine learning database platforms.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.