🚀 About
- The project serves as a research platform to explore streaming WAL archiving with a target of RPO=0 during recovery.
- It’s primarily designed for use in containerized environments.
- The utility replicates all key features of pg_receivewal, including automatic reconnection on connection loss, streaming into partial files, extensive error checking and more.
- The tool is easy to install as a single binary and simple to debug – just use your preferred editor and a Docker container running PostgreSQL.
🔐 Features
- ✅ Streaming WAL archiving with replication slots
- ✅ Safe .partialfile handling (every server message is ‘fsynced’)
- ✅ S3/SFTP backends with optional GZIP/ZSTD compression + AES-GCM encryption
- ✅ Built-in HTTP server for serving WALs + metrics and alerting (planned)
- ✅ Minimal and composable configuration
- ✅ Fully testable with Docker-based integration tests
🛠️ Usage
Receive mode (the main loop of the WAL receiver)
cat <<EOF >config.yml
main:
  listen_port: 7070
  directory: wals
receiver:
  slot: pgrwl_v5
log:
  level: trace
  format: text
  add_source: true
EOF
export PGHOST=localhost
export PGPORT=5432
export PGUSER=postgres
export PGPASSWORD=postgres
export PGRWL_MODE=receive
pgrwl -c config.yml
Serve mode (used during restore to serve archived WAL files from storage)
cat <<EOF >config.yml
main:
  listen_port: 7070
  directory: wals
log:
  level: trace
  format: text
  add_source: true
EOF
export PGRWL_MODE=serve
pgrwl -c config.yml
restore_command example for postgresql.conf
# where 'k8s-worker5:30266' represents the host and port 
# of a 'pgrwl' instance running in 'serve' mode. 
restore_command = 'pgrwl restore-command --serve-addr=k8s-worker5:30266 %f %p'
⭐ See also: examples (step-by-step archive and recovery), and k8s (basic setup)
⚙️ Configuration Reference
The configuration file is in JSON or YML format (*.json is preferred).It supports environment variable placeholders like ${PGRWL_SECRET_ACCESS_KEY}.
main:                                    # Required for both modes: 'receive' / 'serve'
  listen_port: 7070                      # HTTP server port (used for management)
  directory: "/var/lib/pgwal"            # Base directory for storing WAL files
receiver:                                # Required for 'receive' mode
  slot: replication_slot                 # Replication slot to use
  no_loop: false                         # If true, do not loop on connection loss
uploader:                                # Optional (used in receive mode)
  sync_interval: 10s                     # Interval for the upload worker to check for new files
  max_concurrency: 4                     # Maximum number of files to upload concurrently
log:                                     # Optional
  level: info                            # One of: trace / debug / info / warn / error
  format: text                           # One of: text / json
  add_source: true                       # Include file:line in log messages (for local development)
storage:                                 # Optional
  name: s3                               # One of: s3 / sftp
  compression:                           # Optional
    algo: gzip                           # One of: gzip / zstd
  encryption:                            # Optional
    algo: aesgcm                         # One of: aes-256-gcm
    pass: "${PGRWL_ENCRYPT_PASSWD}"      # Encryption password (from env)
  sftp:                                  # Required section for 'sftp' storage
    host: sftp.example.com               # SFTP server hostname
    port: 22                             # SFTP server port
    user: backupuser                     # SFTP username
    pass: "${PGRWL_VM_PASSWORD}"         # SFTP password (from env)
    pkey_path: "/home/user/.ssh/id_rsa"  # Path to SSH private key (optional)
    pkey_pass: "${PGRWL_SSH_PKEY_PASS}"  # Required if the private key is password-protected
  s3:                                    # Required section for 's3' storage
    url: https://s3.example.com          # S3-compatible endpoint URL
    access_key_id: AKIAEXAMPLE           # AWS access key ID
    secret_access_key: "${PGRWL_AWS_SK}" # AWS secret access key (from env)
    bucket: postgres-backups             # Target S3 bucket name
    region: us-east-1                    # S3 region
    use_path_style: true                 # Use path-style URLs for S3
    disable_ssl: false                   # Disable SSL
🚀 Installation
Manual Installation
- Download the latest binary for your platform from the Releases page.
- Place the binary in your system’s PATH(e.g.,/usr/local/bin).
Installation script for Unix-Based OS (requires: tar, curl, jq):
(
set -euo pipefail
OS="$(uname | tr '[:upper:]' '[:lower:]')"
ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/(arm)(64)?.*/12/' -e 's/aarch64$/arm64/')"
TAG="$(curl -s https://api.github.com/repos/hashmap-kz/pgrwl/releases/latest | jq -r .tag_name)"
curl -L "https://github.com/hashmap-kz/pgrwl/releases/download/${TAG}/pgrwl_${TAG}_${OS}_${ARCH}.tar.gz" |
tar -xzf - -C /usr/local/bin && 
chmod +x /usr/local/bin/pgrwl
)
🗃️ Usage In Backup Process
The full process may look like this (a typical, rough, and simplified example):
- You have a cron job that performs a base backup of your cluster every three days.
- You run pgrwlas a systemd unit or a Kubernetes pod (depending on your infrastructure).
- You have a configured retention worker that prunes WAL files older than three days.
- With this setup, you’re able to restore your cluster – in the event of a crash – to any second within the past three days.
🧱 Architecture
Design Notes
pgrwl is designed to use the local filesystem exclusively. This is a deliberate choice, because – as mentioned earlier – we must rely on fsync after each message is written to disk.
This ensures that *.partial files always contain fully valid WAL segments, making them safe to use during the restore phase (after simply removing the *.partial suffix).
pgrwl supports compression and encryption as optional features for completed WAL files (during upload on remote storage).
However, streaming *.partial files to any location other than the local filesystem can introduce numerous unpredictable issues.
In short: PostgreSQL waits for the replica to confirm commits, so we cannot afford to depend on external systems in such critical paths.
💾 Notes on fsync (since the utility works in synchronous mode only):
- After each WAL segment is written, an fsyncis performed on the currently open WAL file to ensure durability.
- An fsyncis triggered when a WAL segment is completed and the*.partialfile is renamed to its final form.
- An fsyncis triggered when a keepalive message is received from the server with thereply_requestedoption set.
- Additionally, fsyncis called whenever an error occurs during the receive-copy loop.
🔁 Notes on archive_command and archive_timeout
There’s a significant difference between using archive_command and archiving WAL files via the streaming replicationprotocol.
The archive_command is triggered only after a WAL file is fully completed—typically when it reaches 16 MiB (the default segment size). This means that in a crash scenario, you could lose up to 16 MiB of data.
You can mitigate this by setting a lower archive_timeout (e.g., 1 minute), but even then, in a worst-case scenario,you risk losing up to 1 minute of data.Also, it’s important to note that PostgreSQL preallocates WAL files to the configured wal_segment_size, so they arecreated with full size regardless of how much data has been written. (Quote from documentation:It is therefore unwise to set a very short archive_timeout — it will bloat your archive storage.).
In contrast, streaming WAL archiving—when used with replication slots and the synchronous_standby_namesparameter—ensures that the system can be restored to the latest committed transaction.This approach provides true zero data loss (RPO=0), making it ideal for high-durability requirements.
👷 Developer Notes
🧪 Integration Testing:
Here an example of a golden fundamental test.It verifies that we can restore to the latest committed transaction after an abrupt system crash.It also checks that the WAL files generated are byte-for-byte identical to those generated by pg_receivewal.
Test Steps:
- Initialize and start a PostgreSQL cluster
- Run WAL receivers (pgrwlandpg_receivewal)
- Create a base backup
- Create a table, and insert the current timestamp every second (in the background)
- Run pgbench to populate the database with 1 million rows
- Generate additional data (~512 MiB)
- Concurrently create 100 tables with 10000 rows each.
- Terminate the insert-script job
- Run pg_dumpall and save the output as plain SQL
- Terminate all PostgreSQL processes and delete the PGDATAdirectory (termination is force and abnormal)
- Restore PGDATAfrom the base backup, add recovery.signal, and configure restore_command
- Rename all *.partialWAL files in the WAL archive directories
- Start the PostgreSQL cluster (cluster should recover to the latest committed transaction)
- Run pg_dumpall after the cluster is ready
- Diff the pg_dumpall results (before and after)
- Check the insert-script logs and verify that the table contains the last inserted row
- Compare WAL directories (filenames and contents must match 100%)
- Clean up WAL directories and rerun the WAL archivers on a new timeline (cleanup is necessary since we run receivers with –no-loop option)
- Compare the WAL directories again
To contribute or verify the project locally, the following make targets should all pass:
# Compile the project
make build
# Run linter (should pass without errors)
make lint
# Run unit tests (should all pass)
make test
# Run integration tests (slow, but critical)
# Requires Docker and Docker Compose to be installed
make test-integ-scripts
# Run GoReleaser builds locally
make snapshot
✅ All targets should complete successfully before submitting changes or opening a PR.
🗂️ Source Code Structure
internal/xlog/pg_receivewal.go
  → Entry point for WAL receiving logic.
    Based on the logic found in PostgreSQL:
    https://github.com/postgres/postgres/blob/master/src/bin/pg_basebackup/pg_receivewal.c
internal/xlog/receivelog.go
  → Core streaming loop and replication logic.
    Based on the logic found in PostgreSQL: 
    https://github.com/postgres/postgres/blob/master/src/bin/pg_basebackup/receivelog.c
internal/xlog/xlog_internal.go
  → Helpers for LSN math, WAL file naming, segment calculations.
    Based on the logic found in PostgreSQL:
    https://github.com/postgres/postgres/blob/master/src/include/access/xlog_internal.h
internal/xlog/walfile.go
  → Manages WAL file descriptors: open, write, close, sync.
internal/xlog/streamutil.go
  → Utilities for querying server parameters (e.g. wal_segment_size),
    replication slot info, and streaming setup.
internal/xlog/fsync/
  → Optimized wrappers for safe and efficient `fsync` system calls.
📐 Main Loop
⏮️ Links
- pg_receivewal Documentation
- pg_receivewal Source Code
- Streaming Replication Protocol
- Continuous Archiving and Point-in-Time Recovery
- Setting Up WAL Archiving
✅ TL;DR
If you’re building reliable PostgreSQL backup pipelines and want streaming, durability, and developer control, give pgrwl a try.
💬 Questions or feedback? Drop a GitHub Issue or comment here!
🔖 Licensed under MIT
