Overview
- CS Data Standby & Recovery (DSR) is a cloud-based standby service that provides a standby database, regularly scheduled PG dumps, Point In Time Recovery (PITR) based backups, and recovery of specific database components upon request
- Each DSR server can host multiple standby instances, which produces even greater cost savings
- Each database instance is configured with a warm standby. We then use the warm standby to generate PITR backups and pg_dump based dumps on a regular basis
- A hot standby instance, via streaming replication or Slony, can be added to any DSR server
- Compressed PG dumps and PITR based backups are executed on a schedule defined by the customer. Both can be executed against any of the standby types
- Automated test recoveries are executed on a regular basis (at least once a quarter), and a recoverability status report is sent to the customer. If desired, the recovered instance can remain in the cloud for up to a week so the customer can validate the recovery
- Customers can pull PG dumps or PITR based backups and related WAL segments at any time via sftp or rsync over a secure connection
- Customers can execute a failover at any time which brings the standby into a live state and re-directs applications to the standby
- Customers can request any type of recovery, prepared and delivered in a variety of formats as needed, such as PG dump recoveries, PITR recoveries, or recoveries of specific data structures as of a point in time
- Secondary standbys can be rolled out in separate data centers to provide multiple layers of redundancy
- The monthly cost covers all setup and maintenance efforts
- Time expended on our side to fulfill recovery requests (such as custom table dumps based on a point in time, or copies of a backup staged for download to QA) is billed hourly or can be drawn from existing monthly Admin-Packs
How It Works
Standby Setup
To set up the initial standby, we ship the cluster data over a secure connection. We have several methods of doing this that minimize any needed downtime while ensuring that the customer cluster and the standby are fully in sync.
The standby is a warm standby; a hot standby can be added as an add-on feature via streaming replication or Slony.
Once the base cluster data is in place, we set up the continuous data sync processes based on the type of standby. This includes shipping WAL segments from the original cluster, which both keeps the standby cluster current and enables the creation of PITR based backups.
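The WAL shipping step can be illustrated with a minimal sketch. It assumes a hypothetical helper, `archive_wal_segment`, and uses a local directory to stand in for the standby's staging area; in practice the copy would travel over a secure network connection. It follows the usual PostgreSQL archiving convention of never overwriting a segment that is already archived and of failing loudly so the segment is retried.

```python
import os
import shutil
import tempfile


def archive_wal_segment(segment_path: str, archive_dir: str) -> str:
    """Copy one WAL segment into archive_dir without exposing a partial file.

    Hypothetical helper for illustration; the real shipping mechanism is
    not described in this document.
    """
    os.makedirs(archive_dir, exist_ok=True)
    final_path = os.path.join(archive_dir, os.path.basename(segment_path))

    # Refuse to overwrite: an existing file usually signals a configuration error.
    if os.path.exists(final_path):
        raise FileExistsError(f"{final_path} is already archived")

    # Copy under a temporary name, then rename into place in one step so a
    # reader of the archive never sees a half-written segment.
    fd, tmp_path = tempfile.mkstemp(dir=archive_dir, suffix=".partial")
    os.close(fd)
    try:
        shutil.copyfile(segment_path, tmp_path)
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```

A script like this would typically be wrapped so that any exception becomes a non-zero exit status, which tells PostgreSQL to keep the segment and try again later.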
Regular pg_dumps and PITR based backups
Once the standby is set up, we create both PG dumps and PITR based backups on a schedule defined by the customer. Both the PG dumps and the PITR base backups, with their relevant WAL segments, are staged in an area the customer can access via a secure, key-based rsync connection. All PG dump files and PITR files are kept for retention periods selected by the customer.
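As a rough illustration of retention handling, the sketch below picks out backup files that have aged past a retention period. The function name `expired_backups`, the file names, and the idea of tracking each backup by the time it was taken are assumptions made for the example, not a description of the actual tooling.

```python
from datetime import datetime, timedelta


def expired_backups(backups: dict[str, datetime], retention_days: int,
                    now: datetime) -> list[str]:
    """Return the names of backups taken before the retention cutoff.

    `backups` maps a file name to the time the backup was taken
    (hypothetical bookkeeping for illustration).
    """
    cutoff = now - timedelta(days=retention_days)
    # A backup taken exactly at the cutoff is still kept.
    return sorted(name for name, taken_at in backups.items() if taken_at < cutoff)


if __name__ == "__main__":
    staged = {
        "cluster_a_2024-01-01.dump": datetime(2024, 1, 1),
        "cluster_a_2024-01-20.dump": datetime(2024, 1, 20),
        "cluster_a_2024-01-24.dump": datetime(2024, 1, 24),
    }
    print(expired_backups(staged, retention_days=7, now=datetime(2024, 1, 31)))
```

For PITR files the same rule needs one extra care: WAL segments must be kept for as long as the oldest retained base backup still depends on them.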
Customer Requested Recoveries
Any custom recovery needed can be created upon request and staged for download. Someone truncated a critical table yesterday at 2:15pm? No problem, we can recover the table as of 2:14:59pm and stage the table dump for download. You can request any recovery that is possible from either your currently available PG dumps or your currently available PITR backups, such as:
- Recover a set of tables as of a point in time
- Recover a schema as of a point in time
- Recover all triggers
- Recover DDL only for a schema
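The truncated-table scenario above comes down to choosing a recovery target just before the incident. The sketch below builds the relevant PostgreSQL recovery settings for that case. `recovery_target_time`, `recovery_target_action`, and `restore_command` are standard PostgreSQL parameters; the function name, the archive path, and the one-second margin are assumptions for illustration.

```python
from datetime import datetime, timedelta


def recovery_settings(incident_time: datetime, wal_archive_dir: str,
                      margin_seconds: int = 1) -> list[str]:
    """Build recovery settings that stop replay just before incident_time."""
    target = incident_time - timedelta(seconds=margin_seconds)
    return [
        # Fetch archived WAL segments from the (assumed) archive directory.
        f"restore_command = 'cp {wal_archive_dir}/%f %p'",
        # Replay WAL up to this moment; no zone is given, so the server's
        # time zone setting applies.
        f"recovery_target_time = '{target:%Y-%m-%d %H:%M:%S}'",
        # Pause at the target so the data can be inspected and dumped.
        "recovery_target_action = 'pause'",
    ]


if __name__ == "__main__":
    # Hypothetical incident: a table truncated at 2:15pm.
    for line in recovery_settings(datetime(2024, 3, 5, 14, 15, 0), "/archive/wal"):
        print(line)
```

Where these lines go depends on the PostgreSQL version: recent releases read them from the main configuration alongside a `recovery.signal` file, while older ones use `recovery.conf`. Once the recovered cluster reaches the target, the affected table can be dumped with pg_dump and staged for download.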
Test Recoveries and Statistics
On a regular basis (generally quarterly), we perform a test recovery in the cloud from one of the recent PG dump files or from the PITR files to ensure that the recovery files are valid.
Each test recovery is documented in a recovery report. Once the test recovery is completed, the recovered database cluster is dropped and the recovery report is sent to the customer. The current status of all recent test recoveries, dumps, and backups is also available at any time on a customer-specific web portal.
Customer Requested Recoveries
Each customer is given a login to our ticketing system. At any time a customer can request a specific recovery, such as:
- Create a PITR recovery based on a specific time, create a PG dump of the recovered cluster and stage it for download
- Recover a specified table to a point in time and stage the table dump for download
- Etc
Failover
Each customer is given code, created specifically for their environment, that allows failover to the standby at any time. The code can raise alerts and let the customer execute the failover manually, or the failover can be completely automated. Once the crisis that forced the failover has passed, we simply reset the standby. If the customer has multiple standbys on a single standby server, the failover processes allow individual clusters to be failed over as needed.
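The alert-then-failover behavior can be sketched as a simple monitor loop. This is only an illustration under stated assumptions: `monitor_primary`, the `probe` and `on_failover` callbacks, and the threshold of three consecutive failures are all hypothetical, and the actual failover code is tailored to each environment.

```python
import time
from typing import Callable


def monitor_primary(probe: Callable[[], bool],
                    on_failover: Callable[[], None],
                    max_failures: int = 3,
                    max_checks: int = 100,
                    interval_seconds: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call on_failover after max_failures consecutive failed probes.

    Returns True if on_failover was called, False if max_checks passed
    without reaching the failure threshold.
    """
    failures = 0
    for _ in range(max_checks):
        if probe():
            failures = 0  # a healthy check resets the streak
        else:
            failures += 1
            if failures >= max_failures:
                on_failover()
                return True
        sleep(interval_seconds)
    return False
```

For a manual setup, `on_failover` would only send an alert and leave the decision to the customer; for a fully automated setup it would promote the standby and redirect applications. Requiring several consecutive failures avoids failing over on a single dropped connection.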
Redundant services
If needed, we can set up separate standbys, each in a completely different data center, to provide maximum redundancy and failover options. The secondary standby can be a standby only (i.e. no PG dumps, no PITR backups, etc.), which reduces its cost. If a secondary standby is set up, the failover code can be customized to fail over based on availability, first to the primary standby and then to the secondary. Alternatively, we can code any custom failover options desired, allowing the customer to leverage the standbys in any way that meets their needs.
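Availability-based failover across data centers amounts to walking a priority list and taking the first standby that responds. A minimal sketch, assuming hypothetical standby names and an `is_available` check supplied by the caller:

```python
from typing import Callable, Optional, Sequence


def choose_failover_target(standbys: Sequence[str],
                           is_available: Callable[[str], bool]) -> Optional[str]:
    """Return the first available standby in priority order, or None."""
    for name in standbys:
        if is_available(name):
            return name
    return None


if __name__ == "__main__":
    # Hypothetical names; the first entry is preferred.
    priority = ["standby-dc1", "standby-dc2"]
    reachable = {"standby-dc2"}
    print(choose_failover_target(priority, lambda name: name in reachable))
```

Keeping the order in a plain list makes custom policies easy to express, since changing the preference only means reordering the entries.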
