Backup a Greenplum instance: custom script backup

We have a Greenplum 6.x instance with about 40 segment servers, and we have to back it up.
I know there are two main methods to back up a Greenplum instance:
gpbackup for parallel backup, which is the main recommended method as I understand it.
pg_dump for non-parallel backup that has to go through the master (not recommended because of its slow performance); pg_dump and pg_restore are available for compatibility with standard PostgreSQL databases.
But we cannot use gpbackup: there is not enough free space on the Greenplum servers to keep the backup files, and we have no S3, NAS shared folder, or Data Domain.
The only option we theoretically have is to back up the /pgdata/ directories on all servers of the Greenplum instance.
So the idea: run pg_start_backup, copy the entire /pgdata/ directory, and copy the WAL files to keep the backup consistent.
But I cannot understand how to run pg_start_backup on all of the PostgreSQL instances that make up the Greenplum cluster.

I'm confused: how do you have enough space to copy the entire pgdata directory and the WAL files, but not enough space to do a backup? Considering that you seem to believe backups are important (which they are if this is a critical system), there must be a way to cobble together some storage, either via NFS or S3 (Cloudian or MinIO would work here), to do a proper gpbackup. It will save you a ton of time should you ever need to use this backup.
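That said, if you really have no choice but the filesystem-copy route, the mechanics would look roughly like the sketch below. Treat it as an untested, unsupported sketch: it assumes utility-mode connections (gp_session_role=utility) as the gpadmin superuser and takes the segment list from gp_segment_configuration, and a filesystem copy of /pgdata is not a supported Greenplum backup method.

    # Hedged sketch only: put the master and every primary segment of a
    # Greenplum 6 cluster into backup mode before copying /pgdata/.
    # Host names, ports and user below are assumptions for illustration.
    import psycopg2

    MASTER = dict(host="mdw", port=5432, dbname="postgres", user="gpadmin")

    # The master and all primary segments (the master has content = -1).
    with psycopg2.connect(**MASTER) as conn, conn.cursor() as cur:
        cur.execute("SELECT hostname, port FROM gp_segment_configuration "
                    "WHERE role = 'p'")
        instances = cur.fetchall()

    for host, port in instances:
        seg = psycopg2.connect(host=host, port=port, dbname="postgres",
                               user="gpadmin",
                               options="-c gp_session_role=utility")
        seg.autocommit = True
        with seg.cursor() as cur:
            # Greenplum 6 is based on PostgreSQL 9.4, so the two-argument
            # pg_start_backup(label, fast) form should be available.
            cur.execute("SELECT pg_start_backup('fs-backup', true)")
        seg.close()

    # ... copy /pgdata/ and the WAL files on every host here ...
    # Afterwards, run SELECT pg_stop_backup() against each instance the
    # same way.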

Related

How to implement Snapshot Replication

I have data on several machines that I want to back up in a way that lets me restore to certain points in time.
From what I read, Snapshot Replication achieves this (as opposed to a backup that clobbers previous results).
The main motivation is that if the data files are ransacked and encrypted, then with a plain backup I can end up in a state where the backed-up files are also encrypted.
One way to do this is by using two Synology NAS machines where I can have:
rsync processes to back up files from multiple machines onto NAS1
Snapshot Replication applied from NAS1 to NAS2
In this way, if the data is hijacked at a certain point, I can restore it to the last good state by rolling NAS2 back to a previous point in time.
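As a rough illustration of the rsync side of this, each run can also be pushed into its own dated, hard-linked snapshot folder (rsync's --link-dest), which gives point-in-time restores even on a single NAS. Host names and paths below are made up, and it assumes rsync plus SSH access to the NAS:

    # Rough sketch: back up one machine to the NAS, one dated snapshot
    # folder per run.  Unchanged files are hard-linked against the
    # previous snapshot, so each run is a full, restorable point in time
    # without using full disk space every day.  Paths are hypothetical.
    import datetime
    import subprocess

    NAS = "backup@nas1"
    BASE = "/volume1/backups/workstation1"
    SOURCES = ["/home/", "/etc/"]

    today = datetime.date.today().isoformat()
    dest = f"{NAS}:{BASE}/{today}/"

    subprocess.run(
        ["rsync", "-a", "--delete",
         f"--link-dest={BASE}/latest/",   # previous snapshot on the NAS
         *SOURCES, dest],
        check=True)

    # After a successful run, repoint the 'latest' symlink on the NAS:
    #   ssh backup@nas1 "ln -sfn <BASE>/<today> <BASE>/latest"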
I would like to know if:
Snapshot Replication is the way to go, or there are other solutions?
are there other ways to achieve Snapshot Replication, e.g. with a single NAS?
I have an older Synology 2-Bay NAS DS213j.
Assuming that I buy a second, newer NAS (e.g. DS220j), are the two NAS machines expected to work together?
Thanks
I found out that Hyper Backup can keep point-in-time snapshots, so I'm using it instead of Snapshot Replication.

How can I dump a single Redis DB index?

Redis SAVE and BGSAVE commands dump the complete Redis data to a persistent file.
But is there a way to dump only one DB index?
I am using the same Redis server with multiple DB indices.
I use DB 0 for config; it is edited manually and contains just a small number of keys. I wish to dump it to a file as a versioned config snapshot to keep track of manual changes in the prod environment.
The rest of the DBs have a large number of items that would take too long to dump, and I don't need to back them up.
Redis' persistence scope is the entire instance, meaning all shared/numbered databases and all keys in them. Saving only a subset of these is not supported.
Instead, use two independent Redis instances and configure each to persist (or not) per your needs. The overhead of running an extra instance is a few megabytes, so it is practically negligible.
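If splitting into two instances really isn't an option, one possible workaround (not a substitute for real persistence) is an application-level export of just the small config DB. A rough sketch with redis-py, where the output file name and connection details are made up:

    # Rough sketch: export only DB 0 (the small config DB) to a versioned
    # file using DUMP, which serializes any key type.  Connection details
    # and the output file name are hypothetical.
    import base64
    import json

    import redis

    r = redis.Redis(host="localhost", port=6379, db=0)

    snapshot = {}
    for key in r.scan_iter(count=100):
        dumped = r.dump(key)              # binary, type-agnostic payload
        if dumped is not None:
            snapshot[key.decode()] = base64.b64encode(dumped).decode()

    with open("config-db0-snapshot.json", "w") as f:
        json.dump(snapshot, f, indent=2, sort_keys=True)

    # A key can later be replayed with:
    #   r.restore(key_name, 0, base64.b64decode(payload), replace=True)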

Jackrabbit repository incremental backup

I'm using Jackrabbit v2.2.x. I want to know if there is a way to take an incremental backup of a Jackrabbit repository, i.e. just the delta (difference) based on date or something else. The problem is that the repository size is in terabytes, and every time we have to take production data it takes a lot of time to copy the full repository.
If the storage backend supports incremental backups, an incremental low-level backup might be the easiest solution.
If not, you could possibly use the EventJournal to iterate over the changes since the last backup and back up only those changes. Most likely this will require more work, however.
Another solution is to do an incremental backup of the data store (if this is what uses most of the disk space), and do a full backup of the node data (persistence managers).
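For the data store side, the incremental copy can be as simple as picking up only the files added since the previous run, since data store entries are written once and then never modified. A rough sketch with hypothetical paths, assuming a local FileDataStore:

    # Rough sketch: incremental backup of a Jackrabbit FileDataStore
    # directory by copying only files newer than the previous run.
    # Paths are hypothetical; node data (persistence managers) still
    # needs its own full backup.
    import os
    import shutil
    import time

    DATASTORE = "/opt/jackrabbit/repository/datastore"
    BACKUP = "/backup/jackrabbit/datastore"
    MARKER = "/backup/jackrabbit/.last-backup-timestamp"

    last_run = 0.0
    if os.path.exists(MARKER):
        with open(MARKER) as f:
            last_run = float(f.read().strip())

    start = time.time()
    for root, _dirs, files in os.walk(DATASTORE):
        for name in files:
            src = os.path.join(root, name)
            if os.path.getmtime(src) >= last_run:
                dst = os.path.join(BACKUP, os.path.relpath(src, DATASTORE))
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)

    with open(MARKER, "w") as f:
        f.write(str(start))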

SQL Server 2008 differencing databases

So here is the deal:
Background
A Hyper-V VM supports a differencing disk mode, where one can set the original VHD file to a read-only state and create a new VHD which keeps track of, and persists, the changes. The advantage here is you can easily create new VMs without having to reinstall Windows, etc.
Problem
What I am looking for is something similar, but for SQL Server databases. We do all of our development locally, and then we have a box that has X instances running on it (one for each developer). We then have a process which copies the production backups that are made and restores them to these instances. After this is complete, it checks out a branch of SQL scripts that a developer chooses and runs the scripts against the instance. This way they can test their code on production data prior to it actually hitting production. However, it is a real pain to keep a copy of all our production DBs for each instance; it would be nice to have one set of them and have a differencing option which just persists the changes made. Is this possible or am I dreaming?
Possible solution
One solution I thought of is just to use an actual differencing-disk VHD. I would create a base VHD that has our production backup databases, which would be modified/created nightly from the production databases. I would then have it modify/create the differencing disks and apply the scripts to each differencing disk. This way we have one copy of the DBs, and each developer's changes are recorded to a separate differencing disk. However, I was hoping to accomplish this in SQL Server.
Basically, the conclusion I have come to is to try to automate the differencing-disk process as below:
1. Create a new VHD on a network share; we'll call this NAS1.
2. Mount the VHD from NAS1 on a machine that acts as a SQL processor (we'll call this SQLPROCESS1).
3. SQLPROCESS1 performs the following actions:
   - Copy the BAK SQL files from production to SQLPROCESS1 (this might take a while, but this entire step 3 could be put into a threaded application, so it could be copying and restoring multiple backups at the same time).
   - Restore the files on SQLPROCESS1 and point the data files (mdf, ldf) to reside on the new VHD.
   - Optional: Switch the SQL DBs to the SIMPLE recovery model and use SHRINKFILE, since we'll be using them solely for development (and don't need backups). This can save us a lot of space.
   - Detach all DBs.
   - Detach the VHD.
4. Create a differencing disk from the parent on NAS1.
5. Copy the differencing disk X number of times (as needed per instance or developer).
6. Optional: We use a central server called TEST1 for testing, and this is where we are going to mount each differencing disk, one per instance or developer:
   - We'll first need to detach all DBs from each instance.
   - Then we'll need to unmount/detach the existing differencing VHDs, if there are any.
   - Attach the differencing disk(s).
   - Reattach all DBs in SQL Server.
   - Optional: Run SQL scripts from a code repository branch as specified per developer.
References:
http://obligatorymoniker.wordpress.com/2010/08/21/how-to-create-a-differencing-vhd-that-refers-to-a-parent-vhd-that-is-on-a-network-share-from-windows-7/
To automate I'd use a simple set of batch files, VBS, or PowerShell.
Edit: Just tried this and it works great! Developers now have their own instance and it only records their changes.
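As a rough sketch of that automation (batch, VBS, or PowerShell would work just as well), here is roughly what the create-and-attach step for one developer looks like when shelled out from a script. All paths, instance names, drive letters, and database names below are made up:

    # Rough sketch: create a differencing VHD from the parent on NAS1,
    # attach it, and attach the databases to a developer's instance.
    # Paths, instance names, drive letters and DB names are hypothetical.
    import subprocess
    import tempfile

    PARENT_VHD = r"\\NAS1\vhds\prod-dbs-parent.vhd"
    CHILD_VHD = r"D:\diffs\dev1-prod-dbs.vhd"
    INSTANCE = r".\DEV1"

    def run_diskpart(script: str) -> None:
        """Feed a script to diskpart via a temp file (diskpart /s)."""
        with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
            f.write(script)
            path = f.name
        subprocess.run(["diskpart", "/s", path], check=True)

    # 1. Create the differencing disk and attach it.
    run_diskpart(f'''
    create vdisk file="{CHILD_VHD}" parent="{PARENT_VHD}"
    select vdisk file="{CHILD_VHD}"
    attach vdisk
    ''')

    # 2. Attach the databases that live on the mounted VHD (assuming its
    #    volume comes up as drive V:).
    attach_sql = (
        "CREATE DATABASE ProdCopy ON "
        "(FILENAME = 'V:\\Data\\ProdCopy.mdf'), "
        "(FILENAME = 'V:\\Data\\ProdCopy_log.ldf') "
        "FOR ATTACH"
    )
    subprocess.run(["sqlcmd", "-S", INSTANCE, "-E", "-Q", attach_sql],
                   check=True)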

Performance Impact of Empty file by migrating the data to other files in the same filegroup

We have a database, currently sitting on 15,000 RPM drives, that is simply a logging database, and we want to move it to 10,000 RPM drives. While we could easily detach the database, move the files, and reattach it, that would cause a minor outage that we're trying to avoid.
So we're considering using DBCC SHRINKFILE with EMPTYFILE. We'll create a data file and a transaction log file on the 10,000 RPM drive, slightly larger than the existing files on the 15,000 RPM drive, and then execute DBCC SHRINKFILE with EMPTYFILE to migrate the data.
What kind of impact will that have?
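For reference, the planned operation boils down to statements like the ones below; a minimal sketch driven from pyodbc, where the database, logical file names, and paths are made up:

    # Rough sketch: add a data file on the new (10K RPM) volume, drain the
    # old file into it with EMPTYFILE, then drop the old file.  Database,
    # logical file names, and paths are hypothetical; the log file is
    # handled separately.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=.;DATABASE=LoggingDB;"
        "Trusted_Connection=yes", autocommit=True)
    cur = conn.cursor()

    # 1. Add a new data file on the 10K RPM volume (E:), sized slightly
    #    larger than the file it will replace.
    cur.execute("""
        ALTER DATABASE LoggingDB
        ADD FILE (NAME = LoggingDB_data2,
                  FILENAME = 'E:\\Data\\LoggingDB_data2.ndf',
                  SIZE = 50GB)
    """)

    # 2. Move all pages out of the old data file.  This can run for a long
    #    time and competes with user queries for I/O and locks.
    cur.execute("DBCC SHRINKFILE (LoggingDB_data, EMPTYFILE)")

    # 3. Once empty, the old file can be dropped (not possible for the
    #    primary file of the primary filegroup).
    cur.execute("ALTER DATABASE LoggingDB REMOVE FILE LoggingDB_data")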
I've tried this and had mixed luck. I've had instances where the file couldn't be emptied because it was the primary file in the primary filegroup, but I've also had instances where it's worked completely fine.
It does hold huge locks in the database while it's working, though. If you're trying to do it on a live production system that's got end user queries running, forget it. They're going to have problems because it'll take a while.
Why not use log shipping? Create a new database on the 10,000 RPM disks and set up log shipping from the DB on the 15K RPM disks to the DB on the 10K RPM disks. When both DBs are in sync, stop log shipping and switch over to the database on the 10K RPM disks.
Is this a system connected to a SAN, or is it direct-attached storage? If it's a SAN, do a SAN-side migration to the new RAID group and the server won't ever know there was a change.