We've started using the Periodic Backup bundle in RavenDB to export data from our Server to an Azure Blob Storage Container incrementally.
The source database uses the Encryption bundle to safeguard data at rest, but after testing a restore it's clear that the backup files either do not encrypt the data or keep a copy of the key in the backup file for restoration.
How can we configure RavenDB to encrypt backups, periodic or otherwise, so they remain safeguarded at rest?
Periodic Backup uses an export method to do that, which stores the data in clear text.
An actual backup (Raven.Backup.exe) will store the data in encrypted form, but will include the key in the backup (there is an option to prevent that) so that you can restore the backup on another machine.
To expand on @Ayende Rahien's answer, use Raven backup instead of export.
If you need to keep the encryption key out of the backup, send the Raven backup request (with curl, for example) with a DatabaseDocument that contains an empty SecuredSettings property.
For example:
curl -X POST "http://localhost:8080/databases/Northwind/admin/backup?incremental=false" \
-d '{"BackupLocation":"c:\\temp\\backup\\Northwind","DatabaseDocument":{"SecuredSettings":{},"Settings":{"Raven/ActiveBundles":"Encryption"},"Disabled":false,"Id":null}}'
Here's the posted data in a readable format:
{
  "BackupLocation": "c:\\temp\\backup\\Northwind",
  "DatabaseDocument": {
    "SecuredSettings": {}, // <- EncryptionKey not present here.
    "Settings": {
      "Raven/ActiveBundles": "Encryption"
    },
    "Disabled": false,
    "Id": null
  }
}
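If shell escaping gets in the way, the same request can be sent from Python with the requests library; this is just a sketch that mirrors the curl call above (the URL, database name and backup path come from that example - adjust them for your own server):
import requests

# Mirrors the curl example above.
payload = {
    "BackupLocation": r"c:\temp\backup\Northwind",
    "DatabaseDocument": {
        "SecuredSettings": {},  # left empty so no encryption key ends up in the backup
        "Settings": {"Raven/ActiveBundles": "Encryption"},
        "Disabled": False,
        "Id": None,
    },
}

response = requests.post(
    "http://localhost:8080/databases/Northwind/admin/backup",
    params={"incremental": "false"},
    json=payload,
)
response.raise_for_status()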
Our system is currently backing up tplogs to S3. From what I have read, simply making sure these files are in the place kdb+ expects them will allow for recovery if there is an issue with the RDB during the day.
However, I did not see an explanation of how to use the tplogs to recover the HDB. I am tempted to create another backup system to sync the HDB folders to S3 as well, but that would be more work to set up, use at least double the storage, and be redundant, so if it's not necessary I would like to avoid that extra step.
Is there a way to recover the HDB from the tplogs in the event that we lose access to our HDB folders, or do I need to add another backup system for the HDB folders? Thanks.
To replay a log file into the HDB:
value each get `:tpLogFile        / replay the log in memory (upd must be defined, e.g. upd:insert)
.Q.hdpf[`::;`:hdb;.z.d;`sym]      / write all in-memory tables to today's HDB partition and reload
In my experience, if you are building an HDB from a TP log file, it is efficient to load the log file using the get function and save it with .Q.dpft.
If you want to use the -11! function, you have to provide an upd function (-11! reads each message from the TP log file and calls upd, which inserts the data into the in-memory tables) to load the data into memory, and then save the data to disk.
In both cases you have to load the data into memory, but by using the get function you can skip the upd function calls.
The -11! function is efficient for rebuilding the RDB because it streams the log file message by message rather than materializing the whole file as a single object in memory.
For more details, see http://www.firstderivatives.com/downloads/q_for_Gods_July_2014.pdf
OK, I actually found a forum answer to a similar question, with a script for replaying log files.
https://groups.google.com/forum/#!topic/personal-kdbplus/E9OkvJKGrLI
Jonny Press says:
The usual way of doing it is to use -11! to replay the log file. A basic script would be something like
// load schema
\l schema.q
// define upd
upd:insert
// replay log file
-11!`:schema2015.07.09
// save
.Q.hdpf[`::;`:hdb;2015.07.09;`sym]
This will read the full log file into memory, so you will need to have enough RAM available.
TorQ has a TP log replay script:
https://github.com/AquaQAnalytics/TorQ/blob/master/code/processes/tickerlogreplay.q
Given a snapshot of an existing Redis database in a dump.rdb file (or in .json format), I want to restore this data on my own machine to run some tests on it.
Any pointers on how to do this would be greatly appreciated.
I have resorted to trying to parse the data in the dump.rdb and then save it into a Redis DB manually. I feel like there should be a cleaner way.
If you want to restore the entire file, simply copy it to the data directory specified in redis.conf (the dir setting, named according to dbfilename) and restart the Redis server. But if you want to load only a subset of keys/databases, you'd have to parse the dump file.
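If you're unsure where that directory is, you can ask a running server; a small sketch using the redis-py client (host and port are assumptions):
import redis

r = redis.Redis(host="localhost", port=6379)
print(r.config_get("dir"))         # e.g. {'dir': '/var/lib/redis'}
print(r.config_get("dbfilename"))  # e.g. {'dbfilename': 'dump.rdb'}
# Stop the server, copy your snapshot to <dir>/<dbfilename>, then start
# the server again so it loads the file on boot.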
So:
I continued doing it the "hacky" way and found that the parser code at
https://github.com/sripathikrishnan/redis-rdb-tools was a great help.
Using the parser sample code I could (see the sketch after this list):
1) Set up a Redis client.
2) Use the parser to parse the data.
3) Use the client to "set" the parsed data into a new Redis database.
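Roughly, that approach looks like the sketch below. It only handles plain strings and hashes to keep it short, assumes the rdbtools and redis packages are installed, and follows the callback names from the redis-rdb-tools examples - check them against the version you install.
import redis
from rdbtools import RdbParser, RdbCallback

class RestoreCallback(RdbCallback):
    """Write entries parsed from dump.rdb into a live Redis instance."""
    def __init__(self, client):
        super(RestoreCallback, self).__init__(string_escape=None)
        self.client = client

    def set(self, key, value, expiry, info):
        self.client.set(key, value)          # plain string keys

    def hset(self, key, field, value):
        self.client.hset(key, field, value)  # hash entries

    # Similar callbacks exist for sets, lists and sorted sets
    # (sadd, rpush, zadd) - add them as needed.

client = redis.Redis(host="localhost", port=6379, db=0)  # assumed target instance
RdbParser(RestoreCallback(client)).parse("dump.rdb")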
The rdd tool can also do that.
It works independently of .rdb files and can dump/restore working Redis instances.
It can apply merge, split, rename, search, filter, insert and delete operations on dumps and/or a running Redis.
Dropbox claims that during syncing only the portions of files that change are transmitted back to the main server, which is obviously great functionality, but how do they apply those changes to files stored in Amazon S3?
For example, say a 30-page document on a user's desktop contains changes to only page 4. Dropbox syncs the blocks representing the changes, but what happens on the backend if the files are stored in the cloud? Does that mean they have to download the 30-page document from S3 to their server, replace the blocks representing page 4, and then upload it back to the cloud? I doubt this is the case, because that would be rather inefficient. The other option I could think of is that Amazon S3 supports updating a stored file by byte range, so that, for example, a PUT request against bytes 100-200 of file X replaces those bytes with the body of the request.
So I was curious how companies that use cloud services such as Amazon implement this type of syncing.
Thanks
As S3 and similar storages don't offer filesystem capabilities, anything that pretends to store files and directories needs to emulate a file system. When doing this, files are often split into pages of a certain size, with each page stored as a separate object in the storage. This way a changed block requires uploading only one page (for example) and not the whole file.
I should note that with files like office documents this approach can break down if the file size changes - for example, if you insert a page at the beginning or delete a page, then the whole file shifts and the complete file would need to be re-uploaded. We didn't analyze how Dropbox in particular does its job; I have just described the common scenario. There also exist various "patch algorithms", where a patch can be created locally (if Dropbox has an older local copy in the cache) and then applied to one or more blocks on the server.
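To illustrate the page/block idea (this is not Dropbox's actual implementation, just a sketch), the snippet below splits a file into fixed-size blocks, names each S3 object after the block's content hash, and uploads only the blocks the bucket doesn't already have; the bucket name and block size are assumptions.
import hashlib
import boto3
from botocore.exceptions import ClientError

BLOCK_SIZE = 4 * 1024 * 1024   # 4 MB pages (assumed)
BUCKET = "my-sync-bucket"      # assumed bucket name

s3 = boto3.client("s3")

def upload_changed_blocks(path):
    """Upload only the blocks of `path` that the bucket doesn't have yet."""
    manifest = []  # ordered block hashes, needed later to reassemble the file
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            digest = hashlib.sha256(block).hexdigest()
            manifest.append(digest)
            key = "blocks/" + digest
            try:
                s3.head_object(Bucket=BUCKET, Key=key)  # block already stored?
            except ClientError:
                s3.put_object(Bucket=BUCKET, Key=key, Body=block)
    return manifest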
There are several synchronizing tools which transfer deltas over the wire, like rsync, rdiff, rdiff-backup, etc. For bi-directional synchronizing with S3 there are paid services such as s3rsync. For pure client-side synchronizing, tools like zsync can be considered (which is what many people use to roll out app updates).
An alternative approach would be to tarball a directory, generate a delta file (using rdiff or xdelta3), and upload the delta file using a timestamp as part of the key. In order to sync, all you need to do is perform these 2 checks client-side:
1) You have all the delta files from S3. If not, pull them and apply them to generate the latest backup state.
2) Your last backup state corresponds to your current directory. If not, generate a new delta file and push it to S3.
The main concern here is the at least 100% additional space utilization on the client side, but this approach also lets you revert changes if needed. A rough sketch of this workflow follows.
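A minimal sketch of that workflow, assuming the xdelta3 command-line tool is available and using boto3 for the uploads (the bucket name, file names and key layout are all assumptions):
import subprocess
import time
import boto3

BUCKET = "my-backup-bucket"  # assumed
s3 = boto3.client("s3")

def push_delta(previous_tarball, current_tarball):
    """Create a binary delta between two tarballs and upload it under a timestamped key."""
    delta_file = "backup-%d.xdelta" % int(time.time())
    # xdelta3 -e encodes a delta from the source (-s) to the target
    subprocess.check_call(["xdelta3", "-e", "-s", previous_tarball, current_tarball, delta_file])
    s3.upload_file(delta_file, BUCKET, "deltas/" + delta_file)

def apply_delta(previous_tarball, delta_file, restored_tarball):
    """Rebuild the newer tarball from the previous state plus a downloaded delta."""
    # xdelta3 -d decodes the delta against the same source
    subprocess.check_call(["xdelta3", "-d", "-s", previous_tarball, delta_file, restored_tarball])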
I'm writing a backup program for personal (for the moment at least) use.
For some directories (network directories / protected directories) credentials are needed to access them.
I can setup different jobs in the program to run at specific times.
These jobs are stored in an XML file.
I want to also store the usernames and passwords which the jobs will need.
What would be the best way to store these, and where?
Changing permissions on the directories is not an option.
Thanks in advance!
You should never store the logon password for a user in Windows in order to be able to access a local directory. Instead, your backup program should run as a user that has the SeBackupPrivilege enabled (i.e. run the backup from a service that runs as the local system). This means that you won't need to change the permissions.
You may also want to create a Volume Shadow Copy first and copy from that snapshot - don't copy directly from the live disk, since that may cause your backup to be inconsistent.
Also, you need to take special care for encrypted files and will need to use ReadEncryptedFileRaw for this.
You could execute the backup program as a scheduled task, running as a specific user.
As for storing passwords, you can keep them in IsolatedStorage and use two-way (reversible) encryption to make it harder for someone to decipher the file if they manage to find it.
Check out this SO question for implementing two-way encryption.
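The answer above is about .NET's IsolatedStorage, but as a language-agnostic sketch of the idea - keep the credentials out of the plain XML job file and store them reversibly encrypted, with the key held elsewhere - here is an example using Python's cryptography package (file names and the credential string are hypothetical):
from cryptography.fernet import Fernet

# Generate once and keep the key somewhere the backup service can read
# but ordinary users cannot (e.g. a file readable only by the service account).
key = Fernet.generate_key()
fernet = Fernet(key)

credentials = b"backupuser:S3cretPassw0rd"   # hypothetical credentials
token = fernet.encrypt(credentials)          # reversible ("two-way") encryption

with open("credentials.enc", "wb") as f:     # stored next to the job XML (assumed layout)
    f.write(token)

# At job run time, decrypt with the same key:
with open("credentials.enc", "rb") as f:
    print(fernet.decrypt(f.read()))          # b'backupuser:S3cretPassw0rd'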
How is authentication handled in CouchDB? Say I create Admin users and Readers, and assign them roles. Say also that I assign them to an individual database. On the file system level, is there a way for someone who is not authenticating, to look at the data that is stored in the database? Is the data stored as plain text in a file? How is this handled in CouchDB?
Through the database interface, roles are just as strong as they are in any other database. As long as an attacker can't get hold of the database files, it's as secure as your permissions and passwords. However, if they do get the files, there's no compression or encryption built into CouchDB. Encrypt the data in your code (or in your abstraction layer, if you use one) if file system access control is a concern - though of course anyone who gets hold of your DB filesystem could probably find your code's decryption keys as well.
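As a sketch of encrypting in the abstraction layer before the data ever reaches CouchDB (the database URL, document id and field name are assumptions, and the key must of course be stored away from the database files):
import requests
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in practice, load this from somewhere outside CouchDB
fernet = Fernet(key)

doc = {
    "type": "patient_record",                                # hypothetical document
    "ssn": fernet.encrypt(b"123-45-6789").decode("ascii"),   # sensitive field stored encrypted
}

# Store the document via CouchDB's HTTP API (assumed local, unauthenticated server).
requests.put("http://localhost:5984/mydb/record-001", json=doc).raise_for_status()

# Reading it back, only code holding the key can recover the field:
stored = requests.get("http://localhost:5984/mydb/record-001").json()
print(fernet.decrypt(stored["ssn"].encode("ascii")))         # b'123-45-6789'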
It's not a plain text file; it's a binary file that combines the data and indices. But you could copy it to a local CouchDB install and view it that way, or just open it in a good text editor. The data chunks are stored as plain text (JSON, actually) and aren't hard to read, though binary attachments remain binary.