How to Recover Redis from Multiple RDB Files

The RDB files come from different Redis servers, and I want to combine them into a single Redis server. So far I have only found answers that explain how to recover from a single dump.rdb file.

The simplest way to do this is by using DEBUG RELOAD, an undocumented command.
DEBUG RELOAD [MERGE] [NOFLUSH] [NOSAVE]
Save the RDB on disk and reload it back in memory. By default it will save the RDB file and load it back.
With the NOFLUSH option the current database is not removed before loading the new one, but conflicts in keys will kill the server with an exception.
When MERGE is used, conflicting keys will be loaded (the key in the loaded RDB file will win).
When NOSAVE is used, the server will not save the current dataset in the RDB file before loading.
Use DEBUG RELOAD NOSAVE when you want just to load the RDB file you placed in the Redis working directory in order to replace the current dataset in memory.
Use DEBUG RELOAD NOSAVE NOFLUSH MERGE when you want to add what is in the current RDB file placed in the Redis current directory, with the current memory content.
Use DEBUG RELOAD when you want to verify Redis is able to persist the current dataset in the RDB file, flush the memory content, and load it back.
The above is taken from debug.c, reformatted for readability.
So, use DEBUG RELOAD NOSAVE NOFLUSH if you want to be sure there are no duplicate keys across the RDB files (a conflict will kill the server rather than silently overwrite a key). Use DEBUG RELOAD NOSAVE NOFLUSH MERGE if you know you have duplicates, and load last the file whose values you want to prevail.
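As a rough illustration of the sequence described above, the following Python sketch copies each RDB file over the server's own RDB path and then issues the reload. It assumes redis-cli is on the PATH and reaches the target server with its default connection settings; the function names and the target argument are made up for this example. Since DEBUG RELOAD is a debugging command, it is probably worth trying on a scratch instance first.

```python
import shutil
import subprocess
from typing import List


def reload_command(merge: bool) -> List[str]:
    """Build the redis-cli arguments that load the RDB file without flushing."""
    args = ["redis-cli", "DEBUG", "RELOAD", "NOSAVE", "NOFLUSH"]
    if merge:
        # MERGE: on a key conflict, the key from the loaded RDB file wins.
        args.append("MERGE")
    return args


def merge_rdb_files(rdb_files: List[str], target: str, merge: bool = True) -> None:
    """Load each RDB file in turn into the running server.

    target is assumed to be the path the server loads from, i.e. its
    configured `dir` joined with its `dbfilename` (hypothetical parameter).
    """
    for rdb in rdb_files:
        # Put the next file where the server expects its RDB file to be.
        shutil.copyfile(rdb, target)
        # Ask the server to load it on top of what is already in memory.
        subprocess.run(reload_command(merge), check=True)
```

With NOSAVE nothing is written back to disk, so after the last reload the on-disk RDB file still holds only the last file copied. Running SAVE (or BGSAVE) afterwards persists the merged dataset.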

Related

How to recover HDB with tplogs?

Our system is currently backing up tplogs to S3. From what I have read, simply making sure these files are in the place that kdb expects them will allow for recovery if there is an issue with RDB during the day.
However, I did not see an explanation of how to use the tplogs to recover the HDB. I am tempted to create another backup system to sync the HDB folders to S3 as well. That would be more work to set up and would use at least double the storage, as well as being redundant. So if it's not necessary, I would like to avoid that extra step.
Is there a way to recover the HDB from the tplogs in the event that we lose access to our HDB folders, or do I need to add another backup system for the HDB folders? Thanks.
To replay a log file into an HDB:
.Q.hdpf[`::;get `:tpLOgFile;.z.d;`sym]
In my experience, if you are building an HDB from a TP log file, loading the log file with the get function and saving it with dpft is efficient.
If you want to use -11!, you have to provide a upd function (-11! reads each message from the TP log file and calls upd, which inserts the data into an in-memory table) to load the data into memory, and then save the data to disk.
In both cases you have to load the data into memory, but by using get you can skip the upd function call.
-11! is efficient for building the RDB because it will not load the full log file.
For more details, see http://www.firstderivatives.com/downloads/q_for_Gods_July_2014.pdf
OK, actually found a forum answer to a similar question, with a script for replaying log files.
https://groups.google.com/forum/#!topic/personal-kdbplus/E9OkvJKGrLI
Jonny Press says:
The usual way of doing it is to use -11! to replay the log file. A basic script would be something like
// load schema
\l schema.q
// define upd
upd:insert
// replay log file
-11!`:schema2015.07.09
// save
.Q.hdpf[`::;`:hdb;2015.07.09;`sym]
This will read the full log file into memory. So you will need to have RAM available.
TorQ has a TP log replay script:
https://github.com/AquaQAnalytics/TorQ/blob/master/code/processes/tickerlogreplay.q

(OS X) Determine if file is being written to?

My app is monitoring a "hot" folder somewhere on the local filesystem for newly added files to push to a network location. I'm running into a problem when very large files are being written into the hot folder: the file system event notifying me of changes in the hot folder will fire well before the file completes writing. When my app tries to upload the file, it mis-reads the file size as the current number of copied bytes, not the eventual total number of bytes.
Things I've tried:
NSURL getResourceValue:forKey:error: to read NSURLAllocatedFileSizeKey (same value as NSURLFileSizeKey while the file is being written).
NSFileManager attributesOfItemAtPath:error: to look at NSFileBusy (always NO).
I can't seem to find any mechanism short of repeatedly polling a file for its size to determine if the file is finished copying and can be uploaded.
There aren't great ways to do this.
If you can be certain that the writer is using NSFileCoordinator, then you can also use that to coordinate your access to the file.
Likewise, if you're sure that the writer has opted in to advisory locking, you could try to open the file for shared access by calling open() with the O_SHLOCK and O_NONBLOCK flags. If you succeed, then there are no other descriptors open for exclusive access. You can either use the file descriptor you've got or close it and then use some other API to access the file.
However, if you can't be sure of any of those, then your best bet may be to set a timer to repeatedly check the file's metadata (size, date modified, etc.). Only when you see that it has stopped changing over a reasonable time interval (2 seconds, maybe) would you attempt to access it (and cancel the timer).
You might want to do all three. Wait for the file's metadata to settle down, then use an NSFileCoordinator to read from the file. When it calls your reader block, use open() with O_SHLOCK | O_NONBLOCK to make sure no other process has exclusive access to it.
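The polling fallback described above can be sketched in a few lines. This is only an illustration in Python rather than Cocoa: the function name, the parameter names, and the default intervals are assumptions, and a writer that pauses for longer than the quiet period will still fool it.

```python
import os
import time


def wait_until_stable(path, quiet_seconds=2.0, poll_interval=0.5, timeout=60.0):
    """Return True once size and mtime have stopped changing for quiet_seconds.

    Returns False if the file never settles before the timeout expires.
    """
    deadline = time.monotonic() + timeout
    last_seen = None
    stable_since = time.monotonic()
    while time.monotonic() < deadline:
        st = os.stat(path)
        current = (st.st_size, st.st_mtime_ns)
        now = time.monotonic()
        if current != last_seen:
            # Metadata changed (or first sample): restart the quiet period.
            last_seen = current
            stable_since = now
        elif now - stable_since >= quiet_seconds:
            return True
        time.sleep(poll_interval)
    return False
```

A caller would run this once per file-system event and only start the upload when it returns True.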
You need some form of coordinated file locking.
fcntl() and flock() are common functions for this.
Read up on it first.
Then see what options you have.
If you can control the code base of those other processes, all the better.
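A minimal sketch of the advisory-locking idea, using flock() through Python's fcntl module. It only helps if the writing process also holds a flock()-style exclusive lock while it writes, which is an assumption about the writer; try_shared_lock is a made-up helper name.

```python
import fcntl
import os


def try_shared_lock(path):
    """Try to take a shared advisory lock on path without blocking.

    Returns an open file descriptor holding the lock, or None if another
    open file description currently holds an exclusive lock.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd
```

The caller owns the returned descriptor: read through it or close it (closing releases the lock) before handing the path to another API.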
The problem with really large files is that what's changed or changing inside them is opaque and isn't always at the end.
Good processes should generally be doing atomic writes (write to a temp file, then swap it into place), but if these files are actually databases then you will want to look at using the db's server app for this sort of thing.
If the files are wrappers containing other files then it gets extra messy as those contents might have dependencies on one another to be in a usable state.
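For reference, the "write to a temp file then swap it out" pattern looks roughly like this on the writer's side. This is a sketch in Python, assuming you control the writing process; atomic_write and the ".tmp-" prefix are made up for the example, and a folder watcher would need to ignore files with that prefix.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write bytes to a temp file in the same directory, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory means same filesystem, so the final rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Readers see either the old file or the complete new one.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temp file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```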

IOMeter doesn't write log files due to full disk

Does anyone have a workaround for IOMeter not writing logs to disk? I believe this is because the iobw.tst file takes up the whole disk. I have had the test running, then manually created a temporary 1MB file while the disk was filling up, then deleted that 1MB file after the disk is full and while the reads and writes are being performed and this consistently produces the full log file for the test. Similarly, clearing the Recycle Bin or temporary files at this time produces the same result.
Does anyone know of a way to reserve this space for the logfile using a configuration file or something along these lines? IOMeter is part of an automated suite of tests that I'm working on and this issue is preventing full automation.
You have to compile Dynamo with the "DETAILS" and/or "DEBUG" flags turned on.
Dynamo will then write all the info to the ~/std.out log (if you're on Linux).

Can I use an rsync log file from a dry run as an input to a real run?

I like rsync. I can see what files will be deleted first. But what happens if during the backup, a sector of the source disk fails? Files could be deleted from the destination that should not be. However, if I check the log file for all deletion files first, then use the log file as instructions to rsync, then a source disk failure during backup should result in a lower probability of data loss.
I've read the man page and have to conclude that the answer is no. If not rsync, then what?
You can mitigate source disk failure risk using
--delete-after receiver deletes after transfer, not during
That will not delete files if an I/O error occurs during the copy.
But to ensure the integrity of your backup, I think the right way is to use:
--only-write-batch=FILE like --write-batch but w/o updating destination
That writes the diffs into a batch file. Once the batch is created, move it to the destination machine and apply the diffs with:
--read-batch=FILE read a batched update from FILE
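The two-step batch workflow above could be scripted along these lines. This is a sketch that only builds the argument lists; the helper names and the default option set (-a --delete) are assumptions, and the batch should be applied to the same destination tree that was used when it was created.

```python
def write_batch_command(src, dest, batch_file, options=("-a", "--delete")):
    """Build the rsync call that records changes to batch_file without touching dest."""
    return ["rsync", *options, f"--only-write-batch={batch_file}", src, dest]


def read_batch_command(dest, batch_file, options=("-a", "--delete")):
    """Build the rsync call that applies a previously recorded batch to dest."""
    return ["rsync", *options, f"--read-batch={batch_file}", dest]
```

Each list can be passed to subprocess.run(..., check=True). Reviewing the output of the first step before running the second gives you a chance to inspect what would change, including deletions.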

Linking Redis database with a dump.rdb or dump.json file

Given a snapshot of an existing redis database in a dump.rdb (or in .json format) file, I want to restore this data in my own machine to run some tests on it.
Any pointers on how to do this would be greatly appreciated.
I have resorted to trying to parse the data in the dump.rdb and then save it in a redis DB manually. I feel like there is/should be a cleaner way.
If you want to restore the entire file, simply copy it to the right directory specified in redis.conf and restart redis server. But if you want to load a subset of keys/databases, you'd have to parse the dump file.
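To find "the right directory", you can read the dir and dbfilename settings from redis.conf. The following Python sketch does that with a deliberately simple parser; rdb_path_from_conf is a made-up helper, and it assumes plain "key value" lines with no include directives. A relative dir is relative to the directory the server was started from.

```python
from pathlib import Path


def rdb_path_from_conf(conf_text):
    """Return the RDB path implied by the `dir` and `dbfilename` settings."""
    # Redis defaults when the settings are absent from the config file.
    settings = {"dir": "./", "dbfilename": "dump.rdb"}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].lower() in settings:
            # Values may be quoted in redis.conf; drop surrounding quotes.
            settings[parts[0].lower()] = parts[1].strip().strip('"')
    return Path(settings["dir"]) / settings["dbfilename"]
```

Stop the server before copying your snapshot to that path, since Redis may save on shutdown and overwrite the file. If append-only mode is enabled, the server loads the AOF at startup instead of the RDB file.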
So:
I continued doing it the "hacky" way and found that using the parser code found here:
https://github.com/sripathikrishnan/redis-rdb-tools was a great help.
Using the parser sample code I could:
1) set up a redis client
2) use the parser to parse the data
3) use the client to "set" parsed data into a new redis database.
The rdd tools can also do that.
It works independently of .rdb files and can dump and restore running Redis instances.
It can apply merge, split, rename, search, filter, insert, and delete operations on dumps and/or Redis.