Linking Redis database with a dump.rdb or dump.json file - redis

Given a snapshot of an existing redis database in a dump.rdb (or in .json format) file, I want to restore this data in my own machine to run some tests on it.
Any pointers on how to do this would be greatly appreciated.
I have resorted to trying to parse the data in the dump.rdb and then save it in a redis DB manually. I feel like there is/should be a cleaner way.

If you want to restore the entire file, simply copy it to the right directory specified in redis.conf and restart redis server. But if you want to load a subset of keys/databases, you'd have to parse the dump file.

SO:
I continued doing it the "hacky" way and found that using the parser code found here:
https://github.com/sripathikrishnan/redis-rdb-tools was a great help.
using the parser sample code i could:
1) set up a redis client
2) use the parser to parse the data
3) use the client to "set" parsed data into a new redis database.

the rdd tools can also do that,
it work independantly of .rdb files and dump/restore working redis instances
it can apply merge, split, rename, search, filter, insert, delete on dumps and/or redis

Related

Use RStudio to connect to, and run queries on, a locally stored, compressed SQL databse

I'm trying to connect to and run queries on two large, locally-stored SQL databases with file extensions like so:
filename.sql.zstd.part
filename2.sql.zstd
My preference is to use the RMySQL package- however i am finding it hard to find documentation of a) how to access locally stored SQL files, and b) how to deal with the zstd extension.
This may be very basic but help is appreciated!
Seems like you have problems understanding the file extensions.
filename.sql.zstd.part
.part usually means you are downloading a file from the internet, but the download isn't complete yet (so downloads that are in progress or have been stopped)
So to get from filename.sql.zstd.part to filename.sql.zstd you need to complete your download
.zstd means it is a compressed file (to save disk space). You need a decompression program to get from filename.sql.zstd to filename.sql
The compression algorithm used is called Zstandard so you need a decompressor specifically for this program. Look here https://facebook.github.io/zstd/ for such a program.
There was also once an R package for this - but it has been archived. But you could also download an older version
(https://cran.r-project.org/web/packages/zstdr/index.html)
In filename.sql is actually not a database. In an .sql file are usually SQL statements for creating / modifying database structures. You'd have to install a database e.g. MariaDB and then import this .sql file to actually really have the files in a database on your computer. And then you would access this database via R.

How to Recover Redis from Multiple RDB Files

The multiple rdb files are from different redis servers. Now I want to combine the data files to a single redis server. By far I only find the answers to recover with a single dump.rdb file.
The simplest way to do this is by using DEBUG RELOAD, an undocumented command.
DEBUG RELOAD [MERGE] [NOFLUSH] [NOSAVE]
Save the RDB on disk and reload it back in memory. By default it will
save the RDB file and load it back.
With the NOFLUSH option the current database is not removed before loading the new one, but
conficts in keys will kill the server with an exception.
When MERGE is
used, conflicting keys will be loaded (the key in the loaded RDB file
will win).
When NOSAVE is used, the server will not save the current
dataset in the RDB file before loading.
Use DEBUG RELOAD NOSAVE when
you want just to load the RDB file you placed in the Redis working
directory in order to replace the current dataset in memory.
Use DEBUG RELOAD NOSAVE NOFLUSH MERGE when you want to add what is in the
current RDB file placed in the Redis current directory, with the
current memory content.
Use DEBUG RELOAD when you want to verify Redis
is able to persist the current dataset in the RDB file, flush the
memory content, and load it back.",
The above is taken from debug.c, applied friendly format.
So, use DEBUG RELOAD NOSAVE NOFLUSH if you want to ensure there are no duplicate keys in different RDBs. Use DEBUG RELOAD NOSAVE NOFLUSH MERGE if you know you have duplicates, load last the one you want to prevail.

How to recover HDB with tplogs?

Our system is currently backing up tplogs to S3. From what I have read, simply making sure these files are in the place that kdb expects them will allow for recovery if there is an issue with RDB during the day.
However, I did not see an explanation of how to use the tplogs to recover HDB. I tempted to create another backup system to sync the hdb folders to S3 also. That will be more work to set up and use at least double the storage, as well as being redundant. So if its not necessary then I would like to avoid that extra step.
Is there a way to recover the HDB from the tplogs in the event that we lose access to our HDB folders, or do I need to add another backup system for the HDB folders? Thanks.
To replay log file to HDB.
.Q.hdpf[`::;get `:tpLOgFile;.z.d;`sym]
As per my experience if you are building a HDB from TP logfile load tp log file using get function and save it using dpft that is efficient.
If you want to use -11! function then you have to provide a upd function(-11! read each row from tp log file and call upd function then insert data to in memory table) to load data in memory and then save data on disk.
In both case you have to load data in memory but by using get function you can skip upd function call
-11! function is efficient for building the RDB because it will not load the full log file.
For more details read Below link http://www.firstderivatives.com/downloads/q_for_Gods_July_2014.pdf
OK, actually found a forum answer to a similar question, with a script for replaying log files.
https://groups.google.com/forum/#!topic/personal-kdbplus/E9OkvJKGrLI
Jonny Press says:
The usual way of doing it is to use -11! to replay the log file. A basic script would be something like
// load schema
\l schema.q
// define upd
upd:insert
// replay log file
-11!`:schema2015.07.09
// save
.Q.hdpf[`::;`:hdb;2015.07.09;`sym]
This will read the full log file into memory. So you will need to have RAM available.
TorQ has a TP log replay script:
https://github.com/AquaQAnalytics/TorQ/blob/master/code/processes/tickerlogreplay.q

Large file to LookUp other large file when files are dependent- Mule ESB

Could you please suggest. I have two files each have 80 to 90k product and these two files are interlinked with each other(one file have information on other) and i need to generate one single file by looking up the other files. These files probably comes in the sameTime with different name.
Both the files are csv and i need to generate the new csv.
Is that the only way I should keep any one of these files in memory and keep looking by iterating.
I planned to use Batch inside dataMapper. Is there any way we can keep the first file in Datamapper userDefined table or something like that.And the getting the new file to make a look up on it.( I'm not provided with external DB)
If any one of the file have some 5000 or 10k lines it the sense, i can keep that in memory and make the 80k file to look on it. I'm not comfortable to keep 80 or 90k file in memory.
Have reference this link: Mule ESB - design a multi file processing flow when files are dependent on each other.
Could you please suggest me the best solution.
Also any idea How long to process the file it does take, Thanks in advance.
Mule studio:5.3.1 and Runtime: 3.7.2
I would think of the problem as two distinct events from Mule's perspective, and plan to keep state from the first one in a "database" of some kind. This doesn't have to be an Oracle cluster or anything, you can run H2 in process or Redis on the same server as Mule for example.
I think you're on the right track with the Batch idea. When the first file is received, I'd create a record for each in a batch job. Then when the second file is received, I'd run a second batch job that looks up the relevant information from the database, and generates the CSV file you need. It could also remove the records that have been matched from the database in a subsequent batch step.
For the transformations, I'd recommend trying DataWeave instead of DataMapper. It's a better way to write transformation logic, and Mulesoft has deprecated DataMapper, to be removed as of Mule 4.0.

Storing Drupal SQL in Git

I have a drupal site, and I am storing the codebase in a git repository. This seems to be working out well, but I'm also making changes to the database. I'm considering doing periodic dumps of the database and committing to git. I had a few questions about this.
If I overwrite the file, will git think it is a brand new file or will it recognize that it is an altered version of the same file.
Will this potentialy make my repo huge (the database is 16mb)
Can I zip this file? or will this mess Git up ... the zipped version is only 3mb
Any other suggestions?
If you have enough space, a non-compressed dump in source control is pretty handy because you can compare using a diff program what rows were added/modified/deleted.
Another solution is to use the features module which is supposed to capture drupal config in code. It stores this captured data as a feature module which you can put into version control.
For my database applications, I store scripts of DDL statements (like CREATE TABLE) in some sort of version control system. These scripts sometimes include static "seed" data as well. All the version control systems I use are good at recognizing differences in these files, and they are much smaller than the full database with data.
For the dynamically-generated data, I store backups (e.g. from mysqldump) in an appropriate location (depending on the importance of the data, that may include offsite backups).
1) It's all text, so GIT will just see it as it would any other file.
2) No, due to the above it should add 16mb to the repo (or less, due to GITs own compression), it won't add a new file every time, just the changes, so the repo will change by the size of the additions to the repository
3) No, or GIT won't be able to see the differences - GIT does it's own compression anyway