How to migrate Redis database to Aerospike? - redis

We have a large redis database. The number of keys exploded recently as we have ~160M keys which take 50GB+ of RAM.
What would be the best migration strategy to move all this data from Redis to Aerospike? We are planning to use Jedis later so hopefully after the migration it will be as simple as pointing our services to a new port.
Ideally we can somehow import the dump.rdb file into Aerospike.

You need to put a little bit of extra work. Aerospike now supports Redis like list and map APIs. So, the migration will not be painful. However, you need to migrate your data and application.
To migrate data, you can export Redis data in csv format using the redis-cli utility and load it into aerospike using the aerospike csv loader utility. You can parallelize the loading if you split the data into multiple csv files.
To migrate the application, it's best to use aerospike native client library for better integration. You can pick language of your choice. You should find equivalent api for most of your needs. If you already abstracted the basic calls in your application, the migration should be even more smoother as there will be few places where you need to change the calls.

Related

How to export data into cdv from GemFire

I am trying to build a ETL process that extracts data out to GemFire and load in Teradata. However, I am not finding a good mechanism to export data out. The only thing I have found so far is rest api that gets all entries from a region. However, is this good for bulk export ? It will give data back in json which has to be parsed before loading in table, which I assume won't be very performant for large volume of data. Is there any other solution to this ? Like exporting data as csv from GemFire? Or ODBC/JDBC connection to GemFire ? I found both the bulk export and ODBC/JDBC in Gemfire XD documentation but not in core GemFire? So are they not supported in core GemFire? What is the difference between core GemFire and the XD version ?
These are two different products designed with different things in mind. GemFire XD provides a low-latency SQL interface to in-memory table data, so it's generally used as an in-memory RDBMS. GemFire, on the other hand, is an in-memory data grid with "no restrictions" regarding the data you insert into the regions, you basically deal with custom java objects, not with tables. Also, I know GemFire XD was built on top of GemFire in the past, not sure what's the current status of that (you might want to have a look at Snappy Data for more details).
That said, and strictly speaking of GemFire, you can export snapshots of your regions and import them afterwards into another cluster. Even better, you can read a snapshot entry by entry for further processing or transformation into other formats, which I believe is exactly what you're looking for. Please have a look at Cache and Region Snapshots to get the details.
Hope this helps. Cheers.

How do I distribute data into multiple nodes of redis cluster?

I have large number of key-value pairs of different types to be stored in Redis cache. Currently I use a single Redis node. When my app server starts, it reads a lot of this data in bulk (using mget) to cache it in memory.
To scale up Redis further, I want to set up a cluster. I understand that in cluster mode, I cannot use mget or mset if keys are stored on different slots.
How can I distribute data into different nodes/slots and still be able to read/write in bulk?
It's handled in redis client library. You need to find if a library exists with this feature in the language of your choice. For example, if you are using golang - per docs redis-go-cluster provides this feature.
https://redis.io/topics/cluster-tutorial
redis-go-cluster is an implementation of Redis Cluster for the Go language using the Redigo library client as the base client. Implements MGET/MSET via result aggregation.

How to handle index files in a distributed Lucene cluster?

We are using Lucene in our application, and the index files saved in the disk of the same server where the application run.
The index files are almost 2Gb at the moment, and they maybe updated sometime, for example, when new data are inserted into the database, we may have to rebuild that part of index and add them.
So far so good since there is only one application server, now we have to add another to make a cluster, so I wonder how to handle the index files?
BTW, out application should be platform independent, since our clients use different os like Linux, and some of them even use the cloud platform with different storage like Amazon EFS or Azure storage.
Seems I have two opinions:
1 Every server hold a copy of the index files, and the make them synchronized with each other.
But the synchronized mechanism will depend on the OS, we tried to avoid this. And I am not sure if it will cause conflict if two server update the index files with different documents at the sometime.
2 Make the index file shared.
Like 1), the file share mechanism is platform aware. Maybe save them to the database is an alternative, but how about the performance? I have thought to use memcached to save them, but I have not find any examples.
How do you handle this kind of problem?
Possibly you should look into Compass project. Compass allowed to store Lucene index in database and distributed in memory data grids like GigaSpaces, Coherence and Terracotta. Unfortunately this project is outdated and last version was released at 2009. But you can try adapt it for your propose.
Another option, to look at HdfsDirectory that support a storing a index in HDFS file systems. I see only 5 classes in package org.apache.solr.store.hdfs , so it will be relatively easy to adapt them to storing index into in-memory caches like memcached or redis.
Aslo I find a project on github for RedisDirectory, but it initial stage and last commit was at 2012. I can recommend it only for reference.
Hope this help you find a right solution.

Is this a good use-case for Redis on a ServiceStack REST API?

I'm creating a mobile app and it requires a API service backend to get/put information for each user. I'll be developing the web service on ServiceStack, but was wondering about the storage. I love the idea of a fast in-memory caching system like Redis, but I have a few questions:
I created a sample schema of what my data store should look like. Does this seems like it's a good case for using Redis as opposed to a MySQL DB or something like that?
schema http://www.miles3.com/uploads/redis.png
How difficult is the setup for persisting the Redis store to disk or is it kind of built-in when you do writes to the store? (I'm a newbie on this NoSQL stuff)
I currently have my setup on AWS using a Linux micro instance (because it's free for a year). I know many factors go into this answer, but in general will this be enough for my web service and Redis? Since Redis is in-memory will that be enough? I guess if my mobile app skyrockets (hey, we can dream right?) then I'll start hitting the ceiling of the instance.
What to think about when desigining a NoSQL Redis application
1) To develop correctly in Redis you should be thinking more about how you would structure the relationships in your C# program i.e. with the C# collection classes rather than a Relational Model meant for an RDBMS. The better mindset would be to think more about data storage like a Document database rather than RDBMS tables. Essentially everything gets blobbed in Redis via a key (index) so you just need to work out what your primary entities are (i.e. aggregate roots)
which would get kept in its own 'key namespace' or whether it's non-primary entity, i.e. simply metadata which should just get persisted with its parent entity.
Examples of Redis as a primary Data Store
Here is a good article that walks through creating a simple blogging application using Redis:
http://www.servicestack.net/docs/redis-client/designing-nosql-database
You can also look at the source code of RedisStackOverflow for another real world example using Redis.
Basically you would need to store and fetch the items of each type separately.
var redisUsers = redis.As<User>();
var user = redisUsers.GetById(1);
var userIsWatching = redisUsers.GetRelatedEntities<Watching>(user.Id);
The way you store relationship between entities is making use of Redis's Sets, e.g: you can store the Users/Watchers relationship conceptually with:
SET["ids:User>Watcher:{UserId}"] = [{watcherId1},{watcherId2},...]
Redis is schema-less and idempotent
Storing ids into redis sets is idempotent i.e. you can add watcherId1 to the same set multiple times and it will only ever have one occurrence of it. This is nice because it means you don't ever need to check the existence of the relationship and can freely keep adding related ids like they've never existed.
Related: writing or reading to a Redis collection (e.g. List) that does not exist is the same as writing to an empty collection, i.e. A list gets created on-the-fly when you add an item to a list whilst accessing a non-existent list will simply return 0 results. This is a friction-free and productivity win since you don't have to define your schemas up front in order to use them. Although should you need to Redis provides the EXISTS operation to determine whether a key exists or a TYPE operation so you can determine its type.
Create your relationships/indexes on your writes
One thing to remember is because there are no implicit indexes in Redis, you will generally need to setup your indexes/relationships needed for reading yourself during your writes. Basically you need to think about all your query requirements up front and ensure you set up the necessary relationships at write time. The above RedisStackOverflow source code is a good example that shows this.
Note: the ServiceStack.Redis C# provider assumes you have a unique field called Id that is its primary key. You can configure it to use a different field with the ModelConfig.Id() config mapping.
Redis Persistance
2) Redis supports 2 types persistence modes out-of-the-box RDB and Append Only File (AOF). RDB writes routine snapshots whilst the Append Only File acts like a transaction journal recording all the changes in-between snapshots - I recommend adding both until your comfortable with what each does and what your application needs. You can read all Redis persistence at http://redis.io/topics/persistence.
Note Redis also supports trivial replication you can read more about at: http://redis.io/topics/replication
Redis loves RAM
3) Since Redis operates predominantly in memory the most important resource is that you have enough RAM to hold your entire dataset in memory + a buffer for when it snapshots to disk. Redis is very efficient so even a small AWS instance will be able to handle a lot of load - what you want to look for is having enough RAM.
Visualizing your data with the Redis Admin UI
Finally if you're using the ServiceStack C# Redis Client I recommend installing the Redis Admin UI which provides a nice visual view of your entities. You can see a live demo of it at:
http://servicestack.net/RedisAdminUI/AjaxClient/

Best way to manage redis data

Just getting started with redis, and I'm having a hard time managing the redis data.
Are there any tools that help give a visualization of my applications redis data?
Try Keylord - cross-platform GUI application for manage key-value databases like Redis, LevelDB, etc.
support Redis and LevelDB key-value databases
SSH tunnels for Redis connections
display keys in flat and hierarchical views
can load millions of keys in background (use SCAN* command)
can create/read/update/delete keys of different types
clear and predictable UI
There is Redis Admin UI / on github, it is .NET based.
I have not tried it myself, but the screenshots and the live demo look promising.
There is also phpRedisAdmin from ErikDubbelboer, which is working according to the poster of this very similar question: phpMyAdmin equivalent to MySQL for Redis?
At my company, when developing in Redis and dealing with a large number of keys, we create and maintain a custom management page while developing. The reason for this is that it allows us to create the best 'custom' representation of the data.
I think AnotherRedisDesktopManager is a useful tool for managing redis, faster, better and more stable. What's more, it won't crash when loading a large number of keys.
I am working on a tool like that (phpMyRedis), but there are no current working tools like that I know of.
You can try FastoRedis site programm - crossplatform Redis GUI client based on redis-cli.