When/how to write data in a Redis cache to a SQL database?

I have a fairly small relational database currently set up (SQLite, changing to PostgreSQL) that has some relatively simple many-to-one and many-to-many relations. The app uses WebSockets to give real-time updates to clients, so I want to make any operations as quick as possible. I was planning on using Redis to cache parts of the data in memory as required (the parts that will be read/written frequently) so that queries will be faster. I know that with the database currently so small, performance gains aren't going to be noticeable, but I want it to be scalable.
There seems to be a lot of material suggesting that using Redis as a cache is a good idea, but I'm struggling to find much information about when it is appropriate to write updates back to the SQL database and how best to do it.
For example, should I write updates to the Redis store, send the updated data out to clients, and then write the same update to the SQL database, all in the same request (in that order)? (i.e. more frequent, smaller writes)
Or should I simply write updates to the Redis store and send the updated data out to clients, then periodically (every minute?) read back from the Redis store and save it to the SQL database? (i.e. less frequent but larger writes)
Or is there some other best practice for keeping a Redis store and a SQL database consistent?
Would my first example perform poorly due to the larger number of writes to disk and the extra CPU work, or would this be negligible?
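To make the two options concrete, here is a rough Python sketch of both patterns using redis-py, with SQLite standing in for PostgreSQL. The item:/dirty: key names, the items table, and the WebSocket broadcast call are hypothetical, not anything from the question.

```python
# Illustrative sketch only; a real app would add error handling and transactions.
import json
import sqlite3  # stand-in for PostgreSQL; swap in psycopg2 for production

import redis

r = redis.Redis(decode_responses=True)
conn = sqlite3.connect("app.db")


def update_write_through(item_id: int, payload: dict) -> None:
    """Option 1: update Redis, notify clients, then write to SQL in the same request."""
    r.set(f"item:{item_id}", json.dumps(payload))
    # broadcast_to_clients(payload)  # hypothetical WebSocket push
    conn.execute("UPDATE items SET data = ? WHERE id = ?", (json.dumps(payload), item_id))
    conn.commit()


def update_write_behind(item_id: int, payload: dict) -> None:
    """Option 2: update Redis and mark the key dirty; a periodic job flushes to SQL."""
    r.set(f"item:{item_id}", json.dumps(payload))
    r.sadd("dirty:items", item_id)
    # broadcast_to_clients(payload)


def flush_dirty_items() -> None:
    """Run periodically (e.g. every minute) to persist all dirty keys in one batch."""
    while (item_id := r.spop("dirty:items")) is not None:
        payload = r.get(f"item:{item_id}")
        if payload is not None:
            conn.execute("UPDATE items SET data = ? WHERE id = ?", (payload, int(item_id)))
    conn.commit()
```

The trade-off is roughly what the question describes: the write-through version costs an extra SQL round trip per request but is durable immediately, while the write-behind version keeps requests fast at the risk of losing any not-yet-flushed updates if Redis dies before the flush runs.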

Related

Using Redis as main database in production

I have used Redis before in many projects, usually as a temporary cache or to share small pieces of data between different subsystems; you get the idea. But now I'm building a service that mainly revolves around the concept of a key-value store and needs super-fast data retrieval, so I'm using Redis. Because of this, all my data will live in Redis (it's not just a temporary cache or something of that sort). So I'm wondering: is this OK, or should I take backups to another, more "stable" database like MongoDB or MySQL?
Note that I'm not talking about the usual periodic backups. I would do that regardless of what database I end up using.
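For what it's worth, the usual first step here (besides external backups) is to lean on Redis's own persistence: RDB snapshots and/or the append-only file. Below is a small redis-py sketch of checking and enabling it; the values shown are illustrative, and in practice you would set them in redis.conf rather than at runtime.

```python
# Sketch: inspecting/enabling Redis durability settings via redis-py.
import redis

r = redis.Redis(decode_responses=True)

print(r.config_get("save"))        # RDB snapshot schedule, e.g. {'save': '3600 1 300 100 60 10000'}
print(r.config_get("appendonly"))  # whether the append-only file (AOF) is enabled

# Turn on the AOF so every write is logged and can be replayed after a crash.
r.config_set("appendonly", "yes")
r.config_set("appendfsync", "everysec")  # fsync once per second: a common durability/speed trade-off
```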

Cons of using MemoryCache as a temporary copy of DB table

I have a site where you can list your car for sale. There is a list and a map, with filtering on car types and other car specifications. My idea was to cache the cars table and use that to filter on when a user searches for a car on the website. Currently, especially when zooming in/out on the map, each time the user does that an HTTP request is made that queries the database, and that can be slow and heavy on the server.
As an experiment with 1,000 items, I have cached the map data (trimmed data with only basic info) and it's working fine. I was thinking of instead keeping basically a copy of the cars table, with all the needed joins, in MemoryCache and using that instead of querying the DB on every request, for both the list and the map. I would have a cron job run every 5 minutes (as the data can change, but it doesn't have to be immediate) to update the MemoryCache with the latest cars data from the DB.
What would be the cons of using this approach in the long term, and of using it to store, for example, 100,000 records? Besides the server needing more RAM, would there be any concerns about the scalability or usability of this approach? Would it be better to use Redis instead?
I do have a "search as you type" service in place now, but I don't really need that functionality since the filtering is pretty exact; I added it more as a caching server, but I think I would be better off just using MemoryCache until there is a real need for that kind of service.
Thank you
Since memory isn't infinite, we need to limit the number of items stored in the in-memory cache.
MemoryCache vs Redis
MemoryCache
MemoryCache is embedded in the process, hence it can only be used as a plain key-value store from that process.
Redis
Redis is a remote data structure server. It is certainly slower than just storing the data in local memory.
I conclude that MemoryCache runs inside the web server of the current application and is limited by the resources of that web server. On the same hardware it will be very fast, but the disadvantage is that the stored data cannot be shared with other applications.
If Redis is used, reading data is not quite as fast as with MemoryCache, since it is not in-process memory, but it offers higher reliability and scalability.
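Here is a rough Python stand-in for the MemoryCache approach described in the question: an in-process copy of the cars table refreshed on a timer every five minutes. The table and column names are invented for the example. Note that a second web server process would hold and refresh its own separate copy, which is exactly the sharing limitation described above; a Redis instance would instead act as one shared store for all processes.

```python
# Hypothetical sketch of a periodically refreshed in-process cache (Python stand-in
# for .NET MemoryCache). Table and column names are made up.
import sqlite3
import threading

_cache_lock = threading.Lock()
_cars_cache: list[dict] = []


def refresh_cars_cache() -> None:
    """Reload the trimmed cars data from the database, then reschedule in 5 minutes."""
    global _cars_cache
    conn = sqlite3.connect("cars.db")
    rows = conn.execute("SELECT id, make, model, lat, lng, price FROM cars").fetchall()
    conn.close()
    with _cache_lock:
        _cars_cache = [
            {"id": r[0], "make": r[1], "model": r[2], "lat": r[3], "lng": r[4], "price": r[5]}
            for r in rows
        ]
    threading.Timer(300, refresh_cars_cache).start()  # cron-like refresh every 5 minutes


def search_cars(make: str) -> list[dict]:
    """Filter against the in-memory copy instead of hitting the database per request."""
    with _cache_lock:
        return [c for c in _cars_cache if c["make"] == make]


refresh_cars_cache()
```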
Related Post:
1. How to update redis after updating database?
2. how to keep caching up to date
3. How can MySQL update data in real time in redis cache?

What is a recommended scalable DB platform to use in AWS for large amounts of volatile data sets - elasticsearch, Redis or DynamoDB?

Users of our platform will have large amounts of stored data on our system. Through an application, once connected, that data will be transferred to them and will no longer need to remain on our servers. There could potentially be hundreds or thousands of users connected at any given time, performing their downloads.
Here's the proposed architecture:
User management, configuration, and data download statistics will be maintained in a SQL Server database, while using either Redis or DynamoDB for the large data sets.
The reason for choosing either Redis or DynamoDB is cost (cheaper than running another SQL Server instance) and performance. The data format will be similar to a datamart: a flat table with no joins.
Initially the queries would be simple: get all data for user X within a date range, and optionally delete it.
Since we may want to add free-text searching over certain fields of that data, Elasticsearch may be a better option to use from the get-go.
I want this to be auto-scaling but not sure which database would be best to use for this scenario.
Here's some great discussion on Database + Search tier from AWS ReInvent:
https://youtu.be/K7o5OlRLtvU?t=1574
I would not use Elasticsearch alone because it does not provide auto-scaling for write capacity. In fact, it's not trivial to increase the number of shards of an index. Secondly, it can only handle the JSON format, which could be an issue for you.
Redis could be a good idea because it is really fast (everything is done in RAM), and it provides keys with a limited time-to-live, which could be useful for you. Unfortunately, if your data size exceeds the RAM capacity of your Amazon instance, you will have to shard your Redis database, and since Redis does not support that itself, you will have to handle it in your application code. Moreover, as far as I know, Redis does not handle complex queries. You will also need to map your data onto Redis data structures, which could be an issue for you.
DynamoDB handles auto-scaling really well, but on the other hand it is a key/value database, so it does not allow you to make queries like "get all data for user X between a date range". DynamoDB does also let you save your data in any format.
The solution would be to use either DynamoDB or Redis, depending on the size of your data, and to use Elasticsearch to index your keys with only the metadata (user and dates). That way your index stays small, and if you lose the ability to index because Elasticsearch gets too busy, you keep the ability to save users' data.
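To illustrate the point about having to fit your data into Redis structures, here is one hypothetical way to model the "data for user X between a date range" query in Redis, using a per-user sorted set as the index and a TTL on the payload keys. The key names and the 7-day expiry are assumptions for the example.

```python
# Sketch of a Redis data model for per-user, time-ranged retrieval.
import time

import redis

r = redis.Redis()
WEEK = 7 * 24 * 3600


def store_dataset(user_id: str, dataset_id: str, blob: bytes) -> None:
    ts = time.time()
    r.setex(f"data:{user_id}:{dataset_id}", WEEK, blob)  # payload expires on its own
    r.zadd(f"index:{user_id}", {dataset_id: ts})         # per-user index scored by timestamp


def fetch_range(user_id: str, start_ts: float, end_ts: float) -> list[bytes]:
    ids = r.zrangebyscore(f"index:{user_id}", start_ts, end_ts)
    blobs = [r.get(f"data:{user_id}:{i.decode()}") for i in ids]
    return [b for b in blobs if b is not None]  # skip entries whose payload already expired
```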

Strategy for keeping separate Databases in Sync

I have a NoSQL database that we are using for data processing, as it is faster for my application than SQL. I'm treating the NoSQL database almost like a cache of the information, with SQL being the authority on the data and the NoSQL store being updated with changes. Right now this is done through our application: when a request for a change comes in, it is made in the SQL database and in the NoSQL database. This is failing at times, as sometimes the NoSQL update fails, or other situations cause the NoSQL database to get out of sync.
I could do a batch update every X minutes; however, there is a lot of information in the data stores, and it would take hours to ensure that they are in sync. We have some timestamps we can diff to find what has changed, but this is not always accurate.
I'm wondering what some recommended strategies are for keeping a secondary data store (a database cache) in sync with my main store.
I've done this with messaging in the past, specifically JMS with ActiveMQ. I would send the updates to the NoSQL store (Mongo) through a queue. That way messages could accumulate in the queue, and if the connection to the NoSQL store ever got severed, it could pick up where it left off.
It worked really well because ActiveMQ was really stable and simple to work with.
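A minimal sketch of the same queue pattern in Python, with a Redis list standing in for the ActiveMQ/JMS queue; that substitution, along with the collection and field names, is an assumption made for illustration.

```python
# Queue-based sync sketch: SQL is written first by the app, the NoSQL update is queued.
# If the NoSQL store is down, messages simply accumulate until the worker catches up.
import json
import time

import redis
from pymongo import MongoClient

r = redis.Redis()
items = MongoClient()["appdb"]["items"]  # hypothetical database/collection names
QUEUE = "sync:nosql"


def publish_change(doc: dict) -> None:
    """Called by the application after the authoritative SQL write succeeds."""
    r.rpush(QUEUE, json.dumps(doc))


def sync_worker() -> None:
    """Long-running consumer: pops changes and applies them to the NoSQL store."""
    while True:
        _, raw = r.blpop(QUEUE)  # blocks until a message is available
        doc = json.loads(raw)
        try:
            items.replace_one({"_id": doc["_id"]}, doc, upsert=True)
        except Exception:
            r.lpush(QUEUE, raw)  # put the message back and retry after a pause
            time.sleep(5)
```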
I've always seen this done with diffs like you mentioned. You introduce date fields all over and then keep track of the latest sync. The nice thing about this approach is that it easily allows you to replay transactions by modifying the last sync date.
One last piece of advice ... write good tools around pumping data from point A to point B (in this case SQL to NoSQL). I wrote several tools to bulk load the NoSQL store from SQL at my last job and it made life easy if anything got really out of sync. Between scripts and bulk loading processes, I could always recover.
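And here is a rough sketch of the kind of bulk "pump from SQL to NoSQL" recovery tool mentioned above, again with made-up table and collection names.

```python
# Bulk-reload the NoSQL copy from the authoritative SQL data, in batches.
import sqlite3  # stand-in for whatever SQL database holds the authoritative data

from pymongo import MongoClient


def bulk_reload(batch_size: int = 1000) -> None:
    conn = sqlite3.connect("authority.db")
    conn.row_factory = sqlite3.Row
    coll = MongoClient()["appdb"]["items"]
    coll.delete_many({})  # start from a clean copy
    cursor = conn.execute("SELECT id, name, updated_at FROM items")
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        coll.insert_many(
            [{"_id": row["id"], "name": row["name"], "updated_at": row["updated_at"]}
             for row in rows]
        )
    conn.close()
```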

Replication or Message Queue?

My application (in the US) is generating data into one of the tables in the database (20 records/min).
I want to replicate the data incrementally to a server located in Germany.
What's the best solution for incrementally replicating the newly added records (an application-level approach such as a message queue, or SQL Server replication)?
Is there any incremental replication in SQL Server that also compresses ("zips") the data for best performance, or should I do it with a custom application to get better performance?
My answer would depend upon your current application architecture.
Are you already using a message queue or other middleware layer? If so, then it should be fairly straightforward to duplicate those messages for a cross-ocean queue. If you do not have such a layer, then adding one is likely going to add complexity that is unwarranted.
Otherwise, most database systems have a replication engine of some sort. You don't actually specify, but I am guessing Microsoft SQL Server in this case?
Unless your data is quite large, 20 updates per minute should not tax any replication scheme or cross-ocean link, zipped or otherwise. Zip or other compression schemes will help the most if your data is sparse or repetitive; is it mostly text? If those 20 records are numeric or a mix of data types, the savings will not be as large.
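To make the compression point concrete, here is a quick self-contained check; the sample payloads are made up, with random bytes standing in for dense numeric data.

```python
# Repetitive text compresses dramatically; dense (here: random) bytes barely shrink.
import os
import zlib

text_batch = ("2024-05-01,EUR,order accepted,warehouse DE-7;" * 1000).encode()
numeric_batch = os.urandom(len(text_batch))  # stand-in for dense binary/numeric data

for name, data in [("repetitive text", text_batch), ("dense numeric", numeric_batch)]:
    packed = zlib.compress(data)
    print(f"{name}: {len(data)} -> {len(packed)} bytes "
          f"({100 * len(packed) / len(data):.1f}% of original)")
```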
Good Luck