Caching temporary data - PostgreSQL and Mongo

I have some data from an API that I need to cache. I want this data invalidated after X days, but I want it available locally to save time querying and compiling things for the end user.
Presently I have a PostgreSQL database. I want to keep this around because there's permanent data like user records that I don't want to put in Mongo (unless you guys can convince me otherwise). I really have nothing against Mongo, but I can normalize some things around users, and the only way I could think of to do that without massive amounts of duplication is via PostgreSQL.
Now my API data is flat, and in JSON. I don't need to create any sort of link to any other table, and it has a field that I can use as a key pretty easily. My idea is to literally "throw" the data into a Mongo instance and query it as needed, invalidating every X days. This also offers some persistence should the server go down for whatever reason.
So my questions to you guys are these: Is this a good use case for Mongo over memcached? Should I just put the raw data in memcached instead? If you do suggest Mongo, should I move my users table and its relations over to Mongo as well?
Thanks!

This is the sort of thing Redis is really good for. Redis, possibly with selective cache invalidation via PostgreSQL's LISTEN and NOTIFY, is a pretty low-pain way to manage caching.
Another option is to use UNLOGGED tables in PostgreSQL.
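For example, a minimal sketch of the Redis approach with Python and redis-py could look like the following; the key prefix and the fetch_from_api helper are hypothetical, and the TTL stands in for your "X days":

```python
import json

import redis

r = redis.Redis(host="localhost", port=6379, db=0)

CACHE_TTL_SECONDS = 7 * 24 * 60 * 60  # stand-in for "X days"; 7 days here


def get_api_record(record_key):
    """Return the cached JSON document, falling back to the API on a miss."""
    cached = r.get(f"api-cache:{record_key}")
    if cached is not None:
        return json.loads(cached)

    data = fetch_from_api(record_key)  # hypothetical call to the upstream API
    # SETEX stores the value and lets Redis expire it after the TTL,
    # so there is no separate invalidation job to run.
    r.setex(f"api-cache:{record_key}", CACHE_TTL_SECONDS, json.dumps(data))
    return data
```

With RDB/AOF persistence enabled, the cached data also survives a restart, which covers the persistence concern in the question.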

Related

Proper way of caching data in Redis

I'm trying to build a freelance platform using MongoDB as the main database and Redis for caching, but I really couldn't figure out the proper way of caching. Basically I'll store JWT tokens, verification codes and other things with an expiration date. On the other hand, let's say I'll store 5 big collections such as Gigs, JobOffers, Reviews, Users, Companies. I also want to query them.
Example Use Case 1
Getting job offers only categorised as "Web Design"
Example Use Case 2
Getting job offers only posted by Company X
Option 1
For these two queries I can create two hashes:
hash1
"job-offers:categoryId", jobOfferId, JobOffer
hash2
"job-offers:companyId", jobOfferId, JobOffer
Option 2
Using RedisJSON and RediSearch for querying, and holding everything in JSON format
Option 3
Using RediSearch with multiple hashes
I couldn't figure out which approach would be best, or whether there is another approach that is better than all of these.
Option 1 seems suitable for your scenario. Binding job offers to category or company ids is the smartest solution.
You can use HGETALL to get all of the fields from the hash.
When using Redis as a request-caching mechanism, remember that you have to keep the Redis cache consistently updated if it is generated from a SQL or NoSQL database.
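As a rough illustration of Option 1 with redis-py (the key layout and field names below are only an assumption based on your example):

```python
import json

import redis

r = redis.Redis()


def cache_job_offer(offer):
    """Option 1: store the offer under one hash per category and one per company."""
    payload = json.dumps(offer)
    r.hset(f"job-offers:category:{offer['categoryId']}", offer["id"], payload)
    r.hset(f"job-offers:company:{offer['companyId']}", offer["id"], payload)


def offers_by_category(category_id):
    """HGETALL returns every cached offer for that category in one call."""
    raw = r.hgetall(f"job-offers:category:{category_id}")
    return [json.loads(value) for value in raw.values()]


def evict_job_offer(offer):
    """Call this when the offer changes in the primary DB, to keep the cache consistent."""
    r.hdel(f"job-offers:category:{offer['categoryId']}", offer["id"])
    r.hdel(f"job-offers:company:{offer['companyId']}", offer["id"])
```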
Good question.
As far as I can see, Redis data (and part of Mongo's) is stored in RAM, and RAM is more expensive than disk. If you don't care about the price, you can handle your situation with Redis/Mongo, and the data can be recovered from AOF/RDB files (or similar), so you can use whichever you want.
If you do care about the price of RAM, you could just use MySQL with the InnoDB engine, because it is cheap, stores data on disk, can recover after a crash, and a lot of people use these databases (MySQL, PostgreSQL).
If I were you, I would probably choose MySQL with InnoDB and create the right indexes; it is fast enough for tables that hold millions of rows (it gets less pleasant at hundreds of millions of rows).

How to find an item in a Redis queue

I use Redis as my queuing engine, and now there are millions of items in my queue. I need to find an item there, and watch its properties.
If it were SQL Server or any other RDBMS, I could use SQL and execute a query against the database to find the record. But with a Redis queue, I can only push from one side and pop from the same side or the other side.
How can I do that?
Given the vague nature of the question, we can only give you a vague answer.
You need to create secondary indexes to find your data. As you have already figured out, you cannot run SQL-like queries, so you should look at the following link:
http://redis.io/topics/indexes
One important point that you should consider is the following (taken from the above link)
Implementing and maintaining indexes with Redis is an advanced topic, so most users that need to perform complex queries on data should understand if they are better served by a relational store. However often, especially in caching scenarios, there is the explicit need to store indexed data into Redis in order to speedup common queries which require some form of indexing in order to be executed.
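As a concrete sketch of that idea with redis-py (the queue and key names here are invented): push only the item id onto the queue list and keep the item's properties in a hash keyed by that id, which acts as the secondary index for lookups.

```python
import redis

r = redis.Redis(decode_responses=True)


def enqueue(item_id, properties):
    """Keep the payload in a hash and push only the id onto the queue list."""
    r.hset(f"queue:item:{item_id}", mapping=properties)
    r.lpush("queue:pending", item_id)


def dequeue():
    """Normal queue consumption: pop an id, then load its properties."""
    item_id = r.rpop("queue:pending")
    if item_id is None:
        return None
    return r.hgetall(f"queue:item:{item_id}")


def inspect(item_id):
    """Find an item by id without scanning millions of queue entries."""
    return r.hgetall(f"queue:item:{item_id}")
```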

Redis full text search : reverse indexing or sunspot?

I have 3.5 million records (read-only) currently stored in a MySQL DB that I want to pull out into Redis for performance reasons. So far, I've managed to store things like this in Redis:
1 {"type":"Country","slug":"albania","name_fr":"Albanie","name_en":"Albania"}
2 {"type":"Country","slug":"armenia","name_fr":"Arménie","name_en":"Armenia"}
...
The key I use here is the legacy MySQL id, so with some Ruby glue I can break as few things as possible in this existing app (and that is a serious concern here).
Now the problem is when I need to search for the keyword "Armenia" inside the value part. It seems like there are only two ways out:
Either I multiply the Redis indexes:
id => JSON values (as shown above)
slug => id (reverse indexing based on the slug, that could do the basic search trick)
finally, another huge index specifically for autocomplete, as shown in this post: http://oldblog.antirez.com/post/autocomplete-with-redis.html
Or I use Sunspot or some full-text search engine (unfortunately, I currently use Thinking Sphinx, which is too tied to MySQL :-(
So, what would you do? Do you think the MySQL-to-Redis move of a single table is even a good idea? I'm afraid of the memory footprint those gigantic Redis key/values could have on a 16 GB RAM server.
Any feedback on a similar Redis usage ?
Before I start with a real answer, I want to mention that I don't see a good reason for you to be using Redis here. Based on the kinds of use cases you're describing, something like Elasticsearch would be more appropriate for you.
That said, if you just want to be able to search for a few different fields within your JSON, you've got two options:
Auxiliary index that points field_key -> list_of_ids (in your case, "Armenia" -> 1).
Use Lua on top of Redis with JSON encoding and decoding to get at what you want. This is way more flexible and space efficient, but will be slower as your table grows.
Again, I don't think either is appropriate for you because it doesn't sound like Redis is going to be a good choice for you, but if you must, those should work.
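For completeness, here is a rough sketch of the first option (the auxiliary index) with redis-py; the key names are invented:

```python
import json

import redis

r = redis.Redis(decode_responses=True)


def store_country(record_id, doc):
    """Keep the JSON under its legacy MySQL id and index the searchable fields."""
    r.set(f"country:{record_id}", json.dumps(doc))
    # Auxiliary indexes: each searchable value points back at the ids holding it.
    r.sadd(f"index:name_en:{doc['name_en'].lower()}", record_id)
    r.sadd(f"index:slug:{doc['slug']}", record_id)


def find_by_name_en(name):
    """E.g. find_by_name_en("Armenia") returns the matching JSON records."""
    ids = r.smembers(f"index:name_en:{name.lower()}")
    return [json.loads(r.get(f"country:{record_id}")) for record_id in ids]
```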
Here's my take on Redis.
Basically I think of it as an in-memory cache that can be configured to evict the least recently used data (LRU) when memory fills up. That is the role I made it play in my use case, and the logic may help you think about yours.
I'm currently using Redis to cache results for a search engine based on some complex (slow) queries, backed by data in another DB (similar to your case). So Redis serves as cache storage for answering queries: every query is served either from Redis or, on a cache miss, from the DB. Note that Redis is not replacing the DB; in my case it is merely an extension of it via the cache.
This fit my specific use case, because the addition of Redis was supposed to assist future scalability. The idea is that repeated access of recent data (in my case, if a user does a repeated query) can be served by Redis, and take some load off of the DB.
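In code, that cache-aside logic looks roughly like this (a sketch with redis-py; run_slow_query is a hypothetical stand-in for the slow backing query):

```python
import hashlib
import json

import redis

r = redis.Redis()
CACHE_TTL_SECONDS = 3600  # how long recent results stay hot; tune as needed


def search(query_params):
    """Serve repeated queries from Redis; fall back to the slow DB query on a miss."""
    cache_key = "search:" + hashlib.sha1(
        json.dumps(query_params, sort_keys=True).encode()
    ).hexdigest()

    cached = r.get(cache_key)
    if cached is not None:
        return json.loads(cached)  # cache hit: the DB is not touched at all

    results = run_slow_query(query_params)  # hypothetical slow/complex backing query
    r.setex(cache_key, CACHE_TTL_SECONDS, json.dumps(results))
    return results
```

With maxmemory and maxmemory-policy allkeys-lru set in redis.conf, Redis drops the least recently used keys once it hits the memory cap, which is the LRU behaviour mentioned above.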
Basically my Redis schema ended up looking somewhat like the index duplication you outlined above. I used sets and sorted sets to create "batches/sets" of Redis keys, each of which pointed to specific query results stored under a particular Redis key. And in the DB, I still had the complete data set and an index.
If your data set fits in RAM, you could do the "table dump" into Redis and get rid of the need for MySQL. I could see this working, as long as you plan for persistent Redis storage and for the possible growth of your data, if this "table" will grow in the future.
So depending on your actual use case, how you see Redis fitting into your stack, and the load your DB serves, don't rule out the possibility of having to do both of the options you outlined above (which is what happened in my case).
Hope this helps!
Redis does provide full-text search with RediSearch.
RediSearch implements a search engine on top of Redis. It also enables more advanced features, such as exact phrase matching, auto-suggestions and numeric filtering for text queries, which are not possible or efficient with traditional Redis search approaches.
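As a rough sketch of what that looks like from Python, issuing the raw RediSearch commands through redis-py (the index name and key prefix are invented, and this assumes the RediSearch module is loaded):

```python
import redis

r = redis.Redis()

# FT.CREATE / FT.SEARCH come from the RediSearch module, so it must be loaded.
# Index every hash whose key starts with "country:" on its name fields.
r.execute_command(
    "FT.CREATE", "idx:countries", "ON", "HASH",
    "PREFIX", "1", "country:",
    "SCHEMA", "name_en", "TEXT", "name_fr", "TEXT", "slug", "TAG",
)

# Store one record as a hash instead of a JSON string so it can be indexed.
r.hset("country:2", mapping={
    "type": "Country", "slug": "armenia",
    "name_fr": "Arménie", "name_en": "Armenia",
})

# Full-text query across the indexed fields.
print(r.execute_command("FT.SEARCH", "idx:countries", "Armenia"))
```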

What's the best way to get a 'lot' of small pieces of data synced between a Mac App and the Web?

I'm considering MongoDB right now. Just so the goal is clear here is what needs to happen:
In my app, Finch (see finchformac.com for details), I have thousands and thousands of entries per day for each user recording which window they had open, the time they opened it, the time they closed it, and a tag if they chose one for it. I need this data to be backed up online so it can sync to their other Macs, etc. I also need to be able to draw charts online from their data, which means some complex queries hitting hundreds of thousands of records.
Right now I have tried using Ruby/Rails/Mongoid with a JSON parser on the app side, sending up data in increments of 10,000 records at a time; the data is then processed into other collections with a background map-reduce job. But this all seems to block and is ultimately too slow. What recommendations does anyone have for how to go about this?
You've got a complex problem, which means you need to break it down into smaller, more easily solvable issues.
Problems (as I see it):
1. You've got an application which is collecting data. You just need to store that data somewhere locally until it gets synced to the server.
2. You've received the data on the server and now you need to shove it into the database fast enough so that it doesn't slow down.
3. You've got to report on that data, and this sounds hard and complex.
You probably want to write this as some sort of API. For simplicity (and since you've got loads of spare processing cycles on the clients), you'll want these chunks of data processed on the client side into JSON ready to import into the database. Once you've got JSON you don't need Mongoid (you can just throw the JSON into the database directly). You also probably don't need Rails, since you're just creating a simple API, so stick with Rack or Sinatra (possibly using something like Grape).
Now you need to solve the whole "this all seems to block and is ultimately too slow" issue. We've already removed Mongoid (so no need to convert from JSON -> Ruby objects -> JSON) and Rails. Before we get to doing a MapReduce on this data, you need to ensure it's getting loaded into the database quickly enough. Chances are you should architect the whole thing so that your MapReduce supports your reporting functionality. For syncing data you shouldn't need to do anything but pass the JSON around. If your data isn't being written into your DB fast enough, you should consider sharding your dataset. This will probably be done using some user-based key, but you know your data schema better than I do. You need to choose your sharding key so that when multiple users are syncing at the same time they will probably be using different servers.
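For the "load it into the database quickly enough" part, a sketch with pymongo (the database, collection and field names are placeholders): inserting each 10,000-record chunk with a single insert_many call avoids per-document round trips.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
events = client.finch.window_events  # hypothetical database/collection names


def ingest_chunk(json_docs):
    """Insert one client-submitted chunk of parsed JSON documents in a single call."""
    # ordered=False lets the server continue past an individual bad document
    # instead of aborting the rest of the batch.
    result = events.insert_many(json_docs, ordered=False)
    return len(result.inserted_ids)
```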
Once you've solved Problems 1 and 2 you need to work on your reporting. This is probably supported by your MapReduce functions inside Mongo. My first comment on this part is to make sure you're running at least Mongo 2.0; in that release 10gen sped up MapReduce (my tests indicate that it is substantially faster than 1.8). Other than this, you can achieve further increases by sharding and directing reads to the secondary servers in your replica set (you are using a replica set?). If this still isn't working, consider structuring your schema to support your reporting functionality. This lets you use more cycles on your clients to do work rather than loading your servers. But this optimisation should be left until after you've proven that conventional approaches won't work.
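Routing the reporting reads to the secondaries is mostly a driver setting. A sketch with pymongo follows; the hosts, replica set name and the aggregation are placeholders, with the aggregation standing in for whatever reporting query or MapReduce you end up with.

```python
from pymongo import MongoClient, ReadPreference

# Hypothetical replica set; adjust hosts and set name to your deployment.
client = MongoClient(
    "mongodb://db1.example.com,db2.example.com,db3.example.com",
    replicaSet="finch-rs",
)

# Point the heavy reporting reads at the secondaries so the primary
# stays free to absorb the sync writes.
events = client.get_database(
    "finch", read_preference=ReadPreference.SECONDARY_PREFERRED
).window_events

# Placeholder reporting query: number of entries per tag.
per_tag = list(events.aggregate([
    {"$group": {"_id": "$tag", "total": {"$sum": 1}}},
]))
```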
I hope that wall of text helps somewhat. Good luck!

Versioning data in SQL Server so user can take a certain cut of the data

I have a requirement that, in a SQL Server-backed website which is essentially a large CRUD application, the user should be able to 'go back in time' and export the data as it was at a given point in time.
My question is what is the best strategy for this problem? Is there a systematic approach I can take and apply it across all tables?
Depending on what exactly you need, this can be relatively easy or hell.
Easy: make a history table for every table and copy data there before each update or after each insert/update (i.e. new rows are there too). Never delete from the original table; use logical deletes.
Hard: there is a database-wide version number counting up on every change, and every data item is correlated with a start and end version. This requires very fancy primary key mangling.
Just to add a little comment to the previous answers: if you need to go back in time for all users, you can use database snapshots.
The simplest solution is to save a copy of each row whenever it changes. This can be done most easily with a trigger. Then your UI must provide search abilities to go back and find the data.
This does produce an explosion of data, which gets worse when tables are updated frequently, so the next step is usually some kind of date-based purge of older data.
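To make the trigger idea concrete, here is a sketch of the "copy the old version on update/delete into a history table" variant; the table and columns are invented, and the T-SQL is wrapped in a small Python/pyodbc script purely for illustration.

```python
import pyodbc

# Hypothetical source table: dbo.Customers(CustomerId, Name, Email).
# The history table mirrors it plus a timestamp for when the old row was superseded.
HISTORY_TABLE_DDL = """
CREATE TABLE dbo.Customers_History (
    CustomerId INT           NOT NULL,
    Name       NVARCHAR(200) NULL,
    Email      NVARCHAR(200) NULL,
    ValidTo    DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
);
"""

HISTORY_TRIGGER_DDL = """
CREATE TRIGGER dbo.trg_Customers_History
ON dbo.Customers
AFTER UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- 'deleted' holds the pre-change rows, i.e. the versions being replaced.
    INSERT INTO dbo.Customers_History (CustomerId, Name, Email)
    SELECT CustomerId, Name, Email FROM deleted;
END;
"""

connection = pyodbc.connect("DSN=MyCrudAppDb")  # placeholder connection string
cursor = connection.cursor()
cursor.execute(HISTORY_TABLE_DDL)
cursor.execute(HISTORY_TRIGGER_DDL)
connection.commit()
```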
An implementation you could look at is Team Foundation Server. It has the ability to perform historical queries (using the WIQL keyword ASOF). The backend is SQL Server, so there might be some clues there.