Redis Sorted Set ... store data in "member"? - redis

I am learning Redis and using an existing app (e.g. converting pieces of it) for practice.
I'm really struggling to understand first IF and then (if applicable) HOW to use Redis in one particular use-case ... apologies if this is super basic, but I'm so new that I'm not even sure if I'm asking correctly :/
Scenario:
Images are received by a server and info like time_taken and resolution is saved in a database entry. Images are then associated (e.g. "belong_to") with one Event ... all very straight-forward for a RDBS.
I'd like to use a Redis to maintain a list of the 50 most-recently-uploaded image objects for each Event, to be delivered to the client when requested. I'm thinking that a Sorted Set might be appropriate, but here are my concerns:
First, I'm not sure if a Sorted Set can/should be used in this associative manner? Can it reference other objects in Redis? Or is there just a better way to do this altogether?
Secondly, I need the ability to delete elements that are greater than X minutes old. I know about the EXPIRE command for keys, but I can't use this because not all images need to expire at the same periodicity, etc.
This second part seems more like a query on a field, which makes me think that Redis cannot be used ... but then I've read that I could maybe use the Sorted Set score to store a timestamp and find "older than X" in that way.
Can someone provide come clarity on these two issues? Thank you very much!
UPDATE
Knowing that the amount of data I need to store for each image is small and will be delivered to the client's browser, can is there anything wrong with storing it in the member "field" of a sorted set?
For example Sorted Set => event:14:pictures <time_taken> "{id:3,url:/images/3.png,lat:22.8573}"
This saves the data I need and creates a rapidly-updatable list of the last X pictures for a given event with the ability to, if needed, identify pictures that are greater than X minutes old ...

First, I'm not sure if a Sorted Set can/should be used in this
associative manner? Can it reference other objects in Redis?
Why do you need to reference other objects? An event may have n image objects, each with a time_taken and image data; a sorted set is perfect for this. The image_id is the key, the score is time_taken, and the member is the image data as json/xml, whatever; you're good to go there.
Secondly, I need the ability to delete elements that are greater than
X minutes old
If you want to delete elements greater than X minutes old, use ZREMRANGEBYSCORE:
ZREMRANGEBYSCORE event:14:pictures -inf (currentTime - X minutes)
-inf is just another way of saying the oldest member without knowing the oldest members time, but for the top range you need to calculate it based on current time before using this command ( the above is just an example)

Related

sdiff - limit the result set to X items

I want to get the diff of two sets in redis, but I don't need to return the entire array, just 10 items for example. Is there any way to limit the results?
I was thinking something like this:
SDIFF set1 set2 LIMIT 10
If not, are there any other options to achieve this in a performant way, considering that set1 can be millions of objects and set2 is much much smaller (hundreds).
More info would be helpful on what you want to achieve. Something like this might require you to duplicate your data. Though I don’t know if it’s something you want.
An option is chunking them.
Create a set with a unique generated id that can hold a max of 10 items
Create a sorted set like so…
zadd(key, timestamp, chunkid)
where your timestamp is a unix time and the chunkid is the key the connects to the set. The key can be the name of whatever you would like it to be or it could also be a uniquely generated id.
Use zrange to grab a specific one
(Repeat steps 1-3 for the second set)
Once you have your 1 result from both your sorted sets “zset”. You can now do your sdiff by using the chunkid.
Note that there is advantages and disadvantages in doing this. Like more connection consumption (if calling from a a client), and the obvious being a little more processing. Though it will help immensely if you put this in a lua script.
Hope this helps or at least gives you an idea on how to model your data. Though if this is critical data you might need to use a automated script of some sort to move your data around to meet the modeling requirement.

REDIS usecase using large keys with small values

I have a use-case for using redis that is a little bit different.
In my MySQL I have an entity, let's call it HumanEntity. this HumanEntity has many to many relations.
HumanEntity.Urls - Many URLs per HumanEntity.
HumanEntity.UserNames - Many UserNames per HumanEntity.
HumanEntity.Phones ...
HumanEntity.Emails ...
in a normal one hour, the application creates hundreds of these many values.
The use-case is that, the application receives an HTTP call (100 per one second) with a HumanEntity value (Url or UserName or Phone or Email).
I need to scan my MySQL (1,000,000 records) and return back the HumanEntity.Id(integer) .
Since its ok to have some latency in the data integrity I thought about REDIS.
Can I store the values as a Redis key and the and the HumanEntity.Id(integer) as the value.
My API needs to return back the HumanEntity.Id(integer).
does it make sense to have such long key and such short value? The URL, for example, maybe 1500 bytes and the value can be 1 byte.
What is the best redis method to implement that?
Thanks
If the values are not unique then you may have some problem. Phones, emails or usernames maybe unique for user but i am not sure about url or any other property stored in your database. You may overwrite the value of an identifier with another user's.
If you don't have any problem like that; you may proceed with string types, Time complexity of GET and SET is O(1) - that's the best you may get.
In some cases such as checking whether the user used any coupon, you may use long(let's say 64 chars) user id as key, and 1 as value and use EXISTS to determine it. So it's valid to use long key and short value.

Best data structure to store temperature readings over time

I used to work with SQL like MySQL, Postgres or MSSQL.
Now I want to play with Redis. I'm working on a little home project, that I think is the best choice for starting using Redis.
I have a machine that reads temperature (indoor and outdoor) and humidity. I need to store the readings into Redis. Can you help me to understand the best data structure to do so?
Other than this data I need to store the time (ex. unix timestamp) of the temperature reading for use plotting a graphic.
I installed Redis read the documentation, so I understand the commands and data types.
Since this is your first Redis project and it's a home project, I'd be careful about being to careful. Here's a couple ways to consider designing it (NOTE: I only dug deep into REDIS this past weekend so hopefully others will weigh in).
IDEA 1:
Four ordered sets
KEY for sets are "indoor_temps", "outdoor_temps", "indoor_humidity", "outdoor_humidity"
VALUES are the temperatures / humidities
SCORE is the date stored as EPOCH
IDEA 2:
Four types of keys (best shown by example)
datetime_key = /year:2014/month:07/day:12/hour:07/minute:32/second:54
type_keys = [indoor_temps, outdoor_temps, indoor_humidity, outdoor_humidity]
keys are of form type + "/" + datetime_key
values are the temp and humidity itself
You probably want to implement some initial design and then work with the data immediately - graph it, do stats, etc. Whatever you plan to do with it. That will expose flaws and if they are major, flush the database and try again. These designs should really only take ~1 hour to implement since the only thing you're really changing is a few Redis commands and some string manipulation to convert the data to keys.
I like Tony's suggestions, but I'll also throw out another possibility.
4 lists
keys are "indoor_temps", "outdoor_temps", "indoor_humidity", "outdoor_humidity"
values are of the form < timestamp >_< reading > ie.( "1403197981_27.2" )
Push items onto the front of the list using LPUSH. Get a set of readings using LRANGE. The list will always be ordered by the time of the reading. Obviously split the value on "_" to get your time and reading...
In all honesty, this will give the same properties as Tony's first example, with slightly worse lookup performance, but better memory usage. I'm guessing for this project you'll be neither memory, nor CPU constrained, so the choice is probably not an issue. That said, if you expect to be saving 100's of thousands or more readings, I would suggest the list unless you want to consume a large portion of your system's memory.
Also, it's a good idea to call EXPIRE on your entries with some reasonable TTL that encompasses the length of time you want to save the readings for. If your plan is to have them live in perpetuity then you may want to look at backing them up to a disk DB over time, and just use Redis as a quick lookup cache for recent readings.
Thank to all answer, I choose this strucure:
4 lists: tempIN, tempOut, humidIN and humidOUT
values are: [value]:[timestamp]. For example: "25.4:1403615247"
As suggested from wallacer i want to backup old entries out from Redis.
For main frontend i need only last two days of sample.
For example i can create Redis RDB file snapshot and "trim" the live lists. This solution is not convenient in the event that, in the future you want to recover old values​​.
Do you have any tips on what kind of procedure to adopt to store the data? Maybe use of SQLIte DB?

Suggestions/Opinions for implementing a fast and efficient way to search a list of items in a very large dataset

Please comment and critique the approach.
Scenario: I have a large dataset(200 million entries) in a flat file. Data is of the form - a 10 digit phone number followed by 5-6 binary fields.
Every week I will be getting a Delta files which will only contain changes to the data.
Problem : Given a list of items i need to figure out whether each item(which will be the 10 digit number) is present in the dataset.
The approach I have planned :
Will parse the dataset and put it a DB(To be done at the start of the
week) like MySQL or Postgres. The reason i want to have RDBMS in the
first step is I want to have full time series data.
Then generate some kind of Key Value store out of this database with
the latest valid data which supports operation to find out whether
each item is present in the dataset or not(Thinking some kind of a
NOSQL db, like Redis here optimised for search. Should have
persistence and be distributed). This datastructure will be read-only.
Query this key value store to find out whether each item is present
(if possible match a list of values all at once instead of matching
one item at a time). Want this to be blazing fast. Will be using this functionality as the back-end to a REST API
Sidenote: Language of my preference is Python.
A few considerations for the fast lookup:
If you want to check a set of numbers at a time, you could use the Redis SINTER which performs set intersection.
You might benefit from using a grid structure by distributing number ranges over some hash function such as the first digit of the phone number (there are probably better ones, you have to experiment), this would e.g. reduce the size per node, when using an optimal hash, to near 20 million entries when using 10 nodes.
If you expect duplicate requests, which is quite likely, you could cache the last n requested phone numbers in a smaller set and query that one first.

Rails an MongoDB, how to get the last document inserted and be sure it is thread safe?

I need when I add a new document in my collection X to get the last document that was inserted in that same collection, because some values of that document must influence the document I am currently inserting.
Basically as a simple example I would need to do that:
class X
include Mongoid::Document
include Mongoid::Timestamps
before_save :set_sum
def set_sum
self.sum = X.last.sum + self.misc
end
field :sum, :type => Integer
field :misc, :type => Integer
end
How can I make sure that type of process will never break if there are concurrent insert? I must make sure that when self.sum = X.last.sum + self.misc is calculate that X.last.sum absolutely represents that last possible document inserted in the collection ?
This is critical to my system. It needs to be thread safe.
Alex
ps: this also needs to be performant, when there are 50k documents in the collections, it can't take time to get the last value...
this kind of behavior is equivalent to having an auto increment id.
http://www.mongodb.org/display/DOCS/How+to+Make+an+Auto+Incrementing+Field
The cleanest way is to have a side collection with one (or more) docs representing the current total values.
Then in your client, before inserting the new doc, do a findAndModify() that atomically updates the totals AND retrieves the current total doc.
Part of the current doc can be an auto increment _id, so that even if there are concurrent inserts, your document will then be correctly ordered as long as you sort by _id.
Only caveat: if your client app dies after findAndModify and before insert, you will have a gap in there.
Either that's ok or you need to add extra protections like keeping a side log.
If you want to be 100% safe you can also get inspiration from 2-phase commit
http://www.mongodb.org/display/DOCS/two-phase+commit
Basically it is the proper way to do transaction with any db that spans more than 1 server (even sql wouldnt help there)
best
AG
If you need to keep a running sum, this should probably be done on another document in a different collection. The best way to keep this running sum is to use the $inc atomic operation. There's really no need to perform any reads while calculating this sum.
You'll want to insert your X document into its collection, then also $inc a value on a different document that is meant for keeping this tally of all the misc values from the X documents.
Note: This still won't be transactional, because you're updating two documents in two different collections separately, but it will be highly performant, and fully thread safe.
Fore more info, check out all the MongoDB Atomic Operations.