I am attempting to get waveform data into a redis stream.
The waveform consists of 250,000 floats and is being ingested at 1hz
What I have attempted so far is to do a single xadd with said waveform as a byte string. This fits in nicely with my time constraint, but
makes it difficult to digest into applications reading from redis since they have to read the entire waveform.
This xadd with 1MB payload takes ~5ms
What I'd like to do is a single xadd per element. That way I could leverage xrange when digesting the data.
This is taking a long time, even on the loopback. Utilizing a pipeline , I can get the xadd to average out around 50 microseconds; however I am still off by a factor of 10 to even be close to fitting into 1 second.
Is this even possible with redis / redis plus plus? Am I using the wrong data structure?
Thanks in advance
Side note:
i can get roughly 10 % better by providing a timestamp to xadd compared to using "*"
Environment :
Ubuntu-latest docker container
gcc 8.1
redis-stable
hiredis-master
redispluplus-master
Related
I've been trying to make replay system. So basically when player moves, system saves his datas(movements, location, animation etc.) into JSON file. In the end of the record, JSON file may be over 50 MB. I'd want to save this data into Redis with expire date (24-48 hours).
My questions are;
Is it bad to save over 50 MB into Redis with expire date?
How many datas that over 50 MB can Redis handle without performance loss?
If players make 500 records in 48 hours, may it be bad for Redis?
How many milliseconds does it takes 50 MB data from Redis with average VDS/VPS?
Storing a large object(in terms of size) is not a good practice. You may read it from here. One of the problem is network. You need to send 50MB payload to a redis server in a single call. Also if you save them as one big object, then while retrieving, updating it (a single field, element etc), you need to get 50 MB back from server and parse it to get a single field, update it back end send back to server. That's a serious problem in terms of network.
Instead of redis strings, you may prefer sorted sets or lists depending on your use case. If you are going to store them with timestamps and get the range of events between these timestamps, then sorted sets may be an ideal solution for you. It's good for pagination etc. One of the crucial drawback is the complexity of adding a new element is O(log(N)).
lists may also provide a good playground for your case. You may use LPUSH/RPUSH to add new events to your list, and since Redis lists are implemented with linked lists, both adding a message to the beginning or end of the list is same, O(1), which is great.
Whenever an event happens, you either call ZADD or RPUSH/LPUSH to send the events to redis. If you need to query those to you may use available functions such as ZRANGEBYSCORE or LRANGE depending on your choice.
While designing your keys you may use an identifier such as user-id just like you mentioned in the comments. You will not have the problems with lists/sorted sets like you will have in strings. But choosing which one is most suitable for your depends on your use case for reads/writes or business rules.
Here some useful links to read;
Redis data types intro
Redis data types
Redis labs documentation about data types
Am I missing something, or is there no way to generate backpressure with Redis streams? If a producer is pushing data to a stream faster consumers can consume it, there's no obvious way to signal to the producer that it should stop or slow down.
I expected that there would be a blocking version of XADD, that would block the client until room became available in a capped stream (similar to the blocking version of XREAD that allows consumers to wait until data becomes available), but this doesn't seem to be the case.
How do people deal with the above scenario — signaling to a producer that it should hold off on adding more items to a stream?
I understand that some data stream systems such as Kafka do not require backpressure, but Redis doesn't appear to have a comparable solution, and it seems like this would be a relatively common problem for many Redis streams use cases.
If you have persistence (either RDB or AOF) turned on, your stream messages will be persisted, hence there's no need for backpressure.
And if you use replicas, you have another level of redudancy.
Backpressure is needed only when Redis does not have enough memory (or enough network bandwidth to the replicas) to hold the messages.
And, honestly, I have never seen this scenario.
Why would you want to ? Unless you run out of memory it is not an issue and each consumer slow and fast can read at their leisure.
Note not using consumer groups just publishing via XADD and readers read via XRANGE via position stored in a key which is closer to Kafka. Using one stream per partition.
Producer can check if the table size gets too big every 1K messages (via XLEN) to slow down if this is an issue and cant you cant throw HW at it 5 nodes with 20 Gig each is pretty easy with the streams spread across the cluster .. Don't understand this should be easy so im probably missing something.
There is also an XADD version that trims the size of the table to ensure you don't over fill with the above but that world require some pretty extreme stuff. For us this is 2 days worth for frequent stuff which sends the latest state and 9 months for others.
Another thing dont store large messages in the stream , use a blob or separate key/ store.
We got "device overload" error after the program ran successfully on production for a few months. And we find that some maps' sizes are very big, which may be bigger than 1,000.
After I inspected the source code, I found that the reason of "devcie overload" is that the write queue is beyond limitations, and the length of the write queue is related to the effiency of processing.
So I checked the "particle_map" file, and I suspect that the whole map will be rewritten even if we just want to insert one pair of KV into the map.
But I am not so sure about this. Any advice ?
So I checked the "particle_map" file, and I suspect that the whole map will be rewritten even if we just want to insert one pair of KV into the map.
You are correct. When using persistence, Aerospike does not update records in-place. Each update/insert is buffered into an in-memory write-block which, when full, is queued to be written to disk. This queue allows for short bursts that exceed your disks max IO but if the burst is sustained for too long the server will begin to fail the writes with the 'device overload' error you have mentioned. How far behind the disk is allowed to get is controlled by the max-write-cache namespace storage-engine parameter.
You can find more about our storage layer at https://www.aerospike.com/docs/architecture/index.html.
I have a buffer that needs to read all values(hash, field and values) from redis after reboot, is there a way to do that in a fast way? I have approximately 100,000 hashes with 4 fields each.
Thanks!
EDIT:
The Slow way: Current Implementation is getting all the hashes using
Keys *
then
HGETALL xxx
to get all the fields' values.
There are two ways to approach this problem.
The first one is to try to optimize the KEYS/HGETALL combination you have described. Because you do not have millions of keys (100K is not so high by Redis standard), the KEYS command will not block the instance for a long time, and the output buffer size required to return 100K items is probably acceptable. Once the list of keys have been received by your program, then the next challenge is to run many HGETALL commands as fast as possible. The key is to pipeline them (for instance in synchronous batches of 1000 items) which is quite easy to implement with hiredis (just use redisAppendCommand / redisGetReply). The 100K items will be retrieved in 100 roundtrips only. Because most Redis instances can sustain 100K op/s or more, it should not last more than a few seconds. A more efficient solution would be to use the asynchronous interface of hiredis to try to maximize the throughput, but it is more complex to implement. I'm not sure it is worth it for 100K items.
The second approach is to use a BGSAVE command to take a snapshot of Redis content, retrieve the generated dump file, and then parse the file to extract the data. You can have a look at the excellent redis-rdb-tools package for a Python implementation. The main benefit of this approach is there is no impact on the Redis instance (no KEYS command to block the event loop) while still retrieving consistent data.
I've got jobs that I'm planning to send to workers via REDIS Pub/Sub. Job involves processing an image (JPEG, 20KB-800KB, typically around 150KB).
Is it a good idea to send the image directly as the message's payload?
I don't see this as a problem at all. If you are confident your subscriber(s)/worker(s) will be able to keep up and you won't risk running out of RAM then I think this is a valid approach. I don't know if its better than nginx streaming as suggested, but being an in-memory data store redis should scale pretty close to the hardware and network limits.
Keep in mind that redis pub/sub is not "durable" so if an image is published to a channel no one is currently subscribed to it won't get picked up. The image would just go nowhere.
You could build a durable queue pretty easy using a redis List if you need durability.
You can encode the JPEG file by base64 into string, and publish the string to the channel.
The size of send data(payload JPEG file) would be increased to about 1.5x to 2x.