I have key-value pairs like the following example:
KEY VALUE
key1 1
key2 2
key3 3
. .
. .
keyN N
Each of my keys needs to map to a unique number, so I am mapping my keys to auto-incremented numbers, inserting them into Redis via Redis mass insertion (which works very well), and then using the GET command for internal processing of all the key-value mappings.
But I have more than 1 billion keys, so I was wondering: is there an even more efficient way (mainly with lower memory usage) to use Redis for this scenario?
Thanks
You can pipeline commands into Redis to avoid the round-trip times like this:
{ for ((i=0;i<10000000;i++)) ; do printf "set key$i $i\r\n"; done ; sleep 1; } | nc localhost 6379
That takes 80 seconds to set 10,000,000 keys.
Or, if you want to avoid creating all those processes for printf, generate the data in a single awk process:
{ awk 'BEGIN{for(i=0;i<10000000;i++){printf("set key%d %d\r\n",i,i)}}'; sleep 1; } | nc localhost 6379
That now takes 17 seconds to set 10,000,000 keys.
The auto-increment key allows a unique number to be generated whenever a new record is inserted into a table/Redis.
Another way is to use a UUID.
But I think auto-increment is far better, for reasons such as: a UUID needs four times more space, ordering cannot be done based on the key, etc.
I'm doing exactly the same thing.
Here is a simple example.
If you have a better one, you're welcome to discuss :)
1. Connect to Redis:
import redis
pool = redis.ConnectionPool(host=your_host, port=your_port)
r = redis.Redis(connection_pool=pool)
2. Define a function to do the increment, using a pipeline:
def my_incr(pipe):
    next_value = pipe.hlen('myhash')
    pipe.multi()
    pipe.hsetnx(name='myhash', key=newkey, value=next_value)
3. Make the function run as a transaction:
newkey = 'key1'
r.transaction(my_incr, 'myhash')
Note that r.transaction creates its own pipeline and passes it to my_incr, so a separate pipe = r.pipeline() is not needed; newkey is picked up from the enclosing scope.
In order to be more memory efficient, you can use a HASH to store these key-value pairs. Redis has a special encoding for small HASHes, which can save you lots of memory.
In your case, you can shard your keys into many small HASHes, each with fewer than hash-max-ziplist-entries entries. See the doc for details.
By the way, with the INCR command you can use Redis to create the auto-incremented numbers themselves.
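The sharding idea above can be sketched in Python with redis-py (the shard count, key prefix, and sizing are assumptions for illustration, not part of the original answer):

```python
import zlib

# Hypothetical sizing: with ~1 billion keys, 4 million shards keeps each
# HASH around 250 fields, comfortably below a hash-max-ziplist-entries of 512.
NUM_SHARDS = 4_000_000

def shard_for(key: str) -> str:
    """Route a logical key to one small HASH so Redis keeps the compact encoding."""
    return f"kv:{zlib.crc32(key.encode()) % NUM_SHARDS}"

# Instead of SET key1 1, store the pair as a field inside the shard HASH:
#   r.hset(shard_for("key1"), "key1", 1)
# and read it back with:
#   r.hget(shard_for("key1"), "key1")
```

Any deterministic hash works here; CRC32 is used only because it is in the standard library.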
I would like to answer my own question.
If you have sorted key values, the most efficient way to bulk insert and then read them is using a B-Tree based database.
For instance, with MapDB I am able to insert it very quickly and it takes up less memory.
I have a dozen Redis keys of type SET, say
PUBSUB_USER_SET-1-1668985588478915880,
PUBSUB_USER_SET-2-1668985588478915880,
PUBSUB_USER_SET-3-1668988644477632747,
.
.
.
.
PUBSUB_USER_SET-10-1668983464477632083
Each set contains user IDs, and the problem statement is to check whether a given user is present in any of the sets.
The solution I tried is to get all the keys, join them with a delimiter (a comma), and pass the result as an argument to a Lua script, which splits the keys with gmatch and runs SISMEMBER on each until there is a hit.
local vals = KEYS[1]
for match in (vals..","):gmatch("(.-)"..",") do
    local exist = redis.call('sismember', match, KEYS[2])
    if (exist == 1) then
        return 1
    end
end
return 0
Now, as the number of keys grows to PUBSUB_USER_SET-20 or PUBSUB_USER_SET-30, I see an increase in latency and a hit to throughput.
Is this the better way to do it, or is it better to batch the Lua script calls, passing 10 keys at a time instead of all 30 and returning as soon as the user is found? Or is there a better way altogether?
I would propose a different solution. Instead of spreading keys randomly across sets, store each key in exactly one set, chosen deterministically, and query only that set to check whether a key is there or not.
Let's say we have N sets numbered s-0, s-1, s-2, ..., s-19.
Put each key into one of these sets based on its hash, which means you need to query only one set instead of checking all of them. You can use any hashing algorithm.
To make it further interesting you can try consistent hashing.
You can use a Redis pipeline with batching (10 keys per iteration) to improve performance.
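A minimal sketch of the hashing idea in Python (the set count and naming follow the s-0 ... s-19 example above; the hash function is an assumption):

```python
import zlib

N_SETS = 20  # matches the s-0 .. s-19 numbering above

def set_for(user_id: str) -> str:
    """Deterministically map a user id to exactly one of the N sets."""
    return f"s-{zlib.crc32(user_id.encode()) % N_SETS}"

# Write path:  r.sadd(set_for(user_id), user_id)
# Check path:  r.sismember(set_for(user_id), user_id)
# One SISMEMBER call instead of looping over all N sets.
```

The latency then stays flat as the number of sets grows, because every lookup touches a single set.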
I am looking at LPOP and RPOPLPUSH as valid options for popping a value atomically.
However, I have a job that pops 1000 values from that list every 2 seconds, which means 1000 requests to Redis.
I would have used SPOP, which can return X values in one request, but those are random ones, not the leftmost ones.
I do need to pop them from the left side of the list.
What are my options to do this as fast as possible, without locking, and atomically? I have multiple servers that access this list, and I can't retrieve duplicate values (that's why LRANGE doesn't work for me).
EDIT
The more I think about it, the more I see that I need to compromise and use SPOP.
The scenario is batching inserts into the DB with Redis. Instead of thousands of inserts a second to MySQL, I push to Redis, and every 2 seconds I fetch the values and insert them into MySQL in one go.
I guess I can use SPOP if I add a timestamp to the actual value in Redis; to avoid a value getting stuck in the set forever, I will run a loop of SPOP x 1000 until it returns null.
There are two options:
Use a Lua script to pop N elements in a single EVAL command:
EVAL 'local result = {}; for i = 1, ARGV[1] do result[i] = redis.call("lpop", KEYS[1]) end; return result' 1 key N
Use Redis pipeline to send N LPOP commands to reduce RTT.
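The pipeline option can be sketched in Python with redis-py (the client `r` and key name are assumptions). By default redis-py wraps a pipeline in MULTI/EXEC, so the whole batch also executes atomically:

```python
def lpop_batch(r, key, n=1000):
    """Pop up to n items from the left of a list in a single round trip."""
    pipe = r.pipeline()  # transaction=True by default: runs as MULTI/EXEC
    for _ in range(n):
        pipe.lpop(key)
    # LPOP returns None once the list is empty; drop those placeholders
    return [v for v in pipe.execute() if v is not None]
```

Because the batch is atomic, two servers calling this concurrently cannot receive the same element.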
In my case I upload a lot of records to a Redis sorted set, but I need to keep only the 10 highest-scored items. I have no way to influence the data being uploaded (to sort or limit it before uploading).
Currently I just execute
ZREMRANGEBYRANK key 0 -11
after uploading finishes, but this approach doesn't look optimal (it's slow, and it would be better if Redis could handle this itself).
So does Redis provide something out of the box to limit count of items in sorted sets?
No, Redis does not provide any such functionality apart from ZREMRANGEBYRANK.
There is a similar problem about keeping a Redis list at a constant size (say, 10 elements) when elements are being pushed from the left using LPUSH.
The solution lies in optimizing the pruning operation.
Truncate your sorted set once in a while, not every time.
Methods:
Run ZREMRANGEBYRANK with 1/5 probability on each insert, using a random integer.
Use a Redis pipeline or Lua scripting for this; it saves the second network call that would otherwise happen on roughly every 5th insert.
This is optimal enough for the purpose mentioned.
Algorithm example:
ZADD key member1 score1
random_int = some random number between 1 and 5
if random_int == 1:  # trim sorted set with 1/5 chance
    ZREMRANGEBYRANK key 0 -11
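The algorithm above can be sketched in Python with redis-py (the client `r`, key names, and the 1-in-5 probability are taken from the example; `one_in` is a made-up parameter name):

```python
import random

def zadd_capped(r, key, member, score, max_size=10, one_in=5):
    """Add to a sorted set and, with 1/one_in probability, trim it to max_size."""
    r.zadd(key, {member: score})
    if random.randrange(one_in) == 0:  # trim on roughly every one_in-th call
        # keep only the max_size highest-scored members
        r.zremrangebyrank(key, 0, -(max_size + 1))
```

Between trims the set may briefly hold a few extra members, which is the trade-off this probabilistic approach accepts.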
I have around 256 keys. Against each key I have to store a large number of non-repeating integers.
Following are the top 7 keys with the total number of values (entries) against each. Each value is a unique, large integer.
Key No. of integers (values) in the list
Key 1 3394967
Key 2 3385081
Key 3 2172866
Key 4 2171779
Key 5 1776702
Key 6 1772936
Key 7 1748858
By default Redis consumes a lot of memory storing this data. I read that changing the following parameters can greatly reduce memory usage.
list-max-zipmap-entries 512
list-max-zipmap-value 64
Can anyone please explain these configuration directives (are 512 and 64 in bytes?) and what values I should use in my case to reduce memory usage?
What should be kept in mind while selecting the values for entries and value in these settings?
list-max-zipmap-entries 512
list-max-zipmap-value 64
If the number of entries in a List exceeds 512, or if the size of any given element in the list > 64 bytes, Redis will switch to a less-efficient in-memory storage structure. More specifically, below those thresholds it will use a ziplist, and above it will use a linked list.
So in your case, you would need to use an entries value greater than 1748858 to see any change (and then only for keys of that size and below). Also note that for Redis to re-encode them to the smaller object size, you would need to make the change in the config and restart Redis, as it doesn't re-encode down automatically.
To verify whether a given key is using a ziplist vs. a linked list, use the OBJECT ENCODING command.
For more details, see Redis Memory Optimization
IMO you can't benefit from Redis' memory optimization here. In your case the number of entries in each list/set is around 3 million, so to get the special encoding you would have to set list-max-zipmap-entries to around 3 million.
The Redis docs say:
This operation is very fast for small values, but if you change the
setting in order to use specially encoded values for much larger
aggregate types the suggestion is to run some benchmark and test to
check the conversion time.
As per this, encoding and decoding will take more time/CPU for that huge a number of entries. So it is better to run a benchmark test and then decide.
One alternative suggestion: if you only query these sets to see whether a key is present or not, you can change the structure to a kind of bucket scheme.
For example, the value 123456 under key1 can be stored like this:
SADD key1:bucket:123 456
where 123 = 123456 / 1000 (integer division)
and 456 = 123456 % 1000
Note this won't work if you want to retrieve all the values for key1; in that case you would be looping through 1000 sets. Similarly, for the total size of key1 you would have to loop through 1000 keys.
But the memory usage will be reduced by about 10 times.
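The bucket scheme above can be sketched like this (Python; the 1000-way split and key naming follow the example):

```python
def bucket_for(key: str, value: int, width: int = 1000) -> tuple[str, int]:
    """Split a large integer into a bucket set name and a small member.

    E.g. 123456 under key1 becomes ("key1:bucket:123", 456).
    """
    return f"{key}:bucket:{value // width}", value % width

# Membership test for 123456 under key1 (r is an assumed redis-py client):
#   set_name, member = bucket_for("key1", 123456)
#   r.sismember(set_name, member)
```

The members are now small integers, so the bucket sets stay within Redis' compact intset encoding, which is where the memory saving comes from.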
For example, I have an array/JSON with 100000 entries cached with Redis/Predis. Is it possible to update or delete one or more entries, or do I have to regenerate the whole array/JSON of 100000 entries? And how can I achieve that?
It depends on how you store them. If you store it as a string, then no:
set key value
get key -> will return your value
Here the value is your JSON/array with 100000 entries.
Instead, if you store it in a hash (http://redis.io/commands#hash):
hmset key member1 value1 member2 value2 ...
then you can update/delete member1 separately.
If you are using sets/lists, you can achieve it with similar commands like LPUSH/LPOP, SREM, etc.
Do read the commands section to learn more about Redis data structures; this will give you more flexibility in selecting your structure.
Hope this helps
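As a sketch of the hash approach in Python with redis-py (the client `r`, cache key, and helper names are made up for illustration), each entry gets its own hash field, so one entry can be updated or deleted without rewriting the other 100000:

```python
import json

def cache_entries(r, key, entries):
    """Store each entry under its own hash field instead of one big JSON string."""
    r.hset(key, mapping={f: json.dumps(v) for f, v in entries.items()})

def update_entry(r, key, field, value):
    r.hset(key, field, json.dumps(value))  # rewrites just this one field

def delete_entry(r, key, field):
    r.hdel(key, field)  # removes just this one field
```

Reading a single entry back is then `json.loads(r.hget(key, field))` rather than deserializing the whole collection.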
If you are using a cache service, you have to:
get the data from the cache
update some entries
save the data back to the cache
You could use advanced Redis data structures like hashes, but that is not supported by the cache service; you would need to write your own functions.
Thanks Karthikeyan Gopall, I made an example.
Here I changed field1's value and it works :)
$client = Redis::connection();
$client->hmset('my:hash', ['field1'=>'value1', 'field2'=>'value2']);
$changevalue= Redis::hset('my:hash' , 'field1' , 'newvaluesssssssssss');
$values1 = Redis::hmget('my:hash' , 'field1');
$values2 = Redis::hmget('my:hash' , 'field2');
print_r($values1);
print_r($values2);