I was wondering what the benefits of using user IDs were rather than just using the unique username for the keys.
user:USERNAME => {password: HASH, age: 45}
user:0 => {username: USERNAME, password: HASH, age: 45}
Note that I would want to look up users by name, so the ID scheme requires a second set of key-value pairs relating usernames to IDs (see the sketch below).
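For concreteness, a minimal sketch of the ID layout with the extra lookup key (the values are just placeholders):
> HMSET user:0 username pluckerpluck password HASH age 45
> SET username:pluckerpluck 0
> GET username:pluckerpluck      # resolve the name to an ID first...
"0"
> HGETALL user:0                 # ...then fetch the user hash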
Given that every username is unique and has to be alphanumeric, is there any particular reason why the ID system would be beneficial?
The main reason I ask this is because I primarily used IDs as they decreased lookup times in my database, especially when the database was normalized in some way. But Redis doesn't gain this benefit and so I wondered what other reasons there may be to use user IDs instead of just the usernames.
Thanks for any help,
Pluckerpluck
P.S. Due to the way Redis handles hashes, I am actually quite unsure about the memory differences between the two methods, so information on this would be welcome, though I may go and test this myself later.
It does not matter in the use case you defined, but using integer user IDs has other advantages.
Sooner or later, you would want to reference user ids in other objects. For example, "Friends of user pluckerpluck". To model that, you'd have two alternatives -
friendsof:pluckerpluck -> SET{tom, dick, harry}
VS
friendsof:1 -> SET{2, 3, 4}
The former approach uses a regular Hash table to store the elements, and uses a lot of memory. The latter approach uses an integer array and is extremely memory efficient.
This special encoding for sets of integers is called IntSet. You can control this behaviour of Redis using the set-max-intset-entries setting:
# Sets have a special encoding in just one case: when a set is composed
# of just strings that happens to be integers in radix 10 in the range
# of 64 bit signed integers.
# The following configuration setting sets the limit in the size of the
# set in order to use this special memory saving encoding.
set-max-intset-entries 512
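To see the encoding difference in practice, a quick sketch (note that on recent Redis versions a small set of strings may report listpack rather than hashtable):
> SADD friendsof:1 2 3 4
(integer) 3
> OBJECT ENCODING friendsof:1
"intset"
> SADD friendsof:pluckerpluck tom dick harry
(integer) 3
> OBJECT ENCODING friendsof:pluckerpluck
"hashtable"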
For example, I see many people doing something like the following:
> set data:1000 "some string 1"
> set data:1001 "some string 2"
But what about using a hash to minimize the number of keys?
> hset data 1000 "some string 1"
> hset data 1001 "some string 2"
The second way creates only one key (data), instead of the many keys created by the first way.
Which way is recommended?
I have seen some people and tutorials doing hset data:10 01 xxx, but that is not related to my question. My question is simply which is recommended: set data:1001 xxx or hset data 1001 xxx.
Also, I don't plan to modify hash-max-zipmap-entries and hash-max-zipmap-value, which means the hash will eventually exceed those limits. With that configuration, are the two ways the same, and which way is recommended?
Reasons to use strings:
you need per value timeouts
the values are semantically isolated
you're on cluster and want the values to be sharded over different nodes to spread load (sharding is based on the key)
Reasons to use hashes:
you want to be able to purge all of them at once (del/unlink), or have a timeout that impacts all of those values at once (see the sketch after this list)
you want to be able to enumerate them (prefer hscan/hgetall over scan/keys)
slightly better memory usage on the keys themselves
the values are semantically related
it is OK for all the values to be on the same node (whether single-server or cluster)
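For example, a sketch of the timeout difference with placeholder keys (classic Redis has no per-field timeout inside a hash):
> SET data:1000 "some string 1" EX 60    # timeout on one value
> HSET data 1000 "some string 1"
> EXPIRE data 60                         # timeout on the whole hash, all fields at once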
This all depends on the tradeoffs you want to support. In general, using hashes will have a smaller memory footprint than using simple keys. In fact, it is about an order of magnitude less memory. And access to hash values is constant time. So, if you are using redis simply as a key-value store, then hashes are way more efficient than simple keys.
However, if you need to support expiration, keyspace notifications, etc., then you will need to use simple keys.
Just be careful to tweak the values for hash-max-zipmap-entries and hash-max-zipmap-value in your redis.conf in order to ensure that hashes are treated correctly for your environment.
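For example (the values shown are the historical defaults; in later Redis versions these settings were renamed hash-max-ziplist-* and then hash-max-listpack-*):
# Hashes use the special memory saving encoding only while
# both of these limits hold.
hash-max-zipmap-entries 512
hash-max-zipmap-value 64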
You can read all about the details in the memory optimization section of the documentation.
What are the different key types supported by Redis? The documentation mentions all the various types of values supported by Redis (strings, sets, hashes, etc.), but I couldn't quite find information on key types.
From redis documentation (Data types intro):
Redis keys
Redis keys are binary safe, this means that you can use any binary sequence as a key, from a string like "foo" to the content of a JPEG file. The empty string is also a valid key. A few other rules about keys:
Very long keys are not a good idea. For instance a key of 1024 bytes is a bad idea not only memory-wise, but also because the lookup of the key in the dataset may require several costly key-comparisons. Even when the task at hand is to match the existence of a large value, hashing it (for example with SHA1) is a better idea, especially from the perspective of memory and bandwidth.
Very short keys are often not a good idea. There is little point in writing "u1000flw" as a key if you can instead write "user:1000:followers". The latter is more readable and the added space is minor compared to the space used by the key object itself and the value object. While short keys will obviously consume a bit less memory, your job is to find the right balance.
Try to stick with a schema. For instance "object-type:id" is a good idea, as in "user:1000". Dots or dashes are often used for multi-word fields, as in "comment:1234:reply.to" or "comment:1234:reply-to".
The maximum allowed key size is 512 MB.
In my experience, "any binary sequence" typically means a string, though there may be client languages where you can achieve this with other data types.
Keys in Redis are all strings, so it doesn't really matter what kind of value you pass into a client. Under the hood, the RESP protocol is used, and it passes the value as a string to the engine.
Example:
ZADD some_key 1 some_value
some_key is always a string; even if you pass 3 as the key, it is handled as a string. This is true for every client.
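A quick sketch of this behaviour:
> ZADD 3 1 some_value    # the key here is the string "3"
(integer) 1
> TYPE 3
zset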
I have a SQL table that is accessed continually but changes very rarely.
The Table is partitioned by UserID and each user has many records in the table.
I want to save database resources and move this table closer to the application in some kind of memory cache.
In-process caching is too memory intensive, so it needs to be external to the application.
Key-value stores like Redis are proving inefficient due to the overhead of serializing and deserializing the table to and from Redis.
I am looking for something that can store this table (or partitions of data) in memory, but let me query only the information I need without serializing and deserializing large blocks of data for each read.
Is there anything that provides an out-of-process, in-memory database table that supports queries, for high-speed caching?
Searching has shown that Apache Ignite might be a possible option, but I am looking for more informed suggestions.
Since it's out-of-process, it has to do serialization and deserialization. The problem you're concerned with is how to reduce that serialization/deserialization work. If you use Redis' STRING type, you CANNOT reduce it.
However, you can use HASH to solve the problem: map your SQL table to a hash.
Suppose you have the following table: person: id (varchar), name (varchar), age (int). You can take the person id as the key, and name and age as fields. When you want to look up someone's name, you only need to fetch the name field (HGET person-id name); the other fields won't be deserialized.
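A minimal sketch of that mapping (the id 42 and the values are placeholders):
> HMSET person:42 name "Alice" age 30
OK
> HGET person:42 name    # only this one field crosses the wire
"Alice"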
Ignite is indeed a possible solution for you, since you can optimize serialization/deserialization overhead by using the internal binary representation to access objects' fields. You can refer to this documentation page for more information: https://apacheignite.readme.io/docs/binary-marshaller
Access overhead can also be optimized by disabling the copy-on-read option: https://apacheignite.readme.io/docs/performance-tips#section-do-not-copy-value-on-read
Data collocation by user id is also possible with Ignite: https://apacheignite.readme.io/docs/affinity-collocation
As @for_stack said, a hash is very suitable for your case.
You said that each user has many rows in the database, indexed by user_id and tag_id, so (user_id, tag_id) uniquely identifies one row. Since every row is functionally dependent on this tuple, you can use the tuple as the hash key.
For example, if you want to save the row (user_id, tag_id, username, age) with values ("123456", "FDSA", "gsz", 20) into Redis, you can do this:
HMSET 123456:FDSA username "gsz" age 20
When you want to query the username by user_id and tag_id, you can do this:
HGET 123456:FDSA username
So every hash key will be a combination of user_id and tag_id. If you want the key to be more human-readable, you can add a prefix string such as "USERINFO", e.g. USERINFO:123456:FDSA.
BUT if you want to query with only a user_id and get all rows for that user_id, the method above is not enough.
You can build a secondary index in Redis for your hashes.
As said above, we use user_id:tag_id as the hash key because it uniquely points to one row. If we want to query all the rows for one user_id, we can use a sorted set as a secondary index recording which hashes store info about that user_id.
We add an entry to the sorted set like this:
ZADD user_index 0 123456:FDSA
As above, we set the member to the hash key string and set the score to 0. The rule is that all scores in this zset must be 0; then we can use the lexicographical order to do range queries (see ZRANGEBYLEX).
E.g., to get all the rows for user_id 123456:
ZRANGEBYLEX user_index [123456 (123457
It returns all the hash keys whose prefix is 123456, and then we can use each returned string as a hash key with HGET or HMGET to retrieve the information we want.
[ means inclusive and ( means exclusive. Why 123457? To capture every member that starts with 123456, we set the exclusive upper bound to the prefix with its last character incremented by one ('6' becomes '7').
For more about lex indexes, refer to the ZRANGEBYLEX documentation mentioned above.
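Putting the pieces together with the USERINFO prefix (a sketch using the placeholder values from above):
> HMSET USERINFO:123456:FDSA username "gsz" age 20
OK
> ZADD user_index 0 USERINFO:123456:FDSA
(integer) 1
> ZRANGEBYLEX user_index [USERINFO:123456 (USERINFO:123457
1) "USERINFO:123456:FDSA"
> HGETALL USERINFO:123456:FDSA
1) "username"
2) "gsz"
3) "age"
4) "20"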
You can try Apache Mnemonic, started by Intel: http://incubator.apache.org/projects/mnemonic.html. It supports serde-less features.
For a read-dominant workload, the MySQL MEMORY engine should work fine (writing DMLs lock the whole table). This way you don't need to change your data retrieval logic.
Alternatively, if you're okay with changing the data retrieval logic, Redis is also an option. To add to what @GuangshengZuo described, there is ReJSON, a dynamically loadable Redis module (for Redis 4+) that implements a document store on top of Redis. It can further relax the requirements for marshalling big structures back and forth over the network.
With just 6 principles (which I collected here), it is very easy for a SQL-minded person to adapt to the Redis approach. Briefly, they are:
The most important thing is: don't be afraid to generate lots of key-value pairs. Feel free to store each row of the table in a different key.
Use Redis' hash map data type
Form the key name from the primary key values of the table, joined by a separator (such as ":")
Store the remaining fields as a hash
When you want to query a single row, directly form the key and retrieve its results
When you want to query a range, use the wildcard "*" in your key pattern. But please be aware that scanning keys blocks other Redis operations, so use this method only if you really have to (see the sketch below).
The link just gives a simple table example and shows how to model it in Redis. Following these 6 principles you can continue to think the way you do for normal tables (of course without some not-so-relevant concepts such as CRUD, constraints, relations, etc.).
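A short sketch following these principles, for a hypothetical orders table with primary key (customer_id, order_id):
> HMSET order:42:1001 product "widget" qty 3    # one row, one hash
OK
> HGETALL order:42:1001                         # query a single row by forming its key
> SCAN 0 MATCH order:42:* COUNT 100             # range query; SCAN blocks far less than KEYS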
Using a Memcached and Redis combination on top of MySQL comes to mind.
What is the most convenient/fast way to implement a sorted set in Redis where the values are objects, not just strings?
Should I just store object IDs in the sorted set and then query each of them individually by its key, or is there a way to store them directly in the sorted set, i.e. must the value be a string?
It depends on your needs. If you need to share this data with other zsets/structures and want to write the value only once per change, you can put an ID as the zset value and add a hash to store the object. However, this implies additional queries when you read data from the zset (one ZRANGE + n HGETALL for n values in the zset), but writing and synchronising the value between many structures is cheap (only the hash corresponding to the value is updated).
But if it is "self-contained", with no or few accesses outside the zset, you can serialize your object to a chosen format (JSON, MessagePack, Kryo...) and store it as the value of your zset entry. This way, you get better performance when you read from the zset (only 1 query with O(log(N)+M) complexity, which is about the best you can get), but you may have to duplicate the value in other zsets/structures if you need to read/write it outside, which also implies maintaining synchronisation of the value by hand.
Redis has good documentation on performance of each command, so check what queries you would write and calculate the total cost, so that you can make a good comparison of these two options.
Also, don't forget that Redis comes with optimistic locking, so if you need pessimistic locking (because of contention, for instance) you will have to implement it by hand and/or with Lua scripts. If you need a lot of synchronisation, the first option seems better (lower read performance, but still good; fewer queries and less complexity on writes). But if you have values that don't change much and memory space is not a problem, the second option will give better read performance (you can duplicate the value in Redis and synchronise the copies periodically, for instance).
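A sketch of both options with placeholder data:
# Option 1: IDs in the zset, objects in hashes
> ZADD scores 100 user:1
> HMSET user:1 name "alice" level "3"
> ZRANGE scores 0 -1    # then HGETALL each returned ID

# Option 2: the serialized object is the zset member
> ZADD scores 100 "{\"name\":\"alice\",\"level\":3}"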
Short answer: Yes, everything must be stored as a string
Longer answer: you can serialize your object into any text-based format of your choosing. Most people choose MsgPack or JSON because they are compact and serializers are available in just about any language.
I am using a permission model where I have a table user_permissions. This table holds one or more bigint columns. I use the bits of each number to compare against certain permission rules (the bit position identifies a permission rule and the bit value indicates whether the rule is active or not).
The problem with this approach is that I have a limited number of bits to work with when using a type such as bigint.
What is the best column type I can use in this case that works in a cross-database environment?
The tags represent the technologies I am aiming for, so any other solution related to those technologies is appreciated.
I was thinking of using the @Lob annotation to store large data; is that the best practice?
UPDATE:
The user_permission table extends the user with a 1:1 relationship and has bigint fields like bin_create, bin_read, bin_update and bin_delete that hold the binary data as decimal numbers.
To clarify the question:
I am considering comparing the permissions using bitwise operators. So let's assume I have a user with the permission value 10 (1010), and an action requiring 13 (1101). Then 10 & 13 == 8 (1000): the user has one permission matching the required permissions for the action, so I could allow or deny (it is up to the application rules to define which).
But with this approach I have a limited number of bits to work with (and if I increase the number of permissions to be considered, the numbers grow too). The max bigint value is 9223372036854775807 (2^63 - 1), whose binary representation is 63 ones, giving 63 permission possibilities per field.
So what is the best column type I can use in this case that works in a cross-database environment, to store a large quantity of binary flags with the possibility of working with bitwise operators in Java?
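In Java, the bitwise check described above looks like this (a minimal, self-contained illustration):
// Minimal illustration of the bitwise permission check described above.
public class PermissionCheck {
    public static void main(String[] args) {
        long userPermissions = 0b1010; // 10
        long required = 0b1101;        // 13
        long matching = userPermissions & required; // 0b1000 == 8
        System.out.println(matching != 0); // true: at least one required bit is set
    }
}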
If you want to store your data in an optimal way, you first have to name the target you want to optimize for.
Here is a solution optimized for MySQL (by defining BINARY(32)); you can try something similar on your favourite database:
@Column(columnDefinition = "BINARY(32)", length = 32, nullable = false)
private byte[] bits;
Sometimes with some JPA providers and databases the column definition ends up as a Lob. That's not the best solution, because reading a Lob is a separate (very expensive) operation. Try to change either the provider or the database (if you're working with pure JPA, you can try that).
Options for replacing Lobs are, for example, numeric columns (you can use e.g. 4 columns of 64-bit width, or similar). If you want a nice solution, these container columns can even be @Embedded into your main class. But it all depends on your database.
This way you will have 256 bits (32 bytes), without any conversion or further calculation, and you have the possibility to extend the range if you want. You have to be careful when changing the column definition, though.
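As a hedged sketch, bit access on such a byte[] column could look like this (the helper names are hypothetical, not part of any library):
// Hypothetical helpers for a 256-bit permission field stored as byte[32].
public final class BitField {
    public static boolean isSet(byte[] bits, int i) {
        return (bits[i / 8] & (1 << (i % 8))) != 0;
    }
    public static void set(byte[] bits, int i) {
        bits[i / 8] |= (1 << (i % 8));
    }
}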
If it's the amount of data you can fit in a field that you're concerned about, why not store the number as a varchar? To my knowledge, pretty much any database will let you go up to at least a varchar(255). If you need more than 255 digits in the number, you could encode it in base 64 to squeeze it down more. If my mental arithmetic is right, that gives you 255 characters * 6 bits per character = 1530 different bits to use. If you need more than that, I might suggest your permissions model is a little excessive.
That's assuming that you're trying to crowd the data into a smallish space in the database. Your question isn't entirely clear on what you're trying to solve for. On the other end of the spectrum, you could unpack the bits and save each bit to its own field or its own row. For example, user_permissions could be a table with two columns: user and permission, where each row is one permission granted to one user.
There are two different approaches:
1) Pretty data model
One row (a user in your example) can have values (here a user permission, which is one bit or a boolean value) where you don't know how many values are possible, i.e. the number of values is in principle unlimited. The normal approach in SQL to handle this is a child table:
You create a table (and a Java class for mapping/annotations) UserPermission, which contains the user id as a foreign key, a permission id and the boolean value. User id and permission id together form a unique key for this table (you can add an id as a separate primary key if you like). You can even add columns for the user who granted the permission, the date when this was done, etc., if you want some auditing, but this is not necessary.
If you want to make it prettier, you also create a table (and Java class) Permission, which contains the permission id, a name for the permission and perhaps other information.
This solution needs more database space than your idea with the bits in an integer, but bear in mind that you won't have that many users compared to other data in the database, so the extra amount doesn't matter.
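A hedged JPA sketch of this child table (entity, table and column names are hypothetical):
import javax.persistence.*;

// Hypothetical mapping of the child table described above.
@Entity
@Table(name = "user_permission",
       uniqueConstraints = @UniqueConstraint(columnNames = {"user_id", "permission_id"}))
public class UserPermission {

    @Id
    @GeneratedValue
    private Long id;

    @Column(name = "user_id", nullable = false)
    private Long userId;

    @Column(name = "permission_id", nullable = false)
    private Long permissionId;

    @Column(nullable = false)
    private boolean granted; // the boolean permission value
}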
2) Fast solution:
If the solution with extra tables carries too much overhead, because your permissions are not really that important, and you worry that an integer may be too short, then you can use the Java type BigInteger (which allows bit manipulation) and map it with an @Column annotation to a NUMBER or DECIMAL column in the database.
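A minimal sketch of this mapping, in the style of the snippet above (the column name and precision are illustrative):
import java.math.BigInteger;
import javax.persistence.Column;

// Hypothetical field inside the user entity; BigInteger supports bit
// operations such as testBit(n) and setBit(n).
@Column(name = "permissions", precision = 38, scale = 0, nullable = false)
private BigInteger permissions;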
Bear in mind that the size of a database NUMBER is also limited (for example, Oracle's NUMBER allows at most 38 significant digits, roughly 126 bits). If this might be a problem, then you must use solution 1).
One more disadvantage of solution 2) is that you can never use an index for the permissions. (A query selecting all users that have a certain permission set will never use an index.)
I always would use solution 1).