keys or list for my redis datamodel?

I'm building the data model of my app and, basically, for a given user I'd like to keep a list of his/her friends and the status of each of them (whether or not they have accepted the friend request yet).
I end up with several keys (one for each friend of tom):
friends:tom:status:jessica => joined
friends:tom:status:stephan => joined
friends:tom:status:hubert => pending
friends:tom:status:peter => declined
Is that the best way to handle this, or should a list be used in some other way?

You could use a hash structure, for example, where the hash key would be friends:tom:status, the field would be the friend's name/ID, and the value would be their status. A hash is generally more memory efficient than dedicated keys.
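A minimal sketch of that layout, using the names from the question:
HSET friends:tom:status jessica joined
HSET friends:tom:status stephan joined
HSET friends:tom:status hubert pending
HSET friends:tom:status peter declined
//one friend's status
HGET friends:tom:status hubert
//all friends and their statuses at once
HGETALL friends:tom:status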

You could use a sorted set for this.
Have each status map to a score: joined = 1, pending = 2, declined = 3.
zadd user1_friends 1 userid1 1 userid2 2 userid3
Then you can easily retrieve all users in a given category:
zrangebyscore user1_friends 1 1
Or you could split into 3 separate sets:
sadd user1_joined userid1
sadd user1_pending userid3
Depending on what you want to do, either will work.


How to count HASH lists with certain value in Redis

Assume I have multiple hash lists as below:
HMSET myhash1 type Car company BMW
HMSET myhash2 type Car company Benz
HMSET myhash3 type Car company BMW
HMSET myhash4 type Car company Honda
HMSET myhash5 type Car company BMW
HMSET myhash6 type Car company Toyota
HMSET myhash7 type Car company Benz
I want to count how many hashes I have with company = BMW, which is 3 in this case.
You have to build some form of secondary index to accomplish this efficiently.
Use a Set
You can create a set for each company and add the matching hash keys to it.
SADD myhash:Company:BMW myhash1 myhash3 myhash5
SADD myhash:Company:Benz myhash2 myhash7
SADD myhash:Company:Honda myhash4
SADD myhash:Company:Toyota myhash6
Then when you want to query it, you would just use SCARD, so if you wanted to know how many BMWs there were you'd just run
SCARD myhash:Company:BMW
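If a hash's company ever changes, the index set has to be updated as well; a sketch, wrapping the writes in MULTI so the index stays consistent (here myhash4 moves from Honda to BMW):
MULTI
HSET myhash4 company BMW
SREM myhash:Company:Honda myhash4
SADD myhash:Company:BMW myhash4
EXEC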
With Redis Stack
Redis Stack has native secondary indexing capabilities which you can leverage here. This is much easier to maintain (and can actually work across shards in a scaled-out environment if need be). You'd just need to create the secondary index:
FT.CREATE hashes ON HASH PREFIX 1 myhash SCHEMA company TAG
Then you'd just need to query it (if you don't care to get the actual cars matching your query back, just pass in LIMIT 0 0):
FT.SEARCH hashes "@company:{BMW}" LIMIT 0 0
The thing with non-relational databases is that you have to build relations by hand when needed. So here you are "obliged" to have another key holding this information.
If you want all the information, you can have a set key holding it, like:
SADD BMW myhash1
//use SCARD to know how many BMWs there are
SCARD BMW
If you only want to know the number of BMWs, you can have a simple key holding a number, like:
INCR BMW //auto-set to 0 before applying INCR, holds 1 after this command
//use GET to read the number of BMWs
GET BMW //returns 1
//if you delete a hash you can use INCRBY to update the number
INCRBY BMW -1

what's the most efficient way to query all the messages in a group chat application?

I will use an example to illustrate my question.
You have a group-chat table that stores data about group chats:
+----+------+----------+
| id | name | owner_id |
+----+------+----------+
| 33 | code | 45       |
You have a messages table that holds the messages:
+----+---------+---------+--------------+
| id | content | user_id | chat_room_id |
+----+---------+---------+--------------+
|  5 | "hello" |      41 |           33 |
|  2 | "hi"    |      43 |           33 |
You have a users table that holds user information and which group chat each user is part of:
+----+--------+--------------+
| id | name   | chat_room_id |
+----+--------+--------------+
|  5 | "nick" |           33 |
|  2 | "mike" |           33 |
Is this the right way to set up the database?
Without joins or foreign keys, what's the most efficient way to load all the messages and user data and have it in a form that allows you to construct a UI where the user data is displayed next to the message?
My solutions:
If you query the messages table and retrieve all the messages where chat_room_id is equal to 33, you're going to get an array that looks like this:
[
  {
    id: 5,
    user_id: 41,
    content: "hello"
  },
  {
    id: 2,
    user_id: 43,
    content: "hi"
  }
]
As you can see, the user IDs are part of the message objects.
Solution 1 (naive):
Loop through the messages array and query the database for each user id.
This is a bad solution, since querying the database from inside a loop is never a good idea.
Solution 2 (efficient, with less data to send in the response):
Loop through the messages array, collect the user ids into an array, and use that in a single query with WHERE user_id IN (see the sketch at the end of this post).
Then loop through the returned users and construct a hash table using the user id as the key, since it is unique.
On the front end, just loop through the messages array and look each user up in that table.
Is this solution going to be very slow if you have a large number of messages? Will it scale well, given that it's O(n)?
Solution 3 (efficient, but more data to send in the response):
It's the same as before, but the difference is adding properties to each message object that store the user data.
The problem with this solution is that you will have duplicate data, since one user can publish multiple messages.
These are my solutions; I hope to hear yours.
For context: system design videos on YouTube don't address this part of chat apps. If you find one that does, please post the link.
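A rough sketch of solution 2 in SQL, assuming the tables are literally named messages and users as in the example above (the id list in the second query would be built from the first query's results):
-- all messages for the room
SELECT id, content, user_id
FROM messages
WHERE chat_room_id = 33;
-- the authors of those messages, fetched in one batched query
SELECT id, name
FROM users
WHERE id IN (41, 43);
On the application side, the rows from the second query go into a hash table keyed by user id, and each message is rendered by looking up its user_id in that table.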

Modeling data in Redis. Which is better: Sorted set & strings or hash table?

I want to use redis to store data that is sourced from a sql db. In the db, each row has an ID, date, and value, where the ID and date make up a composite key (there can only be one value for a particular ID and date). An example is below:
ID Date Value
1 01/01/2001 1.2
1 02/01/2001 1.5
1 04/23/2002 1.5
2 05/05/2009 0.4
Users should be able to query this data in redis given a particular ID and date range. For example, they might want all values for 2019 with ID 45. If the user does not specify a start or end time, we use the system's Date.Min or Date.Max respectively. We also want to support refreshing redis data from the database using the same parameters (ID and date range).
Initially, I used a zset:
zset key zset member score
1 01/01/2001_1.2 20010101
1 02/01/2001_1.5 20010201
1 04/23/2002_1.5 20020423
2 05/05/2009_0.4 20090505
Only, what happens if the value field changes in the db? For instance, ID 1 and date 01/01/2001 might change to 1.3 later on. I would want the original value to be updated, but instead a new member will be inserted. I would need to first check whether a member for a particular score already exists, and delete it if it does, before inserting the new member. I imagine this could get expensive when refreshing, for example, 10 years' worth of data.
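For example (the zset key is the ID, the score is the date as yyyymmdd):
ZADD 1 20010101 01/01/2001_1.2
//the value later changes to 1.3 -> this adds a second member instead of replacing the first
ZADD 1 20010101 01/01/2001_1.3
//the zset now holds both 01/01/2001_1.2 and 01/01/2001_1.3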
I thought of two possible fixes to this:
1.) Use a zset and string key-value:
zset key zset value score
1 1_01/01/2001 20010101
1 1_02/01/2001 20010201
1 1_04/23/2002 20020423
2 2_05/05/2009 20090505
string key string value
1_01/01/2001 1.2
1_02/01/2001 1.5
1_04/23/2002 1.5
2_05/05/2009 0.4
This allows me to easily update the value, and query for a date range, but adds some complexity as now I need to use two redis data structures instead of 1.
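A rough redis-cli sketch of option 1, using the example data above:
//index: one zset per ID, member = composite key, score = date as yyyymmdd
ZADD 1 20010101 1_01/01/2001
ZADD 1 20010201 1_02/01/2001
//values stored as plain strings so they can be overwritten in place
SET 1_01/01/2001 1.2
SET 1_02/01/2001 1.5
//query ID 1 for 2001, then fetch the values for the returned members
ZRANGEBYSCORE 1 20010101 20011231
MGET 1_01/01/2001 1_02/01/2001
//a changed value in the db is just an overwrite; the zset is untouched
SET 1_01/01/2001 1.3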
2.) Use a hash table:
hash key sub-key value
1 01/01/2001 1.2
1 02/01/2001 1.5
1 04/23/2002 1.5
2 05/05/2009 0.4
This is nice because I only have to use 1 data structure and although it would be O(N) to get all values for a particular hash key, solution 1 would have the same drawback when getting values for all string keys returned from the zset.
However, with this solution, I now need to generate all sub-keys between a given start and end date in my calling code, and not every date may have a value. There are also some edge cases that I now need to handle (what if the user wants all values up until today? Do I use HGETALL and just remove the ones in the future I don't care about? At what date range size should I use HGETALL rather than HMGET?)
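And a sketch of option 2, one hash per ID (with the caveat from above that the calling code has to supply the dates it wants):
//field = date, value = the measurement
HSET 1 01/01/2001 1.2
HSET 1 02/01/2001 1.5
HSET 1 04/23/2002 1.5
//an updated value is an in-place overwrite of the field
HSET 1 01/01/2001 1.3
//targeted reads need the candidate dates generated client-side
HMGET 1 01/01/2001 02/01/2001
//or pull everything and drop the dates you don't care about
HGETALL 1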
In my view, there are pros and cons to each solution, and I'm not sure which one will be easier to maintain in the long term. Does anyone have thoughts as to which structure they would choose in this situation?

How to pick transaction isolation levels?

I have a table in database that is responsible for storing ordered/reorderable lists. It has the following shape:
| id | listId | index | title | ... |
where id is the primary key, listId is a foreign key that identifies which list the item belongs to, and title and the other columns are the item's contents. The index column determines the item's position within its list. It is an integer counter (starting at 0) that is unique within the scope of a list, but may repeat across lists. Example data:
| id      | listId  | index | title    | ...
|---------|---------|-------|----------|-----
| "item1" | "list1" | 0     | "title1" | ...
| "item2" | "list1" | 1     | "title2" | ...
| "item3" | "list1" | 2     | "title3" | ...
| "item4" | "list2" | 0     | "title4" | ...
| "item5" | "list2" | 1     | "title5" | ...
Users can create/delete items, move them inside the list or across lists.
To ensure consistency of indexes when running these operations, I do the following:
Create item:
Count items within this list
SELECT COUNT(DISTINCT "Item"."id") as "cnt"
FROM "item" "Item"
WHERE "Item"."listId" = ${listId}
Insert new item, with index set to count from step 1:
INSERT INTO "item"("id", "listId", "index", "title", ...)
VALUES (${id}, ${listId}, ${count}, ${title})
This way index grows with each item inserted into the list.
Move item:
Retrieve item's current listId and index:
SELECT "Item"."listId" AS "Item_listId", "Item"."index" AS "Item_index"
FROM "item" "Item"
WHERE "Item"."id" = ${id}
Change index of "shifted" items if necessary, so that order is consistent, e.g. given the item is moved forward, all items between its current position (exclusively) and its next position (inclusively) need to have their index decreased by 1:
UPDATE "item"
SET "index" = "index" - 1
WHERE "listId" = ${listId}
AND "index" BETWEEN ${sourceIndex + 1} AND ${destinationIndex}
I'll omit the variation with movement across lists because it is very similar.
Update the item itself:
UPDATE "item"
SET "index" = ${destinationIndex}
WHERE "id" = ${id}
Delete item:
Retrieve item's index and listId
Move all items in the same list that come after this item one step back, to close the gap:
UPDATE "item"
SET "index" = "index" - 1
WHERE "listId" = ${listId}
AND "index" > ${itemIndex}
Delete item:
DELETE FROM "item"
WHERE "id" = ${id}
The question is:
What transaction isolation level should I use for each of these operations? It is very important for me to keep the index column consistent: no gaps and, most importantly, no duplicates. Am I right that the create item operation is subject to phantom reads, because it counts items by some criteria, and that it should therefore be serializable? What about the other operations?
Without knowing more about your specific application, the safest bet is indeed to use serializable as the isolation level whenever you access that table, but even that level may not be sufficient for your specific case.
A unique constraint on (listId, index) would prevent duplicates (what about the title? Can it be repeated in the same list?), some carefully crafted "watchdog" queries can further mitigate issues, and database sequences or stored procedures can ensure that there are no gaps, but the truth is that the mechanism itself seems fragile.
Knowing only so much about your specific problem, what you appear to have is a concurrency problem at the user level, in the sense that several users can access the same objects at the same time and make changes to them. Assuming this is your typical web application with a stateless back end (hence inherently distributed), this carries a large number of implications in terms of user experience, reflecting on the architecture and even the functional requirements. Say, for example, that user Foo moves item Car to list B, which is currently being worked on by user Bar. It is then legitimate to assume that Bar will need to see item Car as soon as the operation is completed, but that will not happen unless there is some mechanism in place to immediately notify users of list B of the change. The more users you have working on the same set of lists, the worse it becomes, even with notifications, as you would have more and more of them, up to the point where users see things changing all the time and just can't keep up.
There are a lot of assumptions anyone will make to provide you with an answer. My own lead me to say that you probably need to revise the requirements for that application, or ensure that management is aware of several limitations and accepts them.
This type of problem is pretty common in distributed applications. Usually "locks" on certain sets of data are placed (either through the database or shared memory pools) so that only one user can alter them at any given time, or, alternatively, a workflow is provided to manage conflicting operations (much like versioning systems do). When neither is done, a log of operations is kept to understand what happened and rectify problems later on should they be detected.
Based on your constraints, you can create a unique index on the two columns: (listId, index) can be defined as unique. That will prevent duplicates.
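For example (the index name is just illustrative):
CREATE UNIQUE INDEX "item_listId_index_uq" ON "item" ("listId", "index");
Note that, depending on the database, the index-shifting UPDATEs in the move operation can transiently produce duplicates mid-statement, so the uniqueness may need to be enforced with a deferrable constraint (or the updates carefully ordered) instead.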
Additionally, to avoid gaps, I would recommend running the following at the end of each transaction (it returns a row whenever two consecutive indexes within the list differ by more than 1):
SELECT *
FROM (
  SELECT i1."listId", i1."index",
         (SELECT MIN(i2."index") FROM "item" i2
          WHERE i2."listId" = i1."listId" AND i2."index" > i1."index") AS "nextIndex"
  FROM "item" i1
  WHERE i1."listId" = :listId
) gaps
WHERE "nextIndex" - "index" > 1
Together with the "Repeatable Read" transaction isolation level, and rolling back and retrying the transaction if either the unique constraint fails or the statement I suggested returns a record, this should meet your requirements.

DataStudio ::: Count_Distinct Calculation based on two values

I have a data set with two values. The first being User ID, the second being Type.
I previously had a COUNT_DISTINCT on the User ID. However, if a user changes their Type, the metric we have to show the 'Type' of user shows two instances of that once-unique user.
I would like some SQL which basically states: if the User ID and Type match, count that as one unique user; if either of those changes, also count that as a unique user. So that:
1) Jane Doe - Support User
2) Jane Doe - Employee User
3) John Smith - Support User
This would show as 3 unique users, rather than the 2 you would get from a COUNT_DISTINCT on just the User ID.
One way to accomplish this is with a calculated field that looks something like the following:
COUNT_DISTINCT(CONCAT(USER_ID, ' - ', TYPE))
You might need to replace USER_ID and TYPE with your field names.