What's the most efficient way to query all the messages in a group chat application? - sql

I will use an example to illustrate my question.
You have a group-chat table that stores data about group chats.
id | name | owner_id
---+------+---------
33 | code | 45
You have a messages table that holds messages:
id | content | user_id | chat_room_id
---+---------+---------+-------------
5  | "hello" | 41      | 33
2  | "hi"    | 43      | 33
You have a users table that holds user information and which group chat they are part of:
id | name   | chat_room_id
---+--------+-------------
5  | "nick" | 33
2  | "mike" | 33
Is this the right way to set up the database?
Without joins or foreign keys, what's the most efficient way to load all the messages and user data in a form that lets you build a UI where the user data is displayed next to each message?
My solutions:
If you query the messages table for all the messages where the chat room id equals 33, you get an array that looks like:
[
  {
    id: 5,
    user_id: 41,
    content: "hello"
  },
  {
    id: 2,
    user_id: 43,
    content: "hi"
  }
]
As you can see, the user ids are part of each message object.
Solution 1 (naive):
Loop through the messages array and query the database once per user id.
This is a bad solution, since it issues one query per message (the N+1 query problem).
Solution 2 (efficient, less data to send in the response):
Loop through the messages array, collect the user ids into an array, and use that in a single query
using WHERE id IN (...) on the users table.
Then loop through the returned users and build a hash table keyed by user id, since it is unique.
On the front end, just loop through the messages array and look up each user.
Will this solution be very slow if you have a large number of messages? Will it scale well, given that it's O(n)?
Solution 3 (efficient, more data to send in the response):
It's the same as before, except that the user data is added as properties on each message object.
The problem with this solution is that you will have duplicate data, since one user can post multiple messages.
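For what it's worth, solutions 2 and 3 can be sketched roughly like this. It is only a minimal illustration using Python's built-in sqlite3 module with an in-memory database; the `load_chat` helper, the extra "bye" message, and the user ids 41 and 43 in the users table are assumptions made so the sample rows line up, not something taken from a real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, chat_room_id INTEGER);
    CREATE TABLE messages (id INTEGER PRIMARY KEY, content TEXT,
                           user_id INTEGER, chat_room_id INTEGER);
    -- hypothetical sample rows; user ids chosen to match messages.user_id
    INSERT INTO users VALUES (41, 'nick', 33), (43, 'mike', 33);
    INSERT INTO messages VALUES (5, 'hello', 41, 33), (2, 'hi', 43, 33), (7, 'bye', 41, 33);
""")

def load_chat(conn, chat_room_id):
    """Solution 2: one query for messages, one IN query for their authors."""
    messages = [dict(r) for r in conn.execute(
        "SELECT id, user_id, content FROM messages "
        "WHERE chat_room_id = ? ORDER BY id", (chat_room_id,))]
    # De-duplicate ids so each user is fetched once, however many messages they wrote.
    user_ids = sorted({m["user_id"] for m in messages})
    if not user_ids:
        return messages, {}
    placeholders = ",".join("?" * len(user_ids))
    rows = conn.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", user_ids)
    users_by_id = {r["id"]: dict(r) for r in rows}  # hash table keyed by user id
    return messages, users_by_id

messages, users_by_id = load_chat(conn, 33)

# Solution 3: embed the user in every message (duplicates data per message).
embedded = [{**m, "user": users_by_id[m["user_id"]]} for m in messages]
```

Both variants cost two queries no matter how many messages there are. The difference is only whether the lookup by user id happens on the server (solution 3) or on the client (solution 2).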
These are my solutions; I hope to hear yours.
For context: system design videos on YouTube don't address this part of chat apps. If you have found one that does, please post the link.


SQL different null values in different rows

I have a quick question regarding writing a SQL query to obtain a complete entry from two or more entries where the data is missing in different columns.
This is the example, suppose I have this table:
Client Id | Name | Email
1234 | John | (null)
1244 | (null) | john#example.com
Would it be possible to write a query that would return the following?
Client Id | Name | Email
1234 | John | john#example.com
I am finding this particularly hard because these are 2 entries in the same table.
I apologize if this is trivial. I am still learning SQL, and I wasn't able to come up with a solution for this. Although I've tried looking online, I couldn't phrase the question properly, so I couldn't find the answer I was after.
Many thanks in advance for the help!
Yes, but actually no.
It is possible to write a query that works with your example data.
But only under the assumption that the part of the email before the '#' is always equal to the name.
SELECT clients.id,clients.name,bclients.email FROM clients
JOIN clients bclients ON upper(clients.name) = upper(substring(bclients.email from 0 for position('#' in bclients.email)));
Explanation:
We join the table onto itself, to get the information into one row.
For this we first search for the position of the '#' in the email, then take the substring from the start of the string (0) for as many characters as that position (the result of position), which yields everything before the '#'.
To avoid case problems, the name and the substring are converted to uppercase for the comparison.
(Lowercase would work the same.)
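To try the idea out without a PostgreSQL instance, here is a rough equivalent using Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite has no `substring(... from ... for ...)` or `position(...)`, so `substr` and `instr` are swapped in, and the table contents are just the two sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    INSERT INTO clients VALUES (1234, 'John', NULL),
                               (1244, NULL, 'john#example.com');
""")

# Self-join: match a row's name against the part of another row's email
# that comes before the '#'. Rows with a NULL name or email never match.
rows = conn.execute("""
    SELECT a.id, a.name, b.email
    FROM clients a
    JOIN clients b
      ON upper(a.name) = upper(substr(b.email, 1, instr(b.email, '#') - 1))
""").fetchall()

print(rows)
```

The same caveat applies as above: this only works while the local part of the email really equals the name.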
The design is flawed
How can a client have multiple ids and different kinds of information about the same user at the same time?
I think you want to split the table between clients and users, so that a user can have multiple clients.
I recommend that you read up on database normalization, as this gives you the knowledge needed for successful database design.

How to pick transaction isolation levels?

I have a table in a database that is responsible for storing ordered/reorderable lists. It has the following shape:
| id | listId | index | title | ... |
where id is the primary key, listId is a foreign key that identifies which list the item belongs to, and title and the other columns hold the item's contents. The index column determines the item's position in the list. It is an integer counter (starting at 0) that is unique within a list, but may repeat across lists. Example data:
| id | listId | index | title | ...
---------------------------------------------
| "item1" | "list1" | 0 | "title1" | ...
| "item2" | "list1" | 1 | "title2" | ...
| "item3" | "list1" | 2 | "title3" | ...
| "item4" | "list2" | 0 | "title4" | ...
| "item5" | "list2" | 1 | "title5" | ...
Users can create/delete items, move them inside the list or across lists.
To ensure consistency of indexes when running these operations, I do the following:
Create item:
1. Count items within this list:
   SELECT COUNT(DISTINCT "Item"."id") as "cnt"
   FROM "item" "Item"
   WHERE "Item"."listId" = ${listId}
2. Insert new item, with index set to count from step 1:
   INSERT INTO "item"("id", "listId", "index", "title", ...)
   VALUES (${id}, ${listId}, ${count}, ${title})
This way index grows with each item inserted into the list.
Move item:
1. Retrieve item's current listId and index:
   SELECT "Item"."listId" AS "Item_listId", "Item"."index" AS "Item_index"
   FROM "item" "Item"
   WHERE "Item"."id" = ${id}
2. Change index of "shifted" items if necessary, so that order is consistent, e.g. given the item is moved forward, all items between its current position (exclusively) and its next position (inclusively) need to have their index decreased by 1:
   UPDATE "item"
   SET "index" = "index" - 1
   WHERE "listId" = ${listId}
   AND "index" BETWEEN ${sourceIndex + 1} AND ${destinationIndex}
   I'll omit the variation with movement across lists because it is very similar.
3. Update the item itself:
   UPDATE "item"
   SET "index" = ${destinationIndex}
   WHERE "id" = ${id}
Delete item:
1. Retrieve item's index and listId.
2. Move all items in the same list that come after this item 1 step back, to remove the gap:
   UPDATE "item"
   SET "index" = "index" - 1
   WHERE "listId" = ${listId}
   AND "index" > ${itemIndex}
3. Delete item:
   DELETE FROM "item"
   WHERE "id" = ${id}
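To make the "move item" steps concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the real setup: the `move_forward` helper is hypothetical, and SQLite's `BEGIN IMMEDIATE` (which takes the write lock up front) merely stands in for whatever isolation level the real database would use.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so the transaction is controlled explicitly with BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE item (id TEXT PRIMARY KEY, "listId" TEXT, "index" INTEGER, title TEXT);
    INSERT INTO item VALUES ('item1', 'list1', 0, 'title1'),
                            ('item2', 'list1', 1, 'title2'),
                            ('item3', 'list1', 2, 'title3');
""")

def move_forward(conn, item_id, destination_index):
    """Move an item to a later position within its own list, atomically."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        # Step 1: read the item's current list and position.
        list_id, source_index = conn.execute(
            'SELECT "listId", "index" FROM item WHERE id = ?', (item_id,)).fetchone()
        # Step 2: shift the items between the old and the new position back by one.
        conn.execute(
            'UPDATE item SET "index" = "index" - 1 '
            'WHERE "listId" = ? AND "index" BETWEEN ? AND ?',
            (list_id, source_index + 1, destination_index))
        # Step 3: place the item itself at its destination.
        conn.execute('UPDATE item SET "index" = ? WHERE id = ?',
                     (destination_index, item_id))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise

move_forward(conn, "item1", 2)
order = conn.execute(
    'SELECT id, "index" FROM item WHERE "listId" = ? ORDER BY "index"',
    ("list1",)).fetchall()
```

The point of the sketch is only that all three statements run inside one transaction, so a failure in any step leaves the indexes as they were.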
Question is:
What transaction isolation levels should I provide for each of these operations? It is very important for me to keep index column consistent, no gaps and most importantly - no duplicates. Am I getting it right that create item operation is subject to phantom reads, because it counts items by some criteria, and it should be serializable? What about other operations?
Without knowing more about your specific application, the safest bet is indeed to use serializable as the isolation level whenever you access that table, but even that level may not be sufficient for your specific case.
A unique constraint on (listId, index) would prevent duplicates (what about the title? Can it be repeated in the same list?). Some carefully crafted "watchdog" queries can further mitigate issues, and database sequences or stored procedures can help avoid gaps, but the truth is that the mechanism itself seems fragile.
Knowing only this much about your problem, what you appear to have is a concurrency problem at the user level, in the sense that several users can access the same objects at the same time and make changes to them. Assuming this is your typical web application with a stateless back-end (hence inherently distributed), this has many implications for the user experience, which in turn affect the architecture and even the functional requirements. Say, for example, that user Foo moves item Car to List B, which is currently being worked on by user Bar. It is then legitimate to assume that Bar will need to see item Car as soon as the operation is completed, but that will not happen unless there is some mechanism in place to immediately notify users of List B of the change. The more users you have working on the same set of lists, the worse it becomes, even with notifications, because there will be more and more of them, up to the point where users see things changing all the time and just can't keep up.
There are a lot of assumptions anyone has to make to give you an answer. My own lead me to say that you probably need to revise the requirements for that application, or ensure that management is aware of its limitations and accepts them.
This type of problem is pretty common in distributed applications. Usually "locks" on certain sets of data are placed (either through database or shared memory pools) so that only one user can alter them at any given time or, alternatively, a workflow is provided to manage conflicting operations (much like versioning systems). When neither is done, a log of operations is kept to understand what happened and rectify problems later on should they be detected.
Given your constraints, you can create a unique index on the two columns (listId, index). That will prevent duplicates.
Additionally, to avoid gaps, I would recommend running
select "listId", "index", "nextIndex"
from (
  select i1."listId", i1."index",
         (select min(i2."index") from "item" i2
          where i2."listId" = i1."listId" and i2."index" > i1."index") as "nextIndex"
  from "item" i1
  where i1."listId" = :listId
) t
where "nextIndex" - "index" > 1
at the end of each transaction. (The computed column is wrapped in a subquery because a select alias cannot be referenced in the WHERE clause of the same query.)
Combine this with the "Repeatable Read" transaction isolation level, and roll back and retry the transaction if either the unique constraint fails or the statement above returns a record. This should meet your requirements.
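Here is a small sketch of both checks, the unique index and the gap query, using Python's built-in sqlite3 module. The index name `item_list_index` and the sample rows (with a deliberate gap between index 1 and 3) are made up for illustration, and SQLite is only a stand-in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id TEXT PRIMARY KEY, "listId" TEXT, "index" INTEGER);
    CREATE UNIQUE INDEX item_list_index ON item ("listId", "index");
    -- hypothetical data with a gap: index 2 is missing in list1
    INSERT INTO item VALUES ('item1', 'list1', 0),
                            ('item2', 'list1', 1),
                            ('item3', 'list1', 3);
""")

# For every item, find the next larger index in the same list and
# report the pairs that are more than one step apart.
GAP_QUERY = """
    SELECT "index", next_index FROM (
        SELECT i1."index",
               (SELECT MIN(i2."index") FROM item i2
                 WHERE i2."listId" = i1."listId"
                   AND i2."index" > i1."index") AS next_index
        FROM item i1
        WHERE i1."listId" = ?
    ) AS t
    WHERE next_index - "index" > 1
"""
gaps = conn.execute(GAP_QUERY, ("list1",)).fetchall()

# The unique index rejects a second item at an occupied position.
try:
    conn.execute("INSERT INTO item VALUES ('item4', 'list1', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Note that this query only sees gaps between existing items. A list whose smallest index is not 0 would need a separate check.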

django database design when you will have too many rows

I have a django web app with postgres db; the general operation is that every day I have an array of values that need to be stored in one of the tables.
There is no foreseeable need to query the values of the array, but I need to be able to plot the values for a specific day.
The problem is that this array is pretty big: if I were to store each value as its own row, I'd have 60 million rows per year, but if I store each array as a blob object, I'd have 60 thousand rows per year.
Is it a good decision to use a blob object to reduce table size when you do not want to query on the values inside it?
Here are the two options:
option1: keeping all
group(foreignkey)| parent(foreignkey) | pos(int) | length(int)
A | B | 232 | 45
A | B | 233 | 45
A | B | 234 | 45
A | B | 233 | 46
...
option2: collapsing the array into a blob:
group(fk)| parent(fk) | mean_len(float)| values(blob)
A | B | 45 |[(pos=232, len=45),...]
...
so I do NOT want to query pos or length but I want to query group or parent.
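For illustration, option 2 could pack the (pos, length) pairs into a compact byte string along these lines. This is only a sketch: the fixed layout of two 32-bit integers and the helper names `pack_values` and `unpack_values` are assumptions, and in Django the resulting bytes would presumably go into a `BinaryField`.

```python
import struct
from statistics import mean

# Assumed layout: each record is two little-endian 32-bit ints (pos, length).
RECORD = struct.Struct("<ii")

def pack_values(pairs):
    """Serialize a list of (pos, length) tuples into one blob."""
    return b"".join(RECORD.pack(pos, length) for pos, length in pairs)

def unpack_values(blob):
    """Restore the list of (pos, length) tuples, e.g. for plotting."""
    return list(RECORD.iter_unpack(blob))

pairs = [(232, 45), (233, 45), (234, 45), (233, 46)]
blob = pack_values(pairs)                           # 8 bytes per pair
mean_len = mean(length for _, length in pairs)      # stored in its own column
```

Summary columns such as mean_len stay queryable, while the individual values can only be read back by unpacking the whole blob.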
An example of read query that I'm talking about is:
SELECT * FROM "mytable"
LEFT OUTER JOIN "group"
ON ( "group"."id" = "grouptable"."id" )
ORDER BY "pos" DESC LIMIT 100
which is a typical django admin list_view page main query.
I tried loading the data and tried displaying the table in the django admin page without doing any complex query (just a read query).
When I get past 1.5 million rows, the admin page freezes. All it takes is a count query on that table to cause the app to crash, so I should definitely either keep the data as a blob or not keep it in the db at all and use the filesystem instead.
I want to emphasize that I've used django 1.8 as my test bench so this is not a postgres evaluation but rather a system evaluation with django admin and postgres.

Create a Parent–Children Web Part Page

I am trying to figure out how to do something that I would think is commonplace, but I cannot find out how to do it.
Given two Custom Lists, one with a field that is essentially a primary key, and the other with what is essentially a foreign key, I want to show all the rows from the first in one area of the display, and the related records for the selected row of the first, in a second part of the screen.
I am thinking this would be side–by–side web parts on a web-part page.
So:
 ID  pkID  Data             ID  fkID  Data
 ___________________        ______________________________
| 1   100  Row one. |      |  8   100  Related one/one    |
 ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯       |  9   100  Related one/two    |
  2   113  Row two.        | 10   100  Related one/three  |
  3   118  Row n.           ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
                             11   113  Related two/one
                             12   113  Related two/two
                             13   118  Related n/one
(That is my attempt to show what is established between the two lists. Top row selected on the left, related records from the other list on the right.)
Surely this is common enough that there is a way to readily do this?
I suppose I might need to create a means of asserting that a row is 'selected.'
You will note that I am not using the ID field that "belongs" to SharePoint.
You can create lookup fields to establish that relationship. SharePoint 2010 even allows you to enforce the relationship like in a SQL database, so for instance you can declare what happens if you try to delete a parent that has children (Cascade, Prevent, etc.).
Have a read here:
http://office.microsoft.com/en-au/sharepoint-server-help/create-list-relationships-by-using-unique-and-lookup-columns-HA101729901.aspx
As for displaying them visually, you might have to create some web parts for it, as the only out-of-the-box (OOB) support is the link to the child entity from the main entity on the parent list.

SQL adjustable structure redundancy

I have to build a database structure that allows a totally modular structure. Let's take an example; it will be easier to understand.
We have a website record, looking like this :
WEBSITE A
| ----- SECTION A
| |-- SUBSECTION 1
| | | -- Data 1 : Value 1
| | | -- Data 2 : Value 2
| | | ...
| | | -- Data N : Value N
| |
| |-- SUBSECTION 2
| | | -- Data 52 : Value 1
| | | -- Data 53 : Value 2
| | | ...
| | | -- Data M : Value M
| |
| ...
|
| ----- SECTION B
| |
| ...
...
And so on. The trouble is that I have to implement a permission system. For instance, User A has access to sections A, B, D, Z from website 1, whereas User 2 has access to sections C, V, W, X from website 2.
Model 1 :
First, I thought that building this as a tree would be the most efficient way to do it.
Here is my first database representation :
TABLE website (id, id_client, name, address)
TABLE section (id, id_website, name)
TABLE sub_section (id, id_section, name)
TABLE data (id, id_sub_section, key, value)
With this representation, it would be easy to give some restricted access to the employees.
However, both websites will have common data. For instance, all websites will have sections A, B, C, D with the same structure. This implies a lot of redundancy. For each website, we'll have a lot of common structure; the only difference will be the attribute values in the TABLE data.
The second problem is that this structure has to be totally modular. For instance, the admin should be able to add a section, a subsection or a data item to a website record. That's the reason why I thought that this model is easier to manage.
Model 2 :
I have a second model, easier to store but harder to exploit :
TABLE website (id, id_client, Value 1, Value 2, Value 3 ... Value N)
TABLE section (id, name, Data 1, Data 2, Data 3 .. Data N, ..., Data 52, Data 53, Data M) (it represents the name of the columns)
TABLE subsection (id, id_section, name, Data 1, Data 2, Data N)
By doing this, I have a table where data are stored and "structural tables" with the sections and subsections that both websites have in common. If the admin wants to add a section / subsection, we go back to the tree structure to store the additional data, looking like this :
TABLE additional_section (id,id_website,name)
TABLE additionnal_subsection (id,id_section, id_additional_section, name)
TABLE additional_data (id, id_subsection, id_additionnal_subsection, key, value)
It avoids a lot of redundancy and facilitates the permissions management.
Here's my question :
What's the best model for this kind of application ? Model 1 ? Model 2 ? Another one ?
Thanks for reading and for your answers !
I would suggest that you modify Model 1.
You can eliminate the redundancy in the section table by removing the id_website FK from that table and creating a new table between the website table and it.
This new table WebsiteSection has a PK that consists of an FK to website AND an FK to section, allowing each section to be part of multiple websites.
Section data that is common to all websites would then be stored in the section table while section data that is site specific would be stored in the WebsiteSection table.
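As a rough sketch of that layout, here it is with Python's built-in sqlite3 module. The table name `website_section`, its column names, and the sample rows are illustrative assumptions, not a finished design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE website (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE section (id INTEGER PRIMARY KEY, name TEXT);  -- shared definitions
    -- Junction table: composite PK made of the two FKs; site-specific
    -- section data would be added as extra columns here.
    CREATE TABLE website_section (
        id_website INTEGER NOT NULL REFERENCES website(id),
        id_section INTEGER NOT NULL REFERENCES section(id),
        PRIMARY KEY (id_website, id_section)
    );
    INSERT INTO website VALUES (1, 'Website A'), (2, 'Website B');
    INSERT INTO section VALUES (10, 'Section A'), (11, 'Section B');
    INSERT INTO website_section VALUES (1, 10), (1, 11), (2, 10);
""")

# Sections available on website 2.
sections_of_site_2 = [row[0] for row in conn.execute("""
    SELECT s.name
    FROM section s
    JOIN website_section ws ON ws.id_section = s.id
    WHERE ws.id_website = ?
    ORDER BY s.name
""", (2,))]

# 'Section A' is defined once but attached to both websites.
sites_using_section_a = conn.execute(
    "SELECT COUNT(*) FROM website_section WHERE id_section = 10").fetchone()[0]
```

A per-user permission table could then reference the (id_website, id_section) pair, which would keep the access rules next to the site-specific data.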