How can I optimize a chat room query? - sql

Background
I'm building a simple chat client for a custom web application. I am required to store all chat logs, and users can message individuals or groups. Think Google Chat (which I told my client to use instead, but he insisted on custom). My database is structured like so:
Table: ChatRoom
int Primary Key ChatRoomID
varchar(64) Name
Table ChatMessage
int Primary Key ChatMessageID
int UserID
int ChatRoomID
varchar(2000) message
datetime date
Table ChatUser
int ChatRoomID
int UserID
int LastMessageID
Primary Key (ChatRoomID, UserID)
I am using SQL Server and will be migrating to MySQL soon, so the solution needs to work on both platforms.
My problem
Assuming a user has just logged in, I need to pull a list of all chat rooms with outstanding messages. My current query looks like this:
SELECT DISTINCT
cr.ChatRoomID AS id,
cu.LastMessageID AS label
FROM ChatRooms cr
LEFT JOIN ChatUsers cu ON cu.ChatRoomID = cr.ChatRoomID
LEFT JOIN ChatMessages cm ON cm.ChatRoomID = cr.ChatRoomID
WHERE cu.UserID = :user_id
AND cu.LastMessageID < cm.ChatMessageID
The question
This seems to work rather well. However, I suspect it will become inefficient when there are dozens of users, thousands of rooms, and millions of messages. How do I optimize this query (or the database structure) so that this request (number of chat rooms with outstanding messages for a given user) scales well?
My primary concern is that I'm forced to use DISTINCT for this query, so it could be joining a temporary table to millions of rows before filtering down to 2 numbers.
Example data
Users
1 | Dr A
2 | Dr B
3 | Biller A
4 | Biller B
5 | Boss
ChatRoom
1 | Doctor Group
2 | Billing Group
ChatUser
Room | User | Message
- | - | -------
1 | 1 | 0
1 | 2 | 2
1 | 5 | 2
2 | 3 | 6
2 | 4 | 0
2 | 5 | 5
Chat Message
ID | Room | User | Message
- | - | - | -------
1 | 1 | 5 | "How is everybody today?"
2 | 1 | 2 | "I'm well. Need more band aids in room 5."
3 | 2 | 5 | "Can someone restock Room 5 with Band aids?"
4 | 2 | 3 | "That's not my job get a lackey."
5 | 2 | 5 | "Do it anyway or your fired."
6 | 2 | 3 | "It's your're not your and I quit."
In this scenario users 1 and 4 are late for work, and when they log in a message will pop up; user 5 is in for a surprise in his billing department the next time I run the query.

You can optimize this query like this:
select cr.ChatRoomID AS id,
cu.LastMessageID AS label
from ChatUsers cu inner join ChatRooms cr ON cu.ChatRoomID = cr.ChatRoomID
where cu.UserID = :user_id and
exists (select 1 from ChatMessages cm where cm.ChatRoomID = cr.ChatRoomID and cu.LastMessageID < cm.ChatMessageID);
There are mainly 2 issues with your current query:
The LEFT JOINs also bring in blank (NULL) records, and there will be multiple records per room, which you are handling with DISTINCT.
The list of records is then joined with all of the message table data, so the more data the message table holds, the slower your query is destined to get.
We solved something similar at https://www.applozic.com.
Disclaimer: I work at Applozic.
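To check the EXISTS approach against the example data from the question, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for SQL Server or MySQL here, the function names and the index `ix_msg_room` are my own, and the table names follow the singular names from the schema. The sketch also drops the join to ChatRoom, on the assumption that only the room ID is needed.

```python
import sqlite3


def build_sample_db():
    """Load the ChatUser and ChatMessage example rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ChatMessage (
            ChatMessageID INTEGER PRIMARY KEY,
            UserID INT,
            ChatRoomID INT
        );
        CREATE TABLE ChatUser (
            ChatRoomID INT,
            UserID INT,
            LastMessageID INT,
            PRIMARY KEY (ChatRoomID, UserID)
        );
        -- Assumed index: lets the EXISTS probe seek by room, then by id.
        CREATE INDEX ix_msg_room ON ChatMessage (ChatRoomID, ChatMessageID);
    """)
    conn.executemany(
        "INSERT INTO ChatUser VALUES (?, ?, ?)",
        [(1, 1, 0), (1, 2, 2), (1, 5, 2), (2, 3, 6), (2, 4, 0), (2, 5, 5)],
    )
    # (ChatMessageID, UserID, ChatRoomID); message text omitted for brevity.
    conn.executemany(
        "INSERT INTO ChatMessage VALUES (?, ?, ?)",
        [(1, 5, 1), (2, 2, 1), (3, 5, 2), (4, 3, 2), (5, 5, 2), (6, 3, 2)],
    )
    return conn


def rooms_with_unread(conn, user_id):
    """Return (room id, last read message id) for rooms with newer messages."""
    sql = """
        SELECT cu.ChatRoomID AS id, cu.LastMessageID AS label
        FROM ChatUser cu
        WHERE cu.UserID = :user_id
          AND EXISTS (SELECT 1
                      FROM ChatMessage cm
                      WHERE cm.ChatRoomID = cu.ChatRoomID
                        AND cm.ChatMessageID > cu.LastMessageID)
        ORDER BY cu.ChatRoomID
    """
    return conn.execute(sql, {"user_id": user_id}).fetchall()


conn = build_sample_db()
print(rooms_with_unread(conn, 1))  # user 1 has unread messages in room 1
print(rooms_with_unread(conn, 5))  # user 5 has unread messages in room 2
```

Because EXISTS can stop at the first newer message per room, no DISTINCT is needed. Whether the composite index actually helps on your data is something to confirm with the execution plan on the real server.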

Related

SQL Query to show results that don't have a relation to variable

For an assignment that includes a delete and add friend system (like Facebook), I've made a query that works on two SQL tables: one includes a friend_id, name and other information, and the other holds two friend_id columns that show the relationship between users and whether they're friends.
User Table (friends)
| friend_id | profile_name |
|:---------- |:------------:|
| 1 | John |
| 2 | Peter |
| 3 | Alex |
| 4 | Nick |
Friendship Table (myfriends)
| friend_id1 | friend_id2 |
|:---------- |:----------:|
| 1 | 3 |
| 2 | 4 |
| 3 | 1 |
| 4 | 2 |
I want a query that selects people who don't have a connection with a given user (I want to show anyone who doesn't have a connection to friend_id '1', so only users 2 and 4) and then displays their names.
I have a query that selects the ones which have the relation which is:
SELECT friends.profile_name,friends.friend_id FROM `myfriends` JOIN `friends` ON friends.friend_id = myfriends.friend_id2 WHERE `friend_id1` = 1;
The query below shows all results from the table; even using '!=', it doesn't select only those who don't have a relation to friend_id '1':
SELECT friends.profile_name,friends.friend_id FROM `myfriends` JOIN `friends` ON friends.friend_id = myfriends.friend_id2 WHERE `friend_id1` != 1;
How can I fix this query so it shows all results except those connected to `friend_id1` = 1?
with connected as (SELECT friend_id,
myfriends.friend_id2 friend
FROM myfriends
JOIN friends
ON friends.friend_id = myfriends.friend_id1
WHERE friend_id1 = 1)
select *
from friends
where friend_id not in (select distinct friend from connected union all select distinct friend_id from connected)
You cannot change the WHERE clause, as it specifies which user you want to focus on.
So first get the users that are connected (in the CTE), and then select all users except those found in that result.
By the way, your example data is misleading: a buggy query that does something simple in the join can still appear to solve it.
edit
While it wasn't clear which version you were using (I believe the WITH clause is available only in newer MySQL versions), I created another solution that works on MySQL 5.6 and should work for you as well:
select f.*
from friends f
left join (
SELECT friend_id, myfriends.friend_id2 friend
FROM myfriends
JOIN friends
ON friends.friend_id = myfriends.friend_id1
WHERE 1 in (friend_id,friend_id2)) f1
on f1.friend = f.friend_id
where f1.friend is null
It has a nicer implementation in one part (checking whether 1 is in either of the 2 columns) and uses a left join, keeping only the rows where the right-hand table is NULL.
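As another way to write the same anti-join, here is a sketch using NOT EXISTS, run against the question's sample data with Python's sqlite3 module. This is not the query from either solution above; the function names are hypothetical, SQLite stands in for MySQL, and excluding the focused user from the result is my assumption based on the expected output of users 2 and 4.

```python
import sqlite3


def build_friends_db():
    """Load the friends and myfriends sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE friends (friend_id INTEGER PRIMARY KEY, profile_name TEXT);
        CREATE TABLE myfriends (friend_id1 INT, friend_id2 INT);
    """)
    conn.executemany(
        "INSERT INTO friends VALUES (?, ?)",
        [(1, "John"), (2, "Peter"), (3, "Alex"), (4, "Nick")],
    )
    conn.executemany(
        "INSERT INTO myfriends VALUES (?, ?)",
        [(1, 3), (2, 4), (3, 1), (4, 2)],
    )
    return conn


def not_connected_to(conn, user_id):
    """Users with no myfriends row linking them to user_id, in either direction."""
    sql = """
        SELECT f.friend_id, f.profile_name
        FROM friends f
        WHERE f.friend_id <> :uid  -- assumption: leave out the user themselves
          AND NOT EXISTS (
              SELECT 1
              FROM myfriends m
              WHERE (m.friend_id1 = :uid AND m.friend_id2 = f.friend_id)
                 OR (m.friend_id2 = :uid AND m.friend_id1 = f.friend_id))
        ORDER BY f.friend_id
    """
    return conn.execute(sql, {"uid": user_id}).fetchall()


conn = build_friends_db()
print(not_connected_to(conn, 1))  # Peter and Nick
```

Unlike NOT IN, NOT EXISTS keeps behaving sensibly if one of the friendship columns ever contains NULL, which is the main reason I would lean toward it here.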

SQL find duplicates and assign group number

Situation
On a Microsoft SQL Server 2008 instance I have about 2 million rows (this should never have happened, but we inherited the situation). A sample follows:
usernum. | phone | email
1 | 123 | user1#local.com
2 | 123 | user2#local.com
3 | 245 | user3#local.com
4 | 678 | user3#local.com
Aim
I would like to create a table that looks like this. The idea is that if 'phone' or 'email' is the same, they are assigned the same group number.
groupnum |usernum. | phone | email
1 | 1 | 123 | user1#local.com
1 | 2 | 123 | user2#local.com
2 | 3 | 245 | user3#local.com
2 | 4 | 678 | user3#local.com
Tried so far
So far I have created a simple python script that conceptually does the following:
- for each usernum in the table
-- assign a group number
-- also assign the group number to all rows where phone or email is the same as this row
-- do not assign the group number if usernum already processed (else we would do things double)
Problem
The Python script basically has to check for each row whether there are duplicates for phone or email. Although this is perfectly fine for maybe 10,000 records or so, it is too slow for 2 million records. I think this is possible to do in T-SQL, which should be much faster than my Python script using pyodbc. The big question thus is how to do this in SQL.
I just noticed you said email or phone is duplicated. For that, I think you would need to decide which has priority when a user could be joined on either field. Or you could split the update into a few batches that assign group numbers based on phone AND email, then email (when not already matched), then phone (when not already matched), like this:
insert into yourGroupsTable (phone, email) -- assuming identity column of groupNum here
select distinct phone, email
from yourUserTable
-- assign group nums with priority on matching phone AND email
update yourUserTable
set groupNum = g.groupNum
from yourUserTable u
join yourGroupsTable g on u.phone = g.phone
and u.email = g.email
It occurs to me now that this would not work, as every row would join to yourGroupsTable because of the SELECT DISTINCT. I also came across a scenario where I'm unsure what your expected outcome would be (and it is too big for a comment). What happens in this instance?
Your test data, slightly modified:
groupnum |usernum. | phone | email
1 | 1 | 123 | user1#local.com
1 | 2 | 123 | user2#local.com
? | 3 | 245 | user3#local.com
? | 4 | 678 | user3#local.com
? | 5 | 245 | user7#local.com
? | 6 | 678 | user7#local.com
What would the group numbers be in the above case?
Your Python script approach is a good way to go. If you want to do this in MySQL, write one stored procedure that, before inserting a record, checks whether a matching record already exists in the table:
If it exists
then take that row's groupnum and assign it to the new record.
If not
then assign a new groupnum.
I am still a little confused, though.
What if a record looks like this:
5 | 678 | user1#local.com
What should happen in that case?
I assume that both columns [phone and email] are considered when assigning the groupnum.
If my assumption is correct, then go with a MySQL procedure.
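One way to sidestep the priority question is to treat the grouping as transitive: two users end up in the same group whenever a chain of shared phones or emails links them. That is an assumption about the desired outcome, not something the question confirms. Under that assumption, a union-find (disjoint set) pass handles the rows in close to linear time. A minimal Python sketch, with a hypothetical `assign_groups` helper:

```python
def assign_groups(rows):
    """Map usernum -> groupnum for rows of (usernum, phone, email).

    Assumption: grouping is transitive, so users linked through any chain
    of shared phones or emails receive the same group number.
    """
    parent = {}

    def find(node):
        # Find the representative of node's set, halving paths as we go.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    rows = list(rows)
    for usernum, phone, email in rows:
        # Link each user to its phone and email; skip missing values so
        # that empty fields do not merge unrelated users.
        if phone is not None:
            union(("user", usernum), ("phone", phone))
        if email is not None:
            union(("user", usernum), ("email", email))

    group_of_root = {}
    result = {}
    for usernum, _, _ in rows:
        root = find(("user", usernum))
        # Number groups 1, 2, 3, ... in order of first appearance.
        result[usernum] = group_of_root.setdefault(root, len(group_of_root) + 1)
    return result


sample = [
    (1, "123", "user1#local.com"),
    (2, "123", "user2#local.com"),
    (3, "245", "user3#local.com"),
    (4, "678", "user3#local.com"),
]
print(assign_groups(sample))
```

With the modified test data above, this interpretation would put users 3, 4, 5 and 6 in a single group, because 245, 678, user3 and user7 are all linked. The rows could be fetched once with pyodbc and the resulting group numbers written back in bulk, which avoids the per-row duplicate check.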

SQL Getting last date and values associated with description of value in another table

I have two different tables, my first table contains the authorizations granted to other requests. It has the following columns:
Authorizations table
| authorization_date | role_id | request_id |
|--------------------|---------|------------|
| 2011-08-02 | 1 | 168 |
| 2011-08-10 | 2 | 168 |
| 2011-08-20 | 6 | 168 |
| 2011-08-03 | 2 | 169 |
| 2011-08-24 | 6 | 169 |
| 2011-08-05 | 3 | 170 |
| 2011-08-09 | 5 | 170 |
As you can see, different people have different roles and also can grant a certain level of authorization. The higher the role, the higher the request has been processed.
Now, I want to show the description associated with the role_id (which is in another table) for ONLY the last authorization. Since I have the date, I already know which one is the latest. However, when I group my query to get the maximum date and then link it to my second table, which contains only the description of each role_id, I get duplicates, and I can't think of a way around this. I'm fairly new to queries and have been learning by myself, so there is a lot I don't know. Any ideas?
Thanks!
I am guessing a little here, but I think you want:
SELECT Authorizations.*, anotherTable.description
FROM
(
SELECT MAX(authorization_date) AS max_date, request_id
FROM Authorizations
GROUP BY request_id
) last_auths
INNER JOIN Authorizations ON last_auths.max_date = Authorizations.authorization_date
AND last_auths.request_id = Authorizations.request_id
INNER JOIN anotherTable ON anotherTable.role_id = Authorizations.role_id
The derived table will get the maximum authorization date for each request. Then you can join to get the role_id for that request and join again to get the description.
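Here is that derived-table join run against the sample rows, as a sketch using Python's sqlite3 module. The question does not show the description table, so the `anotherTable` contents ("role 1", "role 2", and so on) are placeholders I made up, and the dates are stored as ISO text so that MAX compares them correctly.

```python
import sqlite3


def build_auth_db():
    """Load the Authorizations sample rows plus placeholder descriptions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Authorizations (
            authorization_date TEXT,
            role_id INT,
            request_id INT
        );
        CREATE TABLE anotherTable (role_id INTEGER PRIMARY KEY, description TEXT);
    """)
    conn.executemany(
        "INSERT INTO Authorizations VALUES (?, ?, ?)",
        [
            ("2011-08-02", 1, 168), ("2011-08-10", 2, 168), ("2011-08-20", 6, 168),
            ("2011-08-03", 2, 169), ("2011-08-24", 6, 169),
            ("2011-08-05", 3, 170), ("2011-08-09", 5, 170),
        ],
    )
    # Hypothetical descriptions; the real ones are not shown in the question.
    conn.executemany(
        "INSERT INTO anotherTable VALUES (?, ?)",
        [(r, f"role {r}") for r in (1, 2, 3, 5, 6)],
    )
    return conn


def latest_authorizations(conn):
    """One row per request: its most recent authorization and role description."""
    sql = """
        SELECT a.request_id, a.authorization_date, a.role_id, r.description
        FROM (SELECT request_id, MAX(authorization_date) AS max_date
              FROM Authorizations
              GROUP BY request_id) last_auths
        INNER JOIN Authorizations a
                ON a.request_id = last_auths.request_id
               AND a.authorization_date = last_auths.max_date
        INNER JOIN anotherTable r ON r.role_id = a.role_id
        ORDER BY a.request_id
    """
    return conn.execute(sql).fetchall()


for row in latest_authorizations(build_auth_db()):
    print(row)
```

One caveat: if a request has two authorizations on the same latest date, this returns both rows, so a tie-breaker (for example the highest role_id) would be needed if exactly one row per request is required.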
SELECT d.role_desc, a.role_id, MAX(a.authorization_date)
FROM Authorizations a
INNER JOIN Description d ON a.role_id = d.role_id
GROUP BY a.role_id, d.role_desc, d.role_id
Try this query and ensure you are getting the expected result.

SELECT certain fields based on certain WHERE conditions

I am writing an advanced MySQL query that searches a database and retrieves user information. What I am wondering is can I select certain fields if WHERE condition 1 is met and select other fields if WHERE condition 2 is met?
Database: users
------------------------
| user_id | first_name |
------------------------
| 1 | John |
------------------------
| 2 | Chris |
------------------------
| 3 | Sam |
------------------------
| 4 | Megan |
------------------------
Database: friendship
--------------------------------------
| user_id_one | user_id_two | status |
--------------------------------------
| 2 | 4 | 0 |
--------------------------------------
| 4 | 1 | 1 |
--------------------------------------
Status 0 = Unconfirmed
Status 1 = Confirmed
OK, as you can see John & Megan are confirmed friends while Chris & Megan are friends but the relationship is unconfirmed.
The query I am trying to write is as follows: when Megan (4) searches for new friends, I want all of the users except the ones she is a confirmed friend with to be returned, so the results should be 2 and 3. But since a relationship with user_id 2 exists and is not confirmed, I also want to return the status, since an entry in the friendship table does exist between the two. If a user exists but has no entry in the friendship table, the query should still return that user's information, with status as NULL or not returned at all.
I hope this makes sense. Ask questions if you need to.
Why not use a left join or an if-not-exists?
SELECT users.*
FROM (users LEFT JOIN friendship
ON status=1 AND (user_id_one=user_id OR user_id_two=user_id) )
WHERE
status IS NULL
or
SELECT users.*
FROM users
WHERE
NOT EXISTS (SELECT *
FROM friendship
WHERE status=1
AND (user_id_one=user_id
OR user_id_two=user_id))
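To also return the status and to tie the search to one particular user, the left join can be restricted to friendship rows that involve the searching user. The following sketch runs that variation on the sample data with Python's sqlite3 module. The parameter name `:me`, the function names, and the assumption that each pair of users has at most one friendship row are mine, not from the question.

```python
import sqlite3


def build_users_db():
    """Load the users and friendship sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT);
        CREATE TABLE friendship (user_id_one INT, user_id_two INT, status INT);
    """)
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(1, "John"), (2, "Chris"), (3, "Sam"), (4, "Megan")],
    )
    conn.executemany(
        "INSERT INTO friendship VALUES (?, ?, ?)",
        [(2, 4, 0), (4, 1, 1)],
    )
    return conn


def friend_candidates(conn, me):
    """Everyone except `me` and confirmed friends, with status (None if no row)."""
    sql = """
        SELECT u.user_id, u.first_name, f.status
        FROM users u
        LEFT JOIN friendship f
               ON (f.user_id_one = :me AND f.user_id_two = u.user_id)
               OR (f.user_id_two = :me AND f.user_id_one = u.user_id)
        WHERE u.user_id <> :me
          AND (f.status IS NULL OR f.status <> 1)  -- drop confirmed friends
        ORDER BY u.user_id
    """
    return conn.execute(sql, {"me": me}).fetchall()


print(friend_candidates(build_users_db(), 4))  # Megan's search
```

For Megan this yields Chris with status 0 and Sam with no status, which matches the result described in the question.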
You can create two separate queries and then UNION the result tables. In each query, add a field that always has the same value.
So something like this should work:
(SELECT id, 'Not Friends' As Status FROM t1 WHERE condition1)
UNION
(SELECT id, 'Unconfirmed' As Status FROM t1 WHERE condition2)
Just make sure both queries return the same number of fields with compatible types; the field names in the result come from the first query.

How should joins be used in MySQL?

If I have two tables like
user table-"u"
userid | name
1 | lenova
2 | acer
3 | hp
pass table-"p"
userid | password
1 | len123
2 | acer123
3 | hp123
As far as I have learnt from tutorials, I can join these 2 tables using the many joins available in
MySQL, as said here.
If I have tables like
role table-"r"
roleid | rname
1 | admin
2 | user
3 | dataanalyst
token table-"t"
tokenid| tname
1 | xxxx
2 | yyyy
3 | zzzz
role_token_association table-"a"
roleid | tokenid
1 | 1
1 | 2
3 | 1
3 | 3
3 | 1
I have to write a join that displays a table showing which tokens each role has, i.e. "rolename" has all these tokens. How do I do this? I am confused. Is it possible with a join? I like MySQL a lot, and I want to work with queries seriously rather than just play around, so that I become well versed. Any suggestions, please?
It's easiest to see when the column names that need to be joined are named identically:
SELECT r.rname,
t.tname
FROM ROLE r
JOIN ROLE_TOKEN_ASSOCIATION rta ON rta.roleid = r.roleid
JOIN TOKEN t ON t.tokenid = rta.tokenid
This will return only the roles with tokens associated. If you have a role that doesn't have a token associated, you need to use an OUTER join, like this:
SELECT r.rname,
t.tname
FROM ROLE r
LEFT JOIN ROLE_TOKEN_ASSOCIATION rta ON rta.roleid = r.roleid
LEFT JOIN TOKEN t ON t.tokenid = rta.tokenid
This link might help -- it's a visual representation of JOINs.
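To see the difference between the inner and outer versions on the question's data, here is a small sketch using Python's sqlite3 module (SQLite standing in for MySQL; the helper names are mine). The duplicate (3, 1) association row from the question is loaded only once. It also shows a common pitfall: a LEFT JOIN followed by a plain JOIN to the token table filters the unmatched roles back out.

```python
import sqlite3


def build_roles_db():
    """Load the role, token and role_token_association sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE role (roleid INTEGER PRIMARY KEY, rname TEXT);
        CREATE TABLE token (tokenid INTEGER PRIMARY KEY, tname TEXT);
        CREATE TABLE role_token_association (roleid INT, tokenid INT);
    """)
    conn.executemany(
        "INSERT INTO role VALUES (?, ?)",
        [(1, "admin"), (2, "user"), (3, "dataanalyst")],
    )
    conn.executemany(
        "INSERT INTO token VALUES (?, ?)",
        [(1, "xxxx"), (2, "yyyy"), (3, "zzzz")],
    )
    conn.executemany(
        "INSERT INTO role_token_association VALUES (?, ?)",
        [(1, 1), (1, 2), (3, 1), (3, 3)],
    )
    return conn


def role_tokens(conn, first_join, second_join):
    """List (rname, tname) pairs using the given join keywords."""
    # The keywords are fixed strings chosen below, never user input.
    sql = f"""
        SELECT r.rname, t.tname
        FROM role r
        {first_join} role_token_association rta ON rta.roleid = r.roleid
        {second_join} token t ON t.tokenid = rta.tokenid
        ORDER BY r.roleid, t.tokenid
    """
    return conn.execute(sql).fetchall()


conn = build_roles_db()
inner = role_tokens(conn, "JOIN", "JOIN")            # only roles with tokens
outer = role_tokens(conn, "LEFT JOIN", "LEFT JOIN")  # also ('user', None)
mixed = role_tokens(conn, "LEFT JOIN", "JOIN")       # same rows as inner
print(outer)
```

The role "user" has no tokens, so it appears only in the result where both joins are LEFT JOINs, with a NULL (None in Python) token name.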