Duplicate results in query - sql

I have an apps table. Each app has many conversations and users. A conversation has many messages and each message can either belong to a visitor or user and a visitor can have many conversations.
For each of my conversations, I want to attach the name and avatar of the user who most recently wrote in the conversation.
If no user has replied, then instead I'd like to grab the avatars of the 3 most recently created users, along with the name of the app, and use these instead.
This is what I've got so far, but it returns multiple results for the same conversation id, and I haven't found a way to get the app users' avatars:
select
c.id,
c.last_message,
c.last_activity,
coalesce(last.display_name, a.name || ' Team') as name,
array_agg(last.avatar)
from messages m
left join conversations c on c.id = m.conversation_id
left join apps a on a.id = c.app_id
left join lateral (
select u.id, u.display_name, u.avatar
from users u
where u.id = m.user_id
) as last on true
where c.visitor_id = 'c6p77hu9v000a4zcth4lnefn9'
group by c.id, last.display_name, last.avatar, a.name
order by c.inserted_at desc
Any help is greatly appreciated

For each of my conversations, I want to attach the name and avatar of the user who most recently wrote in the conversation.
To do that, you can use a LATERAL subquery, but you also need to add an ORDER BY so that the last message comes first, then use LIMIT 1 to keep only that row. So, assuming you have a message_datetime column in the messages table that stores the date and time the message was sent, you can use:
select
c.id,
c.last_message,
c.last_activity,
coalesce(last.display_name, a.name || ' Team') as name,
last.avatar
from
conversations c
left join apps a on a.id = c.app_id
left join lateral (
select
u.id, u.display_name, u.avatar
from
users u
inner join messages m on u.id = m.user_id
where
c.id = m.conversation_id
order by
m.message_datetime desc
limit 1
) as last on true
where
c.visitor_id = 'c6p77hu9v000a4zcth4lnefn9'
order by
c.inserted_at desc
If no user has replied, then instead I'd like to grab the avatars of the 3 most recently created users, along with the name of the app, and use these instead.
That is simpler, as this query is uncorrelated with the previous one. Assuming your users table has a created_datetime column with the date and time the user was created, you can use this simple query:
select
u.id, u.display_name, u.avatar
from
users u
order by
u.created_datetime desc
limit 3
And so you can use it as a subquery in the previous query, using COALESCE to control which information to use:
select
c.id,
c.last_message,
c.last_activity,
coalesce(last.display_name, a.name || ' Team') as name,
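-- array[NULL] is itself not NULL, so test for a matching user instead of using coalesce here
case when last.id is not null then array[last.avatar] else last_all.avatar end as avatar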
from
conversations c
left join apps a on a.id = c.app_id
left join lateral (
select
u.id, u.display_name, u.avatar
from
users u
inner join messages m on u.id = m.user_id
where
c.id = m.conversation_id
order by
m.message_datetime desc
limit 1
) as last on true
left join (
select
array_agg(t.avatar) as avatar
from (
-- pick the 3 newest users first, then aggregate them into one array
select
u.avatar
from
users u
order by
u.created_datetime desc
limit 3
) t
) last_all on true
where
c.visitor_id = 'c6p77hu9v000a4zcth4lnefn9'
order by
c.inserted_at desc
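To check the fallback logic on a toy dataset, here is a minimal sketch. It is not PostgreSQL: it assumes Python's sqlite3 module, and since SQLite has neither LATERAL nor arrays, it swaps in a correlated scalar subquery and group_concat. The table layouts, the message_datetime and created_datetime columns, and all sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical schema and rows; only the column names used by the query matter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE apps (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE users (id INTEGER PRIMARY KEY, display_name TEXT,
                    avatar TEXT, created_datetime TEXT);
CREATE TABLE conversations (id INTEGER PRIMARY KEY, app_id INTEGER, visitor_id TEXT);
CREATE TABLE messages (id INTEGER PRIMARY KEY, conversation_id INTEGER,
                       user_id INTEGER, message_datetime TEXT);

INSERT INTO apps VALUES (1, 'Acme');
INSERT INTO users VALUES
  (1, 'Ann', 'ann.png', '2021-01-01'),
  (2, 'Bob', 'bob.png', '2021-01-02'),
  (3, 'Cid', 'cid.png', '2021-01-03'),
  (4, 'Dee', 'dee.png', '2021-01-04');
INSERT INTO conversations VALUES (10, 1, 'v1'), (11, 1, 'v1');
-- Conversation 10: Ann replies, then Bob, then the visitor (user_id NULL).
-- Conversation 11: only a visitor message, so no user has replied.
INSERT INTO messages VALUES
  (1, 10, 1,    '2021-02-01 10:00'),
  (2, 10, 2,    '2021-02-01 11:00'),
  (3, 10, NULL, '2021-02-01 12:00'),
  (4, 11, NULL, '2021-02-02 09:00');
""")

query = """
WITH last_reply AS (
  -- stand-in for the LATERAL subquery: latest replying user per conversation
  SELECT c.id AS conversation_id,
         (SELECT m.user_id FROM messages m
           WHERE m.conversation_id = c.id AND m.user_id IS NOT NULL
           ORDER BY m.message_datetime DESC LIMIT 1) AS user_id
  FROM conversations c
  WHERE c.visitor_id = ?
),
newest AS (
  -- limit to the 3 newest users first, then aggregate
  SELECT group_concat(avatar) AS avatars
  FROM (SELECT avatar FROM users ORDER BY created_datetime DESC LIMIT 3) AS t
)
SELECT c.id,
       coalesce(u.display_name, a.name || ' Team') AS name,
       CASE WHEN u.id IS NOT NULL THEN u.avatar ELSE newest.avatars END AS avatars
FROM conversations c
JOIN last_reply lr ON lr.conversation_id = c.id
LEFT JOIN apps a ON a.id = c.app_id
LEFT JOIN users u ON u.id = lr.user_id
CROSS JOIN newest
ORDER BY c.id
"""
rows = conn.execute(query, ("v1",)).fetchall()
for row in rows:
    print(row)
```

Each conversation comes back exactly once: conversation 10 with Bob's name and avatar, conversation 11 with the app name plus the three newest avatars. The order of items inside group_concat is not something to rely on.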

Related

Retrieving data from PostgreSQL DB in a more efficient way

I'm developing a real-time chat app using PostgreSQL and I'm having the following issue:
When a user logs in, I need to fetch all the users that are not the logged-in user, in order to display them on the sidebar of the app.
Below each user should be displayed the latest message that was sent either by the logged-in user or by the other user.
I'm trying to execute an efficient query in order to retrieve all the users with their latest message at once but with no success.
Here are my tables:
I tried at first to do something like that:
SELECT users.id, users.first_name, users.last_name, users.image, messages.sender_id, messages.recipient_id, messages.content
FROM users LEFT JOIN messages on users.id = messages.sender_id OR users.id = messages.recipient_id
WHERE (messages.sender_id = 1 OR messages.recipient_id = 1) AND users.id != 1
GROUP BY users.id
ORDER BY messages.created_at DESC;
And I got this error:
"1" refers to the logged user id
My temporary solution is to fetch all the users from the db, map over them on the server, and execute another query per user that returns the latest message between the two users, using ORDER BY created_at DESC LIMIT 1.
I'm sure there are more efficient ways, and I would appreciate any help!
If I follow you correctly, you can use conditional logic to select the messages exchanged (sent or received) between the logged-in user and any other user, and then a join to bring in the corresponding user records. To get the latest message per user, distinct on comes in handy in Postgres.
Consider:
select distinct on (u.id) u.id, ... -- enumerate the columns you want here
from (
select m.*,
case when sender_id = 1 then recipient_id else sender_id end as other_user_id
from messages m
where 1 in (m.sender_id, m.recipient_id)
) m
inner join users u on u.id = m.other_user_id
order by u.id, m.created_at desc
We could also phrase this with a lateral join:
select distinct on (u.id) u.id, ...
from messages m
cross join lateral (values
(case when sender_id = 1 then recipient_id else sender_id end)
) as x(other_user_id)
inner join users u on u.id = x.other_user_id
where 1 in (m.sender_id, m.recipient_id)
order by u.id, m.created_at desc
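Since distinct on is Postgres-only, here is a small sketch of the same "latest message per other user" idea using ROW_NUMBER(), which should port to most databases. It assumes Python's sqlite3 module (with an SQLite build recent enough for window functions), and the tables and rows are invented for the demo.

```python
import sqlite3

# Hypothetical data: user 1 is the logged-in user.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE messages (id INTEGER PRIMARY KEY, sender_id INTEGER,
                       recipient_id INTEGER, content TEXT, created_at TEXT);
INSERT INTO users VALUES (1, 'Me'), (2, 'Ana'), (3, 'Ben'), (4, 'Cy');
INSERT INTO messages VALUES
  (1, 1, 2, 'hi Ana',     '2021-01-01 10:00'),
  (2, 2, 1, 'hi back',    '2021-01-01 10:05'),
  (3, 3, 1, 'hello',      '2021-01-01 09:00'),
  (4, 1, 3, 'hey Ben',    '2021-01-01 08:00'),
  (5, 2, 3, 'not for me', '2021-01-01 11:00');
""")

LOGGED_IN = 1
query = """
SELECT other_user_id, first_name, content
FROM (
  SELECT x.other_user_id, u.first_name, x.content,
         -- rn = 1 marks the newest message within each conversation pair
         ROW_NUMBER() OVER (PARTITION BY x.other_user_id
                            ORDER BY x.created_at DESC) AS rn
  FROM (
    SELECT m.*,
           CASE WHEN m.sender_id = :me THEN m.recipient_id
                ELSE m.sender_id END AS other_user_id
    FROM messages m
    WHERE :me IN (m.sender_id, m.recipient_id)
  ) AS x
  JOIN users u ON u.id = x.other_user_id
) AS t
WHERE rn = 1
ORDER BY other_user_id
"""
latest = conn.execute(query, {"me": LOGGED_IN}).fetchall()
print(latest)  # [(2, 'Ana', 'hi back'), (3, 'Ben', 'hello')]
```

Like the queries above, this drops users who have never exchanged a message with the logged-in user (Cy here). To keep them in the sidebar, you would start from users and left join the ranked messages.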

Best approach for limiting rows coming back in SQL when joining for a sum

I need to get back a list of users and the total amount that they have ordered. In reality my query is more complex but I think this sums it up. My issue is, if a user made 5 orders for example, I'll get back their name and the total they've ordered 5 times due to the join (having 5 rows in the order table for that user).
What's the recommended approach for when you need to total the records in one table that has multiple rows without requiring many rows to come back? distinct could work but is this the best? (especially when my select chooses more information than what's below)
SELECT user.name, sum(order.amount) FROM USER user
INNER JOIN USER_ORDERS order
ON (user.user_id = order.user_id)
Are you just looking for GROUP BY?
SELECT u.name, SUM(uo.amount)
FROM USER u JOIN
USER_ORDERS uo
ON u.user_id = uo.user_id
GROUP BY u.name, u.user_id;
Note that this has included user_id in the GROUP BY, just in case two users have the same name.
If you want all users, even those without orders, then you want a LEFT JOIN:
SELECT u.name, SUM(uo.amount)
FROM USER u LEFT JOIN
USER_ORDERS uo
ON u.user_id = uo.user_id
GROUP BY u.name, u.user_id;
Or a correlated subquery:
SELECT u.name,
(SELECT SUM(uo.amount)
FROM USER_ORDERS uo
WHERE u.user_id = uo.user_id
)
FROM USER u;
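You could use the analytic version of SUM. It still returns one row per order, so add DISTINCT to collapse the duplicates:
SELECT DISTINCT u.user_id, u.name, SUM(uo.amount) OVER (PARTITION BY u.user_id)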
FROM USER u JOIN
USER_ORDERS uo
ON u.user_id = uo.user_id;
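To see the row counts side by side, here is a small sketch under a few assumptions: it runs on Python's sqlite3 module, the tables are named users and user_orders (USER is a reserved word in several databases), and the data is invented, including two different users who are both called Alice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_orders (order_id INTEGER PRIMARY KEY,
                          user_id INTEGER, amount INTEGER);
INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Alice');
INSERT INTO user_orders (user_id, amount) VALUES
  (1, 10), (1, 20), (1, 30), (1, 40), (1, 50), (3, 7);
""")

# Plain join without aggregation: one row per order, so user 1 appears 5 times.
joined = conn.execute("""
  SELECT u.name, uo.amount
  FROM users u JOIN user_orders uo ON u.user_id = uo.user_id
""").fetchall()

# GROUP BY collapses to one row per user; user_id keeps the two Alices apart.
grouped = conn.execute("""
  SELECT u.user_id, u.name, SUM(uo.amount)
  FROM users u JOIN user_orders uo ON u.user_id = uo.user_id
  GROUP BY u.user_id, u.name
  ORDER BY u.user_id
""").fetchall()

# LEFT JOIN also keeps Bob, who has no orders; his total is NULL (None).
with_all = conn.execute("""
  SELECT u.user_id, u.name, SUM(uo.amount)
  FROM users u LEFT JOIN user_orders uo ON u.user_id = uo.user_id
  GROUP BY u.user_id, u.name
  ORDER BY u.user_id
""").fetchall()

print(len(joined))  # 6
print(grouped)      # [(1, 'Alice', 150), (3, 'Alice', 7)]
print(with_all)     # [(1, 'Alice', 150), (2, 'Bob', None), (3, 'Alice', 7)]
```

If you would rather see 0 than NULL for users without orders, wrapping the sum in COALESCE(SUM(uo.amount), 0) is the usual approach.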

Too much Data using DISTINCT MAX

I want to see the last activity of each individual handset and the user that used that handset. I have a table UserSessions that stores the last activity of a particular user as well as what handset they used in that activity. There are roughly 40 handsets, yet I always get back way too many records, like 10,000 rows, when I only want the last activity of each handset. What am I doing wrong?
SELECT DISTINCT MAX(UserSessions.LastActivity), Handsets.Name,Users.Username
FROM UserSessions
INNER JOIN Handsets on Handsets.HandsetId = UserSessions.HandsetId
INNER JOIN Users on Users.UserId = UserSessions.UserId
WHERE
Handsets.Name in (1000,1001,1002,1003,1004....)
AND Handsets.Deleted = 0
GROUP BY UserSessions.LastActivity, Handsets.Name,Users.Username
I expect to get one record per handset, showing the user's last activity with that handset. What I get is multiple records for all handsets and dates, over 10,000 rows.
You typically GROUP BY the same columns as you SELECT, except those that are arguments to aggregate (set) functions.
This GROUP BY returns no duplicates, so SELECT DISTINCT isn't needed.
SELECT MAX(UserSessions.LastActivity), Handsets.Name, Users.Username
FROM UserSessions
INNER JOIN Handsets on Handsets.HandsetId = UserSessions.HandsetId
INNER JOIN Users on Users.UserId = UserSessions.UserId
WHERE Handsets.Name in (1000,1001,1002,1003,1004....)
AND Handsets.Deleted = 0
GROUP BY Handsets.Name, Users.Username
There is no such thing as DISTINCT MAX. You have SELECT DISTINCT, which ensures that the combination of columns in the SELECT is not duplicated across rows. And there is MAX(), an aggregate function.
As a note: SELECT DISTINCT is almost never appropriate with GROUP BY.
You seem to want:
SELECT *
FROM (SELECT h.Name, u.Username, MAX(us.LastActivity) as last_activity,
RANK() OVER (PARTITION BY h.Name ORDER BY MAX(us.LastActivity) desc) as seqnum
FROM UserSessions us JOIN
Handsets h
ON h.HandsetId = us.HandsetId INNER JOIN
Users u
ON u.UserId = us.UserId
WHERE h.Name in (1000,1001,1002,1003,1004....) AND
h.Deleted = 0
GROUP BY h.Name, u.Username
) h
WHERE seqnum = 1
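Here is a rough sketch of why the original GROUP BY explodes and how the ranking approach trims it to one row per handset. It assumes Python's sqlite3 module with window-function support, drops the IN list for brevity, splits the aggregation and the RANK() into two nested steps, and uses made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Handsets (HandsetId INTEGER PRIMARY KEY, Name INTEGER, Deleted INTEGER);
CREATE TABLE Users (UserId INTEGER PRIMARY KEY, Username TEXT);
CREATE TABLE UserSessions (UserId INTEGER, HandsetId INTEGER, LastActivity TEXT);
INSERT INTO Handsets VALUES (1, 1000, 0), (2, 1001, 0);
INSERT INTO Users VALUES (1, 'amy'), (2, 'raj');
INSERT INTO UserSessions VALUES
  (1, 1, '2021-03-01'), (1, 1, '2021-03-05'),
  (2, 1, '2021-03-09'), (2, 2, '2021-03-02'), (1, 2, '2021-03-04');
""")

# Grouping by LastActivity itself makes every distinct timestamp its own group,
# so MAX() has nothing to collapse.
too_many = conn.execute("""
  SELECT MAX(us.LastActivity), h.Name, u.Username
  FROM UserSessions us
  JOIN Handsets h ON h.HandsetId = us.HandsetId
  JOIN Users u ON u.UserId = us.UserId
  WHERE h.Deleted = 0
  GROUP BY us.LastActivity, h.Name, u.Username
""").fetchall()

# Aggregate per (handset, user) first, then rank users within each handset.
per_handset = conn.execute("""
  SELECT Name, Username, last_activity
  FROM (
    SELECT g.*,
           RANK() OVER (PARTITION BY Name ORDER BY last_activity DESC) AS seqnum
    FROM (
      SELECT h.Name, u.Username, MAX(us.LastActivity) AS last_activity
      FROM UserSessions us
      JOIN Handsets h ON h.HandsetId = us.HandsetId
      JOIN Users u ON u.UserId = us.UserId
      WHERE h.Deleted = 0
      GROUP BY h.Name, u.Username
    ) AS g
  ) AS ranked
  WHERE seqnum = 1
  ORDER BY Name
""").fetchall()

print(len(too_many))  # 5
print(per_handset)    # [(1000, 'raj', '2021-03-09'), (1001, 'amy', '2021-03-04')]
```

RANK() returns ties, so two users with the same latest timestamp on one handset would both come back; ROW_NUMBER() would pick just one of them.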

SQL - Removing Duplicate Rows but Keeping Null Rows

Happy New Year SO!
Problem
I'm writing a stored procedure that takes in a list of User Ids and it should return 1 record per user id that contains:
User details from the User table (first name, last name etc)
Latest modified address record from the Address table. If there IS NO Address record, then we need to return NULL in the address fields (Address, Postcode etc)
Country details from the Country table (Region etc)
Now I have the following, which correctly returns NULLs when there are no address details (last record in the screenshot), BUT for a User Id that has multiple address records, I get multiple records back and NOT just the latest modified address record:
SELECT
U.Id, U.FirstName, U.Surname, U.Email, U.DateOfBirth,
AD.AddressLine1, AD.AddressLine2, AD.AddressLine3,
AD.PostCode, AD.Nickname, AD.Phone, AD.Modified,
CNT.Name, CNT.Code,
a.MaxDate
FROM
#TableVariable AS List
LEFT JOIN
dbo.Users AS U ON List.Id = U.Id
LEFT JOIN
dbo.Addresses AS AD ON U.Id = AD.User_Id
LEFT JOIN
(SELECT
JA.User_Id, MAX(CONVERT(DATE,JA.Modified,10)) AS MaxDate
FROM dbo.Addresses AS JA
GROUP BY JA.User_Id) A ON (AD.User_Id = A.User_Id AND CONVERT(DATE,AD.Modified,10) = A.MaxDate)
LEFT JOIN
dbo.Countries AS CNT ON AD.Country_Id = CNT.Id
ORDER BY
AD.Modified DESC
Here is the result set after running the above. As you can see, I correctly get a record back for a User WITHOUT an address (last record), but I'm getting 3 records for 2108 when I wanted 1, including the latest modified address (AD.Modified).
I'm using SQL Server 2008.
You can use outer apply and order by to get the latest record, with something like this:
FROM #TableVariable AS List
LEFT JOIN dbo.Users AS U
ON List.Id = U.Id
OUTER APPLY (
SELECT top 1 *
FROM dbo.Addresses AS JA
WHERE U.Id = JA.User_Id
order by Modified DESC
) AD
LEFT JOIN dbo.Countries AS CNT
ON AD.Country_Id = CNT.Id
ORDER BY AD.Modified DESC
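A small sketch of the behaviour to expect: one row per user, with NULL address fields when there is no address. It assumes Python's sqlite3 module, and because SQLite has no OUTER APPLY or TOP, it swaps in a LEFT JOIN on a correlated subquery with LIMIT 1. The trimmed-down tables and rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (Id INTEGER PRIMARY KEY, FirstName TEXT);
CREATE TABLE Addresses (Id INTEGER PRIMARY KEY, User_Id INTEGER,
                        PostCode TEXT, Modified TEXT);
INSERT INTO Users VALUES (2108, 'Kim'), (2109, 'Lee');
-- User 2108 has three addresses; user 2109 has none.
INSERT INTO Addresses VALUES
  (1, 2108, 'AA1', '2014-01-01'),
  (2, 2108, 'BB2', '2014-06-01'),
  (3, 2108, 'CC3', '2014-03-01');
""")

rows = conn.execute("""
  SELECT u.Id, u.FirstName, ad.PostCode, ad.Modified
  FROM Users u
  LEFT JOIN Addresses ad
    -- join only the single most recently modified address of this user
    ON ad.Id = (SELECT ja.Id
                FROM Addresses ja
                WHERE ja.User_Id = u.Id
                ORDER BY ja.Modified DESC
                LIMIT 1)
  ORDER BY u.Id
""").fetchall()

print(rows)  # [(2108, 'Kim', 'BB2', '2014-06-01'), (2109, 'Lee', None, None)]
```

If two addresses of the same user share the same Modified value, which one wins is arbitrary unless you add a tie-breaker such as the address Id to the ORDER BY.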

SQL - Join 3 tables but limit the results based on the first table

Here is the work-flow of my situation:
I have a website that allows registered users to register for prizes from local prize sponsors (i.e., businesses). For example, there may be a page that has Pizza Hut as a prize sponsor, and the registered/authenticated user simply clicks "Enter the Drawing"; once they do, they can't click it again.
Here are the 3 tables:
BD_Listing (This table is for the prize sponsor, and includes basic columns such as ListingID, Title, ContactPerson, Email, Prize)
Users (This table is for the registered users on the site, and includes UserID, FirstName, LastName, Email, etc.)
PrizeEntry (This table is where the registrant data goes: EntryID, ListingID, User_ID, Date_Created)
Now all this works fine, as far as storing the data in the database. My problem is SELECT. I have a custom module that I'm building on the ADMIN side of the website, that I want to perform the following:
Here is my query that works:
SELECT ListingID, Title, ContactEmail, PrizeSponsor
FROM [BD_Listing]
WHERE PrizeSponsor = 'True'
If I display this data in a table on the site, it works fine. The problem is, I need more data, specifically the User_ID from the PrizeEntry table. This User_ID needs to be joined with the UserID from the Users table, because I need to pull their other info from it.
SELECT a.ListingID, a.Title, a.ContactEmail, a.PrizeSponsor,
b.ListingID, b.User_ID
FROM BD_Listing a INNER JOIN PrizeEntry b ON a.ListingID = b.ListingID
WHERE a.PrizeSponsor = 'True'
Now, the first problem arises. If 20 people register for Pizza Hut, then I'm going to get 21 rows of data, which isn't what I'm going for. Here is ultimately my code to join all the information and the reasoning for it, and then you can tell me how idiotic I am for doing it wrong :)
SELECT a.ListingID, a.Title, a.ContactEmail, a.PrizeSponsor,
b.ListingID, b.User_ID,
c.UserID, c.FirstName, c.LastName, c.Email
,(SELECT COUNT(ListingID) AS EntryCount FROM PrizeEntry WHERE (ListingID = a.ListingID)) AS EntryCount
,(SELECT TOP 1 User_ID AS RandomWinner FROM PrizeEntry WHERE (ListingID = a.ListingID)ORDER BY NEWID()) as RandomWinner
FROM BD_Listing a INNER JOIN PrizeEntry B on a.ListingID = b.ListingID
INNER JOIN Users C on b.User_ID = c.UserID
WHERE a.PrizeSponsor = 'True'
Okay, so in my table that displays this data I just want Pizza Hut to show up ONE time, but instead because it's joined with the PrizeEntry table, it shows up multiple times.
I want to see:
Business: Pizza Hut // comes from the BD_Listing table
Contact: John Doe // comes from the BD_Listing table
Total Registrations: 20 // count from the PrizeEntry table
Random Winner: John Smith (UserID:10) // result from the subquery
But instead, I see multiple rows for each business for every time a prize is registered for.
I apologize for the length... I'm new here, and this is my second post. And going through the beginning learning curves of SQL.
Thank you so much for any advice.
First, if there are 20 users you should expect 20 rows, not 21.
I think you can simplify your query, just by removing the joins to PrizeEntry and Users, to get what you want:
SELECT a.ListingID, a.Title, a.ContactEmail, a.PrizeSponsor,
(SELECT COUNT(ListingID) AS EntryCount FROM PrizeEntry pe WHERE (pe.ListingID = a.ListingID)) AS EntryCount,
(SELECT TOP 1 User_ID AS RandomWinner FROM PrizeEntry pe WHERE (pe.ListingID = a.ListingID) ORDER BY NEWID()) as RandomWinner
FROM BD_Listing a
WHERE a.PrizeSponsor = 'True'
You want the information by prize sponsor. The correlated subqueries get the additional information. To get more information about the winner, just make this a subquery and join to the appropriate table, as below:
with t as (
SELECT a.ListingID, a.Title, a.ContactEmail, a.PrizeSponsor,
(SELECT COUNT(ListingID) AS EntryCount FROM PrizeEntry pe WHERE (pe.ListingID = a.ListingID)) AS EntryCount,
(SELECT TOP 1 User_ID AS RandomWinner FROM PrizeEntry pe WHERE (pe.ListingID = a.ListingID) ORDER BY NEWID()) as RandomWinner
FROM BD_Listing a
WHERE a.PrizeSponsor = 'True'
)
select t.*, u.Firstname, u.LastName, u.Email
from t join
users u
on t.RandomWinner = u.UserID
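For a quick check that the correlated-subquery version really returns one row per sponsor, here is a sketch on toy data. It assumes Python's sqlite3 module, so TOP 1 becomes LIMIT 1 and NEWID() becomes RANDOM(); the trimmed tables and rows are hypothetical, and the join to Users is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BD_Listing (ListingID INTEGER PRIMARY KEY, Title TEXT, PrizeSponsor TEXT);
CREATE TABLE PrizeEntry (EntryID INTEGER PRIMARY KEY, ListingID INTEGER, User_ID INTEGER);
INSERT INTO BD_Listing VALUES
  (1, 'Pizza Hut', 'True'), (2, 'Taco Stand', 'True'), (3, 'Not a sponsor', 'False');
INSERT INTO PrizeEntry (ListingID, User_ID) VALUES (1, 10), (1, 11), (1, 12), (2, 11);
""")

rows = conn.execute("""
  SELECT a.ListingID, a.Title,
         -- number of entries for this listing
         (SELECT COUNT(*) FROM PrizeEntry pe
           WHERE pe.ListingID = a.ListingID) AS EntryCount,
         -- one entrant picked at random for this listing
         (SELECT pe.User_ID FROM PrizeEntry pe
           WHERE pe.ListingID = a.ListingID
           ORDER BY RANDOM() LIMIT 1) AS RandomWinner
  FROM BD_Listing a
  WHERE a.PrizeSponsor = 'True'
  ORDER BY a.ListingID
""").fetchall()

for listing_id, title, entry_count, winner in rows:
    print(listing_id, title, entry_count, winner)
```

Pizza Hut shows up once with a count of 3 and a winner drawn from users 10, 11, and 12. The winner changes between runs, so if the draw needs to be final you would want to store the result rather than recompute it on every page load.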
The simplest method is to do two queries. The first one gets the sponsor info, and the second query gets the entries.
Technically this is a case of N+1 (See: What is SELECT N+1? ) but I wouldn't worry about it here.
Theoretically you could get everything at once, then in the front end you loop through it and see if you have already printed the sponsor. The trouble is that that is so much work that doing N+1 is actually much better.
Now, if you wanted summary info about the sponsor that's different - you can do that with a GROUP BY statement.
Like this:
SELECT a.ListingID, a.Title, a.ContactEmail, a.PrizeSponsor, COUNT(*) AS EntryCount
FROM BD_Listing a INNER JOIN PrizeEntry B on a.ListingID = b.ListingID
INNER JOIN Users C on b.User_ID = c.UserID
WHERE a.PrizeSponsor = 'True'
GROUP BY a.ListingID, a.Title, a.ContactEmail, a.PrizeSponsor
You're going to get all matching combinations because of the structure of your joins: as things stand, you cannot get anything else.
What you need to do is to get the entrants and work out your winner first, then correlate this back to the listing:
select a.ListingID, a.Title, a.ContactEmail, a.PrizeSponsor, d.UserID, d.FirstName, d.LastName, d.Email
from BD_Listing a inner join (
-- number the entrants of each listing in random order; rn = 1 is that listing's winner
-- (a plain top 1 here would pick a single winner across all listings)
select b.ListingID, c.UserID, c.FirstName, c.LastName, c.Email,
row_number() over (partition by b.ListingID order by newid()) as rn
from PrizeEntry b inner join Users c on b.User_ID = c.UserID
) d on a.ListingID = d.ListingID and d.rn = 1