Too much Data using DISTINCT MAX - sql

I want to see the last activity of each individual handset and the user who used that handset. I have a table UserSessions that stores the last activity of a particular user as well as which handset they used in that activity. There are roughly 40 handsets, yet I always get back far too many records, around 10,000 rows, when I only want the last activity of each handset. What am I doing wrong?
SELECT DISTINCT MAX(UserSessions.LastActivity), Handsets.Name,Users.Username
FROM UserSessions
INNER JOIN Handsets on Handsets.HandsetId = UserSessions.HandsetId
INNER JOIN Users on Users.UserId = UserSessions.UserId
WHERE
Handsets.Name in (1000,1001,1002,1003,1004....)
AND Handsets.Deleted = 0
GROUP BY UserSessions.LastActivity, Handsets.Name,Users.Username
I expect to get one record per handset, showing the user's last activity with that handset. What I get is multiple records for all handsets and dates, over 10,000 rows.

You typically GROUP BY the same columns as you SELECT, except those that are arguments to aggregate (set) functions.
This GROUP BY returns no duplicates, so SELECT DISTINCT isn't needed.
SELECT MAX(UserSessions.LastActivity), Handsets.Name, Users.Username
FROM UserSessions
INNER JOIN Handsets on Handsets.HandsetId = UserSessions.HandsetId
INNER JOIN Users on Users.UserId = UserSessions.UserId
WHERE Handsets.Name in (1000,1001,1002,1003,1004....)
AND Handsets.Deleted = 0
GROUP BY Handsets.Name, Users.Username

There is no such thing as DISTINCT MAX. There is SELECT DISTINCT, which ensures that the combination of all columns in the SELECT is not duplicated across rows, and there is MAX(), an aggregation function.
As a note: SELECT DISTINCT is almost never appropriate with GROUP BY.
You seem to want:
SELECT *
FROM (SELECT h.Name, u.Username, MAX(us.LastActivity) as last_activity,
RANK() OVER (PARTITION BY h.Name ORDER BY MAX(us.LastActivity) desc) as seqnum
FROM UserSessions us JOIN
Handsets h
ON h.HandsetId = us.HandsetId INNER JOIN
Users u
ON u.UserId = us.UserId
WHERE h.Name in (1000,1001,1002,1003,1004....) AND
h.Deleted = 0
GROUP BY h.Name, u.Username
) h
WHERE seqnum = 1
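To check the "one row per handset" result on a small data set, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration, and instead of RANK() it joins to a derived table of per-handset maxima, which is a different technique that should give the same rows as long as there are no ties on LastActivity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; only the table and column names come from the question.
conn.executescript("""
CREATE TABLE Handsets (HandsetId INTEGER PRIMARY KEY, Name TEXT, Deleted INTEGER);
CREATE TABLE Users (UserId INTEGER PRIMARY KEY, Username TEXT);
CREATE TABLE UserSessions (UserId INTEGER, HandsetId INTEGER, LastActivity TEXT);
INSERT INTO Handsets VALUES (1, '1000', 0), (2, '1001', 0), (3, '1002', 1);
INSERT INTO Users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO UserSessions VALUES
  (1, 1, '2019-01-01 09:00'),
  (2, 1, '2019-01-03 10:00'),
  (1, 2, '2019-01-02 08:00'),
  (2, 2, '2019-01-01 12:00'),
  (1, 3, '2019-01-05 12:00');
""")

# Find the latest activity per handset first, then join back to pick up
# the session row (and therefore the user) that owns that timestamp.
latest_per_handset = conn.execute("""
SELECT h.Name, u.Username, us.LastActivity
FROM UserSessions us
JOIN Handsets h ON h.HandsetId = us.HandsetId
JOIN Users u ON u.UserId = us.UserId
JOIN (SELECT HandsetId, MAX(LastActivity) AS max_activity
      FROM UserSessions
      GROUP BY HandsetId) m
  ON m.HandsetId = us.HandsetId AND m.max_activity = us.LastActivity
WHERE h.Deleted = 0
ORDER BY h.Name
""").fetchall()

for row in latest_per_handset:
    print(row)
```

Each non-deleted handset appears once, paired with the user of its most recent session. If two sessions on the same handset share the maximum timestamp, both rows come back, which matches what RANK() would do.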

Related

Best approach for limiting rows coming back in SQL when joining for a sum

I need to get back a list of users and the total amount that they have ordered. In reality my query is more complex but I think this sums it up. My issue is, if a user made 5 orders for example, I'll get back their name and the total they've ordered 5 times due to the join (having 5 rows in the order table for that user).
What's the recommended approach for when you need to total the records in one table that has multiple rows without requiring many rows to come back? distinct could work but is this the best? (especially when my select chooses more information than what's below)
SELECT user.name, sum(order.amount) FROM USER user
INNER JOIN USER_ORDERS order
ON (user.user_id = order.user_id)
Are you just looking for GROUP BY?
SELECT u.name, SUM(uo.amount)
FROM USER u JOIN
USER_ORDERS uo
ON u.user_id = uo.user_id
GROUP BY u.name, u.user_id;
Note that this has included user_id in the GROUP BY, just in case two users have the same name.
If you want all users, even those without orders, then you want a LEFT JOIN:
SELECT u.name, SUM(uo.amount)
FROM USER u LEFT JOIN
USER_ORDERS uo
ON u.user_id = uo.user_id
GROUP BY u.name, u.user_id;
Or a correlated subquery:
SELECT u.name,
(SELECT SUM(uo.amount)
FROM USER_ORDERS uo
WHERE u.user_id = uo.user_id
)
FROM USER u;
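You could use the analytic version of SUM. Because a window function keeps one row per order, add DISTINCT to collapse the repeats:
SELECT DISTINCT u.name, SUM(uo.amount) OVER (PARTITION BY u.name)
FROM USER u JOIN
USER_ORDERS uo
ON u.user_id = uo.user_id;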
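The difference between the inner-join and LEFT JOIN versions of the GROUP BY query can be seen on a tiny data set. This is a sketch using Python's sqlite3 module; the table names users and user_orders and the sample rows are assumptions made for the example, and the COALESCE in the second query is an addition that turns the NULL total of a user without orders into 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables standing in for USER and USER_ORDERS.
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER);
INSERT INTO users VALUES (1, 'ann'), (2, 'ben'), (3, 'cat');
INSERT INTO user_orders (user_id, amount) VALUES (1, 10), (1, 15), (1, 5), (2, 7);
""")

# Inner join: one row per user who has at least one order.
inner_totals = conn.execute("""
SELECT u.name, SUM(uo.amount)
FROM users u
JOIN user_orders uo ON u.user_id = uo.user_id
GROUP BY u.user_id, u.name
ORDER BY u.user_id
""").fetchall()

# Left join: every user, with 0 for users who have no orders.
left_totals = conn.execute("""
SELECT u.name, COALESCE(SUM(uo.amount), 0)
FROM users u
LEFT JOIN user_orders uo ON u.user_id = uo.user_id
GROUP BY u.user_id, u.name
ORDER BY u.user_id
""").fetchall()

print(inner_totals)
print(left_totals)
```

The user with three orders appears once in both results, so the repeated rows from the join are collapsed by the GROUP BY rather than by DISTINCT.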

Include 0 in count(*) SQL query

I have two entities, User and MaBase. MaBase contains user_id and status. I want to get the count of status by user, I also want to show a 0 for any status values where the user doesn't have a record.
I created the query below using COUNT, but it only returns statuses that have at least one record. How can I solve this?
SELECT status, COUNT(*)
FROM ma_base
WHERE ma_base.user_id = 5
GROUP BY status
I have 5 types of status values. If a user only has ma_base records for 4 of them, I still want to see a 0 value for the 5th status.
It's not every day I get to write a CROSS JOIN:
SELECT u.ID, s.status,
coalesce((SELECT COUNT(*) FROM ma_base m WHERE m.User_Id = u.ID and m.status = s.Status),0) As Status_Count
FROM User u
CROSS JOIN (SELECT DISTINCT status FROM MA_Base) s
WHERE u.ID = 5
OR:
SELECT u.ID, s.status, COALESCE(COUNT(m.status), 0) AS Status_Count
FROM User u
CROSS JOIN (SELECT DISTINCT status FROM MA_Base) s
LEFT JOIN MA_Base m ON m.User_Id = u.ID AND m.status = s.status
WHERE u.ID = 5
GROUP BY u.ID, s.status
In a nutshell, we first need to create a projection for the user with every possible status value, to anchor the result records for your "missing" statuses. Then we can JOIN or do a correlated subquery to get your desired results.
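For the JOIN option, note the expression in the COUNT() function. It's important: COUNT(m.status) counts only non-NULL values, so a status with no matching row yields 0, whereas COUNT(*) would count the unmatched row produced by the LEFT JOIN and return 1. In both options COALESCE() is a harmless safeguard; COUNT() already returns 0 rather than NULL when nothing matches.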
If you have a separate table defining your status values, use that instead of deriving them from ma_base.
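The COUNT(m.status) versus COUNT(*) distinction is easy to verify. The sketch below uses Python's sqlite3 module with made-up rows; the users table name and the status values are assumptions for the example. It runs the JOIN variant twice, changing only the expression inside COUNT().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: user 5 has no 'closed' record, user 6 supplies that status value.
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE ma_base (user_id INTEGER, status TEXT);
INSERT INTO users VALUES (5), (6);
INSERT INTO ma_base VALUES (5, 'new'), (5, 'new'), (5, 'open'), (6, 'closed');
""")

query = """
SELECT u.id, s.status, COUNT({expr}) AS status_count
FROM users u
CROSS JOIN (SELECT DISTINCT status FROM ma_base) s
LEFT JOIN ma_base m ON m.user_id = u.id AND m.status = s.status
WHERE u.id = 5
GROUP BY u.id, s.status
ORDER BY s.status
"""

# Counting a column from the outer-joined table skips the NULLs of unmatched rows.
counts_column = conn.execute(query.format(expr="m.status")).fetchall()
# Counting rows includes the single NULL-extended row for the missing status.
counts_star = conn.execute(query.format(expr="*")).fetchall()

print(counts_column)
print(counts_star)
```

With COUNT(m.status) the missing status is reported as 0; with COUNT(*) it is reported as 1, which is the wrong answer for this question.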

join 2 foreign key using subquery

Help me solve this. I want to join two tables on two different foreign keys that reference the same column. Table snapshots are provided below:
users table
transactions table
I want to return the top 5 transactions by amount, from high to low, displaying transaction id, investor id, investor name, borrower id, borrower name, and amount.
The following runs properly but contains no investor name:
select top 5 t.id,
investor_id,
borrower_id,
username as BorrowerName,
amount
from transactions t join users u on t.borrower_id = u.id
order by t.amount desc;
Result table, missing the investor name
Whereas if I use a subquery, I get an error:
select top 5 t.id,
investor_id,
(select username from users join transactions on users.id =
transactions.investor_id) investorName,
borrower_id,
username BorrowerName,
amount
from transactions t join users u on t.borrower_id = u.id
order by t.amount desc;
select top 5 t.id,
investor_id, ui.username as InvestorName,
borrower_id, ub.username as BorrowerName,
amount
from transactions t
join users ub on t.borrower_id = ub.id
join users ui on t.investor_id = ui.id
order by t.amount desc;
The subquery must be scalar, i.e. return a single value, but yours currently returns a result set.
select top 5 t.id,
investor_id,
(-- Correlated scalar subquery, returns a single value
select username
from users
WHERE users.id = t.investor_id) investorName,
borrower_id,
username BorrowerName,
amount
from transactions t join users u on t.borrower_id = u.id
order by t.amount desc;
Isn't this what you want? Two joins on users table
SELECT TOP 5
investor_id,
investors.username InvestorName,
borrower_id,
borrowers.username BorrowerName,
amount
FROM
transactions
INNER JOIN users investors ON (transactions.investor_id = investors.id)
INNER JOIN users borrowers ON (transactions.borrower_id = borrowers.id)
ORDER BY
amount desc;
I would recommend against using subqueries in this case, since a correlated subquery may be evaluated once per outer row, depending on how the optimizer handles it.
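Joining the same table twice under different aliases can be tried out with a few rows. This is a sketch using Python's sqlite3 module; the sample data is invented, and because SQLite has no TOP clause, LIMIT is used in its place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows; column names follow the question.
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE transactions (
  id INTEGER PRIMARY KEY, investor_id INTEGER, borrower_id INTEGER, amount INTEGER);
INSERT INTO users VALUES (1, 'ivy'), (2, 'bo'), (3, 'cy');
INSERT INTO transactions VALUES (10, 1, 2, 500), (11, 3, 2, 900), (12, 1, 3, 100);
""")

# users is joined once per foreign key, each time under its own alias.
top_transactions = conn.execute("""
SELECT t.id, t.investor_id, ui.username, t.borrower_id, ub.username, t.amount
FROM transactions t
JOIN users ui ON ui.id = t.investor_id
JOIN users ub ON ub.id = t.borrower_id
ORDER BY t.amount DESC
LIMIT 2
""").fetchall()

for row in top_transactions:
    print(row)
```

Each alias resolves one foreign key, so the investor and borrower names come from different rows of the same users table.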

Select last record out of grouped records

I have this code and would like help changing it into a grouped query that picks the last record of each group.
SELECT *
FROM dbo.users_pics INNER JOIN profile
ON users_pics.email = profile.email
Left Join photo_comment
On users_pics.u_pic_id = photo_comment.pic_id
WHERE users_pics.wardrobe = MMColParam
ORDER BY u_pic_id asc
What I mean is that I have groups of records, and I want to select only one record per group, the last one. For example, if I have 10 records with the name "John", I want to select the last "John" out of the 10, and likewise for the rest.
I'm going to presume that each row in your users table is a single user, each user has a single profile, and your photo_comment table can contain multiple comments per picture.
Depending on your RDBMS, you can do this a number of ways. Row_Number can often be a quick way of doing this if you're using a database which supports window functions such as SQL Server or Oracle.
A generic solution to this is to join the table back to itself using the MAX aggregate. This is dependent on having a field to determine which record is the max. Generally speaking, that would be an identity/auto number field or a time stamp field.
Here is the basic concept using photo_comment_id as your determining column:
SELECT *
FROM dbo.users_pics INNER JOIN profile
ON users_pics.email = profile.email
LEFT Join (
SELECT pic_id, MAX(photo_comment_id) max_photo_comment_id
FROM photo_comment
GROUP BY pic_id
) max_photo_comment On users_pics.u_pic_id = max_photo_comment.pic_id
LEFT Join photo_comment On
max_photo_comment.pic_id = photo_comment.pic_id AND
max_photo_comment.max_photo_comment_id = photo_comment.photo_comment_id
WHERE users_pics.wardrobe = MMColParam
ORDER BY u_pic_id asc
If your database supports ROW_NUMBER, then you can do this as well (still using the photo_comment_id field):
SELECT *
FROM (
SELECT *,
-- Partition by the picture itself, so pictures without comments keep their own row
ROW_NUMBER() OVER (PARTITION BY users_pics.u_pic_id
ORDER BY photo_comment.photo_comment_id DESC) rn
FROM dbo.users_pics INNER JOIN profile
ON users_pics.email = profile.email
LEFT JOIN photo_comment
ON users_pics.u_pic_id = photo_comment.pic_id
WHERE users_pics.wardrobe = MMColParam
) t
WHERE rn = 1
ORDER BY u_pic_id asc

Distinct mixing rows values?

I'm getting results with mixed date values: instead of the last revision for each title, I get values from different revisions mixed together.
I'm using MySQL.
The general idea is to retrieve the last revision of each entry.
My current sql query:
SELECT DISTINCT
w.owner_id,
w.date,
w.title,
MAX(w.revision),
u.name AS updater
FROM wiki_pages AS w
JOIN users AS u ON w.owner_id = u.id
GROUP BY title
ORDER BY title ASC
Use:
SELECT wp.owner_id,
wp.date,
wp.title,
wp.revision,
u.name AS updater
FROM WIKI_PAGES wp
JOIN USERS u ON u.id = wp.owner_id
JOIN (SELECT t.title,
MAX(t.revision) AS max_rev
FROM WIKI_PAGES t
GROUP BY t.title) x ON x.title = wp.title
AND x.max_rev = wp.revision
In your query, the only things you can guarantee are the title and that the revision value is the highest. The other column values aren't necessarily taken from that same row, hence the join to a derived table.