Best way to construct this query? [duplicate] - sql

Possible Duplicate:
Retrieving the last record in each group
I have two tables set up similar to this (simplified for the question):
actions-
id - user_id - action - time
users -
id - name
I want to output the latest action for each user. I have no idea how to go about it.
I'm not great with SQL, but from what I've looked up, it should look something like the following. I'm not sure, though.
SELECT `users`.`name`, *
FROM users, actions
JOIN < not sure what to put here >
ORDER BY `actions`.`time` DESC
< only one per user_id >
Any help would be appreciated.

SELECT * FROM users JOIN actions ON actions.id=(SELECT id FROM actions WHERE user_id=users.id ORDER BY time DESC LIMIT 1);

You need to do a groupwise max; please refer to the examples here: http://jan.kneschke.de/projects/mysql/groupwise-max/
Here's an example I did for someone else that is similar to your requirements:
http://pastie.org/925108
select
u.user_id,
u.username,
latest.comment_id
from
users u
left outer join
(
select
max(comment_id) as comment_id,
user_id
from
user_comment
group by
user_id
) latest on u.user_id = latest.user_id;

select u.name, a.action, a.time
from users u, actions a
where u.id = a.user_id
and a.time in (select max(time) from actions where user_id = u.id group by user_id)
Note: untested, but this should be the pattern.

DECLARE @Table TABLE (User_ID int, Time datetime)
-- This gets the time of the latest entry for each user
INSERT INTO @Table (User_ID, Time)
SELECT z.user_id, MAX(z.time)
FROM actions z
INNER JOIN users x ON x.id = z.user_id
GROUP BY z.user_id
-- Join back on user and time to get the resulting action
SELECT z.user_id, z.action
FROM actions z
INNER JOIN @Table x ON x.User_ID = z.user_id AND x.Time = z.time

This is the greatest-n-per-group problem that comes up frequently on Stack Overflow. Follow the tag for dozens of other posts on this problem.
Here's how to do it in MySQL given your schema with no subqueries and no GROUP BY:
SELECT u.*, a1.*
FROM users u JOIN actions a1 ON (u.id = a1.user_id)
LEFT OUTER JOIN actions a2 ON (u.id = a2.user_id AND a1.time < a2.time)
WHERE a2.id IS NULL;
In other words, show the user with her action such that if we search for another action with the same user and a later time, we find none.
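The self-join pattern above can be tried out on a small in-memory database. This is only a sketch: it uses Python's sqlite3 module rather than MySQL, and the sample users and timestamps are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE actions (id INTEGER PRIMARY KEY, user_id INTEGER, action TEXT, time TEXT);
    -- Hypothetical sample data, not from the original question
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO actions VALUES
        (1, 1, 'login', '2010-04-01 09:00:00'),
        (2, 1, 'post',  '2010-04-01 10:00:00'),
        (3, 2, 'login', '2010-04-01 08:00:00');
""")

# a2 matches any later action by the same user; keeping only rows where
# no such a2 exists leaves each user's latest action.
latest = conn.execute("""
    SELECT u.name, a1.action, a1.time
    FROM users u
    JOIN actions a1 ON u.id = a1.user_id
    LEFT OUTER JOIN actions a2
        ON u.id = a2.user_id AND a1.time < a2.time
    WHERE a2.id IS NULL
    ORDER BY u.name
""").fetchall()

for row in latest:
    print(row)
```

Users with no actions at all do not appear, because the first join is an inner join.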

It seems to me that the following will work:
WITH GetMaxTimePerUser (user_id, time) AS (
SELECT user_id, MAX(time)
FROM actions
GROUP BY user_id
)
SELECT u.name, a.action, u_maxtime.time
FROM actions AS a
INNER JOIN users AS u ON u.id=a.user_id
INNER JOIN GetMaxTimePerUser AS u_maxtime ON u_maxtime.user_id=u.id
WHERE a.time=u_maxtime.time
Using a temporary named result set (a common table expression, or CTE) without subqueries or OUTER JOINs is the form that leaves the most room for query optimization. (A CTE is something like a VIEW, but it exists only virtually, inline in the query.)
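One property of joining back on MAX(time) is worth checking before relying on it: if a user has two actions with the same latest time, both rows come back. The sketch below shows this on SQLite through Python's sqlite3 module; the sample rows and the CTE name max_time_per_user are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE actions (id INTEGER PRIMARY KEY, user_id INTEGER, action TEXT, time TEXT);
    -- Hypothetical sample data; actions 3 and 4 share the same time
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO actions VALUES
        (1, 1, 'login', '2010-04-01 09:00:00'),
        (2, 1, 'post',  '2010-04-01 10:00:00'),
        (3, 2, 'login', '2010-04-01 08:00:00'),
        (4, 2, 'vote',  '2010-04-01 08:00:00');
""")

rows = conn.execute("""
    WITH max_time_per_user (user_id, time) AS (
        SELECT user_id, MAX(time)
        FROM actions
        GROUP BY user_id
    )
    SELECT u.name, a.action, m.time
    FROM actions AS a
    INNER JOIN users AS u ON u.id = a.user_id
    INNER JOIN max_time_per_user AS m ON m.user_id = u.id
    WHERE a.time = m.time
    ORDER BY u.name, a.id
""").fetchall()

for row in rows:
    print(row)  # bob appears twice because of the tie
```

If exactly one row per user is required, a tie-breaker such as the highest id has to be added.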

Related

How to print two attribute values from your Sub query table

Suppose I have two tables,
User
Post
Posts are made by Users (i.e. the Post Table will have foreign key of user)
Now my question is,
Print the details of all the users who have more than 10 posts
To solve this, I can type the following query and it would give me the desired result,
SELECT * from USER where user_id in (SELECT user_id from POST group by user_id having count(user_id) > 10)
The problem occurs when I also want to print the count of the posts along with the user details. The count cannot be obtained from the USER table; it can only come from the POST table. But I can't return two values from my subquery, i.e. I can't do the following:
SELECT * from USER where user_id in (SELECT user_id, **count(user_id)** from POST group by user_id having count(user_id) > 10)
So, how do I resolve this issue? One solution I know is the following, but I think it is a very naive way to resolve this, and it makes the query much more complex and much slower:
SELECT u.*, (SELECT po.count(user_id) from POST as po group by user_id having po.count(user_id) > 10) from USER u where u.user_id in (SELECT p.user_id from POST p group by user_id having p.count(user_id) > 10)
Is there any other way to solve this using subqueries?
Move the aggregation to the from clause:
SELECT u.*, p.num_posts
FROM user u JOIN
(SELECT p.user_id, COUNT(*) as num_posts
FROM post p
GROUP BY p.user_id
HAVING COUNT(*) > 10
) p
ON u.user_id = p.user_id;
You can do this with subqueries:
select u.*
from (select u.*,
(select count(*) from post p where p.user_id = u.user_id) as num_posts
from user u
) u
where num_posts > 10;
With an index on post(user_id), this might actually have better performance than the version using JOIN/GROUP BY.
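As a rough check that the two forms return the same rows, here is a sketch using Python's sqlite3 module. The post_id column, the user names and the post counts are assumptions made for the example, and the outer alias in the second query is renamed to counted for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (post_id INTEGER PRIMARY KEY,
                       user_id INTEGER REFERENCES user (user_id));
    CREATE INDEX ix_post_user_id ON post (user_id);
    INSERT INTO user VALUES (1, 'alice'), (2, 'bob');
""")
# Hypothetical data: alice has 11 posts (over the threshold), bob has 3.
conn.executemany("INSERT INTO post (user_id) VALUES (?)", [(1,)] * 11 + [(2,)] * 3)

# Aggregation moved into the FROM clause.
join_rows = conn.execute("""
    SELECT u.user_id, u.name, p.num_posts
    FROM user u
    JOIN (SELECT user_id, COUNT(*) AS num_posts
          FROM post
          GROUP BY user_id
          HAVING COUNT(*) > 10) p
      ON u.user_id = p.user_id
""").fetchall()

# Correlated subquery in the SELECT list, filtered in an outer query.
subquery_rows = conn.execute("""
    SELECT counted.user_id, counted.name, counted.num_posts
    FROM (SELECT u.*,
                 (SELECT COUNT(*) FROM post p WHERE p.user_id = u.user_id) AS num_posts
          FROM user u) counted
    WHERE counted.num_posts > 10
""").fetchall()

print(join_rows, subquery_rows)
```

Which form is faster depends on the database and the data, so it is worth measuring on the real tables.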
You can also try joining the tables; prefer a JOIN to a subquery:
SELECT user.*, count( post.user_id ) as postcount
FROM user LEFT JOIN post ON user.user_id = post.user_id
GROUP BY user.user_id
HAVING postcount > 10 ;

How to pull the count of occurrences from 2 SQL tables

I am using Python on a SQLite3 DB I created. I have the DB created and am currently just using the command line to try to get the SQL statement correct.
I have 2 tables.
Table 1 - users
user_id, name, message_count
Table 2 - messages
id, date, message, user_id
When I set up table two, I added this statement in the creation of my messages table, but I have no clue what, if anything, it does:
FOREIGN KEY (user_id) REFERENCES users (user_id)
What I am trying to do is return a list containing the name and message count during 2020. I have used this statement to get the TOTAL number of posts in 2020, and it works:
SELECT COUNT(*) FROM messages WHERE substr(date,1,4)='2020';
But I am struggling with figuring out if I should Join the tables, or if there is a way to pull just the info I need. The statement I want would look something like this:
SELECT name, COUNT(*) FROM users JOIN messages ON messages.user_id = users.user_id WHERE substr(date,1,4)='2020';
One option uses a correlated subquery:
select u.*,
(
select count(*)
from messages m
where m.user_id = u.user_id and m.date >= '2020-01-01' and m.date < '2021-01-01'
) as cnt_messages
from users u
This query would take advantage of an index on messages(user_id, date).
You could also join and aggregate. If you want to include users that have no messages, a left join is appropriate:
select u.name, count(m.user_id) as cnt_messages
from users u
left join messages m
on m.user_id = u.user_id and m.date >= '2020-01-01' and m.date < '2021-01-01'
group by u.user_id, u.name
Note that it is more efficient to filter the date column against literal dates than applying a function on it (which precludes the use of an index).
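Since the question uses Python with SQLite, the left join version can be run as follows. This is a sketch under assumptions: dates are stored as ISO 'YYYY-MM-DD' text (which the asker's substr(date,1,4) suggests), the message_count column is left out for brevity, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY,
        date TEXT,
        message TEXT,
        user_id INTEGER,
        FOREIGN KEY (user_id) REFERENCES users (user_id)
    );
    CREATE INDEX ix_messages_user_date ON messages (user_id, date);
    -- Hypothetical sample data
    INSERT INTO users (user_id, name) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO messages (date, message, user_id) VALUES
        ('2019-12-31', 'old', 1),
        ('2020-01-01', 'a',   1),
        ('2020-06-15', 'b',   1),
        ('2020-03-02', 'c',   2),
        ('2021-01-01', 'new', 2);
""")

# The date range sits in the ON clause, so users without 2020 messages
# are kept and COUNT(m.user_id) yields 0 for them.
rows = conn.execute("""
    SELECT u.name, COUNT(m.user_id) AS cnt_messages
    FROM users u
    LEFT JOIN messages m
        ON m.user_id = u.user_id
       AND m.date >= '2020-01-01' AND m.date < '2021-01-01'
    GROUP BY u.user_id, u.name
    ORDER BY u.name
""").fetchall()

for name, cnt in rows:
    print(name, cnt)
```

Moving the date conditions into a WHERE clause instead would drop the users with no matching messages, turning the left join into an inner join in effect.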
You are missing a GROUP BY clause to group by user:
SELECT u.user_id, u.name, COUNT(*) AS counter
FROM users u JOIN messages m
ON m.user_id = u.user_id
WHERE substr(m.date,1,4)='2020'
GROUP BY u.user_id, u.name

Getting single record in subquery

I have a DB2 database with multiple user records, which are related to a newsletter record through a couple intermediate tables.
My problem is that I need to get just the users where the latest newsletter they received was in the last week. I've been banging my head against this for hours and still haven't found a way to cleanly get the records I need. I thought this would be the solution, but I keep running into very generic errors that I don't really understand the cause of.
SELECT a.*, d.tech_id as newsletter FROM users a
JOIN user_profile b ON b.tech_id = a.profile_id
JOIN user_contact c ON c.user_id = b.tech_id
JOIN (
SELECT newsletters.tech_id, ROW_NUMBER() OVER (ORDER BY timestamp(tech_id) DESC) AS RN
FROM NEWSLETTERS
) d ON d.tech_id = c.newsletter_id
WHERE (timestamp(d.rn) < current_timestamp - 7 days)
Is there a better way to do this, or am I missing an obvious problem?
EDIT:
This is what I'd like to be doing, though it doesn't work right either:
SELECT a.* as newsletter FROM users a
WHERE (
SELECT MAX(timestamp(newsletters.tech_id))
FROM newsletters
WHERE newsletters.tech_id IN(
SELECT newsletter_id FROM user_contact WHERE user_contact.profile_id = a.tech_id
)
) < current_timestamp - 7 days
The structure is pretty straightforward. The users table has a foreign key of profile_id which is keyed to the user_profile.tech_id. The user_contact.user_id field is keyed to the user_profile.tech_id. And the user_contact table has a foreign key called user_contact.newsletter_id that is keyed to the newsletters.tech_id
RN is a row number generated based on the latest timestamp of the column. It should not be compared with dates in the final where clause. However, in your row_number function, you have ordered by timestamp(tech_id) which won't work as expected if tech_id is not a datetime datatype.
As per your requirement, row_number isn't needed.
Try the query below.
SELECT a.*, d.tech_id as newsletter
FROM users a
JOIN user_profile b ON b.tech_id = a.profile_id
JOIN user_contact c ON c.user_id = b.tech_id
JOIN NEWSLETTERS d ON d.tech_id = c.newsletter_id
WHERE (timestamp(d.datecolumn) < current_timestamp - 7 days)
--change the d.datecolumn to the datetime column in the table
It seems that you need to use GROUP BY. Try something like this:
SELECT a.*
FROM users a
JOIN (
SELECT b.tech_id, MAX(timestamp(d.datecol)) AS latest
FROM user_profile b
JOIN user_contact c ON c.user_id = b.tech_id
JOIN NEWSLETTERS d ON d.tech_id = c.newsletter_id
GROUP BY b.tech_id
HAVING MAX(timestamp(d.datecol)) < current_timestamp - 7 days
) e ON e.tech_id = a.profile_id
--change d.datecol to the datetime column in the newsletters table

Fetching latest item from a relative collection

I got two tables, User and UserActivity.
How do I write a SQL query which fetches each user and its latest activity? UserActivity.UserId references User.Id.
Might sound simple but I can't figure out how to get the latest entry from UserActivity for each user.
Try this:
Select u.*,
ua.*
from user u
join useractivity ua on ua.userid = u.id
join (select userid, max(useractivityid) as useractivityid from useractivity group by userid) um
on um.useractivityid = ua.useractivityid
Let's suppose that your tables are:
UserT( Id, name )
UserActivity( UserId, sessionNumber, activityTimeStamp)
And that by latest activity you mean the last moment at which the user had any activity.
In this case, the query is:
select
UserT.name,
max( activityTimeStamp ) as latestActivity
from
UserT left outer join
UserActivity UA on UA.UserId = UserT.Id
group by
UserT.Id, UserT.name
Yes, it is a simple query. The only complexity is grouping by user and getting the aggregated max time.
Regards, and sorry about the delay in answering. I have a little lag today ;)
If you want all columns of the activity, then use a CTE:
;with cte as (
select
UA.*,
ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY activityTimeStamp DESC) as RN
from
UserActivity UA )
select
UserT.*,
cte.*
from
UserT left outer join
cte on cte.RN = 1 and cte.UserId = UserT.Id
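The ROW_NUMBER approach can be tried with Python's sqlite3 module, assuming the bundled SQLite is version 3.25 or later (the first release with window functions). The table layout follows the UserT and UserActivity assumption above; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UserT (Id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE UserActivity (UserId INTEGER, sessionNumber INTEGER, activityTimeStamp TEXT);
    -- Hypothetical sample data; carol has no activity
    INSERT INTO UserT VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO UserActivity VALUES
        (1, 10, '2012-05-01 09:00:00'),
        (1, 11, '2012-05-02 09:00:00'),
        (2, 20, '2012-05-01 12:00:00');
""")

# RN = 1 marks each user's most recent activity row.
rows = conn.execute("""
    WITH cte AS (
        SELECT UA.*,
               ROW_NUMBER() OVER (PARTITION BY UserId
                                  ORDER BY activityTimeStamp DESC) AS RN
        FROM UserActivity UA
    )
    SELECT UserT.name, cte.sessionNumber, cte.activityTimeStamp
    FROM UserT
    LEFT OUTER JOIN cte ON cte.RN = 1 AND cte.UserId = UserT.Id
    ORDER BY UserT.Id
""").fetchall()

for row in rows:
    print(row)
```

Because of the left outer join, a user without any activity still appears, with NULL (None in Python) in the activity columns.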

Query for newest record in a table, store query as view

I'm trying to turn this query into a view:
SELECT t.*
FROM user t
JOIN (SELECT t.UserId,
MAX( t.creationDate ) 'max_date'
FROM user t
GROUP BY t.UserId) x ON x.UserId = t.UserId
AND x.max_date = t.creationDate
But views do not accept subqueries.
What this does is look for the latest, newest record of a user.
I got the idea from this other stackoverflow question
Is there a way to turn this into a query with joins, perhaps?
Create two views
Create View MaxCreationDate
As
SELECT t.userId, Max(t.CreationDate) MaxCreated
FROM user t
Group By t.UserId
Create View UserWithMaxDate
As
Select t.*, m.MaxCreated From user t
Join MaxCreationDate m
On m.UserId= t.UserId
and then just call the second one...
EDIT: Hey, based on the comment from Quassnoi and your inclusion of
where t.CreationDate = MaxDate in your original SQL, I wonder whether you want to see all rows for each distinct user, with the max creation date for that user in every row, or only one row per user, the one that was created most recently.
If the latter is the case, as @Quassnoi suggested in a comment, change the second view query as follows:
Create View UserWithMaxDate
As
Select t.*, m.MaxCreated From user t
Join MaxCreationDate m
On m.UserId= t.UserId
And m.MaxCreated = t.Creationdate
CREATE INDEX ix_user_userid_creationdate_id ON user (userid, creationdate, id);
CREATE VIEW v_duser AS
SELECT DISTINCT userId
FROM user;
CREATE VIEW v_lastuser AS
SELECT u.*
FROM v_duser ud
JOIN user u
ON u.id =
(
SELECT ui.id
FROM user ui
WHERE ui.userid = ud.userid
ORDER BY
ui.userid DESC, ui.creationdate DESC, ui.id DESC
LIMIT 1
);
This is fast and deals with possible duplicates on (userid, creationdate).
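To see the tie handling in action, here is a sketch on SQLite via Python's sqlite3 module. SQLite happens to allow subqueries in views, so it does not reproduce the original restriction; it only illustrates which row the ORDER BY ... LIMIT 1 picks. The sample rows are invented, and the ORDER BY omits userid because the WHERE clause already fixes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, userid INTEGER, creationdate TEXT);
    CREATE INDEX ix_user_userid_creationdate_id ON user (userid, creationdate, id);
    -- Hypothetical sample data; rows 2 and 3 tie on (userid, creationdate)
    INSERT INTO user VALUES
        (1, 100, '2009-07-01'),
        (2, 100, '2009-07-02'),
        (3, 100, '2009-07-02'),
        (4, 200, '2009-06-30');
    CREATE VIEW v_duser AS
        SELECT DISTINCT userid FROM user;
    CREATE VIEW v_lastuser AS
        SELECT u.*
        FROM v_duser ud
        JOIN user u
          ON u.id = (SELECT ui.id
                     FROM user ui
                     WHERE ui.userid = ud.userid
                     ORDER BY ui.creationdate DESC, ui.id DESC
                     LIMIT 1);
""")

rows = conn.execute(
    "SELECT id, userid, creationdate FROM v_lastuser ORDER BY userid"
).fetchall()

for row in rows:
    print(row)
```

For userid 100 the tie on creationdate is broken by the highest id, so exactly one row per user comes back.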