Get last instance of counted rows - sql

There are two tables jobs and users. users has a 1-to-many relation to jobs.
I want to grab the email of all users who have done 5 or more jobs.
The query below does that. However, how can I also retrieve the date of the last job done by each user?
So the desired output would be:
Email            jobs done   date of last job
jack#email.com   5+          1-20-2015
joe#email.com    5+          2-20-2015
Query that grabs all emails of users who have done 5+ jobs:
select
email
, case
when times_used >= 5
then '5+'
end as times_used
from
(
select
u.id
, u.email as email
, count(*) as times_used
from
jobs j
join users u on
j.user_id = u.id
group by
u.id
)
a
where
times_used >= 5
group by
times_used
, email

You could add a join for another derived table that pulls the last date for each user:
select
b.email,
case when times_used >= 5 then '5+' end as 'jobs done',
b.max_date 'date of last job'
from (
select u.id, count(*) as times_used
from jobs j
join users u on j.user_id = u.id
group by u.id
) a
join (
select u.id, u.email, max(j.date) max_date
from jobs j
join users u on j.user_id = u.id
group by u.id, email
) b on b.id = a.id
where times_used >= 5
But if you only want the email, number of jobs and date of the last job for all users that have 5+ jobs, then the query below should be enough:
select u.id, u.email, count(j.id) jobs_done, max(j.date) max_date
from jobs j
join users u on j.user_id = u.id
group by u.id, u.email
having count(j.id) >= 5
Both queries assume that the jobs table looks like id (pk), user_id, date so you have to adjust according to your actual table definition.
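To check the GROUP BY / HAVING version against sample data, here is a minimal sketch using Python's built-in sqlite3 module. The table layout (jobs with id, user_id, date) is the one assumed above, and the e-mail addresses and dates are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE jobs  (id INTEGER PRIMARY KEY, user_id INTEGER, date TEXT);
INSERT INTO users VALUES (1, 'jack@example.com'), (2, 'joe@example.com');
""")

# Hypothetical data: jack has 5 jobs, joe has only 2.
jobs = [(1, f"2015-01-{d:02d}") for d in (5, 10, 15, 18, 20)]
jobs += [(2, "2015-02-01"), (2, "2015-02-20")]
conn.executemany("INSERT INTO jobs (user_id, date) VALUES (?, ?)", jobs)

# One pass over the join: count jobs and take the latest date per user,
# then keep only the users with 5 or more jobs.
rows = conn.execute("""
    SELECT u.email, COUNT(j.id) AS jobs_done, MAX(j.date) AS last_job
    FROM jobs j
    JOIN users u ON j.user_id = u.id
    GROUP BY u.id, u.email
    HAVING COUNT(j.id) >= 5
""").fetchall()

print(rows)  # [('jack@example.com', 5, '2015-01-20')]
```

Only jack is returned, because joe's two jobs are filtered out by the HAVING clause after grouping.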

You could also try a window function approach, which may be more efficient depending on your database and indexes:
WITH user_jobs AS (
SELECT
u.id as user_id,
j.id as job_id,
u.email,
ROW_NUMBER() OVER (PARTITION BY u.id ORDER BY j.date DESC) as rn,
ROW_NUMBER() OVER (PARTITION BY u.id ORDER BY j.date) as job_number
FROM
jobs j
join users u ON j.user_id = u.id
)
SELECT
user_id,
job_id,
email,
job_number
FROM user_jobs
WHERE rn = 1 and job_number >= 5
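A variation on this idea, sketched with Python's sqlite3 module (window functions need SQLite 3.25 or newer). Note that this is not the exact query above: it swaps the second ROW_NUMBER() for COUNT(*) OVER, so the job total does not depend on how ties in date are ordered. The sample data and the jobs_done alias are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE jobs  (id INTEGER PRIMARY KEY, user_id INTEGER, date TEXT);
INSERT INTO users VALUES (1, 'jack@example.com'), (2, 'joe@example.com');
""")

# Hypothetical data: jobs 1-5 belong to jack, jobs 6-7 to joe.
jobs = [(1, f"2015-01-{d:02d}") for d in (5, 10, 15, 18, 20)]
jobs += [(2, "2015-02-01"), (2, "2015-02-20")]
conn.executemany("INSERT INTO jobs (user_id, date) VALUES (?, ?)", jobs)

rows = conn.execute("""
    WITH user_jobs AS (
        SELECT u.email,
               j.id AS job_id,
               j.date,
               -- rn = 1 marks each user's most recent job
               ROW_NUMBER() OVER (PARTITION BY u.id
                                  ORDER BY j.date DESC, j.id DESC) AS rn,
               -- total number of jobs for the user, repeated on every row
               COUNT(*) OVER (PARTITION BY u.id) AS jobs_done
        FROM jobs j
        JOIN users u ON j.user_id = u.id
    )
    SELECT email, jobs_done, date AS last_job, job_id
    FROM user_jobs
    WHERE rn = 1 AND jobs_done >= 5
""").fetchall()

print(rows)  # [('jack@example.com', 5, '2015-01-20', 5)]
```

Compared with GROUP BY, this keeps the whole row of the last job available (here its job_id), which is the main reason to reach for window functions.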

Related

Select only those users who have the most visits to provided district

I have a query that selects users with the districts which they visited and visits count.
select users.id, places.district, count(users.id) as counts from users
left join visits on users.id = visits.user_id
inner join places on visits.place_id = places.id
group by users.id, places.district
I need to select only those users who have visited the provided district the most. For example, I have a user with id 1 who visited district A one time and district B three times. If I provide district B as the parameter, user 1 will be in the result. If I want to select users for district A, user 1 will not be in the result.
I think that's ranking, then filtering:
select *
from (
select u.id, p.district, count(*) as cnt_visits,
rank() over(partition by u.id order by count(*) desc) as rn
from users u
inner join visits v on u.id = v.user_id
inner join places p on p.id = v.place_id
group by u.id, p.district
) t
where rn = 1 and district = ?
Note that you don't actually need table users to get this result. We could simplify the query as:
select *
from (
select v.user_id, p.district, count(*) as cnt_visits,
rank() over(partition by v.user_id order by count(*) desc) as rn
from visits v
inner join places p on p.id = v.place_id
group by v.user_id, p.district
) t
where rn = 1 and district = ?
This query handles top ties: if a user had the same, maximum number of visits in two different districts, both are taken into account. If you don't need that feature, then we can simplify the subquery with distinct on:
select *
from (
select distinct on (v.user_id) v.user_id, p.district, count(*) as cnt_visits
from visits v
inner join places p on p.id = v.place_id
group by v.user_id, p.district
order by v.user_id, cnt_visits desc
) t
where district = ?
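The rank-then-filter idea can be tried out with a small sketch in Python's sqlite3 module (SQLite 3.25+ for window functions). SQLite has no DISTINCT ON, so only the rank() variant is shown, and the counting is moved into a CTE for readability. The visit data is invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE places (id INTEGER PRIMARY KEY, district TEXT);
CREATE TABLE visits (user_id INTEGER, place_id INTEGER);
INSERT INTO places VALUES (1, 'A'), (2, 'B');
""")

# Hypothetical visits: user 1 -> A once, B three times;
# user 2 -> A twice; user 3 -> A once and B once (a tie).
visits = [(1, 1), (1, 2), (1, 2), (1, 2), (2, 1), (2, 1), (3, 1), (3, 2)]
conn.executemany("INSERT INTO visits VALUES (?, ?)", visits)

query = """
    WITH counts AS (
        SELECT v.user_id, p.district, COUNT(*) AS cnt_visits
        FROM visits v
        JOIN places p ON p.id = v.place_id
        GROUP BY v.user_id, p.district
    ), ranked AS (
        SELECT user_id, district, cnt_visits,
               RANK() OVER (PARTITION BY user_id
                            ORDER BY cnt_visits DESC) AS rn
        FROM counts
    )
    SELECT user_id, district, cnt_visits
    FROM ranked
    WHERE rn = 1 AND district = ?
    ORDER BY user_id
"""

top_b = conn.execute(query, ("B",)).fetchall()
top_a = conn.execute(query, ("A",)).fetchall()

print(top_b)  # [(1, 'B', 3), (3, 'B', 1)]
print(top_a)  # [(2, 'A', 2), (3, 'A', 1)]
```

User 3 shows up for both districts because rank() keeps ties, which is the behaviour described above.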

SQL combining 3 tables and get just the row with the latest date

I have 3 tables: user, session and log. The user table stores all user-relevant information, while the session table just connects the user with the log. I want to get a list of all users with their latest log entry. The table design looks like this:
user (id, name, ...)
session (id, user_id)
log (id, session_id, time, type, ...)
My current query looks like this
SELECT *
FROM USER AS u
INNER JOIN session AS s
ON u.id = s.user_id
INNER JOIN log AS l
ON l.session_id = s.id
ORDER BY l.time DESC
But it's not hard to imagine that this just returns the data of all 3 tables sorted by date. How do I get every user just once, with the data from their latest log entry, ordered by the time of the log (desc)?
Thanks in advance for your help.
You can use DISTINCT ON in conjunction with ORDER BY to get the latest row per user by log date. This will allow you to select the additional fields you need:
SELECT DISTINCT ON (u.id)
u.id,
u.Name,
l.type,
l.time
FROM "user" AS u
INNER JOIN session AS s ON u.id = s.user_id
INNER JOIN log AS l ON l.session_id = s.id
ORDER BY u.id, l.time DESC;
N.B. I don't know exactly what columns you need, but I have added a couple in to demonstrate as I don't like to advocate the use of SELECT *
For completeness there are a couple of other ways to achieve this, the first is to select the max in a subquery and join back to the outer query on both user_id and time:
SELECT u.id,
u.Name,
l.type,
l.time
FROM "user" AS u
INNER JOIN session AS s
ON u.id = s.user_id
INNER JOIN log AS l
ON l.session_id = s.id
INNER JOIN
( SELECT s.user_id, MAX(l.time) AS time
FROM session AS s
INNER JOIN log AS l
ON l.session_id = s.id
GROUP BY s.user_id
) AS MaxLog
ON MaxLog.user_id = u.id
AND MaxLog.time = l.time
ORDER BY l.time DESC;
Or you can use ROW_NUMBER():
SELECT id, Name, type, time
FROM ( SELECT u.id,
u.Name,
l.type,
l.time,
ROW_NUMBER() OVER(PARTITION BY u.id ORDER BY l.time DESC) AS RowNumber
FROM "user" AS u
INNER JOIN session AS s
ON u.id = s.user_id
INNER JOIN log AS l
ON l.session_id = s.id
) u
WHERE RowNumber = 1;
I've assumed some schema (user.user_name?), but you can do this by grouping and an aggregate like Max:
SELECT u.id,
u.user_name,
Max(l.time) AS LastLogTime
FROM "user" AS u
LEFT JOIN session AS s
ON u.id = s.user_id
LEFT JOIN log AS l
ON l.session_id = s.id
GROUP BY u.id,
u.user_name;
You won't be able to select * as we need to use GROUP BY
Similarly, ORDER BY l.time isn't applicable any more - you could still order by e.g. user_name
I've also LEFT JOINED - this way, if the user has no sessions, it will still return a record for the user, possibly with a LastLogTime of NULL.
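To see the LEFT JOIN behaviour on concrete rows, here is a small sketch using Python's sqlite3 module. It assumes the schema from the question (user.id, user.name) and made-up log rows, and it uses LEFT JOIN for both session and log so that a user without any sessions survives the joins:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "user"  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE session (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE log     (id INTEGER PRIMARY KEY, session_id INTEGER,
                      time TEXT, type TEXT);
INSERT INTO "user" VALUES (1, 'ann'), (2, 'bob'), (3, 'cid');
INSERT INTO session VALUES (10, 1), (11, 1), (20, 2);  -- cid has no session
INSERT INTO log (session_id, time, type) VALUES
    (10, '2015-01-01 09:00', 'login'),
    (11, '2015-01-03 12:00', 'logout'),
    (20, '2015-01-02 08:00', 'login');
""")

# MAX() ignores NULLs, so a user with no log rows gets NULL (None in Python).
last_log = conn.execute("""
    SELECT u.id, u.name, MAX(l.time) AS last_log_time
    FROM "user" AS u
    LEFT JOIN session AS s ON u.id = s.user_id
    LEFT JOIN log AS l ON l.session_id = s.id
    GROUP BY u.id, u.name
    ORDER BY u.id
""").fetchall()

print(last_log)
# [(1, 'ann', '2015-01-03 12:00'), (2, 'bob', '2015-01-02 08:00'), (3, 'cid', None)]
```

If the join to log were an INNER JOIN, the row for cid would disappear, because the NULL session id cannot match any log row.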

Writing a Mathematical Formula in SQL?

I have these tables: users, comments, ratings, and items
I would like to know if it is possible to write SQL query that basically does this:
user_id is in each table. I'd like a SQL query to count each occurrence in each table (except users of course). BUT, I want some tables to carry more weight than the others. Then I want to tally up a "score".
Here is an example:
user_id 5 occurs...
2 times in items;
5 times in comments;
11 times in ratings.
I want a formula/point system that totals something like this:
items 2 x 5 = 10;
comments 5 x 1 = 5;
ratings 11 x .5 = 5.5
TOTAL 21.5
This is what I have so far.....
SELECT u.users
COUNT(*) r.user_id
COUNT(*) c.user_id
COUNT(*) i.user_id
FROM users as u
JOIN COMMENTS as c
ON u.user_id = c_user_id
JOIN RATINGS as r
ON r.user_id = u.user_id
JOIN ITEMS as i
i.user_id = u.user_id
WHERE
????
GROUP BY u.user_id
ORDER by total DESC
I am not sure how to do the mathematical formula portion (if possible). Or how to tally up a total.
Final Code based on John Woo's Answer!
$sql = mysql_query("
SELECT u.username,
(a.totalCount * 5) +
(b.totalCount) +
(c.totalCount * .2) totalScore
FROM users u
LEFT JOIN
(
SELECT user_id, COUNT(user_id) totalCount
FROM items
GROUP BY user_id
) a ON a.user_id= u.user_id
LEFT JOIN
(
SELECT user_id, COUNT(user_id) totalCount
FROM comments
GROUP BY user_id
) b ON b.user_id= u.user_id
LEFT JOIN
(
SELECT user_id, COUNT(user_id) totalCount
FROM ratings
GROUP BY user_id
) c ON c.user_id = u.user_id
ORDER BY totalScore DESC LIMIT 10;");
Maybe this can help you,
SELECT u.user_ID,
(COALESCE(a.totalCount, 0) * 5) +
COALESCE(b.totalCount, 0) +
(COALESCE(c.totalCount, 0) * .2) totalScore
FROM users u LEFT JOIN
(
SELECT user_ID, COUNT(user_ID) totalCount
FROM items
GROUP BY user_ID
) a ON a.user_ID = u.user_ID
LEFT JOIN
(
SELECT user_ID, COUNT(user_ID) totalCount
FROM comments
GROUP BY user_ID
) b ON b.user_ID = u.user_ID
LEFT JOIN
(
SELECT user_ID, COUNT(user_ID) totalCount
FROM ratings
GROUP BY user_ID
) c ON c.user_ID = u.user_ID
ORDER BY totalScore DESC
But based on your query above, this may also work, assuming each table has an id primary key so that rows duplicated by the joins are counted only once:
SELECT u.user_id,
(COUNT(DISTINCT r.id) * .5) +
COUNT(DISTINCT c.id) +
(COUNT(DISTINCT i.id) * 5) totalScore
FROM users as u
LEFT JOIN COMMENTS as c
ON u.user_id = c.user_id
LEFT JOIN RATINGS as r
ON r.user_id = u.user_id
LEFT JOIN ITEMS as i
ON i.user_id = u.user_id
GROUP BY u.user_id
ORDER by totalScore DESC
The only difference is the use of LEFT JOIN. You should not use INNER JOIN in this situation because a user_id is not guaranteed to exist in every table.
Hope this makes sense
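A quick way to see why the join type matters is a sketch like the following, written with Python's sqlite3 module and a cut-down, hypothetical schema (only users and items). With an inner join the user without items vanishes; with a LEFT JOIN the user stays, but the count is NULL unless it is wrapped in COALESCE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY);
CREATE TABLE items (user_id INTEGER);
INSERT INTO users VALUES (1), (2);
INSERT INTO items VALUES (1), (1);   -- user 2 has no items
""")

# Per-user item counts, reused as a derived table in both queries.
counts_sql = "SELECT user_id, COUNT(*) AS totalCount FROM items GROUP BY user_id"

inner_rows = conn.execute(f"""
    SELECT u.user_id, a.totalCount * 5
    FROM users u
    JOIN ({counts_sql}) a ON a.user_id = u.user_id
    ORDER BY u.user_id
""").fetchall()

left_rows = conn.execute(f"""
    SELECT u.user_id,
           a.totalCount * 5,               -- NULL when there is no match
           COALESCE(a.totalCount, 0) * 5   -- treats a missing count as 0
    FROM users u
    LEFT JOIN ({counts_sql}) a ON a.user_id = u.user_id
    ORDER BY u.user_id
""").fetchall()

print(inner_rows)  # [(1, 10)]
print(left_rows)   # [(1, 10, 10), (2, None, 0)]
```

Since NULL plus anything is NULL, a single missing table would otherwise turn the whole weighted total into NULL.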
Here's an alternative approach:
SELECT
u.user_id,
SUM(s.weight) AS totalScore
FROM users u
LEFT JOIN (
SELECT user_id, 5.0 AS weight
FROM items
UNION ALL
SELECT user_id, 1.0
FROM comments
UNION ALL
SELECT user_id, 0.5
FROM ratings
) s
ON u.user_id = s.user_id
GROUP BY
u.user_id
I.e. for every occurrence of every user in every table, a row with a specific weight is produced. The UNIONed set of weights is then joined to the users table for subsequent grouping and aggregating.

postgresql database

I want to write a query that selects users that have the same username and the same hour of creation date, using a PostgreSQL database.
Something like this should do the trick. This will return any user/hour pair along with the count (untested):
select users.username, date_part('hour', users.created_at), count(*) from users
group by users.username, date_part('hour', users.created_at) having count(*) > 1
select u.*
from users u
join (
select username, date_trunc('hour', creation_timestamp) as created_hour
from users
group by 1, 2
having count(*) > 1
) as x on u.username = x.username
and date_trunc('hour', u.creation_timestamp) = x.created_hour
order by u.username;
Should work nicely.
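The same find-the-duplicate-groups-then-join-back pattern can be tried locally with Python's sqlite3 module. Note the swap: SQLite has no date_trunc, so this sketch uses strftime('%Y-%m-%d %H', ...) as a stand-in for truncating to the hour. The created_at column name and the sample rows are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, created_at TEXT);
INSERT INTO users VALUES
    (1, 'bob', '2020-01-01 10:05:00'),
    (2, 'bob', '2020-01-01 10:40:00'),   -- same name, same hour as row 1
    (3, 'bob', '2020-01-01 11:00:00'),   -- same name, different hour
    (4, 'amy', '2020-01-01 10:10:00');   -- same hour, different name
""")

# The subquery finds (username, hour) pairs that occur more than once;
# the join then returns the full user rows belonging to those pairs.
dupes = conn.execute("""
    SELECT u.id, u.username, u.created_at
    FROM users u
    JOIN (
        SELECT username,
               strftime('%Y-%m-%d %H', created_at) AS created_hour
        FROM users
        GROUP BY username, strftime('%Y-%m-%d %H', created_at)
        HAVING COUNT(*) > 1
    ) AS x
      ON u.username = x.username
     AND strftime('%Y-%m-%d %H', u.created_at) = x.created_hour
    ORDER BY u.id
""").fetchall()

print(dupes)
# [(1, 'bob', '2020-01-01 10:05:00'), (2, 'bob', '2020-01-01 10:40:00')]
```

Joining on both the username and the hour is what keeps row 3 out of the result, even though it shares the username.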

How to select all users who made more than 10 submissions

I have a submission table that is very simple: userId, submissionGuid
I want to select the username (simple inner join to get it) of all the users who have more than 10 submissions in the table.
I would do this with embedded queries and a group by to count submissions... but is there a better way of doing it (without embedded queries)?
Thanks!
This is the simplest way, I believe:
select userId
from submission
group by userId
having count(submissionGuid) > 10
select userId, count(*)
from submissions
group by userId
having count(*) > 10
SELECT
username
FROM
usertable
JOIN submissions
ON usertable.userid = submissions.userid
GROUP BY
usertable.username
HAVING
Count(*) > 10
*Assuming that your "Users" table is called usertable and that it has a column called "UserName"
I think the correct query is this (SQL Server):
SELECT s.userId, u.userName
FROM submission s INNER JOIN users u on u.userId = s.userId
GROUP BY s.userId, u.username
HAVING COUNT(submissionGuid) > 10
If you would rather not use the HAVING clause:
SELECT u.userId, u.userName
FROM users u INNER JOIN (
SELECT userId, COUNT(submissionGuid) AS cnt
FROM submission
GROUP BY userId ) sc ON sc.userId = u.userId
WHERE sc.cnt > 10
select userid, count(submissionGUID) as submitCount
from Submissions
group by userid
having count(submissionGUID) > 10
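For anyone who wants to try the join plus GROUP BY / HAVING version end to end, here is a minimal sketch using Python's sqlite3 module. The users table (userId, userName) is an assumption, as in the answers above, and the GUIDs are randomly generated test data:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users      (userId INTEGER PRIMARY KEY, userName TEXT);
CREATE TABLE submission (userId INTEGER, submissionGuid TEXT);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
""")

# Hypothetical data: alice has 11 submissions, bob has exactly 10.
submissions = [(1, str(uuid.uuid4())) for _ in range(11)]
submissions += [(2, str(uuid.uuid4())) for _ in range(10)]
conn.executemany("INSERT INTO submission VALUES (?, ?)", submissions)

# Group per user, then filter the groups; no nested query is needed.
heavy = conn.execute("""
    SELECT u.userName, COUNT(s.submissionGuid) AS submitCount
    FROM submission s
    JOIN users u ON u.userId = s.userId
    GROUP BY u.userId, u.userName
    HAVING COUNT(s.submissionGuid) > 10
""").fetchall()

print(heavy)  # [('alice', 11)]
```

bob is excluded because the condition is strictly "more than 10". Repeating the aggregate in HAVING, rather than its alias, keeps the query portable across databases.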