Joining a subquery with multiple associated rows in subquery - sql

I have a query that selects a set of Users. Each User can have a number
of Events associated with it. I want to join each User with the earliest Event
associated with that User (resulting in one row per User), and do so within a single query.
So, I kind of want to do this:
SELECT * FROM users
left join (
select * from events where events.user_id = users.id
order by start_time limit 1) as event
ON ("event"."user_id" = "users"."id")
but it is illegal to reference 'users' within the join's select.

You can use a subquery to get the min(start_time) for each user_id. Then you will use this result to join back to the events table to get the details of the min event:
SELECT *
FROM users u
LEFT JOIN
(
SELECT Min(start_time) Min_Start, user_id
FROM events
GROUP BY user_id
) e1
ON u.id = e1.user_id
LEFT JOIN events e2
ON e1.user_id = e2.user_id
AND e1.min_start = e2.start_time
If you are using a database that has the ability to apply a row_number(), then you could use the following:
select *
from
(
SELECT *,
row_number() over(partition by e.user_id order by start_time) rn
FROM users u
LEFT JOIN events e
ON u.id = e.user_id
) src
where rn = 1
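One difference between the two queries above is easy to miss: if a user has several events sharing the earliest start_time, the MIN(start_time) join-back returns all of them, while row_number() keeps exactly one. The following is a minimal sketch of that behaviour using Python's sqlite3 module; the sample rows are made up, and the extra e.id tie-breaker in the ORDER BY is my assumption, not part of the answer.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY);
    CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, start_time TEXT);
    INSERT INTO users  VALUES (1), (2);
    -- User 1 has two events that share the earliest start_time.
    INSERT INTO events VALUES (10, 1, '2020-01-01'), (11, 1, '2020-01-01'),
                              (12, 1, '2020-02-01'), (13, 2, '2020-03-01');
""")

# Variant 1: join back on MIN(start_time); ties produce more than one row per user.
join_back = conn.execute("""
    SELECT u.id, e2.id
    FROM users u
    LEFT JOIN (SELECT user_id, min(start_time) AS min_start
               FROM events
               GROUP BY user_id) e1 ON u.id = e1.user_id
    LEFT JOIN events e2 ON e1.user_id = e2.user_id
                       AND e1.min_start = e2.start_time
    ORDER BY u.id, e2.id
""").fetchall()

# Variant 2: row_number() with an explicit tie-breaker keeps exactly one row per user.
numbered = conn.execute("""
    SELECT user_id, event_id
    FROM (
        SELECT u.id AS user_id, e.id AS event_id,
               row_number() OVER (PARTITION BY u.id
                                  ORDER BY e.start_time, e.id) AS rn
        FROM users u
        LEFT JOIN events e ON u.id = e.user_id
    ) src
    WHERE rn = 1
    ORDER BY user_id
""").fetchall()

print(join_back)  # user 1 appears twice
print(numbered)   # one row per user
```

Which behaviour is wanted depends on whether ties should be shown or collapsed.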

In most databases, you can use row_number() for this:
SELECT *
FROM users u left join
(select e.*, row_number() over (partition by e.user_id order by start_time) seqnum
from events e
) e
on e.user_id = u.id and e.seqnum = 1
MySQL before version 8.0 and MS Access do not support this function, but most other databases do (and you do not specify what database you are using).
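For this pattern to return one row per user, the row number has to be filtered to 1, and placing that filter in the ON clause (rather than in WHERE) keeps users that have no events at all. A small runnable sketch with Python's sqlite3 module, assuming made-up sample rows and a hypothetical name column:

```python
import sqlite3

# Hypothetical sample data; the "name" column is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, start_time TEXT);
    INSERT INTO users  VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO events VALUES (10, 1, '2020-01-05'), (11, 1, '2020-01-02'),
                              (12, 2, '2020-03-01');
""")

# Number each user's events by start_time, then join only the first one.
# Because rn = 1 is part of the ON clause, user 3 (no events) is still returned.
earliest = conn.execute("""
    SELECT u.id, u.name, e.id AS event_id, e.start_time
    FROM users u
    LEFT JOIN (
        SELECT ev.*,
               row_number() OVER (PARTITION BY ev.user_id
                                  ORDER BY ev.start_time) AS rn
        FROM events ev
    ) e ON e.user_id = u.id AND e.rn = 1
    ORDER BY u.id
""").fetchall()

print(earliest)
```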

Related

Issue with getting the rank of a user based on combined columns in a join table

I have a users table and each user has flights in a flights table. Each flight has a departure and an arrival airport relationship within an airports table. What I need to do is count up the unique airports across both departure and arrival columns (flights.departure_airport_id and flights.arrival_airport_id) for each user, and then assign them a rank via dense_rank and then retrieve the rank for a given user id.
Basically, I need to order all users according to how many unique airports they have flown to or from and then get the rank for a certain user.
Here's what I have so far:
SELECT u.rank FROM (
SELECT
users.id,
dense_rank () OVER (ORDER BY count(DISTINCT (flights.departure_airport_id, flights.arrival_airport_id)) DESC) AS rank
FROM users
LEFT JOIN flights ON users.id = flights.user_id
GROUP BY users.id
) AS u WHERE u.id = 'uuid';
This works, but does not actually return the desired result as count(DISTINCT (flights.departure_airport_id, flights.arrival_airport_id)) counts the combined airport ids and not each unique airport id separately. That's how I understand it works, anyway... I'm guessing that I somehow need to use a UNION join on the airport id columns but can't figure out how to do that.
I'm on Postgres 13.0.
I would recommend a lateral join to unpivot, then aggregation and ranking:
select *
from (
select f.user_id,
dense_rank() over(order by count(distinct a.airport_id) desc) rn
from flights f
cross join lateral (values
(f.departure_airport_id), (f.arrival_airport_id)
) a(airport_id)
group by f.user_id
) t
where user_id = 'uuid'
You don't really need the users table for what you want, unless you do want to allow users without any flight (they would all share the same last-place rank, since their airport count is 0). If so:
select *
from (
select u.id,
dense_rank() over(order by count(distinct a.airport_id) desc) rn
from users u
left join flights f on f.user_id = u.id
left join lateral (values
(f.departure_airport_id), (f.arrival_airport_id)
) a(airport_id) on true
group by u.id
) t
where id = 'uuid'
You're counting the distinct pairs of (departure_airport_id, arrival_airport_id). As you suggested, you could use union to get a single column of airport IDs (regardless of whether they are departure or arrival airports), and then apply a count on them:
SELECT user_id, DENSE_RANK() OVER (ORDER BY cnt DESC) AS user_rank
FROM (SELECT u.id AS user_id, COALESCE(cnt, 0) AS cnt
FROM users u
LEFT JOIN (SELECT user_id, COUNT(DISTINCT airport_id) AS cnt
FROM (SELECT user_id, departure_airport_id AS airport_id
FROM flights
UNION
SELECT user_id, arrival_airport_id AS airport_id
FROM flights) x
GROUP BY user_id) f ON u.id = f.user_id) t
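To see why counting pairs and counting airports can rank users differently, here is a small sketch of the UNION approach run through Python's sqlite3 module. The sample rows are invented: user 'a' and user 'b' both have two distinct (departure, arrival) pairs, but 'b' touches three airports and 'a' only two.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users   (id TEXT PRIMARY KEY);
    CREATE TABLE flights (user_id TEXT,
                          departure_airport_id INTEGER,
                          arrival_airport_id INTEGER);
    INSERT INTO users   VALUES ('a'), ('b'), ('c');
    INSERT INTO flights VALUES ('a', 1, 2), ('a', 2, 1),
                               ('b', 3, 4), ('b', 4, 5);
""")

# Unpivot both airport columns into one with UNION, count distinct airports
# per user (0 for users without flights), then rank on that count.
ranks = conn.execute("""
    SELECT user_id, cnt, dense_rank() OVER (ORDER BY cnt DESC) AS user_rank
    FROM (
        SELECT u.id AS user_id, count(DISTINCT x.airport_id) AS cnt
        FROM users u
        LEFT JOIN (
            SELECT user_id, departure_airport_id AS airport_id FROM flights
            UNION
            SELECT user_id, arrival_airport_id FROM flights
        ) x ON x.user_id = u.id
        GROUP BY u.id
    ) t
    ORDER BY user_rank, user_id
""").fetchall()

print(ranks)
```

To get the rank for one user, wrap this in another subquery and filter on user_id there; filtering before the window function runs would rank that user against nobody.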

SQL conditional field, first match JOIN

Let's imagine I have two tables:
user
--userid
--fname
--lname
widget
--id
--userid
--value
user.userid = widget.userid
I want to see the full list of users with the widget.value if they have one, AND(!) only the first match if there is more than one widget. No widget = null field.
id fname lname value
1 John Doe X8
I cannot use a simple join, because a user with no widget.value would not be displayed.
CROSS APPLY doesn't work either.
I need
1 widget = value
2 widgets = first one
0 widgets = null field
using top with ties:
select top 1 with ties
u.*, w.id, w.value
from dbo.user u
left join dbo.widget w
on u.userid = w.userid
order by row_number() over (partition by u.userid order by w.id);
using common table expression with row_number()
;with cte as (
select u.*, w.id, w.value
, rn = row_number() over (partition by u.userid order by w.id)
from dbo.user u
left join dbo.widget w
on u.userid = w.userid
)
select *
from cte
where rn = 1;
outer apply should do what you want:
select u.*, w.value
from user u outer apply
(select top 1 w.*
from widget w
where w.userid = u.userid
order by id -- or however you define the first one
) w;
Try this:
SELECT
u.userid, u.fname, u.lname, w.value
FROM user as u
LEFT JOIN
(
SELECT w1.*
FROM widget as w1
INNER JOIN
(
SELECT userid, MAX(id) AS LatestId
FROM widget
GROUP BY userid
) AS w2 ON w1.userid = w2.userid and w1.id = w2.latestid
) AS w ON u.userid = w.userid;
The inner join with the subquery using MAX and GROUP BY gives you the latest row for each userid, if any. So for users with more than one row you will get the latest one.
There is no date column, so I assumed the max id is the latest one, which might not always be the case.
LEFT JOIN keeps the users that have no matching rows in the widget table, so a user with no widget gets a null value.
SELECT u.userid, w.value
FROM user u
OUTER APPLY (
SELECT TOP 1 w.value
FROM widget w
WHERE w.userid = u.userid
ORDER BY w.id --order by whatever makes a widget the first one
) w
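For engines without TOP or APPLY, one portable alternative (not one of the answers above, just a sketch) is to LEFT JOIN on a correlated MIN(id) subquery. This assumes, like the answers, that the lowest widget id counts as the "first" one. The example runs on Python's sqlite3 module with made-up rows; the table name is quoted because user is a reserved word in several engines.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (userid INTEGER PRIMARY KEY, fname TEXT, lname TEXT);
    CREATE TABLE widget (id INTEGER PRIMARY KEY, userid INTEGER, value TEXT);
    INSERT INTO "user" VALUES (1, 'John', 'Doe'), (2, 'Jane', 'Roe'), (3, 'Sam', 'Poe');
    INSERT INTO widget VALUES (1, 1, 'X8'), (2, 2, 'A1'), (3, 2, 'B2');
""")

# Join each user to the single widget whose id is the smallest for that user.
# Users without widgets get NULL because the subquery yields NULL for them.
first_widget = conn.execute("""
    SELECT u.userid, u.fname, u.lname, w.value
    FROM "user" u
    LEFT JOIN widget w
           ON w.id = (SELECT min(w2.id)
                      FROM widget w2
                      WHERE w2.userid = u.userid)
    ORDER BY u.userid
""").fetchall()

print(first_widget)
```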

How to get latest DETAIL entry against the MASTER entry?

I have 2 tables
1. User Master
user_id, user_full_name, user_dob...so on
2. Login Details
login_id, login_user_id, login_time, login_date, logout_time
Problem
The 2nd table has any number of rows for each User Master id.
I need to join the two tables, but the result should show only the last login data for each user.
example
user_full_name, user_login, user_logout so on...
If you want the result for a single user, you could use a simple INNER JOIN combined with an ORDER BY and TOP 1:
SELECT TOP 1 user_full_name, login_time, login_date, logout_time
FROM Users INNER JOIN Logins ON
Users.user_id = Logins.user_id
WHERE
Users.user_id = @user_id
ORDER BY login_date DESC, login_time DESC
(See SQLFiddle)
If you want the result for all users, you could use CROSS APPLY:
SELECT user_full_name, l.*
FROM Users u CROSS APPLY (
SELECT TOP 1 login_time, login_date, logout_time
FROM Logins
WHERE
u.user_id = Logins.user_id
ORDER BY login_date DESC, login_time DESC
) l
(See SQLFiddle)
A common solution for this problem is to use the row_number window function and filter for rows with row number 1 in each partition (by user, ordered by date/time):
WITH UserDetails AS (
SELECT
*
, ROW_NUMBER() OVER (PARTITION BY login_user_id
ORDER BY login_date DESC, login_time DESC) AS RN
FROM LoginDetails
)
SELECT *
FROM UserMaster M
JOIN UserDetails D ON M.user_id = D.login_user_id
WHERE D.RN = 1;
You could try using a TOP 1 inside the JOIN clause:
SELECT a.user_id, a.user_full_name, b.login_id...
FROM UserMaster a INNER JOIN Logins b ON b.login_user_id = a.user_id AND b.login_date =
(
SELECT TOP 1 login_date
FROM Logins
WHERE login_user_id = a.user_id
ORDER BY login_date DESC
)
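When joining on the result of a "top 1" subquery, it is safer to match on a unique key than on the date alone, because two users (or two logins of one user) can share a date. A minimal sketch with Python's sqlite3 module, where LIMIT 1 stands in for TOP 1; the table names UserMaster and LoginDetails and all sample rows are assumptions based on the question's description.

```python
import sqlite3

# Hypothetical sample data; column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE UserMaster   (user_id INTEGER PRIMARY KEY, user_full_name TEXT);
    CREATE TABLE LoginDetails (login_id INTEGER PRIMARY KEY, login_user_id INTEGER,
                               login_time TEXT, login_date TEXT, logout_time TEXT);
    INSERT INTO UserMaster   VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO LoginDetails VALUES (1, 1, '09:00', '2021-01-01', '10:00'),
                                    (2, 1, '08:00', '2021-01-02', '09:30'),
                                    (3, 1, '17:00', '2021-01-02', '18:00'),
                                    (4, 2, '12:00', '2021-01-02', '12:30');
""")

# For each user, pick the login_id of the most recent login and join on it.
# Matching on login_id (unique) avoids duplicates when dates coincide.
last_logins = conn.execute("""
    SELECT m.user_full_name, d.login_date, d.login_time, d.logout_time
    FROM UserMaster m
    JOIN LoginDetails d
      ON d.login_id = (SELECT d2.login_id
                       FROM LoginDetails d2
                       WHERE d2.login_user_id = m.user_id
                       ORDER BY d2.login_date DESC, d2.login_time DESC
                       LIMIT 1)
    ORDER BY m.user_id
""").fetchall()

print(last_logins)
```

Because this is an inner join, users who never logged in (Cy here) are left out; switch to LEFT JOIN to keep them.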

Select all threads and order by the latest one

Now that my earlier question ("Select all forums and get latest post too.. how?") has been answered, I am trying to write a query to select all threads in one particular forum and order them by the date of the latest post (column "updated_at").
This is my structure again:
forums forum_threads forum_posts
---------- ------------- -----------
id id id
parent_forum (NULLABLE) forum_id content
name user_id thread_id
description title user_id
icon views updated_at
created_at created_at
updated_at
last_post_id (NULLABLE)
I tried writing this query, and it works.. but not as expected: It doesn't order the threads by their last post date:
SELECT DISTINCT ON(t.id) t.id, u.username, p.updated_at, t.title
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
ORDER BY t.id, p.updated_at DESC;
How can I solve this one?
I assume you want a single row per thread, not one row for every post.
DISTINCT ON is still the most convenient tool. But the leading ORDER BY items have to match the expressions of the DISTINCT ON clause. If you want to order the result some other way, you need to wrap it into a subquery and add another ORDER BY to the outer query:
SELECT *
FROM (
SELECT DISTINCT ON (t.id)
t.id, u.username, p.updated_at, t.title
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
ORDER BY t.id, p.updated_at DESC
) sub
ORDER BY updated_at DESC;
If you are looking for a query without subquery for some unknown reason, this should work, too:
SELECT DISTINCT
t.id
, first_value(u.username) OVER w AS username
, first_value(p.updated_at) OVER w AS updated_at
, t.title
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
WINDOW w AS (PARTITION BY t.id ORDER BY p.updated_at DESC)
ORDER BY updated_at DESC;
There is quite a bit going on here:
1. The tables are joined and rows are selected according to JOIN and WHERE clauses.
2. The two instances of the window function first_value() are run (on the same window definition) to retrieve username and updated_at from the latest post per thread. This results in as many identical rows as there are posts in the thread.
3. The DISTINCT step is executed after the window functions and reduces each set to a single instance.
4. ORDER BY is applied last and updated_at references the OUT column (SELECT list), not one of the two IN columns (FROM list) of the same name.
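The order of evaluation described above can be checked on a tiny data set. This sketch uses Python's sqlite3 module (which also supports first_value() and the WINDOW clause) rather than Postgres; the schema is trimmed to the columns the query touches and the rows are invented. Thread 1 has two posts, so the window step yields two identical rows that DISTINCT then collapses.

```python
import sqlite3

# Hypothetical, trimmed-down schema and sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users         (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE forum_threads (id INTEGER PRIMARY KEY, forum_id INTEGER, title TEXT);
    CREATE TABLE forum_posts   (id INTEGER PRIMARY KEY, thread_id INTEGER,
                                user_id INTEGER, updated_at TEXT);
    INSERT INTO users         VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO forum_threads VALUES (1, 3, 'first'), (2, 3, 'second'), (3, 4, 'other');
    INSERT INTO forum_posts   VALUES (1, 1, 1, '2021-01-01'), (2, 1, 2, '2021-01-05'),
                                     (3, 2, 1, '2021-01-03'), (4, 3, 2, '2021-01-09');
""")

# Join and filter, compute first_value() per thread, collapse duplicates
# with DISTINCT, and finally sort on the output column updated_at.
threads = conn.execute("""
    SELECT DISTINCT
           t.id,
           first_value(u.username)   OVER w AS username,
           first_value(p.updated_at) OVER w AS updated_at,
           t.title
    FROM forum_threads t
    LEFT JOIN forum_posts p ON p.thread_id = t.id
    LEFT JOIN users u ON u.id = p.user_id
    WHERE t.forum_id = 3
    WINDOW w AS (PARTITION BY t.id ORDER BY p.updated_at DESC)
    ORDER BY updated_at DESC
""").fetchall()

print(threads)
```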
Yet another variant, a subquery with the window function row_number():
SELECT id, username, updated_at, title
FROM (
SELECT t.id
, u.username
, p.updated_at
, t.title
, row_number() OVER (PARTITION BY t.id
ORDER BY p.updated_at DESC) AS rn
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
) sub
WHERE rn = 1
ORDER BY updated_at DESC;
Similar case:
Return records distinct on one column but order by another column
You'll have to test which is faster. Depends on a couple of circumstances.
Forget the distinct on:
SELECT t.id, u.username, p.updated_at, t.title
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
ORDER BY p.updated_at DESC;

SQL combining 3 tables and get just the row with the latest date

I have 3 tables: user, session and log. The user table stores all user-relevant information, while the session table just connects the user with the log. I want to get a list of all users with their latest log entry. The table design looks like this:
user (id, name, ...)
session (id, user_id)
log (id, session_id, time, type, ...)
My current query looks like this
SELECT *
FROM USER AS u
INNER JOIN session AS s
ON u.id = s.user_id
INNER JOIN log AS l
ON l.session_id = s.id
ORDER BY l.time DESC
But it's not hard to imagine that this just returns the data of all 3 tables sorted by date. How do I get a result in which every user appears just once, with the data from their latest log entry, ordered by the time of the log (desc)?
Thanks in advance for your help.
You can use DISTINCT ON in conjunction with ORDER BY to get the latest row per user by log date. This will allow you to select the additional fields you need:
SELECT DISTINCT ON (u.id)
u.id,
u.Name,
l.type,
l.time
FROM user AS u
INNER JOIN session AS s ON u.id = s.user_id
INNER JOIN log AS l ON l.session_id = s.id
ORDER BY u.id, l.time DESC;
N.B. I don't know exactly what columns you need, but I have added a couple in to demonstrate as I don't like to advocate the use of SELECT *
For completeness there are a couple of other ways to achieve this, the first is to select the max in a subquery and join back to the outer query on both user_id and time:
SELECT u.id,
u.Name,
l.type,
l.time
FROM user AS u
INNER JOIN session AS s
ON u.id = s.user_id
INNER JOIN log AS l
ON l.session_id = s.id
INNER JOIN
( SELECT s.user_id, MAX(l.time) AS time
FROM session AS s
INNER JOIN log AS l
ON l.session_id = s.id
GROUP BY s.user_id
) AS MaxLog
ON MaxLog.user_id = u.id
AND MaxLog.time = l.time
ORDER BY l.time DESC;
Or you can use ROW_NUMBER():
SELECT id, Name, type, time
FROM ( SELECT u.id,
u.Name,
l.type,
l.time,
ROW_NUMBER() OVER(PARTITION BY u.id ORDER BY l.time DESC) AS RowNumber
FROM user AS u
INNER JOIN session AS s
ON u.id = s.user_id
INNER JOIN log AS l
ON l.session_id = s.id
) u
WHERE RowNumber = 1;
I've assumed some schema (user.user_name?), but you can do this by grouping and an aggregate like Max:
SELECT u.id,
u.user_name,
Max(l.time) AS LastLogTime
FROM USER AS u
LEFT JOIN session AS s
ON u.id = s.user_id
LEFT JOIN log AS l
ON l.session_id = s.id
GROUP BY u.id,
u.user_name;
You won't be able to select * as we need to use GROUP BY
Similarly, ORDER BY l.time isn't applicable any more - you could still order by e.g. user_name
I've also LEFT JOINED - this way, if the user has no sessions, it will still return a record for the user, possibly with a LastLogTime of NULL.
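One caveat with the LEFT JOIN reasoning: users without sessions only survive if every later join in the chain is an outer join too, because an INNER JOIN to log discards the NULL-extended rows again. A small sketch with Python's sqlite3 module, using invented rows and the question's column names (name rather than user_name):

```python
import sqlite3

# Hypothetical sample data; "user" is quoted because it is reserved in some engines.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user"  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE session (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE log     (id INTEGER PRIMARY KEY, session_id INTEGER, time TEXT, type TEXT);
    INSERT INTO "user"  VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO session VALUES (1, 1), (2, 1), (3, 2);   -- cy has no session
    INSERT INTO log     VALUES (1, 1, '2021-01-01 10:00', 'login'),
                               (2, 2, '2021-01-03 09:00', 'login'),
                               (3, 2, '2021-01-02 09:00', 'logout');  -- bob has no log rows
""")

# Outer joins all the way down: every user is returned, NULL when nothing was logged.
all_users = conn.execute("""
    SELECT u.id, u.name, max(l.time) AS last_log_time
    FROM "user" u
    LEFT JOIN session s ON s.user_id = u.id
    LEFT JOIN log l ON l.session_id = s.id
    GROUP BY u.id, u.name
    ORDER BY u.id
""").fetchall()

# Same query with INNER JOIN log: users without log rows disappear again.
only_logged = conn.execute("""
    SELECT u.id, u.name, max(l.time) AS last_log_time
    FROM "user" u
    LEFT JOIN session s ON s.user_id = u.id
    INNER JOIN log l ON l.session_id = s.id
    GROUP BY u.id, u.name
    ORDER BY u.id
""").fetchall()

print(all_users)
print(only_logged)
```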