SQL: How to Order By And Limit Via a Join

As an example, let's say I have the following query:
select
u.id,
u.name,
(select s.status from user_status s where s.user_id = u.id order by s.created_at desc limit 1) as status
from
user u
where
u.active = true;
The above query works great. It returns the most recent user status for the selected user. However, I want to know how to get the same result using a join on the user_status table, instead of using a sub-query. Is something like this possible?
I'm using PostgreSQL.
Thank you for any help you can give!

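select u.id, b.status
from user u join user_status b on u.id = b.user_id
where u.active = true
order by b.created_at desc limit 1;
I think this works, although the limit 1 here applies to the whole result set, so it returns only the single most recent status row across all users.
But your code refers to "b", and I did not know what that is.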

JOIN syntax does not directly offer order or limit as options; so strictly speaking you cannot achieve what you want directly as a join. I believe the easiest way to resolve your question is to use a joined subquery, like this:
select
u.id
, u.name
, s.status
from user u
left join (
select
user_id
, status
, row_number() over(partition by user_id
order by created_at desc) as rn
from user_status s
) s on u.id = s.user_id and s.rn = 1
where u.active = true;
Here the analytic function row_number(), combined with the over() clause, numbers each user's status rows from newest to oldest, so the join condition s.rn = 1 both orders and limits the joined rows.
NB: a correlated subquery within the select clause (as used in the question's query) acts like a left join because it can return NULL. If that effect isn't needed or desired, you can change to an inner join.
It is possible to move that subquery into a CTE, but unless there are compelling reasons to do so I prefer the more traditional form seen above.
An alternative approach (for Postgres 9.3 or later) is to use a lateral join, which is quite similar to the original subquery but, because it becomes part of the from clause, is likely to be more efficient than using that subquery in the select clause.
select
u.id
, u.name
, s.status
from user u
left join lateral (
select user_status.status
from user_status
where user_status.user_id = u.id
order by user_status.created_at desc
limit 1
) s ON true
where u.active = true;

I ended up doing the following, which is working great for me:
select
u.id,
u.name,
us.status
from
user u
left join (
select
distinct on (user_id)
*
from
user_status
order by
user_id,
created_at desc
) as us on u.id = us.user_id
where
u.active = true;
This is also more efficient than the query that I had in my question.

You can achieve this by converting the subquery into a WITH clause (a CTE) and using it in the join.
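A minimal sketch of that idea, reusing the row_number() logic from the earlier answer. The CTE name latest_status is made up for illustration, and "user" is quoted on the assumption that the table really has that name, since user is a reserved word in PostgreSQL:

```sql
-- Rank each user's status rows, newest first, inside a CTE.
with latest_status as (
    select
        user_id,
        status,
        row_number() over (partition by user_id
                           order by created_at desc) as rn
    from user_status
)
select
    u.id,
    u.name,
    ls.status
from "user" u
-- rn = 1 keeps only the most recent status; the left join keeps users with no status.
left join latest_status ls
    on ls.user_id = u.id
   and ls.rn = 1
where u.active = true;
```

This should return the same rows as the joined-subquery form; the CTE only changes where the ranking query is written.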

Related

Writing a subquery instead of using a spreadsheet

I'm a pretty basic SQL user: I usually do some basic joins and then pull the data into Sheets to pivot or filter it to get what I want, but I know I can do it more quickly all in SQL.
For this query, I want to only return data if the c2.id count is greater than 0. I tried writing a subquery in the where clause, but feels like I need to group by task_id for this to be right...can someone help me understand what I should do and why?
select t.inserted_at::date, count (distinct c2.id), t.id, t.conversation_id
from tasks t
left join users u on u.id = t.creator_id
left join "comments" c2 on t.id = c2.task_id
left join conversations c on c.id = t.conversation_id
where u.include_in_metrics = true
and c.type = 'PROJECT_FEED'
group by 1,3,4
order by t.inserted_at::date desc;
Just add this after the GROUP BY and before the ORDER BY:
having count(distinct c2.id)>0
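As for the "why": WHERE filters individual rows before they are grouped, so it cannot see an aggregate such as count(); HAVING filters the groups after aggregation. Put together with the query from the question (same tables and columns, nothing else changed), it would look roughly like this:

```sql
select t.inserted_at::date, count(distinct c2.id), t.id, t.conversation_id
from tasks t
left join users u on u.id = t.creator_id
left join "comments" c2 on t.id = c2.task_id
left join conversations c on c.id = t.conversation_id
where u.include_in_metrics = true      -- row-level filters, applied before grouping
  and c.type = 'PROJECT_FEED'
group by 1, 3, 4
having count(distinct c2.id) > 0       -- group-level filter, applied after aggregation
order by t.inserted_at::date desc;
```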

SELECT DISTINCT + ORDER BY additional expression

I have no experience with PostgreSQL and I am migrating a Rails5+MySQL application to Rails5+PostgreSQL and I am having a problem with a query.
I've already looked at some questions/answers and still haven't been able to solve my problem. My problem seems to be ridiculous, but I needed to ask for help here!
Query:
SELECT DISTINCT users.* FROM users
INNER JOIN areas_users ON areas_users.user_id = users.id
INNER JOIN areas ON areas.deleted_at IS NULL AND areas.id = areas_users.area_id
WHERE users.deleted_at IS NULL AND users.company_id = 2 AND areas.id IN (2, 4, 5)
ORDER BY CASE WHEN users.id=3 THEN 0 WHEN users.id=5 THEN 1 END, users.id, 1 ASC
Running the query in DBeaver returns the error:
SQL Error [42P10]: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
What do I need to do to be able to use this SELECT DISTINCT with this ORDER BY CASE?
It's like the error message says:
for SELECT DISTINCT, ORDER BY expressions must appear in select list
This is an expression:
CASE WHEN users.id=3 THEN 0 WHEN users.id=5 THEN 1 END
You cannot order by it, while doing SELECT DISTINCT users.* FROM ... because that only allows ORDER BY expressions that appear in the SELECT list.
Typically, the best solution for DISTINCT is not to use it in the first place. If you don't duplicate rows, you don't have to de-duplicate them later. See:
How to speed up select distinct?
In your case, use an EXISTS semi-join (expression / subquery) instead of the joins. This avoids the duplication. Assuming distinct rows in table users, DISTINCT is out of a job.
SELECT u.*
FROM users u
WHERE u.deleted_at IS NULL
AND u.company_id = 2
AND EXISTS (
SELECT FROM areas_users au JOIN areas a ON a.id = au.area_id
WHERE au.user_id = u.id
AND a.id IN (2, 4, 5)
AND a.deleted_at IS NULL
)
ORDER BY CASE u.id WHEN 3 THEN 0
WHEN 5 THEN 1 END, u.id, 1; -- ①
Does what you request, and typically much faster, too.
Using simple ("switched") CASE syntax.
① There is still an ugly bit. Using a positional reference in ORDER BY can be convenient short syntax. But while you have SELECT *, it's a really bad idea. If the order of columns in the underlying table changes, your query is silently changed. Spell out the column in this use case!
(Typically, you don't need SELECT * in the first place, but just a selection of columns.)
If your ID column is guaranteed to have positive numbers, this would be a bit faster:
...
ORDER BY CASE u.id WHEN 3 THEN -2
WHEN 5 THEN -1
ELSE u.id END, <name_of_first_column>
I MUST use DISTINCT
(Really?) If you insist:
SELECT DISTINCT CASE u.id WHEN 3 THEN -2 WHEN 5 THEN -1 ELSE u.id END AS order_column, u.*
FROM users u
JOIN areas_users au ON au.user_id = u.id
JOIN areas a ON a.id = au.area_id
WHERE u.deleted_at IS NULL
AND u.company_id = 2
AND a.id IN (2, 4, 5)
AND a.deleted_at IS NULL
ORDER BY 1, <name_of_previously_first_column>; -- now, "ORDER BY 1" is ok
You get the additional column order_column in the result. You can wrap it in a subquery with a different SELECT ...
Just a proof of concept. Don't use this.
Or DISTINCT ON?
SELECT DISTINCT ON (CASE u.id WHEN 3 THEN -2 WHEN 5 THEN -1 ELSE u.id END, <name_of_first_column>)
u.*
FROM users u
JOIN areas_users au ON au.user_id = u.id
JOIN areas a ON a.id = au.area_id
WHERE u.deleted_at IS NULL
AND u.company_id = 2
AND a.id IN (2, 4, 5)
AND a.deleted_at IS NULL
ORDER BY CASE u.id WHEN 3 THEN -2 WHEN 5 THEN -1 ELSE u.id END, <name_of_first_column>;
This works without returning an additional column. Still just proof of concept. Don't use it, the EXISTS query is much cheaper.
See:
Select first row in each GROUP BY group?

How to return Null instead of nothing in Postgres?

I am writing a Postgres query that should return each order for each user. If a user has no orders, I would like to get NULL instead of no row at all.
My current query:
Select Order.user_id,
User.name,
Order.id,
Order.value,
Order.item
FROM Order
Left Join User
On Order.user_id = User.id
It returns each user's orders, but for a user with no orders nothing is returned. How can I return NULL (or any string) instead of nothing?
Thank you so much.
You want a left join, but you want user first:
SELECT u.id as user_id, u.name, o.id, o.value, o.item
FROM User u LEFT JOIN
Order o
ON o.user_id = u.id;
I also introduced table aliases so the query is easier to write and to read. And . . . change o.user_id to u.id in the SELECT so it has a value when there are no orders.
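If a placeholder string is wanted instead of NULL, COALESCE can substitute one. This is only a sketch: it assumes item is a text column, the placeholder 'no order' is made up, and the table names are quoted on the assumption that they really are the reserved words user and order in lowercase.

```sql
SELECT u.id AS user_id,
       u.name,
       o.id AS order_id,
       o.value,
       -- Replaces NULL with a label; this also relabels a real order whose item is NULL.
       COALESCE(o.item, 'no order') AS item
FROM "user" u
LEFT JOIN "order" o
       ON o.user_id = u.id;
```

Users without orders still get exactly one row; the numeric columns stay NULL because a text placeholder would not match their type.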

Best approach for limiting rows coming back in SQL when joining for a sum

I need to get back a list of users and the total amount that they have ordered. In reality my query is more complex but I think this sums it up. My issue is, if a user made 5 orders for example, I'll get back their name and the total they've ordered 5 times due to the join (having 5 rows in the order table for that user).
What's the recommended approach for when you need to total the records in one table that has multiple rows without requiring many rows to come back? distinct could work but is this the best? (especially when my select chooses more information than what's below)
SELECT user.name, sum(order.amount) FROM USER user
INNER JOIN USER_ORDERS order
ON (user.user_id = order.user_id)
Are you just looking for GROUP BY?
SELECT u.name, SUM(uo.amount)
FROM USER u JOIN
USER_ORDERS uo
ON u.user_id = uo.user_id
GROUP BY u.name, u.user_id;
Note that this has included user_id in the GROUP BY, just in case two users have the same name.
If you want all users, even those without orders, then you want a LEFT JOIN:
SELECT u.name, SUM(uo.amount)
FROM USER u LEFT JOIN
USER_ORDERS uo
ON u.user_id = uo.user_id
GROUP BY u.name, u.user_id;
Or a correlated subquery:
SELECT u.name,
(SELECT SUM(uo.amount)
FROM USER_ORDERS uo
WHERE u.user_id = uo.user_id
)
FROM USER u;
You could use the analytic version of SUM. Note that it still returns one row per order, so it does not remove the duplicate rows by itself.
SELECT u.name, SUM(uo.amount) OVER(PARTITION BY u.name)
FROM USER u JOIN
USER_ORDERS uo
ON u.user_id = uo.user_id;
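For the case mentioned in the question, where the select list pulls in more columns than just the name, one further option is to aggregate the orders in a derived table first and then join that one-row-per-user result. This is a sketch against the same USER and USER_ORDERS tables; the alias totals and the column name total_amount are made up for illustration.

```sql
-- USER is a reserved word in several databases and may need quoting there.
SELECT u.name,
       totals.total_amount
FROM USER u
LEFT JOIN (
    -- One row per user, so the outer join cannot multiply user rows.
    SELECT uo.user_id,
           SUM(uo.amount) AS total_amount
    FROM USER_ORDERS uo
    GROUP BY uo.user_id
) totals
       ON totals.user_id = u.user_id;
```

Because the grouping happens inside the derived table, further user columns can be added to the outer select list without touching a GROUP BY.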

CTE Optimization

I have a query involving CTEs. I just want to know if the following query can be optimized in any way and, if it can be, what the rationale behind the optimized version is:
here is the query:
WITH A AS
(
SELECT
user_ID
FROM user
WHERE user_Date IS not NULL
),
B AS
(
SELECT
P.user_ID,
P.Payment_Type,
SUM(P.Payment_Amount) AS Total_Payment
FROM Payment P
JOIN A ON A.user_ID = P.user_ID
)
SELECT
user_ID,
Total_Payment_Amount
FROM B
WHERE Payment_Type = 'CR';
Your query should be using GROUP BY, as it seems you want to take a sum aggregate for each user_ID. From a performance point of view, you are introducing many subqueries, which aren't really necessary. We can write your query using a single join between the Payment and user tables.
SELECT
P.user_ID,
SUM(P.Payment_Amount) AS Total_Payment
FROM Payment P
INNER JOIN user A
ON A.user_ID = P.user_ID
WHERE
A.user_Date IS NOT NULL AND P.Payment_Type = 'CR'
GROUP BY
P.user_ID;
For Oracle, SQL Server, and Postgres, the query optimizers should have no problems finding the optimal query plan for this. You can find out what the database will do with EXPLAIN, usually.
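As a sketch of what that looks like in PostgreSQL, prefix the statement with EXPLAIN to see the chosen plan without running the query. Other systems spell this differently (Oracle uses EXPLAIN PLAN FOR, for example), and "user" is quoted here on the assumption that the table really has that reserved name:

```sql
-- Shows the planner's chosen plan; the query itself is not executed.
EXPLAIN
SELECT
    P.user_ID,
    SUM(P.Payment_Amount) AS Total_Payment
FROM Payment P
INNER JOIN "user" A
    ON A.user_ID = P.user_ID
WHERE A.user_Date IS NOT NULL
  AND P.Payment_Type = 'CR'
GROUP BY P.user_ID;
```

Running the same EXPLAIN on the original CTE version and comparing the two plans is one way to check whether the rewrite changes anything on a given database.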
Try the following query:
select p.user_ID,
SUM(Payment_Amount) as Total_Payment_Amount
from Payment P join user u
on P.user_ID=u.user_ID
where Payment_Type = 'CR'
and u.user_Date IS not NULL
group by p.user_ID;
SQL Server 2014