SELECT DISTINCT + ORDER BY additional expression - sql

I have no experience with PostgreSQL and I am migrating a Rails5+MySQL application to Rails5+PostgreSQL and I am having a problem with a query.
I've already looked at some questions/answers and still haven't been able to solve my problem. My problem seems to be ridiculous, but I needed to ask for help here!
Query:
SELECT DISTINCT users.* FROM users
INNER JOIN areas_users ON areas_users.user_id = users.id
INNER JOIN areas ON areas.deleted_at IS NULL AND areas.id = areas_users.area_id
WHERE users.deleted_at IS NULL AND users.company_id = 2 AND areas.id IN (2, 4, 5)
ORDER BY CASE WHEN users.id=3 THEN 0 WHEN users.id=5 THEN 1 END, users.id, 1 ASC
Running the query in DBeaver, returns the error:
SQL Error [42P10]: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
What do I need to do to be able to use this SELECT DISTINCT with this ORDER BY CASE?

It's like error message says:
for SELECT DISTINCT, ORDER BY expressions must appear in select list
This is an expression:
CASE WHEN users.id=3 THEN 0 WHEN users.id=5 THEN 1 END
You cannot order by it, while doing SELECT DISTINCT users.* FROM ... because that only allows ORDER BY expressions that appear in the SELECT list.
Typically, the best solution for DISTINCT is not to use it in the first place. If you don't duplicate rows, you don't have to de-duplicate them later. See:
How to speed up select distinct?
In your case, use an EXISTS semi-join (expression / subquery) instead of the joins. This avoids the duplication in the first place. Assuming the rows in table users are distinct, DISTINCT is no longer needed.
SELECT u.*
FROM users u
WHERE u.deleted_at IS NULL
AND u.company_id = 2
AND EXISTS (
SELECT FROM areas_users au JOIN areas a ON a.id = au.area_id
WHERE au.user_id = u.id
AND a.id IN (2, 4, 5)
AND a.deleted_at IS NULL
)
ORDER BY CASE u.id WHEN 3 THEN 0
WHEN 5 THEN 1 END, u.id, 1; -- ①
Does what you request, and typically much faster, too.
Using simple ("switched") CASE syntax.
① There is still an ugly bit. Using a positional reference in ORDER BY can be convenient short syntax. But while you have SELECT *, it's a really bad idea. If the order of columns in the underlying table changes, your query is silently changed. Spell out the column in this use case!
(Typically, you don't need SELECT * in the first place, but just a selection of columns.)
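To see why the joins need DISTINCT while EXISTS does not, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The schema and sample rows are invented for illustration, and the deleted_at filters and the areas table are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, company_id INTEGER);
    CREATE TABLE areas_users (user_id INTEGER, area_id INTEGER);
    INSERT INTO users VALUES (3, 2), (5, 2), (7, 2);
    -- user 3 belongs to two of the requested areas, user 7 to none of them
    INSERT INTO areas_users VALUES (3, 2), (3, 4), (5, 5), (7, 9);
""")

# Plain join: one output row per matching area, so user 3 appears twice.
join_ids = [r[0] for r in conn.execute("""
    SELECT u.id
    FROM users u
    JOIN areas_users au ON au.user_id = u.id
    WHERE u.company_id = 2 AND au.area_id IN (2, 4, 5)
    ORDER BY u.id
""")]

# EXISTS semi-join: each user is tested once, so no duplicates arise.
exists_ids = [r[0] for r in conn.execute("""
    SELECT u.id
    FROM users u
    WHERE u.company_id = 2
      AND EXISTS (SELECT 1 FROM areas_users au
                  WHERE au.user_id = u.id AND au.area_id IN (2, 4, 5))
    ORDER BY u.id
""")]

print(join_ids)    # [3, 3, 5]
print(exists_ids)  # [3, 5]
```

With no duplicates to remove, the ORDER BY is free to use any expression.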
If your ID column is guaranteed to hold only positive numbers, this would be a bit faster:
...
ORDER BY CASE u.id WHEN 3 THEN -2
WHEN 5 THEN -1
ELSE u.id END, <name_of_first_column>
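A small sketch of how that single sort key behaves, again with sqlite3 as a stand-in and an invented users table. Since all real IDs are positive, the negative values sort the two pinned rows first and everything else follows by ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO users VALUES (?)", [(i,) for i in (1, 2, 3, 4, 5, 6)])

# One expression does the work of "CASE ... END, id":
# id 3 maps to -2, id 5 maps to -1, every other row keeps its own id.
ordered = [r[0] for r in conn.execute("""
    SELECT id FROM users
    ORDER BY CASE id WHEN 3 THEN -2 WHEN 5 THEN -1 ELSE id END
""")]

print(ordered)  # [3, 5, 1, 2, 4, 6]
```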
I MUST use DISTINCT
(Really?) If you insist:
SELECT DISTINCT CASE u.id WHEN 3 THEN -2 WHEN 5 THEN -1 ELSE u.id END AS order_column, u.*
FROM users u
JOIN areas_users au ON au.user_id = u.id
JOIN areas a ON a.id = au.area_id
WHERE u.deleted_at IS NULL
AND u.company_id = 2
AND a.id IN (2, 4, 5)
AND a.deleted_at IS NULL
ORDER BY 1, <name_of_previously_first_column>; -- now, "ORDER BY 1" is ok
You get the additional column order_column in the result. You can wrap it in a subquery with a different SELECT ...
Just a proof of concept. Don't use this.
Or DISTINCT ON?
SELECT DISTINCT ON (CASE u.id WHEN 3 THEN -2 WHEN 5 THEN -1 ELSE u.id END, <name_of_first_column>)
u.*
FROM users u
JOIN areas_users au ON au.user_id = u.id
JOIN areas a ON a.id = au.area_id
WHERE u.deleted_at IS NULL
AND u.company_id = 2
AND a.id IN (2, 4, 5)
AND a.deleted_at IS NULL
ORDER BY CASE u.id WHEN 3 THEN -2 WHEN 5 THEN -1 ELSE u.id END, <name_of_first_column>;
This works without returning an additional column. Still just proof of concept. Don't use it, the EXISTS query is much cheaper.
See:
Select first row in each GROUP BY group?

Related

Only way to write this SQL JOIN question?

I wrote this SQL query and it seems to work great, but I'm not sure whether it is the correct way to write it or whether there is a better way:
SELECT
art.artid, users.userid
FROM
art LEFT JOIN users
ON
art.userid = users.userid
WHERE
(SELECT COUNT(1) FROM art WHERE art.userid = users.userid) > 5 AND
users.active = '1' AND
art.active = '1' AND
art.status = '0' AND
art.pricesek > 0 GROUP BY users.userid ORDER BY RAND()
It gets the users from the users table that are active and have 5 or more artworks in the art table. It also checks that the artwork is active, that its status is 0 ("for sale"), and that its price is more than 0. Then it groups the results by userid and returns them in random order.
Is this the correct way to write this, or is there another way?
"All input is hardcoded so no userinput will be sent into database, so not worried about injections (should i be worried even if its hardcoded?)."
I made a small change in your code. Instead of using (SELECT COUNT (1) FROM art WHERE art.userid = users.userid)> 5 I put it in Having clause.
SELECT art.artid, users.userid
FROM art LEFT JOIN users ON art.userid = users.userid
WHERE users.active = '1' AND art.active = '1' AND
art.status = '0' AND art.pricesek > 0
GROUP BY users.userid, art.artid
HAVING COUNT(users.userid) > 5
ORDER BY RAND()
Your query has problems at many levels. The most obvious is that the GROUP BY clause is inconsistent with the SELECT. That should be generating an error.
"It gets the users from users table that are active and has 5 or more artworks in the art table."
I would instead suggest aggregating the art table before joining:
SELECT u.userid
FROM users u JOIN
(SELECT a.userid, COUNT(*) as cnt
FROM art a
WHERE a.active = 1 AND
a.status = 0 AND
a.pricesek > 0
GROUP BY a.userid
) a
ON a.userid = u.userid
WHERE a.cnt > 5 AND u.active = 1
ORDER BY RAND();
Notes:
LEFT JOIN is not appropriate. In order to count the number of artworks, the JOIN must find at least 1 (really 6) matching rows.
It makes no sense to return a.artid. If you need an example, you could use min(a.artid) in the subquery. If you want all of them, then you would need to specify how to return them, but a JSON, array, or string aggregation function would be used in the subquery.
The values "1" and "0" look like numbers, so I removed the single quotes on the assumption that the columns are numeric. Compare numbers to numbers and strings to strings. Try to avoid mixing the two.
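Here is a runnable sketch of the aggregate-before-join idea, using Python's sqlite3 with invented sample data. The column types are assumed to be numeric, and ORDER BY u.userid replaces ORDER BY RAND() only to keep the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY, active INTEGER);
    CREATE TABLE art (artid INTEGER PRIMARY KEY, userid INTEGER,
                      active INTEGER, status INTEGER, pricesek INTEGER);
    INSERT INTO users VALUES (1, 1), (2, 1), (3, 0);
""")
# user 1: 6 qualifying artworks; user 2: 3; user 3: 6, but the user is inactive
rows = [(uid, 1, 0, 100) for uid, n in ((1, 6), (2, 3), (3, 6)) for _ in range(n)]
conn.executemany(
    "INSERT INTO art (userid, active, status, pricesek) VALUES (?, ?, ?, ?)", rows)

# Count the qualifying artworks per user first, then join the counts to users.
qualifying = [r[0] for r in conn.execute("""
    SELECT u.userid
    FROM users u
    JOIN (SELECT a.userid, COUNT(*) AS cnt
          FROM art a
          WHERE a.active = 1 AND a.status = 0 AND a.pricesek > 0
          GROUP BY a.userid) a ON a.userid = u.userid
    WHERE a.cnt > 5 AND u.active = 1
    ORDER BY u.userid
""")]

print(qualifying)  # [1]
```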

Include 0 in count(*) SQL query

I have two entities, User and MaBase. MaBase contains user_id and status. I want to get the count of status by user, I also want to show a 0 for any status values where the user doesn't have a record.
I created the query below using COUNT, but it only returns non-null values. How can I solve this?
SELECT status, COUNT(*)
FROM ma_base
WHERE ma_base.user_id = 5
GROUP BY status
I have 5 types of status values. If a user only has ma_base records for 4 of them, I still want to see a 0 value for the 5th status.
It's not every day I get to write a CROSS JOIN:
SELECT u.ID, s.status,
coalesce((SELECT COUNT(*) FROM ma_base m WHERE m.User_Id = u.ID and m.status = s.Status),0) As Status_Count
FROM User u
CROSS JOIN (SELECT DISTINCT status FROM MA_Base) s
WHERE u.ID = 5
OR:
SELECT u.ID, s.status, COALESCE(COUNT(m.status), 0) AS Status_Count
FROM User u
CROSS JOIN (SELECT DISTINCT status FROM MA_Base) s
LEFT JOIN MA_Base m ON m.User_Id = u.ID AND m.status = s.status
WHERE u.ID = 5
GROUP BY u.ID, s.status
In a nutshell, we first need to create a projection for the user with every possible status value, to anchor the result records for your "missing" statuses. Then we can JOIN or do a correlated subquery to get your desired results.
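For the JOIN option, note the expression in the COUNT() function. It's important: COUNT(m.status) skips the NULLs produced by unmatched rows, whereas COUNT(*) counts the row itself and would report 1 instead of 0. For both options, COALESCE() is only a safeguard, since COUNT() already returns 0 rather than NULL when nothing matches.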
If you have a separate table defining your status values, use that instead of deriving them from ma_base.
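The difference between counting a column and counting rows is easy to check. This is a minimal sketch in Python's sqlite3, assuming a hypothetical statuses lookup table and a table named users (instead of User) with made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE statuses (status TEXT PRIMARY KEY);   -- hypothetical lookup table
    CREATE TABLE ma_base (user_id INTEGER, status TEXT);
    INSERT INTO users VALUES (5);
    INSERT INTO statuses VALUES ('a'), ('b'), ('c');
    INSERT INTO ma_base VALUES (5, 'a'), (5, 'a'), (5, 'b');
""")

QUERY = """
    SELECT s.status, COUNT({arg}) AS status_count
    FROM users u
    CROSS JOIN statuses s
    LEFT JOIN ma_base m ON m.user_id = u.id AND m.status = s.status
    WHERE u.id = 5
    GROUP BY u.id, s.status
    ORDER BY s.status
"""

# COUNT(m.status) ignores the NULL of the unmatched 'c' row.
by_column = conn.execute(QUERY.format(arg="m.status")).fetchall()
# COUNT(*) counts that NULL-extended row as one row.
by_star = conn.execute(QUERY.format(arg="*")).fetchall()

print(by_column)  # [('a', 2), ('b', 1), ('c', 0)]
print(by_star)    # [('a', 2), ('b', 1), ('c', 1)]
```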

SQL: How to Order By And Limit Via a Join

As an example, let's say I have the following query:
select
u.id,
u.name,
(select s.status from user_status s where s.user_id = u.id order by s.created_at desc limit 1) as status
from
user u
where
u.active = true;
The above query works great. It returns the most recent user status for the selected user. However, I want to know how to get the same result using a join on the user_status table, instead of using a sub-query. Is something like this possible?
I'm using PostgreSQL.
Thank you for any help you can give!
select u.id, b.status
from user u join user_status b on u.id = b.user_id
where u.active = true
order by b.created_at desc limit 1; -- caution: limit 1 applies to the whole result, not per user
I think this works.
But in your code there is "b", and I did not know what it is.
JOIN syntax does not directly offer order or limit as options; so strictly speaking you cannot achieve what you want directly as a join. I believe the easiest way to resolve your question is to use a joined subquery, like this:
select
u.id
, u.name
, s.status
from user u
left join (
select
user_id
, status
, row_number() over(partition by user_id
order by created_at desc) as rn
from user_status s
) s on u.id = s.user_id and s.rn = 1
where u.active = true;
Here the analytic function row_number() combined with the over() clause enables the subsequent join condition and s.rn=1 to take advantage of both ordering and limiting the joined rows via the calculation of the rn value.
N.B. A correlated subquery within the SELECT clause (as used in the question's query) acts like a left join because it can return NULL. If that effect isn't needed or desired, you can change to an inner join.
It is possible to move that subquery into a CTE, but unless there are compelling reasons to do so I prefer using the more traditional form seen above.
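For anyone who wants to try the row_number() approach without a Postgres instance, here is a hedged sketch in Python's sqlite3. It assumes the bundled SQLite is 3.25 or newer (needed for window functions), a table named users instead of user, and invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, active INTEGER);
    CREATE TABLE user_status (user_id INTEGER, status TEXT, created_at TEXT);
    INSERT INTO users VALUES (1, 'ann', 1), (2, 'bob', 1), (3, 'cy', 0);
    INSERT INTO user_status VALUES
        (1, 'old', '2020-01-01'), (1, 'new', '2021-01-01'),
        (3, 'ignored', '2021-01-01');
""")

# rn = 1 marks the most recent status per user; the join keeps only that row.
latest = conn.execute("""
    SELECT u.id, u.name, s.status
    FROM users u
    LEFT JOIN (
        SELECT user_id, status,
               ROW_NUMBER() OVER (PARTITION BY user_id
                                  ORDER BY created_at DESC) AS rn
        FROM user_status
    ) s ON s.user_id = u.id AND s.rn = 1
    WHERE u.active = 1
    ORDER BY u.id
""").fetchall()

print(latest)  # [(1, 'ann', 'new'), (2, 'bob', None)]
```

User 2 has no status rows and still appears with a NULL status, which is the left-join behaviour described above.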
An alternative approach (for Postgres 9.3 or later) is to use a lateral join, which is quite similar to the original subquery but, because it becomes part of the FROM clause, is likely to be more efficient than using that subquery in the SELECT clause.
select
u.id
, u.name
, s.status
from user u
left join lateral (
select user_status.status
from user_status
where user_status.user_id = u.id
order by user_status.created_at desc
limit 1
) s ON true
where u.active = true;
I ended up doing the following, which is working great for me:
select
u.id,
u.name,
us.status
from
user u
left join (
select
distinct on (user_id)
*
from
user_status
order by
user_id,
created_at desc
) as us on u.id = us.user_id
where
u.active = true;
This is also more efficient than the query that I had in my question.
You can achieve this by converting the subquery into a WITH clause and using it in the join.

SQL single-row subquery returns more than one row?

I'm trying to get the ID and user name in one query, while my WHERE clause checks whether the ID exists in another table. I get this error:
ORA-01427: single-row subquery returns more than one row
Here is how my query looks:
SELECT s.ID, s.LASTFIRST
From USERS s
Left Outer Join CALENDAR c
On s.ID = c.USERID
Where c.SUPERVISOR = '103'
And TO_CHAR(c.DATEENROLLED,'fmmm/fmdd/yyyy') >= '4/22/2016'
And TO_CHAR(c.DATELEFT,'fmmm/fmdd/yyyy') <= '4/22/2016'
And s.ID != (SELECT USER_ID
From RESERVATIONS
Where EVENT_ID = '56')
My query inside of where clause returns two ID's: 158 and 159 so these two should not be returned in my query where I'm looking for s.ID and s.LASTFIRST. What could cause this error?
Use NOT IN instead of !=.
!= and = compare against a single value; NOT IN and IN compare against a set of values.
And s.ID not in (SELECT USER_ID
From RESERVATIONS
Where EVENT_ID = '56')
Edit: not in vs not exists
NOT EXISTS is a perfectly viable option as well. In fact, NOT EXISTS is a better choice than NOT IN if there is a possibility of NULL values in the subquery result set: in Oracle, a single NULL there causes NOT IN to return no rows. As a general rule, I use NOT IN for ID (NOT NULL) columns and NOT EXISTS for everything else. It may be better practice to always use NOT EXISTS; personal preference, I suppose.
Not exists would be written like so:
SELECT s.ID, s.LASTFIRST
From USERS s
Left Outer Join CALENDAR c
On s.ID = c.USERID
Where c.SUPERVISOR = '103'
And TO_CHAR(c.DATEENROLLED,'fmmm/fmdd/yyyy') >= '4/22/2016'
And TO_CHAR(c.DATELEFT,'fmmm/fmdd/yyyy') <= '4/22/2016'
And not exists (SELECT USER_ID
From RESERVATIONS r
Where r.USER_ID = S.ID
And EVENT_ID = '56')
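The NULL pitfall mentioned above can be reproduced with a few rows. This sketch uses Python's sqlite3 rather than Oracle; the three-valued logic behind NOT IN is the same there. The tables are cut down to the relevant columns and the data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE reservations (user_id INTEGER, event_id INTEGER);
    INSERT INTO users VALUES (157), (158), (159);
    -- one reservation row for event 56 has no user recorded
    INSERT INTO reservations VALUES (158, 56), (159, 56), (NULL, 56);
""")

# 157 NOT IN (158, 159, NULL) evaluates to NULL, so the row is filtered out.
not_in = [r[0] for r in conn.execute("""
    SELECT s.id FROM users s
    WHERE s.id NOT IN (SELECT user_id FROM reservations WHERE event_id = 56)
""")]

# NOT EXISTS only asks whether a matching row exists, so the NULL is harmless.
not_exists = [r[0] for r in conn.execute("""
    SELECT s.id FROM users s
    WHERE NOT EXISTS (SELECT 1 FROM reservations r
                      WHERE r.user_id = s.id AND r.event_id = 56)
""")]

print(not_in)      # []
print(not_exists)  # [157]
```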
Performance
In Oracle there is no performance difference between using not in, not exists or a left join.
Source : https://explainextended.com/2009/09/17/not-in-vs-not-exists-vs-left-join-is-null-oracle/
Oracle's optimizer is able to see that NOT EXISTS, NOT IN and LEFT JOIN / IS NULL are semantically equivalent as long as the list values are declared as NOT NULL.
It uses same execution plan for all three methods, and they yield same results in same time.
This is a formatted comment that is not related to your question.
This is slow:
And TO_CHAR(c.DATEENROLLED,'fmmm/fmdd/yyyy') >= '4/22/2016'
because you are filtering on a function result.
This compares real dates rather than formatted strings, which is what you intend, and it is much faster:
And c.DATEENROLLED >= to_date('4/22/2016','fmmm/fmdd/yyyy')
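Speed aside, comparing formatted strings can also give wrong answers, because strings compare character by character. A quick illustration in plain Python, where the helper as_text is a hypothetical imitation of TO_CHAR with the 'fmmm/fmdd/yyyy' mask:

```python
from datetime import date

enrolled = date(2016, 10, 1)   # later than the cutoff
cutoff = date(2016, 4, 22)

def as_text(d):
    # Imitates TO_CHAR(d, 'fmmm/fmdd/yyyy'): month and day without zero padding.
    return f"{d.month}/{d.day}/{d.year}"

# '10/1/2016' >= '4/22/2016' is decided by '1' < '4', so it comes out False.
text_result = as_text(enrolled) >= as_text(cutoff)
# Comparing the dates themselves gives the expected answer.
date_result = enrolled >= cutoff

print(text_result, date_result)  # False True
```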
Edit starts here
Aaron D's answer says to use not in. Here are two faster ways to do the same thing:
left join reservations r on s.id = user_id
and r.event_id = '56'
etc
where r.user_id is null
or
where s.id in
(
select user_id
from reservations
minus
select user_id
from reservations
where event_id = 56
)

PostgreSQL query returning multiple rows instead of one

I have two tables: user and projects, with a one-to-many relationship between two.
projects table has field status with project statuses of the user.
status can be one of:
launched, confirm, staffed, overdue, complete, failed, ended
I want to categorize users in two categories:
users having projects in launched phase
users having projects other than launched status.
I am using the following query:
SELECT DISTINCT(u.*), CASE
WHEN p.status = 'LAUNCHED' THEN 1
ELSE 2
END as user_category
FROM users u
LEFT JOIN projects p ON p.user_id = u.id
WHERE (LOWER(u.username) like '%%%'
OR LOWER(u.personal_intro) like '%%%'
OR LOWER(u.location) like '%%%'
OR u.account_status != 'DELETED'
AND system_role=10 AND u.account_status ='ACTIVE')
ORDER BY set_order, u.page_hits DESC
LIMIT 10
OFFSET 0
I am facing duplicate records for following scenario:
If a user has projects with status launched as well as overdue, complete or failed, then that user is returned twice, because both branches of the CASE are satisfied for that user.
Please suggest a query where a user that has any project in launched status gets his user_category set to 1. The same user should not be repeated for user_category 2.
The query is probably not doing what you think it does for a number of reasons
There is DISTINCT and there is DISTINCT ON (col1, col2).
DISTINCT (u.*) is no different from DISTINCT u.*. The parentheses are just noise.
AND binds before OR according to operator precedence. I suspect you want to use parentheses around the conditions OR'ed together? Or do you need it the way it is? But you don't need parentheses around the whole WHERE clause in any case.
Your expression LOWER(u.username) LIKE '%%%' doesn't make any sense. Every non-null string qualifies. Can be replaced with u.username IS NOT NULL. I suspect you want something different?
Postgres is case sensitive in string handling. You write of status being 'launched' etc. but use 'LAUNCHED' in your query. Which is it?
A couple of table qualifications are missing from the question making it ambiguous for the reader. I filled in as I saw fit.
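The precedence point is worth checking on a toy table. A minimal sketch with Python's sqlite3, where the rows and the search term 'oslo' are invented and the WHERE clause is reduced to one OR branch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, location TEXT,
                        system_role INTEGER, account_status TEXT);
    INSERT INTO users VALUES
        (1, 'Oslo', 10, 'ACTIVE'),
        (2, 'Oslo', 99, 'DELETED');  -- wrong role and deleted
""")

# Without parentheses this parses as: location match OR (role AND status).
loose = [r[0] for r in conn.execute("""
    SELECT id FROM users
    WHERE LOWER(location) LIKE '%oslo%'
       OR system_role = 10 AND account_status = 'ACTIVE'
    ORDER BY id
""")]

# Parentheses around the search condition make role and status apply to every row.
strict = [r[0] for r in conn.execute("""
    SELECT id FROM users
    WHERE (LOWER(location) LIKE '%oslo%')
      AND system_role = 10 AND account_status = 'ACTIVE'
    ORDER BY id
""")]

print(loose)   # [1, 2]
print(strict)  # [1]
```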
Everything put together, it might work like this:
SELECT DISTINCT ON (u.set_order, u.page_hits, u.id)
u.*
, CASE WHEN p.status = 'LAUNCHED' THEN 1 ELSE 2 END AS user_category
FROM users u
LEFT JOIN projects p ON p.user_id = u.id
WHERE LOWER(u.username) LIKE '%%%' -- ???
OR LOWER(u.personal_intro) LIKE '%%%'
OR LOWER(u.location) LIKE '%%%'
OR u.account_status != 'DELETED' -- with original logic
AND u.system_role = 10
AND u.account_status = 'ACTIVE'
ORDER BY u.set_order, u.page_hits DESC, u.id, user_category
LIMIT 10
Detailed explanation in this related question:
Select first row in each GROUP BY group?
Two EXISTS semi-joins instead of the DISTINCT ON and CASE might be faster:
SELECT u.*
, CASE WHEN EXISTS (
SELECT FROM projects p
WHERE p.user_id = u.id AND p.status = 'LAUNCHED')
THEN 1 ELSE 2 END AS user_category
FROM users u
WHERE
( LOWER(u.username) LIKE '%%%' -- ???
OR LOWER(u.personal_intro) LIKE '%%%'
OR LOWER(u.location) LIKE '%%%'
OR u.account_status != 'DELETED' -- with alternative logic?
)
AND u.system_role = 10 -- assuming it comes from users ???
AND u.account_status = 'ACTIVE'
AND EXISTS (SELECT 1 FROM projects p WHERE p.user_id = u.id)
ORDER BY u.set_order, u.page_hits DESC
LIMIT 10;
You can use MIN() on your CASE result, and it seems dropping the DISTINCT would be a wise choice:
SELECT u.*, MIN(CASE
WHEN p.status = 'LAUNCHED' THEN 1
ELSE 2
END) as user_category
...
GROUP BY <list all columns in the users table>
...
Since "launched" gives a 1, using MIN() will not only force a single result but will also give preference to "launched" over the other states.
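A runnable sketch of the MIN(CASE ...) idea, using Python's sqlite3 with a cut-down users table and invented rows. The real query would list every users column in GROUP BY, as noted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE projects (user_id INTEGER, status TEXT);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO projects VALUES
        (1, 'LAUNCHED'), (1, 'OVERDUE'), (1, 'FAILED'),
        (2, 'COMPLETE');
""")

# One row per user; MIN picks 1 as soon as any project is LAUNCHED.
categories = conn.execute("""
    SELECT u.id, u.username,
           MIN(CASE WHEN p.status = 'LAUNCHED' THEN 1 ELSE 2 END) AS user_category
    FROM users u
    LEFT JOIN projects p ON p.user_id = u.id
    GROUP BY u.id, u.username
    ORDER BY u.id
""").fetchall()

print(categories)  # [(1, 'ann', 1), (2, 'bob', 2), (3, 'cy', 2)]
```

A user without any projects (cy here) falls into the ELSE branch and gets category 2.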