I have a query of the form:
SELECT DISTINCT person_id
FROM my_table
WHERE person_id NOT IN (SELECT person_id FROM my_table WHERE status = 'hungry')
In my_table there are multiple rows for each person, and I want to exclude those people who have ever had status "hungry". This is a construct I regard as standard and have used in other SQL dialects, but this brings me back an empty result set in Athena.
On the other hand, the plain old IN construction works as expected.
Can anyone explain how I can write this query in Presto? I found another article on SO that seems to imply it works correctly, so I am a bit nonplussed.
Do not use NOT IN with a subquery. If any value returned by the subquery is NULL, then NOT IN returns no rows at all, because a comparison against NULL is never true. Note: This is how SQL works, not a peculiarity of any particular database.
Instead, use NOT EXISTS:
SELECT DISTINCT t.person_id
FROM my_table t
WHERE NOT EXISTS (SELECT 1
FROM my_table t2
WHERE t2.status = 'hungry' AND
t2.person_id = t.person_id
);
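To see the difference between the two constructs, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for Athena/Presto purely as an assumption for illustration, and the sample rows are made up; the NULL handling of NOT IN is standard SQL, so the effect should carry over.

```python
import sqlite3

# Hypothetical sample data: person 3 has been 'hungry', and one
# 'hungry' row has a NULL person_id (the case that breaks NOT IN).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (person_id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, "full"), (2, "full"), (3, "hungry"), (3, "full"), (None, "hungry")],
)

not_in_sql = """
    SELECT DISTINCT person_id
    FROM my_table
    WHERE person_id NOT IN (SELECT person_id FROM my_table WHERE status = 'hungry')
"""

not_exists_sql = """
    SELECT DISTINCT t.person_id
    FROM my_table t
    WHERE NOT EXISTS (SELECT 1
                      FROM my_table t2
                      WHERE t2.status = 'hungry' AND
                            t2.person_id = t.person_id)
    ORDER BY t.person_id
"""

# The subquery yields {3, NULL}; "x NOT IN (3, NULL)" is never true.
not_in_rows = conn.execute(not_in_sql).fetchall()
not_exists_rows = conn.execute(not_exists_sql).fetchall()

print("NOT IN:    ", not_in_rows)
print("NOT EXISTS:", not_exists_rows)
```

On this data the NOT IN version comes back empty, while NOT EXISTS returns persons 1 and 2. Note that NOT EXISTS also returns the row whose person_id is NULL, since NULL never equals anything in the correlated comparison; add a `person_id IS NOT NULL` filter if that is unwanted.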
Actually, I might suggest aggregation for this instead -- you are already doing aggregation essentially with the SELECT DISTINCT:
select person_id
from my_table t
group by person_id
having sum(case when status = 'hungry' then 1 else 0 end) = 0;
Using conditional aggregation:
SELECT person_id
FROM my_table m
GROUP BY person_id
HAVING COUNT(CASE WHEN status='hungry' THEN 1 END)=0
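The COUNT(CASE ... END) and SUM(CASE ... ELSE 0 END) forms should give the same answer here: COUNT ignores the NULL produced when the CASE has no ELSE, while SUM adds up explicit zeros. A small sketch to check that, assuming SQLite (via Python's sqlite3) as a stand-in engine and made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (person_id INTEGER, status TEXT)")
# Hypothetical rows: only person 2 has ever been 'hungry'.
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, "full"), (1, "sleepy"), (2, "hungry"), (2, "full"), (3, "full")],
)

count_sql = """
    SELECT person_id
    FROM my_table
    GROUP BY person_id
    HAVING COUNT(CASE WHEN status = 'hungry' THEN 1 END) = 0
    ORDER BY person_id
"""

sum_sql = """
    SELECT person_id
    FROM my_table
    GROUP BY person_id
    HAVING SUM(CASE WHEN status = 'hungry' THEN 1 ELSE 0 END) = 0
    ORDER BY person_id
"""

count_rows = conn.execute(count_sql).fetchall()
sum_rows = conn.execute(sum_sql).fetchall()
print(count_rows, sum_rows)
```

Both queries keep persons 1 and 3 on this data, and neither needs a subquery, so the NULL problem with NOT IN does not arise.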
I would use aggregation:
SELECT person_id
FROM my_table
GROUP BY person_id
HAVING SUM(CASE WHEN status = 'hungry' THEN 1 ELSE 0 END) = 0;
If you want the full row, use NOT EXISTS. NOT IN would return no rows if the subquery returns a NULL:
SELECT DISTINCT t.person_id
FROM my_table t
WHERE NOT EXISTS (SELECT 1
FROM my_table t1
WHERE t1.status = 'hungry' AND
t1.person_id = t.person_id
);
I feel compelled to point out that you can solve this by just excluding the NULLs explicitly from the subquery, and sticking with the NOT IN construct:
SELECT DISTINCT person_id
FROM my_table
WHERE person_id NOT IN (SELECT person_id FROM my_table WHERE status = 'hungry' AND person_id IS NOT NULL)
I have the data in Initial format:
STEP 1: To find out the users having more than 1 record and show those records. This was achieved using the below.
SELECT ID,
USER,
STATUS
FROM TABLE
WHERE USER in
(SELECT USER
FROM TABLE
GROUP BY USER
HAVING COUNT(*) > 1)
STEP 2: From the above set of records, find the records for which all the status values are either 1 or 2. So the data should look something like:
Can I get some suggestions on how to achieve that? Note status is NVARCHAR, hence aggregate functions can't be used.
The simplest thing is to check that the status is the same in your subquery. Assuming that status only takes on the values 1 and 2:
SELECT t.ID, t.USER, t.STATUS
FROM TABLE t
WHERE t.USER IN (SELECT t2.USER
FROM TABLE t2
GROUP BY t2.USER
HAVING COUNT(*) > 1 AND
MIN(t2.status) = MAX(t2.status)
);
If there are other status values and you particularly care about 1 and 2, you would use:
SELECT t.ID, t.USER, t.STATUS
FROM TABLE t
WHERE t.USER IN (SELECT t2.USER
FROM TABLE t2
GROUP BY t2.USER
HAVING COUNT(*) > 1 AND
MIN(t2.status) = MAX(t2.status) AND
MIN(t2.status) IN (1, 2)
);
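MIN and MAX compare strings as well as numbers, so the MIN = MAX test also works on a text status column. Below is a rough sketch of the second query, run through Python's sqlite3 as a stand-in engine. The table name user_status, the column user_name, and the rows are all hypothetical (chosen to avoid the reserved words TABLE and USER), and status is stored as text to mimic NVARCHAR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_status (id INTEGER, user_name TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO user_status VALUES (?, ?, ?)",
    [
        (1, "A", "1"), (2, "A", "1"),   # two rows, same status 1: kept
        (3, "B", "1"), (4, "B", "2"),   # mixed statuses: dropped
        (5, "C", "2"),                  # single row: dropped by COUNT(*) > 1
        (6, "D", "3"), (7, "D", "3"),   # same status, but not 1 or 2: dropped
        (8, "E", "2"), (9, "E", "2"),   # two rows, same status 2: kept
    ],
)

same_status_sql = """
    SELECT t.id, t.user_name, t.status
    FROM user_status t
    WHERE t.user_name IN (SELECT t2.user_name
                          FROM user_status t2
                          GROUP BY t2.user_name
                          HAVING COUNT(*) > 1 AND
                                 MIN(t2.status) = MAX(t2.status) AND
                                 MIN(t2.status) IN ('1', '2'))
    ORDER BY t.id
"""

kept_rows = conn.execute(same_status_sql).fetchall()
for row in kept_rows:
    print(row)
```

Only users A and E survive on this data. The literals are quoted here because the column is text; whether unquoted 1 and 2 work depends on the implicit conversion rules of the database in use.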
Please check if this helps
SELECT ID,
[USER],
[STATUS]
FROM TABLE
WHERE [USER] in
(SELECT [USER]
FROM TABLE
GROUP BY [USER]
HAVING COUNT([USER]) > 1 AND ((MIN(STATUS) != MAX(STATUS) AND COUNT(STATUS) > 2) OR (MIN(STATUS) = MAX(STATUS))))
I need to filter unneeded rows out of a table in SQL:
Case 1: if two rows share an ID and DOD is not null on both, the records are needed.
Case 2: if an ID appears only once and DOD is not null, the record is needed.
Case 3: if two rows share an ID and DOD is null for either of them, the records are not needed.
Your help is much appreciated.
Thanks
You can use analytic functions for this:
select t.*
from (
select
t.*,
sum(case when dod is null then 1 else 0 end) over(partition by id) no_nulls
from mytable t
) t
where no_nulls = 0
Note that this also excludes records that have no duplicate id but whose dod is null (you did not describe how to handle those).
You could also use not exists (which can conveniently be turned into a delete statement if needed):
select t.*
from mytable t
where not exists(select 1 from mytable t1 where t1.id = t.id and t1.dod is null)
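As a quick check of the not exists version against the three cases in the question, here is a sketch using Python's sqlite3 as a stand-in engine. The dates and ids are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, dod TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        (1, "2020-01-01"), (1, "2020-02-01"),  # case 1: duplicate id, no NULL dod
        (2, "2020-03-01"),                     # case 2: single id, dod present
        (3, "2020-04-01"), (3, None),          # case 3: duplicate id, one NULL dod
        (4, None),                             # single id with NULL dod (not covered by the question)
    ],
)

not_exists_sql = """
    SELECT t.id, t.dod
    FROM mytable t
    WHERE NOT EXISTS (SELECT 1 FROM mytable t1 WHERE t1.id = t.id AND t1.dod IS NULL)
    ORDER BY t.id, t.dod
"""

kept = conn.execute(not_exists_sql).fetchall()
for row in kept:
    print(row)
```

Ids 1 and 2 are kept, and every row for id 3 is dropped because one of its rows has a NULL dod. Id 4 is dropped as well, which matches the caveat about ids that have no duplicate but a NULL dod.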
I'm working with a dataset - structured like this
I want to exclude all records with ReviewRound being "a" if they have gone through review round "b" - If a set of unique ID's has an associated round "b" review, the round "a" review should not be included.
Some records have not gone to round "b". The issues I'm running into are as a result of there being multiple records for each unique ID.
Ideally this could be done in GoogleBigQuery, if not, filtering through GoogleScripts may also be an option!
Any suggestions would be appreciated!
If a set of unique ID's has an associated round "b" review, the round "a" review should not be included.
If I followed you correctly, you could express this as a not condition with a correlated subquery that ensures that, if the current record has ReviewRound = 'a', there is no other record that has the same id and ReviewRound = 'b'.
select t.*
from mytable t
where not (
t.ReviewRound = 'a'
and exists (
select 1
from mytable t1
where t1.id = t.id and t1.ReviewRound = 'b'
)
)
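A small sketch of that correlated-subquery filter, run with Python's sqlite3 rather than BigQuery (an assumption made only so the example is self-contained). The Score column and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, ReviewRound TEXT, Score INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "a", 10), (1, "b", 12),               # id 1 reached round b
        (2, "a", 7),                              # id 2 never reached round b
        (3, "a", 5), (3, "a", 6), (3, "b", 9),    # id 3: two round-a rows, one round-b
    ],
)

filter_sql = """
    SELECT t.id, t.ReviewRound, t.Score
    FROM mytable t
    WHERE NOT (
        t.ReviewRound = 'a'
        AND EXISTS (
            SELECT 1
            FROM mytable t1
            WHERE t1.id = t.id AND t1.ReviewRound = 'b'
        )
    )
    ORDER BY t.id, t.ReviewRound, t.Score
"""

remaining = conn.execute(filter_sql).fetchall()
for row in remaining:
    print(row)
```

The round "a" rows for ids 1 and 3 disappear because those ids have a round "b" row, while id 2 keeps its round "a" row. Having several round "a" rows per id does not matter, since each row is tested on its own.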
You can do this with window functions as well:
select t.* except (num_bs)
from (select t.*,
countif(reviewround = 'b') over (partition by id) as num_bs
from t
) t
where num_bs = 0 or reviewround = 'b';
By using window functions, you can solve it with this query
SELECT ID, Score
FROM (
SELECT *,
MAX(CASE WHEN ReviewRound = 'b' THEN 1 ELSE 0 END) OVER (partition by ID) as has_b
FROM mytable
) t
WHERE has_b = 0 OR ReviewRound = 'b'
Re-conceptualizing as keeping only the latest review round, I would try:
select * from mytable join
(select ID, max(ReviewRound) as ReviewRound from mytable group by ID)
using (ID, ReviewRound)
Since I am not that good with more complex SQL SELECT statements, I thought I would just ask here, since it is hard to find something exactly on topic.
I have two tables that have exactly the same structure, like:
TABLE A (id (INT(11)), time (VARCHAR(10));)
TABLE B (id (INT(11)), time (VARCHAR(10));)
Now I want a single SELECT to count the entries for a specific id in both tables.
SELECT COUNT(*) FROM TABLE A WHERE id = '1';
SELECT COUNT(*) FROM TABLE B WHERE id = '1';
So I thought it would be much better for database performance if I used one SELECT instead of two.
Thanks for helping out
SELECT COUNT(*) as count, 'tableA' as table_name FROM TABLEA WHERE id = '1'
union all
SELECT COUNT(*), 'tableB' FROM TABLEB WHERE id = '1'
If you want the separate counts in a single row, you can use subqueries
SELECT
(SELECT COUNT(*) FROM TABLE A WHERE id = '1') a_count,
(SELECT COUNT(*) FROM TABLE B WHERE id = '1') b_count;
You could do it like:
select count(*)
from (
select id from t1 where id = 1
union all
select id from t2 where id = 1
) as t
Another alternative is:
select sum(cnt)
from (
select count(*) as cnt from t1 where id = 1
union all
select count(*) as cnt from t2 where id = 1
) as t
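For comparison, here is a sketch that runs both shapes, the per-table counts in one row and the combined total, against two small tables. It uses Python's sqlite3 as a stand-in for the real database, and the table contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, time TEXT);
    CREATE TABLE t2 (id INTEGER, time TEXT);
    INSERT INTO t1 VALUES (1, '10:00'), (1, '11:00'), (2, '12:00');
    INSERT INTO t2 VALUES (1, '09:00'), (3, '13:00');
""")

# Separate counts in a single row, via scalar subqueries.
separate = conn.execute("""
    SELECT
        (SELECT COUNT(*) FROM t1 WHERE id = 1) AS a_count,
        (SELECT COUNT(*) FROM t2 WHERE id = 1) AS b_count
""").fetchone()

# One combined total, summing the per-table counts.
total = conn.execute("""
    SELECT SUM(cnt)
    FROM (
        SELECT COUNT(*) AS cnt FROM t1 WHERE id = 1
        UNION ALL
        SELECT COUNT(*) AS cnt FROM t2 WHERE id = 1
    ) AS t
""").fetchone()[0]

print(separate, total)
```

Either way the database still scans both tables, so the gain is mostly one round trip instead of two rather than less work per query.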
I am having trouble with some logic here. I am trying to get a count of rows where S.ID is not in my subquery.
COUNT(CASE WHEN S.ID IN
(SELECT DISTINCT S.ID FROM...)
THEN 1 ELSE 0 END)
I am receiving the error:
Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
How to fix this or an alternative?
Maybe something like this?
SELECT COUNT(*) FROM .... WHERE ID NOT IN (SELECT DISTINCT ID FROM ...)
Using EXISTS :
SELECT COUNT(*) FROM table1 t WHERE NOT EXISTS (SELECT * FROM table2 WHERE ID = t.ID)
Use the following construct with a CTE:
with cte as
(
select
case
when S.ID in (select L.ID from LookupTable L) then 1
else 0
end
as SID
from MyTable S)
-- sum, not count: count(SID) would also count the rows flagged 0
select sum(SID) as SIDCOUNT from cte;
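One thing worth checking separately from the error message: COUNT counts every non-NULL value, so a CASE with ELSE 0 counts all rows, matched or not. The sketch below uses Python's sqlite3 with hypothetical MyTable and LookupTable contents. SQLite accepts a subquery inside an aggregate, which SQL Server rejects with the error quoted in the question, so the middle query only illustrates the counting behaviour and is not a fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (ID INTEGER);
    CREATE TABLE LookupTable (ID INTEGER);
    INSERT INTO MyTable VALUES (1), (2), (3), (4), (5);
    INSERT INTO LookupTable VALUES (2), (4);
""")

# Rows of MyTable with no match in LookupTable.
not_exists_count = conn.execute("""
    SELECT COUNT(*)
    FROM MyTable s
    WHERE NOT EXISTS (SELECT 1 FROM LookupTable l WHERE l.ID = s.ID)
""").fetchone()[0]

# Pitfall: ELSE 0 is still a non-NULL value, so COUNT includes it.
count_with_else = conn.execute("""
    SELECT COUNT(CASE WHEN s.ID IN (SELECT l.ID FROM LookupTable l) THEN 1 ELSE 0 END)
    FROM MyTable s
""").fetchone()[0]

# Flag each row in a derived table first, then aggregate the flag.
flag_sum = conn.execute("""
    SELECT SUM(flag)
    FROM (
        SELECT CASE WHEN s.ID IN (SELECT l.ID FROM LookupTable l) THEN 0 ELSE 1 END AS flag
        FROM MyTable s
    ) AS x
""").fetchone()[0]

print(not_exists_count, count_with_else, flag_sum)
```

With five rows in MyTable and two of them present in LookupTable, the NOT EXISTS count and the summed flag both give 3, while COUNT with ELSE 0 gives 5. Computing the flag in a derived table or CTE before aggregating is the same idea as the CTE approach, and it keeps the subquery out of the aggregate.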