TSQL Count Case When In - sql

I am having trouble with some logic here. I am trying to get a count of rows where S.ID is not in my subquery.
COUNT(CASE WHEN S.ID IN
(SELECT DISTINCT S.ID FROM...)
THEN 1 ELSE 0 END)
I am receiving the error:
Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
How can I fix this, or is there an alternative?

Maybe something like this?
SELECT COUNT(*) FROM .... WHERE ID NOT IN (SELECT DISTINCT ID FROM ...)

Using EXISTS:
SELECT COUNT(*) FROM table1 t WHERE NOT EXISTS (SELECT * FROM table2 t2 WHERE t2.ID = t.ID)

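Use the following construct with a CTE. Flag each row first, then aggregate the flags outside the CTE. Use SUM rather than COUNT, because COUNT would also count the rows flagged 0.
with cte as
(
select
case
when S.ID in (SELECT ID FROM LookupTable) then 1
else 0
end
as SID
from MyTable S)
select sum(SID) as SIDCOUNT from cte;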
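One detail of the CASE-based approach is worth checking on a small data set: COUNT counts every non-NULL value, so a CASE with ELSE 0 counts all rows, while SUM adds up only the flagged ones. The following is a minimal T-SQL sketch, not the asker's real schema. The rows inlined with VALUES and the names flagged and NotInLookup are hypothetical, and the flag uses NOT IN because the question asks for rows that are not in the subquery.

```sql
-- Hypothetical data, inlined with VALUES so the sketch runs on its own (T-SQL).
WITH MyTable AS (
    SELECT ID FROM (VALUES (1), (2), (3), (4)) AS v(ID)
),
LookupTable AS (
    SELECT ID FROM (VALUES (2), (4)) AS v(ID)
),
flagged AS (
    -- The subquery is evaluated per row here, outside any aggregate,
    -- which avoids the "aggregate ... containing ... a subquery" error.
    SELECT CASE WHEN S.ID NOT IN (SELECT L.ID FROM LookupTable L)
                THEN 1 ELSE 0 END AS NotInLookup
    FROM MyTable S
)
SELECT SUM(NotInLookup)   AS NotInCount,    -- 2 (IDs 1 and 3)
       COUNT(NotInLookup) AS CountOfFlags   -- 4, because 0 is not NULL
FROM flagged;
```

If LookupTable.ID can contain NULL, NOT IN no longer behaves this way, and NOT EXISTS is the safer choice.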

Related

SQL Error 10249 Hive with subqueries in SELECT clause

I am trying to write a query like
select (select count(1) from tableA), (select count(1) from tableB)
in Hive, and I get this error:
10249 SubQuery expressions are only allowed as where and having clause.
I think the syntax is right. Is there any suggestion about this? Thank you!
Some support for subqueries in the SELECT list was added in Hive 2.3 and later; see Jira HIVE-16091 and the docs section "Subqueries in SELECT".
If your Hive version does not support such subqueries in the SELECT list (the docs say such subqueries are not supported yet), move the subqueries to the FROM clause and use CROSS JOIN, or use UNION ALL + aggregation.
Using CROSS JOIN:
select a.cnt as count_a, b.cnt as count_b
from
(select count(1) as cnt from tableA) a
cross join
(select count(1) as cnt from tableB) b
Using UNION ALL + aggregation:
select count(case when src='A' then 1 else NULL end) count_a,
count(case when src='B' then 1 else NULL end) count_b
from
(
select 'A' as src from tableA
union all
select 'B' as src from tableB
)s

Does Presto support NOT IN constructs?

I have a query of the form:
SELECT DISTINCT person_id
FROM my_table
WHERE person_id NOT IN (SELECT person_id FROM my_table WHERE status = 'hungry')
In my_table there are multiple rows for each person, and I want to exclude those people who have ever had status "hungry". This is a construct I regard as standard and have used in other SQL dialects, but this brings me back an empty result set in Athena.
On the other hand, the plain old IN construction works as expected.
Can anyone explain how I can write this query in Presto? I found another article on SO that seems to imply it works correctly, so I am a bit nonplussed.
Do not use NOT IN with a subquery. If any value returned by the subquery is NULL, NOT IN returns no rows. Note: this is how SQL works, not a peculiarity of any particular database.
Instead, use NOT EXISTS:
SELECT DISTINCT t.person_id
FROM my_table t
WHERE NOT EXISTS (SELECT 1
FROM my_table t2
WHERE t2.status = 'hungry' AND
t2.person_id = t.person_id
);
Actually, I might suggest aggregation for this instead -- you are already doing aggregation essentially with the SELECT DISTINCT:
select person_id
from my_table t
group by person_id
having sum(case when status = 'hungry' then 1 else 0 end) = 0;
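To see why a single NULL in the subquery empties the NOT IN result, here is a minimal sketch in Presto syntax. The three inlined rows are hypothetical. The only assumption that matters is that one row with status 'hungry' has a NULL person_id.

```sql
-- Hypothetical rows, inlined with VALUES; one 'hungry' row has a NULL person_id.
WITH my_table (person_id, status) AS (
    VALUES
        (1, 'hungry'),
        (2, 'full'),
        (CAST(NULL AS integer), 'hungry')
)
SELECT DISTINCT person_id
FROM my_table
WHERE person_id NOT IN (SELECT person_id FROM my_table WHERE status = 'hungry');
-- Returns no rows. For person 2 the predicate expands to
--   2 <> 1 AND 2 <> NULL  =>  TRUE AND UNKNOWN  =>  UNKNOWN,
-- and WHERE keeps only rows whose predicate is TRUE.
```

The aggregation form above avoids the issue because it never compares person_id against a list that contains NULL.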
Using conditional aggregation:
SELECT person_id
FROM my_table m
GROUP BY person_id
HAVING COUNT(CASE WHEN status='hungry' THEN 1 END)=0
I would use aggregation:
SELECT person_id
FROM my_table
GROUP BY person_id
HAVING SUM(CASE WHEN status = 'hungry' THEN 1 ELSE 0 END) = 0;
If you want the full rows, use NOT EXISTS; NOT IN would return no rows if the subquery returns a NULL:
SELECT DISTINCT t.person_id
FROM my_table t
WHERE NOT EXISTS (SELECT 1
FROM my_table t1
WHERE t1.status = 'hungry' AND
t1.person_id = t.person_id
);
I feel compelled to point out that you can solve this by just excluding the NULLs explicitly from the subquery, and sticking with the NOT IN construct:
SELECT DISTINCT person_id
FROM my_table
WHERE person_id NOT IN (SELECT person_id FROM my_table WHERE status = 'hungry' AND person_id IS NOT NULL)

Select duplicated data from table

Query
select * from table1
where having count(reference)>1
I want to select all columns for the rows that contain duplicated data. Any idea why my query is not working?
Below is my expected result.
You can use the window function count to find the number of rows per id and reference, then filter for those with a count greater than 1.
;with cte as (
select t.*, count(*) over (partition by id, reference) cnt
from table1 t
)
select * from cte where cnt > 1;
In the above solution, I have assumed that name and id have a one-to-one correspondence (which is true for your given data). If that's not the case, add name to the partition by clause as well:
;with cte as (
select t.*, count(*) over (partition by name, id, reference) cnt
from table1 t
)
select * from cte where cnt > 1;
I might actually approach this by using a subquery with GROUP BY:
SELECT t1.*
FROM table1 t1
INNER JOIN
(
SELECT Name, ID, reference
FROM table1
GROUP BY Name, ID, reference
HAVING COUNT(*) > 1
) t2
ON t1.Name = t2.Name AND
t1.ID = t2.ID AND
t1.reference = t2.reference
Try this: first get the count per partition, then keep the rows with a count greater than 1.
select No, Name, ID, Reference
from (select count(*) over (partition by name, ID, reference) cnt, table1.* from table1) t
where cnt>1
The easy way (although maybe not the best for performance) would be:
select * from table1 where reference in (
select reference from table1 group by reference having count(*)>1
)
In the subselect you have the duplicated references, and in the outer select you get all the rows for those references.

How to get Original Rows filtered by a HAVING Condition?

What is the method in T-SQL to select the original rows filtered by a HAVING condition? For example, if I have
A|B
10|1
11|2
10|3
How would I get all the values of B (not an average or some other summary statistic), grouped by A, where the count of occurrences of A is greater than or equal to 2?
Actually, you have several options to choose from:
1. You could make a subquery out of your original having statement and join it back to your table
SELECT *
FROM YourTable yt
INNER JOIN (
SELECT A
FROM YourTable
GROUP BY
A
HAVING COUNT(*) >= 2
) cnt ON cnt.A = yt.A
2. another equivalent solution would be to use a WITH clause
;WITH cnt AS (
SELECT A
FROM YourTable
GROUP BY
A
HAVING COUNT(*) >= 2
)
SELECT *
FROM YourTable yt
INNER JOIN cnt ON cnt.A = yt.A
3. or you could use an IN statement
SELECT *
FROM YourTable yt
WHERE A IN (SELECT A FROM YourTable GROUP BY A HAVING COUNT(*) >= 2)
A self join will work (written here in T-SQL, which supports neither GROUP BY 1 nor USING):
select t.B
from YourTable t
join (
select A
from YourTable
group by A
having count(1) > 1
) s
on s.A = t.A;
You can use a window function (no joins, only one table scan):
select * from (
select *, cnt=count(*) over(partition by A) from YourTable
) as a
where cnt >= 2
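The window-function approach can be checked against the sample data from the question. This is a minimal T-SQL sketch in which the three rows are inlined with VALUES instead of being read from a real table. The derived-table alias counted is a hypothetical name.

```sql
-- Sample rows from the question, inlined so the query runs on its own (T-SQL).
SELECT A, B
FROM (
    SELECT A, B,
           COUNT(*) OVER (PARTITION BY A) AS cnt   -- occurrences of each A
    FROM (VALUES (10, 1), (11, 2), (10, 3)) AS YourTable(A, B)
) AS counted
WHERE cnt >= 2;
-- Keeps the two rows with A = 10, i.e. (10, 1) and (10, 3);
-- the row (11, 2) is dropped because A = 11 occurs only once.
```

The filter has to sit in an outer query because window functions are not allowed directly in a WHERE clause.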

Select and sums from another table. What's wrong with this SQL?

What's wrong with this SQL?
SELECT Id, (select SUM(VALUE) from SomeTable) AS SumValue, GETDATE()
FROM MyTable
WHERE SumValue > 0
You cannot reference a column alias defined in the SELECT clause elsewhere in the same query, except in ORDER BY.
The query needs to be wrapped in a subquery:
SELECT Id, SumValue, GETDATE()
FROM (
SELECT Id, (select SUM(VALUE) from [TABLE]) AS SumValue
FROM MyTable
) X
WHERE SumValue > 0
That is the general case. For your specific query, it doesn't make sense because the subquery is not correlated to the outer query, so either NO rows show, or ALL rows show (with the same SumValue). I will simply assume you have simplified the query a lot since a table name of "table" doesn't really work.
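The alias rule can be tried without any real tables. This is a minimal T-SQL sketch under stated assumptions: the rows are hypothetical and inlined with VALUES, and the column name Amount stands in for the VALUE column of the question.

```sql
-- Hypothetical rows, inlined with VALUES so the sketch runs on its own (T-SQL).
SELECT Id, SumValue, GETDATE() AS [now]
FROM (
    SELECT m.Id,
           (SELECT SUM(v.Amount)
            FROM (VALUES (5), (7)) AS v(Amount)) AS SumValue  -- alias is defined here
    FROM (VALUES (1), (2)) AS m(Id)
) AS X
WHERE SumValue > 0;   -- and is visible one query level up
-- Both Ids are returned with SumValue = 12, because the subquery is not correlated.
```

Moving the WHERE clause into the inner query fails with an invalid column name error, since WHERE is evaluated before the SELECT list assigns the alias.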
I would probably rewrite like this:
SELECT a.Id, b.SumValue, GETDATE() as [now]
FROM MyTable a
Join
(
select id, SUM(VALUE) as [SumValue]
from [TABLE]
Group by id
)b on a.Id = b.Id
WHERE b.SumValue > 0
This assumes that the values you are totalling relate to the Id in your table.
Another way is to repeat the subquery in the WHERE clause:
SELECT Id, (select SUM(VALUE) from [TABLE]) AS SumValue, GETDATE()
FROM MyTable
WHERE (select SUM(VALUE) from [TABLE]) > 0