COUNT(column) returns COUNT(*) - sql

I am using this website to practice SQL. I've got this query:
SELECT DISTINCT maker
FROM Product
GROUP BY maker
HAVING COUNT(type) = 1
AND COUNT(model) > 1
For some reason both count aggregates return the same value--as if they were COUNT(*)--but this isn't what I'm expecting. Please explain why and, if it's not too much trouble, what the correct approach is.

Your having clause is:
HAVING COUNT(type) = 1 AND COUNT(model) > 1
Each component counts the rows in which that column is not NULL. So, if type contained 200 NULLs and 100 '1's, COUNT(type) would be 100. COUNT(*), in this case, would return the number of rows, or 300.
Perhaps you want to count the number of distinct values in each column. In that case, you can use:
HAVING COUNT(DISTINCT type) = 1 AND COUNT(DISTINCT model) > 1
In practice, though, COUNT(DISTINCT) usually uses more resources than other aggregation functions. The following does the same thing and often performs better:
HAVING min(type) = max(type) and min(model) < max(model)
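To see the three counts side by side, here is a minimal sketch that runs the queries through Python's built-in sqlite3 module. The sample rows are hypothetical (only the table and column names come from the question), and SQLite stands in for whatever engine the practice site uses.

```python
import sqlite3

# Hypothetical sample rows; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (maker TEXT, model TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [
        ("A", "1001", "PC"),
        ("A", "1002", "PC"),
        ("B", "2001", "PC"),
        ("B", "2002", "Laptop"),
        ("C", "3001", None),  # NULL type: skipped by COUNT(type)
        ("C", "3002", "Printer"),
    ],
)

# COUNT(*) counts rows, COUNT(type) counts non-NULL values,
# COUNT(DISTINCT type) counts different non-NULL values.
counts = conn.execute(
    """
    SELECT maker, COUNT(*), COUNT(type), COUNT(DISTINCT type)
    FROM Product
    GROUP BY maker
    ORDER BY maker
    """
).fetchall()


def makers(having):
    """Return the makers whose group satisfies the given HAVING condition."""
    sql = f"SELECT maker FROM Product GROUP BY maker HAVING {having} ORDER BY maker"
    return [row[0] for row in conn.execute(sql)]


with_distinct = makers("COUNT(DISTINCT type) = 1 AND COUNT(DISTINCT model) > 1")
with_min_max = makers("MIN(type) = MAX(type) AND MIN(model) < MAX(model)")

print(counts)         # [('A', 2, 2, 1), ('B', 2, 2, 2), ('C', 2, 1, 1)]
print(with_distinct)  # ['A', 'C']
print(with_min_max)   # ['A', 'C']
```

Maker C shows the difference between COUNT(*) and COUNT(type), and both HAVING variants pick the same makers because MIN and MAX also ignore NULLs.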

The COUNT() aggregate function counts the rows of the table you are querying (here, the Product table).
Which column you pass makes no difference unless that column contains NULLs, because COUNT(column) skips only the NULL values.
With no NULLs in type or model, both counts equal COUNT(*), which is the output you described.
That is completely normal.

Related

How to get 0 if no row found from sql query in sql server

I am getting a blank value from this SQL Server query:
SELECT TOP 1 Amount from PaymentDetails WHERE Id = '5678'
No row matches, which is why it returns blank. When there is no row, I want it to return 0 instead.
I already tried COALESCE, but it does not work.
How can I solve this?
You are selecting an arbitrary amount, so one method is aggregation:
SELECT COALESCE(MAX(Amount), 0)
FROM PaymentDetails
WHERE Id = '5678';
Note that if id is a number, then don't use single quotes for the comparison.
To be honest, I would expect SUM() to be more useful than an arbitrary value:
SELECT COALESCE(SUM(Amount), 0)
FROM PaymentDetails
WHERE Id = '5678';
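A minimal sketch of why the aggregate form works, run through Python's sqlite3 module as a stand-in for SQL Server (the single sample row and the column types are assumptions). A plain SELECT yields no row at all, so a COALESCE on the column has nothing to act on; an aggregate without GROUP BY always yields exactly one row, whose NULL can then be replaced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types and the sample row are assumptions for illustration.
conn.execute("CREATE TABLE PaymentDetails (Id TEXT, Amount REAL)")
conn.execute("INSERT INTO PaymentDetails VALUES ('1234', 25.0)")

# No matching row: the result set is empty, so there is no value to coalesce.
plain = conn.execute(
    "SELECT COALESCE(Amount, 0) FROM PaymentDetails WHERE Id = '5678'"
).fetchall()

# An aggregate over zero rows still returns one row (containing NULL),
# which COALESCE then turns into 0.
agg_max = conn.execute(
    "SELECT COALESCE(MAX(Amount), 0) FROM PaymentDetails WHERE Id = '5678'"
).fetchone()[0]
agg_sum = conn.execute(
    "SELECT COALESCE(SUM(Amount), 0) FROM PaymentDetails WHERE Id = '5678'"
).fetchone()[0]

print(plain)    # []
print(agg_max)  # 0
print(agg_sum)  # 0
```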
You can wrap the subquery in an ISNULL:
SELECT ISNULL((SELECT TOP 1 Amount from PaymentDetails WHERE Id = '5678' ORDER BY ????),0) AS Amount;
Don't forget to add a column (or columns) to your ORDER BY as otherwise you will get inconsistent results when more than one row has the same value for Id. If Id is unique, however, then remove both the TOP and ORDER BY as they aren't needed.
You should never, however, use TOP without an ORDER BY unless you are "happy" with inconsistent results.

Using the total of a column of the queried table in a case when (Hive)

Simplified example:
In hive, I have a table t with two columns:
Name, Value
Bob, 2
Betty, 4
Robb, 3
I want to do a case when that uses the total of the Value column:
Select
Name
, CASE
When value>0.5*sum(value) over () THEN '0'
When value>0.9*sum(value) over () THEN '1'
ELSE '2'
END as var
From table
I don’t like the fact that sum(value) over () is computed twice. Is there a way to compute it only once? Added twist: I want to do this in one query, so without declaring user variables.
I was thinking of scalar queries:
With total as
(Select sum(value) from table)
Select
Name
, CASE
When value>0.5*(select * from total) THEN '0'
When value>0.9*(select * from total) THEN '1'
ELSE '2'
END as var
From table;
But this doesn’t work.
TLDR: Is there a way to simplify the first query without user variables ?
Don't worry about that. Let the optimizer worry about it. But, you can use a subquery or CTE if you don't want to repeat the expression:
select Name,
(case when value > 0.5 * total then '0'
when value > 0.9 * total then '1'
else '2'
end) as var
From (select t.*, sum(value) over () as total
from table t
) t;
Cross join a subquery that fetches the sum to the table:
Select
t.Name
, CASE
When t.value>0.9*tt.value THEN '1'
When t.value>0.5*tt.value THEN '0'
ELSE '2'
END as var
From table t cross join (select sum(value) value from table) tt
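Note that the order of the WHEN clauses in the CASE expression is changed here. In the original order the second branch can never succeed, because any value above 0.9 times the total is also above 0.5 times the total and is caught by the first branch.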
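The effect of the branch order can be checked with a small sketch. It uses Python's sqlite3 module rather than Hive, a table named t, and two made-up rows chosen so that one value exceeds 90% of the total; all of these are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, value INTEGER)")
# Hypothetical data: the total is 100, so Ann is above both thresholds.
conn.executemany("INSERT INTO t VALUES (?, ?)", [("Ann", 95), ("Ben", 5)])

# The total is computed once in the cross-joined subquery.
QUERY = """
SELECT t.name,
       CASE {branches} ELSE '2' END AS var
FROM t CROSS JOIN (SELECT SUM(value) AS total FROM t) tt
ORDER BY t.name
"""

# Branch order from the question: the 0.5 test shadows the 0.9 test.
original_order = (
    "WHEN t.value > 0.5 * tt.total THEN '0' "
    "WHEN t.value > 0.9 * tt.total THEN '1'"
)
# Most restrictive test first, so the '1' branch is reachable.
fixed_order = (
    "WHEN t.value > 0.9 * tt.total THEN '1' "
    "WHEN t.value > 0.5 * tt.total THEN '0'"
)

original = conn.execute(QUERY.format(branches=original_order)).fetchall()
fixed = conn.execute(QUERY.format(branches=fixed_order)).fetchall()

print(original)  # [('Ann', '0'), ('Ben', '2')]
print(fixed)     # [('Ann', '1'), ('Ben', '2')]
```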
Since I/O is the major factor that slows down Hive queries, we should strive to reduce the number of stages to get better performance.
So it's better not to use a sub-query or CTE here.
Try this SQL with a global window clause:
select
name,
case
when value > 0.5*sum(value) over w then '0'
when value > 0.9*sum(value) over w then '1'
else '2'
end as var
from my_table
window w as (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
In this case window clause is the recommended way to reduce repetition of code.
Both the windowing and the sum aggregation will be computed only once. You can run explain select..., confirming that only ONE meaningful MR stage will be launched.
Edit:
1. A simple select clause on a subquery is not something to worry about. It can be pushed down to the last phase of the subquery, so as to avoid an additional MR stage.
2. Two identical aggregations residing in the same query block will only be evaluated once. So don’t worry about potential repeated calculation.

Comparing 2 values in the Same column

I have a table like following :
Orderserialno SKU Units
1234-6789 2x3 5
1234-6789 4x5 7
1334-8905 4x5 2
1334-8905 6x10 2
I need to get the count of distinct orderserialno values where Units are not equal within an orderserialno. There could be more combinations of SKUs in an order than what I have mentioned, but the eventual goal is to get those orders where the units corresponding to the various SKUs (in that order) are not equal.
In the above case I should get answer as 1 as orderserialno 1234-6789 has different units.
Thanks
This is a relatively simple GROUP BY query. First, collapse the table to its distinct (Orderserialno, Units) pairs:
SELECT Orderserialno, Units
FROM MyTable
GROUP BY Orderserialno, Units
An order whose units differ appears more than once in this result. To project out the Units and keep only those orders, nest this query and filter on the number of pairs per order:
SELECT Orderserialno FROM (
SELECT Orderserialno, Units
FROM MyTable
GROUP BY Orderserialno, Units
) pairs
GROUP BY Orderserialno
HAVING COUNT(1) > 1
If you need only the total count of Orderserialnos with multiple units, wrap this query in one more level: SELECT COUNT(1) FROM ( ... ) orders.
To get the list of such order numbers, use an aggregation query:
select OrderSerialNo
from t
group by OrderSerialNo
having min(Units) <> max(Units)
This uses a trick to see if the units value changes. You can use count(distinct), but that usually incurs a performance overhead. Instead, just compare the minimum and maximum values. If they are different, then the value is not constant.
To get the count, use this as a subquery:
select count(*)
from (select OrderSerialNo
from t
group by OrderSerialNo
having min(Units) <> max(Units)
) t
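As a quick check, the sketch below loads the four rows from the question into an in-memory SQLite database (a stand-in for the unnamed engine; the column types are assumed) and compares the min/max test with the COUNT(DISTINCT) form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumptions; the rows are the ones from the question.
conn.execute("CREATE TABLE t (OrderSerialNo TEXT, SKU TEXT, Units INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("1234-6789", "2x3", 5),
        ("1234-6789", "4x5", 7),
        ("1334-8905", "4x5", 2),
        ("1334-8905", "6x10", 2),
    ],
)


def count_orders(having):
    """Count the orders whose group satisfies the given HAVING condition."""
    sql = f"""
        SELECT COUNT(*)
        FROM (SELECT OrderSerialNo
              FROM t
              GROUP BY OrderSerialNo
              HAVING {having}) x
    """
    return conn.execute(sql).fetchone()[0]


by_min_max = count_orders("MIN(Units) <> MAX(Units)")
by_distinct = count_orders("COUNT(DISTINCT Units) > 1")

print(by_min_max)   # 1
print(by_distinct)  # 1
```

Only order 1234-6789 has differing units (5 and 7), so both forms return 1, the answer the question expects.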

Coalesce not evaluating second argument?

I am trying to run the following query:
SELECT COALESCE(count(percent_cov), 0)
FROM sample_cov
WHERE target = 542
GROUP BY percent_cov
HAVING percent_cov < 10
Basically, I want to show the number of times this statistic was < 10, and return 0 rather than null if the count was 0. If the count is >0 I get the number I want as the result, however if the count is 0 I still get a null returned. (Same thing if I set the second argument to coalesce as a positive number). What am I doing wrong?
I rewrote your query the way I think you want it:
SELECT count(*) AS ct
FROM sample_cov
WHERE target = 542
AND percent_cov < 10;
count() returns 0 when no matching rows (or no non-null values in the column) are found. No need for coalesce(). I quote the manual on this:
It should be noted that except for count, these functions return a
null value when no rows are selected.
Bold emphasis mine. If you want to return a different value when count() comes back with 0, use a CASE statement.
Also, it's no use to write count(percent_cov) while you have WHERE percent_cov < 10. Only non-null values pass that filter, so count(*) yields the same result and is slightly faster and simpler in this case.
You don't need a GROUP BY clause as you don't group by anything, you are aggregating over the whole table.
You could GROUP BY target, but this would be a different query:
SELECT target, count(*)
FROM sample_cov
WHERE percent_cov < 10
GROUP BY target;
You would need to spell out the expression in the HAVING clause again. Output column names are visible in ORDER BY and GROUP BY clauses, not in WHERE or HAVING.
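The behaviour can be reproduced with a short sketch using Python's sqlite3 module; the engine, the column types, and the sample rows are assumptions. With GROUP BY and HAVING, no group survives, so the query returns no row at all and coalesce() never runs; without them, count() returns a single row containing 0, while sum() over the same empty set is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types and sample rows are assumptions for illustration.
conn.execute("CREATE TABLE sample_cov (target INTEGER, percent_cov REAL)")
conn.executemany(
    "INSERT INTO sample_cov VALUES (?, ?)",
    [(542, 50.0), (542, 80.0), (7, 5.0)],
)

# The query from the question: every group is filtered out by HAVING,
# so there is no output row for COALESCE to work on.
grouped = conn.execute(
    """
    SELECT COALESCE(count(percent_cov), 0)
    FROM sample_cov
    WHERE target = 542
    GROUP BY percent_cov
    HAVING percent_cov < 10
    """
).fetchall()

# The rewritten query: one row is always returned; count is 0, sum is NULL.
ungrouped = conn.execute(
    """
    SELECT count(*), sum(percent_cov)
    FROM sample_cov
    WHERE target = 542
      AND percent_cov < 10
    """
).fetchone()

print(grouped)    # []
print(ungrouped)  # (0, None)
```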

Calculate percents inline in SQL query

SELECT User, COUNT(*) as count FROM Tests WHERE Release = '1.0' GROUP by User;
The above query returns a count per user. However, I would like to convert each count to a percentage of the total number of records, where the total also respects the WHERE clause.
SELECT R1.user, COUNT(*)/R2.COUNT_ALL AS Expr1
FROM Releases R1,
(SELECT COUNT(*) As COUNT_ALL FROM Releases WHERE Release = '1.0') R2
WHERE R1.Release = '1.0'
GROUP BY R1.user, R2.COUNT_ALL
Here's another approach that uses a single SELECT:
SELECT
Tests.User,
Count(IIf([Tests].[Release]='1.0', 1, Null)) / Count(*) AS Percentage
FROM
Tests
GROUP BY
Tests.User
Unlike the approaches suggested earlier, this one will, for better or worse, return records for users having no records in Tests where Release is "1.0". If you don't want these records, you could add a HAVING clause to eliminate them.
Hope this helps.
Like Brian's answer but in standard SQL instead of MS Access only. (One can also use COUNT instead of SUM with NULL where I have 0.)
SELECT
Tests.User,
Sum(CASE WHEN Release='1.0' THEN 1 ELSE 0 END) / Count(*) AS Percentage
FROM
Tests
GROUP BY
Tests.User
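One caveat when porting these queries: in engines that use integer division for integer operands (SQLite, SQL Server, and PostgreSQL among them), dividing one count by another truncates to 0 for any share below 100%. The sketch below shows this in SQLite via Python's sqlite3 module and avoids it by multiplying by 100.0 first. The sample rows are hypothetical, and the scalar subquery for the total is just one possible way to write it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "User" and "Release" are quoted because they are keywords in some engines.
conn.execute('CREATE TABLE Tests ("User" TEXT, "Release" TEXT)')
# Hypothetical rows: four tests for release 1.0, three of them by alice.
conn.executemany(
    "INSERT INTO Tests VALUES (?, ?)",
    [
        ("alice", "1.0"),
        ("alice", "1.0"),
        ("alice", "1.0"),
        ("alice", "2.0"),
        ("bob", "1.0"),
    ],
)

TEMPLATE = """
SELECT "User",
       {numerator} / (SELECT COUNT(*) FROM Tests WHERE "Release" = '1.0')
FROM Tests
WHERE "Release" = '1.0'
GROUP BY "User"
ORDER BY "User"
"""

# Integer / integer: both shares truncate to 0.
truncated = conn.execute(TEMPLATE.format(numerator="COUNT(*)")).fetchall()
# Multiplying by 100.0 forces floating-point arithmetic before the division.
percent = conn.execute(TEMPLATE.format(numerator="100.0 * COUNT(*)")).fetchall()

print(truncated)  # [('alice', 0), ('bob', 0)]
print(percent)    # [('alice', 75.0), ('bob', 25.0)]
```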