What does this part of my SQL query mean?

sum( (record_id is NULL AND joined.table_id is NULL)::int )
I know that sum returns the sum of the column entries, but what does this expression (... AND ...) return? Can it be compared with an expression like (.. + ..)? And what does ()::int do? Does it convert the result to int?
I don't know what this expression returns; on my sample data it returned an integer.

It is a more complicated way to write
count(*) FILTER (WHERE record_id IS NULL
AND joined.table_id IS NULL)

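(record_id is NULL AND joined.table_id is NULL)::int returns 1 if both record_id and joined.table_id are NULL, and 0 otherwise. ::int is PostgreSQL shorthand for CAST(... AS integer). Because IS NULL itself never yields NULL, the cast never produces NULL.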
Therefore, sum( (record_id is NULL AND joined.table_id is NULL)::int ) will return the number of rows in which both record_id and joined.table_id are null.
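The equivalence can be checked with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL and a hypothetical flattened table joined_rows in place of the real join, with made-up data. SQLite has no ::int, so CAST(... AS INTEGER) replaces it, and COUNT(CASE WHEN ... THEN 1 END) stands in for count(*) FILTER (WHERE ...).

```python
import sqlite3

# Hypothetical table joined_rows flattens the question's join; data is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE joined_rows (record_id INTEGER, table_id INTEGER)")
conn.executemany(
    "INSERT INTO joined_rows VALUES (?, ?)",
    [(None, None), (1, None), (None, 2), (None, None), (3, 4)],
)

# SUM over the 0/1 cast versus COUNT over a CASE that is NULL for other rows
# (COUNT skips NULLs, which is what FILTER achieves in PostgreSQL).
summed, counted = conn.execute(
    """
    SELECT SUM(CAST((record_id IS NULL AND table_id IS NULL) AS INTEGER)),
           COUNT(CASE WHEN record_id IS NULL AND table_id IS NULL THEN 1 END)
    FROM joined_rows
    """
).fetchone()

print(summed, counted)  # 2 2
```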

Related

AWS Athena/Presto SQL: Having trouble getting null values

I am running a query in AWS Athena to get some totals, but I have trouble with a column whose values can be NULL. This column sometimes contains the value [], which should also be treated as NULL.
My query:
SELECT COUNT() AS total_rows,
COUNT(DISTINCT sfattachmentid) AS total_attachments,
(SELECT COUNT(DISTINCT salesforce_opportunity_id) FROM "athena_decisionengine"."transactions") AS total_opps,
(SELECT COUNT(DISTINCT salesforce_opportunity_id) FROM "athena_decisionengine"."transactions" WHERE (oldcategory IS NOT NULL OR oldcategory != '[]')) AS opp_w_changed,
(SELECT COUNT(DISTINCT salesforce_opportunity_id) FROM "athena_decisionengine"."transactions" WHERE (oldcategory IS NULL OR oldcategory = '[]')) AS opp_without_changed,
SUM(CASE WHEN oldcategory != '' THEN 1 ELSE 0 END) AS oldCategory_changed,
SUM(CASE WHEN oldcategory IS NULL THEN 1 ELSE 0 END) AS oldCategory_blank
FROM "athena_decisionengine"."transactions"
However, the value of opp_without_changed seems wrong. With total_opps at 1282 and opp_w_changed at 1110, I would expect opp_without_changed to be 172, but it shows 1282, which looks like the total number of unique salesforce_opportunity_id values. It is as if the filter
(oldcategory IS NULL OR oldcategory = '[]')
was not working.
There are two problems in your query:
1. wrong boolean expressions;
2. the wrong assumption that
count(distinct) = count(distinct NULL) + count(distinct NOT NULL)
The boolean expression oldcategory IS NOT NULL OR oldcategory != '[]' allows any value except NULL. It allows '[]' as well, because '[]' is not NULL. If you want to filter out both NULLs and '[]', the correct expression is oldcategory != '[]'. It rejects NULLs too, because a comparison with NULL is never TRUE (NULL != '[]' evaluates to NULL). The column may also contain empty strings rather than NULLs. To filter those out as well, use
oldcategory not in ('[]','') --does not allow NULL, '[]', ''
The second expression, also matching empty strings, becomes:
oldcategory IS NULL OR oldcategory in ('[]','') --allows NULL, '[]', '' only
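To see which values each predicate lets through, here is a minimal sketch. It assumes Python's built-in sqlite3 module in place of Athena and a hypothetical table t with one row per interesting value. A row survives a WHERE clause only when the predicate is TRUE, so a NULL result rejects the row just like FALSE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, oldcategory TEXT)")
conn.executemany(
    "INSERT INTO t (oldcategory) VALUES (?)",
    [(None,), ("[]",), ("",), ("abc",)],
)

def admitted(predicate):
    """Return the oldcategory values for which the WHERE predicate is TRUE."""
    rows = conn.execute(
        f"SELECT oldcategory FROM t WHERE {predicate} ORDER BY id"
    ).fetchall()
    return [r[0] for r in rows]

# Original filter: only NULL is rejected, so '[]' and '' slip through.
buggy = admitted("oldcategory IS NOT NULL OR oldcategory != '[]'")
# Corrected filters: NOT IN also rejects NULL, since NULL NOT IN (...) is NULL.
changed = admitted("oldcategory NOT IN ('[]', '')")
unchanged = admitted("oldcategory IS NULL OR oldcategory IN ('[]', '')")

print(buggy)      # ['[]', '', 'abc']
print(changed)    # ['abc']
print(unchanged)  # [None, '[]', '']
```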
Also, you are counting DISTINCT salesforce_opportunity_id values, not rows satisfying the WHERE condition. The same salesforce_opportunity_id can have records with NULL, empty, '[]' and other values, so these sets can intersect, and you should NOT expect that
count (distinct salesforce_opportunity_id ) = count(distinct salesforce_opportunity_id where oldcategory is NULL) + count (distinct salesforce_opportunity_id where oldcategory is NOT NULL)
DISTINCT counts are not additive. If you want to check that TOTAL = NULLs + NOT NULLs, count everything without DISTINCT and the numbers should match.
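The non-additivity can be reproduced with a minimal sketch. It assumes Python's built-in sqlite3 module in place of Athena and four made-up rows. Opportunity 1 has both a NULL row and a changed row, so it is counted in both filtered groups. Plain row counts still partition the table, because every row matches exactly one of the two corrected predicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (salesforce_opportunity_id INTEGER, oldcategory TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, None), (1, "x"), (2, "[]"), (3, "y")],
)

CHANGED = "oldcategory NOT IN ('[]', '')"
UNCHANGED = "oldcategory IS NULL OR oldcategory IN ('[]', '')"

def scalar(sql):
    return conn.execute(sql).fetchone()[0]

def distinct_opps(where="1 = 1"):
    return scalar(
        "SELECT COUNT(DISTINCT salesforce_opportunity_id) "
        f"FROM transactions WHERE {where}"
    )

def row_count(where="1 = 1"):
    return scalar(f"SELECT COUNT(*) FROM transactions WHERE {where}")

# DISTINCT counts overlap: 2 + 2 != 3.
total_distinct = distinct_opps()
changed_distinct = distinct_opps(CHANGED)
unchanged_distinct = distinct_opps(UNCHANGED)

# Row counts add up: 2 + 2 == 4.
total_rows = row_count()
changed_rows = row_count(CHANGED)
unchanged_rows = row_count(UNCHANGED)

print(total_distinct, changed_distinct, unchanged_distinct)  # 3 2 2
print(total_rows, changed_rows, unchanged_rows)  # 4 2 2
```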
For most use cases, giving those NULL values a default value should be enough:
coalesce(oldcategory,'[]') not in (a,b,c,d)

I need Total return NULL only If all values are NULL

I have a query like this:
SELECT A,B,C,D, (A+B+C+D) as TOTAL
FROM TABLES
If A, B, C and D are all NULL, I need TOTAL to be NULL.
But if any one of them is not NULL, the others should change from NULL to zero,
and TOTAL should be A+B+C+D.
Right now I do it this way:
SELECT A,B,.. CASE WHEN (A IS NULL) AND (B IS NULL) AND ... THEN NULL
ELSE ISNULL(A,0) + ISNULL(B,0) +... END
But it is very long, and I have many such totals in this query.
What is the best way to solve this?
The semantics you want are the same as those provided by SUM.
SELECT A,B,C,D,
(SELECT SUM(val)
FROM (VALUES(A),
(B),
(C),
(D)) T (val)) AS Total
FROM YourTable
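The SUM semantics can be checked with a minimal sketch. It assumes Python's built-in sqlite3 module and a hypothetical id column for regrouping. Instead of the correlated VALUES subquery above, it swaps in a non-correlated variant that unpivots A..D with UNION ALL and groups by id. The NULL handling comes from SUM in both forms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical id key so the unpivoted values can be regrouped per row.
conn.execute(
    "CREATE TABLE your_table (id INTEGER, a INTEGER, b INTEGER, c INTEGER, d INTEGER)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?, ?)",
    [(1, None, None, None, None), (2, 5, None, 3, None), (3, 1, 2, 3, 4)],
)

# SUM skips NULLs, and a group whose values are all NULL sums to NULL.
rows = conn.execute(
    """
    SELECT id, SUM(val) AS total
    FROM (
        SELECT id, a AS val FROM your_table
        UNION ALL SELECT id, b FROM your_table
        UNION ALL SELECT id, c FROM your_table
        UNION ALL SELECT id, d FROM your_table
    )
    GROUP BY id
    ORDER BY id
    """
).fetchall()

print(rows)  # [(1, None), (2, 8), (3, 10)]
```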
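I would use the COALESCE function, which "evaluates the arguments in order and returns the current value of the first expression that initially does not evaluate to NULL." COALESCE(A,B,C,D) is NULL only when every column is NULL, so in that case the CASE falls through and returns NULL: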
SELECT
CASE WHEN COALESCE(A,B,C,D) IS NOT NULL THEN
COALESCE(A,0 ) + COALESCE(B,0 )+ COALESCE(C,0 ) + COALESCE(D,0 )
END
FROM TABLES
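The same rule can be checked with a minimal sketch. It assumes Python's built-in sqlite3 module, a hypothetical table t in place of TABLES, and made-up rows. A CASE without ELSE returns NULL when its condition fails, which covers the all-NULL row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER, d INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(None, None, None, None), (5, None, 3, None), (1, 2, 3, 4)],
)

# COALESCE(a, b, c, d) is NULL only if all four are NULL; otherwise
# each NULL is replaced by 0 before adding.
rows = conn.execute(
    """
    SELECT CASE WHEN COALESCE(a, b, c, d) IS NOT NULL THEN
               COALESCE(a, 0) + COALESCE(b, 0) + COALESCE(c, 0) + COALESCE(d, 0)
           END AS total
    FROM t
    ORDER BY rowid
    """
).fetchall()
totals = [r[0] for r in rows]

print(totals)  # [None, 8, 10]
```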
COALESCE() is a function that you can use:
SELECT A,B,..
CASE WHEN COALESCE(A,B,C,D) IS NULL THEN NULL ELSE ISNULL(A,0) + ISNULL(B,0) +... END

Why is SUM in my database query giving NULL?

Suppose I have a table named "Expense" with month-wise expenses, some of which may be NULL. Now I want to get the total yearly expenses with the query below:
Select Sum(January) + Sum (February) ..... (I skipped the rest 10 months)
from Expense
This gives NULL as the result.
How can I avoid this? I think there must be a more convenient way to compute the yearly sum.
Arithmetic operations involving NULL yield NULL, and so do most comparisons. For example:
SELECT 1 + NULL -- NULL
You must convert NULLs to zeros before you can add them with +:
SELECT
COALESCE(SUM(January), 0) +
COALESCE(SUM(February) , 0) +
...
It is also possible to add the columns first and then calculate the sum:
SELECT SUM(
COALESCE(January, 0) +
COALESCE(February, 0) +
...
)
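Both orderings can be compared with a minimal sketch. It assumes Python's built-in sqlite3 module and a hypothetical three-month expense table standing in for the twelve months. February is NULL in every row, so SUM(february) is NULL and poisons the naive total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expense (january INTEGER, february INTEGER, march INTEGER)")
conn.executemany(
    "INSERT INTO expense VALUES (?, ?, ?)",
    [(10, None, 5), (20, None, None)],
)

naive, sum_then_coalesce, coalesce_then_sum = conn.execute(
    """
    SELECT SUM(january) + SUM(february) + SUM(march),
           COALESCE(SUM(january), 0) + COALESCE(SUM(february), 0)
               + COALESCE(SUM(march), 0),
           SUM(COALESCE(january, 0) + COALESCE(february, 0) + COALESCE(march, 0))
    FROM expense
    """
).fetchone()

print(naive, sum_then_coalesce, coalesce_then_sum)  # None 35 35
```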
Be advised that SUM (i) skips NULL values and (ii) returns NULL instead of 0 if all values are NULL:
SELECT SUM(a) FROM (VALUES
(1),
(2),
(NULL)
) AS v(a) -- returns 3 instead of NULL
It will return NULL if all values encountered were NULL:
SELECT SUM(a) FROM (VALUES
(CAST(NULL AS INT)),
(NULL),
(NULL)
) AS v(a) -- returns NULL instead of 0
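The two cases above can be run as a minimal sketch. It assumes Python's built-in sqlite3 module, which names VALUES columns column1, column2, ... rather than accepting the AS v(a) alias list. The last query shows COALESCE turning the all-NULL result into 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def one(sql):
    return conn.execute(sql).fetchone()[0]

some_null = one("SELECT SUM(column1) FROM (VALUES (1), (2), (NULL))")
all_null = one("SELECT SUM(column1) FROM (VALUES (NULL), (NULL), (NULL))")
all_null_defaulted = one(
    "SELECT COALESCE(SUM(column1), 0) FROM (VALUES (NULL), (NULL), (NULL))"
)

print(some_null, all_null, all_null_defaulted)  # 3 None 0
```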
Use the coalesce function to convert NULL to 0, then use SUM:
Select Sum(coalesce(January,0)) + Sum (coalesce(February,0)) ..... (I skipped the rest 10 months)
from Expense
Just use coalesce (with 0 as the second argument) to replace NULLs in all month columns; otherwise you cannot get correct results when aggregating numeric values:
select sum(coalesce(January,0)+coalesce(February,0) ... )
from Expense
That is because you have NULL values. You can use CASE, COALESCE or IIF:
Select SUM(IIF(Col IS NULL, 0, Col))
Select SUM(CASE WHEN Col IS NULL THEN 0 ELSE Col END)
Select COALESCE(Sum(Col), 0)
Any arithmetic operation returns NULL if at least one operand is NULL. SUM itself skips NULLs but returns NULL when every value in the column is NULL. That's why you should use functions like COALESCE, ISNULL (SQL Server) or NVL (Oracle).
You can use ISNULL(SUM(January),0).
Because NULL + value is always NULL, and in your sample some monthly sums are NULL, you can avoid this by adding ISNULL:
Select isnull(Sum(January),0) +
isnull(Sum(February),0)
--..... (I skipped the rest 10 months)
from Expense
Alternatively, you can do it this way:
Select Sum(
isnull(January,0) +
isnull(February,0)
)
--..... (I skipped the rest 10 months)
from Expense

Coalesce function not selecting data value from series when it exists

My code is as follows:
Insert Into dbo.database (Period, Amount)
Select coalesce (date_1, date_2, date_3), Amount FROM Source.dbo.[10]
I'm 100% sure a value exists in one of the three columns date_1, date_2, date_3 (all strings, varchar(100)), yet I still get blanks when I query Period.
Any help?
COALESCE is designed to return the first non-NULL value from the list, or NULL if all of them are NULL. Follow the link for full details: http://msdn.microsoft.com/en-us/library/ms190349.aspx
I would guess that you have blank values (' ') in one of the columns instead of NULL values. COALESCE treats a blank string as a value and returns it. If you are trying to find the first non-NULL, non-blank column, you can use a CASE expression:
select
case
when len(rtrim(ltrim(date_1))) > 0 then date_1
when len(rtrim(ltrim(date_2))) > 0 then date_2
when len(rtrim(ltrim(date_3))) > 0 then date_3
else null
end,
Amount
from Source.dbo.[10]
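The difference between COALESCE and the CASE expression can be shown with a minimal sketch. It assumes Python's built-in sqlite3 module, a hypothetical table source, and made-up rows. SQLite spells the T-SQL len(rtrim(ltrim(x))) as length(trim(x)). For a NULL column the length is NULL, so that WHEN branch is skipped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (date_1 TEXT, date_2 TEXT, date_3 TEXT)")
conn.executemany(
    "INSERT INTO source VALUES (?, ?, ?)",
    [("", "2020-01", None), (None, "  ", "2021-03"), ("2019-12", None, None)],
)

rows = conn.execute(
    """
    SELECT COALESCE(date_1, date_2, date_3),
           CASE
               WHEN length(trim(date_1)) > 0 THEN date_1
               WHEN length(trim(date_2)) > 0 THEN date_2
               WHEN length(trim(date_3)) > 0 THEN date_3
               ELSE NULL
           END
    FROM source
    ORDER BY rowid
    """
).fetchall()

print(rows)
# [('', '2020-01'), ('  ', '2021-03'), ('2019-12', '2019-12')]
```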

Compute percents from SUM() in the same SELECT sql query

In the table my_obj there are two integer fields:
(value_a integer, value_b integer);
I try to compute how many times value_a = value_b, and I want to express this ratio as a percentage.
This is the code I have tried:
select sum(case when o.value_a = o.value_b then 1 else 0 end) as nb_ok,
sum(case when o.value_a != o.value_b then 1 else 0 end) as nb_not_ok,
compute_percent(nb_ok,nb_not_ok)
from my_obj as o
group by o.property_name;
compute_percent is a stored procedure that simply does (a * 100) / (a + b).
But PostgreSQL complains that the column nb_ok doesn't exist.
How would you do that properly?
I use PostgreSQL 9.1 with Ubuntu 12.04.
There is more to this question than it may seem.
Simple version
This is much faster and simpler:
SELECT property_name
,(count(value_a = value_b OR NULL) * 100) / count(*) AS pct
FROM my_obj
GROUP BY 1;
Result:
property_name | pct
--------------+----
prop_1 | 17
prop_2 | 43
How?
You don't need a function for this at all.
Instead of counting value_b (which you don't need to begin with) and calculating the total, use count(*) for the total. Faster, simpler.
This assumes you don't have NULL values, i.e. both columns are defined NOT NULL. That information is missing from your question.
If not, your original query is probably not doing what you think it does. If any of the values is NULL, your version does not count that row at all. You could even provoke a division-by-zero exception this way.
This version works with NULL, too. count(*) produces the count of all rows, regardless of values.
Here's how the count works:
TRUE OR NULL = TRUE
FALSE OR NULL = NULL
count(expression) ignores NULL values. Voilà.
Operator precedence makes = bind before OR. You could add parentheses to make it clearer:
count((value_a = value_b) OR NULL)
You can do the same with
count(NULLIF(<expression>, FALSE))
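The counting trick can be checked with a minimal sketch. It assumes Python's built-in sqlite3 module and made-up rows. SQLite stores booleans as 1/0, so NULLIF(..., 0) plays the role of NULLIF(..., FALSE). The third column shows that counting the bare comparison also counts FALSE, and count(*) counts every row, NULLs included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_obj (property_name TEXT, value_a INTEGER, value_b INTEGER)"
)
conn.executemany(
    "INSERT INTO my_obj VALUES (?, ?, ?)",
    [("p", 1, 1), ("p", 1, 2), ("p", 3, 3), ("p", None, 1)],
)

matches, via_nullif, compared, all_rows = conn.execute(
    """
    SELECT count(value_a = value_b OR NULL),    -- FALSE OR NULL is NULL
           count(NULLIF(value_a = value_b, 0)), -- FALSE becomes NULL
           count(value_a = value_b),            -- counts TRUE and FALSE
           count(*)                             -- counts every row
    FROM my_obj
    """
).fetchone()

print(matches, via_nullif, compared, all_rows)  # 2 2 3 4
```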
count() returns bigint.
A bigint / bigint division truncates fractional digits.
Include fractional digits
Use 100.0 (with fractional digit) to force the calculation to be numeric and thereby preserve fractional digits.
You may want to use round() with this:
SELECT property_name
,round((count(value_a = value_b OR NULL) * 100.0) / count(*), 2) AS pct
FROM my_obj
GROUP BY 1;
Result:
property_name | pct
--------------+-------
prop_1 | 17.23
prop_2 | 43.09
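Both versions can be compared with a minimal sketch. It assumes Python's built-in sqlite3 module, which, like PostgreSQL here, truncates integer / integer division. The rows are made up, and prop_2 includes a NULL row to show that count(*) still counts it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_obj (property_name TEXT, value_a INTEGER, value_b INTEGER)"
)
conn.executemany(
    "INSERT INTO my_obj VALUES (?, ?, ?)",
    [("prop_1", 1, 1), ("prop_1", 1, 2), ("prop_1", 3, 3),
     ("prop_2", 4, 4), ("prop_2", None, 1)],
)

rows = conn.execute(
    """
    SELECT property_name,
           (count(value_a = value_b OR NULL) * 100) / count(*) AS pct_int,
           round((count(value_a = value_b OR NULL) * 100.0) / count(*), 2) AS pct
    FROM my_obj
    GROUP BY property_name
    ORDER BY property_name
    """
).fetchall()

for row in rows:
    print(row)
# ('prop_1', 66, 66.67)
# ('prop_2', 50, 50.0)
```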
As an aside:
I use value_a instead of valueA. Don't use unquoted mixed-case identifiers in PostgreSQL. I have seen too many desperate questions coming from this folly. If you wonder what I am talking about, read the chapter Identifiers and Key Words in the manual.
Probably the easiest way to do this is to use a WITH clause:
WITH data
AS (SELECT Sum(CASE WHEN o.value_a = o.value_b THEN 1 ELSE 0 END) AS nbOk,
Sum(CASE WHEN o.value_a != o.value_b THEN 1 ELSE 0 END) AS nbNotOk
FROM my_obj AS o
GROUP BY o.property_name)
SELECT nbok,
nbnotok,
Compute_percent(nbok, nbnotok)
FROM data
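The WITH approach can be run as a minimal sketch. It assumes Python's built-in sqlite3 module and made-up rows. The asker's compute_percent procedure is not available, so its stated formula (a * 100) / (a + b) is inlined. The NULL row in prop_2 lands in neither CASE sum, so that group reports 100 rather than 50. This is the pitfall the first answer warns about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_obj (property_name TEXT, value_a INTEGER, value_b INTEGER)"
)
conn.executemany(
    "INSERT INTO my_obj VALUES (?, ?, ?)",
    [("prop_1", 1, 1), ("prop_1", 1, 2), ("prop_1", 3, 3),
     ("prop_2", 4, 4), ("prop_2", None, 1)],
)

# compute_percent's formula (a * 100) / (a + b), inlined.
rows = conn.execute(
    """
    WITH data AS (
        SELECT o.property_name,
               SUM(CASE WHEN o.value_a = o.value_b THEN 1 ELSE 0 END) AS nbok,
               SUM(CASE WHEN o.value_a != o.value_b THEN 1 ELSE 0 END) AS nbnotok
        FROM my_obj AS o
        GROUP BY o.property_name
    )
    SELECT property_name, nbok, nbnotok, (nbok * 100) / (nbok + nbnotok) AS pct
    FROM data
    ORDER BY property_name
    """
).fetchall()

print(rows)  # [('prop_1', 2, 1, 66), ('prop_2', 1, 0, 100)]
```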
You might also want to try this version:
WITH total(count) as (SELECT COUNT(*)
FROM my_obj),
matching(count) as (SELECT COUNT(*)
FROM my_obj
WHERE value_a = value_b)
SELECT nbOk, nbNotOk, Compute_percent(nbOk, nbNotOk)
FROM (SELECT matching.count as nbOk, total.count - matching.count as nbNotOk
FROM total
CROSS JOIN matching) data