AWS Athena/Presto SQL: Having trouble getting null values - sql

I am running a query in AWS Athena where I want to get some totals. However, I am having issues with a column whose values can be null; this column sometimes contains the value '[]', which should also be treated as null.
My query
SELECT COUNT(*) AS total_rows,
COUNT(DISTINCT sfattachmentid) AS total_attachments,
(SELECT COUNT(DISTINCT salesforce_opportunity_id) FROM "athena_decisionengine"."transactions") AS total_opps,
(SELECT COUNT(DISTINCT salesforce_opportunity_id) FROM "athena_decisionengine"."transactions" WHERE (oldcategory IS NOT NULL OR oldcategory != '[]')) AS opp_w_changed,
(SELECT COUNT(DISTINCT salesforce_opportunity_id) FROM "athena_decisionengine"."transactions" WHERE (oldcategory IS NULL OR oldcategory = '[]')) AS opp_without_changed,
SUM(CASE WHEN oldcategory != '' THEN 1 ELSE 0 END) AS oldCategory_changed,
SUM(CASE WHEN oldcategory IS NULL THEN 1 ELSE 0 END) AS oldCategory_blank
FROM "athena_decisionengine"."transactions"
It is giving results, however the value of opp_without_changed seems wrong, because if I have a total_opps of 1282 and opp_w_changed of 1110, I would expect opp_without_changed to be 172, but it is showing me 1282, which seems to be the total number of unique salesforce_opportunity_id values. It is as if the filter:
(oldcategory IS NULL OR oldcategory = '[]')
was not working.

There are two problems in your query:
wrong boolean expressions
the wrong assumption that
count(distinct) = count(distinct NULL) + count(distinct NOT NULL)
The boolean expression oldcategory IS NOT NULL OR oldcategory != '[]' allows any value except NULL; it allows '[]' as well, because '[]' is not NULL. If you want to filter out both NULLs and '[]', the correct expression is oldcategory != '[]'. It does not allow NULLs either, because NULL can never compare as equal or not equal to anything. The column may also contain empty strings rather than NULLs; with empty strings filtered out as well it becomes
oldcategory not in ('[]','') --does not allow NULL, '[]', ''
The second expression, the one that includes the empty rows, will be:
oldcategory IS NULL OR oldcategory in ('[]','') --allows NULL, '[]', '' only
Also, you are counting DISTINCT salesforce_opportunity_id, not just the rows satisfying the WHERE condition. The same salesforce_opportunity_id can have records with NULL, empty, '[]' and other values, so these sets can intersect, and you should NOT expect that
count(distinct salesforce_opportunity_id) = count(distinct salesforce_opportunity_id where oldcategory is NULL) + count(distinct salesforce_opportunity_id where oldcategory is NOT NULL)
DISTINCT counts are not additive. If you want to check that TOTAL = NULLs + NOT NULLs, count everything without DISTINCT and it should match.
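For what it's worth, here is a minimal sketch (assuming the same table and column names from your query) that computes all the counts in a single pass with conditional aggregation instead of correlated subqueries:
SELECT COUNT(*) AS total_rows,
       COUNT(DISTINCT sfattachmentid) AS total_attachments,
       COUNT(DISTINCT salesforce_opportunity_id) AS total_opps,
       COUNT(DISTINCT CASE WHEN oldcategory NOT IN ('[]', '') THEN salesforce_opportunity_id END) AS opp_w_changed,
       COUNT(DISTINCT CASE WHEN oldcategory IS NULL OR oldcategory IN ('[]', '') THEN salesforce_opportunity_id END) AS opp_without_changed
FROM "athena_decisionengine"."transactions"
The CASE expressions return NULL for non-matching rows, and COUNT(DISTINCT ...) ignores NULLs, so each branch counts only the opportunities that have at least one matching row. Note that opp_w_changed + opp_without_changed can still exceed total_opps, because an opportunity with both kinds of rows is counted in both columns.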

For most use cases, giving those NULL values a default value should be fine:
coalesce(oldcategory,'[]') not in (a,b,c,d)

Related

What does this part of my SQL query mean?

sum( (record_id is NULL AND joined.table_id is NULL)::int )
I know the SUM returns the sum of the column entries, but what will this (... and ...) expression return? Can it be compared with a (.. + ..) expression, and what does the ()::int do? Does it convert the result to int?
I don't know what this expression will return; in my sampling it returned an integer.
It is a more complicated way to write
count(*) FILTER (WHERE record_id IS NULL
AND joined.table_id IS NULL)
(record_id is NULL AND joined.table_id is NULL)::int will return 1 iff both record_id and joined.table_id are null.
Therefore, sum( (record_id is NULL AND joined.table_id is NULL)::int ) will return the number of rows in which both record_id and joined.table_id are null.
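To see it concretely, here is a small self-contained check (PostgreSQL syntax, with made-up rows and joined.table_id simplified to a plain table_id column) showing that both forms count the rows where both columns are NULL:
SELECT sum((record_id IS NULL AND table_id IS NULL)::int) AS via_sum_cast,
       count(*) FILTER (WHERE record_id IS NULL AND table_id IS NULL) AS via_filter
FROM (VALUES (1, NULL), (NULL, NULL), (NULL, 2)) AS t(record_id, table_id);
-- both expressions return 1: only the middle row has NULL in both columns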

SQL Count Distinct returning one extra count

How is this possible that these two methods are returning different results?
Method 1 (returns correct count):
SELECT COUNT(DISTINCT contact_id)
FROM Traffic_Action
WHERE action_type IN ('Schedule a Tour', 'Schedule Follow-up', 'Lost')
Method 2 (returns one extra count):
SELECT COUNT(DISTINCT CASE WHEN action_type IN ('Schedule a Tour', 'Schedule Follow-up', 'Lost') THEN contact_id ELSE 0 END)
FROM Traffic_Action
Remove the else part - as 0 is also counted
SELECT COUNT(DISTINCT CASE WHEN
action_type in ('Schedule a Tour','Schedule Follow-up','Lost') THEN contact_id END)
FROM Traffic_Action
No wonder you are getting two different results.
First query:
Provides you the distinct count of records where action_type in Schedule a Tour, Schedule Follow-up and Lost
SELECT COUNT(DISTINCT contact_id) FROM Traffic_Action WHERE action_type in
('Schedule a Tour','Schedule Follow-up','Lost')
Second query:
In this query any value apart from Schedule a Tour, Schedule Follow-up and Lost is turned into 0, and when taking the distinct count that 0 contributes one extra value, per your case statement
SELECT COUNT(DISTINCT CASE WHEN action_type in ('Schedule a Tour','Schedule Follow-up','Lost') THEN contact_id ELSE 0 END) FROM Traffic_Action
In simple words:
In the first query you are filtering on only three values.
In the second query you have no filter, but a case statement on three values and an else condition that returns 0 for non-matching rows.
That means you have 1 record where contact_id is NULL. Normally, COUNT() ignores NULL values. Your second query converts NULL to zero via the "ELSE" branch. That should be why you see a difference.
You can quickly see for yourself in this example. This will return 2 although there are 3 records
select count(distinct a.col1)
from (
select 1 as Col1
union select 2
union select NULL
) a

SQL Query - Get records with null values (but make sure they dont have any other records that match the key with a value)

So I am writing a query whereby I need to get all records within a table that have null or '' values for two fields...
File and Postcode.
My problem is that I have duplicate records. All the queries I have written so far will return a record whose File and Postcode fields are null or '', even though one of its duplicates (based on the email field) does have a File/Postcode value.
I need to get only those records where all instances have a null File/Postcode value.
SELECT DISTINCT EMAIL FROM Results R
WHERE
( ISNULL(R.Postcode, '') = ''
AND
ISNULL(R.File, '') = ''
)
AND NOT EXISTS (
SELECT Id FROM Results RR
WHERE RR.Email = R.Email
AND (
ISNULL(R.Postcode, '') <> ''
AND
ISNULL(R.File, '') <> ''
)
)
ORDER BY R.Email
Bit of a blind stab in the dark here, but I suspect a HAVING clause with a conditional aggregate will resolve this one:
SELECT Email
FROM Results
GROUP BY Email
HAVING COUNT(CASE WHEN Postcode IS NOT NULL AND Postcode != '' THEN 1 END) = 0
AND COUNT(CASE WHEN [File] IS NOT NULL AND [File] != '' THEN 1 END) = 0;
Note, also, that I haven't used ISNULL (or COALESCE) in the logic, but boolean logic instead. This is actually important: wrapping functions like ISNULL around a column in your WHERE clause makes the query non-SARGable, meaning that the indexes on your table can't be used to help the data engine filter down to the correct rows, and it instead has to perform a full scan of the data.
I would express the having clause as:
HAVING COUNT(NULLIF(Postcode, '')) = 0 AND
COUNT(NULLIF([File], '')) = 0 ;
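As a quick sanity check, here is a self-contained T-SQL sketch (with made-up sample rows standing in for your Results table) where only the email whose rows are all NULL or blank comes back:
SELECT Email
FROM (VALUES
    ('a@x.com', 'AB1 2CD', NULL),
    ('a@x.com', NULL,      NULL),
    ('b@x.com', NULL,      ''),
    ('b@x.com', '',        NULL)
) AS Results (Email, Postcode, [File])
GROUP BY Email
HAVING COUNT(NULLIF(Postcode, '')) = 0
   AND COUNT(NULLIF([File], '')) = 0;
-- returns only b@x.com: a@x.com has one row with a real Postcode value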

Coalesce function not selecting data value from series when it exists

My code is as follows:
Insert Into dbo.database (Period, Amount)
Select coalesce (date_1, date_2, date_3), Amount FROM Source.dbo.[10]
I'm 100% sure a value exists in one of the 3 variables: date_1, date_2, date_3, all stored as strings (varchar(100)), yet I am still getting blanks when I select Period.
Any help?
Coalesce is designed to return the first NOT NULL field from the list, or NULL if none of the fields are NOT NULL; follow the link for full details: http://msdn.microsoft.com/en-us/library/ms190349.aspx
I would guess that you have blank values (' ') in one of the columns instead of NULL values. If you are trying to find the first not null non-blank column you can use a case statement.
select
    case
        when len(rtrim(ltrim(date_1))) > 0 then date_1
        when len(rtrim(ltrim(date_2))) > 0 then date_2
        when len(rtrim(ltrim(date_3))) > 0 then date_3
        else null
    end,
    Amount
from Source.dbo.[10]
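An equivalent, more compact sketch, under the same assumption that the "missing" values are blank or whitespace-only strings rather than NULL, is to map blanks to NULL with NULLIF and then let COALESCE do the work:
select coalesce(nullif(rtrim(ltrim(date_1)), ''),
                nullif(rtrim(ltrim(date_2)), ''),
                nullif(rtrim(ltrim(date_3)), '')) as Period,
       Amount
from Source.dbo.[10]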

Counting null and non-null values in a single query

I have a table
create table us
(
a number
);
Now I have data like:
a
1
2
3
4
null
null
null
8
9
Now I need a single query to count null and not null values in column a
This works for Oracle and SQL Server (you might be able to get it to work on another RDBMS):
select sum(case when a is null then 1 else 0 end) count_nulls
, count(a) count_not_nulls
from us;
Or:
select count(*) - count(a), count(a) from us;
If I understood correctly you want to count all NULL and all NOT NULL in a column...
If that is correct:
SELECT count(*) FROM us WHERE a IS NULL
UNION ALL
SELECT count(*) FROM us WHERE a IS NOT NULL
Edited to have the full query, after reading the comments :]
SELECT COUNT(*), 'null_tally' AS narrative
FROM us
WHERE a IS NULL
UNION
SELECT COUNT(*), 'not_null_tally' AS narrative
FROM us
WHERE a IS NOT NULL;
Here is a quick and dirty version that works on Oracle:
select sum(case when a is null then 1 else 0 end) "Null values",
       sum(case when a is null then 0 else 1 end) "Non-null values"
from us
for non nulls
select count(a)
from us
for nulls, subtract the non-null count from the total row count
select count(*) - count(a)
from us
Hence
SELECT COUNT(A) NOT_NULLS
FROM US
UNION
SELECT COUNT(*) - COUNT(A) NULLS
FROM US
ought to do the job
Better in that the column titles come out correct.
SELECT COUNT(A) NOT_NULL, COUNT(*) - COUNT(A) NULLS
FROM US
In some testing on my system, it costs a full table scan.
As I understood your query, just run this script and you get the total Null and total Not Null rows:
select count(*) - count(a) as 'Null', count(a) as 'Not Null' from us;
usually I use this trick
select sum(case when a is null then 0 else 1 end) as count_notnull,
       sum(case when a is null then 1 else 0 end) as count_null
from us
Just to provide yet another alternative, Postgres 9.4+ allows applying a FILTER to aggregates:
SELECT
COUNT(*) FILTER (WHERE a IS NULL) count_nulls,
COUNT(*) FILTER (WHERE a IS NOT NULL) count_not_nulls
FROM us;
SQLFiddle: http://sqlfiddle.com/#!17/80a24/5
This is a little tricky. Assume the table has just one column; then Count(1) and Count(empid) will give different values.
set nocount on
create table #table1 (empid int);
insert #table1 values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(NULL),(11),(12),(NULL),(13),(14);
select * from #table1
select COUNT(1) as "COUNT(1)" from #table1
select COUNT(empid) "Count(empid)" from #table1
Query Results
As you can see in the query results, the first result shows the table has 16 rows, two of which have NULL. So when we use Count(*) the query engine counts the number of rows, so we get a count of 16. But in the case of Count(empid) it counts only the non-NULL values in the column empid, so we get 14.
so whenever we are using COUNT(Column) make sure we take care of NULL values as shown below.
select COUNT(isnull(empid,1)) from #table1
will count both NULL and Non-NULL values.
Note: The same thing applies even when the table is made up of more than one column. Count(1) will give the total number of rows irrespective of NULL/non-NULL values. Only when column values are counted using Count(Column) do we need to take care of NULL values.
I had a similar issue: to count all distinct values, counting null values as 1, too. A simple count doesn't work in this case, as it does not take null values into account.
Here's a snippet that works on SQL Server and does not involve selecting new values.
Basically, once performed the distinct, also return the row number in a new column (n) using the row_number() function, then perform a count on that column:
SELECT COUNT(n)
FROM (
    SELECT *, row_number() OVER (ORDER BY [MyColumn] ASC) n
    FROM (
        SELECT DISTINCT [MyColumn]
        FROM [MyTable]
    ) items
) distinctItems
Try this..
SELECT CASE
WHEN a IS NULL THEN 'Null'
ELSE 'Not Null'
END a,
Count(1)
FROM us
GROUP BY CASE
WHEN a IS NULL THEN 'Null'
ELSE 'Not Null'
END
Here are two solutions:
Select count(columnname) as countofNotNulls, count(isnull(columnname,1)) - count(columnname) AS Countofnulls from tablename
OR
Select count(columnname) as countofNotNulls, count(*) - count(columnname) AS Countofnulls from tablename
Try (this is MySQL syntax, where ISNULL(a) returns 1/0 and ! is NOT):
SELECT
SUM(ISNULL(a)) AS all_null,
SUM(!ISNULL(a)) AS all_not_null
FROM us;
Simple!
If you're using MS Sql Server...
SELECT COUNT(0) AS 'Null_ColumnA_Records',
(
SELECT COUNT(0)
FROM your_table
WHERE ColumnA IS NOT NULL
) AS 'NOT_Null_ColumnA_Records'
FROM your_table
WHERE ColumnA IS NULL;
I don't recommend doing this... but here you have it (in the same table as the result)
Use the ISNULL built-in function.
All the answers are either wrong or extremely out of date.
The simple and correct way of doing this, in engines that support it (e.g. Presto/Trino/Athena and Snowflake), is the COUNT_IF function:
SELECT
COUNT_IF(a IS NULL) AS nulls,
COUNT_IF(a IS NOT NULL) AS not_nulls
FROM
us
SELECT SUM(NULLs) AS 'NULLS', SUM(NOTNULLs) AS 'NOTNULLs' FROM
(select count(*) AS 'NULLs', 0 as 'NOTNULLs' FROM us WHERE a is null
UNION select 0 as 'NULLs', count(*) AS 'NOTNULLs' FROM us WHERE a is not null) AS x
It's fugly, but it will return a single record with 2 cols indicating the count of nulls vs non nulls.
This works in T-SQL. If you're just counting the number of something and you want to include the nulls, use COALESCE instead of case.
IF OBJECT_ID('tempdb..#us') IS NOT NULL
DROP TABLE #us
CREATE TABLE #us
(
a INT NULL
);
INSERT INTO #us VALUES (1),(2),(3),(4),(NULL),(NULL),(NULL),(8),(9)
SELECT * FROM #us
SELECT CASE WHEN a IS NULL THEN 'NULL' ELSE 'NON-NULL' END AS 'NULL?',
COUNT(CASE WHEN a IS NULL THEN 'NULL' ELSE 'NON-NULL' END) AS 'Count'
FROM #us
GROUP BY CASE WHEN a IS NULL THEN 'NULL' ELSE 'NON-NULL' END
SELECT COALESCE(CAST(a AS NVARCHAR),'NULL') AS a,
COUNT(COALESCE(CAST(a AS NVARCHAR),'NULL')) AS 'Count'
FROM #us
GROUP BY COALESCE(CAST(a AS NVARCHAR),'NULL')
Building off of Alberto, I added the rollup.
SELECT [Narrative] = CASE WHEN [Narrative] IS NULL THEN 'count_total' ELSE [Narrative] END,
       [Count] = SUM([Count])
FROM (
    SELECT COUNT(*) AS [Count], 'count_nulls' AS [Narrative]
    FROM [CrmDW].[CRM].[User]
    WHERE [EmployeeID] IS NULL
    UNION
    SELECT COUNT(*), 'count_not_nulls' AS [Narrative]
    FROM [CrmDW].[CRM].[User]
    WHERE [EmployeeID] IS NOT NULL
) S
GROUP BY [Narrative] WITH CUBE;
SELECT
ALL_VALUES
,COUNT(ALL_VALUES)
FROM(
SELECT
NVL2(A,'NOT NULL','NULL') AS ALL_VALUES
,NVL(A,0)
FROM US
)
GROUP BY ALL_VALUES
select count(isnull(NullableColumn,-1))
If it's MySQL, you can try something like this:
select
  (select count(*) from TABLENAME where a is null) as total_null,
  (select count(*) from TABLENAME where a is not null) as total_not_null
Just in case you wanted it in a single record:
select
(select count(*) from tbl where colName is null) Nulls,
(select count(*) from tbl where colName is not null) NonNulls
;-)
for counting not null values
select count(*) from us where a is not null;
for counting null values
select count(*) from us where a is null;
I created the table in Postgres 10 and both of the following ran:
select count(*) from us
and
select count(a is null) from us
Note, though, that count(a is null) also counts every row, because a is null evaluates to true or false and is never NULL itself.
In my case I wanted the "null distribution" amongst multiple columns:
SELECT
(CASE WHEN a IS NULL THEN 'NULL' ELSE 'NOT-NULL' END) AS a_null,
(CASE WHEN b IS NULL THEN 'NULL' ELSE 'NOT-NULL' END) AS b_null,
(CASE WHEN c IS NULL THEN 'NULL' ELSE 'NOT-NULL' END) AS c_null,
...
count(*)
FROM us
GROUP BY 1, 2, 3,...
ORDER BY 1, 2, 3,...
As indicated by the '...', this is easily extendable to as many columns as needed
Number of elements where a is null:
select count(*) from us where a is null;
Number of elements where a is not null:
select count(a) from us where a is not null;