SQL query to find the count of non null values in each column of table? - sql

I can find the count of non null values by typing each column name, but is there a way to write it without manually typing the column names as I have 100+ columns in my table.
select 'col1Name', count(col1Name) from table where col1Name is null
union
select 'col2Name', count(col2Name) from table where col2Name is null
union ...
select 'col20Name', count(col20Name) from table where col20Name is null

You can use a case operation here
select
sum(case when a is null then 1 else 0 end) A,
sum(case when b is null then 1 else 0 end) B,
sum(case when c is null then 1 else 0 end) C
from T

Related

Write Hive queries to see how many missing values you have in each attribute

I want write hive query such that it I can see count of null values of each column
You can use this SQL - this will give you total count, null and not null count.
SELECT
count(*) total_cnt,
sum(case when data_col is null then 1 else 0 end) null_cnt,
sum(case when data_col is null then 0 else 1 end) nonnull_cnt
From mytable

Create a Query to check if any Column in a table is Null

I have zero experience with SQL but am trying to learn how to validate tables. I am trying to see within a table if any of the columns are null.
Currently I have been going with a script that is just counting the number of nulls. I am doing this for each column. Is there a better script that I can use to check all the columns in a table?
select count(id) from schema.table where id is not null
If there are 100 records I would expect all columns to come back with 100 but if one column is null it will show a 0.
You can count each column in a single query by using sum and case:
select
sum(case when Column1 is null then 1 else 0 end) Column1NullCount
, sum(case when Column2 is null then 1 else 0 end) Column2NullCount
-- ...
, sum(case when ColumnN is null then 1 else 0 end) ColumnNNullCount
from MyScheme.MyTable

how do you check for nulls in any column in an entire table in SQL

I would like to check if any of my columns in a table have any null values. I am sure there is a quicker way than how I am doing it at the moment. I just want to see if there is a NULL in ANY column however my table has a lot of columns, is there a simple and quick way?
This way I have written so far works but it takes a long time to do for every column (hence the etc etc)
select
sum(case when id is null then 1 else 0 end) as id,
sum(case when name is null then 1 else 0 end) as name,
sum(case when review_count is null then 1 else 0 end) as review_coun,
sum(case when positive_review is null then 1 else 0 end) as
positive_review,
sum(etc etc
from user
I don't know if this will work for your scenario, but it's an option. You can CAST all your columns as a string and then concatenate them together. If you concatenate a NULL value with a string, it will return NULL.
SELECT 'Y'
WHERE EXISTS( -- Check if there are any NULL rows
SELECT
CAST(c1 AS CHAR(1)) ||
CAST(c2 AS CHAR(1)) ||
...
AS MyColumns
WHERE MyColumns IS NULL
)
;

SQL : Group by and check if all, some or none are set

Lets say I have the following table:
FKEY A B C D E F
'A' 1 0 1 0 1 0
'A' 0 1 1 1 0 0
Now i want to make a group by FKEY but I just want to know if the A-F columns has 1 in one, all or none of the grouped rows.. The resulton the above table would be:
FKEY A B C D E F
'A' S S A S S N
..where S is "some", A is "all" and N is "none".
What would be the best approach to make this query. I could so some nested queries, but isnt there a smarter way?
In my real life data, the 1's and 0's are actually DATETIME and NULL's
You can use case and aggregation:
select fkey,
(case when sum(a) = 0 then 'N'
when sum(a) = count(*) then 'A'
else 'S'
end) as a,
(case when sum(b) = 0 then 'N'
when sum(b) = count(*) then 'A'
else 'S'
end) as b,
. . .
from t
group by fkey;
The above assumes that the values are only 0 and 1. If that is the case, you can actually phrase this as:
(case when max(a) = 0 then 'N'
when min(a) = 1 then 'A'
else 'S'
end) as a,
You mentioned that your 0 and 1 are actually null or non null dates. Here's a modified version of Gordon's query that caters for that:
select fkey,
(case when count(datecol) = 0 then 'all dates are null'
when count(datecol) = count(*) then 'all dates are filled'
else 'some are null, some filled'
end) as a,
...
from t
group by fkey;
COUNT(null) is 0, COUNT('2001-01-01') is 1, COUNT(*) is the row count independent of any variable. Hence, if our count of the dates was 0, all must be null. If the count of the dates was equal to the count of the rows, then all must be filled with some value, otherwise it's a mix

To get one record if mre than one record exists for same id ,then get only one record based on below condition

e.g. If I get 2 null data in a column for same id, then pass null.
ii) If I get 2 same not null data in a column for same id, then pass not null.
ii) If I get 1 null and 1 not null data in a column for same id, then pass not null.
ii) If I get 2 different not null data in a column for same id, then pass '?'.
Sample data
Please find the sample data in the image.
Thanks in advance.
Output obtained after new code:
Result
You can try following :
SELECT (CASE WHEN PRTY_KEY_ID_NN= 0 THEN NULL /* IF BOTH/ALL VALUES ARE NULL */)
WHEN PRTY_KEY_ID_NN <> TOTAL_COUNT THEN PRTY_KEY_ID_NN_MX
ELSE
CASE WHEN TOTAL_COUNT=PRTY_KEY_ID_NN_DIST THEN NULL /* IF BOTH HAVING DIFFERENT NOT NULL VALUE */
CASE WHEN TOTAL_COUNT<>PRTY_KEY_ID_NN_DIST THEN PRTY_KEY_ID_NN_MX
END
END)PRTY_KEY_ID_VAL,
(CASE WHEN TOTAL_ASSET_A_NN= 0 THEN NULL /* IF BOTH/ALL VALUES ARE NULL */)
WHEN TOTAL_ASSET_A_NN <> TOTAL_COUNT THEN TOTAL_ASSET_A_MX
ELSE
CASE WHEN TOTAL_COUNT=TOTAL_ASSET_A_DIST THEN NULL /* IF BOTH HAVING DIFFERENT NOT NULL VALUE */
CASE WHEN TOTAL_COUNT<>TOTAL_ASSET_A_DIST THEN TOTAL_ASSET_A_MX
END
END)TOTAL_ASSET_VAL,
(CASE WHEN DBUS_NUM_NN= 0 THEN NULL /* IF BOTH/ALL VALUES ARE NULL */)
WHEN DBUS_NUM_NN <> TOTAL_COUNT THEN DBUS_NUM_MX
ELSE
CASE WHEN TOTAL_COUNT=DBUS_NUM_DIST THEN NULL /* IF BOTH HAVING DIFFERENT NOT NULL VALUE */
CASE WHEN TOTAL_COUNT<>DBUS_NUM_DIST THEN DBUS_NUM_MX
END
END)DBUS_NUM_DIST_VAL,
(CASE WHEN BUS_NATURE_DE_NN= 0 THEN NULL /* IF BOTH/ALL VALUES ARE NULL */)
WHEN BUS_NATURE_DE_NN <> TOTAL_COUNT THEN BUS_NATURE_DE_MX
ELSE
CASE WHEN TOTAL_COUNT=BUS_NATURE_DE_DIST THEN NULL /* IF BOTH HAVING DIFFERENT NOT NULL VALUE */
CASE WHEN TOTAL_COUNT<>BUS_NATURE_DE_DIST THEN BUS_NATURE_DE_MX
END
END)BUS_NATURE_DE_VAL
FROM
(SELECT COUNT(*)TOTAL_COUNT,
SUM(CASE WHEN PRTY_KEY_ID IS NULL THEN 0 ELSE 1 END)PRTY_KEY_ID_NN,
SUM(CASE WHEN TOTAL_ASSET_A IS NULL THEN 0 ELSE 1 END)TOTAL_ASSET_A_NN,
SUM(CASE WHEN TOTAL_LIAB_A IS NULL THEN 0 ELSE 1 END)TOTAL_LIAB_A_NN,
SUM(CASE WHEN DBUS_NUM IS NULL THEN 0 ELSE 1 END)DBUS_NUM_NN,
SUM(CASE WHEN BUS_NATURE_DE IS NULL THEN 0 ELSE 1 END)BUS_NATURE_DE_NN,
MAX(PRTY_KEY_ID_NN)PRTY_KEY_ID_NN_MX,
MAX(TOTAL_ASSET_A)TOTAL_ASSET_A_MX,
MAX(TOTAL_LIAB_A)TOTAL_LIAB_A_MX,
MAX(DBUS_NUM)DBUS_NUM_MX,
MAX(BUS_NATURE_DE)BUS_NATURE_DE_MX,
COUNT(DISTINCT PRTY_KEY_ID_NN) PRTY_KEY_ID_NN_DIST,
COUNT(DISTINCT TOTAL_ASSET_A)TOTAL_ASSET_A_DIST,
COUNT(DISTINCT TOTAL_LIAB_A)TOTAL_LIAB_A_DIST,
COUNT(DISTINCT DBUS_NUM)DBUS_NUM_DIST,
COUNT(DISTINCT BUS_NATURE_DE)BUS_NATURE_DE_DIST
FROM YOUR_TABLE)
You can find out the sample data like image in below SQL.If the column data type is varchar(char) please delete the TO_CHAR expression, and replace the table name TEST_TABLE to your table's.
WITH TEST_TABLE AS (
SELECT 1056059 AS PRTY_KEY_ID, NULL AS TOT_ASSET_AM, NULL AS TOT_LIABILITIES_AM, NULL AS DBUS_NM, 'Pediatrics' AS BUS_NATURE_DE FROM DUAL
UNION ALL
SELECT 1056059 AS PRTY_KEY_ID, 5000 AS TOT_ASSET_AM, '300000' AS TOT_LIABILITIES_AM, NULL AS DBUS_NM, 'Medicine' AS BUS_NATURE_DE FROM DUAL
)
SELECT
PRTY_KEY_ID,
MIN(CASE WHEN TOT_ASSET_AM_CNT > 1 THEN '?' ELSE TO_CHAR(TOT_ASSET_AM) END) AS TOT_ASSET_AM,
MIN(CASE WHEN TOT_LIABILITIES_AM_CNT > 1 THEN '?' ELSE TO_CHAR(TOT_LIABILITIES_AM) END) AS TOT_LIABILITIES_AM,
MIN(CASE WHEN DBUS_NM_CNT > 1 THEN '?' ELSE TO_CHAR(DBUS_NM) END) AS DBUS_NM,
MIN(CASE WHEN BUS_NATURE_DE_CNT > 1 THEN '?' ELSE TO_CHAR(BUS_NATURE_DE) END) AS BUS_NATURE_DE
FROM (
SELECT
TEST_TABLE.*,
COUNT(TOT_ASSET_AM) OVER(PARTITION BY PRTY_KEY_ID) AS TOT_ASSET_AM_CNT,
COUNT(TOT_LIABILITIES_AM) OVER(PARTITION BY PRTY_KEY_ID) AS TOT_LIABILITIES_AM_CNT,
COUNT(DBUS_NM) OVER(PARTITION BY PRTY_KEY_ID) AS DBUS_NM_CNT,
COUNT(BUS_NATURE_DE) OVER(PARTITION BY PRTY_KEY_ID) AS BUS_NATURE_DE_CNT
FROM TEST_TABLE
)
GROUP BY PRTY_KEY_ID