Write Hive queries to see how many missing values you have in each attribute - hive

I want to write a Hive query that shows the count of null values in each column.

You can use this SQL; it gives you the total row count plus the null and non-null counts for a column. Repeat the two sum expressions for each column you want to check.
SELECT
    count(*) AS total_cnt,
    sum(case when data_col is null then 1 else 0 end) AS null_cnt,
    sum(case when data_col is null then 0 else 1 end) AS nonnull_cnt
FROM mytable
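As a rough sketch of how this extends to more than one column, here is the same pattern run from Python against an in-memory SQLite database, used only as a stand-in for Hive. The columns name and age and the sample rows are made up for illustration; the CASE expressions are plain standard SQL, so they should carry over to Hive as written.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT, age INTEGER)")  # hypothetical columns
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("ann", 31), (None, 25), ("bob", None), (None, None)],
)

# One sum(case ...) expression per column that should be checked for nulls.
query = """
SELECT
    count(*) AS total_cnt,
    sum(CASE WHEN name IS NULL THEN 1 ELSE 0 END) AS name_null_cnt,
    sum(CASE WHEN age IS NULL THEN 1 ELSE 0 END) AS age_null_cnt
FROM mytable
"""
total_cnt, name_null_cnt, age_null_cnt = conn.execute(query).fetchone()
print(total_cnt, name_null_cnt, age_null_cnt)  # 4 2 2
```

The whole table is scanned once, no matter how many columns are checked.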

Related

Create a Query to check if any Column in a table is Null

I have zero experience with SQL but am trying to learn how to validate tables. I am trying to see within a table if any of the columns are null.
Currently I have been going with a script that is just counting the number of nulls. I am doing this for each column. Is there a better script that I can use to check all the columns in a table?
select count(id) from schema.table where id is not null
If there are 100 records, I would expect every column to come back with 100, but a column that is entirely null will show 0.
You can count each column in a single query by using sum and case:
select
sum(case when Column1 is null then 1 else 0 end) Column1NullCount
, sum(case when Column2 is null then 1 else 0 end) Column2NullCount
-- ...
, sum(case when ColumnN is null then 1 else 0 end) ColumnNNullCount
from MyScheme.MyTable
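Since count(column) skips nulls while count(*) counts every row, count(*) - count(column) should give the same null count as the sum/case form. A small sketch that checks both side by side, using Python's sqlite3 module with a made-up two-column table (the table layout and rows are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with two columns to validate.
conn.execute("CREATE TABLE MyTable (Column1 INTEGER, Column2 TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, "a"), (2, None), (None, None)],
)

# For each column: the sum/case null count next to count(*) - count(column).
row = conn.execute("""
SELECT
    sum(CASE WHEN Column1 IS NULL THEN 1 ELSE 0 END) AS Column1NullCount,
    count(*) - count(Column1) AS Column1NullCountAlt,
    sum(CASE WHEN Column2 IS NULL THEN 1 ELSE 0 END) AS Column2NullCount,
    count(*) - count(Column2) AS Column2NullCountAlt
FROM MyTable
""").fetchone()
print(row)  # (1, 1, 2, 2)
```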

SQL query to find the count of non null values in each column of table?

I can find the count of non null values by typing each column name, but is there a way to write it without manually typing the column names as I have 100+ columns in my table.
select 'col1Name', count(col1Name) from table where col1Name is not null
union
select 'col2Name', count(col2Name) from table where col2Name is not null
union ...
select 'col20Name', count(col20Name) from table where col20Name is not null
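You can use a CASE expression here. Note that this version counts the null values in each column; count(a) on its own returns the non-null count.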
select
sum(case when a is null then 1 else 0 end) A,
sum(case when b is null then 1 else 0 end) B,
sum(case when c is null then 1 else 0 end) C
from T
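To avoid typing 100+ column names, one option is to generate the query text from the table's metadata. The sketch below does that in Python against SQLite, which is an assumption made only so the example runs anywhere: PRAGMA table_info is SQLite-specific, and on another database you would read the column names from its own catalog instead. The table T and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (a INTEGER, b TEXT, c REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(1, None, 1.5), (None, None, 2.5), (3, "x", None)],
)

# PRAGMA table_info is SQLite-specific; field 1 of each row is the column name.
columns = [info[1] for info in conn.execute("PRAGMA table_info(T)")]

# Build one sum(case ...) expression per column, so no name is typed by hand.
select_list = ",\n    ".join(
    f'sum(CASE WHEN "{col}" IS NULL THEN 1 ELSE 0 END) AS "{col}"' for col in columns
)
query = f"SELECT\n    {select_list}\nFROM T"

null_counts = dict(zip(columns, conn.execute(query).fetchone()))
print(null_counts)  # {'a': 1, 'b': 2, 'c': 1}
```

The column names come from the database itself, so they are interpolated into the SQL text directly; values supplied by users should still go through bound parameters.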

How to aggregate unioned tables with count and dummy values?

I am counting rows in two tables and want one aggregated result (a single row). I wrote this SQL for the purpose:
SELECT SUM(Amount_New) Amount_New,
SUM(Import_Dropout) Import_Dropout,
SUM(Import) Import,
SUM(Processing) Processing,
SUM(Processing_Dropout) Processing_Dropout,
SUM(Matching) Matching,
SUM(Matching_Dropout) Matching_Dropout,
SUM(Export) Export,
SUM(Exported) Exported,
SUM(Rejected) Rejected,
SUM(AmountSubTotal) AmountSubTotal,
SUM(AmountTotal) AmountTotal
FROM (
SELECT COUNT(CASE WHEN ProcessStatus='_New' THEN 1 ELSE null END) AS Amount_New,
COUNT(CASE WHEN ProcessStatus='Import_Dropout' THEN 1 ELSE null END) AS Import_Dropout,
COUNT(CASE WHEN ProcessStatus='Import' THEN 1 ELSE null END) AS Import,
COUNT(CASE WHEN ProcessStatus='Processing' THEN 1 ELSE null END) AS Processing,
COUNT(*) AS AmountTotal,
0 as Processing_Dropout,
0 as Matching,
0 as Matching_Dropout,
0 as Export,
0 as Export_Dropout,
0 as Exported,
0 as Rejected,
0 as AmountSubTotal,
0 as UnionOrder
FROM "fileimport$marketscanimportcsv"
WHERE ft_boekdat like '%2018%'
UNION
SELECT 0 AS Amount_New,
0 AS Import_Dropout,
0 AS Import,
COUNT(CASE WHEN ProcessStatus='Processing_Dropout' THEN 1 ELSE null END) AS Processing_Dropout,
COUNT(CASE WHEN ProcessStatus='Processing' THEN 1 ELSE null END) AS Processing,
COUNT(CASE WHEN ProcessStatus='Matching' THEN 1 ELSE null END) AS Matching,
COUNT(CASE WHEN ProcessStatus='Matching_Dropout' THEN 1 ELSE null END) AS Matching_Dropout,
COUNT(CASE WHEN ProcessStatus='Export' THEN 1 ELSE null END) AS Export,
COUNT(CASE WHEN ProcessStatus='Export_Dropout' THEN 1 ELSE null END) AS Export_Dropout,
COUNT(CASE WHEN ProcessStatus='Exported' THEN 1 ELSE null END) AS Exported,
COUNT(CASE WHEN ProcessStatus='Rejected' THEN 1 ELSE null END) AS Rejected,
COUNT(CASE WHEN ProcessStatus!= '_New' and ProcessStatus!= 'Import_Dropout' and ProcessStatus!= 'Import' THEN 1 ELSE null END) AS AmountSubTotal,
COUNT(*) AS AmountTotal,
1 as UnionOrder
FROM "matching$marketscanmovement"
WHERE date_part ('year', BookingDate_BA_MS)= 2018
) SK GROUP by unionorder order by unionorder asc
1) The result is two rows instead of a single row holding the total of each column.
Why is this query not summing the values of the same column across the union? How should it be written?
2) When I try "sum(Amount_New1+Amount_New2) Amount_New" (after renaming the subquery columns to amount_new1 / amount_new2), it does not work either. Why?
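Your GROUP BY unionorder produces one output row per distinct UnionOrder value, and the two branches of the union use 0 and 1, so you get two rows. If you want a single row representing an aggregate over the entire union, remove GROUP BY (and the ORDER BY that goes with it):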
SELECT SUM(Amount_New) Amount_New,
SUM(Import_Dropout) Import_Dropout,
SUM(Import) Import,
SUM(Processing) Processing,
SUM(Processing_Dropout) Processing_Dropout,
SUM(Matching) Matching,
SUM(Matching_Dropout) Matching_Dropout,
SUM(Export) Export,
SUM(Exported) Exported,
SUM(Rejected) Rejected,
SUM(AmountSubTotal) AmountSubTotal,
SUM(AmountTotal) AmountTotal
FROM
(
-- ... your current query
) SK;
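You may also want to remove the UnionOrder computed column, since there is no need to sort a single-row result set. Also note that UNION matches columns by position rather than by alias, so both branches should list their columns in the same order.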
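To see the difference between the grouped and ungrouped outer query, here is a cut-down sketch in Python with sqlite3. The tables imports and movements, their rows, and the reduced column list are hypothetical stand-ins for the two source tables, and UNION ALL is used here so that no branch row can be dropped as a duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down stand-ins for the two source tables.
conn.execute("CREATE TABLE imports (ProcessStatus TEXT)")
conn.execute("CREATE TABLE movements (ProcessStatus TEXT)")
conn.executemany("INSERT INTO imports VALUES (?)", [("_New",), ("_New",), ("Import",)])
conn.executemany("INSERT INTO movements VALUES (?)", [("Matching",), ("Export",)])

# Both branches list their columns in the same order, because a union
# matches columns by position; names are taken from the first branch.
union_sql = """
    SELECT COUNT(CASE WHEN ProcessStatus = '_New' THEN 1 END) AS Amount_New,
           0 AS Matching,
           COUNT(*) AS AmountTotal,
           0 AS UnionOrder
    FROM imports
    UNION ALL
    SELECT 0,
           COUNT(CASE WHEN ProcessStatus = 'Matching' THEN 1 END),
           COUNT(*),
           1
    FROM movements
"""
outer = "SELECT SUM(Amount_New), SUM(Matching), SUM(AmountTotal) FROM (" + union_sql + ") SK"

# With GROUP BY: one row per UnionOrder value, i.e. one row per branch.
grouped = conn.execute(outer + " GROUP BY UnionOrder ORDER BY UnionOrder").fetchall()
# Without GROUP BY: a single row aggregated over the whole union.
single = conn.execute(outer).fetchall()

print(grouped)  # [(2, 0, 3), (0, 1, 2)]
print(single)   # [(2, 1, 5)]
```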

Problems with my WHERE clause (SQL)

I'm trying to write a query that returns the following columns:
owner_id,
number_of_concluded_bookings,
number_of_declined_bookings,
number_of_inquiries
However, the problem is that my WHERE clause messes up the query because I am querying the same table. Here is the code:
SELECT owner_id,
Count(*) AS number_of_cancelled_bookings
FROM bookings
WHERE state IN ('cancelled')
GROUP BY owner_id
ORDER BY 1;
It's easy to retrieve the columns individually, but I want all of them. Say I wanted number_of_concluded_bookings as well, that would mean I'd have to alter the WHERE clause ...
Help is greatly appreciated!
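Consider conditional aggregation: drop the WHERE clause and move each condition into a CASE expression inside SUM, so that every column counts only the rows in its own state: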
SELECT owner_id,
SUM(CASE WHEN state='concluded' THEN 1 ELSE 0 END) AS number_of_concluded_bookings,
SUM(CASE WHEN state='cancelled' THEN 1 ELSE 0 END) AS number_of_cancelled_bookings,
SUM(CASE WHEN state='declined' THEN 1 ELSE 0 END) AS number_of_declined_bookings,
SUM(CASE WHEN state='inquiries' THEN 1 ELSE 0 END) AS number_of_inquiries
FROM bookings
GROUP BY owner_id
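A quick way to try this out is a small sketch with Python's sqlite3 module. The sample bookings rows are made up, and the inquiries column is left out only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (owner_id INTEGER, state TEXT)")
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?)",  # made-up sample data
    [
        (1, "concluded"), (1, "cancelled"), (1, "concluded"),
        (2, "declined"), (2, "cancelled"),
    ],
)

# No WHERE clause: every row reaches every SUM, and each CASE decides
# whether the row counts toward that particular column.
rows = conn.execute("""
SELECT owner_id,
       SUM(CASE WHEN state = 'concluded' THEN 1 ELSE 0 END) AS number_of_concluded_bookings,
       SUM(CASE WHEN state = 'cancelled' THEN 1 ELSE 0 END) AS number_of_cancelled_bookings,
       SUM(CASE WHEN state = 'declined' THEN 1 ELSE 0 END) AS number_of_declined_bookings
FROM bookings
GROUP BY owner_id
ORDER BY owner_id
""").fetchall()
print(rows)  # [(1, 2, 1, 0), (2, 0, 1, 1)]
```

An owner with no bookings in a given state still appears, with 0 in that column, which a WHERE filter on state would not give you.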

SQL Query to add result set values as column values

I want to combine the results of the queries below as columns of a single row, without creating a table.
Select 'NetworkKey' as AuthKey
Select count(NetworkSK) as Totalcount from EDW.Fact.AuthorizationRequest
Select count(*) as NUllcount from EDW.Fact.AuthorizationRequest where NetworkSK is NULL
Select count(*) as NotNullcount from EDW.Fact.AuthorizationRequest where NetworkSK is Not NULL
My result should look like this without creating a table physically...
AuthKey Totalcount Nullcount NotNullCount
NetworkKey 100 5 95
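You can compute all of the counts in a single query with conditional aggregation. Note that the total uses COUNT(*), because COUNT(NetworkSK) would skip the rows where NetworkSK is NULL: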
SELECT 'NetworkKey' AS AuthKey,
COUNT(*) AS TotalCount,
SUM(CASE WHEN NetworkSK IS NULL THEN 1 ELSE 0 END) AS NUllcount,
SUM(CASE WHEN NetworkSK IS NOT NULL THEN 1 ELSE 0 END) AS NotNullcount
FROM EDW.Fact.AuthorizationRequest
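For a quick check of the expected 100 / 5 / 95 result, here is a sketch using Python's sqlite3 module. SQLite does not have the three-part EDW.Fact.AuthorizationRequest name, so a plain AuthorizationRequest table with generated rows stands in for it, and the extra CountOfColumn column is added only to show what COUNT(NetworkSK) returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for EDW.Fact.AuthorizationRequest (assumption: one relevant column).
conn.execute("CREATE TABLE AuthorizationRequest (NetworkSK INTEGER)")
sample = [(i,) for i in range(95)] + [(None,)] * 5  # 95 values, 5 nulls
conn.executemany("INSERT INTO AuthorizationRequest VALUES (?)", sample)

result = conn.execute("""
SELECT 'NetworkKey' AS AuthKey,
       COUNT(*) AS TotalCount,
       SUM(CASE WHEN NetworkSK IS NULL THEN 1 ELSE 0 END) AS NullCount,
       SUM(CASE WHEN NetworkSK IS NOT NULL THEN 1 ELSE 0 END) AS NotNullCount,
       COUNT(NetworkSK) AS CountOfColumn  -- skips nulls, so equals NotNullCount
FROM AuthorizationRequest
""").fetchone()
print(result)  # ('NetworkKey', 100, 5, 95, 95)
```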