Counting the number of rows with non-null values in Hive

How can I count the number of rows in a table that contain at least one non-null value?
Suppose the data set below has 8 columns: h1, h2, h3, ..., h8. If every column in a row is NULL, the row counts as 0. If at least one column has a value, the row counts as 1.
h1 h2 h3 h4 h5 h6 h7 h8
U U NULL U Y NULL Y X
U NULL U U Y Y X X
U U U NULL U NULL Y NULL
NULL NULL NULL NULL NULL NULL NULL NULL
X V U U Y NULL Z X
Y X NULL X Y Z U
X NULL U NULL NULL U Z Y
NULL NULL NULL NULL NULL NULL NULL NULL
For the above data set the answer is 6, since only two rows (4 and 8) have NULL in every column.
Please suggest a Hive query to get this result.

You could use a combination of CASE, COALESCE and SUM.
SELECT SUM(
         CASE
           WHEN COALESCE(h1, h2, h3, h4, h5, h6, h7, h8) IS NOT NULL
           THEN 1
           ELSE 0
         END)
FROM yourtable;
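To check the logic against the sample data, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. SQLite is only a stand-in for Hive here, and the sixth row in the question lists seven values, so its eighth column is assumed to be NULL.

```python
import sqlite3

# Sample rows from the question. SQLite stands in for Hive here (assumption).
rows = [
    ("U", "U", None, "U", "Y", None, "Y", "X"),
    ("U", None, "U", "U", "Y", "Y", "X", "X"),
    ("U", "U", "U", None, "U", None, "Y", None),
    (None,) * 8,
    ("X", "V", "U", "U", "Y", None, "Z", "X"),
    ("Y", "X", None, "X", "Y", "Z", "U", None),  # 8th value absent in the question; assumed NULL
    ("X", None, "U", None, None, "U", "Z", "Y"),
    (None,) * 8,
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (h1, h2, h3, h4, h5, h6, h7, h8)")
conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

# COALESCE returns the first non-NULL argument, so it is NULL only when
# every column in the row is NULL.
query = """
    SELECT SUM(
             CASE
               WHEN COALESCE(h1, h2, h3, h4, h5, h6, h7, h8) IS NOT NULL
               THEN 1
               ELSE 0
             END)
    FROM yourtable
"""
non_null_rows = conn.execute(query).fetchone()[0]
print(non_null_rows)  # 6
```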

Related

Re-arrange or Merge multiple rows in SQL

Assuming I have a table containing the following information:
ID  Column A  Column B  Column C
1   A         NULL      NULL
1   NULL      B         NULL
1   NULL      C         NULL
1   NULL      NULL      D
1   NULL      NULL      E
1   NULL      F         NULL
2   NULL      X         NULL
2   NULL      Y         NULL
2   NULL      NULL      Z
Is there a way I can perform a SELECT on the table to get the following?
ID  Column A  Column B  Column C
1   A         B         D
1   NULL      C         E
1   NULL      F         NULL
2   NULL      X         Z
2   NULL      Y         NULL

Assuming you want to get rows where column B is not null:
SELECT *
FROM yourTable t
WHERE t.ColumnB is not null

SQL outer join on two columns, returning null in one column if only the other matches

I couldn't find exactly what I'm looking for in another thread.
Let's say I have these two tables:
left:
x  left_y
a  1
a  2
b  4
b  5
right:
x  right_y
a  2
a  3
b  5
b  6
I want to run a query close to this in intention:
SELECT *
FROM left FULL OUTER JOIN right
ON (left.x = right.x AND left.left_y = right.right_y)
OR left.x = right.x
And get an output that has no nulls in x, but may have a null in left_y or right_y:
x  left_y  right_y
a  1       null
a  2       2
a  null    3
b  4       null
b  5       5
b  null    6

You can use coalesce:
select coalesce(l.x, r.x) as x,
left_y,
right_y
from l full outer join r
on l.x = r.x
and l.left_y = r.right_y
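The join condition above can be checked against the question's sample data. The sketch below uses Python's sqlite3 module, and because older SQLite versions lack FULL OUTER JOIN, it emulates it with a LEFT JOIN plus a UNION ALL of the unmatched right-hand rows. That emulation is a substitution made for portability, not part of the answer. Each branch takes x from whichever side is present, which is the job coalesce(l.x, r.x) does in the original query. The table names l and r follow the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE l (x TEXT, left_y INTEGER);
    CREATE TABLE r (x TEXT, right_y INTEGER);
    INSERT INTO l VALUES ('a', 1), ('a', 2), ('b', 4), ('b', 5);
    INSERT INTO r VALUES ('a', 2), ('a', 3), ('b', 5), ('b', 6);
""")

# Emulated full outer join on (x, y):
#   branch 1: every left row, with its matching right row if one exists
#   branch 2: right rows that have no matching left row
query = """
    SELECT x, left_y, right_y
    FROM (
        SELECT l.x AS x, l.left_y AS left_y, r.right_y AS right_y
        FROM l LEFT JOIN r
          ON l.x = r.x AND l.left_y = r.right_y
        UNION ALL
        SELECT r.x, NULL, r.right_y
        FROM r
        WHERE NOT EXISTS (
            SELECT 1 FROM l
            WHERE l.x = r.x AND l.left_y = r.right_y
        )
    )
    ORDER BY x, COALESCE(left_y, right_y)
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On the sample data this prints the six rows from the desired output, with None where the question shows null.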

Copying missing data between tables

I have a table Alpha:
A  B  C
2  4  3
1  5  1
4  3  null
I have a reference table Beta with a single column:
a
1
2
3
4
5
I want to copy the values that are missing from Alpha, with respect to Beta, into another table Gamma.
The expected result is as follows:
A     B     C
3     1     2
5     2     4
null  null  5
It has to refer to the Beta table, because Beta's values are not always consecutive.
E.g. the Beta table can contain 2, 3, 5 and the Alpha table 2 and 3, so the missing value is just 5.
PS: this is a minimal representation. In reality there are more than 20 columns in Alpha but only one column in Beta.
Table Alpha and the expected result table have the same structure.

I have to put my crystal ball into overdrive.
I think you want to generate the missing values for each column in table Alpha, based on the list of values in table Beta.
The query below finds the missing values for each column (A, B, C) and then PIVOTs them:
; with missing as
(
    select col = 'A', V = A
    from Beta b
    where not exists (select * from Alpha a where a.A = b.A)
    union all
    select col = 'B', V = A
    from Beta b
    where not exists (select * from Alpha a where a.B = b.A)
    union all
    select col = 'C', V = A
    from Beta b
    where not exists (select * from Alpha a where a.C = b.A)
)
select [A], [B], [C]
from (
    select *, rn = row_number() over (partition by col order by V)
    from missing
) m
pivot
(
    max(V)
    for col in ([A], [B], [C])
) p
PS: if you really have 50 columns in table Alpha, you need to repeat the UNION ALL branch 50 times, once for each column.

I think you can use several INSERT statements for this problem:
INSERT GAMMA(A)
SELECT BETA.a FROM BETA
LEFT JOIN ALPHA ON BETA.a = ALPHA.A
WHERE ALPHA.A IS NULL
Change the A to B or another column name.
But this will produce a table like this:
A     B     C
3     null  null
5     null  null
null  1     null
null  2     null
null  null  2
null  null  4
null  null  5
This might not be the answer you want, but I hope it gives you an alternative.
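As a rough check of the per-column INSERT approach, the sketch below repeats the anti-join once per column using Python's sqlite3 module. SQLite is a stand-in for the asker's database, so the statement is written as INSERT INTO. The loop over column names and the ORDER BY are additions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ALPHA (A INTEGER, B INTEGER, C INTEGER);
    CREATE TABLE BETA  (a INTEGER);
    CREATE TABLE GAMMA (A INTEGER, B INTEGER, C INTEGER);
    INSERT INTO ALPHA VALUES (2, 4, 3), (1, 5, 1), (4, 3, NULL);
    INSERT INTO BETA  VALUES (1), (2), (3), (4), (5);
""")

# One anti-join INSERT per Alpha column: keep the Beta values that find
# no match in that column. Column names come from a fixed tuple, so the
# string formatting below is not exposed to untrusted input.
for col in ("A", "B", "C"):
    conn.execute(f"""
        INSERT INTO GAMMA ({col})
        SELECT BETA.a
        FROM BETA
        LEFT JOIN ALPHA ON BETA.a = ALPHA.{col}
        WHERE ALPHA.{col} IS NULL
        ORDER BY BETA.a
    """)

gamma_rows = conn.execute("SELECT A, B, C FROM GAMMA ORDER BY rowid").fetchall()
for row in gamma_rows:
    print(row)
```

Each missing value lands in its own row, with None in the other two columns, so the result is sparse rather than packed like the expected Gamma table.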

SQL - Redshift Lag Function getting duplicates

I have the table below:
ID Type Sub_ID Date CNT
A P A1 4/1/2020 5
A P A2 4/5/2020 NULL
A P A3 4/8/2020 NULL
What I want to get is
ID Type Sub_ID Date CNT LAG
A P A1 4/1/2020 5 NULL
A P A2 4/5/2020 NULL 5
A P A3 4/8/2020 NULL NULL
My query (shown after the output) is giving me duplicates like this:
ID Type Sub_ID Date CNT LAG
A P A1 4/1/2020 5 NULL
A P A1 4/1/2020 5 5 (duplicate)
A P A2 4/5/2020 NULL 5
A P A2 4/5/2020 NULL NULL (duplicate)
A P A3 4/8/2020 NULL NULL
select *, lag(cnt,1) over (partition by id, type order by date)
from mytable
Anything wrong?

OK, it turns out I have duplicate data in my table. I need to dedup first and then apply the lag on top of the cleaned table.
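A minimal sketch of "dedup first, then lag", run through Python's sqlite3 module rather than Redshift (window functions need SQLite 3.25 or later). The duplicated rows are an assumption reconstructed from the output in the question, SELECT DISTINCT is just one possible way to dedup, and the dates are rewritten in ISO format so they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id TEXT, type TEXT, sub_id TEXT, date TEXT, cnt INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [
        ("A", "P", "A1", "2020-04-01", 5),
        ("A", "P", "A1", "2020-04-01", 5),     # duplicate row (assumed)
        ("A", "P", "A2", "2020-04-05", None),
        ("A", "P", "A2", "2020-04-05", None),  # duplicate row (assumed)
        ("A", "P", "A3", "2020-04-08", None),
    ],
)

# Remove exact duplicates in a subquery, then apply LAG to the clean rows.
query = """
    SELECT id, type, sub_id, date, cnt,
           LAG(cnt, 1) OVER (PARTITION BY id, type ORDER BY date) AS lag_cnt
    FROM (SELECT DISTINCT id, type, sub_id, date, cnt FROM mytable) AS deduped
    ORDER BY date
"""
deduped_rows = conn.execute(query).fetchall()
lags = [row[5] for row in deduped_rows]
print(lags)  # [None, 5, None]
```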

sql query to find second not null value in a table (oracle)

I have a table with n columns. Some of the column names are as follows:
c1, c2, c3, c4, c5, ..., c25
sample data
c1 c2 c3 c4 c5 (consider only 5 in this case)
x y z z y
x y
x
a b j k
a c g h i
k l m n o
The desired output (op) is the second non-null value from the right side.
sample op for above data
z
x
x (special case as no data left of x)
j
h
n
I cannot use COALESCE alone, as I need the second non-null value, not the first.
Can someone help me with this query?

You can do this with a more complex case statement:
select (case when c5 is not null
then coalesce(c4, c3, c2, c1)
when c4 is not null
then coalesce(c3, c2, c1)
when c3 is not null
then coalesce(c2, c1)
else c1
end)
. . .
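To see how this CASE expression behaves on the sample data, here is a small sketch using Python's sqlite3 module as a stand-in for Oracle. It assumes the blank cells in the sample are NULLs that always trail the filled columns, and it borrows the table name my_table purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (c1, c2, c3, c4, c5)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?)",
    [
        ("x", "y", "z", "z", "y"),
        ("x", "y", None, None, None),   # blanks in the sample assumed NULL
        ("x", None, None, None, None),
        ("a", "b", "j", "k", None),
        ("a", "c", "g", "h", "i"),
        ("k", "l", "m", "n", "o"),
    ],
)

# Find the right-most non-NULL column, then take the first non-NULL
# value to its left. If nothing is to the left, fall back to c1.
query = """
    SELECT CASE
             WHEN c5 IS NOT NULL THEN COALESCE(c4, c3, c2, c1)
             WHEN c4 IS NOT NULL THEN COALESCE(c3, c2, c1)
             WHEN c3 IS NOT NULL THEN COALESCE(c2, c1)
             ELSE c1
           END AS op
    FROM my_table
    ORDER BY rowid
"""
ops = [row[0] for row in conn.execute(query)]
print(ops)  # ['z', 'x', 'x', 'j', 'h', 'n']
```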

Something like:
select nvl2(c5,c4,nvl2(c4,c3,nvl2(c3,c2,c1))) result
from my_table
Not tested.
