How to remove elements with value zero from a Hive array - sql

I have an array column in Hive which contains 7 numbers.
For example: [32,4,0,43,23,0,1]
I want my output to be [32,4,43,23,1] (with all the zero elements removed).
Could someone help me accomplish this?

Explode the array, filter out the zeros, then collect it again.
Demo:
with mydata as (
select array(32,4,0,43,23,0,1) as initial_array
)
select initial_array, collect_set(element) as result_array --collect the remaining elements back into an array
from
(
select initial_array, e.element
from mydata
lateral view outer explode(initial_array) e as element --one row per array element
) s
where element != 0 --filter out the zeros
group by initial_array
Result:
initial_array result_array
[32,4,0,43,23,0,1] [32,4,43,23,1]
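Note that collect_set also removes duplicate values and does not guarantee the original element order. If your arrays can contain repeated non-zero values that must be kept, a possible variant is the same query with collect_list (the relative order after the shuffle is still not guaranteed):
with mydata as (
select array(32,4,0,43,23,0,1) as initial_array
)
select initial_array, collect_list(element) as result_array --keeps duplicates, unlike collect_set
from
(
select initial_array, e.element
from mydata
lateral view outer explode(initial_array) e as element
) s
where element != 0
group by initial_array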

Related

Hive string JSON list to array with a specific field

I want to select an array of a specific field from a string containing a JSON list in Hive.
For example,
[{"key1":"val1","key2":"val2"},{"key1":"val3","key2":"val4"},{"key1":"val5","key2":"val6"}]
I want to return an array of the key1 values:
[val1,val3,val5]
How can I make this possible?
Convert the string to an array of JSON objects: remove the outer [], then split by the commas between } and {. Then extract key1 from each element and use collect_list to get an array of the values; see the comments in the code:
with mytable as(--data example with single row
select '[{"key1":"val1","key2":"val2"},{"key1":"val3","key2":"val4"},{"key1":"val5","key2":"val6"}]' as json_string
)
select collect_list( --collect array
get_json_object(json_map_string,'$.key1') --key1 extracted
) as key1_array
from
(
select split(regexp_replace(json_string,'^\\[|\\]$',''), --remove []
'(?<=\\}),(?=\\{)' --split by comma only after } and before {
) as json_array --converted to array of json strings (map)
from mytable
)s
lateral view outer explode(json_array) e as json_map_string --explode array elements
;
Result:
key1_array
["val1","val3","val5"]

How to remove duplicate values in a cell of a Hive table

I have a column in my Hive SQL table where the values in each cell are separated by commas (,). Some values in this string are duplicated, and I want to remove them. Here is an example of my data:
test, test1, test,test1
rest,rest1,rest1,rest
chest,nest,lest,gest
The result should have the duplicates removed:
test,test1
rest,rest1
chest,nest,lest,gest
I want to remove the duplicates. Could anyone help me with this issue?
Thank you
Solution for Hive:
Split to get an array, explode it, use collect_set to get an array without duplicates, then concatenate the array back using concat_ws.
Demo (Hive):
with your_table as(
select stack(3,
1, 'test, test1, test,test1',
2, 'rest,rest1,rest1,rest',
3, 'chest,nest,lest,gest'
) as (id, colname)
)
select t.id, t.colname, concat_ws(',',collect_set(trim(e.elem))) result
from your_table t
lateral view outer explode(split(colname,',')) e as elem
group by t.id, t.colname
trim() is used to remove the spaces which are present in your data example.
Result:
t.id t.colname result
1 test, test1, test,test1 test,test1
2 rest,rest1,rest1,rest rest,rest1
3 chest,nest,lest,gest chest,nest,lest,gest
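To persist the cleaned values, the same expression can feed a new table. A minimal sketch, assuming hypothetical table names your_table(id int, colname string) and your_table_dedup:
create table your_table_dedup as
select t.id,
       concat_ws(',', collect_set(trim(e.elem))) as colname --de-duplicated, comma-separated again
from your_table t
lateral view outer explode(split(t.colname, ',')) e as elem
group by t.id;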

SQL Array with Null

I'm trying to group BigQuery columns using an array like so:
with test as (
select 1 as A, 2 as B
union all
select 3, null
)
select *,
[A,B] as grouped_columns
from test
However, this won't work, since there is a null value in column B row 2.
In fact this won't work either:
select [1, null] as test_array
However, when reading the BigQuery documentation, it says NULLs should be allowed:
In BigQuery, an array is an ordered list consisting of zero or more
values of the same data type. You can construct arrays of simple data
types, such as INT64, and complex data types, such as STRUCTs. The
current exception to this is the ARRAY data type: arrays of arrays are
not supported. Arrays can include NULL values.
There doesn't seem to be any attribute or safe prefix that can be used with ARRAY() to handle NULLs.
So what is the best approach for this?
Per the documentation for the ARRAY type:
Currently, BigQuery has the following two limitations with respect to NULLs and ARRAYs:
BigQuery raises an error if query result has ARRAYs which contain NULL elements, although such ARRAYs can be used inside the query.
BigQuery translates NULL ARRAY into empty ARRAY in the query result, although inside the query NULL and empty ARRAYs are two distinct values.
So, for your example, you can use the "trick" below:
with test as (
select 1 as A, 2 as B union all
select 3, null
)
select *,
array(select cast(el as int64) el
from unnest(split(translate(format('%t', t), '()', ''), ', ')) el --format the row as text, strip the parentheses, split into elements
where el != 'NULL' --drop the NULL elements
) as grouped_columns
from test t
The above removes the NULL elements, returning [1, 2] for the first row and [3] for the second.
Note: the above approach does not require explicitly referencing all the involved columns!
My current solution, and I'm not a fan of it, is to use a combination of IFNULL(), UNNEST() and ARRAY(), like so:
select
*,
array(
select *
from unnest(
[
ifnull(A, ''),
ifnull(B, '')
]
) as grouping
where grouping <> ''
) as grouped_columns
from test
Alternatively, you can replace the NULL value with some non-NULL figure using the IFNULL() function, as given below:
with test as (
select 1 as A, 2 as B
union all
select 3, IFNULL(null, 0)
)
select *,
[A,B] as grouped_columns
from test
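If explicitly referencing the columns is acceptable, another possible approach keeps the INT64 type and filters the NULLs directly inside an ARRAY subquery; a sketch against the same test data:
with test as (
select 1 as A, 2 as B union all
select 3, null
)
select *,
array(select x
      from unnest([A, B]) as x with offset --offset preserves the original position
      where x is not null                  --drop the NULL elements
      order by offset
) as grouped_columns
from test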

Getting duplicate values in a jsonb query in PostgreSQL

In a table I have used jsonb to store multiple values in a JSON array. Now I want to write a query that selects the elements where day is monday.
[{"day":"monday","time":"8 am"},{"day":"tuesday","time":"8 am"},{"day":"monday","time":"11 am"},{"day":"friday","time":"8 am"}]
Query:
SELECT array_to_json(array_agg(j))
FROM demo t, jsonb_array_elements(t.di_item ) j
WHERE j->>'day' = 'monday'
Result:
[{"day":"monday","time":"8 am"},{"day":"monday","time":"11 am"},{"day":"monday","time":"8 am"},{"day":"monday","time":"11 am"}]
Expected:
[{"day":"monday","time":"8 am"},{"day":"monday","time":"11 am"}]
Each value is coming back twice.
First: there is no need to aggregate the JSON objects into an array and then convert it to a JSON array; you can use json[b]_agg() directly. Then: use distinct to avoid the duplicates.
SELECT jsonb_agg(distinct j)
FROM demo t
CROSS JOIN LATERAL jsonb_array_elements(t.di_item) j
WHERE j->>'day' = 'monday'
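A minimal reproduction sketch, assuming a table demo(di_item jsonb); the duplicates in the question can be reproduced when the same array is stored in more than one row, and distinct collapses them:
create table demo (di_item jsonb);

insert into demo values
('[{"day":"monday","time":"8 am"},{"day":"tuesday","time":"8 am"},{"day":"monday","time":"11 am"},{"day":"friday","time":"8 am"}]'),
('[{"day":"monday","time":"8 am"},{"day":"tuesday","time":"8 am"},{"day":"monday","time":"11 am"},{"day":"friday","time":"8 am"}]');

select jsonb_agg(distinct j)
from demo t
cross join lateral jsonb_array_elements(t.di_item) j
where j->>'day' = 'monday';
-- returns the two distinct monday entries once each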

Extract a string in PostgreSQL and remove null/empty elements

I need to extract values from a string with PostgreSQL.
But for my special scenario, if an element value is null/empty I want to remove it and bring the next element one index closer.
e.g.
assume my string is: "a$$b"
If I use
select string_to_array('a$$b','$')
The result is:
{a,,b}
If I try
SELECT unnest(string_to_array('a__b___d_','_')) EXCEPT SELECT ''
it changes the order:
1. d
2. a
3. b
The order change is bad for me.
I have found another solution with:
select array_remove( string_to_array(a||','||b||','||c,',') , '')
from (
select
split_part('a__b','_',1) a,
split_part('a__b','_',2) b,
split_part('a__b','_',3) c
) inn
Returns
{a,b}
And then from the array I need to extract values by index,
e.g. Extract(ARRAY, 2).
But this seems like overkill to me; is there a better or simpler way to do this?
You can use with ordinality to preserve the index information during unnesting:
select a.c
from unnest(string_to_array('a__b___d_','_')) with ordinality as a(c,idx)
where nullif(trim(c), '') is not null
order by idx;
If you want that back as an array:
select array_agg(a.c order by a.idx)
from unnest(string_to_array('a__b___d_','_')) with ordinality as a(c,idx)
where nullif(trim(c), '') is not null;
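To then pick a single element by its position in the cleaned-up array (the Extract(ARRAY, 2) part of the question), you can index the aggregated array; a small sketch:
select (array_agg(a.c order by a.idx))[2] as second_element
from unnest(string_to_array('a__b___d_','_')) with ordinality as a(c,idx)
where nullif(trim(c), '') is not null;
-- returns 'b'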