Hive: convert a JSON list string to an array of one field's values - sql

I want to extract the values of a specific field from a JSON list stored as a string in Hive, and return them as an array.
For example, given
[{"key1":"val1","key2":"val2"},{"key1":"val3","key2":"val4"},{"key1":"val5","key2":"val6"}]
I want to return the array of key1 values:
[val1,val3,val5]
How can I do this?

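Convert the string to an array of JSON strings: remove the enclosing [], then split on the commas that sit between } and {. Then extract key1 from each element with get_json_object and use collect_list to gather the values into an array. See the comments in the code: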
with mytable as ( --data example with single row
    select '[{"key1":"val1","key2":"val2"},{"key1":"val3","key2":"val4"},{"key1":"val5","key2":"val6"}]' as json_string
)
select collect_list( --collect array
           get_json_object(json_map_string,'$.key1') --key1 extracted
       ) as key1_array
from
(
    select split(regexp_replace(json_string,'^\\[|\\]$',''), --remove []
                 '(?<=\\}),(?=\\{)' --split by comma only after } and before {
                ) as json_array --converted to array of json strings (map)
    from mytable
) s
lateral view outer explode(json_array) e as json_map_string --explode array elements
;
Result:
key1_array
["val1","val3","val5"]

Related

Is there a way to replace null values in a JSON BigQuery field?

I have a JSON value like the one below in a certain column of my table:
{"values":[1, 2, null, 4, null]}
What I want is to convert the value into a BigQuery array of type ARRAY<INT64>.
I tried JSON_VALUE_ARRAY, but it throws an error because the final output cannot be an array with NULLs.
That said, what is the correct approach?
You can unnest an array with null elements. When building the new array, you can provide the ignore nulls flag to array_agg to drop the null values.
with tbl as (
    select JSON '{"values":[1, 2, null, 4, null]}' as data
    union all
    select JSON ' {"values":[ ] }'
)
select *,
    (select array_agg(x ignore nulls) from unnest(JSON_VALUE_ARRAY(data.values)) x)
from tbl
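Since JSON_VALUE_ARRAY returns ARRAY<STRING>, the query above yields strings rather than the ARRAY<INT64> the question asks for. A minimal sketch of one way to get integers, assuming every non-null element is a valid integer literal (the int_values alias is only illustrative):

```sql
-- Sketch: same unnest + array_agg pattern, with a cast per element.
-- safe_cast returns NULL instead of raising an error on a bad value,
-- and "ignore nulls" then drops those NULLs from the result.
with tbl as (
    select JSON '{"values":[1, 2, null, 4, null]}' as data
)
select
    (select array_agg(safe_cast(x as int64) ignore nulls)
     from unnest(JSON_VALUE_ARRAY(data.values)) x) as int_values
from tbl
```

Note that safe_cast also silently drops any element that is not an integer; use cast instead if such values should raise an error.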

Strings of arrays and maps?

I have a table that contains a field of varchar type. Within this field, there are empty arrays, arrays, and maps stored as strings, something like this:
id  dumped_field
1   []
2   [123,456,789]
3   {'0":123, "1":456}
4   NULL
The goal is to convert this string field into an actual array instead of a string:
id  dumped_field
1   []
2   [123,456,789]
3   [123,456]
4   NULL
The problem is that these various data types have all been stored as strings in this field. Is there a way to A) convert the stringified array into an array and B) convert the stringified JSON object into an array of its values?
Assuming your data is json (and after fixing the quotes in the object) you can process it as json (leveraging try and try_cast):
-- sample data
WITH dataset (id, dumped_field) AS (
    VALUES (1, '[]'),
           (2, '[123,456,789]'),
           (3, '{"0":123, "1":456}'),
           (4, NULL)
)
-- query
select coalesce(
    try_cast(json_parse(dumped_field) as array(varchar)), -- try process as array
    try(map_values(cast(json_parse(dumped_field) as map(varchar, varchar))))) -- try process as object
from dataset;
Output:
_col0
[]
[123, 456, 789]
[123, 456]
NULL

Presto Unnest varchar array field with {}

I have a column with an inconsistent data format: some values are arrays written with [], and some are JSON-like objects written with {}.
id  prices
1   [100,100,110]
2   {200,210,190}
create table test(id integer, prices varchar(255));
insert into test
values
    (1,'[100,100,110]'),
    (2,'{200,210,190}');
When I try to unnest, my query works fine for the first row but fails on the second row. Is there a way to convert the {} form into a [] array?
This is my query:
select id,prices,price from test
cross join UNNEST(cast(json_parse(prices) as array<varchar>)) as t (price)
You can use replace to turn the braces into brackets and then parse the result as an array:
select json_parse(replace(replace('{200,210,190}', '}', ']'), '{', '['))
Output:
_col0
[200,210,190]
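Applied to the test table from the question, the same replace can be placed inside the original UNNEST query. This is a sketch that assumes every value in prices is a flat list of numbers wrapped in either [] or {}; it would corrupt real JSON objects with keys.

```sql
-- Sketch: normalize {} to [] per row, then parse and unnest as before.
select id, prices, price
from test
cross join unnest(
    cast(json_parse(replace(replace(prices, '{', '['), '}', ']')) as array(varchar))
) as t (price)
```

Rows that already use [] pass through replace unchanged, so both formats are handled by the one expression.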

Querying One Dimensional JSON string

I'm trying to check whether a one-dimensional JSON array string like:
[1,2,3,4,5]
contains a certain value, like 4.
I know with multi-dimensional arrays I could do something like:
JSON_VALUE(columnName, '$.key')
but for the life of me, I can't figure out how to search a keyless JSON string.
I've tried:
WHERE JSON_VALUE(columnName, '$') = 1
WHERE JSON_VALUE(columnName, '$.') = 1
WHERE 1 IN JSON_VALUE(columnName, '$')
WHERE 1 IN JSON_VALUE(columnName, '$.')
and nothing works.
Assuming that the string '[1,2,3,4,5]' is in a column in your table, you could use an EXISTS with OPENJSON:
SELECT V.YourColumn
FROM (VALUES('[1,2,3,4,5]'),('[7,8,9]'))V(YourColumn)
WHERE EXISTS (SELECT 1
              FROM OPENJSON(V.YourColumn)
                   WITH (Value int '$') OJ
              WHERE Value = 4);
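A variant without the WITH clause is also possible, sketched here under the assumption that comparing as text is acceptable. When OPENJSON is called without an explicit schema it returns the columns [key], [value], and [type], with [value] as nvarchar, so the comparison below uses the string '4'.

```sql
-- Sketch: rely on OPENJSON's default output columns instead of a WITH schema.
SELECT V.YourColumn
FROM (VALUES('[1,2,3,4,5]'),('[7,8,9]'))V(YourColumn)
WHERE EXISTS (SELECT 1
              FROM OPENJSON(V.YourColumn) OJ
              WHERE OJ.[value] = '4');
```

The WITH version is the better fit when the elements should be compared as typed integers; the default-schema version avoids declaring a schema at all.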

How to remove the elements with value as zero in hive array

I have an array column in Hive that holds 7 numbers.
For example: [32,4,0,43,23,0,1]
I want my output to be [32,4,43,23,1] (with all the zero elements removed).
Can someone help me accomplish this?
Explode array, filter, collect again.
Demo:
with mydata as (
    select array(32,4,0,43,23,0,1) as initial_array
)
select initial_array,
       collect_list(element) as result_array --collect_list keeps repeated non-zero values; collect_set would drop them
from
(
    select initial_array, e.element
    from mydata
    lateral view outer explode(initial_array) e as element --explode array elements
) s
where element != 0 --filter out zeros
group by initial_array
Result:
initial_array result_array
[32,4,0,43,23,0,1] [32,4,43,23,1]