hive convert array<map<string, string>> to string - sql

I have a column in a Hive table whose type is array<map<string, string>>. How can I convert this column into a string using HQL?
I found the post Convert Map<string,string> to just string in hive, which covers converting a map<string, string> to a string. However, I still could not convert array<map<string, string>> to string.

Building off of the original answer you linked to, you can first explode the array into its individual maps using posexplode, which maintains a position column. Then you can use the method from the original post, additionally grouping by the position column to convert each map to a string. Finally, you collect the map strings into the final string. Here’s an example:
with test_data as (
  select stack(2,
    1, array(map('m1key1', 'm1val1', 'm1key2', 'm1val2'), map('m2key1', 'm2val1', 'm2key2', 'm2val2')),
    2, array(map('m1key1', 'm1val1', 'm1key2', 'm1val2'), map('m2key1', 'm2val1', 'm2key2', 'm2val2'))
  ) as (id, arr_col)
),
map_data as (
  select id, arr_col as original_arr, m.pos as map_num, m.val as map_col
  from test_data d
  lateral view posexplode(arr_col) m as pos, val
),
map_strs as (
  select id, original_arr, map_num,
    concat('{', concat_ws(',', collect_set(concat(m.key, ':', m.val))), '}') as map_str
  from map_data d
  lateral view explode(map_col) m as key, val
  group by id, original_arr, map_num
)
select id, original_arr, concat('[', concat_ws(',', collect_set(map_str)), ']') as arr_str
from map_strs
group by id, original_arr;
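Outside of Hive, the same transformation can be sketched in plain Python (the helper names map_to_str and arr_to_str are made up for illustration). Note that Hive's collect_set does not guarantee element order and drops duplicates; this sketch simply keeps the original array order:

```python
def map_to_str(m):
    # mirrors concat('{', concat_ws(',', collect_set(concat(key, ':', val))), '}')
    return "{" + ",".join(f"{k}:{v}" for k, v in m.items()) + "}"

def arr_to_str(arr):
    # mirrors concat('[', concat_ws(',', collect_set(map_str)), ']')
    return "[" + ",".join(map_to_str(m) for m in arr) + "]"

arr = [{"m1key1": "m1val1", "m1key2": "m1val2"},
       {"m2key1": "m2val1", "m2key2": "m2val2"}]
print(arr_to_str(arr))
# [{m1key1:m1val1,m1key2:m1val2},{m2key1:m2val1,m2key2:m2val2}]
```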

Related

Sum values in Athena table where column having key/value pair json

I have an Athena table with one column having JSON and key/value pairs.
Ex:
Select test_client, test_column from ABC;
test_client, test_column
john, {"d":13, "e":210}
mark, {"a":1,"b":10,"c":1}
john, {"e":100,"a":110,"b":10, "d":10}
mark, {"a":56,"c":11,"f":9, "e": 10}
And I need to sum the values for each key per client, returning something like the below (the output format doesn't matter; I just want the sums):
john: d: 23, e:310, a:110, b:10
mark: a:57, b:10, c:12, f:9, e:10
It is a combination of a few useful functions in Trino:
WITH example_table AS (
  SELECT 'john' AS person, '{"d":13, "e":210}' AS _json UNION ALL
  SELECT 'mark', '{"a":1,"b":10,"c":1}' UNION ALL
  SELECT 'john', '{"e":100,"a":110,"b":10, "d":10}' UNION ALL
  SELECT 'mark', '{"a":56,"c":11,"f":9, "e": 10}'
)
SELECT person, reduce(
  array_agg(CAST(json_parse(_json) AS MAP(VARCHAR, INTEGER))),
  MAP(ARRAY['a'], ARRAY[0]),
  (s, x) -> map_zip_with(
    s, x, (k, v1, v2) ->
      if(v1 is null, 0, v1) + if(v2 is null, 0, v2)
  ),
  s -> s
)
FROM example_table
GROUP BY person
json_parse - parses the string into a JSON object
CAST ... AS MAP ... - creates a MAP from the JSON object
array_agg - aggregates the maps for each person based on the GROUP BY
reduce - steps through the aggregated array and reduces it to a single map
map_zip_with - applies a function to each matching key in two maps
if(... is null ...) - substitutes 0 for null when a key is not present in one of the maps
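For readers less familiar with Trino lambdas, the reduce/map_zip_with logic can be mirrored in Python (a rough sketch; the merge helper plays the role of map_zip_with, treating missing keys as 0):

```python
from functools import reduce

rows = [("john", {"d": 13, "e": 210}),
        ("mark", {"a": 1, "b": 10, "c": 1}),
        ("john", {"e": 100, "a": 110, "b": 10, "d": 10}),
        ("mark", {"a": 56, "c": 11, "f": 9, "e": 10})]

def merge(s, x):
    # map_zip_with analogue: sum values per key, missing keys count as 0
    return {k: s.get(k, 0) + x.get(k, 0) for k in s.keys() | x.keys()}

grouped = {}                      # GROUP BY person + array_agg
for person, m in rows:
    grouped.setdefault(person, []).append(m)

# reduce with the initial state MAP(ARRAY['a'], ARRAY[0])
summed = {p: reduce(merge, maps, {"a": 0}) for p, maps in grouped.items()}
print(summed["john"])   # per-key sums for john: d=23, e=310, a=110, b=10
```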

How to get values from a string dict list in sql bigquery?

Hi, I have a SQL table with the following data:
Outbound click
[{'action_type': 'outbound_click', 'value': '1025'}]
How do I get just the value, given that it is stored as a string, in BigQuery SQL?
I want the output to be
Outbound click
1025
Just UNNEST the array then extract the json value to get 1025. See approach below:
with sample_data as (
  select array(select "{'action_type': 'outbound_click', 'value': '1025'}") as arr_data
)
select
  JSON_VALUE(json_data, '$.value') as outbound_click
from sample_data,
unnest(arr_data) as json_data
Output:
outbound_click
1025
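As an aside, the string uses single quotes, so it is not strictly valid JSON. Outside of BigQuery you would need a lenient parser; in Python, for example, ast.literal_eval can handle it where json.loads would fail (an illustrative sketch, not part of the original answer):

```python
import ast

raw = "[{'action_type': 'outbound_click', 'value': '1025'}]"
# Single-quoted, so json.loads would reject it; parse it as a Python literal.
data = ast.literal_eval(raw)
values = [d["value"] for d in data]
print(values[0])  # 1025
```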

PostgreSQL: select from field with json format

Table has column, named "config" with following content:
{
"A":{
"B":[
{"name":"someName","version":"someVersion"},
{"name":"someName","version":"someVersion"}
]
}
}
The task is to select all name and version values. The expected output is a selection with two columns: name and version.
I successfully select the content of B:
select config::json -> 'A' -> 'B' as B
from my_table;
But when I'm trying to do something like:
select config::json -> 'A' -> 'B' ->> 'name' as name,
config::json -> 'A' -> 'B' ->> 'version' as version
from my_table;
I receive a selection with empty columns.
If the array size is fixed, you just need to specify which element of the array you want to retrieve, e.g.:
SELECT config->'A'->'B'->0->>'name' AS name,
config->'A'->'B'->0->>'version' AS version
FROM my_table;
But as your array contains multiple elements, use the function jsonb_array_elements in a subquery or CTE, and in the outer query parse each element individually, e.g.:
SELECT rec->>'name', rec->>'version'
FROM (SELECT jsonb_array_elements(config->'A'->'B')
FROM my_table) j (rec);
Demo: db<>fiddle
First, you should use the jsonb data type instead of json; see the documentation:
In general, most applications should prefer to store JSON data as
jsonb, unless there are quite specialized needs, such as legacy
assumptions about ordering of object keys.
Using jsonb, you can do the following :
SELECT DISTINCT ON (c) c->'name' AS name, c->'version' AS version
FROM my_table
CROSS JOIN LATERAL jsonb_path_query(config :: jsonb, '$.** ? (exists(@.name))') AS c
dbfiddle
select e.value ->> 'name', e.value ->> 'version'
from my_table
cross join json_array_elements(config::json -> 'A' -> 'B') e
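For comparison, the unnesting that jsonb_array_elements performs is easy to mirror in Python (a sketch using the question's document):

```python
import json

config = json.loads("""
{"A": {"B": [
  {"name": "someName", "version": "someVersion"},
  {"name": "someName", "version": "someVersion"}
]}}
""")

# equivalent of jsonb_array_elements(config->'A'->'B') plus the ->> lookups
rows = [(e["name"], e["version"]) for e in config["A"]["B"]]
print(rows)  # [('someName', 'someVersion'), ('someName', 'someVersion')]
```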

Array operation on hive collect_set

I am working in Hive on a large dataset. I have a table with an array column whose content is as follows.
["20190302Prod4"
"20190303Prod1"
"20190303Prod4"
"20190304Prod4"
"20190305Prod3"
"20190307Prod4"
"20190308Prod4"
"20190309Prod4"
"20190310Prod2"
"20190311Prod1"
"20190311Prod4"
"20190312Prod1"
"20190312Prod4"
"20190313Prod2"
"20190313Prod1"
"20190313Prod4"
"20190314Prod4"
"20190315Prod4"
"20190316Prod4"
"20190317Prod1"
"20190317Prod4"]
I need a set ordered by the ascending date of each prod, i.e. I need to trim the date from each array element and apply collect_set to get the result below.
["Prod4",
"Prod1",
"Prod3",
"Prod2"]
Explode the array, remove the date (the digits at the beginning of each string), then aggregate using collect_set:
with mydata as (--use your table instead of this
select array(
"20190302Prod4",
"20190303Prod1",
"20190303Prod4",
"20190304Prod4",
"20190305Prod3",
"20190307Prod4",
"20190308Prod4",
"20190309Prod4",
"20190310Prod2",
"20190311Prod1",
"20190311Prod4",
"20190312Prod1",
"20190312Prod4",
"20190313Prod2",
"20190313Prod1",
"20190313Prod4",
"20190314Prod4",
"20190315Prod4",
"20190316Prod4",
"20190317Prod1",
"20190317Prod4"
) myarray
)
select collect_set(regexp_extract(elem,'^\\d*(.*?)$',1)) col_name
from mydata a --Use your table instead
lateral view outer explode(myarray) s as elem;
Result:
col_name
["Prod4","Prod1","Prod3","Prod2"]
Another possible method is to concatenate the array first, remove the dates from the resulting string, and split to get an array. Unfortunately we still need to explode and apply collect_set to remove duplicates (example using the same mydata CTE):
select collect_set(elem) col_name
from mydata a --Use your table instead
lateral view outer explode(split(regexp_replace(concat_ws(',',myarray),'(^|,)\\d{8}','$1'),',')) s as elem
;
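The strip-and-dedupe logic itself can be sketched in Python (note that Hive's collect_set does not guarantee order; this sketch keeps first-seen order, which matches the ascending-date order here because the input array is sorted):

```python
import re

arr = ["20190302Prod4", "20190303Prod1", "20190303Prod4",
       "20190305Prod3", "20190310Prod2", "20190311Prod1"]

seen = []
for elem in arr:
    prod = re.sub(r"^\d+", "", elem)  # trim the leading date
    if prod not in seen:              # collect_set analogue: keep distinct values
        seen.append(prod)
print(seen)  # ['Prod4', 'Prod1', 'Prod3', 'Prod2']
```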

Alias nested struct columns

How can I alias field1 as index and field2 as value?
The query gives me an error:
#standardsql
with q1 as (select 1 x, ARRAY<struct<id string, cd ARRAY<STRUCT<index STRING,value STRING>>>>
[struct('h1',[('1','a')
,('2','b')
])
,('h2',[('3','c')
,('4','d')
])] hits
)
Select * from q1
ORDER by x
Error: Array element type STRUCT<STRING, ARRAY<STRUCT<STRING, STRING>>> does not coerce to STRUCT<id STRING, cd ARRAY<STRUCT<index STRING, value STRING>>> at [5:26]
Thanks a lot for your time in responding
Cheers!
#standardsql
WITH q1 AS (
SELECT
1 AS x,
[
STRUCT('h1' AS id, [STRUCT('1' AS index, 'a' AS value), ('2','b')] AS cd),
STRUCT('h2', [STRUCT('3' AS index, 'c' AS value), ('4','d')] AS cd)
] AS hits
)
SELECT *
FROM q1
-- ORDER BY x
or below might be even more "readable"
#standardsql
WITH q1 AS (
SELECT
1 AS x,
[
STRUCT('h1' AS id, [STRUCT<index STRING, value STRING>('1', 'a'), ('2','b')] AS cd),
STRUCT('h2', [STRUCT<index STRING, value STRING>('3', 'c'), ('4','d')] AS cd)
] AS hits
)
SELECT *
FROM q1
-- ORDER BY x
When I simulate data in BigQuery using standard SQL, I try to name all variables and aliases everywhere possible. For instance, your data works if you build it like so:
with q1 as (
select 1 x, ARRAY<struct<id string, cd ARRAY<STRUCT<index STRING, value STRING>>>>[
  struct('h1' as id, [STRUCT('1' as index, 'a' as value), STRUCT('2' as index, 'b' as value)] as cd),
  STRUCT('h2', [STRUCT('3' as index, 'c' as value), STRUCT('4' as index, 'd' as value)] as cd)
] hits
)
select * from q1
order by x
Notice I've built structs and put aliases inside them in order for this to work. (If you remove the aliases and the structs it might not work; I found this to be rather intermittent, but if you fully describe the variables it works every time.)
Also, as a recommendation, I try to build simulated data piece by piece: first I create the struct and check whether BigQuery accepts it, and only after the validator is green do I add more values. If you try to build everything at once you might find it a rather challenging task.