Sum values in Athena table where column having key/value pair json - sql

I have an Athena table with one column that holds JSON key/value pairs.
Ex:
Select test_client, test_column from ABC;
test_client, test_column
john, {"d":13, "e":210}
mark, {"a":1,"b":10,"c":1}
john, {"e":100,"a":110,"b":10, "d":10}
mark, {"a":56,"c":11,"f":9, "e": 10}
I need to sum the values for each key, per client, and return something like the below. The output format doesn't matter; I just want the sums.
john: d: 23, e:310, a:110, b:10
mark: a:57, b:10, c:12, f:9, e:10

It is a combination of a few useful functions in Trino:
WITH example_table AS
(SELECT 'john' as person, '{"d":13, "e":210}' as _json UNION ALL
SELECT 'mark', ' {"a":1,"b":10,"c":1}' UNION ALL
SELECT 'john', '{"e":100,"a":110,"b":10, "d":10}' UNION ALL
SELECT 'mark', '{"a":56,"c":11,"f":9, "e": 10}')
SELECT person, reduce(
array_agg(CAST(json_parse(_json) AS MAP(VARCHAR, INTEGER))),
MAP(ARRAY['a'],ARRAY[0]),
(s, x) -> map_zip_with(
s,x, (k, v1, v2) ->
if(v1 is null, 0, v1) +
if(v2 is null, 0, v2)
),
s -> s
)
FROM example_table
GROUP BY person
json_parse - parses the string into a JSON value
CAST ... AS MAP... - creates a MAP from the JSON object
array_agg - aggregates the maps for each person according to the GROUP BY
reduce - steps through the aggregated array and reduces it to a single map
map_zip_with - applies a function to the pair of values stored under each key across two maps
if(... is null ...) - substitutes 0 for null when a key is missing from one of the maps
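As an alternative sketch (not part of the answer above), the same per-key totals can probably also be computed by unnesting each map into rows and aggregating with sum and map_agg. It reuses the same sample data; the names pairs, totals, k, v, summed and sums are made up for illustration. One difference from the reduce version is that no seed map is needed, so a client who never has the key 'a' does not end up with an extra a=0 entry.

```sql
WITH example_table AS (
    SELECT 'john' AS person, '{"d":13, "e":210}' AS _json UNION ALL
    SELECT 'mark', '{"a":1,"b":10,"c":1}' UNION ALL
    SELECT 'john', '{"e":100,"a":110,"b":10, "d":10}' UNION ALL
    SELECT 'mark', '{"a":56,"c":11,"f":9, "e": 10}'
),
pairs AS (
    -- One row per (person, key, value)
    SELECT person, k, v
    FROM example_table
    CROSS JOIN UNNEST(CAST(json_parse(_json) AS MAP(VARCHAR, INTEGER))) AS t (k, v)
),
totals AS (
    -- Sum the values for each key of each person
    SELECT person, k, sum(v) AS summed
    FROM pairs
    GROUP BY person, k
)
-- Fold the per-key totals back into a single map per person
SELECT person, map_agg(k, summed) AS sums
FROM totals
GROUP BY person
```

For the sample rows this should give the totals listed in the question, one map per person.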

Related

sql json extract scalar with different keys in json

I have a 'dateinfo' column that I extract from a table in Athena, which is a JSON array like the ones you can see below.
[{"pickuprequesteddate":"2022-08-09T00:00:00"}, {"deliveryrequesteddate":"2022-08-09T00:00:00"}]
[{"departureestimateddate":"2022-08-25T00:00:00"}, {"arrivalestimateddate":"2022-10-07T00:00:00"}, {}, {}]
As you can see inside the json there are different keys. I am interested in extracting the values for 'pickuprequesteddate' and 'deliveryrequesteddate' if they are in the json array. That is, for the examples above I would like to obtain as a result a column with the following values:
[2022-08-09T00:00:00, 2022-08-09T00:00:00]
[null, null, null, null]
I know how to extract the values of each key but separately, using
TRANSFORM(CAST(stopinfo AS ARRAY<JSON>), x -> JSON_EXTRACT_SCALAR(x, '$.dateinfo.pickuprequesteddate')) as pickup,
TRANSFORM(CAST(stopinfo AS ARRAY<JSON>), x -> JSON_EXTRACT_SCALAR(x, '$.dateinfo.deliveryrequesteddate')) as delivery
However, this gives me two separate columns.
How could I extract the values the way I want?
If only one is present in the object you can use coalesce:
WITH dataset (stopinfo) AS (
VALUES (JSON '[{"pickuprequesteddate":"2022-08-09T00:00:00"}, {"deliveryrequesteddate":"2022-08-09T00:00:00"}]'),
(JSON '[{"departureestimateddate":"2022-08-25T00:00:00"}, {"arrivalestimateddate":"2022-10-07T00:00:00"}, {}, {}]')
)
-- query
select TRANSFORM(
CAST(stopinfo AS ARRAY(JSON)),
x-> coalesce(
JSON_EXTRACT_SCALAR(x, '$.pickuprequesteddate'),
JSON_EXTRACT_SCALAR(x, '$.deliveryrequesteddate')
)
)
from dataset;
Output:
_col0
[2022-08-09T00:00:00, 2022-08-09T00:00:00]
[NULL, NULL, NULL, NULL]
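If a single object can carry both keys, coalesce keeps only the first non-null value. A possible variant, sketched here on the assumption that objects with neither key may simply be dropped from the result, collects every matching value instead. The sample row and the alias dates are made up for illustration.

```sql
WITH dataset (stopinfo) AS (
    VALUES (JSON '[{"pickuprequesteddate":"2022-08-09T00:00:00", "deliveryrequesteddate":"2022-08-10T00:00:00"}, {}]')
)
SELECT flatten(
    TRANSFORM(
        CAST(stopinfo AS ARRAY(JSON)),
        -- For each object, keep whichever of the two dates are present
        x -> filter(
            ARRAY[
                JSON_EXTRACT_SCALAR(x, '$.pickuprequesteddate'),
                JSON_EXTRACT_SCALAR(x, '$.deliveryrequesteddate')
            ],
            v -> v IS NOT NULL
        )
    )
) AS dates
FROM dataset;
```

Unlike the coalesce query, this does not keep a NULL placeholder for objects that have neither key, so the result no longer lines up position by position with the input array.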

hive convert array<map<string, string>> to string

I have a column in a Hive table whose type is array<map<string, string>>, and I am struggling to convert this column to a string using HQL.
I found the post "Convert Map<string,string> to just string in hive", which converts map<string, string> to string. However, I still cannot convert array<map<string, string>> to string.
Building off of the original answer you linked to, you can first explode the array into the individual maps using posexplode to maintain a position column. Then you can use the method from the original post, but additionally group by the position column to convert each map to a string. Finally, you collect the map strings into the final string. Here's an example:
with test_data as (
select stack(2,
1, array(map('m1key1', 'm1val1', 'm1key2', 'm1val2'), map('m2key1', 'm2val1', 'm2key2', 'm2val2')),
2, array(map('m1key1', 'm1val1', 'm1key2', 'm1val2'), map('m2key1', 'm2val1', 'm2key2', 'm2val2'))
) as (id, arr_col)
),
map_data as (
select id, arr_col as original_arr, m.pos as map_num, m.val as map_col
from test_data d
lateral view posexplode(arr_col) m as pos, val
),
map_strs as (
select id, original_arr, map_num,
concat('{',concat_ws(',',collect_set(concat(m.key,':', m.val))),'}') map_str
from map_data d
lateral view explode(map_col) m as key, val
group by id, original_arr, map_num
)
-- collect_list keeps identical map strings that collect_set would merge into one
select id, original_arr, concat('[', concat_ws(',', collect_list(map_str)), ']') as arr_str
from map_strs
group by id, original_arr;

Presto Build JSON Array with Different Data Types

My goal is to get a JSON array of varchar name, varchar age, and a LIST of books_read (array(varchar)) for EACH id
books_read has following format: ["book1", "book2"]
Example Given Table:
id | name | age | books_read
1 | John | 21 | ["book1", "book2"]
Expected Output:
id | info
1 | [{"name":"John", "age":"21", "books_read":["book1", "book2"]}]
When I use the following query I get an error (All ARRAY elements must be the same type: row(varchar, varchar)) because books_read is not of type varchar like name and age.
select id,
array_agg(CAST(MAP_FROM_ENTRIES(ARRAY[
('name', name),
('age', age),
('books_read', books_read)
]) AS JSON)) AS info
from tbl
group by id
Is there an alternative method that allows multiple types as input to the array?
I've also tried doing MAP_CONCAT(MAP_AGG(name), MAP_AGG(age), MULTIMAP_AGG(books_read)) but it also gives me an issue with the books_read column: Unexpected parameters for the function map_concat
Cast data to json before placing it into the map:
-- sample data
WITH dataset (id, name, age, books_read) AS (
VALUES (1, 'John', 21, array['book1', 'book2'])
)
-- query
select id,
cast(
map(
array [ 'name', 'age', 'books_read' ],
array [ cast(name as json), cast(age as json), cast(books_read as json) ]
) as json
) info
from dataset
Output:
id | info
1 | {"age":21,"books_read":["book1","book2"],"name":"John"}
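The question asked for a JSON array per id, while the query above returns one JSON object per row. A minimal sketch of one way to get the array, assuming the same sample dataset, is to aggregate the per-row objects with array_agg and cast the resulting array to JSON:

```sql
-- sample data
WITH dataset (id, name, age, books_read) AS (
    VALUES (1, 'John', 21, ARRAY['book1', 'book2'])
)
-- query
SELECT id,
       CAST(
           array_agg(
               -- one JSON object per source row
               CAST(
                   MAP(
                       ARRAY['name', 'age', 'books_read'],
                       ARRAY[CAST(name AS JSON), CAST(age AS JSON), CAST(books_read AS JSON)]
                   ) AS JSON
               )
           ) AS JSON
       ) AS info
FROM dataset
GROUP BY id
```

With several rows per id, each row should contribute one object to that id's array. Here age is still emitted as a JSON number; if a string is wanted, as in the expected output, casting it to varchar before the cast to JSON is one option.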

How to select single field from array of json objects?

I have a JSONB column with values in following JSON structure
{
"a": "value1", "b": [{"b1": "value2", "b3": "value4"}, {"b1": "value5", "b3": "value6"}]
}
I need to select only b1 field in the result. So expected result would be
["value2", "value5"]
I can select complete array using query
select columnname->>'b' from tablename
step-by-step demo:db<>fiddle
SELECT
jsonb_agg(elements -> 'b1') -- 2
FROM mytable,
jsonb_array_elements(mydata -> 'b') as elements -- 1
1. (a) Get the JSON array from the b element; (b) expand its elements into one row each.
2. (a) Get the b1 values from the array elements; (b) re-aggregate these values into a new JSON array.
If you are using Postgres 12 or later, you can use a JSON path query:
select jsonb_path_query_array(the_column, '$.b[*].b1')
from the_table;
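Note that the jsonb_agg query has no GROUP BY, so on a table with several rows it aggregates the b1 values of all rows into a single array. If one array per row is wanted, a correlated subquery is one option. The sketch below assumes a hypothetical id column and made-up sample rows:

```sql
WITH mytable (id, mydata) AS (
    VALUES
        (1, '{"a": "value1", "b": [{"b1": "value2", "b3": "value4"}, {"b1": "value5", "b3": "value6"}]}'::jsonb),
        (2, '{"a": "value7", "b": [{"b1": "value8", "b3": "value9"}]}'::jsonb)
)
SELECT t.id,
       -- Aggregate only the elements that belong to the current row
       (SELECT jsonb_agg(elements -> 'b1')
        FROM jsonb_array_elements(t.mydata -> 'b') AS elements) AS b1_values
FROM mytable AS t;
```

The jsonb_path_query_array form is evaluated once per row already, so it needs no such change.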

How to make a IN query with hstore?

I have a field (content) in a table containing keys and values (hstore) like this :
content: {"price"=>"15.2", "quantity"=>"3", "product_id"=>"27", "category_id"=>"2", "manufacturer_id"=>"D"}
I can easily select products having ONE category_id with:
SELECT * FROM table WHERE "content @> 'category_id=>27'"
I want to select all rows whose category_id is IN a list of values.
In classic SQL it would be something like this :
SELECT * FROM TABLE WHERE category_id IN (27, 28, 29, ....)
Thank you in advance
De-reference the key and test it with IN as normal.
CREATE TABLE hstoredemo(content hstore not null);
INSERT INTO hstoredemo(content) VALUES
('"price"=>"15.2", "quantity"=>"3", "product_id"=>"27", "category_id"=>"2", "manufacturer_id"=>"D"');
Then use one of these. The first is cleaner, as it casts the extracted value to integer rather than doing string comparisons on numbers.
SELECT *
FROM hstoredemo
WHERE (content -> 'category_id')::integer IN (2, 27, 28, 29);
SELECT *
FROM hstoredemo
WHERE content -> 'category_id' IN ('2', '27', '28', '29');
If you had to test more complex hstore containment conditions, say with multiple keys, you could use @> ANY, e.g.
SELECT *
FROM hstoredemo
WHERE
content @> ANY(
ARRAY[
'"category_id"=>"27","product_id"=>"27"',
'"category_id"=>"2","product_id"=>"27"'
]::hstore[]
);
but it's not pretty, and it'll be a lot slower, so don't do this unless you have to.
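When the list of ids arrives as a single array value, for instance as one bound parameter from application code, the de-referenced key can also be compared with = ANY. This is a sketch only; the table name hstore_any_demo is made up so that the example runs on its own.

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TEMP TABLE hstore_any_demo (content hstore NOT NULL);
INSERT INTO hstore_any_demo (content) VALUES
('"price"=>"15.2", "quantity"=>"3", "product_id"=>"27", "category_id"=>"2", "manufacturer_id"=>"D"');

-- content -> 'category_id' yields text, so it is compared against a text array
SELECT *
FROM hstore_any_demo
WHERE (content -> 'category_id') = ANY (ARRAY['2', '27', '28', '29']);
```

Against a list of constants this should match the same rows as the IN form; the array form is mainly convenient when the whole list is passed in as one value.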
category_ids = ["27", "28", "29"]
Tablename.where("category_id IN(?)", category_ids)