Pivot from multiple rows to multiple columns in Hive

I have a Hive table with the following schema:
(id: int, vals: map<string, int>, type: string)
id, vals, type
1, {"foo": 1}, "a"
1, {"foo": 2}, "b"
2, {"foo": 3}, "a"
2, {"foo": 1}, "b"
There are only two possible types.
I want to change this to the following schema:
id, type_a_vals, type_b_vals
1, {"foo": 1}, {"foo": 2}
2, {"foo": 3}, {"foo": 1}
If a "type" is missing for an id, the corresponding value should be null.

Given the map column, an easy approach is a self join:
select ta.id,ta.vals,tb.vals
from (select * from tbl where type = 'a') ta
full join (select * from tbl where type = 'b') tb on ta.id = tb.id
Normally you could solve questions like this with conditional aggregation, as below. However, applying max to a map column produces an error in Hive:
select id
,max(case when type = 'a' then vals end) as type_a_vals
,max(case when type = 'b' then vals end) as type_b_vals
from tbl
group by id
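If the map keys are known in advance (the sample data only ever uses "foo"), one workaround is to pivot the scalar values and rebuild the map afterwards. A sketch under that assumption, using the table and columns from the question:

```sql
-- Assumes every map contains only the key 'foo'.
select id,
       map('foo', max(case when type = 'a' then vals['foo'] end)) as type_a_vals,
       map('foo', max(case when type = 'b' then vals['foo'] end)) as type_b_vals
from tbl
group by id;
```

Note that a missing type then yields a map with a null value ({"foo": null}) rather than a null map, which is where the self join above behaves differently.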

Related

Aggregate jsonb map with string array as value in postgresql

I have a PostgreSQL table with a jsonb column containing maps with strings as keys and string arrays as values. I want to aggregate all the maps into a single jsonb map, with no duplicate values in any string array. How can I do this in Postgres?
Eg:
Input: {"a": ["1", "2"]}, {"a": ["2", "3"], "b": ["5"]}
Output: {"a": ["1", "2", "3"], "b": ["5"]}
I tried the '||' operator, but it overwrites values when the same key appears in both maps.
Eg:
Input: SELECT '{"a": ["1", "2"]}'::jsonb || '{"a": ["3"], "b": ["5"]}'::jsonb;
Output: {"a": ["3"], "b": ["5"]}
Using jsonb_object_agg with a series of cross joins:
select jsonb_object_agg(t.key, t.a)
from (
  select v.key, jsonb_agg(distinct v1.value) a
  from objects o
  cross join jsonb_each(o.tags) v
  cross join jsonb_array_elements(v.value) v1
  group by v.key
) t;
See fiddle.
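For reference, here is a self-contained version of the same query with the sample data from the question inlined (the table name objects and column tags are carried over from the answer above):

```sql
with objects(tags) as (
  values
    ('{"a": ["1", "2"]}'::jsonb),
    ('{"a": ["2", "3"], "b": ["5"]}'::jsonb)
)
select jsonb_object_agg(t.key, t.a)
from (
  select v.key, jsonb_agg(distinct v1.value) a
  from objects o
  cross join jsonb_each(o.tags) v
  cross join jsonb_array_elements(v.value) v1
  group by v.key
) t;
-- {"a": ["1", "2", "3"], "b": ["5"]}
```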
You can use the jsonb_object_agg aggregate function to achieve this. It takes a set of key-value pairs and returns a single JSONB object. Note that simply applying jsonb_agg to the array values would nest them (e.g. [["1", "2"], ["3"]]), so each array has to be expanded with jsonb_array_elements first and re-aggregated with DISTINCT. Here is an example query:
SELECT jsonb_object_agg(key, value)
FROM (
  SELECT key, jsonb_agg(DISTINCT elem) AS value
  FROM (
    SELECT 'a' AS key, '["1", "2"]'::jsonb AS value
    UNION ALL
    SELECT 'a', '["3"]'::jsonb
    UNION ALL
    SELECT 'b', '["5"]'::jsonb
  ) subq
  CROSS JOIN jsonb_array_elements(subq.value) AS elem
  GROUP BY key
) agg;
This will give you the following result:
{"a": ["1", "2", "3"], "b": ["5"]}

Key value table to json in BigQuery

Hey all,
I have a table that looks like this:
row | key | val
----+-----+--------
1   | a   | 100
2   | b   | 200
3   | c   | "apple"
4   | d   | {}
I want to convert it into JSON:
{
"a": 100,
"b": 200,
"c": "apple",
"d": {}
}
Note: the number of rows can change, so this is only an example.
Thanks in advance!
With string manipulation:
WITH sample_table AS (
SELECT 'a' key, '100' value UNION ALL
SELECT 'b', '200' UNION ALL
SELECT 'c', '"apple"' UNION ALL
SELECT 'd', '{}'
)
SELECT '{' || STRING_AGG(FORMAT('"%s": %s', key, value)) || '}' json
FROM sample_table;
This gives a result similar to your expected output: {"a": 100, "b": 200, "c": "apple", "d": {}} (as a string; note that STRING_AGG order is not guaranteed without an ORDER BY clause).

PostgreSQL get any value from jsonb object

I want to get the value of either key 'a' or 'b' if either one exists. If neither exists, I want the value of any key in the map.
Example:
'{"a": "aaa", "b": "bbbb", "c": "cccc"}' should return aaa.
'{"b": "bbbb", "c": "cccc"}' should return bbbb.
'{"c": "cccc"}' should return cccc.
Currently I'm doing it like this:
SELECT COALESCE(o ->> 'a', o ->> 'b', o->> 'c') FROM...
The problem is that I don't really want to name key 'c' explicitly since there are objects that can have any key.
So how do I achieve the desired effect of "Get value of either 'a' or 'b' if either exists. If neither exists, grab anything that exists."?
I am using postgres 9.6.
maybe too long:
t=# with c(j) as (values('{"a": "aaa", "b": "bbbb", "c": "cccc"}'::jsonb))
, m as (select j,jsonb_object_keys(j) k from c)
, f as (select * from m where k not in ('a','b') limit 1)
t-# select COALESCE(j ->> 'a', j ->> 'b', j->>k) from f;
coalesce
----------
aaa
(1 row)
and with no a,b keys:
t=# with c(j) as (values('{"a1": "aaa", "b1": "bbbb", "c": "cccc"}'::jsonb))
, m as (select j,jsonb_object_keys(j) k from c)
, f as (select * from m where k not in ('a','b') limit 1)
t-# select COALESCE(j ->> 'a', j ->> 'b', j->>k) from f;
coalesce
----------
cccc
(1 row)
The idea is to extract all the keys with jsonb_object_keys, take the first "random" one (random because nothing is ordered; limit 1), and then use it as the last coalesce fallback.
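On 9.6 the same idea can be written more compactly with a correlated scalar subquery over jsonb_each_text, avoiding the extra CTEs. A sketch, where o stands for the jsonb column as in the question:

```sql
SELECT COALESCE(
         o ->> 'a',
         o ->> 'b',
         (SELECT value FROM jsonb_each_text(o) LIMIT 1)
       )
FROM ...
```

As above, with no ORDER BY the fallback key is effectively arbitrary.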

Idiomatic equivalent to map structure

My analytics involve aggregating rows and storing the number of occurrences of each value of a field someField across the aggregated rows.
Sample data structure
[someField, someKey]
I'm trying to GROUP BY someKey and then, for each of the results, know how many times each someField value occurred.
Example:
[someField: a, someKey: 1],
[someField: a, someKey: 1],
[someField: b, someKey: 1],
[someField: c, someKey: 2],
[someField: d, someKey: 2]
What I would like to achieve:
[someKey: 1, fields: {a: 2, b: 1}],
[someKey: 2, fields: {c: 1, d: 1}],
Does it work for you?
WITH data AS (
  SELECT 'a' someField, 1 someKey UNION ALL
  SELECT 'a', 1 UNION ALL
  SELECT 'b', 1 UNION ALL
  SELECT 'c', 2 UNION ALL
  SELECT 'd', 2
)
SELECT
  someKey,
  ARRAY_AGG(STRUCT(someField, freq)) fields
FROM (
  SELECT
    someField,
    someKey,
    COUNT(someField) freq
  FROM data
  GROUP BY 1, 2
)
GROUP BY 1
Results:
someKey | fields
--------+------------------
1       | [{a, 2}, {b, 1}]
2       | [{c, 1}, {d, 1}]
It won't give exactly the structure you are looking for, but it may serve the same queries your desired result would: as you said, for each key you can retrieve how many times (column freq) each someField value occurred.
I've been looking for a way to aggregate structs into a map and couldn't find one, but retrieving the results as an ARRAY of STRUCTs turned out to be quite straightforward.
There's probably a smarter way to do this (and get it in the format you want e.g. using an Array for the 2nd column), but this might be enough for you:
with sample as (
select 'a' as someField, 1 as someKey UNION all
select 'a' as someField, 1 as someKey UNION ALL
select 'b' as someField, 1 as someKey UNION ALL
select 'c' as someField, 2 as someKey UNION ALL
select 'd' as someField, 2 as someKey)
SELECT
someKey,
SUM(IF(someField = 'a', 1, 0)) AS a,
SUM(IF(someField = 'b', 1, 0)) AS b,
SUM(IF(someField = 'c', 1, 0)) AS c,
SUM(IF(someField = 'd', 1, 0)) AS d
FROM
sample
GROUP BY
someKey
ORDER BY someKey ASC
Results:
someKey  a  b  c  d
-------------------
1        2  1  0  0
2        0  0  1  1
This is a well-used technique in BigQuery.
I'm trying to GROUP BY someKey and then be able to know for each of the results how many time there was each someField values
#standardSQL
SELECT
someKey,
someField,
COUNT(someField) freq
FROM yourTable
GROUP BY 1, 2
-- ORDER BY someKey, someField
What I would like to achieve:
[someKey: 1, fields: {a: 2, b: 1}],
[someKey: 2, fields: {c: 1, d: 1}],
This is different from what you expressed in words; it is called pivoting, and based on your comment ("the a, b, c, and d keys are potentially infinite") it is most likely not what you need. At the same time, pivoting is easily doable too (if you have some finite number of field values), and you can find plenty of related posts.

Order by a value of an arbitrary attribute in hstore

I have records like these:
id, hstore_col
1, {a: 1, b: 2}
2, {c: 3, d: 4}
3, {e: 1, f: 5}
How do I order them by the minimum (or maximum) value inside the hstore, across all attributes?
The result should be like this (ordered by lowest value):
id, hstore_col
1, {a: 1, b: 2}
3, {e: 1, f: 5}
2, {c: 3, d: 4}
I know I can order by a specific attribute like this: my_table.hstore_fields -> 'a', but that doesn't solve my problem.
Convert the values to an array using avals and cast the resulting array from text to int. Then sort the array and order the results by the 1st element of the sorted array.
select * from mytable
order by (sort(avals(attributes)::int[]))[1]
http://sqlfiddle.com/#!15/84f31/5
If you know all of the elements, you can just piece them all together like this:
ORDER BY greatest(my_table.hstore_fields -> 'a', my_table.hstore_fields -> 'b',my_table.hstore_fields -> 'c', my_table.hstore_fields -> 'd', my_table.hstore_fields -> 'e', my_table.hstore_fields -> 'f')
or
ORDER BY least(my_table.hstore_fields -> 'a', my_table.hstore_fields -> 'b',my_table.hstore_fields -> 'c', my_table.hstore_fields -> 'd', my_table.hstore_fields -> 'e', my_table.hstore_fields -> 'f')
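One caveat with this approach: hstore values are text, so greatest/least compares them lexicographically ('10' sorts before '2'). If the values are numeric, as in the sample data, a cast is needed, e.g.:

```sql
ORDER BY least((my_table.hstore_fields -> 'a')::int,
               (my_table.hstore_fields -> 'b')::int,
               (my_table.hstore_fields -> 'c')::int)
```

This mirrors the ::int[] cast used in the avals answer above.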
By using svals you can create an exploded version of the hstore_col's values; then you can take the minimum of those values per row and sort on it. There is doubtless a much more efficient way to do this, but here's a first pass (note the cast, since hstore values are text):
select my_table.id, my_table.hstore_col
from my_table
join (
  select id, svals(hstore_col) as hstore_val
  from my_table
) exploded_table
  on my_table.id = exploded_table.id
group by my_table.id, my_table.hstore_col
order by min(exploded_table.hstore_val::int)