How to explode a map of arrays in Hive

Consider a table my_table with the following structure:
>> describe my_table
id bigint
info_detail map<bigint,array<string>>
If I explode info_detail, I end up with arrays:
>> select explode(info_detail) as (info_id, detail)
from my_table
info_id detail
112344 ["something about 112344", "other things"]
342302 ["something about 342302"]
How can I explode detail as well, so that the result looks something like this:
info_id detail
112344 "something about 112344"
112344 "other things"
342302 "something about 342302"

You should be able to explode the array after exploding the map, as follows:
select info_id, d from (
select explode(info_detail) as (info_id, detail)
from my_table
) t lateral view explode(detail) detailexploded as d;

You have to explode twice: once on the map column, and then on the array that results from it.
select tbl.info_id,tbl1.details
from my_table m
lateral view explode(info_detail) tbl as info_id,detail
lateral view explode(detail) tbl1 as details
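To make the two-step explode concrete, here is a minimal Python sketch of what the chained lateral views compute. The `rows` list is a hypothetical in-memory stand-in for my_table, and `explode_map_then_array` is an illustrative name, not a Hive function.

```python
# Hypothetical stand-in for my_table: each row has an id and an info_detail map
# whose values are arrays of strings (map<bigint,array<string>>).
rows = [
    {
        "id": 1,
        "info_detail": {
            112344: ["something about 112344", "other things"],
            342302: ["something about 342302"],
        },
    },
]


def explode_map_then_array(rows):
    """Mimic two chained lateral views: map -> (key, array), then array -> elements."""
    out = []
    for row in rows:
        # First explode: one (info_id, detail) pair per map entry.
        for info_id, detail in row["info_detail"].items():
            # Second explode: one output row per array element.
            for d in detail:
                out.append((info_id, d))
    return out


result = explode_map_then_array(rows)
for info_id, d in result:
    print(info_id, d)
```

As with a plain lateral view in Hive, an empty map or an empty array contributes no output rows in this sketch.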

Related

BigQuery SQL: select struct as single record, not array of records

I want to create a RECORD as the outcome of a select in BigQuery standard SQL.
If I run this SQL snippet,
WITH
mock_data AS (
SELECT 'foo1' as foo, 'bar1' as bar, 'bla1' as bla UNION ALL
SELECT 'foo2' as foo, 'bar2' as bar, 'bla2' as bla
)
SELECT
*,
STRUCT(
m.foo as foo,
m.bar as bar
) as foobar
FROM mock_data m
the output of foobar is an array of records, not a single record.
How could I have the foobar column be a single record and not an array of records?
Thanks a lot in advance!
It is a single record, not an array.
You can see this clearly in the JSON tab.
You can also go to the "Job Information" tab
and click "Temporary Table" to see the schema of the output.
In the BigQuery IDE that I am using, it is even more visible.
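The UNNEST operator flattens arrays into rows, i.e. it returns one row per array element. It has to appear in the FROM clause and it takes an array, so it cannot be applied directly to a STRUCT in the select list. If you want to go through UNNEST, wrap the STRUCT in a single-element array, which gives exactly one foobar record per input row. Try something like this (with mock_data defined as in the question):
SELECT
*
FROM mock_data m,
UNNEST([
STRUCT(
m.foo as foo,
m.bar as bar
)
]) as foobar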

Hadoop/Hive - Split a single row into multiple rows and store to a new table

I solved my initial problem with the help of this topic: Hadoop/Hive - Split a single row into multiple rows and store to a new table.
Does anyone have a clue how to create a new table with the grouped subs?
ID Subs
1 deep-learning, machine-learning, python
2 java, c++, python, javascript
With the code below I get the result I'm looking for, but I could not figure out how to save the output into a new table:
use demoDB;
Select id_main , topic_tag from demoTable
lateral view explode (split(topic_tag , ',')) topic_tag as topic
Thanks
Nico
In Hive, you can use create ... as select ...:
create table newtable as
select id_main, topic_tag
from demoTable
lateral view explode (split(topic_tag , ',')) topic_tag as topic
This creates a new table and populates it with the result set of the query. If the new table already exists, use insert ... select instead:
insert into newtable (id_main, topic_tag)
select id_main, topic_tag
from demoTable
lateral view explode (split(topic_tag , ',')) topic_tag as topic
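For intuition about what split followed by explode produces, here is a small Python sketch using the sample rows from the question. The function name `split_and_explode` is illustrative only. One difference to keep in mind: Hive's split takes a regular expression as its separator, whereas Python's str.split used here takes a literal string.

```python
# Sample rows from the question, as (id, comma-separated subs) pairs.
rows = [
    (1, "deep-learning, machine-learning, python"),
    (2, "java, c++, python, javascript"),
]


def split_and_explode(rows, sep=","):
    """Split each string on sep and emit one (id, part) row per piece."""
    return [(row_id, part) for row_id, subs in rows for part in subs.split(sep)]


result = split_and_explode(rows)
for row_id, part in result:
    # repr() makes the leading space left over from ", " visible.
    print(row_id, repr(part))
```

Splitting on a bare comma keeps the space that follows each comma in the source string, so every piece after the first starts with a space. If that matters for the new table, trimming each piece or splitting on a comma plus optional whitespace would avoid it.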

SELECT on JSON operations of Postgres array column?

I have a column of type jsonb[] (a Postgres array of jsonb objects) and I'd like to perform a SELECT on rows where a criterion is met by at least one of the objects. Something like:
-- Schema would be something like
mytable (
id UUID PRIMARY KEY,
col2 jsonb[] NOT NULL
);
-- Query I'd like to run
SELECT
id,
x->>'field1' AS field1
FROM
mytable
WHERE
x->>'field2' = 'user' -- for any x in the array stored in col2
I've looked around at ANY and UNNEST but it's not totally clear how to achieve this, since you can't run unnest in a WHERE clause. I also don't know how I'd specify that I want the field1 from the matching object.
Do I need a WITH table with the values expanded to join against? And how would I achieve that and keep the id from the other column?
Thanks!
You need to unnest the array; then you can access each JSON value:
SELECT t.id,
c.x ->> 'field1' AS field1
FROM mytable t
cross join unnest(col2) as c(x)
WHERE c.x ->> 'field2' = 'user'
This will return one row for each JSON value in the array that satisfies the WHERE condition.
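The effect of the cross join with unnest plus the filter can be sketched in plain Python. The rows below are made-up sample data (with short strings in place of UUIDs), and `select_matching` is a hypothetical helper name.

```python
# Made-up stand-in for mytable: id plus col2, an array of JSON objects.
rows = [
    {"id": "a", "col2": [{"field1": "x", "field2": "user"},
                         {"field1": "y", "field2": "admin"}]},
    {"id": "b", "col2": [{"field1": "z", "field2": "admin"}]},
    {"id": "c", "col2": [{"field1": "p", "field2": "user"},
                         {"field1": "q", "field2": "user"}]},
]


def select_matching(rows, wanted="user"):
    """Pair each row with each of its array elements, then keep the matches."""
    return [
        (row["id"], x.get("field1"))   # SELECT t.id, c.x ->> 'field1'
        for row in rows                # FROM mytable t
        for x in row["col2"]           # CROSS JOIN unnest(col2) AS c(x)
        if x.get("field2") == wanted   # WHERE c.x ->> 'field2' = 'user'
    ]


result = select_matching(rows)
print(result)
```

A row whose array contains several matching objects (id "c" here) appears once per match, and a row with no match (id "b") does not appear at all.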

Getting selection of structs from an array of structs in BQ

I have a table where one column is defined as:
my_column ARRAY<STRUCT<key STRING, value FLOAT64, description STRING>>
Is there an easy way to specify which fields should be returned in a SELECT statement? For instance, removing description, so that the result column is still an array of structs but contains only key and value.
The following is for BigQuery Standard SQL:
#standardSQL
SELECT * REPLACE(
ARRAY(
SELECT AS STRUCT * EXCEPT(description)
FROM UNNEST(my_column)
) AS my_column)
FROM `project.dataset.table`
This fully preserves the table's schema and changes only the my_column field, by removing description.
I would just unnest and then re-aggregate your selected fields.
select array_agg(struct(m.key,m.value)) as my_new_column
from table
left join unnest(my_column) m
I found this way:
SELECT
ARRAY(SELECT AS VALUE STRUCT(key, value) FROM a.my_column) as my_new_column
FROM my_table a
No joining or unnesting needed.

How to write a select statement that outputs all key values in all rows

My Hive table has a map column containing zero or more key-value pairs. I don't even know most of the keys. I want to write a select statement that outputs all keys that occur across all rows.
Something like:
select t.additional_fields[*]
from mytable as t
map_keys(map<K,V>) returns an array of all keys in the map, which you can then explode. The following query will return all distinct keys:
select
s.key
from
(
select m.key
from mytable t
lateral view explode(map_keys(t.additional_fields)) m as key
) s
group by s.key
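As a rough illustration of what that query computes, here is a Python sketch over made-up rows. The sample data and the name `distinct_keys` are assumptions for the example only.

```python
# Made-up stand-in for mytable: each row carries an additional_fields map.
rows = [
    {"additional_fields": {"color": "red", "size": "L"}},
    {"additional_fields": {}},  # a row with no key-value pairs
    {"additional_fields": {"size": "M", "weight": "2kg"}},
]


def distinct_keys(rows):
    """Collect every key seen in any row's map, without duplicates."""
    keys = set()
    for row in rows:
        # map_keys + explode: one entry per key; the set plays the role of GROUP BY.
        keys.update(row["additional_fields"].keys())
    # Sorting only makes the output deterministic here; the Hive query
    # has no ORDER BY, so it does not promise any particular order.
    return sorted(keys)


result = distinct_keys(rows)
print(result)
```

Rows with an empty map simply contribute nothing, which matches how a plain lateral view drops rows whose explode yields no elements.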