explode function in hive

I have the following sample data and I am trying to explode it in Hive. I used split, but I know I am missing something.
I used the query below:
select explode(split(col, ',')) from sample2;
and the result is this
I need it in the format below.
Any help would be appreciated.

Your data set is an array of arrays and you want to explode it at the first level only, so use LATERAL VIEW explode(colname).
Below is the SELECT query with explode():
SELECT col1 FROM sample2 LATERAL VIEW EXPLODE(col) explodeVal AS col1;
The output generated from your input data set is as below:


extract values inside an array column in amazon athena

I have a table in AWS Athena where the column 'metadata_stopinfo' has the structure that you can see in the image.
I am trying to extract values that are inside that array. However, when I try
SELECT
    "json_extract_scalar"(metadata_stopinfo, '$.city')
FROM "table"
I get the following error:
SYNTAX_ERROR: line 2:5: Unexpected parameters (array(row("address" row("addressline" varchar,"city" varchar,"countrycode" varchar,"countrycodeoriginal" varchar,"state" varchar,"zipcode" varchar),"carrierreference" varchar,"contacts" array(row("contacttype" varchar,"email" varchar,"fax" varchar,"mobilephone" varchar,"name" varchar,"officephone" varchar,"userid" varchar)),"containerinfo" array(row("containerid" varchar,"containeridtype" varchar,"equipmentcode" varchar,"equipmenttype" varchar)),"conveyancelinenumber" varchar,"conveyancetype" varchar,"conveyancetypeoriginal" varchar,"dateinfo" row("arrivalestimateddate" varchar,"arrivalestimateddateend" varchar,"arrivalestimatedendoffset" varchar,"arrivalestimatedoffset" varchar,"arrivalrequesteddate" varchar,"deliveryestimateddate" varchar,"deliveryestimateddateend" varchar,"deliveryestimatedendoffset" varchar,"deliveryestimatedoffset" varchar,"deliveryrequesteddate" varchar,"deliveryrequesteddateend" varchar,"deliveryrequestedendoffset" varchar,"deliveryrequestedoffset" varchar,"departureestimateddate" varchar,"departureestimateddateend" varchar,"departureestimatedendoffset" varchar,"departureestimatedoffset" varchar,"departurerequesteddate" varchar,"pickuprequesteddate" varchar,"pickuprequesteddateend" varchar,"pickuprequestedendoffset" varchar,"pickuprequestedoffset" varchar,"pickupestimateddate" varchar,"pickupestimateddateend" varchar,"pickupestimatedendoffset" varchar,"pickupestimatedoffset" varchar),"deliverynotenumber" varchar,"instructions" array(row("customerspecificsubtype" varchar,"header" boolean,"instructionsubtype" varchar,"instructiontype" varchar,"text" varchar)),"locationid" varchar,"partnercarrieraddress" row("addressline" varchar,"city" varchar,"countrycode" varchar,"countrycodeoriginal" varchar,"state" varchar,"zipcode" varchar),"partnercarriercontacts" array(row("contacttype" varchar,"email" varchar,"fax" varchar,"name" varchar,"officephone" varchar)),"partnercarrierid" 
varchar,"partnercarriername" varchar,"partnerid" varchar,"partnername" varchar,"partnertimezone" varchar,"partnertype" varchar,"productquantity" row("number" double,"originalunitofmeasure" varchar,"quantitytype" varchar,"unitofmeasure" varchar),"sequencenumber" bigint,"shipmentidentifier" varchar,"stoptype" varchar,"transportinfo" row("description" varchar,"transportcode" varchar,"transportoriginalcode" varchar),"vesselinfo" row("lloydsnumber" varchar,"shipsradiocallnumber" varchar,"vesselname" varchar,"vesselnumber" varchar,"voyagetripnumber" varchar))), varchar(6)) for function json_extract_scalar. Expected: json_extract_scalar(varchar(x), JsonPath) , json_extract_scalar(json, JsonPath)
My question is: how can I extract values inside the column?
json_extract_scalar unsurprisingly works with JSON (note that even if your data were in JSON format, json_extract_scalar(metadata_stopinfo, '$.city') still would not have worked, because your data is an array), while your column contains arrays of rows, so you need to work with it accordingly. For example, you can use indexes to access elements in an array (in Presto, array indexes start from 1):
SELECT metadata_stopinfo[1] r
FROM "table"
And then access the fields (per the type in the error message, city is nested inside the address row):
The fields may be of any SQL type, and are accessed with field reference operator .
SELECT metadata_stopinfo[1].address.city city
FROM "table"
Also you can flatten the array with unnest:
SELECT r.address.city
FROM "table",
unnest(metadata_stopinfo) as t(r)
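To make the two access patterns concrete, here is a small Python model of what the queries compute. It is only a sketch of the semantics, not Athena code: the sample rows, the city names and the `shipment_id` column are made up, and only the `address.city` nesting is taken from the type shown in the error message.

```python
# Hypothetical sample shaped like a small slice of the Athena column type.
table = [
    {
        "shipment_id": "S1",  # hypothetical extra column, not in the question
        "metadata_stopinfo": [
            {"address": {"city": "Lima"}, "stoptype": "pickup"},
            {"address": {"city": "Quito"}, "stoptype": "delivery"},
        ],
    },
]


def element_at(array, index):
    """Presto-style subscript: the first element has index 1."""
    return array[index - 1]


# Models: SELECT metadata_stopinfo[1].address.city FROM "table"
first_cities = [
    element_at(row["metadata_stopinfo"], 1)["address"]["city"] for row in table
]

# Models: SELECT r.address.city FROM "table", unnest(metadata_stopinfo) AS t(r)
unnested_cities = [
    r["address"]["city"] for row in table for r in row["metadata_stopinfo"]
]

print(first_cities)     # ['Lima']
print(unnested_cities)  # ['Lima', 'Quito']
```

Indexing returns one value per table row, while unnest returns one value per array element, which is usually what you want when the array length varies.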

How to flatten nested array data into row in bigquery

I am trying to flatten inside_array, the sub-array of my nested array data, into table rows.
I am able to flatten array_data, which is the outer array.
Does anybody have any suggestions? Thanks in advance.
FROM `project.dataset.table`,
UNNEST(array_data) AS array_data_rec,
UNNEST(array_data_rec.inside_array) AS inside_array_rec
To handle the case of "no data inside the inside_array", use LEFT JOIN instead, as in the example below:
FROM `project.dataset.table`,
UNNEST(array_data) AS array_data_rec
LEFT JOIN UNNEST(array_data_rec.inside_array) AS inside_array_rec
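The difference between the comma (cross) join and the LEFT JOIN can be sketched in Python. This is a model of the row-level behaviour only, not BigQuery code; the `id` and `name` fields and the sample values are hypothetical.

```python
# Hypothetical table: one row whose second array_data record has an empty inside_array.
table = [
    {
        "id": 1,
        "array_data": [
            {"name": "a", "inside_array": [10, 20]},
            {"name": "b", "inside_array": []},
        ],
    },
]


def cross_unnest(rows):
    """Models ', UNNEST(array_data) AS rec, UNNEST(rec.inside_array) AS inner'."""
    return [
        (row["id"], rec["name"], inner)
        for row in rows
        for rec in row["array_data"]
        for inner in rec["inside_array"]
    ]


def left_unnest(rows):
    """Models 'LEFT JOIN UNNEST(rec.inside_array) AS inner': empty arrays yield one row with None."""
    out = []
    for row in rows:
        for rec in row["array_data"]:
            for inner in rec["inside_array"] or [None]:
                out.append((row["id"], rec["name"], inner))
    return out


print(cross_unnest(table))  # [(1, 'a', 10), (1, 'a', 20)]
print(left_unnest(table))   # [(1, 'a', 10), (1, 'a', 20), (1, 'b', None)]
```

Record "b" disappears from the cross-join result because its inner array is empty, while the left-join variant keeps it with a null inner value.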
You can do the following:
UNNEST(array_data) as array_data,
UNNEST(array_data.inside_array) as array_data_inside_array

Hive and SQL (any SQL query)

I am posting a question here related to the Hive data warehouse. I have the sample data below.
In the transactions column my data is of array type. I am expecting the output below.
I am guessing you want the size of the array:
select id, name, size(transactions)
from t;

Array operation on hive collect_set

I am working in Hive on a large dataset. I have a table with an array column, and the content of the column is as follows.
I need a set ordered by the ascending date of prod, e.g. I need to trim the date from the array and apply collect_set to get the result below.
Explode array, remove date (digits at the beginning of the string), aggregate using collect_set:
with mydata as (--use your table instead of this
select array(
) myarray
)
select collect_set(regexp_extract(elem,'^\\d*(.*?)$',1)) col_name
from mydata a --Use your table instead
lateral view outer explode(myarray) s as elem;
One more possible method is to concatenate the array first, remove the dates from the string, then split to get an array again. Unfortunately we still need to explode in order to apply collect_set and remove duplicates (example using the same WITH mydata CTE):
select collect_set(elem) col_name
from mydata a --Use your table instead
lateral view outer explode(split(regexp_replace(concat_ws(',',myarray),'(^|,)\\d{8}','$1'),',')) s as elem
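The two regular expressions can be checked outside Hive with a short Python model. This is a sketch only: the sample array values are made up (the original data is not shown), Python's `re` stands in for Hive's Java regex engine, and the helper `collect_set` below keeps first-seen order, whereas Hive does not guarantee any order for collect_set.

```python
import re

# Hypothetical array contents: an 8-digit date followed by a product name.
myarray = ["20200101prodA", "20200102prodB", "20200103prodA"]


def strip_leading_digits(elem):
    """Models regexp_extract(elem, '^\\d*(.*?)$', 1)."""
    match = re.match(r"^\d*(.*?)$", elem)
    return match.group(1) if match else ""


def collect_set(values):
    """De-duplicate while keeping first-seen order (Hive gives no ordering guarantee)."""
    seen = []
    for value in values:
        if value not in seen:
            seen.append(value)
    return seen


# Method 1: explode, strip the date from each element, aggregate.
method1 = collect_set(strip_leading_digits(e) for e in myarray)

# Method 2: concat_ws, regexp_replace with '(^|,)\\d{8}' -> '$1', then split.
joined = ",".join(myarray)
split_again = re.sub(r"(^|,)\d{8}", r"\1", joined).split(",")
method2 = collect_set(split_again)

print(method1)      # ['prodA', 'prodB']
print(split_again)  # ['prodA', 'prodB', 'prodA']
print(method2)      # ['prodA', 'prodB']
```

Note that the first pattern removes any run of leading digits, so a product name that itself starts with a digit would be trimmed too; the second pattern only removes exactly eight digits at the start of each element.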

hive explode list from json-string

I have a table with JSON strings:
json_body string
Json has structure:
{ obj1: { fields ... }, obj2: [array] }
I want to select all elements from array, but I can't.
For example, I can get all fields from first object:
SELECT f.fields...
FROM (
SELECT q1.obj1, q1.obj2
FROM TABLE_JSON jt
LATERAL VIEW JSON_TUPLE(jt.json_body, 'obj1', 'obj2') q1 AS obj1, obj2
) as json_table2
LATERAL VIEW JSON_TUPLE(json_table2.obj1, 'fields...') f AS fields...;
But with the array this method doesn't work.
I've tried to use
LATERAL VIEW explode(json_table2.obj2) adTable AS arr;
hive explode doc
But obj2 is a string that contains an array. How can I transform the JSON string into an array and explode it?
The json_split UDF from Brickhouse ( http://github.com/klout/brickhouse ) can convert a JSON array to a Hive List, and then you can explode that.
See http://mail-archives.apache.org/mod_mbox/hive-user/201406.mbox/%3CCAO78EnLgSrrUY3Ad_ZWS9zWNKLQRwS9jXrqEE869FhUNiWgCXA#mail.gmail.com%3E and https://brickhouseconfessions.wordpress.com/2014/02/07/hive-and-json-made-simple/
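As a rough picture of what "convert the JSON array to a list, then explode" produces, here is a Python model. It does not use Brickhouse or Hive; the JSON body, the field name `field1` and the helper `explode_json_array` are hypothetical and only follow the { obj1: {...}, obj2: [array] } shape from the question.

```python
import json

# Hypothetical json_body value following the structure described in the question.
json_body = '{"obj1": {"field1": "x"}, "obj2": ["a1", "a2", "a3"]}'


def explode_json_array(body, key):
    """Parse the JSON string and return one output row per element of the array under `key`."""
    parsed = json.loads(body)
    return list(parsed.get(key, []))


rows = explode_json_array(json_body, "obj2")
print(rows)  # ['a1', 'a2', 'a3']
```

The key point is that the string has to be parsed into a real list before it can be exploded; exploding the raw string is what fails in the question.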
You can consider using Hive-JSON SerDe to read the data from JSON.
Refer: https://github.com/rcongiu/Hive-JSON-Serde
This may not be an optimal solution but can help unblock you. For a JSON object which looks like below
this query can help you obtain all items of array into individual columns given that the size of the array is constant across all rows.
SELECT split(results,",")[0] AS arrayItem1,
split(results,",")[1] AS arrayItem2,
regexp_replace(split(results,",")[2], "[\\]|}]", "") AS arrayItem3
FROM (SELECT split(translate(get_json_object(TABLE_JSON.json_body,'$.obj2'), '"\\[|]|\""',''), "},") AS r
FROM TABLE_JSON) t1 LATERAL VIEW explode(r) rr AS results
It produces the result which looks like this
arrayitem1| arrayitem2| arrayitem3
a1 | a2 | a3
You can scale it to any array size, on the condition that the size is constant across the table.