explode function in hive - hive

I have the following sample data and I am trying to explode it in hive.. I used split but I know I am missing something..
["[[-80.742426,35.23248],[-80.740424,35.23184],[-80.739583,35.231562],[-80.735935,35.23041],[-80.728624,35.228069],[-80.727753,35.227836],[-80.727294,35.227741],[-80.726762,35.227647],[-80.726321,35.227594],[-80.725687,35.227544],[-80.725134,35.227535],[-80.721502,35.227615],[-80.691298,35.216202],[-80.688009,35.215396],[-80.686516,35.215016],[-80.598433,35.234307]]"]
I used the below query
select explode(split(col, ',')) from sample2;
and the result is this
["[[-80.742426
35.23248]
[-80.740424
35.23184]
[-80.739583
35.231562]
[-80.735935
35.23041]
[-80.728624
35.228069]
[-80.727753
35.227836]
[-80.71143
35.227831]
[-80.711007
35.227795]
[-80.710638
35.227741]
[-80.673884
35.21014]
[-80.672358
35.209481]
[-80.672036
35.209356]
[-80.671686
35.209234]
[-80.67124
35.209099]
[-80.670815
35.209006]
[-80.670267
35.208906]
[-80.669612
35.208833]
[-80.668924
35.208806]
[-80.598433
35.234307]]"]
I need it in below format
[-80.742426,35.23248]
[-80.740424,35.23184]
[-80.739583,35.231562]
[-80.735935,35.23041]
[-80.728624,35.228069]
[-80.727753,35.227836]
[-80.727294,35.227741]
[-80.726762,35.227647]
[-80.726321,35.227594]
[-80.725687,35.227544]
[-80.725134,35.227535]
[-80.721502,35.227615]
[-80.691298,35.216202]
[-80.688009,35.215396]
[-80.686516,35.215016]
[-80.684281,35.214466]
[-80.68396,35.214395]
[-80.683375,35.214231]
[-80.682908,35.214079]
[-80.682444,35.213905]
[-80.682045,35.213733]
[-80.68062,35.213112]
[-80.678078,35.211983]
[-80.676836,35.211447]
[-80.598433,35.234307]
Any help over here..?

You have your data set as arrays of array and you want to explode your data at first level only, so use LATERAL VIEW explode(colname) to explode at the first level.
Below is the SELECT query with explode():
SELECT col1 FROM sample2 LATERAL VIEW EXPLODE(col) explodeVal AS col1;
output generated from your input data set as below:
[-80.742426,35.23248]
[-80.740424,35.23184]
[-80.739583,35.231562]
[-80.735935,35.23041]
[-80.728624,35.228069]
[-80.727753,35.227836]
[-80.727294,35.227741]
[-80.726762,35.227647]
[-80.726321,35.227594]
[-80.725687,35.227544]
[-80.725134,35.227535]
[-80.721502,35.227615]
[-80.691298,35.216202]
[-80.688009,35.215396]
[-80.686516,35.215016]
[-80.684281,35.214466]
[-80.68396,35.214395]
[-80.683375,35.214231]
[-80.682908,35.214079]
[-80.682444,35.213905]
[-80.682045,35.213733]
[-80.68062,35.213112]
[-80.678078,35.211983]
[-80.676836,35.211447]
[-80.598433,35.234307]

Related

extract values inside an array column in amazon athena

I have a table in athena aws where the column 'metadata_stopinfo' has the structure that you can see in the image.
I am trying to extract values that are inside that array, however when I try
SELECT
"json_extract_scalar"(metadata_stopinfo, '$.city')
FROM "table"
I have the following problem
SYNTAX_ERROR: line 2:5: Unexpected parameters (array(row("address" row("addressline" varchar,"city" varchar,"countrycode" varchar,"countrycodeoriginal" varchar,"state" varchar,"zipcode" varchar),"carrierreference" varchar,"contacts" array(row("contacttype" varchar,"email" varchar,"fax" varchar,"mobilephone" varchar,"name" varchar,"officephone" varchar,"userid" varchar)),"containerinfo" array(row("containerid" varchar,"containeridtype" varchar,"equipmentcode" varchar,"equipmenttype" varchar)),"conveyancelinenumber" varchar,"conveyancetype" varchar,"conveyancetypeoriginal" varchar,"dateinfo" row("arrivalestimateddate" varchar,"arrivalestimateddateend" varchar,"arrivalestimatedendoffset" varchar,"arrivalestimatedoffset" varchar,"arrivalrequesteddate" varchar,"deliveryestimateddate" varchar,"deliveryestimateddateend" varchar,"deliveryestimatedendoffset" varchar,"deliveryestimatedoffset" varchar,"deliveryrequesteddate" varchar,"deliveryrequesteddateend" varchar,"deliveryrequestedendoffset" varchar,"deliveryrequestedoffset" varchar,"departureestimateddate" varchar,"departureestimateddateend" varchar,"departureestimatedendoffset" varchar,"departureestimatedoffset" varchar,"departurerequesteddate" varchar,"pickuprequesteddate" varchar,"pickuprequesteddateend" varchar,"pickuprequestedendoffset" varchar,"pickuprequestedoffset" varchar,"pickupestimateddate" varchar,"pickupestimateddateend" varchar,"pickupestimatedendoffset" varchar,"pickupestimatedoffset" varchar),"deliverynotenumber" varchar,"instructions" array(row("customerspecificsubtype" varchar,"header" boolean,"instructionsubtype" varchar,"instructiontype" varchar,"text" varchar)),"locationid" varchar,"partnercarrieraddress" row("addressline" varchar,"city" varchar,"countrycode" varchar,"countrycodeoriginal" varchar,"state" varchar,"zipcode" varchar),"partnercarriercontacts" array(row("contacttype" varchar,"email" varchar,"fax" varchar,"name" varchar,"officephone" varchar)),"partnercarrierid" varchar,"partnercarriername" varchar,"partnerid" varchar,"partnername" varchar,"partnertimezone" varchar,"partnertype" varchar,"productquantity" row("number" double,"originalunitofmeasure" varchar,"quantitytype" varchar,"unitofmeasure" varchar),"sequencenumber" bigint,"shipmentidentifier" varchar,"stoptype" varchar,"transportinfo" row("description" varchar,"transportcode" varchar,"transportoriginalcode" varchar),"vesselinfo" row("lloydsnumber" varchar,"shipsradiocallnumber" varchar,"vesselname" varchar,"vesselnumber" varchar,"voyagetripnumber" varchar))), varchar(6)) for function json_extract_scalar. Expected: json_extract_scalar(varchar(x), JsonPath) , json_extract_scalar(json, JsonPath)
My question is, how can i extract values inside de column ?
json_extract_scalar unsurprisingly works with json (note that even if yur data was in json format, json_extract_scalar(metadata_stopinfo, '$.city') still would not have worked cause your data is an array), while your column contains array's of row's, so you need to work with it correspondingly. For example you can use indexes to access elements in array (in presto array indexes start from 1):
SELECT
metadata_stopinfo[1] r
FROM "table"
And then access the fields:
The fields may be of any SQL type, and are accessed with field reference operator .
SELECT
metadata_stopinfo[1].city city
FROM "table"
Also you can flatten the array with unnest:
SELECT r.city
FROM "table",
unnest(metadata_stopinfo) as t(r)

How to flatten nested array data into row in bigquery

I am trying to flatten inside_array or sub array of nested array data into table rows.
I am able to flatten array_data which is outside array.
Anybody have any suggestion.Thanks in advance
#standardSQL
SELECT ...
FROM `project.dataset.table`,
UNNEST(array_data) AS array_data_rec,
UNNEST(array_data_rec.inside_array) AS inside_array_rec
To handle "no data inside the inside_array" - use LEFT JOIN instead as in below example
#standardSQL
SELECT ...
FROM `project.dataset.table`,
UNNEST(array_data) AS array_data_rec
LEFT JOIN UNNEST(array_data_rec.inside_array) AS inside_array_rec
You can do following
...
FROM
AA.nested_array,
UNNEST(array_data) as array_data,
UNNEST(array_data.inside_array) as array_data_inside_array

Hive and sql( any sql query)

I am posting question here related to hive dataware house?I have below sample data
Id,name,transctions
1,apple,(1,1,1,1)
2,mango,(1,1,1)
3,kiwi,(1,1,1)
...........
.......
In transctions column my data is array type. I am expecting below output.
Id,name,transtion
1,apple,4
2,mango,3
3,kiwi,3
I am guessing you want the size of the array:
select id, name, [size(transactions)][1]
from t;

Array operation on hive collect_set

I am working on hive on large dataset, I have table with colum array and the content of the colum is as follows.
["20190302Prod4"
"20190303Prod1"
"20190303Prod4"
"20190304Prod4"
"20190305Prod3"
"20190307Prod4"
"20190308Prod4"
"20190309Prod4"
"20190310Prod2"
"20190311Prod1"
"20190311Prod4"
"20190312Prod1"
"20190312Prod4"
"20190313Prod2"
"20190313Prod1"
"20190313Prod4"
"20190314Prod4"
"20190315Prod4"
"20190316Prod4"
"20190317Prod1"
"20190317Prod4"]
I need a set as per the asc date of prod e.g. I need to trim date from the array and apply collect_set to get below result.
["Prod4",
"Prod1",
"Prod3",
"Prod2"]
Explode array, remove date (digits at the beginning of the string), aggregate using collect_set:
with mydata as (--use your table instead of this
select array(
"20190302Prod4",
"20190303Prod1",
"20190303Prod4",
"20190304Prod4",
"20190305Prod3",
"20190307Prod4",
"20190308Prod4",
"20190309Prod4",
"20190310Prod2",
"20190311Prod1",
"20190311Prod4",
"20190312Prod1",
"20190312Prod4",
"20190313Prod2",
"20190313Prod1",
"20190313Prod4",
"20190314Prod4",
"20190315Prod4",
"20190316Prod4",
"20190317Prod1",
"20190317Prod4"
) myarray
)
select collect_set(regexp_extract(elem,'^\\d*(.*?)$',1)) col_name
from mydata a --Use your table instead
lateral view outer explode(myarray) s as elem;
Result:
col_name
["Prod4","Prod1","Prod3","Prod2"]
One more possible method is to concatenate array first, remove dates from the string, split to get an array. Unfortunately we still need to explode to do collect_set to remove duplicates (example using the same WITH mydata CTE):
select collect_set(elem) col_name
from mydata a --Use your table instead
lateral view outer explode(split(regexp_replace(concat_ws(',',myarray),'(^|,)\\d{8}','$1'),',')) s as elem
;

hive explode list from json-string

I have table with jsons:
CREATE TABLE TABLE_JSON (
json_body string
);
Json has structure:
{ obj1: { fields ... }, obj2: [array] }
I want to select all elements from array, but I can't.
For example, I can get all fields from first object:
SELECT f.fields...
FROM (
SELECT q1.obj1, q1.obj2
FROM TABLE_JSON jt
LATERAL VIEW JSON_TUPLE(jt.json_body, 'obj1', 'obj2') q1 AS obj1, obj2
) as json_table2
LATERAL VIEW JSON_TUPLE(TABLE_JSON.obj1, 'fields...') f AS fields...;
But with array this method doesnt work.
I've tried to use
...
LATERAL VIEW explode(json_table2.obj2) adTable AS arr;
hive explode doc
But obj2 - string with array. How to transform string-json to array and explode it?
The json_split UDF from Brickhouse ( http://github.com/klout/brickhouse ) can convert a JSON array to a Hive List, and then you can explode that.
See http://mail-archives.apache.org/mod_mbox/hive-user/201406.mbox/%3CCAO78EnLgSrrUY3Ad_ZWS9zWNKLQRwS9jXrqEE869FhUNiWgCXA#mail.gmail.com%3E and https://brickhouseconfessions.wordpress.com/2014/02/07/hive-and-json-made-simple/
You can consider using Hive-JSON SerDe to read the data from JSON.
Refer: https://github.com/rcongiu/Hive-JSON-Serde
This may not be an optimal solution but can help unblock you. For a JSON object which looks like below
'{"obj1":"field1","obj2":["a1","a2","a3"]}'
this query can help you obtain all items of array into individual columns given that the size of the array is constant across all rows.
SELECT split(results,",")[0] AS arrayItem1,
split(results,",")[1] AS arrayItem2,
regexp_replace(split(results,",")[2], "[\\]|}]", "") AS arrayItem3
FROM
(SELECT split(translate(get_json_object(TABLE_JSON.json_body,'$.obj2'), '"\\[|]|\""',''), "},") AS r
FROM TABLE_JSON) t1 LATERAL VIEW explode(r) rr AS results
It produces the result which looks like this
arrayitem1| arrayitem2| arrayitem3
a1 | a2 | a3
You can scale it to any number of array size on a condition that size is constant across the table.