I have an S3 bucket with a lot of files that have this content:
{time:123456, state:{{1,2,3,4},{4,5,6,7}...}
After I read it with Athena, the result is a dataset with two columns: the first is time (int) and the second is array<array<string>>. Is it possible, with Athena SQL, to convert this table so I can run:
select time, col1,col2,col3,col4
from table
where col1...col4 are columns taken from the inner array?
Use CROSS JOIN UNNEST to unnest the outer array:
select a.time,
state_array[1] as col1,
state_array[2] as col2,
state_array[3] as col3,
state_array[4] as col4
from my_table a
CROSS JOIN UNNEST(state) as t(state_array)
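Since the inner elements come back as strings, you may also want to cast them if the values are numeric, e.g. CAST(state_array[1] AS integer) AS col1 (the integer type here is only an assumption about your data).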
I am trying to run the below query :
select [every_column],count(*) from <table> group by [every_column] having count(*) >1
But the column names should be derived in the same query. I believe SHOW COLUMNS IN <table> would list the column names separated by newlines, but I need to use that within a single query to retrieve the result.
Appreciate any help in this regard.
You can use sed in the shell to replace the newlines (\n) with commas (,).
Assign the comma-separated column names to a Hive variable, then use that variable in your Hive query.
See the references for sed and for setting Hive variables.
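A minimal sketch of the idea, assuming the table is called my_table and the variable is named cols (both names are illustrative):
-- Build the comma-separated column list in the shell, for example:
--   cols=$(hive -S -e "SHOW COLUMNS IN my_table" | sed ':a;N;$!ba;s/\n/,/g')
--   hive --hivevar cols="$cols" -f find_dupes.hql
-- Inside find_dupes.hql the variable then expands into the query:
SELECT ${hivevar:cols}, count(*) AS cnt
FROM my_table
GROUP BY ${hivevar:cols}
HAVING count(*) > 1;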
Have you thought of using subqueries or even a CTE? Maybe this helps you find your answer:
-- "outer" is a reserved word, so use a different alias such as o
select o.col1,
o.col2,
o.col3,
o.col4,
o.col5,
o.col6, count(*) as cnt
from (
select <some logic> as col1,
<some logic> as col2,
<some logic> as col3,
<some logic> as col4,
<some logic> as col5,
<some logic> as col6
from innerTable
) o
group by o.col1,
o.col2,
o.col3,
o.col4,
o.col5,
o.col6
I have a table like
name string
one_key_value array<struct<key:string,value:array<string>>>
two_key_value array<struct<key:string,value:array<string>>>
and want to convert it to
name string
one_key_value map<string,array<string>>
two_key_value map<string,array<string>>
In Presto I use:
SELECT name,
MAP(TRANSFORM(one_key_value, kv -> kv.key), TRANSFORM(one_key_value, kv -> kv.value)) AS one_key_value,
MAP(TRANSFORM(two_key_value, kv -> kv.key), TRANSFORM(two_key_value, kv -> kv.value)) AS two_key_value
FROM table_a;
In Hive I use:
SELECT name,
map(k1,v1) AS one_key_value,
map(k2,v2) AS two_key_value
FROM table_a
lateral view inline(one_key_value) t1 as k1,v1
lateral view inline(two_key_value) t2 as k2,v2;
The count in Hive is a lot higher compared to Presto; I guess that's because one key has a lot of values and they are being exploded into different rows in Hive. Is there a way to make the Hive query behave like the Presto query?
I have a table in Hive which looks like this:
cust_1,month_1, f1,f2,f3
cust_1,month_2, f2,f3,f4
cust_2,month_1, f1,f5,f4
I would like to convert it to the following format:
cust_1,month_1, f1
cust_1,month_1, f2
cust_1,month_1, f3
....
How is that possible in Hive?
You can use this SQL:
select col1, col2, value
from orig_table
lateral view explode(array(col3, col4, col5)) orig_table_alias as value;
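Applied to the sample above, assuming the table is named my_table and the columns are cust, month, f1, f2 and f3 (the real names aren't shown in the question), that would look like:
select cust, month, feature
from my_table
lateral view explode(array(f1, f2, f3)) t as feature;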
I have to prepare a query that transforms a source table into a target table. The table structures are shown in the image. Can anyone help with this? http://i.stack.imgur.com/wnUuZ.png
[Tables image]
Hive's stack function should work here.
SELECT stack(2,
col1, col2, col3, '',
col1, col2, '', col4
) AS (newCol1, newCol2, newCol3, newCol4)
FROM source;
Basically, stack generates N rows for each row in the source, and you define each of these new rows.
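As a toy illustration with hypothetical literal values:
SELECT stack(2, 'a', 1, 'b', 2) AS (letter, num) FROM source;
This emits two rows, ('a', 1) and ('b', 2), for every row of source.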
How do I append only distinct records from a master table to another table, when the master may have duplicates? For example: I only want the distinct records in the smaller table, but I need to insert/append records to what I already have in the smaller table.
Ignoring any concurrency issues:
insert into smaller (field, ... )
select distinct field, ... from bigger
except
select field, ... from smaller;
You can also rephrase it as a join:
insert into smaller (field, ... )
select distinct b.field, ...
from bigger b
left join smaller s on s.key = b.key
where s.key is NULL
If you don't like NOT EXISTS and EXCEPT/MINUS (cute, Remus!), you also have the LEFT JOIN solution:
INSERT INTO smaller(a,b)
SELECT DISTINCT master.a, master.b FROM master
LEFT JOIN smaller ON smaller.a=master.a AND smaller.b=master.b
WHERE smaller.pkey IS NULL
You don't say the scale of the problem, so I'll mention something I recently helped a friend with.
He works for an insurance company that provides supplemental dental and vision benefits management for other insurance companies. When they get a new client they also get a new database that can have tens of millions of records. They wanted to identify all possible dupes against the data they already had in a master database of hundreds of millions of records.
The solution we came up with was to identify two distinct combinations of field values (normalized in various ways) that would indicate a high probability of a dupe. We then created a new table containing MD5 hashes of those combos plus the id of the master record they applied to. The MD5 columns were indexed. All new records would have their combo hashes computed, and if either of them collided with the master, the new record would be kicked out to an exceptions file for a human to deal with.
The speed of this surprised the hell out of us (in a nice way) and it has had a very acceptable false-positive rate.
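A rough T-SQL sketch of that hash-lookup approach, purely illustrative: the table, columns, and combo definitions below (master_hashes, new_records, last_name, dob, ssn, zip) are made-up stand-ins, not the actual schema.
-- One row per master record, holding the MD5 of each normalized field combo.
CREATE TABLE master_hashes (
    master_id  INT NOT NULL,
    combo1_md5 BINARY(16) NOT NULL,
    combo2_md5 BINARY(16) NOT NULL
);
CREATE INDEX ix_master_hashes_combo1 ON master_hashes (combo1_md5);
CREATE INDEX ix_master_hashes_combo2 ON master_hashes (combo2_md5);

-- Incoming rows whose combo hashes collide with the master get routed to the exceptions file.
SELECT n.*
FROM new_records AS n
WHERE EXISTS (
    SELECT 1
    FROM master_hashes AS m
    WHERE m.combo1_md5 = HASHBYTES('MD5', LOWER(n.last_name) + '|' + CONVERT(VARCHAR(10), n.dob, 120))
       OR m.combo2_md5 = HASHBYTES('MD5', n.ssn + '|' + n.zip)
);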
You could use the distinct keyword to filter out duplicates:
insert into AnotherTable
(col1, col2, col3)
select distinct col1, col2, col3
from MasterTable
Based on Microsoft SQL Server and its Transact-SQL. Untested as always, and this assumes target_table has the same columns (in the same order) as source_table; otherwise, list the column names between INSERT INTO and SELECT:
INSERT INTO target_table
SELECT DISTINCT s.row1, s.row2
FROM source_table AS s
WHERE NOT EXISTS (
SELECT 1
FROM target_table AS t
WHERE t.row1 = s.row1 AND t.row2 = s.row2
)
Something like this would work for SQL Server (you don't mention what RDBMS you're using):
INSERT INTO table (col1, col2, col3)
SELECT DISTINCT t2.a, t2.b, t2.c
FROM table2 AS t2
WHERE NOT EXISTS (
SELECT 1
FROM table
WHERE table.col1 = t2.a AND table.col2 = t2.b AND table.col3 = t2.c
)
Tune where appropriate, depending on exactly what defines "distinctness" for your table.