Query a hive table with array<array<string>> type - hive

I have a Hive table and need to filter on rows where the value of a column is []. The type of the column is array<array<string>>. I tried to use array_contains, but it gave the following error:
Error while compiling statement: FAILED: SemanticException [Error
10016]: line 2:41 Argument type mismatch ''[]'': "array"
expected at function ARRAY_CONTAINS, but "string" is found
The sample values of the column could be
[]
[['a','b', 'c']]
[['a'],['b'], ['c']]
[]

Related

How to resolve this sql error of schema_of_json

I need to find out the schema of a given JSON file. I see that SQL has a schema_of_json function,
and something like this works flawlessly:
> SELECT schema_of_json('[{"col":0}]');
ARRAY<STRUCT<`col`: BIGINT>>
But if I query a column of my table, it gives me the following error:
> SELECT schema_of_json(Transaction) as json_data from table_name;
Error in SQL statement: AnalysisException: cannot resolve 'schemaofjson(`Transaction`)' due to data type mismatch: The input json should be a string literal and not null; however, got `Transaction`.; line 1 pos 7;
Transaction is one of the columns in my table, and after checking it manually I can attest that it is of String type (holding JSON).
The function is supposed to give me the schema of the JSON, so how do I do it?
After looking further into the documentation, it is clear that the word "foldable" means a static (constant) expression, so a JSON column from a table won't work.
For a minimal reproducible example, here you go:
SELECT schema_of_json(CAST('{ "a": "b" }' AS STRING))
As soon as the cast is introduced in the above statement, schema_of_json fails. It needs a static JSON string as its input.
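Since the function only accepts a literal, one possible workaround is to fetch a single sample value from the column yourself and paste it into the statement as a string literal. The sketch below only builds that statement; it assumes Spark SQL with its default backslash escaping inside single-quoted literals, and the helper name build_schema_query and the sample value are hypothetical. The resulting schema describes that one sample only, not necessarily every row.

```python
def build_schema_query(sample_json: str) -> str:
    """Return a SQL statement that passes sample_json as a string literal."""
    # Escape backslashes first, then single quotes, so the value survives
    # inside a single-quoted SQL literal (assumes default Spark SQL escaping).
    escaped = sample_json.replace("\\", "\\\\").replace("'", "\\'")
    return f"SELECT schema_of_json('{escaped}')"


# Hypothetical sample, e.g. copied from: SELECT Transaction FROM table_name LIMIT 1
sample = '{"col": 0}'
print(build_schema_query(sample))
```

The printed statement can then be run like the first query in the question, because its argument is a constant again.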

PostgreSQL - Query nested json in text column

My situation is the following
-> The table A has a column named informations whose type is text
-> Inside the informations column is stored a JSON string (but is still a string), like this:
{
  "key": "value",
  "meta": {
    "inner_key": "inner_value"
  }
}
I'm trying to query this table by searching on its informations.meta.inner_key value with the following query:
SELECT * FROM A WHERE (informations::json#>>'{meta, inner_key}' = 'inner_value')
But I'm getting the following error:
ERROR: invalid input syntax for type json
DETAIL: The input string ended unexpectedly.
CONTEXT: JSON data, line 1:
SQL state: 22P02
I've built the query following the given link: DevHints - PostgreSQL
Does anyone know how to properly build the query?
EDIT 1:
I solved it with this workaround, but I think there are better solutions to the problem:
WITH temporary_table as (SELECT A.informations::json#>>'{meta, inner_key}' as inner_key FROM A)
SELECT * FROM temporary_table WHERE inner_key = 'inner_value'

Coalesce array of integers in Hive

foo_ids is an array of type bigint, but the entire array could be null. If the array is null, I want an empty array instead.
If I do this: COALESCE(foo_ids, ARRAY())
I get:
FAILED: SemanticException [Error 10016]: Line 13:45 Argument type mismatch 'ARRAY': The expressions after COALESCE should all have the same type: "array<bigint>" is expected but "array<string>" is found
If I do this: COALESCE(foo_ids, ARRAY<BIGINT>())
I get a syntax error: FAILED: ParseException line 13:59 cannot recognize input near ')' ')' 'AS' in expression specification
What's the proper syntax here?
Use this one:
coalesce(foo_ids, array(cast(null as bigint)))
Previously, Hive treated an empty array [] as []. But on Hadoop 2, Hive now shows an empty array [] as null (see the reference below). Use array(cast(null as bigint)) for an empty array of type bigint. Strangely, the size of the empty array is -1 (instead of 0). Hope this helps.
Sample data:
foo_ids
[112345677899098765,1123456778990987633]
[null,null]
NULL
select foo_ids, size(foo_ids) as sz from tbl;
Result:
foo_ids sz
[112345677899098765,1123456778990987633] 2
[null,null] 2
NULL -1
select foo_ids, coalesce(foo_ids, array(cast(null as bigint))) as newfoo from tbl;
Result:
foo_ids newfoo
[112345677899098765,1123456778990987633] [112345677899098765,1123456778990987633]
[null,null] [null,null]
NULL NULL
Reference: https://docs.treasuredata.com/articles/hive-change-201602

Hive sql struct mismatch

I have a table with columns like this:
table<mytable>
  field<myfield1> type(array<struct>)
    item<struct>
      cars(string)
      isRed(boolean)
      information(bigint)
When I perform the following query
select myfield1.isRed
from mytable
where myfield1.isRed = true
I get an error:
Argument type mismatch '': The 1st argument of EQUAL is expected to a primitive type, but list is found
When I query without the WHERE clause, the data looks something like this:
[true,true,true]
[true,true,true,true,true,true]
[true]
[true, true]
Try indexing into the array first and then reading the field (Hive arrays are zero-based):
select myfield1[0].isRed
from mytable
where myfield1[0].isRed = true
You can find more info about how to access complex types here
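To see why the comparison in the question fails, it can help to model the column outside Hive. The sketch below is plain Python, not HiveQL: each row's myfield1 is a list of dicts standing in for array<struct>, and the sample values are made up. Reading isRed across the whole array yields a list of booleans (matching the output shown in the question), which is why it cannot be compared to a single boolean.

```python
# Hypothetical rows; each myfield1 value is a list of structs (dicts here).
rows = [
    [{"cars": "a", "isRed": True, "information": 1},
     {"cars": "b", "isRed": False, "information": 2}],
    [{"cars": "c", "isRed": False, "information": 3},
     {"cars": "d", "isRed": True, "information": 4}],
    [{"cars": "e", "isRed": False, "information": 5}],
]


def project_is_red(myfield1):
    # Mirrors myfield1.isRed: one boolean per struct, so the result is a list.
    return [item["isRed"] for item in myfield1]


projected = [project_is_red(row) for row in rows]
print(projected)

# Two ways to reduce the list to a single boolean that a filter can use:
first_is_red = [row for row in rows if row[0]["isRed"]]       # one chosen element
any_is_red = [row for row in rows if any(project_is_red(row))]  # any element
print(len(first_is_red), len(any_is_red))
```

Which reduction is right depends on what the filter is meant to express: a specific position in the array, or any car in the row being red.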

Hive will not recognize column name in the 2nd condition in the query

I ran the following query and got the error below. Note that the error message lists the very column name (platform) that it thinks is not there. Weird.
hive -S -e 'select * from devices.device_app_action where ds= '20160511'
AND platform= 'ios' limit 3;'
FAILED: SemanticException [Error 10004]: Line 1:73 Invalid table alias or column reference 'ios': (possible column names are: duid, id, dt, app, platform, app_level, tier1, tier2, tier3, tier4, tier5, tier6, first_geo, first_v, first_lang, total_events, min_ats, max_ats, ds)
It's telling me the column platform does not exist, yet it's there in the list.
It could be that you have to enclose the query in proper quotes:
'select * from devices.device_app_action where ds= "20160511" AND platform= "ios" limit 3;'
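The quoting problem can be made visible without running Hive at all. The sketch below uses Python's shlex.split, which follows POSIX shell quoting rules, to show the query text that reaches hive -e in each case; the helper name query_seen_by_hive is hypothetical. With nested single quotes, the inner quotes end and restart the shell string, so they are stripped and ios arrives as a bare word that Hive tries to resolve as a column reference.

```python
import shlex

# The command from the question, with single quotes nested inside single quotes.
original = (
    "hive -S -e 'select * from devices.device_app_action "
    "where ds= '20160511' AND platform= 'ios' limit 3;'"
)

# The suggested fix: double quotes inside the single-quoted query.
fixed = (
    "hive -S -e 'select * from devices.device_app_action "
    'where ds= "20160511" AND platform= "ios" limit 3;\''
)


def query_seen_by_hive(command_line: str) -> str:
    # Split the line the way a POSIX shell would; the word after -e is the query.
    args = shlex.split(command_line)
    return args[args.index("-e") + 1]


print(query_seen_by_hive(original))
print(query_seen_by_hive(fixed))
```

In the first printed query the values appear as 20160511 and ios without any quotes. The bare 20160511 still parses as a number, which is presumably why the error only complains about 'ios'.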