Coalesce array of integers in Hive

foo_ids is an array of type bigint, but the entire array could be null. If the array is null, I want an empty array instead.
If I do this: COALESCE(foo_ids, ARRAY())
I get:
FAILED: SemanticException [Error 10016]: Line 13:45 Argument type mismatch 'ARRAY': The expressions after COALESCE should all have the same type: "array<bigint>" is expected but "array<string>" is found
If I do this: COALESCE(foo_ids, ARRAY<BIGINT>())
I get a syntax error: FAILED: ParseException line 13:59 cannot recognize input near ')' ')' 'AS' in expression specification
What's the proper syntax here?

Use this one:
coalesce(foo_ids, array(cast(null as bigint)))
Previously, Hive displayed an empty array [] as []. On Hadoop 2, Hive now shows an empty array [] as null (see the reference below). Use array(cast(null as bigint)) to get an array typed as array<bigint>, which satisfies COALESCE's requirement that all arguments have the same type. Strangely, size() reports -1 (instead of 0) for the NULL row. Hope this helps. Thanks.
Sample data:
foo_ids
[112345677899098765,1123456778990987633]
[null,null]
NULL
select foo_ids, size(foo_ids) as sz from tbl;
Result:
foo_ids sz
[112345677899098765,1123456778990987633] 2
[null,null] 2
NULL -1
select foo_ids, coalesce(foo_ids, array(cast(null as bigint))) as newfoo from tbl;
Result:
foo_ids newfoo
[112345677899098765,1123456778990987633] [112345677899098765,1123456778990987633]
[null,null] [null,null]
NULL NULL
Reference: https://docs.treasuredata.com/articles/hive-change-201602
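Because size() reports -1 for the NULL row in the output above, a count that should read 0 for missing arrays needs an explicit guard. A minimal sketch, reusing the tbl and foo_ids names from the sample data (the sz_safe alias is made up for illustration):

```sql
-- size() yields -1 when foo_ids is NULL, so map that case to 0 explicitly.
SELECT foo_ids,
       CASE WHEN foo_ids IS NULL THEN 0 ELSE size(foo_ids) END AS sz_safe
FROM tbl;
```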

Related

divide operator error in PostgreSql: operator does not exist: unknown /

I have a Trip table in a PostgreSQL DB, and the table has a column called meta.
An example of meta in one row looks like:
meta = {"runTime": 3922000, "distance": 85132, "duration": 4049000, "fuelUsed": 19.595927498516176}
To select the largest value of "distance" divided by "runTime" across all trips, I run this query:
select MAX(tp."meta"->>'distance'/tp."meta"->>'runTime') maxkph FROM "Trip" tp
but I get ERROR:
/* ERROR: operator does not exist: unknown / jsonb LINE 1: MAX(tp."meta"->>'distance'/tp."meta"...
I also tried:
select MAX((tp."meta"->>'distance')/(tp."meta"->>'runTime')) maxkph FROM "Trip" tp
but get another ERROR:
/* ERROR: operator does not exist: text / text LINE 1: ...MAX((tp."meta"->>'distance')/(tp."meta...
Could you please help me to solve this problem?
There is no division operator for jsonb values. You have to cast the values on both sides to some numeric type first:
MAX( ((tp."meta"->>'distance')::numeric) / ((tp."meta"->>'runTime')::numeric) ) maxkph
Try using parentheses:
MAX( (tp."meta"->>'distance') / (tp."meta"->>'runTime') ) as maxkph
Your second problem suggests that these values are stored as strings. So convert them:
MAX( (tp."meta"->>'distance')::numeric / (tp."meta"->>'runTime')::numeric ) as maxkph
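Putting the cast into the full query from the question gives something like the sketch below. The NULLIF guard is an addition that the thread does not discuss: it turns a runTime of 0 into NULL, so that row is ignored by MAX instead of raising a division-by-zero error.

```sql
-- ->> returns text, so each side is cast to numeric before dividing.
SELECT MAX(
         (tp."meta"->>'distance')::numeric
         / NULLIF((tp."meta"->>'runTime')::numeric, 0)  -- assumed guard against runTime = 0
       ) AS maxkph
FROM "Trip" tp;
```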

Using REPLACE_REGEXPR in BW transformation throws syntax error

I'm trying to implement a routine for replacing some invalid characters in a BW transformation. But I keep getting a syntax error. This is my current code:
METHOD S0001_G01_R40 BY DATABASE PROCEDURE FOR HDB LANGUAGE SQLSCRIPT
OPTIONS READ-ONLY.
-- target field: 0POSTXT
-- Note the _M class are not considered for DTP execution.
-- AMDP Breakpoints must be set in the _A class instead.
outTab = SELECT REPLACE_REGEXPR('([^[:print:]|^[\x{00C0}-\x{017F}]|[#])'
IN "SGTXT" WITH '' OCCURRENCE ALL ) AS "/BI0/OIPOSTXT"
FROM :inTab;
errorTab = SELECT '' AS ERROR_TEXT,
'' AS SQL__PROCEDURE__SOURCE__RECORD FROM DUMMY
WHERE DUMMY <> 'X';
ENDMETHOD.
I keep getting the following error:
SQLSCRIPT message: return type mismatch: Procedure
/BIC/QCW72C4IJDC8JAFRICAU_M=>S0001_G01_R40: OUTTAB[ /BI0/OIPOSTXT:NVARCHAR(5000) ]
!= expected result [ POSTXT:NVARCHAR(60) RECORD:NVARCHAR(56)
SQL__PROCEDURE__SOURCE__RECORD:NVARCHAR(56) ]
Can anyone give me an idea of what I'm doing wrong here?
For those wondering how to correct this problem, here is the solution.
Everything is in the error message:
OUTTAB[ /BI0/OIPOSTXT:NVARCHAR(5000) ]
!= expected result [ POSTXT:NVARCHAR(60) RECORD:NVARCHAR(56)
SQL__PROCEDURE__SOURCE__RECORD:NVARCHAR(56) ]
It means the result table OutTab contains only one field (/BI0/OIPOSTXT) and therefore differs from the expected OutTab, which should contain three fields: POSTXT, RECORD and SQL__PROCEDURE__SOURCE__RECORD.
The expected structure can usually be seen on top of the public section:
types:
begin of TN_S_IN_S0001_G01_R1_1,
POSTXT type C length 60,
RECORD type C length 56,
SQL__PROCEDURE__SOURCE__RECORD type C length 56,
end of TN_S_IN_S0001_G01_R1_1 .
So the correct syntax would be:
outTab =
SELECT CAST(REPLACE_REGEXPR('([^[:print:]|^[\x{00C0}-\x{017F}]|[#])' IN "SGTXT" WITH '' OCCURRENCE ALL) AS NVARCHAR(60)) AS "POSTXT"
,"RECORD" AS "RECORD"
,SQL__PROCEDURE__SOURCE__RECORD AS "SQL__PROCEDURE__SOURCE__RECORD"
FROM :inTab;
Regards,
Jean-Guillaume
You might want to enclose the regex expression in a CAST( ... AS NVARCHAR(60)) to ensure that the resulting record structure matches the expected return type.

Query a hive table with array<array<string>> type

I have a Hive table and need to filter on rows where the value of a column is []. The type of the column is array<array<string>>. I tried to use array_contains, but it gave the following error:
Error while compiling statement: FAILED: SemanticException [Error
10016]: line 2:41 Argument type mismatch ''[]'': "array"
expected at function ARRAY_CONTAINS, but "string" is found
The sample values of the column could be
[]
[['a','b', 'c']]
[['a'],['b'], ['c']]
[]
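The error arises because '[]' is a string literal, not an array. One possible approach, not taken from the thread, is to test the array's length with size() instead of comparing against a literal. The table name my_table and the column name col below are placeholders:

```sql
-- size() returns 0 for an empty array, so this keeps only rows where col is [].
SELECT *
FROM my_table
WHERE size(col) = 0;
```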

PostgreSQL : Use of ANY for multiple values

I am trying to use PostgreSQL's ANY to search for values in a column of integer array type.
My SQL:
SELECT
*
FROM
company_employee_contacts
WHERE
corporate_complaint_type_ids = ANY(ARRAY[1,3]::integer[])
But it is giving me below error:
ERROR: operator does not exist: integer[] = integer
Can anyone tell me why I am getting this error while I am typecasting it?
Because corporate_complaint_type_ids is not an integer but an array of integers. You can't compare an array to ANY of an integer array:
select '{2,3,4}'::int[] = ANY(ARRAY[1,3]::integer[]);
ERROR: operator does not exist: integer[] = integer
LINE 1: select '{2,3,4}'::int[] = ANY(ARRAY[1,3]::integer[]);
instead you can check if arrays overlap:
postgres#pond93# select '{2,3,4}'::int[] && ARRAY[1,3]::integer[];
?column?
----------
t
(1 row)
or you can check one array value against ANY(array):
postgres#pond93# select ('{2,3,4}'::int[])[1] = ANY(ARRAY[1,3]::integer[]);
?column?
----------
f
(1 row)
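Applied to the query from the question, the overlap check might look like the sketch below. The first query returns rows whose corporate_complaint_type_ids shares at least one element with {1,3}; the second, using the containment operator @>, is an alternative for when the column must contain every listed value.

```sql
-- Rows where the array column contains 1 or 3 (any overlap).
SELECT *
FROM company_employee_contacts
WHERE corporate_complaint_type_ids && ARRAY[1,3]::integer[];

-- Rows where the array column contains both 1 and 3.
SELECT *
FROM company_employee_contacts
WHERE corporate_complaint_type_ids @> ARRAY[1,3]::integer[];
```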

How can I use arrayExists function when the array contains a null value?

I have a nullable array column in my table: Array(Nullable(UInt16)). I want to be able to query this column using arrayExists (or arrayAll) to check if it contains a value above a certain threshold but I'm getting an exception when the array contains a null value:
Exception: Expression for function arrayExists must return UInt8, found Nullable(UInt8)
My query is below where distance is the array column:
SELECT * from TracabEvents_ArrayTest
where arrayExists(x -> x > 9, distance);
I've tried updating the comparison in the lambda to "(isNotNull(x) and x > 9)" but I'm still getting the error. Is there any way of handling nulls in these expressions or are they not supported yet?
Add a notEmpty condition to filter out rows with an empty list, and wrap x in assumeNotNull inside arrayExists:
SELECT * FROM TracabEvents_ArrayTest WHERE notEmpty(distance) AND arrayExists(x -> assumeNotNull(x) > 9, distance)
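An alternative sketch, assuming the ifNull function is available in your ClickHouse version: convert the Nullable(UInt8) result of the comparison to a plain UInt8, so that NULL elements simply count as not matching.

```sql
-- x > 9 is Nullable(UInt8) for Nullable elements; ifNull(..., 0) maps NULL to 0 (false).
SELECT *
FROM TracabEvents_ArrayTest
WHERE arrayExists(x -> ifNull(x > 9, 0), distance);
```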