How to use Hive MACRO to reduce boilerplate

My Hive code has a repeating pattern with 15 complex LATERAL VIEWs.
Below I have simplified the explode code for brevity:
SELECT a,b,c,d FROM t
LATERAL VIEW explode(split(regexp_replace(s,'A',''),',')) a as a
LATERAL VIEW explode(split(regexp_replace(s,'B',''),',')) b as b
LATERAL VIEW explode(split(regexp_replace(s,'C',''),',')) c as c
LATERAL VIEW explode(split(regexp_replace(s,'D',''),',')) d as d
...
I tried to use a MACRO to avoid typing the complex explode expression 15 times, since the expressions are very similar (they differ by only one argument).
I created the following MACRO:
CREATE TEMPORARY MACRO explode_me(s string, p string)
explode(split(regexp_replace(s,p,''),','))
;
SELECT a FROM t
LATERAL VIEW explode_me(s,'A') a as a
I got the error:
SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions
I understand the error, but I do not see how else to make my code more compact.

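I was able to solve it by removing explode() from the macro body and keeping only split() inside the macro. explode() then stays in each LATERAL VIEW and is applied to the macro's result, e.g. LATERAL VIEW explode(split_me(s,'A')) a as a, where split_me is whatever you name the macro.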
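A rough Python model of why this split of responsibilities works, assuming simple patterns that behave the same in Python's `re` as in Hive's Java regular expressions. The macro part is an ordinary scalar function that returns an array, and each LATERAL VIEW explode(...) fans the row out; stacking several of them behaves like nested loops, i.e. a cartesian product. The names `split_me` and `lateral_views` are hypothetical and this is not Hive code.

```python
import re
from itertools import product
from typing import List, Tuple


def split_me(s: str, p: str) -> List[str]:
    """Scalar part that can live in a macro: split(regexp_replace(s, p, ''), ',')."""
    return re.sub(p, "", s).split(",")


def lateral_views(s: str, patterns: List[str]) -> List[Tuple[str, ...]]:
    """Row-generating part: one explode() per LATERAL VIEW.

    Stacked LATERAL VIEWs combine like nested loops, so the output rows
    are the cartesian product of the exploded arrays.
    """
    arrays = [split_me(s, p) for p in patterns]
    return list(product(*arrays))


if __name__ == "__main__":
    for row in lateral_views("1A,2B", ["A", "B"]):
        print(row)
```

With 15 LATERAL VIEWs the row count is the product of all 15 array lengths, which is worth keeping in mind for large inputs.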

Related

Hive: Parse nested json list

I have data which comprises nested json list, like:
{"id":"aaa", "list":[{"eventId":222},{"details":[{"sub1":333},{"sub2":444}]},{"name":555}]}
The target is to extract the "outer" list, like
id data
aaa {"eventId":222}
aaa {"details":[{"sub1":333},{"sub2":444}]}
aaa {"name":555}
How can I explode the list without splitting the inner nested JSON list? Any help is appreciated.

You'll have to use the built-in function get_json_object or json_tuple to get the JSON objects, together with Hive's posexplode to retrieve values from the list.
For example,
SELECT get_json_object(json_object, '$.id') as id,
posexplode(get_json_object(json_object, '$.list')) as (pos, val)
FROM your_table;
From there you can use get_json_object on your pos, val columns.
You could also use explode with LATERAL VIEW in Hive.
Note:
This SQL code is only for illustration; it may contain errors.
Edit:
As pointed out by @leftjoin, it would be impossible to handle this kind of input with the value returned by get_json_object. A solution might be to use a UDF to handle this case.
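The logic such a UDF would need is small: parse the document, iterate over the top-level list, and re-serialise each element so that nested lists stay inside their element. A Python sketch of that logic (not a Hive UDF; the function name and the `id`/`list` keys are taken from the sample data, and the compact separators are chosen to reproduce the target output):

```python
import json
from typing import List, Tuple


def explode_outer_list(doc: str, key: str = "list") -> List[Tuple[str, str]]:
    """Return one (id, element_json) row per element of the top-level list."""
    record = json.loads(doc)
    rows = []
    for element in record[key]:
        # Re-serialise compactly; nested arrays are part of the element, not split.
        rows.append((record["id"], json.dumps(element, separators=(",", ":"))))
    return rows


if __name__ == "__main__":
    sample = '{"id":"aaa", "list":[{"eventId":222},{"details":[{"sub1":333},{"sub2":444}]},{"name":555}]}'
    for row in explode_outer_list(sample):
        print(*row)
```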

How to compare an array from a json object with a normal array?

I want to do something like this
SELECT t.*
FROM table t
WHERE json_array_elements(t.data->'other_field'->'my_array') && ARRAY['some_values']
But I can't, due to this error
ERROR: set-returning functions are not allowed in WHERE
I searched a lot for a solution that doesn't use extra joins or similar constructs.
So how can I do something like this in a query that is as simple as possible?

If the array elements are strings, you can use the ?| operator. But that only works with jsonb values. As your column seems to be json, you need to cast it to jsonb:
select *
from the_table t
where (t.data::jsonb -> 'other_field' -> 'my_array') ?| array['..', '..'];
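To make the behaviour of ?| concrete, here is a small Python model of its documented semantics: it is true if any of the given strings exists as a top-level key (for objects) or as a top-level string element (for arrays). Numbers and nested values do not count. This is an illustration of the rule, not PostgreSQL code; the function name is hypothetical.

```python
import json
from typing import Iterable


def jsonb_exists_any(doc: str, keys: Iterable[str]) -> bool:
    """Model of `doc ?| keys`: any key present at the top level of doc."""
    value = json.loads(doc)
    wanted = set(keys)
    if isinstance(value, list):
        # Only top-level *string* elements are considered.
        return any(isinstance(v, str) and v in wanted for v in value)
    if isinstance(value, dict):
        return any(k in wanted for k in value)
    if isinstance(value, str):
        # A scalar JSON string matches itself.
        return value in wanted
    return False


if __name__ == "__main__":
    print(jsonb_exists_any('["a", "b"]', ["b", "z"]))
    print(jsonb_exists_any('[1, 2]', ["1"]))
```

In practice this means the ?| approach fits arrays of strings; for numeric elements a different comparison (for example jsonb containment) would be needed.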

Is it possible to use wildcards on the left side of a LIKE operator?

Is there a way to use at least _ on the left side, so that the following statement returns 1?
SELECT 1 FROM DUAL WHERE
'te_ephone' like 'tele_ho%'
I want Oracle to parse the left side the same way it parses the right side, so that _ matches any character. Is this possible, or is there a workaround to make this work?
To give some context, the final objective is that strings like remoñoson match remonos%.
The left-hand side is a column in which I replace some characters with _, while the "starts with" search term gets the same replacement.

Based on your context, what you are expecting can be achieved with Oracle's linguistic sorting and searching features (the Linguistic Sort documentation describes searching linguistic strings and sorting in detail).
Example 1 (case-insensitive or accent-insensitive comparisons):
SELECT word FROM test1
WHERE NLS_UPPER(word, 'NLS_SORT = XGERMAN') = 'GROSSE';
WORD
------------
GROSSE
Große
große
Example 2 (regular expression with the base letter operator [==]):
Oracle SQL syntax:
SQL> SELECT col FROM test WHERE REGEXP_LIKE(col,'r[[=e=]]sum[[=e=]]');
Expression: r[[=e=]]sum[[=e=]]
Matches:
resume
résumé
résume
resumé
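The idea behind both examples is to compare strings by their base letters. A rough Python approximation, assuming Unicode canonical decomposition is an acceptable stand-in for Oracle's linguistic rules (it is not how Oracle implements them): decompose each string, drop the combining marks, and case-fold. The function names are hypothetical.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Drop combining marks after canonical decomposition: 'é' -> 'e', 'ñ' -> 'n'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def loose_key(text: str) -> str:
    """Accent- and case-insensitive comparison key (casefold also maps 'ß' to 'ss')."""
    return fold_accents(text).casefold()


def loose_startswith(value: str, prefix: str) -> bool:
    """Rough analogue of `value LIKE prefix || '%'` under the loose comparison."""
    return loose_key(value).startswith(loose_key(prefix))


if __name__ == "__main__":
    print(loose_startswith("remoñoson", "remonos"))
    print(sorted({loose_key(w) for w in ["resume", "résumé", "résume", "resumé"]}))
```

Comparing folded keys on both sides avoids the need for wildcards on the left-hand side of LIKE.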

Check if value exists in Postgres array

Using Postgres 9.0, I need a way to test if a value exists in a given array. So far I came up with something like this:
select '{1,2,3}'::int[] @> (ARRAY[]::int[] || value_variable::int)
But I keep thinking there should be a simpler way to do this; I just can't see it. This seems better:
select '{1,2,3}'::int[] @> ARRAY[value_variable::int]
I believe it will suffice. But if you have other ways to do it, please share!

Simpler with the ANY construct:
SELECT value_variable = ANY ('{1,2,3}'::int[])
The right operand of ANY (between parentheses) can be either a set (the result of a subquery, for instance) or an array. There are several ways to use it; see the related questions:
SQLAlchemy: how to filter on PgArray column types?
IN vs ANY operator in PostgreSQL
Important difference: array operators (<@, @>, && et al.) expect array types as operands and support GIN or GiST indices in the standard distribution of PostgreSQL, while the ANY construct expects an element type as its left operand and can be supported with a plain B-tree index (with the indexed expression to the left of the operator, not the other way round as it seems to be in your example). Example:
Index for finding an element in a JSON array
None of this works for NULL elements. To test for NULL, see:
Check if NULL exists in Postgres array

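Watch out for the trap I fell into: when checking that a certain value is not present in an array, you shouldn't do:
SELECT value_variable != ANY('{1,2,3}'::int[])
but use
SELECT value_variable != ALL('{1,2,3}'::int[])
instead. != ANY is true as soon as at least one element differs from value_variable, so it is true for almost any array; != ALL is true only if every element differs.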
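A small Python model of the ANY/ALL trap, including NULL handling, with None standing in for SQL NULL. It follows the documented three-valued rules: ANY is true if any comparison is true, otherwise NULL if any comparison was NULL, otherwise false; ALL is the mirror image. The helper names are hypothetical and the model ignores NULL arrays.

```python
import operator
from typing import Callable, Optional, Sequence

Cmp = Callable[[int, int], bool]


def _compare(op: Cmp, left: Optional[int], right: Optional[int]) -> Optional[bool]:
    # Any comparison involving NULL (None) yields NULL.
    if left is None or right is None:
        return None
    return op(left, right)


def sql_any(left: Optional[int], op: Cmp, arr: Sequence[Optional[int]]) -> Optional[bool]:
    results = [_compare(op, left, x) for x in arr]
    if any(r is True for r in results):
        return True
    return None if any(r is None for r in results) else False


def sql_all(left: Optional[int], op: Cmp, arr: Sequence[Optional[int]]) -> Optional[bool]:
    results = [_compare(op, left, x) for x in arr]
    if any(r is False for r in results):
        return False
    return None if any(r is None for r in results) else True


if __name__ == "__main__":
    print(sql_any(1, operator.ne, [1, 2, 3]))   # 1 differs from 2, so "!= ANY" is true
    print(sql_all(1, operator.ne, [1, 2, 3]))   # 1 is in the array, so "!= ALL" is false
    print(sql_any(4, operator.eq, [1, None, 3]))
```

The last call shows why NULL elements need separate handling: no comparison is true, but one is unknown, so the result is NULL rather than false.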

"But if you have other ways to do it, please share."
You can compare two arrays with the overlap operator &&: if any value in the left array also appears in the right array, it returns true. It's kind of hackish, but it works.
SELECT '{1}' && '{1,2,3}'::int[]; -- true
SELECT '{1,4}' && '{1,2,3}'::int[]; -- true
SELECT '{4}' && '{1,2,3}'::int[]; -- false
In the first and second queries, the value 1 is in the right array.
Notice that the second query is true even though the value 4 is not contained in the right array.
For the third query, no value in the left array (i.e., 4) is in the right array, so it returns false.
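The difference between overlap (&&) and containment (@>) is easy to confuse, so here is a set-based Python model of the two operators. It assumes no NULL elements and, like PostgreSQL, ignores duplicates; the function names are hypothetical.

```python
from typing import Iterable


def overlaps(left: Iterable[int], right: Iterable[int]) -> bool:
    """Like `left && right`: true if the arrays share at least one element."""
    return not set(left).isdisjoint(right)


def contains(left: Iterable[int], right: Iterable[int]) -> bool:
    """Like `left @> right`: true if every element of right is also in left."""
    return set(right).issubset(left)


if __name__ == "__main__":
    print(overlaps([1, 4], [1, 2, 3]))   # shares 1
    print(contains([1, 2, 3], [1, 4]))   # 4 is missing
```

For a single-value membership test both give the same answer, but with several values on the left only @> requires all of them to be present.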

unnest can be used as well.
It expands an array into a set of rows, and then checking whether a value exists is as simple as using IN or NOT IN.
For example, with these column types:
id => uuid
exception_list_ids => uuid[]
select * from table where id NOT IN (select unnest(exception_list_ids) from table2)

This one works fine for me; maybe it's useful for someone:
select * from your_table where array_column::text ilike ANY (ARRAY['%text_to_search%'::text]);

ANY works well. Just make sure that the ANY keyword is on the right side of the comparison operator, i.e. after the equals sign.
The statement below throws ERROR: syntax error at or near "any":
select 1 where any('{hello}'::text[]) = 'hello';
whereas this example works fine:
select 1 where 'hello' = any('{hello}'::text[]);

When looking for the existence of an element in an array, proper casting may be required to get past the Postgres SQL parser. Here is an example query that uses the array contains operator in the join clause.
For simplicity I only list the relevant part:
table1 other_name text[]; -- is an array of text
The join part of the SQL:
from table1 t1 join table2 t2 on t1.other_name::text[] @> ARRAY[t2.panel::text]
The following also works:
on t2.panel = ANY(t1.other_name)
I am only guessing that the extra casting is required because the parser does not look up the table definition to figure out the exact type of the column. Others, please comment on this.

Searching a column containing CSV data in a MySQL table for existence of input values

I have a table say, ITEM, in MySQL that stores data as follows:
ID FEATURES
--------------------
1 AB,CD,EF,XY
2 PQ,AC,A3,B3
3 AB,CDE
4 AB1,BC3
--------------------
As an input, I will get a CSV string, something like "AB,PQ". I want to get the records that contain AB or PQ. I realized that we have to write a MySQL function to achieve this. So, if we had this magical function MATCH_ANY defined in MySQL, I would simply execute SQL as follows:
select * from ITEM where MATCH_ANY(FEATURES, "AB,PQ") = 0
The above query would return the records 1, 2 and 3.
But I'm running into all sorts of problems while implementing this function as I realized that MySQL doesn't support arrays and there's no simple way to split strings based on a delimiter.
Remodeling the table is the last option for me, as it involves a lot of issues.
I might also want to execute queries containing multiple MATCH_ANY calls, such as:
select * from ITEM where MATCH_ANY(FEATURES, "AB,PQ") = 0 and MATCH_ANY(FEATURES, "CDE") = 0
In the above case, we would get an intersection of records (1, 2, 3) and (3) which would be just 3.
Any help is deeply appreciated.
Thanks

First of all, the database should of course not contain comma-separated values, but you are hopefully aware of this already. If the table were normalised, you could easily get the items using a query like:
select distinct i.ItemId
from Item i
inner join ItemFeature f on f.ItemId = i.ItemId
where f.Feature in ('AB', 'PQ')
You can match the strings in the comma-separated values, but it's not very efficient:
select Id
from Item
where
instr(concat(',', Features, ','), ',AB,') <> 0 or
instr(concat(',', Features, ','), ',PQ,') <> 0
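The trick is to wrap both the column and the search term in commas, so that AB matches the item AB but not AB1. The sketch below checks this against the question's sample data using Python's built-in sqlite3 as a stand-in for MySQL; SQLite spells CONCAT as `||`, and the helper names are hypothetical.

```python
import sqlite3
from typing import List, Sequence

ROWS = [
    (1, "AB,CD,EF,XY"),
    (2, "PQ,AC,A3,B3"),
    (3, "AB,CDE"),
    (4, "AB1,BC3"),
]


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ITEM (ID INTEGER PRIMARY KEY, FEATURES TEXT)")
    conn.executemany("INSERT INTO ITEM VALUES (?, ?)", ROWS)
    return conn


def matching_ids(conn: sqlite3.Connection, features: Sequence[str]) -> List[int]:
    """IDs whose FEATURES list contains any of the given items exactly."""
    if not features:
        return []
    # ',AB,' can only occur in ',AB,CD,...,' at an item boundary.
    cond = "instr(',' || FEATURES || ',', ',' || ? || ',') <> 0"
    sql = "SELECT ID FROM ITEM WHERE " + " OR ".join([cond] * len(features)) + " ORDER BY ID"
    return [row[0] for row in conn.execute(sql, list(features))]


if __name__ == "__main__":
    db = make_demo_db()
    print(matching_ids(db, ["AB", "PQ"]))
    print(matching_ids(db, ["CDE"]))
```

Because the wrapped column is computed per row, no index can help here, which is why this approach is slow on large tables.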

For all you REGEXP lovers out there, I thought I would add this as a solution:
SELECT * FROM ITEM WHERE FEATURES REGEXP '[[:<:]](AB|PQ)[[:>:]]';
and for case sensitivity:
SELECT * FROM ITEM WHERE FEATURES REGEXP BINARY '[[:<:]](AB|PQ)[[:>:]]';
For the second query:
SELECT * FROM ITEM WHERE FEATURES REGEXP '[[:<:]](AB|PQ)[[:>:]]' AND FEATURES REGEXP '[[:<:]]CDE[[:>:]]';
(The [[:<:]] and [[:>:]] word-boundary markers are not supported from MySQL 8.0.4 on; there, use '\\b(AB|PQ)\\b' instead.)
Cheers!
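Alternation binds more loosely than the word-boundary markers, so an unparenthesised `AB|PQ` pattern anchors only one side of each alternative and still matches AB1. A Python illustration on the sample data, assuming `\b` behaves like MySQL's [[:<:]]/[[:>:]] for these inputs (it does for letters, digits and commas):

```python
import re

FEATURES = ["AB,CD,EF,XY", "PQ,AC,A3,B3", "AB,CDE", "AB1,BC3"]

# Parsed as (\bAB) | (PQ\b): the leading AB of "AB1" still matches.
loose = re.compile(r"\bAB|PQ\b")
# Grouped: both boundaries apply to both alternatives.
strict = re.compile(r"\b(?:AB|PQ)\b")

if __name__ == "__main__":
    print([s for s in FEATURES if loose.search(s)])
    print([s for s in FEATURES if strict.search(s)])
```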

select *
from ITEM
where CONCAT(',',FEATURES,',') LIKE '%,AB,%'
or CONCAT(',',FEATURES,',') LIKE '%,PQ,%'
Or create a custom function to implement your MATCH_ANY.

Alternatively, consider using RLIKE (a synonym for REGEXP in MySQL):
select *
from ITEM
where CONCAT(',', FEATURES, ',') RLIKE ',AB,|,PQ,';

Just a thought:
Does it have to be done in SQL? This is the kind of thing you might normally expect to write in PHP or Python or whatever language you're using to interface with the database.
This approach means you can build your query string using whatever complex logic you need and then just submit a vanilla SQL query, rather than trying to build a procedure in SQL.
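A minimal sketch of building such a query on the application side, assuming a MySQL DB-API driver that uses `%s` placeholders (the "format" paramstyle). Each input CSV becomes an OR group of LIKE tests and the groups are combined with AND, mirroring the MATCH_ANY(...) AND MATCH_ANY(...) example from the question. The function names are hypothetical; the column name is inserted verbatim and must be a trusted identifier, and feature values containing % or _ would additionally need LIKE escaping.

```python
from typing import List, Tuple


def build_match_any(column: str, csv_input: str) -> Tuple[str, List[str]]:
    """One OR group: the column (comma-wrapped) contains any listed feature."""
    features = [f.strip() for f in csv_input.split(",") if f.strip()]
    if not features:
        raise ValueError("no features given")
    clause = " OR ".join(f"CONCAT(',', {column}, ',') LIKE %s" for _ in features)
    params = [f"%,{f},%" for f in features]
    return f"({clause})", params


def build_query(*csv_inputs: str) -> Tuple[str, List[str]]:
    """AND together one MATCH_ANY-style group per CSV input."""
    clauses: List[str] = []
    params: List[str] = []
    for csv_input in csv_inputs:
        clause, group_params = build_match_any("FEATURES", csv_input)
        clauses.append(clause)
        params.extend(group_params)
    return "SELECT * FROM ITEM WHERE " + " AND ".join(clauses), params


if __name__ == "__main__":
    sql, params = build_query("AB,PQ", "CDE")
    print(sql)
    print(params)
```

The resulting pair would then be passed to the driver, e.g. `cursor.execute(sql, params)`, so the values are never spliced into the SQL text.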
Ben
