Optimizing multiple identical operator and function calls in Hive?

I'm new to Hive and trying to optimize a query that is taking a while to run. I have identical calls to regexp_extract and get_json in my SELECT and WHERE statements, and I was wondering if there is a way to optimize this by storing the results from one statement and using them in the other (or if Hive is already doing something like this in the background).
Example query:
SELECT
regexp_extract(get_json(json, 'url'), '.*[&?]q=([^&]*)') as query
FROM
api_request_logs
WHERE
LENGTH(regexp_extract(get_json(json, 'url'), '.*[&?]q=([^&]*)')) > 0
Thanks!

You can use a derived table so you only have to specify the regex once, but I don't think it will run any faster:
select * from (
select regexp_extract(get_json(json, 'url'), '.*[&?]q=([^&]*)') as query
from api_request_logs
) t where length(query) > 0
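If your version of Hive supports common table expressions (0.13 and later, as far as I know), the same idea can be written with a WITH clause; whether the extraction is actually computed only once still depends on the planner:
with extracted as (
select regexp_extract(get_json(json, 'url'), '.*[&?]q=([^&]*)') as query
from api_request_logs
)
select query from extracted where length(query) > 0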

Related

Force Presto to maintain order of WHERE clauses

I'm trying to run something like the following query:
SELECT * FROM foo WHERE cardinality(bar) > 0 AND bar[1] = '...';
However, I'm getting Query failed: Array subscript out of bounds. I'm assuming this is because Presto is trying to optimize the query by checking bar[1] = '...' before checking cardinality(bar) > 0. Is there a way to force Presto to maintain the order of the clauses?
I've solved this in two ways when I've needed it.
First, use the element_at function instead of the [] subscript notation. element_at returns NULL when indexing past the end of an array, so you could reduce your example to one condition. element_at also works in the SELECT clause, although it isn't needed there given your WHERE clause:
SELECT bar[1] FROM foo WHERE element_at(bar,1) = '...';
Second, do the first condition in a subquery using a WITH clause:
WITH populated_foo AS (SELECT * FROM foo WHERE cardinality(bar) > 0)
SELECT * FROM populated_foo WHERE bar[1] = '...';
The second approach doesn't make much sense for your example, but I've found it useful for more complex conditions involving row objects inside of arrays.
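As a purely illustrative sketch of that second case, supposing bar were an array of rows with a field called some_field (a name I'm making up here):
WITH populated_foo AS (SELECT * FROM foo WHERE cardinality(bar) > 0)
SELECT * FROM populated_foo WHERE bar[1].some_field = '...';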

Cannot figure out SQL Select statement in Access 2016

I'm having some trouble understanding WHY a select statement isn't working in a query I'm making.
I've got the SELECT and FROM lines functioning. With just those, ALL results from my selected table are displayed - 517 or so
What I want to do is display results that match a pattern using LIKE. What I have so far:
SELECT *
FROM Tbl_ServiceRequestMatrix
WHERE Tbl_ServiceRequestMatrix.[Application/Form] LIKE 'P%';
This returns 0 results - despite the fact that the column selected DOES have entries that start with 'P'
I also tried utilising brackets to see if that was the issue - it still displays 0 results:
SELECT *
FROM Tbl_ServiceRequestMatrix
WHERE ((Tbl_ServiceRequestMatrix.[Application/Form])='p%');
Can anyone help me understand why my WHERE ... LIKE statement is causing 0 results to be displayed?
The wildcard character in MS Access is (by default) * instead of %:
WHERE Tbl_ServiceRequestMatrix.[Application/Form] LIKE "P*"
The LIKE operator uses different wildcard characters in different SQL dialects.
In MS Access you need * instead of % in a LIKE expression.
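As far as I know, the * wildcard applies when the query is run inside Access itself (the default ANSI-89 query mode). If the same query is issued through ADO/OLE DB, which uses ANSI-92 mode, the % wildcard from your original attempt is the correct one:
WHERE Tbl_ServiceRequestMatrix.[Application/Form] LIKE 'P%'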

Implement an IN Query using XQuery in MSSQLServer 2005

I'm trying to query an xml column using an IN expression. I have not found a native XQuery way of doing such a query so I have tried two work-arounds:
Implement the IN query as a concatenation of ORs like this:
WHERE Data.exist('/Document/ParentKTMNode[text() = sql:variable("@Param1368320145") or
text() = sql:variable("@Param2043685301") or ...
Implement the IN query with the String fn:contains(...) method like this:
WHERE Data.exist('/Document/Field2[fn:contains(sql:variable("@Param1412022317"), .)]') = 1
Where the given parameter is a (long) string with the values separated by "|"
The problem is that version 1 doesn't work for more than about 50 arguments - the server throws an out-of-memory exception. Version 2 works, but is very, very slow.
Does anyone have a third idea? To phrase the problem more completely: given a list of values of any native SQL type, select all rows whose xml column has one of the given values at a specific field in the xml.
Try inserting all your parameters into a table and querying with the sql:column clause:
SELECT MyTable.Column FROM MyTable
CROSS JOIN (SELECT @Param1 T UNION ALL SELECT @Param2) B
WHERE Data.exist('/Document/ParentKTMNode[text() = sql:column("T")]') = 1
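A rough sketch of the same idea with the parameters loaded into a table variable first (the table and column names here are illustrative, not from the original):
DECLARE @Params TABLE (Val nvarchar(100));
INSERT INTO @Params SELECT @Param1 UNION ALL SELECT @Param2;

SELECT M.*
FROM MyTable M
CROSS JOIN @Params P
WHERE M.Data.exist('/Document/ParentKTMNode[text() = sql:column("P.Val")]') = 1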

Searching a column containing CSV data in a MySQL table for existence of input values

I have a table say, ITEM, in MySQL that stores data as follows:
ID FEATURES
--------------------
1 AB,CD,EF,XY
2 PQ,AC,A3,B3
3 AB,CDE
4 AB1,BC3
--------------------
As an input, I will get a CSV string, something like "AB,PQ". I want to get the records that contain AB or PQ. I realized that we have to write a MySQL function to achieve this. So, if we had this magical function MATCH_ANY defined in MySQL, I would simply execute SQL as follows:
select * from ITEM where MATCH_ANY(FEATURES, "AB,PQ") = 0
The above query would return the records 1, 2 and 3.
But I'm running into all sorts of problems while implementing this function as I realized that MySQL doesn't support arrays and there's no simple way to split strings based on a delimiter.
Remodeling the table is the last option for me as it involves lot of issues.
I might also want to execute queries containing multiple MATCH_ANY functions such as:
select * from ITEM where MATCH_ANY(FEATURES, "AB,PQ") = 0 and MATCH_ANY(FEATURES, "CDE") = 0
In the above case, we would get an intersection of records (1, 2, 3) and (3) which would be just 3.
Any help is deeply appreciated.
Thanks
First of all, the database should of course not contain comma separated values, but you are hopefully aware of this already. If the table was normalised, you could easily get the items using a query like:
select distinct i.Itemid
from Item i
inner join ItemFeature f on f.ItemId = i.ItemId
where f.Feature in ('AB', 'PQ')
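For reference, a sketch of the normalised layout that query assumes (table and column names are illustrative):
create table ItemFeature (
ItemId int not null,
Feature varchar(10) not null,
primary key (ItemId, Feature)
);
-- row 1 (AB,CD,EF,XY) would become four rows: (1,'AB'), (1,'CD'), (1,'EF'), (1,'XY')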
You can match the strings in the comma separated values, but it's not very efficient:
select Id
from Item
where
instr(concat(',', Features, ','), ',AB,') <> 0 or
instr(concat(',', Features, ','), ',PQ,') <> 0
For all you REGEXP lovers out there, I thought I would add this as a solution:
SELECT * FROM ITEM WHERE FEATURES REGEXP '[[:<:]](AB|PQ)[[:>:]]';
and for case sensitivity:
SELECT * FROM ITEM WHERE FEATURES REGEXP BINARY '[[:<:]](AB|PQ)[[:>:]]';
For the second query:
SELECT * FROM ITEM WHERE FEATURES REGEXP '[[:<:]](AB|PQ)[[:>:]]' AND FEATURES REGEXP '[[:<:]]CDE[[:>:]]';
Cheers!
select *
from ITEM
where CONCAT(',',FEATURES,',') LIKE '%,AB,%'
or CONCAT(',',FEATURES,',') LIKE '%,PQ,%'
or create a custom function to do your MATCH_ANY
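If you do go the custom-function route, here is a rough, illustrative sketch of what MATCH_ANY could look like: it walks the search list with SUBSTRING_INDEX and tests each value with FIND_IN_SET, keeping your convention that 0 means a match was found:
DELIMITER //
CREATE FUNCTION MATCH_ANY(features TEXT, wanted TEXT) RETURNS INT
DETERMINISTIC
BEGIN
  DECLARE remaining TEXT DEFAULT wanted;
  DECLARE token TEXT;
  WHILE LENGTH(remaining) > 0 DO
    -- take the next comma-separated value from the search list
    SET token = SUBSTRING_INDEX(remaining, ',', 1);
    IF FIND_IN_SET(token, features) > 0 THEN
      RETURN 0; -- found
    END IF;
    -- drop the value we just checked
    IF LOCATE(',', remaining) = 0 THEN
      SET remaining = '';
    ELSE
      SET remaining = SUBSTRING(remaining, LOCATE(',', remaining) + 1);
    END IF;
  END WHILE;
  RETURN 1; -- nothing matched
END //
DELIMITER ;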
Alternatively, consider using RLIKE:
select *
from ITEM
where CONCAT(',', FEATURES, ',') RLIKE ',AB,|,PQ,';
Just a thought:
Does it have to be done in SQL? This is the kind of thing you might normally expect to write in PHP or Python or whatever language you're using to interface with the database.
This approach means you can build your query string using whatever complex logic you need and then just submit a vanilla SQL query, rather than trying to build a procedure in SQL.
Ben

Scalar Max in Sql Server

How do you implement a scalar MAX in SQL Server (like Math.Max)? (In essence I want something like MAX(expression, 0), to replace negative values with 0.)
I've seen solutions in other threads involving:
creating a scalar function (how's that for performance?)
CASE WHEN expression > 0 THEN expression ELSE 0 END (then 'expression' is evaluated twice?)
complicated uses of MAX(aggregate).
What's the best?
Why does Sql Server not have something like this built in? Any complications I don't see?
In all other major systems this function is called GREATEST.
SQL Server seriously lacks it.
You can make a bunch of case statements, or use something like this:
SELECT (
SELECT MAX(expression)
FROM (
SELECT expression
UNION ALL
SELECT 0
) q
) AS greatest
FROM table_that_has_a_field_named_expression
The latter one is a trifle less performant than CASE statements.
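For comparison, the CASE form mentioned above would be (a minimal sketch, reusing the illustrative table name):
SELECT CASE WHEN expression > 0 THEN expression ELSE 0 END AS greatest
FROM table_that_has_a_field_named_expression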
So you want to create a user-defined scalar function named MAX in SQL Server that takes two parameters, an expression and a minimum value, and returns the expression if it exceeds the minimum value, otherwise the minimum value?
I wouldn't worry about the performance of scalar functions; let the server handle that.
I'd also use IF instead of CASE to test for > min value.
SQL Server doesn't have a built-in function for this because they didn't think it would be widely enough used, I guess.
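A minimal sketch of such a function (the name ScalarMax, the INT types and the usage example are illustrative only):
CREATE FUNCTION dbo.ScalarMax (@expression INT, @minValue INT)
RETURNS INT
AS
BEGIN
    IF @expression > @minValue
        RETURN @expression;
    RETURN @minValue;
END;
GO

-- usage: clamp negative values to 0 (SomeTable and SomeColumn are placeholders)
SELECT dbo.ScalarMax(SomeColumn, 0) AS clamped FROM SomeTable;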
The query optimizer should prevent the expression from being calculated multiple times.
For readability / maintainability, consider using a CTE to calculate the expression before the CASE statement.