Optimize passing a parameter to a view - SQL

I have a quite complicated view in MySQL, like:
select filter.id as filter_id, person.id, person.name
from person, filter
inner join
...
left join
...
where person_match_filter_condition ...
group by filter.id, person.id, person.name
The query filters persons that match domain-specific conditions.
Typical use of the view is:
select * from my_view where filter_id = some_value
The problem is that MySQL cannot optimize this query: it applies the condition on filter_id AFTER fetching the data for all filters, which is very inefficient.
Getting filter_id from other tables is not a good fit for my case.
How can I transform my query to make it more efficient?

Wrap the long query in a stored procedure and pass the filters to it as parameters. Then, instead of selecting from the view, you call the procedure; the procedure builds the entire query and runs it in optimized form.
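A minimal sketch of that idea (MySQL syntax; the procedure name and parameter are hypothetical, and the elided joins stay elided):
DELIMITER //
CREATE PROCEDURE get_persons_for_filter(IN p_filter_id INT)
BEGIN
  SELECT f.id AS filter_id, p.id, p.name
  FROM person p
  CROSS JOIN filter f
  -- ... the inner/left joins and person_match_filter_condition go here ...
  WHERE f.id = p_filter_id
  GROUP BY f.id, p.id, p.name;
END//
DELIMITER ;
CALL get_persons_for_filter(42);
Because p_filter_id is known when the statement inside the procedure runs, the optimizer can use it to drive the join instead of filtering afterwards.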

Better yet, you can pass parameters to your views in a simple manner by creating a function to GET your values from session variables. See https://www.stackoverflow.com/questions/14511760 for the technique. This is a copy of my CREATE FUNCTION, which you may wish to pattern after.
DELIMITER //
CREATE FUNCTION fn_getcase_id()
RETURNS MEDIUMINT(11)
DETERMINISTIC NO SQL
BEGIN
-- see stackoverflow.com/questions/14511760 and read ALL the info TWICE or MORE. wh 04/13/2017
RETURN @sv_case_id;
END//
DELIMITER ;
You will need to create a similar function (one for each variable).
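For instance, a sketch of how the pieces fit together (the view and table names here are hypothetical):
SET @sv_case_id = 42;
CREATE VIEW v_current_case AS
SELECT * FROM cases WHERE case_id = fn_getcase_id();
SELECT * FROM v_current_case; -- returns only rows for case 42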

Related

How to re-use a SQL query in a PL/SQL procedure?

I am writing a PL/SQL procedure. In the body of this procedure, how can I use the same query twice without rewriting it?
To simplify, let's say that I have this SQL query:
SELECT *
FROM mytable
WHERE age > 18
Is there a way to "store" it, so I could do, for example:
SELECT COUNT(*) INTO var1
FROM myQuery
I know about the WITH ... AS clause, but as far as I know it can only be used within the current statement, and I want to be able to call the query from different statements.
Thanks!
There are various possibilities. Here are the ones I think of immediately (there are probably others); a sketch of the first option follows the list:
Declare an explicit CURSOR using your query, and use that cursor multiple times in the body of your procedure.
Store the query in a string variable, and use EXECUTE IMMEDIATE to run it multiple times
Execute the query once, storing the results in a local collection (nested table, most likely), and process those stored results multiple times
Create a function that executes the query and returns its results as a nested-table type. Then SELECT FROM TABLE( my_function ) multiple times
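For illustration, a minimal sketch of the first option, reusing the query from the question (the row processing is a placeholder):
CREATE OR REPLACE PROCEDURE process_adults IS
  CURSOR adults_cur IS
    SELECT * FROM mytable WHERE age > 18;
  var1 PLS_INTEGER := 0;
BEGIN
  -- First use of the query: count the rows.
  FOR r IN adults_cur LOOP
    var1 := var1 + 1;
  END LOOP;
  -- Second use of the same query, without rewriting its text.
  FOR r IN adults_cur LOOP
    NULL; -- process each row here
  END LOOP;
END;
/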

How to pass a set of rows from one function into another?

Overview
I'm using PostgreSQL 9.1.14, and I'm trying to pass the results of a function into another function. The general idea (specifics, with a minimal example, follow) is that we can write:
select * from (select * from foo ...)
and we can abstract the sub-select away in a function and select from it:
create function foos()
returns setof foo
language sql as $$
select * from foo ...
$$;
select * from foos()
Is there some way to abstract one level farther, so as to be able to do something like this (I know functions cannot actually have arguments with setof types):
create function more_foos( some_foos setof foo )
language sql as $$
select * from some_foos ... -- or unnest(some_foos), or ???
$$;
select * from more_foos(foos())
Minimal Example and Attempted Workarounds
I'm using PostgreSQL 9.1.14. Here's a minimal example:
-- 1. create a table x with three rows
drop table if exists x cascade;
create table if not exists x (id int, name text);
insert into x values (1,'a'), (2,'b'), (3,'c');
-- 2. xs() is a function with type `setof x`
create or replace function xs()
returns setof x
language sql as $$
select * from x
$$;
-- 3. xxs() should return the contents of x, too
-- Ideally the argument would be a `setof x`,
-- but that's not allowed (see below).
create or replace function xxs(x[])
returns setof x
language sql as $$
select x.* from x
join unnest($1) y
on x.id = y.id
$$;
When I load up this code, I get the expected output for the table definitions, and I can call and select from xs() as I'd expect. But when I try to pass the result of xs() to xxs(), I get an error that "function xxs(x) does not exist":
db=> \i test.sql
DROP TABLE
CREATE TABLE
INSERT 0 3
CREATE FUNCTION
CREATE FUNCTION
db=> select * from xs();
1 | a
2 | b
3 | c
db=> select * from xxs(xs());
ERROR: function xxs(x) does not exist
LINE 1: select * from xxs(xs());
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
I'm a bit confused about "function xxs(x) does not exist"; since the return type of xs() was setof x, I'd expected that its return type would be setof x (or maybe x[]), not x. Following the complaints about the type, I can get to either of the following definitions, but while with either one I can run select xxs(xs());, I can't run select * from xxs(xs());.
create or replace function xxs( x )
returns setof x
language sql as $$
select x.* from x
join unnest(array[$1]) y -- unnest(array[...]) seems pretty bad
on x.id = y.id
$$;
create or replace function xxs( x )
returns setof x
language sql as $$
select * from x
where x.id in ($1.id)
$$;
db=> select xxs(xs());
(1,a)
(2,b)
(3,c)
db=> select * from xxs(xs());
ERROR: set-valued function called in context that cannot accept a set
Summary
What's the right way to pass the results of a set-returning function into another function?
(I have noted that create function … xxs( setof x ) … results in the error: ERROR: functions cannot accept set arguments, so the answer won't literally be passing a set of rows from one function to another.)
Table functions
I perform very high speed, complex database migrations for a living, using SQL as both the client and server language (no other language is used), all running server side, where the code rarely surfaces from the database engine. Table functions play a HUGE role in my work. I don't use "cursors" since they are too slow to meet my performance requirements, and everything I do is result set oriented. Table functions have been an immense help to me in completely eliminating use of cursors, achieving very high speed, and have contributed dramatically towards reducing code volume and improving simplicity.
In short, you use a query that references two (or more) table functions to pass the data from one table function to the next. The select query result set that calls the table functions serves as the conduit to pass the data from one table function to the next. On the DB2 platform/version I work on (and, based on a quick look at the 9.1 Postgres manual, the same appears to be true there), you can only pass a single row of column values as input to any of the table function calls, as you've discovered. However, because the table function call happens in the middle of a query's result set processing, you achieve the same effect of passing a whole result set to each table function call, albeit, in the database engine plumbing, the data is passed only one row at a time to each table function.
Table functions accept one row of input columns, and return a single result set back into the calling query (i.e. select) that called the function. The result set columns passed back from a table function become part of the calling query's result set, and are therefore available as input to the next table function, referenced later in the same query, typically as a subsequent join. The first table function's result columns are fed as input (one row at a time) to the second table function, which returns its result set columns into the calling query's result set. Both the first and second table function result set columns are now part of the calling query's result set, and are now available as input (one row at a time) to a third table function. Each table function call widens the calling query's result set via the columns it returns. This can go on an on until you start hitting limits on the width of a result set, which likely varies from one database engine to the next.
Consider this example (which may not match Postgres' syntax requirements or capabilities as I work on DB2). This is one of many design patterns in which I use table functions, is one of the simpler ones, that I think is very illustrative, and one that I anticipate would have broad appeal if table functions were in heavy mainstream use (to my knowledge they are not, but I think they deserve more attention than they are getting).
In this example, the table functions in use are: VALIDATE_TODAYS_ORDER_BATCH, POST_TODAYS_ORDER_BATCH, and DATA_WAREHOUSE_TODAYS_ORDER_BATCH. On the DB2 version I work on, you wrap the table function inside "TABLE( place table function call and parameters here )", but based on quick look at a Postgres manual it appears you omit the "TABLE( )" wrapper.
create table TODAYS_ORDER_PROCESSING_EXCEPTIONS as (
select TODAYS_ORDER_BATCH.*
,VALIDATION_RESULT.ROW_VALID
,POST_RESULT.ROW_POSTED
,WAREHOUSE_RESULT.ROW_WAREHOUSED
from TODAYS_ORDER_BATCH
cross join VALIDATE_TODAYS_ORDER_BATCH ( ORDER_NUMBER, [either pass the remainder of the order columns or fetch them in the function] )
as VALIDATION_RESULT ( ROW_VALID ) --example: 1/0 true/false Boolean returned
left join POST_TODAYS_ORDER_BATCH ( ORDER_NUMBER, [either pass the remainder of the order columns or fetch them in the function] )
as POST_RESULT ( ROW_POSTED ) --example: 1/0 true/false Boolean returned
on ROW_VALID = '1'
left join DATA_WAREHOUSE_TODAYS_ORDER_BATCH ( ORDER_NUMBER, [either pass the remainder of the order columns or fetch them in the function] )
as WAREHOUSE_RESULT ( ROW_WAREHOUSED ) --example: 1/0 true/false Boolean returned
on ROW_POSTED = '1'
where coalesce( ROW_VALID, '0' ) = '0' --Capture only exceptions and unprocessed work.
or coalesce( ROW_POSTED, '0' ) = '0' --Or, you can flip the logic to capture only successful rows.
or coalesce( ROW_WAREHOUSED, '0' ) = '0'
) with data
If table TODAYS_ORDER_BATCH contains 1,000,000 rows, then VALIDATE_TODAYS_ORDER_BATCH will be called 1,000,000 times, once for each row.
If 900,000 rows pass validation inside VALIDATE_TODAYS_ORDER_BATCH, then POST_TODAYS_ORDER_BATCH will be called 900,000 times.
If only 850,000 rows successfully post, then VALIDATE_TODAYS_ORDER_BATCH needs some loopholes closed LOL, and DATA_WAREHOUSE_TODAYS_ORDER_BATCH will be called 850,000 times.
If 850,000 rows successfully made it into the Data Warehouse (i.e. no additional exceptions were generated), then table TODAYS_ORDER_PROCESSING_EXCEPTIONS will be populated with 1,000,000 - 850,000 = 150,000 exception rows.
The table function calls in this example are only returning a single column, but they could be returning many columns. For example, the table function validating an order row could return the reason why an order failed validation.
In this design, virtually all the chatter between a HLL and the database is eliminated, since the HLL requestor is asking the database to process the whole batch in ONE request. This results in a reduction of millions of SQL requests to the database, in a HUGE removal of millions of HLL procedure or method calls, and as a result provides a HUGE runtime improvement. In contrast, legacy code which often processes a single row at a time, would typically send 1,000,000 fetch SQL requests, 1 for each row in TODAYS_ORDER_BATCH, plus at least 1,000,000 HLL and/or SQL requests for validation purposes, plus at least 1,000,000 HLL and/or SQL requests for posting purposes, plus 1,000,000 HLL and/or SQL requests for sending the order to the data warehouse. Granted, using this table function design, inside the table functions SQL requests are being sent to the database, but when the database makes requests to itself (i.e from inside a table function), the SQL requests are serviced much faster (especially in comparison to a legacy scenario where the HLL requestor is doing single row processing from a remote system, with the worst case over a WAN - OMG please don't do that).
You can easily run into performance problems if you use a table function to "fetch a result set" and then join that result set to other tables. In that case, the SQL optimizer can't predict what set of rows will be returned from the table function, and therefore it can't optimize the join to subsequent tables. For that reason, I rarely use them for fetching a result set, unless I know that result set will be a very small number of rows, hence not causing a performance problem, or I don't need to join to subsequent tables.
In my opinion, one reason why table functions are underutilized is that they are often perceived as only a tool to fetch a result set, which often performs poorly, so they get written off as a "poor" tool to use.
Table functions are immensely useful for pushing more functionality over to the server, for eliminating most of the chatter between the database server and programs on remote systems, and even for eliminating chatter between the database server and external programs on the same server. Even chatter between programs on the same server carries more overhead than many people realize, and much of it is unnecessary. The heart of the power of table functions lies in using them to perform actions inside result set processing.
There are more advanced design patterns for using table functions that build on the above pattern, where you can maximize result set processing even further, but this post is a lot for most to absorb already.

SQL: How to rewrite left join with unknown parameter in view

This is meant as kind of a simplified / more to the point question of:
NHibernate / QueryOver: How to left join with parameter
The core problem is that I have the following query:
1)
Select v.*, ...
from someView v
LEFT JOIN someTable t on v.ForeignKey = t.ForeignKey
AND t.SomeOtherValue = @myParam
where @myParam is some parameter.
I want to use this query inside a view, but since I don't know @myParam when creating the view, I don't know any way to attach it to the query so that it is used directly inside the join. All I can do is get a version of the query like this:
2)
Select v.*, ...
from (someView v
LEFT JOIN someTable t ON v.ForeignKey = t.ForeignKey)
WHERE SomeOtherValue = 123
which inside a view would look like this:
3)
CREATE VIEW myView AS
Select v.*, t.SomeOtherValue, ...
from someView v
LEFT JOIN someTable t on v.ForeignKey = t.ForeignKey
and then say:
SELECT *
from myView
where SomeOtherValue = @myParam
In both cases (2 and 3), @myParam gets applied only after the left join has already happened, so the result set is different (and in my case incorrect).
So I am searching for a way to rewrite 1) so that I can use it inside a view (with syntax similar to 2 and 3).
NOTE:
Using a table-valued function with @myParam as a parameter would work, but then again I can't use it as a model for NHibernate or with QueryOver, so that is not really an option.
On DB2, I accomplished what you're trying to do once by:
Making a scalar UDF function that was referenced in place of your @myParam.
The scalar UDF function retrieved the parameter value from a SESSION (temporary table).
Prior to referencing the view at runtime, my code created or recreated and populated the SESSION temporary table with a single row with the parameter value (giving the scalar UDF something to feed off of).
At runtime, the SQL view reference would call the scalar UDF function, which would access the temp table, and return the parameter value to the view, and voila, it worked.
On DB2, a table UDF function could perform in a similar manner. In general, table functions are more flexible than scalar functions, and are often a better choice.
DB2 supports late binding when using functions in that manner.
I don't know what database you're using, but you might have good luck trying something similar.
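To make that concrete, here is a rough DB2-flavored sketch of those steps (all names are hypothetical, and exact syntax varies by DB2 version):
-- 1) A session temporary table to hold the parameter value:
DECLARE GLOBAL TEMPORARY TABLE SESSION.VIEW_PARMS
  (MY_PARAM INT) ON COMMIT PRESERVE ROWS NOT LOGGED;
-- 2) A scalar UDF that feeds off the session table:
CREATE FUNCTION GET_MY_PARAM()
  RETURNS INT
  LANGUAGE SQL READS SQL DATA
  RETURN SELECT MY_PARAM FROM SESSION.VIEW_PARMS FETCH FIRST 1 ROW ONLY;
-- 3) The view references the UDF where @myParam would have gone:
CREATE VIEW myView AS
  SELECT v.*, t.SomeOtherValue
  FROM someView v
  LEFT JOIN someTable t
    ON v.ForeignKey = t.ForeignKey
   AND t.SomeOtherValue = GET_MY_PARAM();
-- 4) At runtime: populate the parameter, then query the view.
INSERT INTO SESSION.VIEW_PARMS VALUES (123);
SELECT * FROM myView;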

Execute Subqueries of a View

I have a rather large view constructed using the WITH clause. I've built the view's logic out of small, understandable subqueries that build on top of each other.
The result is clearly structured SQL which (I believe) is easy to follow even if you're not the creator.
My problem comes with debugging in the future. If at some stage a colleague wants to understand how the result of the view is computed, a good approach is to execute some of the subqueries.
The usual approach would be to copy the view's SQL into an SQL editor (e.g. SQL Developer) and replace the main statement with the subquery you're interested in.
Example:
WITH
all_orders AS (
SELECT order, price ... FROM ...
),
all_customers AS (
SELECT id,
last_name,
first_name,
first_order_date ...
FROM...
),
new_customers AS (
SELECT id,
last_name,
first_name
FROM all_customers
WHERE first_order_date > ...
)
-- main SQL
SELECT ... FROM all_orders a
INNER JOIN new_customers ON (...)
If I have the feeling that something is wrong with "new_customers" I would comment out the main SQL and replace it with:
...
-- main SQL
-- SELECT ... FROM all_orders a
-- INNER JOIN new_customers ON (...)
SELECT * FROM new_customers;
If I see that new_customers contains wrong data and I want to check whether at least its source "all_customers" is correct, I replace my main SQL again with
...
-- main SQL
-- SELECT ... FROM all_orders a
-- INNER JOIN new_customers ON (...)
SELECT * FROM all_customers;
That works really well, but as soon as the SQL is inside a view I only have access to the result of the main SQL as the normal output of the view.
However, for simple debugging (meaning without going to the SQL editor, looking up the view's definition and copying the SQL into the SQL editor) it would be really helpful to have some kind of database function where I could say:
SELECT * FROM RUN_SUBQUERY('my_view_name', 'new_customers');
My question: Is there such a database function, or some similar approach, which would allow me to quickly execute subqueries of a database view without splitting the logic up into sub-views?
See below for my experience with sub-views.
Alternatives
Split the big view into separate smaller views:
I tried this. The execution speed of the SQL dropped by a factor of 10. Since this is way too slow, I'm also looking at possible optimizations here; however, I've seen it running really well when it was all in one statement/view, so it's hard to justify the extra effort. Again, I only need the subquery results for debugging.
Keep the big view and additionally split it into smaller views which are just used for debugging:
This might be a way to go, but we all know that having logic defined in two places is never a good idea (DRY).
You could use a function that returns a table of records.
In this function you can construct your SQL differently according to the input parameters and return the result with a cursor.
So for example:
select * from table(cast(run_subquery('my_view_name','new_customers') as a_table_of_records_type));
and this calls the function run_subquery:
function RUN_SUBQUERY (VNAME IN VARCHAR2, QNAME IN VARCHAR2)
return a_table_of_records_type is
query_string varchar2(4000);
result_rows a_table_of_records_type;
begin
if VNAME = 'my_view_name' and QNAME = 'new_customers' then
query_string := '....'; -- the SQL text of that subquery
elsif VNAME = 'my_view_name' and QNAME = 'all_customers' then
query_string := '....';
end if;
-- Execute the string and collect the output into a_table_of_records_type
execute immediate query_string bulk collect into result_rows;
return result_rows;
end;

How to reuse a large query without repeating it?

If I have two queries, which I will call horrible_query_1 and ugly_query_2, and I want to perform the following two minus operations on them:
(horrible_query_1) minus (ugly_query_2)
(ugly_query_2) minus (horrible_query_1)
Or maybe I have a terribly_large_and_useful_query, and the result set it produces I want to use as part of several future queries.
How can I avoid copying and pasting the same queries in multiple places? How can I "not repeat myself" and follow DRY principles? Is this possible in SQL?
I'm using Oracle SQL. Portable SQL solutions are preferable, but if I have to use an Oracle specific feature (including PL/SQL) that's OK.
create view horrible_query_1_VIEW as
select .. ...
from .. .. ..
create view ugly_query_2_VIEW as
select .. ...
from .. .. ..
Then
(horrible_query_1_VIEW) minus (ugly_query_2_VIEW)
(ugly_query_2_VIEW) minus (horrible_query_1_VIEW)
Or, maybe, with a with clause:
with horrible_query_1 as (
select .. .. ..
from .. .. ..
) ,
ugly_query_2 as (
select .. .. ..
from .. .. ..
)
(select * from horrible_query_1 minus select * from ugly_query_2 ) union all
(select * from ugly_query_2 minus select * from horrible_query_1)
If you want to reuse the SQL text of the queries, then defining views is the best way, as described earlier.
If you want to reuse the result of the queries, then you should consider global temporary tables. These temporary tables store data for the duration of the session or transaction (whichever you choose). They are really useful when you need to reuse calculated data many times over, especially if your queries are indeed "ugly" and "horrible" (meaning long-running). See Temporary tables for more information.
If you need to keep the data longer than a session, you can consider materialized views.
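For instance, a minimal Oracle sketch of the temporary-table approach (table, column, and query names are hypothetical):
-- Create once; rows are private to the session and vanish at its end.
CREATE GLOBAL TEMPORARY TABLE horrible_result
  ON COMMIT PRESERVE ROWS
  AS SELECT id, name FROM some_big_table WHERE 1 = 0; -- copies structure only
-- Run the expensive query once per session:
INSERT INTO horrible_result
  SELECT id, name FROM some_big_table WHERE status = 'ACTIVE';
-- Reuse the stored result as often as needed:
SELECT COUNT(*) FROM horrible_result;
SELECT * FROM horrible_result WHERE name LIKE 'A%';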
Since you're using Oracle, I'd create Pipelined TABLE functions.
The function takes parameters and returns an object (which you have to create)
and then you SELECT * (or even specific columns) from it using the TABLE() function, and you can use it with a WHERE clause or with JOINs. If you want a unit of reuse (a function), you're not restricted to just returning values (i.e. a scalar function); you can write a function that returns rows or recordsets.
something like this:
FUNCTION RETURN_MY_ROWS(Param1 IN type...ParamX IN Type)
RETURN PARENT_OBJECT PIPELINED
IS
local_curs SYS_REFCURSOR; -- you need a REF CURSOR alias if this function is in a package
out_rec ROW_RECORD_OF_CUSTOM_OBJECT := ROW_RECORD_OF_CUSTOM_OBJECT(NULL, NULL, NULL); -- one NULL for each field in the record sub-object
BEGIN
OPEN local_curs FOR
--the SELECT query that you're trying to encapsulate goes here
-- and it can be very detailed/complex and even have WITH () etc..
SELECT * FROM baseTable WHERE col1 = x;
-- now that you have captured the SELECT into a Cursor
-- here you put a LOOP to take what's in the cursor and put it in the
-- child object (that holds the individual records)
LOOP
FETCH local_curs -- fetch the next row from the ref cursor
INTO out_rec.COL1,
out_rec.COL2,
out_rec.COL3;
EXIT WHEN local_curs%NOTFOUND;
PIPE ROW(out_rec); --piping out the Object
END LOOP;
CLOSE local_curs; -- always do this
RETURN; -- we're now done
END RETURN_MY_ROWS;
after you've done that, you can use it like so
SELECT * FROM TABLE(RETURN_MY_ROWS(val1, val2));
You can INSERT ... SELECT from it or even CREATE TABLE out of it, and you can use it in joins.
two more things to mention:
--ROW_RECORD_OF_CUSTOM_OBJECT is something along these lines
CREATE or REPLACE TYPE ROW_RECORD_OF_CUSTOM_OBJECT AS OBJECT
(
col1 type,
col2 type,
...
colx type
);
and PARENT_OBJECT is a table of the other object (with the field definitions) we just made
create or replace TYPE PARENT_OBJECT IS TABLE OF ROW_RECORD_OF_CUSTOM_OBJECT;
so this function needs two OBJECTs to support it, but one is a record, the other is a table of that record (you have to create the record first).
In a nutshell: the function is easy to write; you need a child object (with fields) and a parent object that houses the child object and is of type TABLE OF the child object. You open the original base-table fetching SQL into a SYS_REFCURSOR (which you may need to alias if you're in a package), and you read from that cursor in a loop into the individual records.
The function's return type is PARENT_OBJECT, but inside it packs the record sub-objects with values from the cursor.
I hope this works for you (there may be permissioning issues with your DBA if you want to create OBJECTs and table functions).
If you operate with values, you could write functions.
Here you can find information on how to do it. It basically works like writing a function in any language: you define parameters and return values, which gives you the nice possibility of writing the code just once. Here is how you do it:
http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_5009.htm
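For example, a minimal Oracle function of this kind (the function name, rule, and table are hypothetical):
CREATE OR REPLACE FUNCTION is_adult(p_age IN NUMBER)
RETURN NUMBER
IS
BEGIN
  -- Encapsulate the rule once; reuse it in any query.
  RETURN CASE WHEN p_age > 18 THEN 1 ELSE 0 END;
END;
/
SELECT * FROM persons WHERE is_adult(age) = 1;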
Have you tried using the RESULT_CACHE hint in your queries? Also, you could
ALTER SESSION SET RESULT_CACHE_MODE=FORCE
and see if it helps.
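For reference, the hint itself looks like this (a sketch; the table is hypothetical). Subsequent identical queries are then served from the server result cache instead of being re-executed:
SELECT /*+ RESULT_CACHE */ id, name
FROM some_big_table
WHERE status = 'ACTIVE';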