How to reuse a large query without repeating it? - sql

If I have two queries, which I will call horrible_query_1 and ugly_query_2, and I want to perform the following two minus operations on them:
(horrible_query_1) minus (ugly_query_2)
(ugly_query_2) minus (horrible_query_1)
Or maybe I have a terribly_large_and_useful_query, and the result set it produces I want to use as part of several future queries.
How can I avoid copying and pasting the same queries in multiple places? How can I "not repeat myself" and follow the DRY principle? Is this possible in SQL?
I'm using Oracle SQL. Portable SQL solutions are preferable, but if I have to use an Oracle specific feature (including PL/SQL) that's OK.

create view horrible_query_1_VIEW as
select .. ...
from .. .. ..;

create view ugly_query_2_VIEW as
select .. ...
from .. .. ..;
Then
(horrible_query_1_VIEW) minus (ugly_query_2_VIEW)
(ugly_query_2_VIEW) minus (horrible_query_1_VIEW)
Or, maybe, with a with clause:
with horrible_query_1 as (
select .. .. ..
from .. .. ..
) ,
ugly_query_2 as (
select .. .. ..
from .. .. ..
)
(select * from horrible_query_1 minus select * from ugly_query_2 ) union all
(select * from ugly_query_2 minus select * from horrible_query_1)
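For illustration, here is a minimal, self-contained version of that symmetric difference, with two hypothetical one-column queries (dual stands in for the real sources):
with horrible_query_1 as (
  select 1 as id from dual union all
  select 2 from dual
),
ugly_query_2 as (
  select 2 as id from dual union all
  select 3 from dual
)
(select * from horrible_query_1 minus select * from ugly_query_2)
union all
(select * from ugly_query_2 minus select * from horrible_query_1);
-- returns ids 1 and 3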

If you want to reuse the SQL text of the queries, then defining views is the best way, as described earlier.
If you want to reuse the result of the queries, then you should consider global temporary tables. These temporary tables store data for the duration of a session or a transaction (whichever you choose). They are really useful when you need to reuse calculated data many times over, especially if your queries are indeed "ugly" and "horrible" (meaning long running). See Temporary tables for more information.
If you need to keep the data longer than a session, you can consider materialized views.
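For example, a minimal sketch of the temporary-table approach (the table and column names here are hypothetical):
create global temporary table tmp_big_result (
  id      number,
  payload varchar2(100)
) on commit preserve rows; -- rows survive until the session ends

insert into tmp_big_result
select id, payload
from   some_base_table; -- your terribly_large_and_useful_query goes here

-- now both MINUS operations (and any later queries) can read
-- tmp_big_result without re-running the expensive query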

Since you're using Oracle, I'd create pipelined table functions.
The function takes parameters and returns a collection of objects (which you have to create), and then you SELECT *, or even specific columns, from it using the TABLE() function; you can use it with a WHERE clause or in JOINs. If you want a unit of reuse (a function), you're not restricted to returning values (i.e. a scalar function): you can write a function that returns rows or recordsets.
Something like this:
FUNCTION RETURN_MY_ROWS(Param1 IN type ... ParamX IN Type)
RETURN PARENT_OBJECT PIPELINED
IS
local_curs SYS_REFCURSOR; -- you need a cursor alias for this if the function is in a package
out_rec ROW_RECORD_OF_CUSTOM_OBJECT := ROW_RECORD_OF_CUSTOM_OBJECT(NULL, NULL, NULL); -- one NULL for each field in the record sub-object
BEGIN
OPEN local_curs FOR
--the SELECT query that you're trying to encapsulate goes here,
-- and it can be very detailed/complex and even have WITH () etc.
SELECT * FROM baseTable WHERE col1 = x;
-- now that you have captured the SELECT into a cursor,
-- loop over the cursor and put each row into the
-- child object (which holds the individual records)
LOOP
FETCH local_curs --fetching from the ref cursor
INTO out_rec.COL1,
out_rec.COL2,
out_rec.COL3;
EXIT WHEN local_curs%NOTFOUND;
PIPE ROW(out_rec); --piping out the object
END LOOP;
CLOSE local_curs; -- always do this
RETURN; -- we're now done
END RETURN_MY_ROWS;
After you've done that, you can use it like so:
SELECT * FROM TABLE(RETURN_MY_ROWS(val1, val2));
You can INSERT ... SELECT from it, or even CREATE TABLE ... AS SELECT out of it, and you can use it in joins.
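For example (the target table names here are hypothetical):
INSERT INTO target_table
SELECT * FROM TABLE(RETURN_MY_ROWS(val1, val2));

CREATE TABLE snapshot_table AS
SELECT * FROM TABLE(RETURN_MY_ROWS(val1, val2));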
Two more things to mention:
--ROW_RECORD_OF_CUSTOM_OBJECT is something along these lines
CREATE OR REPLACE TYPE ROW_RECORD_OF_CUSTOM_OBJECT AS OBJECT
(
col1 type,
col2 type,
...
colx type
);
and PARENT_OBJECT is a table of the other object (with the field definitions) we just made:
create or replace TYPE PARENT_OBJECT IS TABLE OF ROW_RECORD_OF_CUSTOM_OBJECT;
so this function needs two OBJECTs to support it, but one is a record, the other is a table of that record (you have to create the record first).
In a nutshell: the function is easy to write. You need a child object (with fields) and a parent object of type TABLE OF that child object (you have to create the record type first). Then you open the original base-table SQL into a SYS_REFCURSOR (which you may need to alias if you're in a package) and read from that cursor in a loop, into the individual records.
The function returns a value of type PARENT_OBJECT, but inside it packs the record sub-objects with values from the cursor.
I hope this works for you. (There may be permissioning issues with your DBA if you want to create OBJECTs and table functions.)

If you operate on plain values, you can write functions.
The link below explains how to do it. It basically works like writing a function in any language: you can define parameters and return values, which gives you the welcome possibility of writing code just once. Here is how you do it:
http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_5009.htm

Have you tried using the RESULT_CACHE hint in your queries? Also, you could
ALTER SESSION SET RESULT_CACHE_MODE=FORCE
and see if it helps.
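For example, a sketch of the hint on a hypothetical table:
SELECT /*+ RESULT_CACHE */ col1, COUNT(*)
FROM   some_big_table
GROUP  BY col1;
-- subsequent identical queries can be served from the server result cache
-- until the underlying data changes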


How can I create a calculated column when creating a table in PostgreSQL, like SQL Server's LineTotal AS Price * Quantity? [duplicate]

Does PostgreSQL support computed/calculated columns, like MS SQL Server? I can't find anything in the docs, but as this feature is included in many other DBMSs, I thought I might be missing something.
E.g.: http://msdn.microsoft.com/en-us/library/ms191250.aspx
Postgres 12 or newer
STORED generated columns were introduced with Postgres 12 - as defined in the SQL standard and implemented by some RDBMS including DB2, MySQL, and Oracle. They are similar to the "computed columns" of SQL Server.
Trivial example:
CREATE TABLE tbl (
int1 int
, int2 int
, product bigint GENERATED ALWAYS AS (int1 * int2) STORED
);
fiddle
VIRTUAL generated columns may come with one of the next iterations. (Not in Postgres 15, yet).
Related:
Attribute notation for function call gives error
Postgres 11 or older
Up to Postgres 11 "generated columns" are not supported.
You can emulate VIRTUAL generated columns with a function using attribute notation (tbl.col) that looks and works much like a virtual generated column. That's a bit of a syntax oddity which exists in Postgres for historic reasons and happens to fit the case. This related answer has code examples:
Store common query as column?
The expression (looking like a column) is not included in a SELECT * FROM tbl, though. You always have to list it explicitly.
Can also be supported with a matching expression index - provided the function is IMMUTABLE. Like:
CREATE FUNCTION col(tbl) ... AS ... -- your computed expression here
CREATE INDEX ON tbl(col(tbl));
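Fleshed out, a minimal sketch of that emulation, assuming a hypothetical table tbl (a int, b int):
CREATE TABLE tbl (a int, b int);

CREATE FUNCTION col(tbl) RETURNS int
LANGUAGE sql IMMUTABLE AS
'SELECT ($1).a * ($1).b';

SELECT t.col FROM tbl t;        -- attribute notation: reads like a column
CREATE INDEX ON tbl (col(tbl)); -- matching expression index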
Alternatives
Alternatively, you can implement similar functionality with a VIEW, optionally coupled with expression indexes. Then SELECT * can include the generated column.
"Persisted" (STORED) computed columns can be implemented with triggers in a functionally equivalent way.
Materialized views are a related concept, implemented since Postgres 9.3.
In earlier versions one can manage MVs manually.
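A minimal sketch of a materialized view (names are hypothetical):
CREATE MATERIALIZED VIEW mv_demo AS
SELECT some_id, count(*) AS ct
FROM   some_table
GROUP  BY some_id;

REFRESH MATERIALIZED VIEW mv_demo; -- re-runs the stored query on demand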
YES you can! The solution should be easy, safe, and performant...
I'm new to PostgreSQL, but it seems you can create computed columns by using an expression index, paired with a view (the view is optional, but makes life a bit easier).
Suppose my computation is md5(some_string_field), then I create the index as:
CREATE INDEX some_string_field_md5_index ON some_table(MD5(some_string_field));
Now, any queries that act on MD5(some_string_field) will use the index rather than computing it from scratch. For example:
SELECT MAX(some_field) FROM some_table GROUP BY MD5(some_string_field);
You can check this with explain.
However at this point you are relying on users of the table knowing exactly how to construct the column. To make life easier, you can create a VIEW onto an augmented version of the original table, adding in the computed value as a new column:
CREATE VIEW some_table_augmented AS
SELECT *, MD5(some_string_field) as some_string_field_md5 from some_table;
Now any queries using some_table_augmented will be able to use some_string_field_md5 without worrying about how it works; they just get good performance. The view doesn't copy any data from the original table, so it is good memory-wise as well as performance-wise. Note, however, that you can't update/insert into a view, only into the source table, but if you really want, I believe you can redirect inserts and updates to the source table using rules (I could be wrong on that last point as I've never tried it myself).
Edit: it seems that if the query involves competing indexes, the planner may sometimes not use the expression index at all. The choice seems to be data dependent.
One way to do this is with a trigger!
CREATE TABLE computed(
one SERIAL,
two INT NOT NULL
);
CREATE OR REPLACE FUNCTION computed_two_trg()
RETURNS trigger
LANGUAGE plpgsql
SECURITY DEFINER
AS $BODY$
BEGIN
NEW.two = NEW.one * 2;
RETURN NEW;
END
$BODY$;
CREATE TRIGGER computed_500
BEFORE INSERT OR UPDATE
ON computed
FOR EACH ROW
EXECUTE PROCEDURE computed_two_trg();
The trigger is fired before the row is inserted or updated. It sets the computed field on the NEW record and then returns that record.
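A quick usage sketch: whatever you supply for two, the trigger overwrites it.
INSERT INTO computed (two) VALUES (0);
SELECT * FROM computed; -- on a fresh table: one = 1, two = 2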
PostgreSQL 12 supports generated columns:
PostgreSQL 12 Beta 1 Released!
Generated Columns
PostgreSQL 12 allows the creation of generated columns that compute their values with an expression using the contents of other columns. This feature provides stored generated columns, which are computed on inserts and updates and are saved on disk. Virtual generated columns, which are computed only when a column is read as part of a query, are not implemented yet.
Generated Columns
A generated column is a special column that is always computed from other columns. Thus, it is for columns what a view is for tables.
CREATE TABLE people (
...,
height_cm numeric,
height_in numeric GENERATED ALWAYS AS (height_cm * 2.54) STORED
);
db<>fiddle demo
Well, not sure if this is what you mean, but Postgres normally supports "dummy" ETL syntax.
I created one empty column in the table and then needed to fill it with calculated values depending on the other values in the row.
UPDATE table01
SET column03 = column01*column02; /*e.g. for multiplication of 2 values*/
It is so basic I suspect it is not what you are looking for.
Obviously it is not dynamic; you run it once. But there is no obstacle to putting it into a trigger.
Example of creating an empty virtual column:
, (SELECT *
   FROM (VALUES ('')) A("virtual_col"))
Example of creating two virtual columns with values:
SELECT *
FROM (VALUES (45, 'Completed')
           , (1, 'In Progress')
           , (1, 'Waiting')
           , (1, 'Loading')
     ) A("Count", "Status")
ORDER BY "Count" DESC;
I have code that works and uses the term calculated. I'm not on pure PostgreSQL, though; we run on PADB.
Here is how it's used:
create table some_table as
select category,
txn_type,
indiv_id,
accum_trip_flag,
max(first_true_origin) as true_origin,
max(first_true_dest ) as true_destination,
max(id) as id,
count(id) as tkts_cnt,
(case when calculated tkts_cnt=1 then 1 else 0 end) as one_way
from some_rando_table
group by 1,2,3,4 ;
A lightweight solution with a CHECK constraint:
CREATE TABLE example (
discriminator INTEGER DEFAULT 0 NOT NULL CHECK (discriminator = 0)
);
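A quick usage sketch:
INSERT INTO example DEFAULT VALUES; -- ok: discriminator defaults to 0
INSERT INTO example VALUES (1);     -- rejected by the CHECK constraint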

How to use a temp sequence within a PostgreSQL function

I have some lines of SQL which will take a set of IDs from the same GROUP_ID that are not contiguous (e.g. if some rows got deleted) and make them contiguous again. I wanted to turn this into a function for reusability purposes. The lines work if executed individually, but when I try to create the function I get the error
ERROR: relation "id_seq_temp" does not exist
LINE 10: UPDATE THINGS SET ID=nextval('id_se...
If I create a sequence outside of the function and use that sequence in the function instead, then the function is created successfully (schema-qualified or unqualified). However, I felt that creating the temp sequence inside the function, rather than leaving it in the schema, was a cleaner solution.
I have seen this question: Function shows error "relation my_table does not exist"
However, I'm using the public schema, and schema-qualifying the sequence with public. does not seem to help.
I've also seen this question: How to create a sql function using temp sequences and a SELECT on PostgreSQL8. I probably could use generate_series, but that adds a lot of complexity that a sequence solves, such as needing to know how big a series to generate.
Here is my function, I anonymized some of the names - just in case there's a typo.
CREATE OR REPLACE FUNCTION reindex_ids(IN BIGINT) RETURNS VOID
LANGUAGE SQL
AS $$
CREATE TEMPORARY SEQUENCE id_seq_temp
MINVALUE 1
START WITH 1
INCREMENT BY 1;
ALTER SEQUENCE id_seq_temp RESTART;
UPDATE THINGS SET ID=ID+2000 WHERE GROUP_ID=$1;
UPDATE THINGS SET ID=nextval('id_seq_temp') WHERE GROUP_ID=$1;
$$;
Is it possible to use a sequence you create within a function later in the function?
Answer to question
The reason is that SQL functions (LANGUAGE sql) are parsed and planned as one. All objects used must exist before the function runs.
You can switch to PL/pgSQL, (LANGUAGE plpgsql) which plans each statement on demand. There you can create objects and use them in the next command.
See:
Why can PL/pgSQL functions have side effect, while SQL functions can't?
Since you are not returning anything, consider a PROCEDURE. (FUNCTION works, too.)
CREATE OR REPLACE PROCEDURE reindex_ids(IN bigint)
LANGUAGE plpgsql AS
$proc$
BEGIN
IF EXISTS ( SELECT FROM pg_catalog.pg_class
WHERE relname = 'id_seq_temp'
AND relnamespace = pg_my_temp_schema()
AND relkind = 'S') THEN
ALTER SEQUENCE id_seq_temp RESTART;
ELSE
CREATE TEMP SEQUENCE id_seq_temp;
END IF;
UPDATE things SET id = id + 2000 WHERE group_id = $1;
UPDATE things SET id = nextval('id_seq_temp') WHERE group_id = $1;
END
$proc$;
Call:
CALL reindex_ids(123);
This creates your temp sequence if it does not exist already.
If the sequence exists, it is reset. (Remember that temporary objects live for the duration of a session.)
In the unlikely event that some other object occupies the name, an exception is raised.
Alternative solutions
Solution 1
This usually works:
UPDATE things t
SET id = t1.new_id
FROM (
SELECT pk_id, row_number() OVER (ORDER BY id) AS new_id
FROM things
WHERE group_id = $1 -- your input here
) t1
WHERE t.pk_id = t1.pk_id;
And it only updates each row once, so roughly half the cost.
Replace pk_id with your PRIMARY KEY column, or any UNIQUE NOT NULL (combination of) column(s).
The trick is that the UPDATE typically processes rows according to the sort order of the subquery in the FROM clause. Updating in ascending order should never hit a duplicate key violation.
And the ORDER BY clause of the window function row_number() imposes that sort order on the resulting set. That's an undocumented implementation detail, so you might want to add an explicit ORDER BY to the subquery. But since the behavior of UPDATE is undocumented anyway, it still depends on an implementation detail.
You can wrap that into a plain SQL function.
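A sketch of that wrapper, assuming pk_id is the primary key as above:
CREATE OR REPLACE FUNCTION reindex_ids(_group_id bigint)
  RETURNS void
  LANGUAGE sql AS
$func$
UPDATE things t
SET    id = t1.new_id
FROM  (
   SELECT pk_id, row_number() OVER (ORDER BY id) AS new_id
   FROM   things
   WHERE  group_id = _group_id
   ) t1
WHERE  t.pk_id = t1.pk_id;
$func$;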
Solution 2
Consider not doing what you are doing at all. Gaps in sequential numbers are typically expected and not a problem. Just live with it. See:
Serial numbers per group of rows for compound key

FOR loop in Oracle SQL or Apply SQL to multiple Oracle tables

My SQL is a bit rusty, so I don't know whether the following is even possible:
I have multiple tables t_a, t_b, t_c with the same column layout and I want to apply the same operation to them, namely output some aggregation into another table. For a table t_x this would look like this:
CREATE TABLE t_x_aggregate (
<here the col definitions which are the same for all new tables t_[abc]_aggregate>
);
INSERT INTO t_x_aggregate(id, ...)
SELECT id, SUM(factor*amount)
FROM t_x
WHERE some fixed condition
GROUP BY id;
I now want to execute something like a FOR loop around this:
for t_x in t_a, t_b, t_c
CREATE TABLE ...
INSERT INTO ...
end for
Is this possible in SQL? Or would I need to build a wrapper in another language for this?
So, the result of that operation would be 3 new tables? T_A_AGGREGATE, T_B_AGGREGATE and T_C_AGGREGATE?
I think that the fastest way is to write 3 separate CREATE TABLE statements, e.g.
create table t_a_aggregate as
select id, sum(factor * amount) suma
from t_a
where some_condition
group by id;
create table t_b_aggregate as
select id, sum(factor * amount) suma
from t_b
where some_condition
group by id;
create table t_c_aggregate as
select id, sum(factor * amount) suma
from t_c
where some_condition
group by id;
OK; I understand that queries aren't that simple, but not much changes - only the table names in CREATE and FROM (perhaps somewhere else, but that's more or less "it"). Any decent text editor's search/replace capability should be able to do it quickly.
If you want to do it dynamically in a loop (read: PL/SQL), you can - but dynamic SQL doesn't scale, is difficult to maintain, is painful to debug. Therefore, if you're doing it only once, consider running 3 separate statements.
How to do it dynamically?
You'd have to create a string (we usually put it into a locally declared variable) which contains the whole DDL statement. Why? Because you can't execute DDL from PL/SQL otherwise.
If there are multiple tables and/or columns involved, you'll have to combine "fixed" parts of the statement (like create table, select, from, order by) concatenated with "dynamic" parts - such as column names. Note that in between you have to concatenate commas as separators. Pay attention to usage of multiple single quotes as you have to escape them (or use the q-quoting mechanism).
Also, for multiple columns you'll probably have to do it in a loop, concatenating each new column to the previously composed string.
The statement stored in the variable is then executed by EXECUTE IMMEDIATE. If it is correctly written, it'll succeed. Otherwise it'll fail, but it won't tell you exactly why it failed (that's why I said "difficult to debug").
So, instead of executing it, we usually display that string (using dbms_output.put_line) so that we can see what it looks like and - using copy/paste - try to execute it.
Basically, it can be quite complex and - as I said - difficult to maintain and debug.
For the FOR loop you need to use PL/SQL, like this: (*)
declare
  type array_t is table of varchar2(10);
  array array_t := array_t('a', 'b', 'c');
  lo_stmt varchar2(2000);
begin
  for i in 1 .. array.count loop
    -- build and run the CREATE TABLE statement; note there is no trailing
    -- semicolon inside a string passed to EXECUTE IMMEDIATE
    lo_stmt :=
      'CREATE TABLE t_'||array(i)||'_aggregate ('||
      ' <here the col definitions which are the same for all new tables t_[abc]_aggregate>'||
      ')';
    execute immediate lo_stmt;
    -- build and run the INSERT statement
    lo_stmt :=
      'INSERT INTO t_'||array(i)||'_aggregate(id, ...) '||
      'SELECT id, SUM(factor*amount) '||
      'FROM t_'||array(i)||' '||
      'WHERE some fixed condition '||
      'GROUP BY id';
    execute immediate lo_stmt;
  end loop;
end;
/
Look also at this SO question: How to use Oracle PL/SQL to create...
(*) #Littlefoot describes in the 2nd part of his answer valuable background to this program.

Fetch cursor from multiple tables view PostgreSQL

I have been trying for a couple of hours to fetch all the rows from a view created from multiple tables.
I have two tables (Position and Vector), to which correspond two custom composite types, type_position and type_vector:
Position (id, sys_time, lat, lon)
Vector (id, sys_time, speed)
I want to create a temporary VIEW in a plpgsql procedure to bring together all Positions and Vectors and order them by sys_time, so I wrote:
CREATE TEMP VIEW posAndVectView AS
SELECT * from position
UNION ALL
SELECT * from vector;
I need to iterate over this view and do some work depending on the Position and Vector attributes.
So I think I should use a cursor:
DECLARE
manyRows refcursor;
BEGIN
OPEN manyRows FOR
SELECT * FROM posAndVectView ORDER BY sys_time ASC;
LOOP
FETCH manyRows INTO posOrVect;
EXIT WHEN NOT FOUND;
...
END LOOP;
CLOSE manyRows;
END;
So my question is: what should the type of the posOrVect variable be in the DECLARE section?
It seems to be sometimes a type_vector, sometimes a type_position...
I found the answer; it is pretty simple.
I need to declare the posOrVect variable with the record type, as mentioned in the PostgreSQL documentation:
Record variables are similar to row-type variables, but they have no
predefined structure. They take on the actual row structure of the row
they are assigned during a SELECT or FOR command. The substructure of
a record variable can change each time it is assigned to. A
consequence of this is that until a record variable is first assigned
to, it has no substructure, and any attempt to access a field in it
will draw a run-time error.
Simple. The type should be posAndVectView.
Like for any other table, there is also a composite type with the same name as a temporary table.
But do you need a view? You could just open the cursor for the query with the UNION. In that case, you'd use type position because it is the first table.
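Opening the cursor directly for the UNION, mirroring the view definition from the question, would look like this:
OPEN manyRows FOR
  SELECT * FROM position
  UNION ALL
  SELECT * FROM vector
  ORDER BY sys_time ASC;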

How to pass a set of rows from one function into another?

Overview
I'm using PostgreSQL 9.1.14, and I'm trying to pass the results of a function into another function. The general idea (specifics, with a minimal example, follow) is that we can write:
select * from (select * from foo ...)
and we can abstract the sub-select away in a function and select from it:
create function foos()
returns setof foo
language sql as $$
select * from foo ...
$$;
select * from foos()
Is there some way to abstract one level farther, so as to be able to do something like this (I know functions cannot actually have arguments with setof types):
create function more_foos( some_foos setof foo )
language sql as $$
select * from some_foos ... -- or unnest(some_foos), or ???
$$:
select * from more_foos(foos())
Minimal Example and Attempted Workarounds
I'm using PostgreSQL 9.1.14. Here's a minimal example:
-- 1. create a table x with three rows
drop table if exists x cascade;
create table if not exists x (id int, name text);
insert into x values (1,'a'), (2,'b'), (3,'c');
-- 2. xs() is a function with type `setof x`
create or replace function xs()
returns setof x
language sql as $$
select * from x
$$;
-- 3. xxs() should return the context of x, too
-- Ideally the argument would be a `setof x`,
-- but that's not allowed (see below).
create or replace function xxs(x[])
returns setof x
language sql as $$
select x.* from x
join unnest($1) y
on x.id = y.id
$$;
When I load up this code, I get the expected output for the table definitions, and I can call and select from xs() as I'd expect. But when I try to pass the result of xs() to xxs(), I get an error that "function xxs(x) does not exist":
db=> \i test.sql
DROP TABLE
CREATE TABLE
INSERT 0 3
CREATE FUNCTION
CREATE FUNCTION
db=> select * from xs();
1 | a
2 | b
3 | c
db=> select * from xxs(xs());
ERROR: function xxs(x) does not exist
LINE 1: select * from xxs(xs());
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
I'm a bit confused about "function xxs(x) does not exist"; since the return type of xs() is setof x, I'd expected that the type of the expression xs() would be setof x (or maybe x[]), not x. Following the complaints about the type, I can get to either of the following, but while with either definition I can select xxs(xs());, I can't select * from xxs(xs());.
create or replace function xxs( x )
returns setof x
language sql as $$
select x.* from x
join unnest(array[$1]) y -- unnest(array[...]) seems pretty bad
on x.id = y.id
$$;
create or replace function xxs( x )
returns setof x
language sql as $$
select * from x
where x.id in ($1.id)
$$;
db=> select xxs(xs());
(1,a)
(2,b)
(3,c)
db=> select * from xxs(xs());
ERROR: set-valued function called in context that cannot accept a set
Summary
What's the right way to pass the results of a set-returning function into another function?
(I have noted that create function … xxs( setof x ) … results in the error: ERROR: functions cannot accept set arguments, so the answer won't literally be passing a set of rows from one function to another.)
Table functions
I perform very high speed, complex database migrations for a living, using SQL as both the client and server language (no other language is used), all running server side, where the code rarely surfaces from the database engine. Table functions play a HUGE role in my work. I don't use "cursors" since they are too slow to meet my performance requirements, and everything I do is result set oriented. Table functions have been an immense help to me in completely eliminating use of cursors, achieving very high speed, and have contributed dramatically towards reducing code volume and improving simplicity.
In short, you use a query that references two (or more) table functions to pass the data from one table function to the next. The select query result set that calls the table functions serves as the conduit to pass the data from one table function to the next. On the DB2 platform / version I work on - and, based on a quick look at the 9.1 Postgres manual, the same appears to be true there - you can only pass a single row of column values as input to any of the table function calls, as you've discovered. However, because the table function call happens in the middle of a query's result set processing, you achieve the same effect of passing a whole result set to each table function call, albeit, in the database engine plumbing, the data is passed only one row at a time to each table function.
Table functions accept one row of input columns, and return a single result set back into the calling query (i.e. select) that called the function. The result set columns passed back from a table function become part of the calling query's result set, and are therefore available as input to the next table function, referenced later in the same query, typically as a subsequent join. The first table function's result columns are fed as input (one row at a time) to the second table function, which returns its result set columns into the calling query's result set. Both the first and second table function result set columns are now part of the calling query's result set, and are now available as input (one row at a time) to a third table function. Each table function call widens the calling query's result set via the columns it returns. This can go on an on until you start hitting limits on the width of a result set, which likely varies from one database engine to the next.
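In Postgres terms (9.3+ for explicit LATERAL), chaining set-returning functions in one query might look like this sketch; the function and column names here are hypothetical:
SELECT o.order_number, v.row_valid, p.row_posted
FROM   todays_order_batch o
CROSS  JOIN LATERAL validate_order(o.order_number) AS v(row_valid)
LEFT   JOIN LATERAL post_order(o.order_number) AS p(row_posted)
       ON v.row_valid = 1;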
Consider this example (which may not match Postgres' syntax requirements or capabilities as I work on DB2). This is one of many design patterns in which I use table functions, is one of the simpler ones, that I think is very illustrative, and one that I anticipate would have broad appeal if table functions were in heavy mainstream use (to my knowledge they are not, but I think they deserve more attention than they are getting).
In this example, the table functions in use are: VALIDATE_TODAYS_ORDER_BATCH, POST_TODAYS_ORDER_BATCH, and DATA_WAREHOUSE_TODAYS_ORDER_BATCH. On the DB2 version I work on, you wrap the table function inside "TABLE( place table function call and parameters here )", but based on quick look at a Postgres manual it appears you omit the "TABLE( )" wrapper.
create table TODAYS_ORDER_PROCESSING_EXCEPTIONS as (
select TODAYS_ORDER_BATCH.*
,VALIDATION_RESULT.ROW_VALID
,POST_RESULT.ROW_POSTED
,WAREHOUSE_RESULT.ROW_WAREHOUSED
from TODAYS_ORDER_BATCH
cross join VALIDATE_TODAYS_ORDER_BATCH ( ORDER_NUMBER, [either pass the remainder of the order columns or fetch them in the function] )
as VALIDATION_RESULT ( ROW_VALID ) --example: 1/0 true/false Boolean returned
left join POST_TODAYS_ORDER_BATCH ( ORDER_NUMBER, [either pass the remainder of the order columns or fetch them in the function] )
as POST_RESULT ( ROW_POSTED ) --example: 1/0 true/false Boolean returned
on ROW_VALID = '1'
left join DATA_WAREHOUSE_TODAYS_ORDER_BATCH ( ORDER_NUMBER, [either pass the remainder of the order columns or fetch them in the function] )
as WAREHOUSE_RESULT ( ROW_WAREHOUSED ) --example: 1/0 true/false Boolean returned
on ROW_POSTED = '1'
where coalesce( ROW_VALID, '0' ) = '0' --Capture only exceptions and unprocessed work.
or coalesce( ROW_POSTED, '0' ) = '0' --Or, you can flip the logic to capture only successful rows.
or coalesce( ROW_WAREHOUSED, '0' ) = '0'
) with data
If table TODAYS_ORDER_BATCH contains 1,000,000 rows, then
VALIDATE_TODAYS_ORDER_BATCH will be called 1,000,000 times, once for
each row.
If 900,000 rows pass validation inside VALIDATE_TODAYS_ORDER_BATCH, then POST_TODAYS_ORDER_BATCH will be called 900,000 times.
If only 850,000 rows successfully post, then VALIDATE_TODAYS_ORDER_BATCH needs some loopholes closed LOL, and DATA_WAREHOUSE_TODAYS_ORDER_BATCH will be called 850,000 times.
If 850,000 rows successfully made it into the Data Warehouse (i.e. no additional exceptions were generated), then table TODAYS_ORDER_PROCESSING_EXCEPTIONS will be populated with 1,000,000 - 850,000 = 150,000 exception rows.
The table function calls in this example are only returning a single column, but they could be returning many columns. For example, the table function validating an order row could return the reason why an order failed validation.
In this design, virtually all the chatter between a HLL and the database is eliminated, since the HLL requestor is asking the database to process the whole batch in ONE request. This results in a reduction of millions of SQL requests to the database, in a HUGE removal of millions of HLL procedure or method calls, and as a result provides a HUGE runtime improvement. In contrast, legacy code which often processes a single row at a time, would typically send 1,000,000 fetch SQL requests, 1 for each row in TODAYS_ORDER_BATCH, plus at least 1,000,000 HLL and/or SQL requests for validation purposes, plus at least 1,000,000 HLL and/or SQL requests for posting purposes, plus 1,000,000 HLL and/or SQL requests for sending the order to the data warehouse. Granted, using this table function design, inside the table functions SQL requests are being sent to the database, but when the database makes requests to itself (i.e from inside a table function), the SQL requests are serviced much faster (especially in comparison to a legacy scenario where the HLL requestor is doing single row processing from a remote system, with the worst case over a WAN - OMG please don't do that).
You can easily run into performance problems if you use a table function to "fetch a result set" and then join that result set to other tables. In that case, the SQL optimizer can't predict what set of rows will be returned from the table function, and therefore it can't optimize the join to subsequent tables. For that reason, I rarely use them for fetching a result set, unless I know that result set will be a very small number of rows, hence not causing a performance problem, or I don't need to join to subsequent tables.
In my opinion, one reason why table functions are underutilized is that they are often perceived as only a tool to fetch a result set, which often performs poorly, so they get written off as a "poor" tool to use.
Table functions are immensely useful for pushing more functionality over to the server, for eliminating most of the chatter between the database server and programs on remote systems, and even for eliminating chatter between the database server and external programs on the same server. Even chatter between programs on the same server carries more overhead than many people realize, and much of it is unnecessary. The heart of the power of table functions lies in using them to perform actions inside result set processing.
There are more advanced design patterns for using table functions that build on the above pattern, where you can maximize result set processing even further, but this post is a lot for most to absorb already.