Reusing results from a SQL query in a following query in Sqlite - sql

I am using a recursive WITH statement to select all children of a given parent in a table representing tree-structured entries. This is in SQLite (which now supports recursive WITH).
This allows me to select thousands of records in this tree very quickly, without suffering the huge performance loss of preparing thousands of SELECT statements from the calling application.
WITH RECURSIVE q(Id) AS
(
SELECT Id FROM Entity
WHERE Parent=(?)
UNION ALL
SELECT m.Id FROM Entity AS m
JOIN q ON m.Parent=q.Id
)
SELECT Id FROM q;
Now, suppose I have data related to these entities in an arbitrary number of other tables, which I want to load subsequently. Because the number of tables is arbitrary (they are added in a modular fashion), the data fetching cannot be included directly in this query; it must follow it.
But if I then issue SELECT statements against each related table, the performance gained by selecting the whole tree directly inside SQLite is almost wasted, because I will still stall on thousands of subsequent requests, each of which prepares and issues a SELECT statement.
So, two questions:
A better solution would be to formulate a similar recursive statement for each of the related tables, which recursively gathers the entities from this tree again and this time selects their related data by joining it.
This sounds much more efficient, but such a statement is tricky to formulate and I'm a bit lost here.
Now the real mystery: would there be an even more efficient solution, namely to keep the results of the last query (the rows with the ids from the entity tree) cached somewhere and join them to the related tables in the following statements, without having to iterate recursively again?
Here is an attempt at the first option, supposing I want to select a field Data from the related table Component. Is the second UNION ALL legal?
WITH RECURSIVE q(Data) AS
(
SELECT Id FROM Entity
WHERE Parent=(?)
UNION ALL
SELECT m.Id FROM Entity AS m
JOIN q ON m.Parent=q.Id
UNION ALL
SELECT Data FROM Component AS c
JOIN q ON c.Id=q.Id
)
SELECT Data FROM q;

The documentation says:
 2. The table named on the left-hand side of the AS keyword must appear exactly once in the FROM clause of the right-most SELECT statement of the compound select, and nowhere else.
So your second query is not legal.
However, the CTE behaves like a normal table/view, so you can just join it to the related table:
WITH RECURSIVE q(Id) AS
( ... )
SELECT q.Id, c.Data
FROM q JOIN Component AS c ON q.Id = c.Id
If you want to reuse the computed values in q for multiple queries, there's nothing you can do with CTEs, but you can store them in a temporary table:
CREATE TEMPORARY TABLE q_123 AS
WITH RECURSIVE q(Id) AS
( ... )
SELECT Id FROM q;
SELECT * FROM q_123 JOIN Component ...;
SELECT * FROM q_123 JOIN Whatever ...;
DROP TABLE q_123;
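As a rough sketch of the temporary-table idea from the calling application's side, here is how it might look with Python's sqlite3 module. The helper name load_related, the Tag table, and the sample rows are made up for illustration, and every related table is assumed to have Id and Data columns. The sketch creates the temporary table explicitly and fills it with INSERT ... SELECT instead of CREATE TABLE ... AS, so that the parent id is bound as an ordinary parameter. Table names are interpolated into the SQL, so they must come from trusted code, not user input.

```python
import sqlite3


def load_related(conn, parent_id, related_tables):
    """Walk the tree once, then join the cached ids to each related table."""
    cur = conn.cursor()
    cur.execute("CREATE TEMPORARY TABLE subtree (Id INTEGER PRIMARY KEY)")
    # The recursive walk runs exactly once; its result is stored in subtree.
    cur.execute(
        """
        WITH RECURSIVE q(Id) AS (
            SELECT Id FROM Entity WHERE Parent = ?
            UNION ALL
            SELECT m.Id FROM Entity AS m JOIN q ON m.Parent = q.Id
        )
        INSERT INTO subtree SELECT Id FROM q
        """,
        (parent_id,),
    )
    results = {}
    for table in related_tables:
        # One statement per related table, not one per entity.
        results[table] = cur.execute(
            f'SELECT s.Id, t.Data FROM subtree AS s '
            f'JOIN "{table}" AS t ON t.Id = s.Id ORDER BY s.Id'
        ).fetchall()
    cur.execute("DROP TABLE subtree")
    return results


# Hypothetical sample data: entity 1 has children 2 and 3, and 4 is a child of 2.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Entity (Id INTEGER PRIMARY KEY, Parent INTEGER);
    CREATE TABLE Component (Id INTEGER, Data TEXT);
    CREATE TABLE Tag (Id INTEGER, Data TEXT);
    INSERT INTO Entity VALUES (1, NULL), (2, 1), (3, 1), (4, 2), (5, NULL);
    INSERT INTO Component VALUES (2, 'c2'), (4, 'c4'), (5, 'c5');
    INSERT INTO Tag VALUES (3, 't3'), (1, 't1');
    """
)
related = load_related(conn, 1, ["Component", "Tag"])
print(related)
```

The number of prepared statements then grows with the number of related tables rather than with the number of entities in the tree.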

Related

Can I leverage BigQuery (BQ) partition via a join?

I am a Tableau designer, and we are building some views that get filtered by category a lot. Because of this, we created a category_id column to serve as the partition. The problem seems to be that if I filter on category only, the partition doesn't get used, so the query scans (and is billed for) the whole table.
Our team is trying to see if this could be minimized by using a nested query as follows:
SELECT *
FROM table a
INNER JOIN (
SELECT DISTINCT category_id, category
FROM table
) b
ON a.category_id = b.category_id
WHERE b.category = 'Category A'
The idea is that we could show the user b.category, they select it in Tableau and then the inner join would kick off the partition and limit the bytes returned. When I try this in the BQ interface, the estimated returned size comes back the same.
You'll need to filter on the partitioned field before you make the inner join.
I haven't used Tableau before, so I don't know if this is possible; it's just an idea. You could create a parameter that is set to the category chosen in Tableau and reference it in the WHERE clause of the query against the partitioned table.
SELECT *
FROM table a
INNER JOIN (
SELECT DISTINCT category_id, category
FROM table
Where category = @chosen_category
) b
ON a.category_id = b.category_id;
When you say that the partition isn't used when you filter only by category, have you actually tested this by querying the table from the console? If the partition isn't used there either, you need to look at the partitioning; if it is, you need to take another look at your Tableau query.
VizQL (Viz query language) is Tableau's SQL parser that converts your Tableau viz into SQL for execution. So whilst you cannot really modify the outgoing SQL, you can at least capture and test it, which enables you to identify poorly performing calculations and/or vizzes, as well as optimise the backend for the queries that Tableau will send.
I've written an article about this here: https://datawonders.atlassian.net/wiki/spaces/TABLEAU/pages/1290600449/Let+s+Talk+Errors+Tuning+6+minute+read
The thing about Tableau is that it treats the source as a derived table, with all filters being placed at the upper-level of the query immediately before the stream,
so your query:
Select *
From table a
Join (
Select Distinct Category_ID, Category
From table
)b On a.category_id = b.category_id
Where b.category = 'Category A'
Will actually look like this (assuming you just select everything):
Select a1.*
From (
Select *
From table a
Join (
Select Distinct Category_ID, Category
From table
)b On a.category_id = b.category_id
)a1
Where a1.category = <your selected category>
So you can see from here that being two-levels deep, your Category table just won't be hit, instead everything shall be read into the spool, the join taking place in tempdb, and only the complete set is filtered immediately before streaming to Tableau.
Bad, underperforming sql it most certainly is.
And this is where the relational method of v2020.2 comes into play, as this has been designed to treat each table as a separate exclusive entity, joins are only made at execution time, so you could build a view that uses data from table a where you are using table b to provide the filtering.
As an alternative, my preferred overall method is to switch entirely to Custom SQL and use it with parameters, as this lets you craft and test your own SQL to create a high-performance, low-load query. Because parameters are parsed before the query is executed, you can place the filtering deep down in the query without the need for a secondary look-up table or filtered derived statement. A SELECT DISTINCT as you are currently using it is still going to produce a large plan: unless the category column is indexed, the engine will still need to read every record from the table.
So using parameters, your new query will look something like:
Select a1.*
From (
Select *
From table a
Join lookup_table b On a.category_id = b.category_id
And b.category = <parameters.pCategory>
)a1
(I've placed the filter condition directly onto the join as this can improve performance in some circumstances, though this actually shouldn't make much difference)
And when used in conjunction with the Set parameter action, you can now use parameters as in/out updateable variables which shall update as the user interacts directly with the viz, instead of the user needing to manually update as they go. If you haven't used these before, I wrote an article about it here: https://community.tableau.com/s/news/a0A4T00000313S0UAI/psst-have-you-had-a-go-with-variables-in-tableau-yet
Steve

force Oracle to process recursive CTE on remote db site (perhaps using DRIVING_SITE hint)

I am trying to fetch data from a remote table. The data is expanded from a seed set in a local table using a recursive CTE. The query is very slow (expanding 300 seed rows to 800 final rows takes 7 minutes).
For other "tiny local, huge remote" cases without a recursive query, the DRIVING_SITE hint works excellently. I also tried exporting the seed set from the local table into an auxiliary table with the same structure on remotedb and, while logged in to remotedb, ran the query as a purely local query (my_table as p, my_table_seed_copy as i). It took 4 s, which encouraged me to believe that forcing the query to the remote site would make it fast.
What's the correct way to force Oracle to execute a recursive query on the remote site?
with s (id, data) as (
select p.id, p.data
from my_table@remotedb p
where p.id in (select i.id from my_table i)
union all
select p.id, p.data
from s
join my_table@remotedb p on ...
)
select /*+DRIVING_SITE(p)*/ s.*
from s;
In the query above, I tried
select /*+DRIVING_SITE(p)*/ s.* in main select
select /*+DRIVING_SITE(s)*/ s.* in main select
omitting DRIVING_SITE in whole query
select /*+DRIVING_SITE(x)*/ s.* from s, dual@remotedb x as main select
select /*+DRIVING_SITE(p)*/ p.id, p.data in first inner select
select /*+DRIVING_SITE(p)*/ p.id, p.data in both inner selects
select /*+DRIVING_SITE(p) MATERIALIZE*/ p.id, p.data in both inner selects
(just for completeness - rewriting to connect by is not applicable for this case - actually the query is more complex and uses constructs which cannot be expressed by connect by)
All without success (i.e. data returned after 7 minutes).
The recursive query actually performs a breadth-first search: the seed rows represent the 0-th level, and the recursive part finds the elements on the n-th level from the elements on the (n-1)-th level. The original query was intended to be part of a merge ... using ... clause.
Hence I rewrote the query as a PL/SQL loop. Every cycle generates one level. The merge prevents insertion of duplicates, so eventually no new row is added and the loop exits (the transitive closure has been constructed). Pseudocode:
loop
merge into my_table using (
select /*+DRIVING_SITE(r)*/ distinct r.* /*###BULKCOLLECT###*/
from my_table l
join my_table@remotedb r on ... -- same condition as s and p in original question are joined on
) ...
exit when rows_inserted = 0;
end loop;
The actual code is not so simple, since DRIVING_SITE does not work directly with MERGE, so we have to transfer the data via a work collection, but that's a different story. Also, the count of inserted rows cannot be easily determined; it must be computed as the difference between the row counts after and before the merge.
The solution is not ideal. Anyway it's much faster than recursive CTE (30s, 13 cycles) because queries are provably utilizing the DRIVING_SITE hint.
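The level-by-level loop can be illustrated outside Oracle. The following is only a sketch of the control flow, using Python's sqlite3 module: remote_table, its parent_id column, and the join condition are stand-ins invented for the example (the real join condition is elided above), and INSERT OR IGNORE on a primary key plays the role that MERGE plays in the PL/SQL version. It says nothing about DRIVING_SITE or db links.

```python
import sqlite3


def expand_closure(conn):
    """Add one breadth-first level per pass until a pass inserts nothing."""
    passes = 0
    while True:
        before = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]
        # The primary key on my_table.id rejects rows that are already present.
        conn.execute(
            """
            INSERT OR IGNORE INTO my_table (id)
            SELECT DISTINCT r.id
            FROM my_table AS l
            JOIN remote_table AS r ON r.parent_id = l.id
            """
        )
        after = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]
        passes += 1
        # Rows inserted = row count after minus row count before.
        if after == before:
            return passes


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE my_table (id INTEGER PRIMARY KEY);
    CREATE TABLE remote_table (id INTEGER, parent_id INTEGER);
    INSERT INTO my_table VALUES (1);
    -- 1 -> 2 -> 3 -> 4 -> 1 contains a cycle; row 9 is unreachable from the seed.
    INSERT INTO remote_table VALUES (2, 1), (3, 2), (4, 3), (1, 4), (9, 8);
    """
)
passes = expand_closure(conn)
closure_ids = [row[0] for row in conn.execute("SELECT id FROM my_table ORDER BY id")]
print(passes, closure_ids)
```

Because duplicates are rejected on insert, the loop also terminates when the data contains a cycle, at the cost of one final pass that adds nothing.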
I will leave the question open for some time, in case somebody finds out how to make the recursive query work or proves that it is not possible.

Teiid not performing optimal join

For our Teiid Springboot project we use a row filter in a where clause to determine what results a user gets.
Example:
SELECT * FROM very_large_table WHERE id IN ('01', '03')
We want the context in the IN clause to be dynamic like so:
SELECT * FROM very_large_table WHERE id IN (SELECT other_id from very_small_table)
The problem now is that Teiid fetches all the data from very_large_table and only then filters it with the WHERE clause, which makes the query 10-20 times slower. The very_small_table holds only about 1-10 records, and its content is based on the user context we get from Java.
The very_large_table is located on an Oracle database and the very_small_table is on the Teiid pod/container. Somehow I can't force Teiid to ship the data to Oracle and perform the filtering there.
Things that I have tried:
I have specified the foreign data wrappers as follows:
CREATE FOREIGN DATA WRAPPER "oracle_override" TYPE "oracle" OPTIONS (EnableDependentsJoins 'true');
CREATE SERVER server_name FOREIGN DATA WRAPPER "oracle_override";
I also tried an EXISTS statement, and a JOIN clause instead of a WHERE clause, to see whether pushdown happened. Join hints don't seem to matter either.
Sadly, the performance impact at the moment is so high that we can't reach our performance targets.
Are there any cardinalities on very_small_table and very_large_table? If not the planner will assume a default plan.
You can also use a dependent join hint:
SELECT * FROM very_large_table WHERE id IN /*+ dj */ (SELECT other_id from very_small_table)
Often, exists performs better than in:
SELECT vlt.*
FROM very_large_table vlt
WHERE EXISTS (SELECT 1 FROM very_small_table vst WHERE vst.other_id = vlt.id);
However, this might end up scanning the large table.
If id is unique in vlt and there are no duplicates in vst, then a JOIN might optimize better:
select vlt.*
from very_small_table vst join
very_large_table vlt
on vst.other_id = vlt.id;
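The uniqueness caveat for the JOIN rewrite is easy to check on a toy data set. The snippet below is only an illustration in Python with an in-memory SQLite database, not Teiid; the payload column and the sample rows are made up. It shows that IN and EXISTS return each matching row once, while the JOIN repeats a row when very_small_table contains a duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE very_large_table (id TEXT PRIMARY KEY, payload TEXT);
    CREATE TABLE very_small_table (other_id TEXT);
    INSERT INTO very_large_table VALUES ('01', 'a'), ('02', 'b'), ('03', 'c');
    -- '03' appears twice on purpose.
    INSERT INTO very_small_table VALUES ('01'), ('03'), ('03');
    """
)


def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]


in_ids = ids(
    "SELECT id FROM very_large_table "
    "WHERE id IN (SELECT other_id FROM very_small_table) ORDER BY id"
)
exists_ids = ids(
    "SELECT vlt.id FROM very_large_table vlt "
    "WHERE EXISTS (SELECT 1 FROM very_small_table vst WHERE vst.other_id = vlt.id) "
    "ORDER BY vlt.id"
)
join_ids = ids(
    "SELECT vlt.id FROM very_small_table vst "
    "JOIN very_large_table vlt ON vst.other_id = vlt.id ORDER BY vlt.id"
)
print(in_ids, exists_ids, join_ids)
```

So the JOIN form is only a drop-in replacement when very_small_table has no duplicate keys.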

Cycle detection for recursive SQL WITH clause

Let us assume that we have a normal hierarchical table with a parent column pointing to each row's parent. I want to build a query that enumerates all ancestors using the SQL WITH clause.
with data_ancestors (par, chi) AS (
SELECT d.parent, d.dat_id
FROM data d
WHERE d.parent IS NOT NULL
UNION ALL
SELECT p.parent, a.chi
FROM data_ancestors a
JOIN data p
ON p.dat_id = a.par
WHERE p.parent IS NOT NULL
)
select * from data_ancestors where par = 1 order by chi;
The problem here is that although the data should not contain cycles, this is not guaranteed. In such a faulty case I want the functionality to degrade gracefully (loops should be broken arbitrarily). However, Oracle ends with an error during execution on such "wrong" data.
I know I can use a different, more Oracle-specific approach like:
select p.dat_id, a.dat_id from data p, data a where a.dat_id in (
select d.dat_id from data d start with d.dat_id = p.dat_id connect by nocycle prior d.dat_id = d.parent
);
or, as suggested in this question, do the cycle detection myself.
However, are there any other nice solutions (mainly for Oracle, but also for other DBs) that solve the recursion problem with the WITH clause?
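For the "other DBs" part, one option that may be worth trying is to replace UNION ALL with UNION. In SQLite, a recursive CTE that uses UNION discards rows it has already produced, so a loop in the data stops generating new rows. The sketch below runs the query from above against an in-memory SQLite database through Python; the sample rows are invented, and this shows SQLite behaviour only, not Oracle (whose recursive WITH has its own CYCLE clause, not shown here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE data (dat_id INTEGER PRIMARY KEY, parent INTEGER);
    -- 3 -> 2 -> 1 is a clean chain; 10 and 11 point at each other (a cycle).
    INSERT INTO data VALUES (1, NULL), (2, 1), (3, 2), (10, 11), (11, 10);
    """
)

ANCESTORS_SQL = """
WITH RECURSIVE data_ancestors (par, chi) AS (
    SELECT d.parent, d.dat_id
    FROM data d
    WHERE d.parent IS NOT NULL
    UNION            -- not UNION ALL: rows already seen are dropped
    SELECT p.parent, a.chi
    FROM data_ancestors a
    JOIN data p ON p.dat_id = a.par
    WHERE p.parent IS NOT NULL
)
SELECT par, chi FROM data_ancestors ORDER BY chi, par
"""

pairs = conn.execute(ANCESTORS_SQL).fetchall()
print(pairs)
```

The cyclic rows end up listed as their own ancestors, (10, 10) and (11, 11), which can serve as a cheap indicator of bad data; with UNION ALL the same query would not terminate on this input.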

What is the advantage of common table expression in sql server

We write a CTE in SQL like the one below:
WITH yourCTE AS
(
SELECT ... FROM ... WHERE ...
) SELECT * FROM yourCTE
What is the advantage of putting SQL in a WITH block? I think that if we put complicated SQL in a WITH block, we can then just write SELECT * FROM yourCTE, as if I were accessing a view.
What is the added advantage of using a CTE in terms of performance? Please discuss. Thanks.
There are a number of cases where a CTE can be really useful:
recursive queries, like walking up a hierarchy tree - that's extremely tricky and cumbersome without a CTE (see here for a sample of a recursive CTE)
anytime you want to use one of the ranking functions like ROW_NUMBER(), RANK(), NTILE() and so forth (see here for info on ranking functions)
in general any case where you need to select a few rows/columns first, based on some criteria, and then do something with these, e.g. update a table, delete duplicates etc.
One case I often use a CTE for is deleting all but the most recent row of a given set of data. For example, if you have customers with a 1:n relationship to their orders, and you want to delete all but the most recent order (based on an OrderDate) for each customer, it gets quite hairy to do this in SQL without a CTE.
With a CTE and the ranking functions, it's a breeze:
;WITH CustomerOrders AS
(
SELECT
c.CustomerID, o.OrderID,
ROW_NUMBER() OVER(PARTITION BY c.CustomerID ORDER BY o.OrderDate DESC) AS 'RowN'
FROM
dbo.Customer c
INNER JOIN
dbo.Orders o ON o.CustomerID = c.CustomerID
)
DELETE FROM
dbo.Orders
FROM
CustomerOrders co
WHERE
dbo.Orders.OrderID = co.OrderID
AND co.RowN > 1
With this, you create an "inline view" that partitions by CustomerID (e.g. each customer gets rownumbers starting at 1), order by OrderDate DESC (newest order first). For each customer, the newest, most recent order has RowN = 1, so you can easily just delete all other rows and you've done what you wanted to do - piece of cake with a CTE - messy code without it....
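To try the same idea without a SQL Server instance, here is a small sketch in Python against an in-memory SQLite database. It assumes an SQLite build with window-function support (3.25 or later), uses invented sample rows, and swaps the T-SQL DELETE ... FROM join for a portable IN subquery, so it is an adaptation rather than the statement above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Orders (
        OrderID INTEGER PRIMARY KEY,
        CustomerID INTEGER,
        OrderDate TEXT
    );
    INSERT INTO Orders VALUES
        (1, 100, '2020-01-05'),
        (2, 100, '2020-03-01'),
        (3, 100, '2020-02-10'),
        (4, 200, '2020-01-20'),
        (5, 200, '2020-01-02');
    """
)

# Number each customer's orders from newest (1) to oldest, then delete
# every order whose number is greater than 1.
conn.execute(
    """
    WITH CustomerOrders AS (
        SELECT OrderID,
               ROW_NUMBER() OVER (
                   PARTITION BY CustomerID ORDER BY OrderDate DESC
               ) AS RowN
        FROM Orders
    )
    DELETE FROM Orders
    WHERE OrderID IN (SELECT OrderID FROM CustomerOrders WHERE RowN > 1)
    """
)

remaining = conn.execute(
    "SELECT OrderID, CustomerID FROM Orders ORDER BY OrderID"
).fetchall()
print(remaining)  # [(2, 100), (4, 200)]
```

Only the newest order per customer survives: order 2 for customer 100 and order 4 for customer 200.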
This MSDN article describes it the best. The bottom line is that, if you are already selecting the data from a view, you don't have to wrap it in a CTE and THEN select from the CTE. I don't think there's much difference (performance wise) between a CTE and a view. At least not in my experience (and I've been working with some complex database structures housing tons of records recently). A CTE is, however, ideal for recursive selects.
Another thing, though, is that a CTE can be beneficial if you'd be selecting the same subset of joined data multiple times in your query/ies and DON'T have a view defined for it. I think it's overkill if you'll be joining data just for a single query and then wrapping it up in a CTE. The query path will still get cached even though you're not using a CTE...
Making recursive queries.
Holding a query's output virtually, in a temporary area with the name given in the definition.
No need to save metadata.
Useful when further operations are needed on some query output.
The query output is retained only while the query is running.
Well suited to holding temporary data for further processing.
Allows more grouping options than a single query.
Allows getting scalar data from a complicated query.
Good evening, friends. Today we are going to learn about common table expressions, a feature that was introduced in SQL Server 2005 and is available in later versions as well.
Common table expression: a common table expression can be defined as a temporary result set or, in other words, a substitute for a view in SQL Server. A common table expression is only valid for the single statement that immediately follows its definition and cannot be used in other statements or sessions.
Syntax for declaring a CTE (common table expression):
with [Name of CTE]
as
(
Body of common table expression
)
Let's take an example:
CREATE TABLE Employee([EID] [int] IDENTITY(10,5) NOT NULL,[Name] [varchar](50) NULL)
insert into Employee(Name) values('Neeraj')
insert into Employee(Name) values('dheeraj')
insert into Employee(Name) values('shayam')
insert into Employee(Name) values('vikas')
insert into Employee(Name) values('raj')
CREATE TABLE DEPT(EID INT,DEPTNAME VARCHAR(100))
insert into dept values(10,'IT')
insert into dept values(15,'Finance')
insert into dept values(20,'Admin')
insert into dept values(25,'HR')
insert into dept values(10,'Payroll')
I have created two tables, Employee and DEPT, and inserted 5 rows into each. Now I would like to join these tables and create a temporary result set to use further.
With CTE_Example(EID,Name,DeptName)
as
(
select Employee.EID,Name,DeptName from Employee
inner join DEPT on Employee.EID =DEPT.EID
)
select * from CTE_Example
Let's go through the statement line by line.
To define a CTE we write the "with" clause and then give the table expression a name; here I have named it "CTE_Example".
Then we write "as" and enclose our code in a pair of parentheses; we can join multiple tables inside them.
In the last line I have used "Select * from CTE_Example", which refers to the common table expression. So we can say that it is like a view that we define and use in a single statement; a CTE is not stored in the database as a permanent object, but it behaves like a view. We can perform delete and update statements on a CTE, and they will have a direct impact on the table referenced in the CTE. Let's take an example to understand this.
With CTE_Example(EID,DeptName)
as
(
select EID,DeptName from DEPT
)
delete from CTE_Example where EID=10 and DeptName ='Payroll'
In the above statement we are deleting a row from CTE_Example, and this deletes the data from the referenced table "DEPT" that is used in the CTE.
I hope this article is helpful to you and that you will be able to use a CTE whenever you find it suitable.