BigQuery: performance of using the same CTE twice in one query

Let's assume I have something like the following simplified query for BigQuery:
WITH test AS (
SELECT 1 AS fieldA, 2 AS fieldB
)
SELECT fieldA, fieldB
FROM test
UNION ALL
SELECT fieldB, fieldA
FROM test;
Will BigQuery run the test CTE twice, or only once and then share the data between both parts of the union?
I searched before posting this, and I know that a CTE only lives for one SQL statement. But here there is only one statement, which uses the same CTE twice, and I could not find anything similar.
Of course, the CTE query is more complex in a real-life scenario and might contain a ROW_NUMBER window function and also joins.
Thanks a lot.

The documentation is clear about tables in the WITH clause (CTEs):
BigQuery only materializes the results of recursive CTEs, but does not
materialize the results of non-recursive CTEs inside the WITH clause.
If a non-recursive CTE is referenced in multiple places in a query,
then the CTE is executed once for each reference.
This can be tested by adding a column with RAND(): each reference to the CTE gets its own value.
WITH RECURSIVE test AS (
SELECT "normal" AS fieldA, 2 AS fieldB, RAND() AS R),
test_recursive AS
(SELECT "recursive" , 9, RAND() AS R
UNION ALL SELECT * FROM test_recursive
WHERE FALSE )
SELECT * FROM test
UNION ALL SELECT * FROM test
UNION ALL SELECT * FROM test
UNION ALL SELECT * FROM test
UNION ALL SELECT * FROM test_recursive
UNION ALL SELECT * FROM test_recursive
UNION ALL SELECT * FROM test_recursive
order by 1
All rows coming from the recursive CTE have the same random value. Therefore, that CTE was calculated only once.
This query also shows that with two extra lines (turning it into a recursive CTE with a UNION ALL SELECT * FROM ... WHERE FALSE branch), any CTE can be materialized.

Related

How can I make this query perform faster within Oracle 10g limitations?

I have an Oracle query that is performing horrendously and could do with some suggestions as to what could be the cause and/or suggestions on how to improve it. I have detailed below a simplified version of my original query and what I have tried.
Original Query
Select * From
(
SELECT * FROM table1
Union All
SELECT * FROM table2
Union All
SELECT * FROM table3
Union All
SELECT * FROM table4
) GroupedData
LEFT JOIN
(
SELECT * FROM RecursiveCte
) RecursiveCte ON GroupedData.id = RecursiveCte.id
I have simplified the queries to generic "select all" statements just for ease of this question.
A couple of points on some of the queries...
The GroupedData subquery is actually more than 4 unions, each one varies in the volume of data it is looking at but is limited in the data returned by date filters. The total data returned from this query is usually 1500 records, although the volume of data being processed could be hundreds of thousands of records. If I run this query on its own, it takes less than a second to return those 1500 rows.
The RecursiveCte subquery makes use of the CONNECT BY functionality as Oracle 10g doesn't have the recursive CTE (which would be so much easier). If I run this query on its own, it also takes less than a second.
The problem comes when I try and join the two together via a LEFT JOIN. When I do this, the query takes over 8 minutes to run for the same date range parameters.
I have tried setting these up in the following CTE formats but they all perform worse!
Method #1
WITH GroupedData AS
(
SELECT * FROM table1
Union All
SELECT * FROM table2
Union All
SELECT * FROM table3
Union All
SELECT * FROM table4
),
RecursiveCte AS
(
SELECT * FROM RecursiveCte
)
Select * From
GroupedData
LEFT JOIN RecursiveCte ON GroupedData.id = RecursiveCte.id
Method #2
WITH Query1 AS
(SELECT * FROM table1),
Query2 AS
(SELECT * FROM table2),
Query3 AS
(SELECT * FROM table3),
Query4 AS
(SELECT * FROM table4),
RecursiveCte AS
(
SELECT * FROM RecursiveCte
)
Select * From
(
Select * From Query1
Union All
Select * From Query2
Union All
Select * From Query3
Union All
Select * From Query4
) GroupedData
LEFT JOIN RecursiveCte ON GroupedData.id = RecursiveCte.id
On top of the limitations of Oracle 10g, I am also running as a database user with read-only permissions, which limits what I can do within the database.
Any help is very much appreciated, and sorry in advance if I have not provided enough context!
Thanks
When you have two queries that run fast separately but run slowly together, the easiest solution is usually to add a ROWNUM predicate to each of them, like below:
Select * From
(
...
--Prevent optimizer transformations to improve performance:
WHERE ROWNUM >= 1
) GroupedData
LEFT JOIN
(
SELECT * FROM RecursiveCte
--Prevent optimizer transformations to improve performance:
WHERE ROWNUM >= 1
) RecursiveCte ON GroupedData.id = RecursiveCte.id
See my answer here for a more detailed explanation of why this trick works.
While the above trick is often the easiest solution, it's usually not the best solution. There's always a reason why Oracle is re-writing your query poorly; maybe table statistics are missing, or the conditions are too complicated for Oracle to estimate the number of rows returned, etc. But if you don't want to spend hours investigating SQL Monitoring reports right now, it's OK to take a shortcut.

Missing rows when converting select to subquery

I am flabbergasted by these results. When I wrap a query in a subquery, the GROUP BY clause suddenly drops all rows with a certain value.
Could anyone help me figure out why this might happen?
Here is my problem in a nutshell; I do not understand the last result:
with my_cte as (
select complex stuff from tables
)
select checktype from my_cte group by checktype
This returns 2 rows: PRE,POST.
with my_cte as (
select complex stuff from tables
)
select distinct checktype from my_cte
This also returns 2 rows: PRE,POST.
with my_cte as (
select complex stuff from tables
)
select * from (
select distinct checktype from my_cte
)
This also returns 2 rows: PRE,POST
with my_cte as (
select complex stuff from tables
)
select * from (
select checktype from my_cte group by checktype
)
This returns only 1 row: PRE. Why?
The same thing happens if I use another CTE instead of a sub-query.
Why would a subquery in oracle suddenly drop all rows of a certain value?
Oracle version:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
After digging around I found that in my CTEs I use a TABLE function together with a UNION ALL, this seems to be the cause of the trouble:
WITH X AS (
SELECT DISTINCT
T.TS,
TRIM(REGEXP_SUBSTR(TREPSUMMARY, '.+', 1, LEVELS.COLUMN_VALUE)) AS MISSING,
DATE
FROM
BASE_POST_EXCEPTIONS T,
TABLE(CAST(MULTISET(SELECT LEVEL FROM DUAL CONNECT BY LEVEL <= LENGTH (REGEXP_REPLACE( T.CMPLEXTREPSUMMARY, '.+')) + 1) AS ODCINUMBERLIST)) LEVELS
WHERE REGEXP_SUBSTR(T.CMPLEXTREPSUMMARY, '.+', 1, LEVELS.COLUMN_VALUE) LIKE 'my query'
),
Y AS (
SELECT TS, MISSING, DATE FROM G
),
MY_UNION AS (
SELECT * FROM X /* CAUSED TROUBLE SOMEHOW */
UNION ALL
SELECT * FROM Y
)
In order to get around the bug, I had to hint to the query planner to materialize the tables before the UNION ALL:
MY_UNION AS (
SELECT /*+ materialize */ * FROM X
UNION ALL
SELECT /*+ materialize */ * FROM Y
)
No idea why this happens. Will try to reverse engineer and create a simple reproducible test case.

Creating SQL UNION where second side of the union depends on first side

I would like to perform a union of two queries where the second query depends on the first:
SELECT * FROM company_res t1
UNION
SELECT * FROM company_res t2
WHERE t2.company_id IN (
SELECT c.id
FROM company c
WHERE c.parent_id = t1.company_id
)
ORDER BY company_id, year_code
However, when I run this query in psql, I get an error to the effect that t1 in the second query does not have a FROM-clause entry.
Is it possible to have a UNION of two queries that depend on each other?
From your partial example, I think you're trying to write a recursive query rather than a classic UNION query; a recursive query is in fact an advanced form of UNION.
You need to perform some selections on company_res, and then add the child companies of these companies (those whose parent_id points at them).
The basic form is:
WITH RECURSIVE t(n) AS (
SELECT 1
UNION ALL
SELECT n+1 FROM t
)
SELECT n FROM t LIMIT 100;
In your case, something like this may work:
WITH RECURSIVE rectable(
company_id,
field2,
field3,
parent_id) AS (
-- here the starting rows, t1 in your example
SELECT
company_res.company_id,
company_res.field2,
company_res.field3,
company.parent_id
FROM company_res
INNER JOIN company ON company_res.company_id=company.id
WHERE (here any condition on the starting points)
UNION ALL
-- here the recursive part
SELECT
orig.company_id,
orig.field2,
orig.field3,
company.parent_id
FROM rectable rec,company_res orig
INNER JOIN company ON orig.company_id=company.id
WHERE company.parent_id=rec.company_id
-- here you could add some AND sections if you want
)
SELECT company_id,field2, field3,parent_id
FROM rectable
ORDER BY parent_id;
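A minimal, runnable sketch of the same idea, using Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite also supports WITH RECURSIVE). The tables, their contents and the amount column are hypothetical. Unlike the query above, this version recurses over company ids only and joins company_res afterwards, so a company with several company_res rows (one per year_code) does not multiply the rows inside the recursion. It also uses UNION rather than UNION ALL in the CTE, so a cycle in parent_id cannot make the recursion run forever.

```python
import sqlite3

# Hypothetical schema standing in for the question's tables:
# company(id, parent_id) holds the hierarchy,
# company_res(company_id, year_code, amount) holds per-company results.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (id INTEGER PRIMARY KEY, parent_id INTEGER);
CREATE TABLE company_res (company_id INTEGER, year_code INTEGER, amount INTEGER);
INSERT INTO company VALUES (1, NULL), (2, 1), (3, 2), (4, NULL);
INSERT INTO company_res VALUES
  (1, 2020, 10), (1, 2021, 11),
  (2, 2020, 20),
  (3, 2021, 30),
  (4, 2020, 40);
""")

QUERY = """
WITH RECURSIVE tree(company_id) AS (
    -- Anchor member: the starting companies (t1 in the question).
    SELECT id FROM company WHERE id = ?
    UNION
    -- Recursive member: children of any company already in the tree.
    SELECT c.id
    FROM company AS c
    JOIN tree AS t ON c.parent_id = t.company_id
)
SELECT r.company_id, r.year_code, r.amount
FROM company_res AS r
JOIN tree AS t ON r.company_id = t.company_id
ORDER BY r.company_id, r.year_code
"""

# Start from company 1: its child 2 and grandchild 3 are pulled in, 4 is not.
rows = conn.execute(QUERY, (1,)).fetchall()
for row in rows:
    print(row)
```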
The SELECT * FROM company_res t1 in your query is going to provide you with everything from company_res, regardless of what else you UNION it with from company_res. I doubt that's what you're looking for. See the answer from shahkalpesh.

PostgreSQL UNION takes 10 times as long as running the individual queries

I am trying to get the diff between two nearly identical tables in postgresql. The current query I am running is:
SELECT * FROM tableA EXCEPT SELECT * FROM tableB;
and
SELECT * FROM tableB EXCEPT SELECT * FROM tableA;
Each of the above queries takes about 2 minutes to run (it's a large table).
I wanted to combine the two queries in the hope of saving time, so I tried:
SELECT * FROM tableA EXCEPT SELECT * FROM tableB
UNION
SELECT * FROM tableB EXCEPT SELECT * FROM tableA;
And while it works, it takes 20 minutes to run! I expected it to take at most 4 minutes, the combined time of running each query individually.
Is there some extra work UNION is doing that makes it take so long? Or is there any way I can speed this up (with or without the UNION)?
UPDATE: Running the query with UNION ALL takes 15 minutes, almost 4 times as long as running both on their own. Am I correct in saying that UNION (ALL) is not going to speed this up at all?
Regarding your "extra work" question: yes. UNION not only combines the two queries but also goes through and removes duplicates. It's the same as applying DISTINCT.
For this reason, especially combined with your EXCEPT statements, UNION ALL would likely be faster.
Read more here:
http://www.postgresql.org/files/documentation/books/aw_pgsql/node80.html
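To see the difference between UNION and UNION ALL concretely, here is a small sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL; t1 and t2 are hypothetical tables. The duplicate removal shown here is the extra work that UNION ALL skips.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (x INTEGER);
CREATE TABLE t2 (x INTEGER);
INSERT INTO t1 VALUES (1), (2), (2);
INSERT INTO t2 VALUES (2), (3);
""")

# UNION removes duplicates across and within both inputs, like applying DISTINCT.
union_rows = conn.execute(
    "SELECT x FROM t1 UNION SELECT x FROM t2 ORDER BY x"
).fetchall()

# UNION ALL simply concatenates; every input row survives, duplicates included.
union_all_rows = conn.execute(
    "SELECT x FROM t1 UNION ALL SELECT x FROM t2 ORDER BY x"
).fetchall()

print(union_rows)      # [(1,), (2,), (3,)]
print(union_all_rows)  # [(1,), (2,), (2,), (2,), (3,)]
```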
In addition to combining the results of the first and second query, UNION by default also removes duplicate records. (see http://www.postgresql.org/docs/8.1/static/sql-select.html). The extra work involved in checking for duplicate records between the two queries is probably responsible for the extra time. In this situation there should not be any duplicate records so the extra work looking for duplicates can be avoided by specifying UNION ALL.
SELECT * FROM tableA EXCEPT SELECT * FROM tableB
UNION ALL
SELECT * FROM tableB EXCEPT SELECT * FROM tableA;
I don't think your code returns the result set you intend it to. I rather think you want to do this:
SELECT *
FROM (
SELECT * FROM tableA
EXCEPT
SELECT * FROM tableB
) AS T1
UNION
SELECT *
FROM (
SELECT * FROM tableB
EXCEPT
SELECT * FROM tableA
) AS T2;
In other words, you want the set of mutually exclusive members. If so, you need to read up on relational operator precedence in SQL ;) And when you have, you may realise the above can be rationalised to:
SELECT * FROM tableA
UNION
SELECT * FROM tableB
EXCEPT
SELECT * FROM tableA
INTERSECT
SELECT * FROM tableB;
FWIW, using subqueries (derived tables T1 and T2) to explicitly show (what would otherwise be implicit) relational operator precedence, your original query is this:
SELECT *
FROM (
SELECT *
FROM (
SELECT *
FROM tableA
EXCEPT
SELECT *
FROM tableB
) AS T2
UNION
SELECT *
FROM tableB
) AS T1
EXCEPT
SELECT *
FROM tableA;
The above can be rationalised to:
SELECT *
FROM tableB
EXCEPT
SELECT *
FROM tableA;
...and I think not what is intended.
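The precedence point can be checked with a small sketch in Python's built-in sqlite3 module, using hypothetical contents for tableA and tableB. SQLite groups all compound operators from left to right, which matches PostgreSQL for UNION and EXCEPT. It does not give INTERSECT higher precedence as PostgreSQL does, so the INTERSECT-based rewrite above is deliberately not run here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (x INTEGER);
CREATE TABLE tableB (x INTEGER);
INSERT INTO tableA VALUES (1), (2), (3);
INSERT INTO tableB VALUES (2), (3), (4);
""")

# The question's query. Grouped left to right it is
# ((A EXCEPT B) UNION B) EXCEPT A, which reduces to B EXCEPT A.
original = conn.execute("""
SELECT * FROM tableA EXCEPT SELECT * FROM tableB
UNION
SELECT * FROM tableB EXCEPT SELECT * FROM tableA
ORDER BY 1
""").fetchall()

# Derived tables make the intended grouping explicit:
# (A EXCEPT B) UNION (B EXCEPT A), i.e. the rows in exactly one table.
explicit = conn.execute("""
SELECT * FROM (SELECT * FROM tableA EXCEPT SELECT * FROM tableB) AS T1
UNION
SELECT * FROM (SELECT * FROM tableB EXCEPT SELECT * FROM tableA) AS T2
ORDER BY 1
""").fetchall()

print(original)  # [(4,)]
print(explicit)  # [(1,), (4,)]
```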
You could use tableA FULL OUTER JOIN tableB, which would give what you want (with a proper join condition) with only one scan of each table; it would probably be faster than the two queries above.
Post more info please.

Combining several query results into one table, how is the results order determined?

I am returning table results for different queries, but each table will be in the same format and they will all end up in one final table. If I want the results for query 1 to be listed first, query 2 second, etc., what is the easiest way to do it?
Does UNION append the table, or is the combination random?
The SQL standard does not guarantee an order unless one is explicitly requested with an ORDER BY clause. In practice, rows usually come back in the order the queries were written, but I would not rely on it if the order is important.
Across a union you can control the order like this...
select
this,
that
from
(
select
this,
that
from
table1
union
select
this,
that
from
table2
) t
order by
that,
this;
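In practice, UNION ALL usually returns the first query's rows first, but without an ORDER BY that is not guaranteed, and UNION's duplicate removal can reorder rows. To be safe, add a sort-key column and order by it.
You can use: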
SELECT Col1, Col2, ...
FROM (
SELECT Col1, Col2, ..., 1 AS intUnionOrder
FROM ...
UNION ALL
SELECT Col1, Col2, ..., 2 AS intUnionOrder
FROM ...
) AS T
ORDER BY intUnionOrder, ...
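A runnable sketch of the sort-key approach, using Python's built-in sqlite3 module; query1_src and query2_src are hypothetical tables standing in for the two queries. Within each query the rows are additionally sorted by name, so the output order is fully determined rather than left to the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE query1_src (name TEXT);
CREATE TABLE query2_src (name TEXT);
INSERT INTO query1_src VALUES ('zeta'), ('alpha');
INSERT INTO query2_src VALUES ('beta'), ('omega');
""")

# Tag each branch with a constant sort key, then order by it in the outer query.
rows = conn.execute("""
SELECT name
FROM (
    SELECT name, 1 AS intUnionOrder FROM query1_src
    UNION ALL
    SELECT name, 2 AS intUnionOrder FROM query2_src
) AS T
ORDER BY intUnionOrder, name
""").fetchall()

print(rows)  # [('alpha',), ('zeta',), ('beta',), ('omega',)]
```

The sort key only needs to be in the derived table; the outer SELECT can leave it out of its column list, as here.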