I am trying to get the diff between two nearly identical tables in postgresql. The current query I am running is:
SELECT * FROM tableA EXCEPT SELECT * FROM tableB;
and
SELECT * FROM tableB EXCEPT SELECT * FROM tableA;
Each of the above queries takes about 2 minutes to run (it's a large table).
I wanted to combine the two queries in hopes to save time, so I tried:
SELECT * FROM tableA EXCEPT SELECT * FROM tableB
UNION
SELECT * FROM tableB EXCEPT SELECT * FROM tableA;
And while it works, it takes 20 minutes to run! I would have guessed that it would take at most 4 minutes, the combined time of running each query individually.
Is there some extra work UNION is doing that is making it take so long? Or is there any way I can speed this up (with or without the UNION)?
UPDATE: Running the query with UNION ALL takes 15 minutes, almost 4 times as long as running both queries on their own. Am I correct in saying that UNION (ALL) is not going to speed this up at all?
Regarding your "extra work" question: yes. UNION not only combines the results of the two queries but also goes through them and removes duplicates, as if DISTINCT had been applied to the combined result.
For this reason, especially combined with your EXCEPT statements, UNION ALL would likely be faster.
Read more here:
http://www.postgresql.org/files/documentation/books/aw_pgsql/node80.html
In addition to combining the results of the first and second query, UNION by default also removes duplicate records. (see http://www.postgresql.org/docs/8.1/static/sql-select.html). The extra work involved in checking for duplicate records between the two queries is probably responsible for the extra time. In this situation there should not be any duplicate records so the extra work looking for duplicates can be avoided by specifying UNION ALL.
(SELECT * FROM tableA EXCEPT SELECT * FROM tableB)
UNION ALL
(SELECT * FROM tableB EXCEPT SELECT * FROM tableA);
I don't think your code returns the result set you intend it to. I think you want to do this instead:
SELECT *
FROM (
SELECT * FROM tableA
EXCEPT
SELECT * FROM tableB
) AS T1
UNION
SELECT *
FROM (
SELECT * FROM tableB
EXCEPT
SELECT * FROM tableA
) AS T2;
In other words, you want the set of mutually exclusive members. If so, you need to read up on relational operator precedence in SQL ;) And when you have, you may realise the above can be rationalised to:
SELECT * FROM tableA
UNION
SELECT * FROM tableB
EXCEPT
SELECT * FROM tableA
INTERSECT
SELECT * FROM tableB;
FWIW, using subqueries (derived tables T1 and T2) to explicitly show (what would otherwise be implicit) relational operator precedence, your original query is this:
SELECT *
FROM (
SELECT *
FROM (
SELECT *
FROM tableA
EXCEPT
SELECT *
FROM tableB
) AS T2
UNION
SELECT *
FROM tableB
) AS T1
EXCEPT
SELECT *
FROM tableA;
The above can be rationalised to:
SELECT *
FROM tableB
EXCEPT
SELECT *
FROM tableA;
...and I think not what is intended.
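The precedence point above can be checked on a toy data set. The following is a minimal sketch, not the original poster's setup: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the two-column table layout and sample rows are made up for illustration. SQLite, like PostgreSQL, evaluates a chain of UNION and EXCEPT from left to right, so the unparenthesised query collapses to "tableB EXCEPT tableA", while the derived-table version returns the rows unique to either table.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER, val TEXT);
    CREATE TABLE tableB (id INTEGER, val TEXT);
    INSERT INTO tableA VALUES (1, 'same'), (2, 'only in A');
    INSERT INTO tableB VALUES (1, 'same'), (3, 'only in B');
""")

# The query from the question: evaluated as ((A EXCEPT B) UNION B) EXCEPT A.
unparenthesised = sorted(conn.execute("""
    SELECT * FROM tableA EXCEPT SELECT * FROM tableB
    UNION
    SELECT * FROM tableB EXCEPT SELECT * FROM tableA
""").fetchall())

# Derived tables make the intended grouping explicit.
explicit = sorted(conn.execute("""
    SELECT * FROM (SELECT * FROM tableA EXCEPT SELECT * FROM tableB) AS T1
    UNION
    SELECT * FROM (SELECT * FROM tableB EXCEPT SELECT * FROM tableA) AS T2
""").fetchall())

print(unparenthesised)  # [(3, 'only in B')]
print(explicit)         # [(2, 'only in A'), (3, 'only in B')]
```

Note that SQLite does not give INTERSECT a higher precedence than UNION and EXCEPT, so the shorter UNION ... EXCEPT ... INTERSECT form shown earlier should be verified on PostgreSQL itself rather than with this sketch.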
You could use tableA FULL OUTER JOIN tableB, which (with a proper join condition) would give what you want with only 1 scan of each table; it would probably be faster than the 2 queries above.
Post more info please.
Related
I have an Oracle query that is performing horrendously and could do with some suggestions as to what could be the cause and/or suggestions on how to improve it. I have detailed below a simplified version of my original query and what I have tried.
Original Query
Select * From
(
SELECT * FROM table1
Union All
SELECT * FROM table2
Union All
SELECT * FROM table3
Union All
SELECT * FROM table4
) GroupedData
LEFT JOIN
(
SELECT * FROM RecursiveCte
) RecursiveCte ON GroupedData.id = RecursiveCte.id
I have simplified the queries to generic "select all" statements just for ease of this question.
A couple of points on some of the queries...
The GroupedData subquery is actually more than 4 unions, each one varies in the volume of data it is looking at but is limited in the data returned by date filters. The total data returned from this query is usually 1500 records, although the volume of data being processed could be hundreds of thousands of records. If I run this query on its own, it takes less than a second to return those 1500 rows.
The RecursiveCte subquery makes use of the CONNECT BY functionality as Oracle 10g doesn't have the recursive CTE (which would be so much easier). If I run this query on its own, it also takes less than a second.
The problem comes when I try and join the two together via a LEFT JOIN. When I do this, the query takes over 8 minutes to run for the same date range parameters.
I have tried setting these up in the following CTE formats but they all perform worse!
Method #1
WITH GroupedData AS
(
SELECT * FROM table1
Union All
SELECT * FROM table2
Union All
SELECT * FROM table3
Union All
SELECT * FROM table4
),
RecursiveCte AS
(
SELECT * FROM RecursiveCte
)
Select * From
GroupedData
LEFT JOIN RecursiveCte ON GroupedData.id = RecursiveCte.id
Method #2
WITH Query1 AS
(SELECT * FROM table1),
Query2 AS
(SELECT * FROM table2),
Query3 AS
(SELECT * FROM table3),
Query4 AS
(SELECT * FROM table4),
RecursiveCte AS
(
SELECT * FROM RecursiveCte
)
Select * From
(
Select * From Query1
Union All
Select * From Query2
Union All
Select * From Query3
Union All
Select * From Query4
) GroupedData
LEFT JOIN RecursiveCte ON GroupedData.id = RecursiveCte.id
On top of the limitations of Oracle 10g, I am also running with a database user with readonly permission which limits what I can do within the database.
Any help is very much appreciated, and sorry in advance if I have not provided enough context!
Thanks
When you have two queries that run fast separately but slowly together, the easiest solution is usually to add a ROWNUM predicate to each subquery, like below:
Select * From
(
...
--Prevent optimizer transformations to improve performance:
WHERE ROWNUM >= 1
) GroupedData
LEFT JOIN
(
SELECT * FROM RecursiveCte
--Prevent optimizer transformations to improve performance:
WHERE ROWNUM >= 1
) RecursiveCte ON GroupedData.id = RecursiveCte.id
See my answer here for a more detailed explanation of why this trick works.
While the above trick is often the easiest solution, it's usually not the best solution. There's always a reason why Oracle is re-writing your query poorly; maybe table statistics are missing, or the conditions are too complicated for Oracle to estimate the number of rows returned, etc. But if you don't want to spend hours investigating SQL Monitoring reports right now, it's OK to take a shortcut.
I have 2 queries. The first is what I want to accomplish; however, it was taking much too long. Here's what the query looked like:
SELECT *
FROM old_table
WHERE id NOT IN (SELECT id FROM new_table)
UNION ALL
SELECT *
FROM new_table
Basically, I want everything that's in the old table, but not in the new table. I then union everything from the old and new. Again, this query was taking much too long on a bigger dataset. So, I optimized it like so:
WITH union_tbl AS (
(SELECT * FROM old_table)
UNION ALL
(SELECT * FROM new_table)
), row_tbl AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY ordering) AS row
FROM union_tbl
)
SELECT *
FROM row_tbl
WHERE row = 1
In old_table and new_table (in a separate query) I add a new column called ordering. new_table gets 1 for all its rows and old_table gets 2 for all its rows. In the end, I select the rows where the row number is 1. That means that if there are 2 rows for an id, one from new_table and one from old_table, the new_table row should be picked.
This is how I'm envisioning it, however, I'm not getting the same results as my previous query. I'd expect it to have the same exact results with the same rows. What am I getting wrong? Is my query logic incorrect?
Can you try the below code?
SELECT a.*
FROM old_table a left join new_table b on a.id=b.id
WHERE b.id IS NULL
UNION ALL
SELECT *
FROM new_table
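This LEFT JOIN ... IS NULL form (an anti-join) returns the same rows as the NOT IN version as long as id is never NULL in either table, and it is often much faster on large tables.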
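As a quick sanity check of that rewrite, here is a small sketch using Python's built-in sqlite3 module. The column layout and sample rows are assumptions made up for illustration, and SQLite is only a stand-in for whichever database the question uses. It runs the original NOT IN query and the LEFT JOIN version side by side and compares the results.

```python
import sqlite3

# Toy tables; ids are declared NOT NULL because NOT IN and the anti-join
# can disagree when NULL ids are present.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE old_table (id INTEGER NOT NULL, label TEXT);
    CREATE TABLE new_table (id INTEGER NOT NULL, label TEXT);
    INSERT INTO old_table VALUES (1, 'old 1'), (2, 'old 2'), (3, 'old 3');
    INSERT INTO new_table VALUES (2, 'new 2'), (4, 'new 4');
""")

# Original formulation with NOT IN.
not_in_rows = sorted(conn.execute("""
    SELECT * FROM old_table
    WHERE id NOT IN (SELECT id FROM new_table)
    UNION ALL
    SELECT * FROM new_table
""").fetchall())

# Anti-join formulation: keep old rows that found no partner in new_table.
anti_join_rows = sorted(conn.execute("""
    SELECT a.*
    FROM old_table a LEFT JOIN new_table b ON a.id = b.id
    WHERE b.id IS NULL
    UNION ALL
    SELECT * FROM new_table
""").fetchall())

print(not_in_rows == anti_join_rows)  # True
print(anti_join_rows)
```

On this data both queries keep ids 1 and 3 from old_table and take ids 2 and 4 from new_table.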
Thanks in advance!
I want to get the data below into a separate table with columns. How can we achieve this?
From my reading of your question, you would like the results of that SELECT statement put into a new table?
Firstly, I'm assuming your original SQL works as a SELECT statement - e.g., all those tables have the same structure. Note that you can simplify the unions, but I haven't done so here, to keep the key part of the answer (saving the data) as the main focus.
To save the data into another table, you can either create a table first and make that into an insert, or just use 'SELECT INTO' within the main SELECT.
If you are happy with the columns being created automatically, the 'SELECT INTO' version will create them for you (i.e., you do not need to specify the columns in a CREATE TABLE statement). However, SELECT INTO always creates the table, so it cannot be run again against a table that already exists. If you want to insert further rows later, use INSERT INTO and specify the column list (or have matching column lists).
SELECT INTO version
select *
INTO #Temp -- Added This row
from
( select * from #OneyearExpiry
union all
select * from #OtherYearExpiry
) A
except
select * from
( select * from #ONEYRCON
union all
select * from #OTHERYRCON
) B
INSERT INTO version
CREATE TABLE #Temp (<your fields here to match the SELECT statement>)
INSERT INTO #Temp
select * from
( select * from #OneyearExpiry
union all
select * from #OtherYearExpiry
) A
except
select * from
( select * from #ONEYRCON
union all
select * from #OTHERYRCON
) B
UNION and EXCEPT have the same precedence and are evaluated from top to bottom, so there only needs to be 1 subquery. Something like this:
select ab.* into #Temp
from (select * from #OneyearExpiry
union all
select * from #OtherYearExpiry
except
select * from #ONEYRCON
except
select * from #OTHERYRCON) ab;
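To see that the flattened form matches the nested one, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration under assumptions: ordinary tables with a single made-up id column stand in for the #temp tables, and SQLite stands in for SQL Server (both evaluate a chain of UNION ALL and EXCEPT from top to bottom).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OneyearExpiry (id INTEGER);
    CREATE TABLE OtherYearExpiry (id INTEGER);
    CREATE TABLE ONEYRCON (id INTEGER);
    CREATE TABLE OTHERYRCON (id INTEGER);
    INSERT INTO OneyearExpiry VALUES (1), (2), (3);
    INSERT INTO OtherYearExpiry VALUES (4), (5);
    INSERT INTO ONEYRCON VALUES (2);
    INSERT INTO OTHERYRCON VALUES (5);
""")

# Nested form: (A UNION ALL B) EXCEPT (C UNION ALL D).
nested = sorted(conn.execute("""
    SELECT * FROM (SELECT * FROM OneyearExpiry
                   UNION ALL
                   SELECT * FROM OtherYearExpiry) A
    EXCEPT
    SELECT * FROM (SELECT * FROM ONEYRCON
                   UNION ALL
                   SELECT * FROM OTHERYRCON) B
""").fetchall())

# Flat form: ((A UNION ALL B) EXCEPT C) EXCEPT D, evaluated top to bottom.
flat = sorted(conn.execute("""
    SELECT * FROM OneyearExpiry
    UNION ALL
    SELECT * FROM OtherYearExpiry
    EXCEPT
    SELECT * FROM ONEYRCON
    EXCEPT
    SELECT * FROM OTHERYRCON
""").fetchall())

print(nested)  # [(1,), (3,), (4,)]
print(flat)    # [(1,), (3,), (4,)]
```

The rewrite works because removing the rows of C and then the rows of D is the same as removing the rows of C and D together; it would not carry over unchanged to a chain that mixes in INTERSECT.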
I have two tables that are exactly the same. I want to join them together into one large dataset. I tried simply SELECT-INTO query but got an error...
SELECT * INTO dbo.ParkingBay
FROM (SELECT * FROM dbo.ParkingBay_Old
UNION
SELECT * FROM dbo.ParkingBay_New) AS PARKING_BAY;
The error is:
The geometry data type cannot be selected as DISTINCT because it is
not comparable.
The UNION performs a DISTINCT on the combined result set.
UNION ALL eliminates this DISTINCT step, but would create the possibility of dupes in the result.
If you are OK with dupe possibility, then try this
SELECT * INTO dbo.ParkingBay
FROM (SELECT * FROM dbo.ParkingBay_Old
UNION ALL
SELECT * FROM dbo.ParkingBay_New) AS PARKING_BAY;
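The difference between the two operators can be seen on a tiny example. This is only a sketch: it uses Python's built-in sqlite3 module instead of SQL Server, and the tables have plain made-up columns (no geometry column), so it shows the duplicate handling rather than the original error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ParkingBay_Old (id INTEGER, name TEXT);
    CREATE TABLE ParkingBay_New (id INTEGER, name TEXT);
    INSERT INTO ParkingBay_Old VALUES (1, 'A'), (2, 'B');
    INSERT INTO ParkingBay_New VALUES (2, 'B'), (3, 'C');
""")

# UNION compares whole rows and drops the repeated (2, 'B').
union_rows = sorted(conn.execute("""
    SELECT * FROM ParkingBay_Old
    UNION
    SELECT * FROM ParkingBay_New
""").fetchall())

# UNION ALL skips the comparison and keeps every row from both inputs.
union_all_rows = sorted(conn.execute("""
    SELECT * FROM ParkingBay_Old
    UNION ALL
    SELECT * FROM ParkingBay_New
""").fetchall())

print(len(union_rows))      # 3
print(len(union_all_rows))  # 4
```

If the two source tables can share rows, the duplicates that UNION ALL keeps would have to be removed some other way, for example by filtering on a key column.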
It looks like ALL solves everything:
SELECT * INTO dbo.ParkingBay
FROM (SELECT * FROM dbo.ParkingBay_Old
UNION ALL
SELECT * FROM dbo.ParkingBay_New) AS PARKING_BAY;
For this MySQL SELECT statement:
SELECT * FROM MY_TABLE WHERE ID IN(x,y,y,z);
I want 4 rows back, i.e. I WANT row duplication for the case where I pass duplicate IDs in the list.
Is this possible?
Using the IN() construct, that's not possible.
The only way I can think of to do this is with a UNION:
SELECT * FROM my_table WHERE id = x
UNION ALL
SELECT * FROM my_table WHERE id = y
UNION ALL
SELECT * FROM my_table WHERE id = y
UNION ALL
SELECT * FROM my_table WHERE id = z
But in all honesty, I would just do the IN() like you have it and make your app code duplicate the rows as needed.
Put your IDs, including dups, in a temp table and join your results on that table. The join will take care of filtering, but will keep duplicates if an ID is in the temp table twice.
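A minimal sketch of the temp-table approach, assuming Python's built-in sqlite3 module as a stand-in for MySQL. The table contents and the temp table name wanted_ids are made up for illustration, with ids 1, 2, 3 playing the roles of x, y, z.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO my_table VALUES (1, 'x'), (2, 'y'), (3, 'z'), (4, 'unused');

    -- Hypothetical temp table holding the requested ids, duplicate included.
    CREATE TEMP TABLE wanted_ids (id INTEGER);
    INSERT INTO wanted_ids VALUES (1), (2), (2), (3);
""")

# The inner join filters my_table and emits one output row per requested id,
# so the id listed twice comes back twice.
rows = sorted(conn.execute("""
    SELECT t.*
    FROM wanted_ids w
    JOIN my_table t ON t.id = w.id
""").fetchall())

print(rows)  # [(1, 'x'), (2, 'y'), (2, 'y'), (3, 'z')]
```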
SELECT * FROM MY_TABLE WHERE ID IN(x,y,z)
union all
SELECT * FROM MY_TABLE WHERE ID IN(y)
To me, IN specifies a set of values to search in, and duplication is a concept that conflicts with the idea of a set.
You should use some other means to achieve your goal.