UNION with WHERE clause - sql

I'm doing a UNION of two queries on an Oracle database. Both of them have a WHERE clause. Is there a difference in the performance if I do the WHERE after UNIONing the queries compared to performing the UNION after WHERE clause?
For example:
SELECT colA, colB FROM tableA WHERE colA > 1
UNION
SELECT colA, colB FROM tableB WHERE colA > 1
compared to:
SELECT *
FROM (SELECT colA, colB FROM tableA
UNION
SELECT colA, colB FROM tableB)
WHERE colA > 1
I believe in the second case, it performs a full table scan on both the tables affecting the performance. Is that correct?

In my experience, Oracle is very good at pushing simple predicates around. The following test was made on Oracle 11.2. I'm fairly certain it produces the same execution plan on all releases of 10g as well.
(Please people, feel free to leave a comment if you run an earlier version and tried the following)
create table table1(a number, b number);
create table table2(a number, b number);
explain plan for
select *
from (select a,b from table1
union
select a,b from table2
)
where a > 1;
select *
from table(dbms_xplan.display(format=>'basic +predicate'));
PLAN_TABLE_OUTPUT
---------------------------------------
| Id | Operation | Name |
---------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | VIEW | |
| 2 | SORT UNIQUE | |
| 3 | UNION-ALL | |
|* 4 | TABLE ACCESS FULL| TABLE1 |
|* 5 | TABLE ACCESS FULL| TABLE2 |
---------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("A">1)
5 - filter("A">1)
As you can see at steps (4,5), the predicate is pushed down and applied before the sort (union).
I couldn't get the optimizer to push down an entire sub query such as
where a = (select max(a) from empty_table)
or a join. With proper PK/FK constraints in place it might be possible, but clearly there are limitations :)

NOTE: While my advice was true many years ago, Oracle's optimizer has improved so that the location of the where definitely no longer matters here. However preferring UNION ALL vs UNION will always be true, and portable SQL should avoid depending on optimizations that may not be in all databases.
Short answer, you want the WHERE before the UNION and you want to use UNION ALL if at all possible. If you are using UNION ALL then check the EXPLAIN output, Oracle might be smart enough to optimize the WHERE condition if it is left after.
The reason is the following. The definition of a UNION says that if there are duplicates in the two data sets, they have to be removed. Therefore there is an implicit GROUP BY in that operation, which tends to be slow. Worse yet, Oracle's optimizer (at least as of 3 years ago, and I don't think it has changed) doesn't try to push conditions through a GROUP BY (implicit or explicit). Therefore Oracle has to construct larger data sets than necessary, group them, and only then gets to filter. Thus prefiltering wherever possible is officially a Good Idea. (This is, incidentally, why it is important to put conditions in the WHERE whenever possible instead of leaving them in a HAVING clause.)
Furthermore if you happen to know that there won't be duplicates between the two data sets, then use UNION ALL. That is like UNION in that it concatenates datasets, but it doesn't try to deduplicate data. This saves an expensive grouping operation. In my experience it is quite common to be able to take advantage of this operation.
Since UNION ALL does not have an implicit GROUP BY in it, it is possible that Oracle's optimizer knows how to push conditions through it. I don't have Oracle sitting around to test, so you will need to test that yourself.

Just a caution
If you tried
SELECT colA, colB FROM tableA WHERE colA > 1
UNION
SELECT colX, colA FROM tableB WHERE colA > 1
compared to:
SELECT *
FROM (SELECT colA, colB FROM tableA
UNION
SELECT colX, colA FROM tableB)
WHERE colA > 1
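In the second query, the outer WHERE filters on the first column of the combined result. For rows coming from tableB that column holds colX, not colA, so the two queries are no longer equivalent. When columns are lined up by position under different names like this, it gets confusing quickly.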
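The caution above can be reproduced with a small script. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption, not what the question runs on), and the sample rows are made up for illustration.

```python
import sqlite3

# SQLite stands in for Oracle; table and column names follow the example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (colA INTEGER, colB INTEGER);
    CREATE TABLE tableB (colX INTEGER, colA INTEGER);
    INSERT INTO tableA VALUES (5, 10);
    INSERT INTO tableB VALUES (0, 7);  -- colX = 0, colA = 7
""")

# Filter inside each branch: tableB is filtered on its own colA (7 > 1 passes).
inner_filtered = conn.execute("""
    SELECT colA, colB FROM tableA WHERE colA > 1
    UNION
    SELECT colX, colA FROM tableB WHERE colA > 1
""").fetchall()

# Filter outside: the result columns take their names from the first SELECT,
# so "colA" now means the first position, which holds tableB.colX (0).
outer_filtered = conn.execute("""
    SELECT * FROM (SELECT colA, colB FROM tableA
                   UNION
                   SELECT colX, colA FROM tableB)
    WHERE colA > 1
""").fetchall()

print(sorted(inner_filtered))  # tableB's row (0, 7) is kept
print(sorted(outer_filtered))  # tableB's row is dropped because 0 > 1 is false
```

The two queries return different rows for the same data, which is the point of the caution.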

You need to look at the explain plans, but unless there is an INDEX or PARTITION on COL_A, you are looking at a FULL TABLE SCAN on both tables.
With that in mind, your first example is throwing out some of the data as it does the FULL TABLE SCAN. That result is being sorted by the UNION, then duplicate data is dropped. This gives you your result set.
In the second example, you are pulling the full contents of both tables. That result is likely to be larger. So the UNION is sorting more data, then dropping the duplicate stuff. Then the filter is being applied to give you the result set you are after.
As a general rule, the earlier you filter away data, the smaller the data set, and the faster you will get your results. As always, your mileage may vary.

SELECT * FROM (SELECT colA, colB FROM tableA UNION SELECT colA, colB FROM tableB) tableC WHERE tableC.colA > 1
Some databases (MySQL and SQL Server, for example) require a name for the subquery, tableC in the query above. Oracle does not require one, and it does not accept the AS keyword before a table alias, so the alias is written without AS here. The WHERE condition then becomes WHERE tableC.colA > 1.

I would make sure you have an index on ColA, and then run both of them and time them. That would give you the best answer.

I think it will depend on many things. Run EXPLAIN PLAN on each one to see what your optimizer selects. Otherwise, as @rayman suggests, run them both and time them.

SELECT colA, colB FROM tableA WHERE colA > 1
UNION
SELECT colX, colA FROM tableB

SELECT *
FROM (SELECT * FROM can
UNION
SELECT * FROM employee) e
WHERE e.id = 1;

Related

Oracle ordered resultset without order by clause

Is there a way to get "ordered" resultset from oracle table without actually using an "ORDER BY" clause?
I am working on an application that reads data from oracle table (which has no unique column) and I want to introduce some sort of resume mechanism so that in case of query failure (e.g. network error during fetch) we avoid reading rows that are fetched already.
The application is developed using oracle OCI and currently simple select queries are used.
Is there any efficient mechanism to achieve this?
In some very special conditions you get a defined order of results without giving any ORDER BY clause. However, you should not rely on that; Oracle may change this behaviour at any time.
Maybe you can count the total number of rows (read SQL%ROWCOUNT after execution of the query) and compare this number with the records received on your client.
As Wernfried pointed out, there is no reliable way to get ordered results without any ORDER BY. But the question assumes that ORDER BY is impossible because there are no unique columns. There are at least two workarounds to this.
1. ROWID. Every Oracle row has a unique pseudo-column, ROWID. The application could ORDER BY ROWID, store the latest ROWID, and then use WHERE ROWID > :rowid to pick up where it left off. Note that ROWIDs can change if the table is modified or moved.
2. ROW_NUMBER. Another option is to sort all the data and keep track of the duplicates. If two rows are exactly the same then it does not matter which of the duplicates were returned and processed. The query and application only need to track how many of them have been processed. Then it can later process the rest.
drop table test1;
create table test1(a number);
insert into test1 values(1);
insert into test1 values(1);
insert into test1 values(2);
commit;
select a ,row_number() over (order by a /*and all other columns*/) rowNumber
from test1
order by rowNumber
A ROWNUMBER
1 1 --Am I the real #1? It doesn't matter.
1 2
2 3
If there was a failure after the first row, wrapping the query in an inline view and adding the predicate where rownumber > :last_rownumber_processed will get the rest of the rows (the analytic function cannot be filtered at the same query level that computes it). The second query may return the "first" 1 instead of the "second" 1, but the application won't care. As with the first workaround, this will fail if the data changes between runs.
Either way, the query must pay for sorting:
----------------------------
| Id | Operation |
----------------------------
| 0 | SELECT STATEMENT |
| 1 | WINDOW SORT |
| 2 | TABLE ACCESS FULL|
----------------------------
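The ROW_NUMBER workaround can be sketched end to end as follows. This is a sketch under assumptions: Python's sqlite3 module stands in for Oracle and OCI, SQLite 3.25 or later is assumed for window-function support, and the name RESUME_SQL is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test1 (a INTEGER);
    INSERT INTO test1 VALUES (1);
    INSERT INTO test1 VALUES (1);
    INSERT INTO test1 VALUES (2);
""")

# The window function is computed in an inline view and filtered outside,
# because it cannot be referenced in the WHERE of the level that defines it.
RESUME_SQL = """
    SELECT a, rownumber
    FROM (SELECT a, row_number() OVER (ORDER BY a) AS rownumber FROM test1)
    WHERE rownumber > ?
    ORDER BY rownumber
"""

first_run = conn.execute(RESUME_SQL, (0,)).fetchall()

# Pretend the fetch failed after the first row was processed.
last_rownumber_processed = first_run[0][1]
resumed = conn.execute(RESUME_SQL, (last_rownumber_processed,)).fetchall()

print(first_run)
print(resumed)
```

The resumed run skips exactly as many rows as were already processed. Which physical duplicate comes back does not matter, as long as the data has not changed between runs.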
case 1:
In order to achieve a simple resume mechanism the only simple way is to use RowID.
select t?.rowid, t1.*, t2.*, t3.*
from table1 t1,
table2 t2,
table3 t3
where t1.? = t2.?
and t2.? = t3.?
and t?.rowid > :rowidProcessedPriorNetworkFailure
order by t?.rowid
Reason for t?.rowid is because you have to choose the leaf table.
in the following case you should choose t3 as t?.
T2 - T1 : one T2 record may have one or more T1 records
T1 - T3 : one T1 record may have one or more T3 records
But keep in mind that ROWIDs will change whenever Oracle maintains the underlying physical structure (i.e. defragmentation, reorganization, or moving the table to a new datafile).
case 2:
if it is a join, select the leaf table.
Select a column that has the most distinct values. Order by that column. and whenever you have to resume, reverse operations done with the last value on hand.
Hope this will help you.
I have used the following query to get ordered output without writing the words "order by". Here, I want the rows sorted by empno without using an ORDER BY clause.
Query:
Select empno, sal, deptno, max(sal) over(partition by empno) from emp;
You can drop any of the columns except empno and the max() analytic function. Note that the ordering is only a side effect of the sort Oracle performs to evaluate the analytic function; it is not guaranteed.
Let me know if you have any questions.

Is there some way to do the following in SQL?

Let's imagine that we have these 2 tables:
Table 1, with the column:
Field1
1
3
Table 2, with the column:
Field1
2
4
(Well they could also be called in any other way, but I want to represent that the type of table1.field1 is the same as table2.field1).
Would it be possible to do a SQL query that would return the following?
[1,2,3,4], I mean the numbers ordered by any criterion I want, with that criterion applying to both tables. As far as I know, ORDER BY can only order by the values of a column, not by a general criterion like "from lower to higher number". And even if it could, I believe the SELECT statement can't fuse columns. I think the best I could achieve with that statement would be something like [(1,2),(1,4),(3,2),(3,4)], which I would then have to process further, and that can be painful with lots of results.
And the application needs fields to be on different tables, I cannot merge them.
Any idea about how to deal with this?
Thanks a lot for your help.
Edit:
Oh, it was much easier than I thought; with that statement it is not hard to achieve.
Thank you everyone.
This is what the UNION statement is for. It lets you combine two SELECT statements into the same resultset:
SELECT Field1
FROM Table1
UNION ALL
SELECT Field1
FROM Table2
ORDER BY 1
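A quick way to check that this returns the numbers from both tables in one sorted column is the sketch below. It assumes Python's sqlite3 module as a convenient test bed; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Field1 INTEGER);
    CREATE TABLE Table2 (Field1 INTEGER);
    INSERT INTO Table1 VALUES (1), (3);
    INSERT INTO Table2 VALUES (2), (4);
""")

# ORDER BY 1 sorts the combined result by its first (and only) column.
rows = conn.execute("""
    SELECT Field1 FROM Table1
    UNION ALL
    SELECT Field1 FROM Table2
    ORDER BY 1
""").fetchall()

merged = [value for (value,) in rows]
print(merged)
```

The ORDER BY applies to the whole combined result, not just to the second SELECT.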
You can do a union, like below:
Select field1
from
(Select field1 from table1
Union
select field1 from table2)
order by field1
Use UNION or UNION ALL depending on whether elements that appear in both tables should be repeated or not.
select * from
(
select field1 as field_value from table1
union
select field1 as field_value from table2
)
order by field_value asc

distinct values from multiple fields within one table ORACLE SQL

How can I get distinct values from multiple fields within one table with just one request.
Option 1
SELECT WM_CONCAT(DISTINCT(FIELD1)) FIELD1S,WM_CONCAT(DISTINCT(FIELD2)) FIELD2S,..FIELD10S
FROM TABLE;
WM_CONCAT is LIMITED
Option 2
select DISTINCT(FIELD1) FIELDVALUE, 'FIELD1' FIELDNAME
FROM TABLE
UNION
select DISTINCT(FIELD2) FIELDVALUE, 'FIELD2' FIELDNAME
FROM TABLE
... FIELD 10
is just too slow
if you were scanning a small range in the data (not full scanning the whole table) you could use WITH to optimise your query
e.g:
WITH a AS
(SELECT field1,field2,field3..... FROM TABLE WHERE condition)
SELECT field1 FROM a
UNION
SELECT field2 FROM a
UNION
SELECT field3 FROM a
.....etc
For my problem, I had
WL1 ... WL2 ... correlation
A B 0.8
B A 0.8
A C 0.9
C A 0.9
how to eliminate the symmetry from this table?
select WL1, WL2,correlation from
table
where least(WL1,WL2)||greatest(WL1,WL2) = WL1||WL2
order by WL1
this gives
WL1 ... WL2 ... correlation
A B 0.8
A C 0.9
:)
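The symmetry filter above can be tried out with the sketch below. Assumptions: Python's sqlite3 module is used, SQLite's multi-argument min() and max() stand in for Oracle's LEAST and GREATEST, and the table name correlations is made up. The second query, comparing the columns directly, is an alternative that is not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE correlations (WL1 TEXT, WL2 TEXT, correlation REAL);
    INSERT INTO correlations VALUES
        ('A', 'B', 0.8), ('B', 'A', 0.8), ('A', 'C', 0.9), ('C', 'A', 0.9);
""")

# Keep a row only when its pair is already in (smaller, larger) order.
deduplicated = conn.execute("""
    SELECT WL1, WL2, correlation
    FROM correlations
    WHERE min(WL1, WL2) || max(WL1, WL2) = WL1 || WL2
    ORDER BY WL1, WL2
""").fetchall()

# Alternative: compare the columns directly, without concatenating them.
direct = conn.execute("""
    SELECT WL1, WL2, correlation
    FROM correlations
    WHERE WL1 < WL2
    ORDER BY WL1, WL2
""").fetchall()

print(deduplicated)
print(direct)
```

For single-character labels both filters agree. With longer labels the concatenation can become ambiguous (for example 'BB' and 'B' concatenate to the same string in either order), so the direct comparison may be the safer choice.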
The best option in the SQL is the UNION, though you may be able to save some performance by taking out the distinct keywords:
select FIELD1 FROM TABLE
UNION
select FIELD2 FROM TABLE
UNION provides the unique set from two tables, so distinct is redundant in this case. There simply isn't any way to write this query differently to make it perform faster. There's no magic formula that makes searching 200,000+ rows faster. It's got to search every row of the table twice and sort for uniqueness, which is exactly what UNION will do.
The only way you can make it faster is to create separate indexes on the two fields (maybe) or pare down the set of data that you're searching across.
Alternatively, if you're doing this a lot and adding new fields rarely, you could use a materialized view to store the result and only refresh it periodically.
Incidentally, your second query doesn't appear to do what you want it to. Distinct always applies to all of the columns in the select section, so your constants with the field names will cause the query to always return separate rows for the two columns.
I've come up with another method that, experimentally, seems to be a little faster. In effect, this allows us to trade one full-table scan for a Cartesian join. In most cases, I would still opt to use the union as it's much more obvious what the query is doing.
SELECT DISTINCT CASE lvl WHEN 1 THEN field1 ELSE field2 END
FROM table
CROSS JOIN (SELECT LEVEL lvl
FROM DUAL
CONNECT BY LEVEL <= 2);
It's also worthwhile to add that I tested both queries on a table without useful indexes containing 800,000 rows and it took roughly 45 seconds (returning 145,000 rows). However, most of that time was spent actually fetching the records, not running the query (the query took 3-7 seconds). If you're getting a sizable number of rows back, it may simply be the number of rows that is causing the performance issue you're seeing.
When you take distinct values from multiple columns independently, the result is no longer a proper table. Consider the following data:
Column A   Column B
10         50
30         50
10         50
The distinct values are 2 rows for the first column and 1 row for the second column, so they cannot be lined up as rows of a single table.
And something like this?
SELECT 'FIELD1',FIELD1, 'FIELD2',FIELD2,...
FROM TABLE
GROUP BY FIELD1,FIELD2,...

MySQL: Specifying the order I'd like rows returned in

I have a SQL statement something along these lines:
SELECT * FROM `table` WHERE some_column IN(1,58,22,9);
What I would like is to return the rows in the same order as the some_column values are specified, i.e. 1 before 58 before 22 before 9. The problem is that I have no column that, when sorted, will produce this specific order of rows.
Is there any way I can achieve this?
Use the FIND_IN_SET function:
SELECT *
FROM `table`
WHERE some_column IN(1,58,22,9)
ORDER BY FIND_IN_SET(some_column, '1,58,22,9')
You can use a case to achieve pretty much any sort order:
select *
from TheTable
where some_column in (1,58,22,9)
order by
case some_column
when 9 then 1
when 22 then 2
when 58 then 3
when 1 then 4
end
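To avoid hand-writing the CASE arms, they can be generated from the list of wanted values. The sketch below assumes Python's sqlite3 module rather than MySQL, uses the order asked for in the question (1, 58, 22, 9), and adds an extra row with the value 7 purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TheTable (some_column INTEGER);
    INSERT INTO TheTable VALUES (9), (1), (7), (22), (58);
""")

wanted = [1, 58, 22, 9]  # desired output order

# One placeholder per value for the IN list, one CASE arm per value for the sort.
in_list = ", ".join("?" for _ in wanted)
case_arms = " ".join(f"WHEN ? THEN {rank}" for rank, _ in enumerate(wanted))
sql = (
    f"SELECT some_column FROM TheTable WHERE some_column IN ({in_list}) "
    f"ORDER BY CASE some_column {case_arms} END"
)

# The values are bound twice: first for the IN list, then for the CASE arms.
rows = conn.execute(sql, wanted + wanted).fetchall()
ordered = [value for (value,) in rows]
print(ordered)
```

Only the rank numbers are interpolated into the SQL text; the values themselves are passed as bound parameters.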
There's a couple of good solutions here already for ordering using the SQL statement. However taking such an approach has the drawback that it is inflexible: every time you need to select a different set of values you have to modify the SQL statement.
That may be fine if you're simply after a quick and dirty query to analyse data, but if you are likely to need to re-run the query, or modify it, or indeed this is in any sort of production environment whatsoever, then a better solution is to use a sort table. This should contain two columns - your source and your sort value - then write SQL that does the join and returns values ordered by the sort column.
For example:
TheSortTable
SomeColumn SortValue
9 1
22 2
58 3
1 4
Then your SQL is
SELECT TheTable.SomeColumn, SomeValue FROM TheTable
INNER JOIN TheSortTable on TheTable.SomeColumn = TheSortTable.SomeColumn
WHERE TheTable.SomeColumn IN(1,58,22,9)
ORDER BY SortValue
In practical terms this is far superior to explicitly coding your solution directly into SQL for anything other than a strictly once-off query (and indeed, the general philosophy of the approach is one worth adopting more widely).
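The sort-table approach can be sketched as follows, assuming Python's sqlite3 module and made-up sample values for SomeValue. The point it illustrates is that the order can be changed by updating data while the query text stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TheTable (SomeColumn INTEGER, SomeValue TEXT);
    CREATE TABLE TheSortTable (SomeColumn INTEGER, SortValue INTEGER);
    INSERT INTO TheTable VALUES (1, 'a'), (9, 'b'), (22, 'c'), (58, 'd'), (7, 'e');
    INSERT INTO TheSortTable VALUES (9, 1), (22, 2), (58, 3), (1, 4);
""")

# Column references are qualified because both tables have a SomeColumn.
QUERY = """
    SELECT t.SomeColumn, t.SomeValue
    FROM TheTable t
    INNER JOIN TheSortTable s ON t.SomeColumn = s.SomeColumn
    ORDER BY s.SortValue
"""

before = [row[0] for row in conn.execute(QUERY)]

# Change the desired order by editing the sort table only.
conn.executescript("""
    UPDATE TheSortTable SET SortValue = 1 WHERE SomeColumn = 1;
    UPDATE TheSortTable SET SortValue = 2 WHERE SomeColumn = 58;
    UPDATE TheSortTable SET SortValue = 3 WHERE SomeColumn = 22;
    UPDATE TheSortTable SET SortValue = 4 WHERE SomeColumn = 9;
""")
after = [row[0] for row in conn.execute(QUERY)]

print(before)
print(after)
```

Because the inner join already restricts the result to values present in the sort table, the IN list becomes optional in this sketch.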

What is the difference between UNION and UNION ALL?

What is the difference between UNION and UNION ALL?
UNION removes duplicate records (where all columns in the results are the same), UNION ALL does not.
There is a performance hit when using UNION instead of UNION ALL, since the database server must do additional work to remove the duplicate rows, but usually you do not want the duplicates (especially when developing reports).
To identify duplicates, records must be comparable types as well as compatible types. This will depend on the SQL system. For example the system may truncate all long text fields to make short text fields for comparison (MS Jet), or may refuse to compare binary fields (ORACLE)
UNION Example:
SELECT 'foo' AS bar UNION SELECT 'foo' AS bar
Result:
+-----+
| bar |
+-----+
| foo |
+-----+
1 row in set (0.00 sec)
UNION ALL example:
SELECT 'foo' AS bar UNION ALL SELECT 'foo' AS bar
Result:
+-----+
| bar |
+-----+
| foo |
| foo |
+-----+
2 rows in set (0.00 sec)
Both UNION and UNION ALL concatenate the result of two different SQLs. They differ in the way they handle duplicates.
UNION performs a DISTINCT on the result set, eliminating any duplicate rows.
UNION ALL does not remove duplicates, and is therefore faster than UNION.
Note: When using these commands, the corresponding selected columns need to be of the same (or compatible) data types.
Example: If we have two tables, 1) Employee and 2) Customer
[Images not preserved: Employee table data; Customer table data; UNION example (removes all duplicate records); UNION ALL example (just concatenates records without eliminating duplicates, so it is faster than UNION)]
UNION removes duplicates, whereas UNION ALL does not.
In order to remove duplicates the result set must be sorted, and this may have an impact on the performance of the UNION, depending on the volume of data being sorted and the settings of various RDBMS parameters (for Oracle, PGA_AGGREGATE_TARGET with WORKAREA_SIZE_POLICY=AUTO, or SORT_AREA_SIZE and SORT_AREA_RETAINED_SIZE if WORKAREA_SIZE_POLICY=MANUAL).
Basically, the sort is faster if it can be carried out in memory, but the same caveat about the volume of data applies.
Of course, if you need data returned without duplicates then you must use UNION, depending on the source of your data.
I would have commented on the first post to qualify the "is much less performant" comment, but have insufficient reputation (points) to do so.
In ORACLE: UNION does not support BLOB (or CLOB) column types, UNION ALL does.
The basic difference between UNION and UNION ALL is union operation eliminates the duplicated rows from the result set but union all returns all rows after joining.
from http://zengin.wordpress.com/2007/07/31/union-vs-union-all/
UNION
The UNION command is used to select related information from two tables, much like the JOIN command. However, when using the UNION command all selected columns need to be of the same data type. With UNION, only distinct values are selected.
UNION ALL
The UNION ALL command is equal to the UNION command, except that UNION ALL selects all values.
The difference between Union and Union all is that Union all will not eliminate duplicate rows, instead it just pulls all rows from all tables fitting your query specifics and combines them into a table.
A UNION statement effectively does a SELECT DISTINCT on the results set. If you know that all the records returned are unique from your union, use UNION ALL instead, it gives faster results.
You can avoid duplicates and still run much faster than UNION DISTINCT (which is actually same as UNION) by running query like this:
SELECT * FROM mytable WHERE a=X UNION ALL SELECT * FROM mytable WHERE b=Y AND a!=X
Notice the AND a!=X part. This is much faster than UNION.
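The trick can be checked against a plain UNION with the sketch below. It assumes Python's sqlite3 module, a made-up mytable with a primary key so that every row is unique, and X = Y = 1. The NULL handling in the third query is an addition that is not part of the original answer: a != X is not true for rows where a is NULL, so such rows silently drop out of the second branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER);
    INSERT INTO mytable VALUES
        (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 2, 2), (5, NULL, 1);
""")
X, Y = 1, 1

def ids(sql, params):
    """Run a query and return the sorted ids of the rows it yields."""
    return sorted(row[0] for row in conn.execute(sql, params))

with_union = ids(
    "SELECT * FROM mytable WHERE a = ? UNION SELECT * FROM mytable WHERE b = ?",
    (X, Y),
)

# The second branch skips rows the first branch already returned.
with_union_all = ids(
    "SELECT * FROM mytable WHERE a = ? "
    "UNION ALL SELECT * FROM mytable WHERE b = ? AND a != ?",
    (X, Y, X),
)

# NULL-aware variant: keep rows whose a is NULL in the second branch.
null_safe = ids(
    "SELECT * FROM mytable WHERE a = ? "
    "UNION ALL SELECT * FROM mytable WHERE b = ? AND (a != ? OR a IS NULL)",
    (X, Y, X),
)

print(with_union, with_union_all, null_safe)
```

If column a is declared NOT NULL the simple form is enough. The approach also only prevents duplicates between the two branches; it does nothing about duplicate rows already present in the table.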
Just to add my two cents to the discussion here: one could understand the UNION operator as a pure, SET-oriented UNION - e.g. set A={2,4,6,8}, set B={1,2,3,4}, A UNION B = {1,2,3,4,6,8}
When dealing with sets, you would not want numbers 2 and 4 appearing twice, as an element either is or is not in a set.
In the world of SQL, though, you might want to see all the elements from the two sets together in one "bag" {2,4,6,8,1,2,3,4}. And for this purpose T-SQL offers the operator UNION ALL.
UNION - results in distinct records while
UNION ALL - results in all the records including duplicates.
Both are blocking operators and hence I personally prefer using JOINS over Blocking Operators(UNION, INTERSECT, UNION ALL etc. ) anytime.
To illustrate why Union operation performs poorly in comparison to Union All checkout the following example.
CREATE TABLE #T1 (data VARCHAR(10))
INSERT INTO #T1
SELECT 'abc'
UNION ALL
SELECT 'bcd'
UNION ALL
SELECT 'cde'
UNION ALL
SELECT 'def'
UNION ALL
SELECT 'efg'
CREATE TABLE #T2 (data VARCHAR(10))
INSERT INTO #T2
SELECT 'abc'
UNION ALL
SELECT 'cde'
UNION ALL
SELECT 'efg'
[Image not preserved: results of the UNION ALL and UNION operations]
Using UNION results in Distinct Sort operations in the execution plan.
[Image not preserved: execution plan]
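The row counts for the #T1 and #T2 data above can be checked with a small script. This is a sketch using Python's sqlite3 module with ordinary tables named T1 and T2 instead of SQL Server temp tables (an assumption); it shows the results only, not the execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (data TEXT);
    CREATE TABLE T2 (data TEXT);
    INSERT INTO T1 VALUES ('abc'), ('bcd'), ('cde'), ('def'), ('efg');
    INSERT INTO T2 VALUES ('abc'), ('cde'), ('efg');
""")

# UNION ALL keeps every row from both inputs.
union_all_rows = conn.execute(
    "SELECT data FROM T1 UNION ALL SELECT data FROM T2"
).fetchall()

# UNION removes the three values that appear in both tables.
union_rows = conn.execute(
    "SELECT data FROM T1 UNION SELECT data FROM T2"
).fetchall()

print(len(union_all_rows), len(union_rows))
```

The extra work UNION does is the comparison needed to collapse the three repeated values.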
Not sure that it matters which database
UNION and UNION ALL should work on all SQL Servers.
You should avoid unnecessary UNIONs; they are a huge performance drain. As a rule of thumb use UNION ALL if you are not sure which to use.
(From Microsoft SQL Server Book Online)
UNION [ALL]
Specifies that multiple result sets are to be combined and returned as a single result set.
ALL
Incorporates all rows into the results. This includes duplicates. If not specified, duplicate rows are removed.
UNION will take longer, because duplicate-row detection, like DISTINCT, is applied to the results.
SELECT * FROM Table1
UNION
SELECT * FROM Table2
is equivalent to:
SELECT DISTINCT * FROM (
SELECT * FROM Table1
UNION ALL
SELECT * FROM Table2) DT
A side effect of applying DISTINCT over the results is often a sorting operation.
UNION ALL results come back in arbitrary order, whereas UNION results frequently come back as if ORDER BY 1, 2, 3, ..., n (n = number of columns) had been applied. You can see this side effect even when there are no duplicate rows. It is an implementation detail, not a guarantee: without an explicit ORDER BY, no order is promised.
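The equivalence between UNION and SELECT DISTINCT over UNION ALL can be checked with the sketch below. It assumes Python's sqlite3 module and made-up columns x and y. The results are compared after sorting in Python, so the check does not rely on whatever order the database happens to return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (x INTEGER, y TEXT);
    CREATE TABLE Table2 (x INTEGER, y TEXT);
    INSERT INTO Table1 VALUES (2, 'b'), (1, 'a'), (2, 'b');
    INSERT INTO Table2 VALUES (1, 'a'), (3, 'c');
""")

union_rows = conn.execute("""
    SELECT * FROM Table1
    UNION
    SELECT * FROM Table2
""").fetchall()

distinct_rows = conn.execute("""
    SELECT DISTINCT * FROM (
        SELECT * FROM Table1
        UNION ALL
        SELECT * FROM Table2) DT
""").fetchall()

# Compare as sorted lists: neither query promises an output order.
print(sorted(union_rows) == sorted(distinct_rows))
```

If a particular order matters, an explicit ORDER BY on the outer query is the only reliable way to get it.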
To add an example:
UNION merges with distinct, so it is slower because rows must be compared (in Oracle SQL Developer, select the query and press F10 to see the cost analysis).
UNION ALL merges without distinct, so it is faster.
SELECT to_char(sysdate, 'yyyy-mm-dd') FROM dual
UNION
SELECT to_char(sysdate, 'yyyy-mm-dd') FROM dual;
and
SELECT to_char(sysdate, 'yyyy-mm-dd') FROM dual
UNION ALL
SELECT to_char(sysdate, 'yyyy-mm-dd') FROM dual;
UNION merges the contents of two structurally-compatible tables into a single combined table.
Difference:
The difference between UNION and UNION ALL is that UNION will omit duplicate records whereas UNION ALL will include duplicate records.
A UNION result set often comes back sorted in ascending order as a side effect of duplicate removal, whereas a UNION ALL result set is not sorted; neither order is guaranteed without an ORDER BY.
UNION performs a DISTINCT on its result set, so it eliminates any duplicate rows, whereas UNION ALL does not remove duplicates and is therefore faster than UNION.
Note: The performance of UNION ALL will typically be better than UNION, since UNION requires the server to do the additional work of removing any duplicates. So, in cases where it is certain that there will not be any duplicates, or where having duplicates is not a problem, use of UNION ALL would be recommended for performance reasons.
Suppose that you have two tables, Teacher and Student.
Both have 4 columns with different names, like this:
Teacher - ID(int), Name(varchar(50)), Address(varchar(50)), PositionID(varchar(50))
Student - ID(int), Name(varchar(50)), Email(varchar(50)), PositionID(int)
You can apply UNION or UNION ALL to two tables that have the same number of columns, even if the column names differ, as long as the data types are compatible.
When you apply a UNION operation to 2 tables, it drops all duplicate entries (rows whose column values are all the same as those of another row). Like this
SELECT * FROM Student
UNION
SELECT * FROM Teacher
the result will be
When you apply a UNION ALL operation to 2 tables, it returns all entries, including duplicates. Like this
SELECT * FROM Student
UNION ALL
SELECT * FROM Teacher
Output
Performance:
Obviously UNION ALL performs better than UNION, since UNION does the additional task of removing duplicate values. You can check that from the estimated execution plan by pressing Ctrl+L in MSSQL.
UNION removes duplicate records; UNION ALL, on the other hand, does not. One needs to consider the volume of data that is going to be processed, and the number of columns and their data types must match.
Since UNION internally uses "distinct" behavior to select the rows, it is more costly in terms of time and performance.
like
select project_id from t_project
union
select project_id from t_project_contact
this gives me 2020 records
on other hand
select project_id from t_project
union all
select project_id from t_project_contact
gives me more than 17402 rows
In terms of precedence, both have the same precedence.
If there is no ORDER BY, a UNION ALL may bring rows back as it goes, whereas a UNION would make you wait until the very end of the query before giving you the whole result set at once. This can make a difference in a time-out situation - a UNION ALL keeps the connection alive, as it were.
So if you have a time-out issue, and there's no sorting, and duplicates aren't an issue, UNION ALL may be rather helpful.
One more thing I would like to add:
Union: the result set is often returned sorted in ascending order (a side effect of duplicate removal, not a guarantee).
Union All: the result set is not sorted; the outputs of the two queries just get appended.
Important! Duplicates inside a single table: Let's say that t1 and t2 don't have duplicate rows between them, but each has duplicate rows individually. Example: t1 has sales from 2017 and t2 from 2018
SELECT T1.YEAR, T1.PRODUCT FROM T1
UNION ALL
SELECT T2.YEAR, T2.PRODUCT FROM T2
In Oracle, UNION ALL fetches all rows from both tables. The same will occur in MySQL.
However:
SELECT T1.YEAR, T1.PRODUCT FROM T1
UNION
SELECT T2.YEAR, T2.PRODUCT FROM T2
Here the result set will have fewer rows, in Oracle as well as in MySQL: UNION removes duplicates from the combined result, so duplicate rows within t1 and within t2 are collapsed too, even though the two tables share no rows.
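How UNION treats duplicates that exist inside a single input can be checked directly. The sketch below uses Python's sqlite3 module with made-up sales rows (both assumptions); the behaviour is worth verifying on the database actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (year INTEGER, product TEXT);
    CREATE TABLE t2 (year INTEGER, product TEXT);
    -- Each table repeats its own row; the two tables share no rows.
    INSERT INTO t1 VALUES (2017, 'widget'), (2017, 'widget');
    INSERT INTO t2 VALUES (2018, 'widget'), (2018, 'widget');
""")

union_all_rows = conn.execute("""
    SELECT year, product FROM t1
    UNION ALL
    SELECT year, product FROM t2
""").fetchall()

union_rows = conn.execute("""
    SELECT year, product FROM t1
    UNION
    SELECT year, product FROM t2
""").fetchall()

print(len(union_all_rows), sorted(union_rows))
```

UNION ALL returns all four rows, while UNION returns one row per distinct (year, product) pair, so duplicates inside each table disappear as well.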
UNION ALL also works on more data types. For example, when trying to union spatial data types:
select a.SHAPE from tableA a
union
select b.SHAPE from tableB b
will throw
The data type geometry cannot be used as an operand to the UNION, INTERSECT or EXCEPT operators because it is not comparable.
However union all will not.