SQL UNION result not the same as the result of individual statements? - sql

Is it possible for UNION of two sql statements to produce an output that is different from output of executing each of the two statements separately?

Yes. Absolutely.
The union removes duplicates. If you want the same results, use union all.
Also note: the ordering after union and union all is not guaranteed. If you want things in a particular order, use order by.

Related

Union vs. Union ALL if rows are unique

I would like to optimize performance while bringing together queries on many SAS data sets with the same metadata. At this point I have:
select * from
(select t1.column_a, t1.column_b
from table t1)
Union
(select t2.column_a, t2.column_b
from table t2)
and so on.
Each query brings up unique rows, do I save time wise if I use use Union All instead?
Yes. you are correct. Please refer this. What is the difference between UNION and UNION ALL?
If you are pretty sure that you don't have duplicates, then you can just use the UNION ALL instead of UNION. The later lacks in performance as it has to remove the duplicates
At some point during UNION there will be checks for duplicates. Even if these checks all turn up false, they are an extra step. UNION ALL will probably be more efficient but as dfundako pointed out, you'll have to test and see to be sure of a difference in speed.

Fetch data sequentially using union operator

I am fetching data using Union operator. I want my output to be in the same order as my select queries are fetching but instead Union sorts it in alphabetical order. Can you suggest me a way to avoid getting it sorted by default.
Try to do it in a subquery like this:
select * from (select x , y ,z from table1
UNION ALL
select x,y,z from table2)
order by y
Thilo is correct: to be safe, a query should always explicitly order the results. Relying on implicit sorting has caused many problems in the past and will continue to cause more problems in the future.
The suggestion that UNION ALL will avoid the sorting is almost always correct. It should work in 11g and below. But 12c introduced concurrent execution of union all, which does not guarantee the order of results any more.
Even if implicit sorting works right now it's always a good idea to add an ORDER BY.
You should not rely on the order if you do not specify any ORDER BY clause.
UNION guaranties uniquines of the result. Pre 10g versions used sort to remove duplicates. Newer Oracle version also might (but not must) use a hash-table to remove duplicates - therefore the result is not necessarily sorted.
UNION ALL does not care about uniquines.
You can simply type:
select x , y ,z from table1
UNION ALL
select x,y,z from table2
order by y
Order by applies onto the whole result.

does union or union all make a big difference?

i have a code that works,
CURSOR c_val IS
SELECT 'Percentage' destype,
1 descode
FROM dual
UNION ALL --16028 change from union to union all for better performance
SELECT 'Unit'destype,
2 descode
FROM dual
ORDER BY descode DESC;
--
i change the union to union all, from what i have read it wouldnt make a big differnece as long as i have a order by (which i do) , is this truth or should i just leave it the way it was , it still works both ways
The choice between UNION and UNION ALL is not whether one of them works or not, or whether one of them is faster performing, but which one is correct for the logic you want to implement.
UNION ALL is simply the results of the two queries combined into a single result set. You should consider the order of the result sets to be random, although in practice you'd probably find that it's the first result set followed by the second.
UNION is the same, but with a DISTINCT applied to the combined sets. Therefore it's possible for it to return fewer rows and take longer and consume more resources. You would probably find that the order of rows is different, but again you should assume that the order is random.
Applying an ORDER BY to either one of them is the only way to guarantee an order on the result set. It would affect performance of the UNION ALL, but may not affect UNION too much (on top of the impact of the DISTINCT) because the optimiser might choose a DISTINCT implementation that will return an ordered result set.
So in your case you want both rows to appear (and in fact UNION would not eliminate either of them), and you want the order to be specified -- so based on that, UNION ALL and the ORDER BY you have provided are the correct approaches.
"UNION" = "UNION ALL" + "DISTINCT"
So basically by going for UNION ALL you save a dedup operation. In this case you may find that the performance gain is negligible but it can be important on large datasets.
Union All returns all rows, so you might get duplicates. Union is effectively like a distinct. In your case, it doesn't make a difference as you are using the same tables!

Does UNION ALL guarantee the order of the result set [duplicate]

This question already has answers here:
SQL Server UNION - What is the default ORDER BY Behaviour
(6 answers)
Closed 9 years ago.
Can I be sure that the result set of the following script will always be sorted like this O-R-D-E-R ?
SELECT 'O'
UNION ALL
SELECT 'R'
UNION ALL
SELECT 'D'
UNION ALL
SELECT 'E'
UNION ALL
SELECT 'R'
Can it be proved to sometimes be in a different order?
There is no inherent order, you have to use ORDER BY. For your example you can easily do this by adding a SortOrder to each SELECT. This will then keep the records in the order you want:
SELECT 'O', 1 SortOrder
UNION ALL
SELECT 'R', 2
UNION ALL
SELECT 'D', 3
UNION ALL
SELECT 'E', 4
UNION ALL
SELECT 'R', 5
ORDER BY SortOrder
You cannot guarantee the order unless you specifically provide an order by with the query.
No it does not. SQL tables are inherently unordered. You need to use order by to get things in a desired order.
The issue is not whether it works once when you try it out. The issue is whether you can trust this behavior. And you cannot. SQL Server does not even guarantee the ordering for this:
select *
from (select t.*
from t
order by col1
) t
It says here:
When ORDER BY is used in the definition of a view, inline function,
derived table, or subquery, the clause is used only to determine the
rows returned by the TOP clause. The ORDER BY clause does not
guarantee ordered results when these constructs are queried, unless
ORDER BY is also specified in the query itself.
A fundamental principle of the SQL language is that tables are not ordered. So, although your query might work in many databases, you should use the version suggested by BlueFeet to guarantee the ordering of results.
Try removing all of the ALLs, for example. Or even just one of them. Now consider that the type of optimization that has to happen there (and many other types) will also be possible when the SELECT queries are actual queries against tables, and are optimized separately. Without an ORDER BY, ordering within each query will be arbitrary, and you can't guarantee that the queries themselves will be processed in any order.
Saying UNION ALL with no ORDER BY is like saying "Just throw all the marbles on the floor." Maybe every time you throw all the marbles on the floor, they end up being organized by color. That doesn't mean the next time you throw them on the floor they'll behave the same way. The same is true for ordering in SQL Server - if you don't say ORDER BY then SQL Server assumes you don't care about order. You may see by coincidence a certain order being returned all the time, but many things can affect the arbitrary order that has been selected next time. Data changes, statistics changes, recompile, plan flush, upgrade, service pack, hotfix, trace flag... ad nauseum.
I will put this in large letters to make it clear:
You cannot guarantee an order without ORDER BY
Some further reading:
Bad habits to kick : relying on undocumented behavior
Also, please read this post by Conor Cunningham, a pretty smart guy on the SQL team.
No. You get the records in whatever way SQL Server fetches them for you. You can apply an order on a unioned result set by 1-based index thusly:
SELECT 1, 'O'
UNION ALL
SELECT 2, 'R'
UNION ALL
SELECT 3, 'D'
UNION ALL
SELECT 4, 'E'
UNION ALL
SELECT 5, 'R'
ORDER BY 1

What is the difference between UNION and UNION ALL?

What is the difference between UNION and UNION ALL?
UNION removes duplicate records (where all columns in the results are the same), UNION ALL does not.
There is a performance hit when using UNION instead of UNION ALL, since the database server must do additional work to remove the duplicate rows, but usually you do not want the duplicates (especially when developing reports).
To identify duplicates, records must be comparable types as well as compatible types. This will depend on the SQL system. For example the system may truncate all long text fields to make short text fields for comparison (MS Jet), or may refuse to compare binary fields (ORACLE)
UNION Example:
SELECT 'foo' AS bar UNION SELECT 'foo' AS bar
Result:
+-----+
| bar |
+-----+
| foo |
+-----+
1 row in set (0.00 sec)
UNION ALL example:
SELECT 'foo' AS bar UNION ALL SELECT 'foo' AS bar
Result:
+-----+
| bar |
+-----+
| foo |
| foo |
+-----+
2 rows in set (0.00 sec)
Both UNION and UNION ALL concatenate the result of two different SQLs. They differ in the way they handle duplicates.
UNION performs a DISTINCT on the result set, eliminating any duplicate rows.
UNION ALL does not remove duplicates, and it therefore faster than UNION.
Note: While using this commands all selected columns need to be of the same data type.
Example: If we have two tables, 1) Employee and 2) Customer
Employee table data:
Customer table data:
UNION Example (It removes all duplicate records):
UNION ALL Example (It just concatenate records, not eliminate duplicates, so it is faster than UNION):
UNION removes duplicates, whereas UNION ALL does not.
In order to remove duplicates the result set must be sorted, and this may have an impact on the performance of the UNION, depending on the volume of data being sorted, and the settings of various RDBMS parameters ( For Oracle PGA_AGGREGATE_TARGET with WORKAREA_SIZE_POLICY=AUTO or SORT_AREA_SIZE and SOR_AREA_RETAINED_SIZE if WORKAREA_SIZE_POLICY=MANUAL ).
Basically, the sort is faster if it can be carried out in memory, but the same caveat about the volume of data applies.
Of course, if you need data returned without duplicates then you must use UNION, depending on the source of your data.
I would have commented on the first post to qualify the "is much less performant" comment, but have insufficient reputation (points) to do so.
In ORACLE: UNION does not support BLOB (or CLOB) column types, UNION ALL does.
The basic difference between UNION and UNION ALL is union operation eliminates the duplicated rows from the result set but union all returns all rows after joining.
from http://zengin.wordpress.com/2007/07/31/union-vs-union-all/
UNION
The UNION command is used to select related information from two tables, much like the JOIN command. However, when using the UNION command all selected columns need to be of the same data type. With UNION, only distinct values are selected.
UNION ALL
The UNION ALL command is equal to the UNION command, except that UNION ALL selects all values.
The difference between Union and Union all is that Union all will not eliminate duplicate rows, instead it just pulls all rows from all tables fitting your query specifics and combines them into a table.
A UNION statement effectively does a SELECT DISTINCT on the results set. If you know that all the records returned are unique from your union, use UNION ALL instead, it gives faster results.
You can avoid duplicates and still run much faster than UNION DISTINCT (which is actually same as UNION) by running query like this:
SELECT * FROM mytable WHERE a=X UNION ALL SELECT * FROM mytable WHERE b=Y AND a!=X
Notice the AND a!=X part. This is much faster then UNION.
Just to add my two cents to the discussion here: one could understand the UNION operator as a pure, SET-oriented UNION - e.g. set A={2,4,6,8}, set B={1,2,3,4}, A UNION B = {1,2,3,4,6,8}
When dealing with sets, you would not want numbers 2 and 4 appearing twice, as an element either is or is not in a set.
In the world of SQL, though, you might want to see all the elements from the two sets together in one "bag" {2,4,6,8,1,2,3,4}. And for this purpose T-SQL offers the operator UNION ALL.
UNION - results in distinct records while
UNION ALL - results in all the records including duplicates.
Both are blocking operators and hence I personally prefer using JOINS over Blocking Operators(UNION, INTERSECT, UNION ALL etc. ) anytime.
To illustrate why Union operation performs poorly in comparison to Union All checkout the following example.
CREATE TABLE #T1 (data VARCHAR(10))
INSERT INTO #T1
SELECT 'abc'
UNION ALL
SELECT 'bcd'
UNION ALL
SELECT 'cde'
UNION ALL
SELECT 'def'
UNION ALL
SELECT 'efg'
CREATE TABLE #T2 (data VARCHAR(10))
INSERT INTO #T2
SELECT 'abc'
UNION ALL
SELECT 'cde'
UNION ALL
SELECT 'efg'
Following are results of UNION ALL and UNION operations.
A UNION statement effectively does a SELECT DISTINCT on the results set. If you know that all the records returned are unique from your union, use UNION ALL instead, it gives faster results.
Using UNION results in Distinct Sort operations in the Execution Plan. Proof to prove this statement is shown below:
Not sure that it matters which database
UNION and UNION ALL should work on all SQL Servers.
You should avoid of unnecessary UNIONs they are huge performance leak. As a rule of thumb use UNION ALL if you are not sure which to use.
(From Microsoft SQL Server Book Online)
UNION [ALL]
Specifies that multiple result sets are to be combined and returned as a single result set.
ALL
Incorporates all rows into the results. This includes duplicates. If not specified, duplicate rows are removed.
UNION will take too long as a duplicate rows finding like DISTINCT is applied on the results.
SELECT * FROM Table1
UNION
SELECT * FROM Table2
is equivalent of:
SELECT DISTINCT * FROM (
SELECT * FROM Table1
UNION ALL
SELECT * FROM Table2) DT
A side effect of applying DISTINCT over results is a sorting operation on results.
UNION ALL results will be shown as arbitrary order on results But UNION results will be shown as ORDER BY 1, 2, 3, ..., n (n = column number of Tables) applied on results. You can see this side effect when you don't have any duplicate row.
I add an example,
UNION, it is merging with distinct --> slower, because it need comparing (In Oracle SQL developer, choose query, press F10 to see cost analysis).
UNION ALL, it is merging without distinct --> faster.
SELECT to_date(sysdate, 'yyyy-mm-dd') FROM dual
UNION
SELECT to_date(sysdate, 'yyyy-mm-dd') FROM dual;
and
SELECT to_date(sysdate, 'yyyy-mm-dd') FROM dual
UNION ALL
SELECT to_date(sysdate, 'yyyy-mm-dd') FROM dual;
UNION merges the contents of two structurally-compatible tables into a single combined table.
Difference:
The difference between UNION and UNION ALL is that UNION will omit duplicate records whereas UNION ALL will include duplicate records.
Union Result set is sorted in ascending order whereas UNION ALL Result set is not sorted
UNION performs a DISTINCT on its Result set so it will eliminate any duplicate rows. Whereas UNION ALL won't remove duplicates and therefore it is faster than UNION.*
Note: The performance of UNION ALL will typically be better than UNION, since UNION requires the server to do the additional work of removing any duplicates. So, in cases where it is certain that there will not be any duplicates, or where having duplicates is not a problem, use of UNION ALL would be recommended for performance reasons.
Suppose that you have two table Teacher & Student
Both have 4 Column with different Name like this
Teacher - ID(int), Name(varchar(50)), Address(varchar(50)), PositionID(varchar(50))
Student- ID(int), Name(varchar(50)), Email(varchar(50)), PositionID(int)
You can apply UNION or UNION ALL for those two table which have same number of columns. But they have different name or data type.
When you apply UNION operation on 2 tables, it neglects all duplicate entries(all columns value of row in a table is same of another table). Like this
SELECT * FROM Student
UNION
SELECT * FROM Teacher
the result will be
When you apply UNION ALL operation on 2 tables, it returns all entries with duplicate(if there is any difference between any column value of a row in 2 tables). Like this
SELECT * FROM Student
UNION ALL
SELECT * FROM Teacher
Output
Performance:
Obviously UNION ALL performance is better that UNION as they do additional task to remove the duplicate values. You can check that from Execution Estimated Time by press ctrl+L at MSSQL
UNION removes duplicate records in other hand UNION ALL does not. But one need to check the bulk of data that is going to be processed and the column and data type must be same.
since union internally uses "distinct" behavior to select the rows hence it is more costly in terms of time and performance.
like
select project_id from t_project
union
select project_id from t_project_contact
this gives me 2020 records
on other hand
select project_id from t_project
union all
select project_id from t_project_contact
gives me more than 17402 rows
on precedence perspective both has same precedence.
If there is no ORDER BY, a UNION ALL may bring rows back as it goes, whereas a UNION would make you wait until the very end of the query before giving you the whole result set at once. This can make a difference in a time-out situation - a UNION ALL keeps the connection alive, as it were.
So if you have a time-out issue, and there's no sorting, and duplicates aren't an issue, UNION ALL may be rather helpful.
One more thing i would like to add-
Union:- Result set is sorted in ascending order.
Union All:- Result set is not sorted. two Query output just gets appended.
Important! Difference between Oracle and Mysql: Let's say that t1 t2 don't have duplicate rows between them but they have duplicate rows individual. Example: t1 has sales from 2017 and t2 from 2018
SELECT T1.YEAR, T1.PRODUCT FROM T1
UNION ALL
SELECT T2.YEAR, T2.PRODUCT FROM T2
In ORACLE UNION ALL fetches all rows from both tables. The same will occur in MySQL.
However:
SELECT T1.YEAR, T1.PRODUCT FROM T1
UNION
SELECT T2.YEAR, T2.PRODUCT FROM T2
In ORACLE, UNION fetches all rows from both tables because there are no duplicate values between t1 and t2. On the other hand in MySQL the resultset will have fewer rows because there will be duplicate rows within table t1 and also within table t2!
UNION ALL also works on more data types as well. For example when trying to union spatial data types. For example:
select a.SHAPE from tableA a
union
select b.SHAPE from tableB b
will throw
The data type geometry cannot be used as an operand to the UNION, INTERSECT or EXCEPT operators because it is not comparable.
However union all will not.