Odd 'UNION' behavior in an Oracle SQL query - sql

Here's my query:
SELECT my_view.*
FROM my_view
WHERE my_view.trial in (select 2 as trial_id from dual union select 3 from dual union select 4 from dual)
and my_view.location like ('123-%')
When I execute this query it returns results which do not conform to the my_view.location like ('123-%') condition. It's as if that condition is being ignored completely. I can even change it to my_view.location IS NULL and it returns the same results, despite that field being not-nullable.
I know this query seems ridiculous with the selects from dual, but I've structured it this way to replicate a problem I have when I use a 'WITH' clause (the results of that query are where the selects from dual inline view are).
I can modify the query like so and it returns the expected results:
SELECT my_view.*
FROM my_view
WHERE my_view.trial in (2, 3, 4)
and my_view.location like ('123-%')
Unfortunately I do not know the trial values up front (they are queried for in a 'WITH' clause) so I cannot structure my query this way. What am I doing wrong?
I will say that the my_view view is composed of 3 other views whose results are UNION ALL and each of which retrieve some data over a DB Link. Not that I believe that should matter, but in case it does.

One thing you could try if you don't have luck with this route is to replace "IN" with an "EXISTS" or "NOT EXISTS" statement.
If you could accomplish what you want using joins, that would be the best option because of performance. If you have views pulling data from views, you can often make a single query to do what you want that gives you better performance using subqueries.

If you do
EXPLAIN PLAN FOR
SELECT my_view.*
FROM my_view
WHERE my_view.trial in (select 2 as trial_id from dual union select 3 from dual union select 4 from dual)
and my_view.location like ('123-%');
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
you should see where the location predicate is 9or is not) being applied. My bet is that it has something to do with the DB links and you won't be able to reproduce it if all the tables are local.
Optimizing a distributed query gets complicated.

Try changing the UNION query to use UNION ALL, as in:
SELECT my_view.*
FROM my_view
WHERE my_view.trial in (select 2 as trial_id from dual
UNION ALL
select 3 AS TRIAL_ID from dual
UNION ALL
select 4 AS TRIAL_ID from dual)
and my_view.location like ('123-%')
I also put in "AS TRIAL_ID" on the 3 and 4 cases. I agree that neither of these should matter, but I've run into cases occasionally where things that I thought shouldn't matter mattered.
Good luck.

Related

Counter-intuitive behavior of SUM( ) of UNION in Sqlite3

I ran this code in an sqlite3 terminal (version 3.29.0), and the result is a little strange:
sqlite> select sum((select 4 union all select 2 ));
4
sqlite> select sum((select 4 union select 2 ));
2
I understand why union reverses the table, but why does sum choose the first element?
Meanwhile, this code works just as expected:
sqlite> select sum(x) from (select 4 as x union all select 2 as x);
6
Is this the intended behavior, or is this a bug in sqlite? And if it's intended, what's the logic (and semantics) behind it?
That's expected behavior.
From the documentation (emphasis added)
A SELECT statement enclosed in parentheses is a subquery. All types of SELECT statement, including aggregate and compound SELECT queries (queries with keywords like UNION or EXCEPT) are allowed as scalar subqueries. The value of a subquery expression is the first row of the result from the enclosed SELECT statement. The value of a subquery expression is NULL if the enclosed SELECT statement returns no rows.
The UNION example just happened to end up returning.rows in a different order than the UNION ALL one. Without an ORDER BY neither one is guaranteed to use a particular order, though.

TABLE/CAST/MULTISET vs subquery in FROM clause

The following query doesn't work. It is expected to fail since temp.col references something that is unavailable in that context.
with temp as (
select 'A' col from dual
union all
select 'B' col from dual
)
select *
from temp,
(select level || temp.col from dual connect by level < 3);
The error message from Oracle is : ORA-00904: "TEMP"."COL": invalid identifier
But why is the next query working ? I see CAST/MULTISET as a way to go from a SQL table to a collection type and TABLE to go back to a SQL table. Why do we use such round-trip ? I guess to make the query work, but how ?
with temp as (
select 'A' col from dual
union all
select 'B' col from dual
)
select *
from temp,
table(
cast(
multiset(
select level || temp.col from dual connect by level < 3
) as sys.odcivarchar2list
)
) t;
The result is :
COL COLUMN_VALUE
--- ------------
A 1A
A 2A
B 1B
B 2B
Look how the second column is named COLUMN_VALUE. Looks like a generated name by one of the construct CAST/MULTISET or TABLE.
EDIT
With the accepted answer below, I checked the documentation and found that the TABLE mechanism is a table collection expression. The expression between rounded brackets is the collection expression. The documentations defines a mechanism called left correlation :
The collection_expression can reference columns of tables defined to
its left in the FROM clause. This is called left correlation. Left
correlation can occur only in table_collection_expression. Other
subqueries cannot contains references to columns defined outside the
subquery.
So this is like LATERAL in 12c.
Oracle allows lateral inline views to reference other tables inside the inline view.
In old versions this feature was mostly used for optimizations, as discussed in the Oracle optimizer blog here. Explicit lateral joins were added in 12c. Your first query only needs a small change to work in 12c:
with temp as (
select 'A' col from dual
union all
select 'B' col from dual
)
select *
from temp,
lateral(select level || temp.col from dual connect by level < 3);
Apparently Oracle also silently uses lateral joins for collection unnesting. There are a few cases where SQL uses a logical cross join, but the tables are obviously closely related; such as XMLTable, JSON_table, and queries like your second example. In those cases it makes sense to execute the two tables together. I assume the lateral mechanism is used there, although neither the execution plan nor the 10053 optimizer trace uses the word "lateral". The documentation even has an example very similar to yours in the Collection Unnesting: Examples. However, this "feature" is still not well documented.
On a side note, in general you should avoid SQL features that increase the context. Features like lateral joins, common table expressions, and correlated subqueries can be useful, but they can also make SQL statements more difficult to understand. A regular inline view can be run and understood all by itself and has a very simple interface - its projected columns. That simplicity makes it easier to assemble small components into a large statement.
I suggest you re-write your query like below. Treat each inline view like you would a function or procedure - give them good names and comments. It will help you later when you assemble them into large, realistic statements.
select col, the_level||col
from
(
--Good comment 1.
select 'A' col from dual union all
select 'B' col from dual
) good_name_1
cross join
(
--Good comment 2.
select level the_level
from dual
connect by level < 3
) good_name_2

How is WITH used in Oracle SQL (example code shown )?

I found this example code online when I searched for "how to do an Exclusive Between oracle sql"
Someone was proving that, in Oracle, BETWEEN is by default inclusive.
So they used such code :
with x as (
select 1 col1 from dual
union
select 2 col1 from dual
union
select 3 col1 from dual
UNION
select 4 col1 from dual
)
select *
from x
where col1 between 2 and 3
I've never seen such an example, what is going on with the WITH ?
In short, WITH clause is an inline view, or subquery. It is useful when you will refer to something multiple times, or when you want to abstract parts of a complex query to make it easier to read.
If you are from SQL Server world, you can also think of it like a temporary table.
So:
WITH foo as (select * from tab);
select * from foo;
is like
select * from (select * from tab);
Though it may be more efficient since x is resolved to a single dataset, even if queried multiple times.
It also reduces repetition. If you use a subquery more than once in a statement, you can consider factoring it out using WITH.
It has nothing to do with the BETWEEN example, it is just the author's choice of approach for demonstrating a concept.

Can a View hold some data

I have created a View:
CREATE VIEW Lunch
AS
SELECT 'Beer' AS item
UNION SELECT 'Olives'
UNION SELECT 'Bread'
UNION SELECT 'Salami'
UNION SELECT 'Calamari'
UNION SELECT 'Coffee';
GO
Then I executed the following query in a different query.
SELECT item FROM Lunch
This is resulting in the data from the view being returned. There are not any underlying tables to hold data. But still the system is showing records. How is this possible?
I don't fully understand the question. The view is not "holding data", it is holding a query. The query statement has constants in it that are turned into a result set that can be used by other queries.
You can think of a query as substituting the text of the query directly into a statement. So, when you do:
select *
from lunch;
SQL really processes this as:
select *
from ( SELECT 'Beer' AS item
UNION SELECT 'Olives'
UNION SELECT 'Bread'
UNION SELECT 'Salami'
UNION SELECT 'Calamari'
UNION SELECT 'Coffee'
) t
This is a good model of what happens, although it is not quite what really happens. What really happens is that the view is compiled and the compiled code is inserted into the compiled code for the query.
There is another concept of "materialized views". These are views where you have indexes and the values are actually stored in the database. However, this is not an example of a materialized view.
It doesn't actually hold data, but it can evaluate SQL, which is all you're doing in the definition.
So the following can be evaluated:
SELECT 'Beer' AS item
UNION SELECT 'Olives'
UNION SELECT 'Bread'
UNION SELECT 'Salami'
UNION SELECT 'Calamari'
UNION SELECT 'Coffee';
The same as a view defined with:
SELECT * FROM Customers
SQL View Reference:
A view can be thought of as either a virtual table or a stored query. The data accessible through a view is not stored in the database as a distinct object. What is stored in the database is a SELECT statement. The result set of the SELECT statement forms the virtual table returned by the view. A user can use this virtual table by referencing the view name in Transact-SQL statements the same way a table is referenced.
A view is a Saved Select Statement in your case view definition does not have any underlying tables, but the select statement itself has hard coded (Constant) values.
Therefore it will always return this data.
Again to be clear it is not storing any data but the Select Statement which happens to return some constant values.
Yes, the data is shown because is statically included into the query itself.
I don't see anything uncommon here, the view refers to the query that embed the data .

What is the difference between UNION and UNION ALL?

What is the difference between UNION and UNION ALL?
UNION removes duplicate records (where all columns in the results are the same), UNION ALL does not.
There is a performance hit when using UNION instead of UNION ALL, since the database server must do additional work to remove the duplicate rows, but usually you do not want the duplicates (especially when developing reports).
To identify duplicates, records must be comparable types as well as compatible types. This will depend on the SQL system. For example the system may truncate all long text fields to make short text fields for comparison (MS Jet), or may refuse to compare binary fields (ORACLE)
UNION Example:
SELECT 'foo' AS bar UNION SELECT 'foo' AS bar
Result:
+-----+
| bar |
+-----+
| foo |
+-----+
1 row in set (0.00 sec)
UNION ALL example:
SELECT 'foo' AS bar UNION ALL SELECT 'foo' AS bar
Result:
+-----+
| bar |
+-----+
| foo |
| foo |
+-----+
2 rows in set (0.00 sec)
Both UNION and UNION ALL concatenate the result of two different SQLs. They differ in the way they handle duplicates.
UNION performs a DISTINCT on the result set, eliminating any duplicate rows.
UNION ALL does not remove duplicates, and it therefore faster than UNION.
Note: While using this commands all selected columns need to be of the same data type.
Example: If we have two tables, 1) Employee and 2) Customer
Employee table data:
Customer table data:
UNION Example (It removes all duplicate records):
UNION ALL Example (It just concatenate records, not eliminate duplicates, so it is faster than UNION):
UNION removes duplicates, whereas UNION ALL does not.
In order to remove duplicates the result set must be sorted, and this may have an impact on the performance of the UNION, depending on the volume of data being sorted, and the settings of various RDBMS parameters ( For Oracle PGA_AGGREGATE_TARGET with WORKAREA_SIZE_POLICY=AUTO or SORT_AREA_SIZE and SOR_AREA_RETAINED_SIZE if WORKAREA_SIZE_POLICY=MANUAL ).
Basically, the sort is faster if it can be carried out in memory, but the same caveat about the volume of data applies.
Of course, if you need data returned without duplicates then you must use UNION, depending on the source of your data.
I would have commented on the first post to qualify the "is much less performant" comment, but have insufficient reputation (points) to do so.
In ORACLE: UNION does not support BLOB (or CLOB) column types, UNION ALL does.
The basic difference between UNION and UNION ALL is union operation eliminates the duplicated rows from the result set but union all returns all rows after joining.
from http://zengin.wordpress.com/2007/07/31/union-vs-union-all/
UNION
The UNION command is used to select related information from two tables, much like the JOIN command. However, when using the UNION command all selected columns need to be of the same data type. With UNION, only distinct values are selected.
UNION ALL
The UNION ALL command is equal to the UNION command, except that UNION ALL selects all values.
The difference between Union and Union all is that Union all will not eliminate duplicate rows, instead it just pulls all rows from all tables fitting your query specifics and combines them into a table.
A UNION statement effectively does a SELECT DISTINCT on the results set. If you know that all the records returned are unique from your union, use UNION ALL instead, it gives faster results.
You can avoid duplicates and still run much faster than UNION DISTINCT (which is actually same as UNION) by running query like this:
SELECT * FROM mytable WHERE a=X UNION ALL SELECT * FROM mytable WHERE b=Y AND a!=X
Notice the AND a!=X part. This is much faster then UNION.
Just to add my two cents to the discussion here: one could understand the UNION operator as a pure, SET-oriented UNION - e.g. set A={2,4,6,8}, set B={1,2,3,4}, A UNION B = {1,2,3,4,6,8}
When dealing with sets, you would not want numbers 2 and 4 appearing twice, as an element either is or is not in a set.
In the world of SQL, though, you might want to see all the elements from the two sets together in one "bag" {2,4,6,8,1,2,3,4}. And for this purpose T-SQL offers the operator UNION ALL.
UNION - results in distinct records while
UNION ALL - results in all the records including duplicates.
Both are blocking operators and hence I personally prefer using JOINS over Blocking Operators(UNION, INTERSECT, UNION ALL etc. ) anytime.
To illustrate why Union operation performs poorly in comparison to Union All checkout the following example.
CREATE TABLE #T1 (data VARCHAR(10))
INSERT INTO #T1
SELECT 'abc'
UNION ALL
SELECT 'bcd'
UNION ALL
SELECT 'cde'
UNION ALL
SELECT 'def'
UNION ALL
SELECT 'efg'
CREATE TABLE #T2 (data VARCHAR(10))
INSERT INTO #T2
SELECT 'abc'
UNION ALL
SELECT 'cde'
UNION ALL
SELECT 'efg'
Following are results of UNION ALL and UNION operations.
A UNION statement effectively does a SELECT DISTINCT on the results set. If you know that all the records returned are unique from your union, use UNION ALL instead, it gives faster results.
Using UNION results in Distinct Sort operations in the Execution Plan. Proof to prove this statement is shown below:
Not sure that it matters which database
UNION and UNION ALL should work on all SQL Servers.
You should avoid of unnecessary UNIONs they are huge performance leak. As a rule of thumb use UNION ALL if you are not sure which to use.
(From Microsoft SQL Server Book Online)
UNION [ALL]
Specifies that multiple result sets are to be combined and returned as a single result set.
ALL
Incorporates all rows into the results. This includes duplicates. If not specified, duplicate rows are removed.
UNION will take too long as a duplicate rows finding like DISTINCT is applied on the results.
SELECT * FROM Table1
UNION
SELECT * FROM Table2
is equivalent of:
SELECT DISTINCT * FROM (
SELECT * FROM Table1
UNION ALL
SELECT * FROM Table2) DT
A side effect of applying DISTINCT over results is a sorting operation on results.
UNION ALL results will be shown as arbitrary order on results But UNION results will be shown as ORDER BY 1, 2, 3, ..., n (n = column number of Tables) applied on results. You can see this side effect when you don't have any duplicate row.
I add an example,
UNION, it is merging with distinct --> slower, because it need comparing (In Oracle SQL developer, choose query, press F10 to see cost analysis).
UNION ALL, it is merging without distinct --> faster.
SELECT to_date(sysdate, 'yyyy-mm-dd') FROM dual
UNION
SELECT to_date(sysdate, 'yyyy-mm-dd') FROM dual;
and
SELECT to_date(sysdate, 'yyyy-mm-dd') FROM dual
UNION ALL
SELECT to_date(sysdate, 'yyyy-mm-dd') FROM dual;
UNION merges the contents of two structurally-compatible tables into a single combined table.
Difference:
The difference between UNION and UNION ALL is that UNION will omit duplicate records whereas UNION ALL will include duplicate records.
Union Result set is sorted in ascending order whereas UNION ALL Result set is not sorted
UNION performs a DISTINCT on its Result set so it will eliminate any duplicate rows. Whereas UNION ALL won't remove duplicates and therefore it is faster than UNION.*
Note: The performance of UNION ALL will typically be better than UNION, since UNION requires the server to do the additional work of removing any duplicates. So, in cases where it is certain that there will not be any duplicates, or where having duplicates is not a problem, use of UNION ALL would be recommended for performance reasons.
Suppose that you have two table Teacher & Student
Both have 4 Column with different Name like this
Teacher - ID(int), Name(varchar(50)), Address(varchar(50)), PositionID(varchar(50))
Student- ID(int), Name(varchar(50)), Email(varchar(50)), PositionID(int)
You can apply UNION or UNION ALL for those two table which have same number of columns. But they have different name or data type.
When you apply UNION operation on 2 tables, it neglects all duplicate entries(all columns value of row in a table is same of another table). Like this
SELECT * FROM Student
UNION
SELECT * FROM Teacher
the result will be
When you apply UNION ALL operation on 2 tables, it returns all entries with duplicate(if there is any difference between any column value of a row in 2 tables). Like this
SELECT * FROM Student
UNION ALL
SELECT * FROM Teacher
Output
Performance:
Obviously UNION ALL performance is better that UNION as they do additional task to remove the duplicate values. You can check that from Execution Estimated Time by press ctrl+L at MSSQL
UNION removes duplicate records in other hand UNION ALL does not. But one need to check the bulk of data that is going to be processed and the column and data type must be same.
since union internally uses "distinct" behavior to select the rows hence it is more costly in terms of time and performance.
like
select project_id from t_project
union
select project_id from t_project_contact
this gives me 2020 records
on other hand
select project_id from t_project
union all
select project_id from t_project_contact
gives me more than 17402 rows
on precedence perspective both has same precedence.
If there is no ORDER BY, a UNION ALL may bring rows back as it goes, whereas a UNION would make you wait until the very end of the query before giving you the whole result set at once. This can make a difference in a time-out situation - a UNION ALL keeps the connection alive, as it were.
So if you have a time-out issue, and there's no sorting, and duplicates aren't an issue, UNION ALL may be rather helpful.
One more thing i would like to add-
Union:- Result set is sorted in ascending order.
Union All:- Result set is not sorted. two Query output just gets appended.
Important! Difference between Oracle and Mysql: Let's say that t1 t2 don't have duplicate rows between them but they have duplicate rows individual. Example: t1 has sales from 2017 and t2 from 2018
SELECT T1.YEAR, T1.PRODUCT FROM T1
UNION ALL
SELECT T2.YEAR, T2.PRODUCT FROM T2
In ORACLE UNION ALL fetches all rows from both tables. The same will occur in MySQL.
However:
SELECT T1.YEAR, T1.PRODUCT FROM T1
UNION
SELECT T2.YEAR, T2.PRODUCT FROM T2
In ORACLE, UNION fetches all rows from both tables because there are no duplicate values between t1 and t2. On the other hand in MySQL the resultset will have fewer rows because there will be duplicate rows within table t1 and also within table t2!
UNION ALL also works on more data types as well. For example when trying to union spatial data types. For example:
select a.SHAPE from tableA a
union
select b.SHAPE from tableB b
will throw
The data type geometry cannot be used as an operand to the UNION, INTERSECT or EXCEPT operators because it is not comparable.
However union all will not.