Oracle union multiple tables with same sub-query - sql

I have an inefficiently written Oracle SQL query with multiple unions of sub-queries (about 7 or 8 tables) that differ only by the table queried in each sub-query, and I am certain it can be written more succinctly. In the code below, the only difference between the union-ed sub-queries is the table name (in this case table_a / table_b).
SELECT
/*+ parallel(10) */
Col_Alpha,
Col_Beta,
Col_Gamma,
Col_Delta,
col_epsilon
FROM
table_a
WHERE
Col_Theta = 'CAT'
And Col_Kappa In ('CAR','TRUCK','PLANE')
UNION
SELECT
/*+ parallel(10) */
Col_Alpha,
Col_Beta,
Col_Gamma,
Col_Delta,
col_epsilon
FROM
table_b
WHERE
Col_Theta = 'CAT'
AND col_kappa IN ('CAR','TRUCK','PLANE')
I tried giving a list of tables after the from clause but that did not work. I also added variations of:
from
(table_a, table_b)
And that did not work. I have tried finding a way to compress the code but I do not know enough for a successful search.
I cannot use procedures with my level of access.
I expect an output similar to what I'm getting, a union of several tables with the same columns queried and same filters across all of them, but that takes around 1/7 the amount of code.

Unfortunately you can't get away from multiple SELECT statements. The way UNION is designed to work according to Oracle: "You can combine multiple queries using the set operators UNION, UNION ALL, INTERSECT, and MINUS."
In other words, you need to HAVE multiple queries in order to UNION them.
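But you can move the conditions outside the UNION so that you only have to write them once. The inner queries must select the columns used in the filter, otherwise the outer WHERE cannot see them:
SELECT * FROM(
Select col1, col2, col3, Col_Theta, Col_Kappa FROM table_1 UNION
Select col1, col2, col3, Col_Theta, Col_Kappa FROM table_2 UNION
Select col1, col2, col3, Col_Theta, Col_Kappa FROM table_3 UNION
Select col1, col2, col3, Col_Theta, Col_Kappa FROM table_4)
WHERE Col_Theta = 'CAT'
AND Col_Kappa In ('CAR','TRUCK','PLANE');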
You can also throw in ORDER BY and other conditions at the end as well. This doesn't remove all of the repetition but at least makes it so that if you change the WHERE conditions you're only changing them in one place.
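As a rough sketch of the same idea using the column names from the question, the union can be factored into a WITH clause and filtered once. The name all_rows is made up for illustration, and only two of the 7 or 8 tables are shown. UNION ALL plus an outer DISTINCT is used here so that the filter columns can be exposed inside the union without changing which rows count as duplicates in the final five-column result. Whether Oracle pushes the filter into each branch is worth confirming with an explain plan.

```sql
-- Sketch only: add one UNION ALL branch per remaining table.
WITH all_rows AS (
    SELECT Col_Alpha, Col_Beta, Col_Gamma, Col_Delta, col_epsilon,
           Col_Theta, Col_Kappa
    FROM   table_a
    UNION ALL
    SELECT Col_Alpha, Col_Beta, Col_Gamma, Col_Delta, col_epsilon,
           Col_Theta, Col_Kappa
    FROM   table_b
)
-- DISTINCT over the five output columns mirrors what UNION did
-- in the original query.
SELECT DISTINCT
       Col_Alpha, Col_Beta, Col_Gamma, Col_Delta, col_epsilon
FROM   all_rows
WHERE  Col_Theta = 'CAT'
AND    Col_Kappa IN ('CAR', 'TRUCK', 'PLANE');
```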

Related

Oracle WITH clause restrictions

I have a quite complex query that is based on multiple tables unioned together. At the moment, we are using a view in order to perform operations on all the rows we need, so the view and a query look like:
CREATE VIEW
V_VIEW
(
COL1, COL2, COL3, COL4
) AS
SELECT
"COL1", "COL2", "COL3", "COL4"
FROM
TABLE1
UNION ALL
SELECT
"COL1", "COL2", "COL3", "COL4"
FROM
TABLE2;
SELECT
COL1, COL2
FROM
( SELECT
COL1, COL2
FROM
V_VIEW
WHERE
COL1 like 'val%'
AND COL2 =
(
SELECT
MAX(COL3)
FROM
V_VIEW
WHERE
COL4 = 'Y' ) ) part1
UNION ALL
SELECT
COL1, COL2
FROM
( SELECT
COL1, COL2
FROM
V_VIEW
WHERE
COL1 like 'sth%'
AND COL2 =
(
SELECT
MIN(COL3)
FROM
V_VIEW
WHERE
COL4 = 'N' ) ) part2;
I'm looking for a way to improve the performance of this query, and unfortunately creating a new table that contains all rows of Table1 and Table2 is not an option for now (we are not allowed to interfere with the way rows are being inserted there). I tried to use a WITH clause instead of the view, so it would look a bit like:
WITH TEMP_TABLE AS (
SELECT
COL1, COL2, COL3, COL4
FROM
TABLE1
UNION ALL
SELECT
COL1, COL2, COL3, COL4
FROM
TABLE2 )
SELECT
COL1, COL2
FROM
( SELECT
COL1, COL2
FROM
TEMP_TABLE
WHERE
COL1 like 'val%'
AND COL2 =
(
SELECT
MAX(COL3)
FROM
TEMP_TABLE
WHERE
COL4 = 'Y' ) ) part1
UNION ALL
SELECT
COL1, COL2
FROM
( SELECT
COL1, COL2
FROM
TEMP_TABLE
WHERE
COL1 like 'sth%'
AND COL2 =
(
SELECT
MIN(COL3)
FROM
TEMP_TABLE
WHERE
COL4 = 'N' ) ) part2
On a small data volume (Table1 and Table2 have about 20k rows) this improves performance very well. However, those tables will eventually be filled with millions of rows. I don't entirely understand how the WITH clause is processed, so I wonder: is there a chance that a query using the WITH clause will fail on a large data set (due to lack of memory?), where a query without it would run slowly but finish just fine?
You could try using the following:
WITH main_res AS (SELECT col1,
col2,
MIN(CASE WHEN col4 = 'N' THEN col3 END) OVER () col3_n_min,
MAX(CASE WHEN col4 = 'Y' THEN col3 END) OVER () col3_y_max
FROM v_view)
SELECT col1,
col2
FROM main_res
WHERE (col1 LIKE 'val%' AND col2 = col3_y_max)
OR (col1 LIKE 'sth%' AND col2 = col3_n_min);
This uses conditional MIN and MAX analytic functions to return, across all the rows of the view, the minimum col3 among the col4 = 'N' rows and the maximum col3 among the col4 = 'Y' rows, matching the MIN and MAX subqueries in your original query. The col1 filter is applied only in the outer query, because analytic functions are evaluated after the WHERE clause; filtering inside main_res would restrict the rows that MIN and MAX see.
Once you know that information, you can then filter on it appropriately. This should reduce the number of times you're querying each table, which is usually (but not always!) faster than the original query. I advise you to test this query and work out whether it's faster than the original query (and any other answers) before you choose which one to use.
A WITH clause is a kind of view that is created on the fly and used without its code being stored in the database. However, it consumes main memory to store the information related to the cursor used to retrieve rows from the WITH query. You are right: a WITH query on tables holding huge amounts of data will slow down the database.
I am not aware of:
a) Whether TABLE1 and TABLE2 hold the full data set or are incrementally updated.
b) Whether these tables have date columns.
c) At what interval these tables are populated or updated.
Depending on the answers to the questions above, and after discussing with your DBAs:
You can ask DBAs to extract data belonging to TRUNC(SYSDATE) or TRUNC(SYSDATE)-1 from TABLE1 and TABLE2 and populate this data into a single "new" table with same columns along with two additional columns:
a) One column is going to contain 1st three letters of COL1 value.
b) Another column to hold status value with DEFAULT 'Q'.
Create a LIST partition on this new table on COL1 for values 'Val' and 'Sth' and COL4 for Y and N.
Write an anonymous block that prepares the data the way you need it. Then a simple query on this new table should fetch the data for you. The anonymous block can be scheduled as a job, at a frequency matching how often data becomes available in the source tables TABLE1 and TABLE2.
These suggestions are based on a set of assumptions and the amount of information you have shared.
If there is any UI or report running on this data, then housekeeping of this data is required.
Bottom line:
Prepare the data as required by the subsequent process(es) beforehand rather than preparing it on the fly when it is required. This will simplify your entire process as well as the query.
Most of the time, when we encounter performance bottlenecks in a Prod or Int environment, we look for short-term solutions. Short-term solutions are very much required to sort out the issue at hand. However, I would suggest that you be prepared with a long-term solution as well.
Before investing too much time in rewriting, it would be helpful to ensure that the optimizer is given a fighting chance at doing a good job. Make sure the tables have good stats and appropriate indexes.
Run explain plans on your queries to see what Oracle is actually doing in each case. You may find that something unexpected is going on with those UNION ALL statements. The optimizer sometimes makes dumb decisions and you may need to help it with indexes or strategically applied hints.
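As a minimal sketch of those two steps in Oracle, the block below gathers statistics for one table and then displays the plan for a simplified query against the view from the question. It assumes the tables live in your own schema and that you have the privileges to gather statistics; the query being explained is only a stand-in for the real one.

```sql
-- Refresh optimizer statistics for one underlying table
-- (repeat for TABLE2; assumes the table is in the current schema).
BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TABLE1');
END;
/

-- Ask the optimizer for its plan without running the query...
EXPLAIN PLAN FOR
SELECT col1, col2
FROM   v_view
WHERE  col1 LIKE 'val%';

-- ...then display the plan that was just written to the plan table.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Comparing the plans for the view, WITH-clause, and inline-view variants of the same query should show whether the optimizer treats them differently.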
The WITH clause is quite handy and does the same job as a standalone view or a view defined inline in the table list, with one key exception: Oracle treats standalone views, WITH-clause views, and inline views slightly differently in the optimization process.
Oracle may choose to materialize the results of a view defined in a WITH clause, while it may merge the view if it is defined inline.
The point is that changing between these three kinds of views in your query will cause odd nuances of the optimizer to start showing up.
Finally, what version of Oracle are you on? The optimizer is one area where version really matters.

How is WITH used in Oracle SQL (example code shown )?

I found this example code online when I searched for "how to do an Exclusive Between oracle sql"
Someone was proving that, in Oracle, BETWEEN is by default inclusive.
So they used such code :
with x as (
select 1 col1 from dual
union
select 2 col1 from dual
union
select 3 col1 from dual
UNION
select 4 col1 from dual
)
select *
from x
where col1 between 2 and 3
I've never seen such an example, what is going on with the WITH ?
In short, a WITH clause defines a named inline view, or subquery. It is useful when you will refer to something multiple times, or when you want to abstract parts of a complex query to make it easier to read.
If you are from the SQL Server world, you can also think of it like a temporary table.
So:
WITH foo as (select * from tab)
select * from foo;
is like
select * from (select * from tab);
Though it may be more efficient, since foo can be resolved to a single dataset even if it is queried multiple times.
It also reduces repetition. If you use a subquery more than once in a statement, you can consider factoring it out using WITH.
It has nothing to do with the BETWEEN example, it is just the author's choice of approach for demonstrating a concept.
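To illustrate the "refer to it multiple times" point, here is a small sketch that reuses the x from the question twice in one statement: once as the row source and once inside a scalar subquery. The comparison against the average is made up purely for the demonstration.

```sql
WITH x AS (
    SELECT 1 AS col1 FROM dual UNION ALL
    SELECT 2 FROM dual UNION ALL
    SELECT 3 FROM dual UNION ALL
    SELECT 4 FROM dual
)
SELECT col1
FROM   x                                    -- first reference
WHERE  col1 > (SELECT AVG(col1) FROM x)     -- second reference
ORDER  BY col1;
-- returns 3 and 4, because the average of 1..4 is 2.5
```

Without WITH, the four-row subquery would have to be written out twice.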

How to manipulate a column selected by * in SQLite?

I want a query to return all rows and all columns with one caveat: if, in a given row, colN is null, then instead return the string 'FOO'.
Why don't I just use SELECT col1, col2, ..., COALESCE(colN, 'FOO')?
I am implementing an abstract interface and thus I am required to use queries which SELECT * (because I cannot make assumptions about what columns there are). I can only assume that one column exists: colN.
What would this provide me?
I need this because this query is used in combination with a UNION and this allows me to keep track of the origin of the data.
Any ideas on how to do this?
One thing you could do is
SELECT *, COALESCE(colN, 'FOO') as CoalescedColN
if it's possible to adjust the other select(s) in the UNION accordingly
I don't know if SQLite can use this technique, but this is what I would do in most other databases:
select * from
(SELECT col1, col2, ..., COALESCE(colN, 'FOO') from table ) a
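For reference, a small self-contained sketch of the first suggestion that can be pasted into the sqlite3 shell. The table demo and the column col1 are hypothetical; only colN comes from the question. Note that the original colN is still returned as-is and the substituted value arrives as an extra column.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE demo (col1 INTEGER, colN TEXT);
INSERT INTO demo VALUES (1, 'x'), (2, NULL);

-- All original columns, plus one extra column with the NULL replaced.
SELECT *, COALESCE(colN, 'FOO') AS colN_or_foo
FROM demo;
-- row 1: colN = 'x',  colN_or_foo = 'x'
-- row 2: colN = NULL, colN_or_foo = 'FOO'
```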

Multiple SQL SELECT

I've 10 tables with a lot of records. All tables have "Date" column. I want extract all data from tables for date.
I can do 10 queries like SELECT * FROM Table1 WHERE Date=dd/MM/yyyy, etc., but I want to do only one query with "multiple selection". How can I do this?
I'm not so skilled with SQL language.
EDIT: I'm working with Microsoft Access and also MySQL (for two different desktop application, but same problem).
The tables have different fields (only Date is common to all of them), so UNION is not a good fit.
SELECT *
FROM table1,
table2
WHERE table1.date = 'somedate'
AND table2.date = 'somedate'
Take a look at the UNION operator for including data from multiple SELECT statements.
Your question is not entirely clear. Based on what I understood, you can use the UNION statement to combine your queries into a single query. But if you want to query different dates in different tables, you can only use multiple queries.
If I understand your question correctly, and all 10 tables contain the same list of fields, you can use
SELECT * FROM Table1 WHERE Date=dd/MM/yyyy
UNION ALL
SELECT * FROM Table2 WHERE Date=dd/MM/yyyy
UNION ALL
...
UNION ALL
SELECT * FROM Table10 WHERE Date=dd/MM/yyyy
If fields are different, you need to add the fields you want to get in result set:
SELECT field1, field2 FROM Table1 WHERE Date=dd/MM/yyyy
UNION ALL
SELECT field1, field2 FROM Table2 WHERE Date=dd/MM/yyyy
UNION ALL
...
UNION ALL
SELECT field1, field2 FROM Table10 WHERE Date=dd/MM/yyyy
Be careful of performance issues when using UNION. Be sure to run some tests comparing the query times of the two approaches.
Depending on your database software, you could also use a stored procedure.
And also check that the date column is indexed in each of the tables.
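When the tables share only the Date column, one possible sketch is to pad the missing columns with NULL and tag each row with its source table. This is written in MySQL syntax; customer_name and amount are hypothetical columns invented for the example, and the date literal is a placeholder. In Access the quoting and date literal syntax would differ.

```sql
-- Hypothetical columns: customer_name exists only in Table1,
-- amount exists only in Table2.
SELECT 'Table1' AS source_table, t1.`Date`, t1.customer_name, NULL AS amount
FROM   Table1 t1
WHERE  t1.`Date` = '2024-01-31'
UNION ALL
SELECT 'Table2', t2.`Date`, NULL, t2.amount
FROM   Table2 t2
WHERE  t2.`Date` = '2024-01-31';
```

Every branch must return the same number of columns in the same order, which is why each one supplies NULL for the fields it does not have.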

What is the difference between UNION and UNION ALL?

What is the difference between UNION and UNION ALL?
UNION removes duplicate records (where all columns in the results are the same), UNION ALL does not.
There is a performance hit when using UNION instead of UNION ALL, since the database server must do additional work to remove the duplicate rows, but usually you do not want the duplicates (especially when developing reports).
To identify duplicates, records must be comparable types as well as compatible types. This will depend on the SQL system. For example the system may truncate all long text fields to make short text fields for comparison (MS Jet), or may refuse to compare binary fields (ORACLE)
UNION Example:
SELECT 'foo' AS bar UNION SELECT 'foo' AS bar
Result:
+-----+
| bar |
+-----+
| foo |
+-----+
1 row in set (0.00 sec)
UNION ALL example:
SELECT 'foo' AS bar UNION ALL SELECT 'foo' AS bar
Result:
+-----+
| bar |
+-----+
| foo |
| foo |
+-----+
2 rows in set (0.00 sec)
Both UNION and UNION ALL concatenate the result of two different SQLs. They differ in the way they handle duplicates.
UNION performs a DISTINCT on the result set, eliminating any duplicate rows.
UNION ALL does not remove duplicates, and is therefore faster than UNION.
Note: When using these commands, the corresponding selected columns need to be of compatible data types.
Example: suppose we have two tables, Employee and Customer.
[Images not preserved: Employee table data; Customer table data; UNION result (all duplicate records removed); UNION ALL result (records simply concatenated, duplicates kept, so it is faster than UNION).]
UNION removes duplicates, whereas UNION ALL does not.
In order to remove duplicates the result set must be sorted, and this may have an impact on the performance of the UNION, depending on the volume of data being sorted and the settings of various RDBMS parameters (for Oracle, PGA_AGGREGATE_TARGET with WORKAREA_SIZE_POLICY=AUTO, or SORT_AREA_SIZE and SORT_AREA_RETAINED_SIZE if WORKAREA_SIZE_POLICY=MANUAL).
Basically, the sort is faster if it can be carried out in memory, but the same caveat about the volume of data applies.
Of course, if you need data returned without duplicates then you must use UNION, depending on the source of your data.
I would have commented on the first post to qualify the "is much less performant" comment, but have insufficient reputation (points) to do so.
In ORACLE: UNION does not support BLOB (or CLOB) column types, UNION ALL does.
The basic difference between UNION and UNION ALL is union operation eliminates the duplicated rows from the result set but union all returns all rows after joining.
from http://zengin.wordpress.com/2007/07/31/union-vs-union-all/
UNION
The UNION command is used to select related information from two tables, much like the JOIN command. However, when using the UNION command all selected columns need to be of the same data type. With UNION, only distinct values are selected.
UNION ALL
The UNION ALL command is equal to the UNION command, except that UNION ALL selects all values.
The difference between Union and Union all is that Union all will not eliminate duplicate rows, instead it just pulls all rows from all tables fitting your query specifics and combines them into a table.
A UNION statement effectively does a SELECT DISTINCT on the results set. If you know that all the records returned are unique from your union, use UNION ALL instead, it gives faster results.
You can avoid duplicates and still run much faster than UNION DISTINCT (which is actually same as UNION) by running query like this:
SELECT * FROM mytable WHERE a=X UNION ALL SELECT * FROM mytable WHERE b=Y AND a!=X
Notice the AND a!=X part. This is much faster than UNION.
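One caveat with this trick: if column a can be NULL, the comparison a != X evaluates to unknown for those rows, so rows with b = Y and a NULL value in a are silently dropped. A NULL-safe variant might look like the sketch below, with 1 and 2 standing in for X and Y. Unlike UNION, it also keeps any rows that are already duplicated inside mytable itself.

```sql
SELECT * FROM mytable WHERE a = 1
UNION ALL
-- Keep rows where a is NULL; "a <> 1" alone would filter them out.
SELECT * FROM mytable WHERE b = 2 AND (a <> 1 OR a IS NULL);
```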
Just to add my two cents to the discussion here: one could understand the UNION operator as a pure, SET-oriented UNION - e.g. set A={2,4,6,8}, set B={1,2,3,4}, A UNION B = {1,2,3,4,6,8}
When dealing with sets, you would not want numbers 2 and 4 appearing twice, as an element either is or is not in a set.
In the world of SQL, though, you might want to see all the elements from the two sets together in one "bag" {2,4,6,8,1,2,3,4}. And for this purpose T-SQL offers the operator UNION ALL.
UNION - results in distinct records while
UNION ALL - results in all the records including duplicates.
Both are blocking operators and hence I personally prefer using JOINS over Blocking Operators(UNION, INTERSECT, UNION ALL etc. ) anytime.
To illustrate why Union operation performs poorly in comparison to Union All checkout the following example.
CREATE TABLE #T1 (data VARCHAR(10))
INSERT INTO #T1
SELECT 'abc'
UNION ALL
SELECT 'bcd'
UNION ALL
SELECT 'cde'
UNION ALL
SELECT 'def'
UNION ALL
SELECT 'efg'
CREATE TABLE #T2 (data VARCHAR(10))
INSERT INTO #T2
SELECT 'abc'
UNION ALL
SELECT 'cde'
UNION ALL
SELECT 'efg'
Following are the results of the UNION ALL and UNION operations. [Result screenshots not preserved.]
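The queries being compared are presumably along these lines (T-SQL, run in the same session that created the temp tables above):

```sql
-- Every row from both temp tables: 5 + 3 = 8 rows.
SELECT data FROM #T1
UNION ALL
SELECT data FROM #T2;

-- Distinct values only: abc, bcd, cde, def, efg (5 rows).
SELECT data FROM #T1
UNION
SELECT data FROM #T2;
```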
Using UNION results in Distinct Sort operations in the execution plan. [Execution plan screenshot not preserved.]
Not sure that it matters which database
UNION and UNION ALL should work on all SQL Servers.
You should avoid unnecessary UNIONs; they are a huge performance drain. As a rule of thumb, use UNION ALL if you are not sure which to use.
(From Microsoft SQL Server Book Online)
UNION [ALL]
Specifies that multiple result sets are to be combined and returned as a single result set.
ALL
Incorporates all rows into the results. This includes duplicates. If not specified, duplicate rows are removed.
UNION takes longer because a duplicate-row search, like DISTINCT, is applied to the results.
SELECT * FROM Table1
UNION
SELECT * FROM Table2
is equivalent of:
SELECT DISTINCT * FROM (
SELECT * FROM Table1
UNION ALL
SELECT * FROM Table2) DT
A side effect of applying DISTINCT over the results can be a sorting operation on the results.
UNION ALL results are returned in arbitrary order, whereas UNION results often come back as if ORDER BY 1, 2, 3, ..., n (n = number of columns) had been applied, when the duplicates are removed by sorting. This ordering is not guaranteed without an explicit ORDER BY. You can see this side effect even when there are no duplicate rows.
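Since any ordering produced by duplicate removal is only a by-product of how the database happened to execute the query, a query that needs sorted output should say so explicitly. A minimal sketch using the same tables:

```sql
SELECT * FROM Table1
UNION
SELECT * FROM Table2
ORDER BY 1;   -- sorts the combined result by its first column
```

The ORDER BY applies to the whole combined result, not just to the last SELECT.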
I add an example,
UNION merges with a distinct step, so it is slower because rows need to be compared (in Oracle SQL Developer, select the query and press F10 to see the cost analysis).
UNION ALL merges without the distinct step, so it is faster.
SELECT to_char(sysdate, 'yyyy-mm-dd') FROM dual
UNION
SELECT to_char(sysdate, 'yyyy-mm-dd') FROM dual;
and
SELECT to_char(sysdate, 'yyyy-mm-dd') FROM dual
UNION ALL
SELECT to_char(sysdate, 'yyyy-mm-dd') FROM dual;
UNION merges the contents of two structurally-compatible tables into a single combined table.
Difference:
The difference between UNION and UNION ALL is that UNION will omit duplicate records whereas UNION ALL will include duplicate records.
The UNION result set is often returned in ascending order as a by-product of duplicate removal (this is not guaranteed without ORDER BY), whereas the UNION ALL result set is not sorted.
UNION performs a DISTINCT on its result set, so it will eliminate any duplicate rows, whereas UNION ALL won't remove duplicates and is therefore faster than UNION.
Note: The performance of UNION ALL will typically be better than UNION, since UNION requires the server to do the additional work of removing any duplicates. So, in cases where it is certain that there will not be any duplicates, or where having duplicates is not a problem, use of UNION ALL would be recommended for performance reasons.
Suppose that you have two tables, Teacher and Student.
Both have 4 columns with different names, like this:
Teacher - ID(int), Name(varchar(50)), Address(varchar(50)), PositionID(varchar(50))
Student - ID(int), Name(varchar(50)), Email(varchar(50)), PositionID(int)
You can apply UNION or UNION ALL to two tables that have the same number of columns, even if the column names differ; the corresponding data types must be implicitly convertible.
When you apply UNION to 2 tables, it removes all duplicate entries (rows in which every column value matches another row). Like this
SELECT * FROM Student
UNION
SELECT * FROM Teacher
the result will be: [result table not preserved]
When you apply UNION ALL to 2 tables, it returns all entries, including duplicates. Like this
SELECT * FROM Student
UNION ALL
SELECT * FROM Teacher
Output: [result table not preserved]
Performance:
Obviously UNION ALL performs better than UNION, because UNION does the additional work of removing the duplicate values. You can check this from the estimated execution plan by pressing Ctrl+L in MSSQL.
UNION removes duplicate records; UNION ALL, on the other hand, does not. But one needs to consider the volume of data that is going to be processed, and the columns and data types must match.
Since UNION internally applies "distinct" behavior when selecting rows, it is more costly in terms of time and performance.
like
select project_id from t_project
union
select project_id from t_project_contact
this gives me 2020 records
on other hand
select project_id from t_project
union all
select project_id from t_project_contact
gives me more than 17402 rows
In terms of precedence, both have the same precedence.
If there is no ORDER BY, a UNION ALL may bring rows back as it goes, whereas a UNION would make you wait until the very end of the query before giving you the whole result set at once. This can make a difference in a time-out situation - a UNION ALL keeps the connection alive, as it were.
So if you have a time-out issue, and there's no sorting, and duplicates aren't an issue, UNION ALL may be rather helpful.
One more thing I would like to add:
Union: the result set often comes back sorted in ascending order, though this is not guaranteed without ORDER BY.
Union All: the result set is not sorted; the output of the two queries just gets appended.
A note on duplicates within a single table, for both Oracle and MySQL: let's say that t1 and t2 have no duplicate rows between them, but each has duplicate rows within itself. Example: t1 has sales from 2017 and t2 from 2018.
SELECT T1.YEAR, T1.PRODUCT FROM T1
UNION ALL
SELECT T2.YEAR, T2.PRODUCT FROM T2
In Oracle, UNION ALL fetches all rows from both tables. The same occurs in MySQL.
However:
SELECT T1.YEAR, T1.PRODUCT FROM T1
UNION
SELECT T2.YEAR, T2.PRODUCT FROM T2
UNION removes duplicates from the combined result, so rows that are duplicated within t1 or within t2 are returned only once, even though no row of t1 matches a row of t2. This holds in both Oracle and MySQL: the result set will have fewer rows than with UNION ALL.
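A self-contained way to check how a database treats duplicates that occur inside a single input is sketched below. It should run on both Oracle and MySQL, since both accept FROM dual. The column name sales_year and the literal values are made up for the test.

```sql
-- The first input contains the same row twice; the second input
-- shares no rows with the first.
SELECT sales_year, product
FROM  (SELECT 2017 AS sales_year, 'A' AS product FROM dual
       UNION ALL
       SELECT 2017, 'A' FROM dual) t1
UNION
SELECT 2018, 'B' FROM dual;
-- UNION returns distinct rows only: (2017, 'A') and (2018, 'B')
```

Replacing the outer UNION with UNION ALL returns all three rows.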
UNION ALL also works on more data types, such as spatial data types. For example:
select a.SHAPE from tableA a
union
select b.SHAPE from tableB b
will throw
The data type geometry cannot be used as an operand to the UNION, INTERSECT or EXCEPT operators because it is not comparable.
However union all will not.