I have Table 1 and Table 2, which have the same columns. I want to concatenate the tables such that if ID1 and ID2 (not 'value') in Table 1 match Table 2, then only the Table 1 row is included and not the Table 2 row. If they do not match, then the row is included.
FYI, all 'value' entries in Table 2 are NULL. After the above step, if a row in the output table has value NULL and, for this row, Table 2 has an ID1 that matches Table 1, then apply the value from Table 1.
Both steps can be rolled up into one. I have broken it out to make sure it's easy to understand.
I attempted this using the following code; I just have to somehow get rid of the duplicated entries and do the second part of the question. I don't think my way is very effective. Would love some guidance…
SELECT ID1, ID2, value
FROM table1
UNION
SELECT ID1, ID2, value
FROM table2
Table 1:
ID1  ID2  value
1    1    0.1
1    2    0.2
2    2    0.2
Table 2:
ID1  ID2  value
1    1    NULL
2    1    NULL
For task #1, this is the output I want:
ID1  ID2  value
1    1    0.1
1    2    0.2
2    2    0.2
2    1    NULL
For task #2, this is the output I want:
ID1  ID2  value
1    1    0.1
1    2    0.2
2    2    0.2
2    1    0.2
As I mentioned in the comments, a FULL OUTER JOIN might be more appropriate here. Based on the data you've given, it'll likely look something like this:
SELECT ISNULL(T1.ID1,T2.ID1) AS ID1,
ISNULL(T1.ID2,T2.ID2) AS ID2,
ISNULL(T1.Value,T2.Value) AS Value
FROM dbo.Table1 T1
FULL OUTER JOIN dbo.Table2 T2 ON T1.ID1 = T2.ID1
AND T1.ID2 = T2.ID2;
So extract rows from table1 and ignore ID matching rows from table2. table1 is the first SELECT, table2 is the second SELECT in this answer.
UNION is the right way to do this academically.
It works and is fast on a few test rows.
Then there are tables with millions/billions of rows and the UNION approach is suddenly slow. UNION gets slow when:
more rows than the memory of the server can handle
indexes are too wide or include too much
tables/selection are wide
one source comes across the network
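Conceptually, UNION has to compare the rows of table1 with the rows of table2 to remove duplicates (engines typically do this with a sort or hash step), so it is more than a plain concatenation of the two sets.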
If UNION gets slow then following strategies help:
check query plans (or use tools like SQL Sentry Explorer)
build an index lookup between both tables first (CTE or materialised)
proper indexing
use local tables instead of networked data
split table2 into multiple chunks
Then replace UNION with UNION ALL and add WHERE clauses in both parts that ensure the results never overlap. In essence, the WHERE clause against table1 becomes a WHERE NOT for table2.
My preferred approach for comparing medium-sized or larger sets of data is, besides indexing, the CTE approach / UNION ALL / WHERE + WHERE NOT. It has simply been fast enough for me with the least overhead so far.
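The UNION ALL plus WHERE NOT idea can be sketched against the sample data from the question. This is only an illustration under assumptions: it runs the SQL through Python's sqlite3 module (SQLite stands in for the real database), and the MAX() used to pick a fill value when several Table 1 rows share an ID1 is a choice the question does not specify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID1 INTEGER, ID2 INTEGER, value REAL);
    CREATE TABLE table2 (ID1 INTEGER, ID2 INTEGER, value REAL);
    INSERT INTO table1 VALUES (1, 1, 0.1), (1, 2, 0.2), (2, 2, 0.2);
    INSERT INTO table2 VALUES (1, 1, NULL), (2, 1, NULL);
""")

# Task 1: all of table1, plus only those table2 rows whose
# (ID1, ID2) pair does not already exist in table1.
task1 = conn.execute("""
    SELECT ID1, ID2, value FROM table1
    UNION ALL
    SELECT t2.ID1, t2.ID2, t2.value
    FROM table2 t2
    WHERE NOT EXISTS (SELECT 1 FROM table1 t1
                      WHERE t1.ID1 = t2.ID1 AND t1.ID2 = t2.ID2)
""").fetchall()

# Task 2: same shape, but the table2 rows take a value from table1
# rows with the same ID1. MAX() is an assumed tie-breaker.
task2 = conn.execute("""
    SELECT ID1, ID2, value FROM table1
    UNION ALL
    SELECT t2.ID1, t2.ID2,
           (SELECT MAX(t1.value) FROM table1 t1 WHERE t1.ID1 = t2.ID1) AS value
    FROM table2 t2
    WHERE NOT EXISTS (SELECT 1 FROM table1 t1
                      WHERE t1.ID1 = t2.ID1 AND t1.ID2 = t2.ID2)
""").fetchall()
```

Because the two halves can never overlap, UNION ALL is enough and no duplicate-removal step is needed.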
I am using Oracle 11gR2. Given a table, I would like to return a certain number of rows in random order, with potential duplicates.
All the posts I have seen (here or here or here also) are about finding a number of unique rows in random order.
For example, given this table and asking for 2 random rows:
Table
-----------------
ID LABEL
1 Row 1
2 Row 2
3 Row 3
I would like the query to return
1 Row 1
2 Row 2
but also possibly
1 Row 1
1 Row 1
How could this be done using only pure SQL (no PL/SQL or stored procedure) ? The source table does not have duplicate rows; by duplicate, I mean two rows having the same ID.
Maybe something like this (where p_num is a parameter):
with sample_data as (select 1 id, 'row 1' label from dual union all
select 2 id, 'row 2' label from dual union all
select 3 id, 'row 3' label from dual),
dummy as (select level lvl
from dual
connect by level <= p_num)
select *
from (select sd.*
from sample_data sd,
dummy d
order by dbms_random.value)
where rownum <= p_num;
I really wouldn't like to use this in production code, though, as I don't think it will scale at all well.
What's the reasoning behind your requirement? It doesn't sound like particularly good design to me.
select a random row union all select another random row
That gives you two totally randomized rows, which can be the same row, if both randoms land on the same value, or two different rows. The key is to do two random selects, not one that returns two rows. Note that it has to be UNION ALL, because a plain UNION would collapse two identical rows into one.
If you want more than two rows, I think the best solution would be to have a random-number table, join it to your source table without a join condition (a cross join), order by random and select the top n rows of that join. Because of the join, each row of your source table appears many times in the result set before the top n are selected.
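A minimal sketch of the join-against-a-number-table idea, with SQLite (through Python's sqlite3) standing in for Oracle. The recursive CTE named dummy, random() and LIMIT are SQLite substitutes for CONNECT BY, dbms_random.value and ROWNUM; those substitutions are assumptions of this example, not Oracle syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample_data (id INTEGER, label TEXT);
    INSERT INTO sample_data VALUES (1, 'Row 1'), (2, 'Row 2'), (3, 'Row 3');
""")

p_num = 2  # number of rows wanted

# Each source row is repeated p_num times by the cross join, so the
# same row can be drawn more than once after the random ordering.
picks = conn.execute("""
    WITH RECURSIVE dummy(lvl) AS (
        SELECT 1
        UNION ALL
        SELECT lvl + 1 FROM dummy WHERE lvl < :p_num
    )
    SELECT sd.id, sd.label
    FROM sample_data sd CROSS JOIN dummy d
    ORDER BY random()
    LIMIT :p_num
""", {"p_num": p_num}).fetchall()
```

The intermediate result has (source rows × p_num) rows, which is why this approach is unlikely to scale to large tables.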
I can't think of any way to do it without a stored procedure.
You might be able to make use of DBMS_RANDOM:
http://docs.oracle.com/cd/B19306_01/appdev.102/b14258/d_random.htm#i998925
http://www.databasejournal.com/features/oracle/article.php/3341051/Generating-random-numbers-and-strings-in-Oracle.htm
You could generate a random primary key and return that?
You can use DBMS_RANDOM in a SQL Query.
SELECT ID FROM
(
SELECT ID FROM mytable
ORDER BY dbms_random.value)
WHERE ROWNUM <=2
http://www.sqlfiddle.com/#!4/c6487/13/0
Let's imagine that we have these 2 tables:
Table 1, with the column:
Field1
1
3
Table 2, with the column:
Field1
2
4
(Well they could also be called in any other way, but I want to represent that the type of table1.field1 is the same as table2.field1).
Would it be possible to do a SQL query that would return the following?
[1,2,3,4], I mean the numbers ordered by any criterion I want, with that criterion applying to both tables. As far as I know, ORDER BY can only order by the values of a column, not by a general criterion like "from lower to higher number". And even if it could, I believe the SELECT instruction can't fuse columns. I mean, I think the best I could achieve with that instruction would be to get something like [(1,2),(1,4),(3,2),(3,4)] and work on it later, but this can be painful with lots of results.
And the application needs fields to be on different tables, I cannot merge them.
Any idea about how to deal with this?
Thanks a lot for your help.
Edit:
Oh, it was much easier than I thought; with that instruction it is not hard to achieve.
Thank you everyone.
This is what the UNION statement is for. It lets you combine two SELECT statements into the same resultset:
SELECT Field1
FROM Table1
UNION ALL
SELECT Field1
FROM Table2
ORDER BY 1
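As a quick check of the query above on the question's data, here is a small sketch that runs it through Python's sqlite3 module (SQLite is an assumption made only to have something runnable; the SQL itself is unchanged).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Field1 INTEGER);
    CREATE TABLE Table2 (Field1 INTEGER);
    INSERT INTO Table1 VALUES (1), (3);
    INSERT INTO Table2 VALUES (2), (4);
""")

# ORDER BY 1 sorts the combined result by its first column,
# so the ordering applies across both tables.
merged = [row[0] for row in conn.execute("""
    SELECT Field1 FROM Table1
    UNION ALL
    SELECT Field1 FROM Table2
    ORDER BY 1
""")]
```

Here merged comes back as [1, 2, 3, 4].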
You can do a union, like below:
SELECT field1
FROM
(SELECT field1 FROM Table1
UNION
SELECT field1 FROM Table2) t
ORDER BY field1
Use union or Union all based on your need to repeat elements in both the tables or not.
select * from
(
select field1 as field_value from table1
union
select field1 as field_value from table2
)
order by field_value asc
I am not able to understand this query:
SELECT FIELD1 FROM TABLE1 T1
WHERE 3 = (
SELECT COUNT(FIELD1)
FROM TABLE1 T2
WHERE T2.FIELD1 <= T1.FIELD1
);
This query runs properly without any error. The inner count query returns 363.
In the WHERE clause, if I put 3 = (select ..., I get one result. If I put 4 = (select ..., no records come back. If I put 363 = (select ..., 3 records come back.
I am confused by this. Please help me understand it.
The subquery counts how many FIELD1 values in the whole table are smaller than or equal to the current one in the outer query (T1.FIELD1). Therefore the whole query works like this:
Return FIELD1 values from table TABLE1 if there are exactly 3 (or 4 or
whatever number you put there) FIELD1 values in the table TABLE1
which are smaller or equal (the current row's own value is included in that count).
Note that it uses <=, which means the subquery will always return at least 1 for a non-NULL FIELD1.
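To see the counting behaviour on a small scale, here is a sketch with six made-up values (the data and the helper name rows_at_rank are hypothetical, and SQLite through Python's sqlite3 is only a stand-in for the real database). The value 50 appears twice, so no row has exactly 4 values at or below it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (FIELD1 INTEGER);
    INSERT INTO TABLE1 VALUES (10), (20), (30), (50), (50), (60);
""")

def rows_at_rank(n):
    """Return FIELD1 values that have exactly n values <= themselves."""
    cursor = conn.execute("""
        SELECT FIELD1 FROM TABLE1 T1
        WHERE ? = (SELECT COUNT(FIELD1)
                   FROM TABLE1 T2
                   WHERE T2.FIELD1 <= T1.FIELD1)
    """, (n,))
    return [row[0] for row in cursor]

third = rows_at_rank(3)   # only 30 has exactly three values <= it
fourth = rows_at_rank(4)  # empty: the two 50s both count 5
fifth = rows_at_rank(5)   # both rows holding 50
```

This mirrors the pattern in the question: one row for 3, none for 4, and several rows for a number shared by tied values.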
The query returns the record(s) of table1 that sit at position n from the bottom with respect to the ordering implied by the values of field1, where n represents the literal number in the WHERE clause. A non-empty result set also asserts that exactly n tuples meet the comparison criterion.
You therefore compute the occupants of the nth rank in a tournament, provided that the rank can be issued unequivocally.
Example:
Imagine a tournament result as follows:
1. scooby doo
2. donald duck
3. mickey mouse
5. calvin
5. hobbes
6. minnie mouse
...
363. whoever
This ranking would be compatible with your results (of course you would commonly label calvin & hobbes' rank as 4 instead of 5, as you'd have to if using your query to determine the top n contestants).
How can I get distinct values from multiple fields within one table with just one request.
Option 1
SELECT WM_CONCAT(DISTINCT(FIELD1)) FIELD1S,WM_CONCAT(DISTINCT(FIELD2)) FIELD2S,..FIELD10S
FROM TABLE;
WM_CONCAT is LIMITED
Option 2
select DISTINCT(FIELD1) FIELDVALUE, 'FIELD1' FIELDNAME
FROM TABLE
UNION
select DISTINCT(FIELD2) FIELDVALUE, 'FIELD2' FIELDNAME
FROM TABLE
... FIELD 10
is just too slow
if you were scanning a small range in the data (not full scanning the whole table) you could use WITH to optimise your query
e.g:
WITH a AS
(SELECT field1,field2,field3..... FROM TABLE WHERE condition)
SELECT field1 FROM a
UNION
SELECT field2 FROM a
UNION
SELECT field3 FROM a
.....etc
For my problem, I had
WL1 ... WL2 ... correlation
A B 0.8
B A 0.8
A C 0.9
C A 0.9
how to eliminate the symmetry from this table?
select WL1, WL2,correlation from
table
where least(WL1,WL2)||greatest(WL1,WL2) = WL1||WL2
order by WL1
this gives
WL1 ... WL2 ... correlation
A B 0.8
A C 0.9
:)
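A sketch of the same de-duplication with a plain comparison instead of string concatenation, run through Python's sqlite3 (SQLite and the table name corr are assumptions for the example). Like the query above, it assumes every pair is stored in both orders; a pair that exists only as (B, A) would be dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE corr (WL1 TEXT, WL2 TEXT, correlation REAL);
    INSERT INTO corr VALUES
        ('A', 'B', 0.8), ('B', 'A', 0.8),
        ('A', 'C', 0.9), ('C', 'A', 0.9);
""")

# Keep only the orientation where WL1 sorts before (or equals) WL2.
unique_pairs = conn.execute("""
    SELECT WL1, WL2, correlation
    FROM corr
    WHERE WL1 <= WL2
    ORDER BY WL1, WL2
""").fetchall()
```

Comparing the columns directly also avoids ambiguity that concatenation can introduce when labels have different lengths.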
The best option in the SQL is the UNION, though you may be able to save some performance by taking out the distinct keywords:
select FIELD1 FROM TABLE
UNION
select FIELD2 FROM TABLE
UNION provides the unique set from two tables, so distinct is redundant in this case. There simply isn't any way to write this query differently to make it perform faster. There's no magic formula that makes searching 200,000+ rows faster. It's got to search every row of the table twice and sort for uniqueness, which is exactly what UNION will do.
The only way you can make it faster is to create separate indexes on the two fields (maybe) or pare down the set of data that you're searching across.
Alternatively, if you're doing this a lot and adding new fields rarely, you could use a materialized view to store the result and only refresh it periodically.
Incidentally, your second query doesn't appear to do what you want it to. Distinct always applies to all of the columns in the select section, so your constants with the field names will cause the query to always return separate rows for the two columns.
I've come up with another method that, experimentally, seems to be a little faster. In effect, this allows us to trade one full-table scan for a Cartesian join. In most cases, I would still opt to use the union as it's much more obvious what the query is doing.
SELECT DISTINCT CASE lvl WHEN 1 THEN field1 ELSE field2 END
FROM table
CROSS JOIN (SELECT LEVEL lvl
FROM DUAL
CONNECT BY LEVEL <= 2);
It's also worthwhile to add that I tested both queries on a table without useful indexes containing 800,000 rows and it took roughly 45 seconds (returning 145,000 rows). However, most of that time was spent actually fetching the records, not running the query (the query took 3-7 seconds). If you're getting a sizable number of rows back, it may simply be the number of rows that is causing the performance issue you're seeing.
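To illustrate that the two formulations return the same set of values, here is a small sketch using SQLite through Python's sqlite3. The table name t and its rows are made up, and the two-row inline subquery replaces Oracle's CONNECT BY LEVEL generator; it says nothing about relative speed on Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (field1 TEXT, field2 TEXT);
    INSERT INTO t VALUES ('a', 'x'), ('b', 'x'), ('a', 'y');
""")

# UNION already removes duplicates, so no DISTINCT is needed.
via_union = {row[0] for row in conn.execute("""
    SELECT field1 FROM t
    UNION
    SELECT field2 FROM t
""")}

# One pass over t, with each row emitted twice: once per column.
via_cross_join = {row[0] for row in conn.execute("""
    SELECT DISTINCT CASE lvl WHEN 1 THEN field1 ELSE field2 END
    FROM t
    CROSS JOIN (SELECT 1 AS lvl UNION ALL SELECT 2)
""")}
```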
When you get distinct values from multiple columns, the result won't be a single data table. Consider the following data:
Column A    Column B
10          50
30          50
10          50
When you take the distinct values, you get 2 rows from the first column and 1 row from the second column. It simply won't work.
And something like this?
SELECT 'FIELD1',FIELD1, 'FIELD2',FIELD2,...
FROM TABLE
GROUP BY FIELD1,FIELD2,...
I'm doing a UNION of two queries on an Oracle database. Both of them have a WHERE clause. Is there a difference in the performance if I do the WHERE after UNIONing the queries compared to performing the UNION after WHERE clause?
For example:
SELECT colA, colB FROM tableA WHERE colA > 1
UNION
SELECT colA, colB FROM tableB WHERE colA > 1
compared to:
SELECT *
FROM (SELECT colA, colB FROM tableA
UNION
SELECT colA, colB FROM tableB)
WHERE colA > 1
I believe in the second case, it performs a full table scan on both the tables affecting the performance. Is that correct?
In my experience, Oracle is very good at pushing simple predicates around. The following test was made on Oracle 11.2. I'm fairly certain it produces the same execution plan on all releases of 10g as well.
(Please people, feel free to leave a comment if you run an earlier version and tried the following)
create table table1(a number, b number);
create table table2(a number, b number);
explain plan for
select *
from (select a,b from table1
union
select a,b from table2
)
where a > 1;
select *
from table(dbms_xplan.display(format=>'basic +predicate'));
PLAN_TABLE_OUTPUT
---------------------------------------
| Id | Operation | Name |
---------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | VIEW | |
| 2 | SORT UNIQUE | |
| 3 | UNION-ALL | |
|* 4 | TABLE ACCESS FULL| TABLE1 |
|* 5 | TABLE ACCESS FULL| TABLE2 |
---------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("A">1)
5 - filter("A">1)
As you can see at steps (4,5), the predicate is pushed down and applied before the sort (union).
I couldn't get the optimizer to push down an entire sub query such as
where a = (select max(a) from empty_table)
or a join. With proper PK/FK constraints in place it might be possible, but clearly there are limitations :)
NOTE: While my advice was true many years ago, Oracle's optimizer has improved so that the location of the where definitely no longer matters here. However preferring UNION ALL vs UNION will always be true, and portable SQL should avoid depending on optimizations that may not be in all databases.
Short answer, you want the WHERE before the UNION and you want to use UNION ALL if at all possible. If you are using UNION ALL then check the EXPLAIN output, Oracle might be smart enough to optimize the WHERE condition if it is left after.
The reason is the following. The definition of a UNION says that if there are duplicates in the two data sets, they have to be removed. Therefore there is an implicit GROUP BY in that operation, which tends to be slow. Worse yet, Oracle's optimizer (at least as of 3 years ago, and I don't think it has changed) doesn't try to push conditions through a GROUP BY (implicit or explicit). Therefore Oracle has to construct larger data sets than necessary, group them, and only then gets to filter. Thus prefiltering wherever possible is officially a Good Idea. (This is, incidentally, why it is important to put conditions in the WHERE whenever possible instead of leaving them in a HAVING clause.)
Furthermore if you happen to know that there won't be duplicates between the two data sets, then use UNION ALL. That is like UNION in that it concatenates datasets, but it doesn't try to deduplicate data. This saves an expensive grouping operation. In my experience it is quite common to be able to take advantage of this operation.
Since UNION ALL does not have an implicit GROUP BY in it, it is possible that Oracle's optimizer knows how to push conditions through it. I don't have Oracle sitting around to test, so you will need to test that yourself.
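The difference between the two operators is easy to see on a tiny example. This sketch uses made-up rows and runs on SQLite through Python's sqlite3, which is an assumption for illustration only; it shows the semantics, not Oracle's performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (colA INTEGER, colB TEXT);
    CREATE TABLE tableB (colA INTEGER, colB TEXT);
    INSERT INTO tableA VALUES (1, 'x'), (2, 'y');
    INSERT INTO tableB VALUES (2, 'y'), (3, 'z');
""")

# UNION removes the row (2, 'y') that appears in both tables.
union_rows = conn.execute("""
    SELECT colA, colB FROM tableA
    UNION
    SELECT colA, colB FROM tableB
""").fetchall()

# UNION ALL simply concatenates and keeps both copies.
union_all_rows = conn.execute("""
    SELECT colA, colB FROM tableA
    UNION ALL
    SELECT colA, colB FROM tableB
""").fetchall()
```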
Just a caution
If you tried
SELECT colA, colB FROM tableA WHERE colA > 1
UNION
SELECT colX, colA FROM tableB WHERE colA > 1
compared to:
SELECT *
FROM (SELECT colA, colB FROM tableA
UNION
SELECT colX, colA FROM tableB)
WHERE colA > 1
Then in the second query, the colA in the where clause will actually have the colX from tableB, making it a very different query. If columns are being aliased in this way, it can get confusing.
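The caution above can be reproduced with a few made-up rows. The sketch runs on SQLite through Python's sqlite3 (an assumption for illustration); as in Oracle, the column names of the combined result come from the first SELECT, so the outer colA is the first column of the union.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (colA INTEGER, colB INTEGER);
    CREATE TABLE tableB (colX INTEGER, colA INTEGER);
    INSERT INTO tableA VALUES (1, 10), (2, 20), (5, 50);
    INSERT INTO tableB VALUES (3, 0), (0, 9);
""")

# Filter inside each branch: tableB is filtered on its own colA.
filtered_first = conn.execute("""
    SELECT colA, colB FROM tableA WHERE colA > 1
    UNION
    SELECT colX, colA FROM tableB WHERE colA > 1
""").fetchall()

# Filter outside: colA now means the first column, i.e. tableB.colX.
filtered_last = conn.execute("""
    SELECT *
    FROM (SELECT colA, colB FROM tableA
          UNION
          SELECT colX, colA FROM tableB)
    WHERE colA > 1
""").fetchall()
```

The first query keeps the tableB row (0, 9), while the second keeps (3, 0) instead.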
You need to look at the explain plans, but unless there is an INDEX or PARTITION on COL_A, you are looking at a FULL TABLE SCAN on both tables.
With that in mind, your first example is throwing out some of the data as it does the FULL TABLE SCAN. That result is being sorted by the UNION, then duplicate data is dropped. This gives you your result set.
In the second example, you are pulling the full contents of both tables. That result is likely to be larger. So the UNION is sorting more data, then dropping the duplicate stuff. Then the filter is being applied to give you the result set you are after.
As a general rule, the earlier you filter away data, the smaller the data set, and the faster you will get your results. As always, your mileage may vary.
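When the column lists line up, both placements of the WHERE clause return the same rows, so the choice is purely about cost. A small sketch with made-up rows, run on SQLite through Python's sqlite3 (an assumption; it checks the results, not Oracle's execution plan):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (colA INTEGER, colB TEXT);
    CREATE TABLE tableB (colA INTEGER, colB TEXT);
    INSERT INTO tableA VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO tableB VALUES (2, 'b'), (4, 'd'), (0, 'z');
""")

# Filter each branch before the union.
where_first = conn.execute("""
    SELECT colA, colB FROM tableA WHERE colA > 1
    UNION
    SELECT colA, colB FROM tableB WHERE colA > 1
""").fetchall()

# Union everything, then filter the combined result.
where_last = conn.execute("""
    SELECT *
    FROM (SELECT colA, colB FROM tableA
          UNION
          SELECT colA, colB FROM tableB)
    WHERE colA > 1
""").fetchall()
```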
SELECT * FROM (SELECT colA, colB FROM tableA UNION SELECT colA, colB FROM tableB) as tableC WHERE tableC.colA > 1
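If we're using a union that contains the same field name in 2 tables, then some databases (MySQL and SQL Server, for example) require us to give a name to the subquery, as with tableC in the query above. The WHERE condition can then be written as WHERE tableC.colA > 1. Note that Oracle does not accept the AS keyword before a table alias, so there the alias would be written without AS.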
I would make sure you have an index on ColA, and then run both of them and time them. That would give you the best answer.
I think it will depend on many things. Run EXPLAIN PLAN on each one to see what your optimizer selects. Otherwise, as @rayman suggests, run them both and time them.
SELECT colA, colB FROM tableA WHERE colA > 1
UNION
SELECT colX, colA FROM tableB
SELECT *
FROM (SELECT * FROM can
UNION
SELECT * FROM employee) as e
WHERE e.id = 1;