Locally symmetric difference in sql - sql

I have a problem similar to this StackOverflow question, except that I need to exclude certain fields from the comparison but still include it in the result set.
I'm penning the problem as locally symmetric difference.
For example Table A and B have columns X,Y,Z and I want to compare only Y,Z for differences but I still want the result set to include X.

Sounds like this is basically what you want. Match rows between two tables on columns Y and Z, find the unmatched rows, and output the values of columns X, Y, and Z.
SELECT a.x, a.y, a.z, b.x, b.y, b.z
FROM a FULL OUTER JOIN b ON a.y = b.y AND a.z = b.z
WHERE a.y IS NULL OR b.y IS NULL

Old style SQL for a full join - A concatenated with B, excluding rows in B also in A (the middle):
-- all rows in A with or without matching B
select a.x, a.y, a.z
from a
left join b
on a.x = b.x
and a.y = b.y
union all
-- all rows in B with no match in A to "exclude the middle"
select b.x, b.y, null as z
from b
where not exists (select null
from a
where b.x = a.x
and b.y = a.y)
ANSI Style:
select coalesce(a.x, b.x) as x,
coalesce(a.y, b.y) as y,
a.z
from a
full outer join b
on a.x = b.x
and a.y = b.y
The coalesce's are there for safety; I've never actually had cause to write a full outer join in the real world.
If what you really want to find out if two table are identical, here's how:
SELECT COUNT(*)
FROM (SELECT list_of_columns
FROM one_of_the_tables
MINUS
SELECT list_of_columns
FROM the_other_table
UNION ALL
SELECT list_of_columns
FROM the_other_table
MINUS
SELECT list_of_columns
FROM one_of_the_tables)
If that returns a non-zero result, then there is a difference. It doesn't tell you which table it's in, but it's a start.

Related

Why does an INSERT take time when using subqueries

INSERT INTO TableA
SELECT
x,
y,
z
FROM TableB
WHERE x IN
(select DISTINCT x
FROM TableC
WHERE x NOT IN
(SELECT DISTINCT x from TableD)
)
This query takes forever and it doesn't complete.
When I run the each select query it works fine but when I run it all it takes forever? Can you see the reason?
Try this query :
insert into TableA
select b.*
from TableB b --with(nolock)
left outer join TableC c --with(nolock)
on b.x = c.x
left outer join TableD d --with(nolock)
on c.x = d.x
where c.x is not null and d.x is null
if it is also running infinite then uncomment with(nolock) and try again. if it does not work then check estimated execution plan.
Firstly you need to look at the execution plan for the query - it might tell you where the bottlenecks are or if there are missing indexes that would significantly speed up your query - I think this is likely as your query is simple so I don't see why it would take so long;
I think you could also restructure you query so that it uses Joins instead of the not in - it would help if i knew the data to see if this produced the same results, But i think it should;
SELECT B.x,
B.y,
B.Z
FROM TableB B
INNER JOIN --where in
(
SELECT DISTINCT x
FROM TableC c
LEFT JOIN TableD d
ON c.x = d.x
WHERE d.x IS NULL -- c x not in d x
) sub
on B.x = sub.x
Subqueries and DISTINCT when not needed are notoriously poor for performance. You can accomplish what you need using JOINs.
SELECT b.x, b.y, b.z
FROM TableB b
INNER JOIN TableC c ON c.x=b.x
LEFT JOIN TableD d ON d.x=b.x
WHERE d.x IS NULL
GROUP BY b.x, b.y, b.z -- only if you have duplicates and need unique records
The INNER JOIN on TableC fixes your 1st "IN", then the LEFT JOIN and d.x IS NULL fixes your "NOT IN" clause.
Lastly, make sure that you have indexes on the "x" column in each table.
CREATE INDEX IX_TableB_X ON TableB (X);
CREATE INDEX IX_TableC_X ON TableC (X);
CREATE INDEX IX_TableD_X ON TableD (X);

SQL Case With Many Columns

Hi I have looked through the "Case with multiple columns" questions and don't see something the same as this so I think I should ask.
Basically I have two tables (both are the result of a subquery) which I want to join. They have the same column names. If I join them on their ids and SELECT * I get each row being something like this:
A.id, A.x, A.y, A.z, A.num, B.id, B.x, B.y, B.z, B.num
What I want is a way to only select the columns of the table with the lower value of num. So in this case the result table would always have 5 columns, id, x, y, z, num, and I don't care which table id, x, y, z, num came from after the fact. Also either table result is fine if they are equal.
SELECT CASE WHEN A.num < B.num THEN A.* ELSE B.* END FROM A JOIN B ON A.id=B.id
would be perfect but you can only return one column in a CASE statement, and I could use a CASE for every column but that seems so wasteful (there are 8 in each table in my actual database so I would have 8 CASE statements).
This is SQLite btw. Any help would be appreciated!
Edit for more info on A and B:
A and B come from Queries like this
SELECT "thought case statement might go here" FROM
(SELECT id, x, y, z, num FROM Table1 a JOIN Table2 b ON a.id=b.id AND (y BETWEEN (53348574-3593) AND (53348574+3593)) AND (z BETWEEN (-6259973-6027) AND (-6259973+6027)) JOIN Table3 c ON c.id= b.id GROUP BY a.id, c.r) A
JOIN
(SELECT id, x, y, z, num FROM Table1 a JOIN Table2 b ON a.id=b.id AND (y BETWEEN (53401007-3593) AND (53401007+3593)) AND (z BETWEEN (-6397286-6027) AND (-6397286+6027)) JOIN Table3 c ON c.id= b.id GROUP BY a.id, c.r ) B ON A.id=B.id
So it joins two tables, made based on geolocation if you're wondering why the big numbers, and needs to decide which of the tables to take its data from based on attributes of what it finds in either of the locations.
try
select A.id, A.x, A.y, A.z, A.num from A JOIN B ON A.id=B.id where a.num<b.num
union
select b.id, b.x, b.y, b.z, b.num from A JOIN B ON A.id=B.id where b.num<a.numhere
I don't know of any RDBMS supporting what you want. You will have to write 8 CASE statements. But why is it so wasteful? Or are you just lazy? :)
Edit:
See, when you write SELECT * ... what your RDBMS is doing is to query system tables (information_schema and so on) and get the list of columns in the table.
So when you write
SELECT CASE WHEN A.num < B.num THEN A.* ELSE B.* END ...
you basically write
SELECT CASE WHEN A.num < B.num THEN A.num, A.whatever, A.more, ... ELSE B.num, B.whatever, B.more... END FROM A JOIN B ON A.id=B.id
and this is unfortunately wrong syntax.

Is it possible to get multiple values from a subquery?

Is there any way to have a subquery return multiple columns in oracle db? (I know this specific sql will result in an error, but it sums up what I want pretty well)
select
a.x,
( select b.y, b.z from b where b.v = a.v),
from a
I want a result like this:
a.x | b.y | b.z
---------------
1 | 2 | 3
I know it is possible to solve this problem through joins, but that is not what I am asking for.
My Question is simply if there is any way, to get two or more values out of a subquery? Maybe some workaround using dual? So that there is NO actual join, but a new subquery for each row?
EDIT: This is a principle question. You can solve all these problems using join, I know. You do not need subqueries like this at all (not even for one column). But they are there. So can I use them in that way or is it simply impossible?
A Subquery in the Select clause, as in your case, is also known as a Scalar Subquery, which means that it's a form of expression. Meaning that it can only return one value.
I'm afraid you can't return multiple columns from a single Scalar Subquery, no.
Here's more about Oracle Scalar Subqueries:
http://docs.oracle.com/cd/B19306_01/server.102/b14200/expressions010.htm#i1033549
It's incorrect, but you can try this instead:
select
a.x,
( select b.y from b where b.v = a.v) as by,
( select b.z from b where b.v = a.v) as bz
from a
you can also use subquery in join
select
a.x,
b.y,
b.z
from a
left join (select y,z from b where ... ) b on b.v = a.v
or
select
a.x,
b.y,
b.z
from a
left join b on b.v = a.v
Here are two methods to get more than 1 column in a scalar subquery (or inline subquery) and querying the lookup table only once. This is a bit convoluted but can be the very efficient in some special cases.
You can use concatenation to get several columns at once:
SELECT x,
regexp_substr(yz, '[^^]+', 1, 1) y,
regexp_substr(yz, '[^^]+', 1, 2) z
FROM (SELECT a.x,
(SELECT b.y || '^' || b.z yz
FROM b
WHERE b.v = a.v)
yz
FROM a)
You would need to make sure that no column in the list contain the separator character.
You could also use SQL objects:
CREATE OR REPLACE TYPE b_obj AS OBJECT (y number, z number);
SELECT x,
v.yz.y y,
v.yz.z z
FROM (SELECT a.x,
(SELECT b_obj(y, z) yz
FROM b
WHERE b.v = a.v)
yz
FROM a) v
Can't you use JOIN like this one?
SELECT
a.x , b.y, b.z
FROM a
LEFT OUTER JOIN b ON b.v = a.v
(I don't know Oracle Syntax. So I wrote SQL syntax)
you can use cross apply:
select
a.x,
bb.y,
bb.z
from
a
cross apply
( select b.y, b.z
from b
where b.v = a.v
) bb
If there will be no row from b to mach row from a then cross apply wont return row. If you need such a rows then use outer apply
If you need to find only one specific row for each of row from a, try:
cross apply
( select top 1 b.y, b.z
from b
where b.v = a.v
order by b.order
) bb
In Oracle query
select a.x
,(select b.y || ',' || b.z
from b
where b.v = a.v
and rownum = 1) as multple_columns
from a
can be transformed to:
select a.x, b1.y, b1.z
from a, b b1
where b1.rowid = (
select b.rowid
from b
where b.v = a.v
and rownum = 1
)
Is useful when we want to prevent duplication for table A.
Similarly, we can increase the number of tables:
.... where (b1.rowid,c1.rowid) = (select b.rowid,c.rowid ....
View this web:
http://www.w3resource.com/sql/subqueries/multiplee-row-column-subqueries.php
Use example
select ord_num, agent_code, ord_date, ord_amount
from orders
where(agent_code, ord_amount) IN
(SELECT agent_code, MIN(ord_amount)
FROM orders
GROUP BY agent_code);

Update table, but make a new updated table?

I am comparing two tables A and B. A and B are months (we will go with JAN and FEB).
FEB has updated data that belongs in JAN.
I need to update the data like so
UPDATE A
SET A.x = B.x, A.y = B.y, A.z = B.z
FROM JAN A, FEB B
WHERE (A.x <> B.x OR A.y <> B.y OR A.z <> B.z) AND A.PK = B.PK
Now I want the above to not take place on the Original JAN table.
Should I go about it this way? Or is there a better way?
SELECT *
INTO JAN_UPDATED
FROM JAN
UPDATE A
SET A.x = B.x, A.y = B.y, A.z = B.z
FROM JAN_UPDATED A, FEB B
WHERE (A.x <> B.x OR A.y <> B.y OR A.z <> B.z) AND A.PK = B.PK
EDIT: I want all of the original values + the updates in the new table
EDIT: Added PK
Using an outer join, you could select the data into a new table and update them along the way. Here's how:
SELECT A.PK,
COALESCE(B.x, A.x) AS x,
COALESCE(B.y, A.y) AS x,
COALESCE(B.z, A.z) AS x,
other columns as necessary
INTO JAN_UPDATED
FROM JAN A
LEFT JOIN FEB B ON A.PK = B.PK AND (A.x <> B.x OR A.y <> B.y OR A.z <> B.z)
The left part of the join, the JAN table, will return all the JAN rows, and the right part of the join, FEB, will only return matching rows, those which are different from their counterparts in JAN. When there's no match, the right part's columns will be filled with NULLs. Now, when pulling values, the COALESCE() function is used for x, y, and z: the FEB version is tried first, and if it is NULL (meaning this is a non-matching JAN row, the one that has no updates in FEB), then the JAN party (the unchanged value) is used instead.
The approach you are following is correct one if you require original records and updated one.
We cann't follow any other approach because there is no primary key defined on the tables.
Your solution also has fuzzy logic because it can update wrong records.

Comparing rows from two tables and and ouputting the result

Hello guys I've been trying to compare two similar tables that have some different columns
Table 1 has columns ID_A, X, Y, Z and
Table 2 has columns ID_B, X, Y, Z
If both values from columns X or Y or Z are = 1
the result of the query would output columns
ID_A, ID_B, X, Y, Z
I thought it would be an intersect statement in there, but I'm having problems because the name of the columns and the values from ID_A and ID_B are completely different.
What would this SQL statement look like? I'd appreciate any ideas, been banging my head on the wall for this one.
To output rows that are in both tables, an inner join would work:
select *
from table1 a
inner join table2 b
on a.x = b.x and a.y = b.y and a.z = b.z
To list only rows with x=1, y=1, or z=1 in both tables, add a where clause like;
where a.x = 1 or a.y = 1 or a.z = 1