Table "Diff" in Oracle - sql

What is the best way to perform a "diff" between two structurally identical tables in Oracle? they're in two different schemas (visible to each other).
Thanks,

If you don't have a tool like PLSQL developer, you could full outer join the two tables. If they have a primary key, you can use that in the join. This will give you an instant view on records missing in either table.
Then, for the records that do exist in both tables, you can compare each of the fields. You should note that you cannot compare null with the regular = operator, so checking is table1.field1 = table2.field1 will return false if both fields are null. So you'll have to check for each field if it has the same value as in the other table, or if both are null.
Your query might look like this (to return records that don't match):
select
*
from
table1 t1
full outer join table2 t2 on t2.id = t1.id
where
-- Only one record exists
t1.id is null or t2.id is null or
( -- The not = takes care of one of the fields being null
not (t1.field1 = t2.field1) and
-- and they cannot both be null
(t1.field1 is not null or t2.field1 is not null)
)
You will have to copy that field1 condition for each of your fields. Of course you could write a function to compare field data to make your query easier, but you should keep in mind that that might decrease performance dramatically when you need to compare two large tables.
If your tables do not have a primary key, you will need to cross join them and perform these checks for each resulting record. You may speed that up a little by using full outer join on each mandatory field, because that cannot be null and can therefor be used in the join.

Assuming you want to compare the data (diff on entire rows) in the two tables:
SELECT *
FROM (SELECT 's1.t' "Row Source", a.*
FROM (SELECT col1, col2
FROM s1.t tbl1
MINUS
SELECT col1, col2
FROM s2.t tbl2) a
UNION ALL
SELECT 's2.t', b.*
FROM (SELECT col1, col2
FROM s2.t tbl2
MINUS
SELECT col1, col2
FROM s1.t tbl1) b)
ORDER BY 1;
More info about comparing two tables.

Related

How to JOIN and get data from either table based on specific logics?

Let's say I have 2 tables as shown below:
Table 1:
Table 2:
I want to join the 2 tables together so that the output table will have a "date" column, a "hrs_billed_v1" column from table1, and a "hrs_billed_v2" column from table2. Sometimes a date only exists in one of the tables, and sometimes a date exists in both tables. If a date exists in both table1 and table2, then I want to allocate the hrs_billed_v1 from table1 and hrs_billed_v2 from table2 to the output table.
So the ideal result will look like this:
I've tried "FULL OUTPUT JOIN" but it returned some null values for "date" in the output table. Below is the query I wrote:
SELECT
DISTINCT CASE WHEN table1.date is null then table2.date WHEN table2.date is null then table1.date end as date,
CASE WHEN table1.hrs_billed_v1 is null then 0 else table1.hrs_billed_v1 END AS hrs_billed_v1,
CASE WHEN table2.hrs_billed_v2 is null then 0 else table2.hrs_billed_v2 END AS hrs_billed_v2
FROM table1
FULL OUTER JOIN table2 ON table1.common = table2.common
Note that the "common" column where I use to join table1 and table2 on is just a constant string that exists in both tables.
Any advice would be greatly appreciated!
A full join is indeed what you want. I think that would be:
select
common,
date,
coalesce(t1.hrs_billed_v1, 0) as hrs_billed_v1,
coalesce(t2.hrs_billed_v2, 0) as hrs_billed_v2
from table1 t1
full join table2 t2 using (common, date)
Rationale:
you don't show what common is; your data indicates that you want to match rows of the same date - so I put both in the join condition; you might need to adapat that
there should really be no need for distinct
coalesce() is much shorter than the case expressions
using () is handy to express the join condition when the columns to match have the same name in both tables

Correct way to select from two tables in SQL Server with no common field to join on

Back in the old days, I used to write select statements like this:
SELECT
table1.columnA, table2.columnA
FROM
table1, table2
WHERE
table1.columnA = 'Some value'
However I was told that having comma separated table names in the "FROM" clause is not ANSI92 compatible. There should always be a JOIN statement.
This leads to my problem.... I want to do a comparison of data between two tables but there is no common field in both tables with which to create a join. If I use the 'legacy' method of comma separated table names in the FROM clause (see code example), then it works perfectly fine. I feel uncomfortable using this method if it is considered wrong or bad practice.
Anyone know what to do in this situation?
Extra Info:
Table1 contains a list of locations in 'geography' data type
Table2 contains a different list of 'geography' locations
I am writing select statement to compare the distances between the locations. As far I know you cant do a JOIN on a geography column??
You can (should) use CROSS JOIN. Following query will be equivalent to yours:
SELECT
table1.columnA
, table2.columnA
FROM table1
CROSS JOIN table2
WHERE table1.columnA = 'Some value'
or you can even use INNER JOIN with some always true conditon:
FROM table1
INNER JOIN table2 ON 1=1
Cross join will help to join multiple tables with no common fields.But be careful while joining as this join will give cartesian resultset of two tables.
QUERY:
SELECT
table1.columnA
, table2,columnA
FROM table1
CROSS JOIN table2
Alternative way to join on some condition that is always true like
SELECT
table1.columnA
, table2,columnA
FROM table1
INNER JOIN table2 ON 1=1
But this type of query should be avoided for performance as well as coding standards.
A suggestion - when using cross join please take care of the duplicate scenarios. For example in your case:
Table 1 may have >1 columns as part of primary keys(say table1_id,
id2, id3, table2_id)
Table 2 may have >1 columns as part of primary keys(say table2_id,
id3, id4)
since there are common keys between these two tables (i.e. foreign keys in one/other) - we will end up with duplicate results. hence using the following form is good:
WITH data_mined_table (col1, col2, col3, etc....) AS
SELECT DISTINCT col1, col2, col3, blabla
FROM table_1 (NOLOCK), table_2(NOLOCK))
SELECT * from data_mined WHERE data_mined_table.col1 = :my_param_value

Comparing two datasets SQL SSRS 2005

I have two datasets on two seperate servers. They both pull one column of information each.
I would like to build a report showing the values of the rows that only appear in one of the datasets.
From what I have read, it seems I would like to do this on the SQL side, not the reporting side; I am not sure how to do that.
If someone could shed some light on how that is possible, I would really appreciate it.
You can use the NOT EXISTS clause to get the differences between the two tables.
SELECT
Column
FROM
DatabaseName.SchemaName.Table1
WHERE
NOT EXISTS
(
SELECT
Column
FROM
LinkedServerName.DatabaseName.SchemaName.Table2
WHERE
Table1.Column = Table2.Column --looks at equalities, and doesn't
--include them because of the
--NOT EXISTS clause
)
This will show the rows in Table1 that don't appear in Table2. You can reverse the table names to find the rows in Table2 that don't appear in Table1.
Edit: Made an edit to show what the case would be in the event of linked servers. Also, if you wanted to see all of the rows that are not shared in both tables at the same time, you can try something as in the below.
SELECT
Column, 'Table1' TableName
FROM
DatabaseName.SchemaName.Table1
WHERE
NOT EXISTS
(
SELECT
Column
FROM
LinkedServerName.DatabaseName.SchemaName.Table2
WHERE
Table1.Column = Table2.Column --looks at equalities, and doesn't
--include them because of the
--NOT EXISTS clause
)
UNION
SELECT
Column, 'Table2' TableName
FROM
LinkedServerName.DatabaseName.SchemaName.Table2
WHERE
NOT EXISTS
(
SELECT
Column
FROM
DatabaseName.SchemaName.Table1
WHERE
Table1.Column = Table2.Column
)
You can also use a left join:
select a.* from tableA a
left join tableB b
on a.PrimaryKey = b.ForeignKey
where b.ForeignKey is null
This query will return all records from tableA that do not have corresponding records in tableB.
If you want rows that appear in exactly one data set and you have a matching key on each table, then you can use a full outer join:
select *
from table1 t1 full outer join
table2 t2
on t1.key = t2.key
where t1.key is null and t2.key is not null or
t1.key is not null and t2.key is null
The where condition chooses the rows where exactly one match.
The problem with this query, though, is that you get lots of columns with nulls. One way to fix this is by going through the columns one by one in the SELECT clause.
select coalesce(t1.key, t2.key) as key, . . .
Another way to solve this problem is to use a union with a window function. This version brings together all the rows and counts the number of times that key appears:
select t.*
from (select t.*, count(*) over (partition by key) as keycnt
from ((select 'Table1' as which, t.*
from table1 t
) union all
(select 'Table2' as which, t.*
from table2 t
)
) t
) t
where keycnt = 1
This has the additional column specifying which table the value comes from. It also has an extra column, keycnt, with the value 1. If you have a composite key, you would just replace with the list of columns specifying a match between the two tables.

SQLite table aliases effecting the performance of queries

How does SQLite internally treats the alias?
Does creating a table name alias internally creates a copy of the same table or does it just refers to the same table without creating a copy?
When I create multiple aliases of the same table in my code, performance of the query is severely hit!
In my case, I have one table, call it MainTable with namely 2 columns, name and value.
I want to select multiple values in one row as different columns. for example
Name: a,b,c,d,e,f
Value: p,q,r,s,t,u
such that a corresponds to p and so on.
I want to select values for names a,b,c and d in one row => p,q,r,s
So I write a query
SELECT t1.name, t2.name, t3.name, t4.name
FROM MainTable t1, MainTable t2, MainTable t3, MainTable t4
WHERE t1.name = 'a' and t2.name = 'b' and t3.name = 'c' and t4.name = 'd';
This way f writing the query kills the performance when size of the table increases as rightly pointed above by Larry.
Is there any efficient way to retrieve this result. I am bad at SQL queries :(
If you list the same table more than once in your SQL statement and do not supply conditions on which to JOIN the tables, you are creating a cartesian JOIN in your result set and it will be enormous:
SELECT * FROM MyTable A, MyTable B;
if MyTable has 1000 records, will create a result set with one million records. Any other selection criteria you include will then have to be evaluated across all one million records.
I'm not sure that's what you're doing (your question is very unclear), but it may be a start on solving your problem.
Updated answer now that the poster has added the query that is being executed.
You're going to have to get a little tricky to get the results you want. You need to use CASE and MAX and, unfortunately, the syntax for CASE is a little verbose:
SELECT MAX(CASE WHEN name='a' THEN value ELSE NULL END),
MAX(CASE WHEN name='b' THEN value ELSE NULL END),
MAX(CASE WHEN name='c' THEN value ELSE NULL END),
MAX(CASE WHEN name='d' THEN value ELSE NULL END)
FROM MainTable WHERE name IN ('a','b','c','d');
Please give that a try against your actual database and see what you get (of course, you want to make sure the column name is indexed).
Assuming you have table dbo.Customers with a million rows
SELECT * from dbo.Customers A
does not result in a copy of the table being created.
As Larry pointed out, the query as it stands is doing a cartesian product across your table four times which, as you has observed, kills your performance.
The updated ticket states the desire is to have 4 values from different queries in a single row. That's fairly simple, assuming this syntax is valid for sqllite
You can see that the following four queries when run in serial produce the desired value but in 4 rows.
SELECT t1.name
FROM MainTable t1
WHERE t1.name='a';
SELECT t2.name
FROM MainTable t2
WHERE t2.name='b';
SELECT t3.name
FROM MainTable t3
WHERE t3.name='c';
SELECT t4.name
FROM MainTable t4
WHERE t4.name='d';
The trick is to simply run them as sub queries like so there are 5 queries: 1 driver query, 4 sub's doing all the work. This pattern will only work if there is one row returned.
SELECT
(
SELECT t1.name
FROM MainTable t1
WHERE t1.name='a'
) AS t1_name
,
(
SELECT t2.name
FROM MainTable t2
WHERE t2.name='b'
) AS t2_name
,
(
SELECT t3.name
FROM MainTable t3
WHERE t3.name='c'
) AS t3_name
,
(
SELECT t4.name
FROM MainTable t4
WHERE t4.name='d'
) AS t4_name
Aliasing a table will result a reference to the original table that exists for the duration of the SQL statement.

SELECT clause: Optional Columns?

I want to run a single query on my MYSQL database. The table I am searching has a file_type field that will be one of three values (1,2,3). Depending what these values are I want to look for different information from the database.
So for example if the 'file_type' field is 1 or 2 then I want to SELECT fields a,b,c,d
However if I notice that file_type = 3 then I want to SELECT fields a,b,c,d,e,f
Can this be done in a single SELECT statement? like this - ish
my_simple_query = SELECT file_type,a,b,c,d FROM table1
my_new_query = SELECT file_type,a,b,c,d (AND e,f IF file_type = 3) FROM table1
thanks all
---------------------------------- ADDITION -----------------------------------
And how would I do this if e,f were stored in another table?
my_multitable_query = SELECT file_type,id,a,b,c,d (AND e,f FROM table2 WHERE id=id) FROM table1
get my drift?
No, SQL SELECT statements do not support optional columns.
But you can specify logic to only return values based on criteria, otherwise they would be null:
SELECT a,b,c,d,
CASE WHEN file_type = 3 THEN t2.e ELSE NULL END AS e,
CASE WHEN file_type = 3 THEN t2.f ELSE NULL END AS f
FROM TABLE_1 t1
JOIN TABLE_2 t2 ON t2.id = t1.id
You may be happier using a non-relational database such as CouchDB or MongoDB if you need non-relational data.
In SQL, you have to declare all the columns at the time you write the query. All these columns must be present on every row of the result set; the set of columns cannot vary based on the content in each row. This is practically the definition of being a relational database (along with the requirement that each column has the same type on every row).
Regarding your additional question, this may be getting somewhere:
SELECT t1.file_type, t1.a, t1.b, t1.c, t1.d, t2.e, t2.f
FROM table1 t1
LEFT OUTER JOIN table2 ON t1.id=t2.id;
You'll still get columns e and f on every row of the result set, even if they're NULL because there is no match in table2 for a given row in table1.
This leads to you a pattern called Class Table Inheritance.
Not familiar with mysql, but why not using a UNION?
> SELECT file_type,a,b,c,d FROM table1 WHERE file_type in (1,2)
> UNION
> SELECT file_type,a,b,c,d,e,f FROM table1 WHERE file_type = 3