Compare two tables and save the difference in a file - sql

I have two tables in two different Oracle databases, they look the same (same column names etc) but the data is mostly different. I would like to compare them and save the difference in a third database (or just save it in an easily imported format).
The tables aren't huge, but it's still about 40 million rows in each table, and I would like help doing the comparison in an efficient way.
There are no keys or unique columns, but no two rows have the same Nr and Name.
Table:
Nr    Name     AText
1234  Jon Doe  Ksjfkjsdkfjksdfsf
3234  Jon Sho  sdfsdfasdfsdf
1434  Ian Doe  lksjdfkljlkjsdfkj

If you're not trying to do this programmatically, you should take a look at SQL Data Compare from Red Gate. I believe it does exactly what you're looking for.

It depends on what you want to find.
For example, if the tables are very similar, you can export both to text files in the same order (select * from table order by 1, 2, 3) and then try a diff -h between these files. This is reasonably fast.
Or you can import one table into the other database and use MINUS, but this is slow. The advantage is that you can MINUS on (col1, col2) and exclude col3, for example; see the sketch below.
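As a rough Oracle sketch of the MINUS approach, assuming a database link named other_db to the second database and the column names from the example table (all of these names are placeholders, adjust to your schema):

-- Rows present locally but missing in the other database:
SELECT Nr, Name, AText FROM mytable
MINUS
SELECT Nr, Name, AText FROM mytable@other_db;

-- Rows present in the other database but missing locally:
SELECT Nr, Name, AText FROM mytable@other_db
MINUS
SELECT Nr, Name, AText FROM mytable;

-- Optionally save one direction of the difference into a third table for export:
CREATE TABLE mytable_diff AS
SELECT Nr, Name, AText FROM mytable
MINUS
SELECT Nr, Name, AText FROM mytable@other_db;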


Redshift IN condition on thousands of values

What's the best way to get data that matches any one of ~100k values?
For this question, I'm using an Amazon Redshift database and have a table something like this with hundreds of millions of rows:
--------------------
| userID | c1 | c2 |
--------------------
| 101000 | 12 | 'a'|
| 101002 | 25 | 'b'|
--------------------
There are also millions of unique userIDs. I have a CSV list of 98,000 userIDs that I care about, and I want to do math on the columns for those specific users.
select c1, c2 from table where userID in (10101, 10102, ...)
What's the best solution to match against a giant list like this?
My approach was to write a Python script that read in the results for all users in our condition set and then filtered against the CSV in Python. It was dead slow and wouldn't work in all scenarios, though.
A coworker suggested uploading the 98k users into a temporary table and then joining against it in the query. This seems like the smartest way, but I wanted to ask if you all had ideas.
I also wondered whether printing an insanely long SQL query containing all 98k users to match against and running it would work. Out of curiosity, would that even have run?
As your coworker suggests, put your IDs into a temporary table by uploading a CSV to S3 and then using COPY to import the file into a table. You can then use an INNER JOIN condition to filter your main data table on the list of IDs you're interested in.
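A rough sketch of that approach (the table names, S3 path, and IAM role below are placeholders, not real values):

-- Placeholder names throughout.
CREATE TEMP TABLE users_of_interest (userID BIGINT);

COPY users_of_interest
FROM 's3://my-bucket/user_ids.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
CSV;

SELECT d.userID, d.c1, d.c2
FROM mydata d
INNER JOIN users_of_interest u ON u.userID = d.userID;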
An alternative option, if uploading a file to S3 isn't possible for you, could be to use CREATE TEMP TABLE to set up a table for your list of IDs and then use a spreadsheet to generate a whole set of INSERT statements to populate the temp table. 100k individual inserts could be quite slow, though.
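If you go the INSERT route, batching many values per statement helps; a rough sketch with placeholder IDs:

CREATE TEMP TABLE users_of_interest (userID BIGINT);

-- Generate these from the spreadsheet, many IDs per INSERT:
INSERT INTO users_of_interest (userID) VALUES
(10101), (10102), (10103), (10104);
-- ...repeat for the remaining IDs...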

Comparing the data of two tables in the same database in sqlserver

I need to compare the data of two tables within one database, matching rows using some columns from the table, and store the extra rows in another table called "relationaldata".
While searching I found a possible solution:
http://weblogs.sqlteam.com/jeffs/archive/2004/11/10/2737.aspx
but it's not working for me.
Can anyone help with how to do this?
How do I compare the data of two tables within one database using Red Gate's tool?
Red Gate SQL Data Compare lets you map together two tables in the same database, provided the columns have compatible datatypes. You just put the same database in the source and target, then go to the Object Mapping tab, unmap the two tables, and map them together.
Data Compare used to use UNION ALL, but it was filling up tempdb, which is what will happen if the table has a high row count. It does all the "joins" on local hard disk now using a data cache.
I think you can use the EXCEPT clause in SQL Server:
INSERT INTO tableC (Col1, Col2, Col3)
SELECT Col1, Col2, Col3 FROM tableA
EXCEPT
SELECT Col1, Col2, Col3 FROM tableB
Please refer to the following for more information:
http://blog.sqlauthority.com/2008/08/07/sql-server-except-clause-in-sql-server-is-similar-to-minus-clause-in-oracle/
Hope this helps

SQL query: have results into a table named the results name

I have a very large database that I would like to split up into tables. I would like to make it so that when I run a DISTINCT, it makes a table for every distinct name; the name of each table will be the data in one of the fields.
EX:
A --------- Data 1
A --------- Data 2
B --------- Data 3
B --------- Data 4
would result in 2 tables, 1 named A and another named B. The entire row of data would then be copied into that table.
select distinct [name] from [maintable]
- make a table for each name
- select the rows for that name from [maintable]
- copy them into the table for that name
- drop those rows from [maintable]
Any help would be great!
I would advise you against this.
One solution is to create indexes, so you can access the data quickly. If you have only a handful of names, though, this might not be particularly effective because each index value would select almost all records.
Another solution is something called partitioning. The exact mechanism differs from database to database, but the underlying idea is the same. Different portions of the table (as defined by name in your case) would be stored in different places. When a query is looking only for values for a particular name, only that data gets read.
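As an illustration, in PostgreSQL-style declarative partitioning (the syntax varies by database, and the table and column names here are only examples):

-- List-partition the table by name so each name's rows are stored separately.
CREATE TABLE maintable (
    name  TEXT NOT NULL,
    data  TEXT
) PARTITION BY LIST (name);

CREATE TABLE maintable_a PARTITION OF maintable FOR VALUES IN ('A');
CREATE TABLE maintable_b PARTITION OF maintable FOR VALUES IN ('B');

-- A query that filters on name only reads the matching partition:
SELECT * FROM maintable WHERE name = 'A';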
Generally, it is bad design to have multiple tables with exactly the same data columns. Here are some reasons:
Adding a column, changing a type, or adding an index has to be done once per table instead of just once.
It is very hard to enforce a primary key constraint on a column across the tables -- you lose the primary key.
Queries that touch more than one name become much more complicated.
Insertions and updates are more complex, because you have to first identify the right table. This often results in overuse of dynamic SQL for otherwise basic operations.
Although there may be some simplifications (security comes to mind), most databases have other mechanisms that are superior to splitting the data into separate tables.
What you want is:
CREATE TABLE new_table AS
(SELECT ....);  -- the data that you want in this table

trying to determine unique identifier for database table

I have a database table with many columns and there is no specified primary key. There isn't a list of super keys either. Besides iteratively trying all candidate keys/columns, is there a way for me, using SQL, to try and figure out whether a subset of keys can make a unique identifier for my table?
For example, a table may have 4 columns first name, last name, address and zip and the data I see is:
John, Smith, 1 main st, 00001
Mary, Smith, 1 main st, 00001
Mary, Smith, 2 sub st, 00002
In this case, I'll need first, last and zip as my unique key.
John, Smith, 1 main st, 00001
John, Smith, 1 main st, 00001
In this case, there is no unique key.
Please don't comment on my table construction and/or normalization of databases, I'm just trying to find a practical answer. Thanks.
This is my question: besides iteratively trying all candidate keys/columns, is there a way for me, using SQL, to try and figure out whether a subset of keys can make a unique identifier for my table?
Looking for a subset of unique values in this case seems so specific to the particular data set. What if you arrive at a subset today and find you can't insert a new row tomorrow?
Use an artificial key, like an auto-incrementing integer.
In short: no, there's no way to do this in T-SQL really.
My advice: just add a ID INT IDENTITY PRIMARY KEY column to the table. It's guaranteed to be unique, it will be filled automagically when you create it, it's fast and easy, no messy "is this really unique or are there any combinations of rows that violate the uniqueness" questions......
Just do it - it's the easiest way to go!!
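In T-SQL, adding such a column might look roughly like this (the table and constraint names are placeholders):

-- Adds an auto-incrementing surrogate key to an existing table.
ALTER TABLE dbo.MyTable
ADD ID INT IDENTITY(1,1) CONSTRAINT PK_MyTable PRIMARY KEY CLUSTERED;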
You cannot find if a combination "can" make a primary key. You can find if one WILL make a good primary key for an existing set of data.
To find whether a set of fields is a candidate or not, you can count the distinct combinations of those fields (using GROUP BY with ROLLUP) and compare that with COUNT(*).
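A sketch of that check with placeholder table and column names; the column combination is unique for the existing data only if the two counts match:

SELECT
    (SELECT COUNT(*) FROM people) AS total_rows,
    (SELECT COUNT(*)
     FROM (SELECT DISTINCT first_name, last_name, zip FROM people) d) AS distinct_combos;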
There is a much faster method.
Enterprise DBMSs have had it for many years, but MS SQL Server 2005 (usable in 2008) and later provide the HashBytes() function. Convert the columns to CHAR() (VARCHAR on MS), concatenate them, then hash them, then compare the hashes. You can compare the two tables in a single SELECT command. IIRC, max 8000 characters per row.
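A rough T-SQL sketch of that idea, with placeholder table and column names (NULLs would need COALESCE in practice, and the concatenated string must stay within the 8000-character limit):

-- Rows in tableA whose hash has no match in tableB:
SELECT HASHBYTES('MD5', CAST(col1 AS VARCHAR(50)) + '|' + CAST(col2 AS VARCHAR(50)) + '|' + CAST(col3 AS VARCHAR(50))) AS row_hash
FROM tableA
EXCEPT
SELECT HASHBYTES('MD5', CAST(col1 AS VARCHAR(50)) + '|' + CAST(col2 AS VARCHAR(50)) + '|' + CAST(col3 AS VARCHAR(50)))
FROM tableB;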
If you are comparing two databases, then you can see if any duplicate rows exist in the source db with structures like this:
select a, b, c, d
from mytable
group by a, b, c, d
having count(*) > 1
Include all columns.
Then use all columns as the 'row key' to see if each row exists in the target system.
There are update anomalies in this schema:
you cannot add a person without knowing his address.
A better approach is to separate this into three tables: one for persons, one for addresses, and one linking them:
> persons: id, firstname, lastname
> address: id, address
> personaddress: personid, addressid
You cannot find if a combination "can" make a primary key.
I actually disagree with this, I think it is possible to write a query that will SELECT all possible permutations of columns from the table and combine each permutation into a single unique value (the simplest, crudest way is to CAST them all to VARCHAR and connect them with a spacer character - a better way would be some kind of hash function).
With a single pass you would then have set of columns like P1, P12, P123, P2, P23, P3 etc (in case of three columns). Then you can do a query with COUNT(*) vs COUNT(DISTINCT) for each permutation column and you will see which permutations are unique.
Using dynamic SQL you could probably make it so that it would work on any table, although I don't know about the column limit for SQL Server.
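A small sketch of that idea for three columns (the column names are placeholders, and plain concatenation with a separator stands in for a proper hash; any combination whose distinct count equals total_rows is unique for the current data):

SELECT
    COUNT(*) AS total_rows,
    COUNT(DISTINCT CAST(a AS VARCHAR(50))) AS p1,
    COUNT(DISTINCT CAST(a AS VARCHAR(50)) + '|' + CAST(b AS VARCHAR(50))) AS p12,
    COUNT(DISTINCT CAST(a AS VARCHAR(50)) + '|' + CAST(b AS VARCHAR(50)) + '|' + CAST(c AS VARCHAR(50))) AS p123,
    COUNT(DISTINCT CAST(b AS VARCHAR(50))) AS p2,
    COUNT(DISTINCT CAST(b AS VARCHAR(50)) + '|' + CAST(c AS VARCHAR(50))) AS p23,
    COUNT(DISTINCT CAST(c AS VARCHAR(50))) AS p3
FROM mytable;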

Access - Merge two databases with identical structure

I would like to write a query that merges two Access 2000 databases into one. Each has 35 tables with identical fields and mostly unique data. There are some rows which will have the same "primary key" in which case the row from database A should always take precedence over database B. I use quotes around "primary key" because the databases are generated without any keys or relationships. For example:
Database A, table1
col1 col2
Frank red
Debbie blue
Database B, table1
col1 col2
Harry orange
Debbie pink
And the results I would like:
col1 col2
Frank red
Harry orange
Debbie blue
These databases are generated and downloaded by non-sql-savvy users, so I would like to just give them a query to copy and paste. They will obviously have to start by importing or linking one DB [in]to another.
I'm guessing I will have to make a third table with the combined results query and then delete the other two. Ideally, though, it would just add database A's rows to database B's (overriding where necessary).
I'm of course not looking for a complete answer, just hoping for some advice on where to start. I have some mySQL experience and understand the basics of joins. Is it possible to do this all in one query, or will I have to have a separate one for each table?
THANKS!!
How about:
SELECT t.ID, t.Field1, t.Field2 INTO NewTable
FROM
  (SELECT a.ID, a.Field1, a.Field2
   FROM Table1 a
   UNION
   SELECT x.ID, x.Field1, x.Field2
   FROM Table1 x IN 'C:\docs\db2.mdb'
   WHERE x.ID NOT IN (SELECT ID FROM Table1)) t
This isn't an SQL solution, but it may work just as well as telling your non-SQL-savvy users to cut and paste SQL statements.
I suggest defining a unique Key on the table that takes precedence (col1).
Then copy all the data from Database B into the 'master' table.
This will fail for all duplicates, but insert any 'new' records.
Remove the unique key constraint after you're done if necessary.
From your question it looks like you need the resulting table in database B. So you may want to have your users copy the table into database B before you start or after you're done.
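In Access SQL, that approach might look roughly like this, run one statement at a time (the column names and file path are placeholders following the examples above). First, against the database whose rows take precedence (A), create the unique index:

CREATE UNIQUE INDEX uq_col1 ON table1 (col1);

Then pull in database B's rows; duplicates on col1 are rejected with a key-violation warning, while new rows are added:

INSERT INTO table1 (col1, col2)
SELECT b.col1, b.col2
FROM table1 b IN 'C:\docs\db2.mdb';

Finally, drop the index if it's no longer wanted:

DROP INDEX uq_col1 ON table1;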