How to merge two identical database data to one? - sql

Two customers are going to merge. They both use my application, each with their own database. In a few weeks they will become one organisation, so they want to have all the data in one database.
So the two database structures are identical. The problem is with the data. For example, I have Table Locations and persons (these are just two tables of 50):
Database 1:
Locations:
Id Name Adress etc....
1 Location 1
2 Location 2
Persons:
Id LocationId Name etc...
1 1 Alex
2 1 Peter
3 2 Lisa
Database 2:
Locations:
Id Name Adress etc....
1 Location A
2 Location B
Persons:
Id LocationId Name etc...
1 1 Mark
2 2 Ashley
3 1 Ben
We see that a person is related to a location (column LocationId). Note that I have more tables that refer to the Locations and Persons tables.
Each database contains its own locations and persons, but the Ids can overlap. If I import everything into DB2, the locations from DB1 should be inserted into DB2 with the Ids 3 and 4. The persons from DB1 should then get the new Ids 4, 5 and 6, and their LocationId values must be changed to the new location Ids 3 and 4.
My idea is to write a query that handles everything, but I don't know where to begin.
What is the best way (in a query) to renumber the Id fields and cascade the change to the child tables? The databases do not enforce referential integrity (foreign keys are NOT defined in the database). Creating foreign keys and cascading is not an option.
I'm using sql server 2005.

You say that both customers are using your application, so I assume that it's some kind of "shrink-wrap" software that is used by more customers than just these two, correct?
If so, adding special columns to the tables will probably cause pain in the future: you would either have to maintain a special version for these two customers that can deal with the additional columns, or introduce the columns into your main codebase, which means that all your other customers would get them as well.
I can think of an easier way to do this without changing any of your tables or adding any columns.
In order for this to work, you need to find out the largest ID that exists in both databases together (no matter in which table or in which database it is).
This may require some copy & paste to get a lot of queries that look like this:
select max(id) as maxlocationid from locations
select max(id) as maxpersonid from persons
-- and so on... (one query for each table)
When you find the largest ID after running the query in both databases, take a number that's larger than that ID, and add it to all IDs in all tables in the second database.
It's very important that the number needs to be larger than the largest ID that already exists in both databases!
It's a bit difficult to explain, so here's an example:
Let's say that the largest ID in any table in both databases is 8000.
Then you run some SQL that adds 10000 to every ID in every table in the second database:
update Locations set Id = Id + 10000
update Persons set Id = Id + 10000, LocationId = LocationId + 10000
-- and so on, for each table
The queries are relatively simple, but this is the most work because you have to build a query like this manually for each table in the database, with the correct names of all the ID columns.
After running the query on the second database, the example data from your question will look like this:
Database 1: (exactly like before)
Locations:
Id Name Adress etc....
1 Location 1
2 Location 2
Persons:
Id LocationId Name etc...
1 1 Alex
2 1 Peter
3 2 Lisa
Database 2:
Locations:
Id Name Adress etc....
10001 Location A
10002 Location B
Persons:
Id LocationId Name etc...
10001 10001 Mark
10002 10002 Ashley
10003 10001 Ben
And that's it! Now you can import the data from one database into the other, without getting any primary key violations at all.
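As a minimal sketch of the whole procedure, the example data can be run through Python's built-in sqlite3 module (an assumption made only so the example is self-contained; the question is about SQL Server 2005, where the UPDATE statements are the same but copying between databases uses different syntax). The second in-memory database attached as db2 and the OFFSET constant are illustrative names.

```python
import sqlite3

OFFSET = 10000  # must be larger than the largest Id in either database

conn = sqlite3.connect(":memory:")                  # plays the role of database 1
conn.execute("ATTACH DATABASE ':memory:' AS db2")   # plays the role of database 2

for schema in ("main", "db2"):
    conn.execute(f"CREATE TABLE {schema}.Locations (Id INTEGER PRIMARY KEY, Name TEXT)")
    conn.execute(
        f"CREATE TABLE {schema}.Persons "
        "(Id INTEGER PRIMARY KEY, LocationId INTEGER, Name TEXT)"
    )

# Example data from the question.
conn.executemany("INSERT INTO main.Locations VALUES (?, ?)",
                 [(1, "Location 1"), (2, "Location 2")])
conn.executemany("INSERT INTO main.Persons VALUES (?, ?, ?)",
                 [(1, 1, "Alex"), (2, 1, "Peter"), (3, 2, "Lisa")])
conn.executemany("INSERT INTO db2.Locations VALUES (?, ?)",
                 [(1, "Location A"), (2, "Location B")])
conn.executemany("INSERT INTO db2.Persons VALUES (?, ?, ?)",
                 [(1, 1, "Mark"), (2, 2, "Ashley"), (3, 1, "Ben")])

# Step 1: shift every key and every reference to a key in database 2.
conn.execute("UPDATE db2.Locations SET Id = Id + ?", (OFFSET,))
conn.execute("UPDATE db2.Persons SET Id = Id + ?, LocationId = LocationId + ?",
             (OFFSET, OFFSET))

# Step 2: copy the shifted rows into database 1; no key collides any more.
conn.execute("INSERT INTO main.Locations SELECT * FROM db2.Locations")
conn.execute("INSERT INTO main.Persons SELECT * FROM db2.Persons")
conn.commit()

# Every person should still point at the location it had before the merge.
merged = conn.execute(
    "SELECT p.Name, l.Name FROM main.Persons p "
    "JOIN main.Locations l ON l.Id = p.LocationId ORDER BY p.Id"
).fetchall()
```

The final join is a quick sanity check: if a reference column was forgotten in step 1, some person would end up attached to the wrong location or to none at all.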

If this were my problem, I would probably add some columns to the tables in the database I was going to keep. These would be used to store the pk values from the other db. Then I would insert records from the other tables. For the ones with foreign keys, I would use a known value. Then I would update as required and drop the columns I added.

Related

Excluding data pairs from a query based on a table?

I have a massive and messy database of facilities where there are many duplicates. Addresses have been entered in such a haphazard way that I will be making many queries to identify possible duplicates. My objective is for each query to identify the possible duplicates, and then a person actually goes through the list and marks each pairing as either "not a duplicate" or "possible duplicate."
When someone marks a facility pair as not a duplicate, I want to record that pair in a table so that when one of the queries would otherwise return that pairing, it is excluded instead. I am at a loss for how to do this. I'm currently using MS Access for SQL queries, and have rudimentary Visual Basic knowledge.
Sample of how it should work
Query 1 is run to find duplicates based on city and company name. It brings back that facilities 1 and 2, 3 and 4, and 5 and 6 are possible duplicates. The first two pairings are duplicates I need to go fix, but 5 and 6 are indeed separate facilities. I click to record that facilities 5 and 6 are not duplicates, which records the pair in a table. When query 1 is run again, it does not return 5 and 6 as possible duplicates.
For reference, the address duplicates look something like this, which is why there need to be multiple queries
Frank's Garage, 123 2nd St
Frank's Garage LLC, LLC, 123 Second st
Frank's Garage and muffler, 123 2nd Street
Frank's, 12 2nd st
The only way I know to fix this is to create a master table of company names and associate this table PK with records in original table. It will be a difficult and tedious process to review records and eliminate duplicates from master and associate remaining PK of a duplicate group to the original records (as you have discovered).
Create a master table of DISTINCT company and address data from original table. Include autonumber field to generate key. Join tables on company/address fields and UPDATE a field in original table with this key. Have another field in original table to receive a replacement foreign key.
Have a number field (ReplacementPK) in master table. Sort and review records and enter the key you want to retain for company/address duplicates group. Build a query joining tables on original key fields, update NewFK field in original table with selected ReplacementPK from master.
When all looks good:
Delete company and address and original FK fields from original table.
Delete records from master where PK does not match ReplacementPK.
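The exclusion behaviour described in the question (remember reviewed pairs and hide them from later runs) could be sketched as follows. This is only an illustration using Python's sqlite3 module rather than MS Access; the table NotDuplicates, its columns IdLow and IdHigh, and the sample facilities are all hypothetical. The same NOT EXISTS idea should carry over to an Access query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Facilities (Id INTEGER PRIMARY KEY, Company TEXT, City TEXT);
-- One row per reviewed pair, always stored with the smaller Id first.
CREATE TABLE NotDuplicates (
    IdLow  INTEGER NOT NULL,
    IdHigh INTEGER NOT NULL,
    PRIMARY KEY (IdLow, IdHigh)
);
""")
conn.executemany("INSERT INTO Facilities VALUES (?, ?, ?)", [
    (1, "Franks Garage", "Springfield"), (2, "Franks Garage", "Springfield"),
    (3, "Acme", "Shelbyville"), (4, "Acme", "Shelbyville"),
    (5, "Best Tires", "Ogden"), (6, "Best Tires", "Ogden"),
])

def possible_duplicates(conn):
    """Pairs with the same company and city that were not ruled out before."""
    return conn.execute("""
        SELECT a.Id, b.Id
        FROM Facilities a
        JOIN Facilities b
          ON a.Company = b.Company AND a.City = b.City AND a.Id < b.Id
        WHERE NOT EXISTS (
            SELECT 1 FROM NotDuplicates n
            WHERE n.IdLow = a.Id AND n.IdHigh = b.Id
        )
        ORDER BY a.Id, b.Id
    """).fetchall()

def mark_not_duplicate(conn, id_a, id_b):
    """Record a reviewed pair; argument order does not matter."""
    low, high = sorted((id_a, id_b))
    conn.execute("INSERT OR IGNORE INTO NotDuplicates VALUES (?, ?)", (low, high))
    conn.commit()

first_run = possible_duplicates(conn)    # all three candidate pairs
mark_not_duplicate(conn, 6, 5)           # reviewer: 5 and 6 are separate facilities
second_run = possible_duplicates(conn)   # the 5/6 pair is now filtered out
```

Storing each pair with the smaller Id first means one row covers both orderings, and every duplicate-finding query can reuse the same NOT EXISTS filter as long as it also emits pairs with the smaller Id first.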

Modifying column in access

I have 2 tables in MS Access, TableA and TableB. Table A has only 1 field: myFieldID, and TableB has only 1 field: myFieldName (In reality I have more fields, but these are the ones that matter for the sake of my problem).
Both tables have records that mean the same thing, but written in a different, but similar way.
For instance, TableA has:
|TableA.myFieldId |
|-----------------|
|MM0001P |
|HR0003P |
|MH0567P |
So as you can see, all of the records are formatted this way (with a P at the end):
([A-Z][A-Z][0-9][0-9][0-9][0-9]P)
then, TableB has:
|TableB.myFieldName |
|--------------------------------------------|
|MH-0567 Materials Handling important Role |
|MM-0001 Materials Management Minor Role |
|HR-0003 Human Resources Super Important Role|
So this one has the format (without 'P' at the end):
([A-Z][A-Z]-[0-9][0-9][0-9][0-9] ([A-Z]|[a-z]*))
First, I would like to write join queries between TableA and TableB on these fields, but as you can see, the join will never match, since the two fields hold differently formatted values.
So I would like to replace every value in TableA.myFieldId with its corresponding value from TableB.myFieldName.
The problem is that both tables have around 1 million records and the values are repeated multiple times in both tables. On top of that, I don't know how to do this (MS Access doesn't even let me use regular expressions).
I would make a table (or query, if it changes often enough) of all unique entries in the 2nd table and the corresponding key for the 1st table. Then use that table or query to help join the two tables.
Something like
SELECT DISTINCT myFieldName AS FName, Left(myFieldName, 2) & Mid(myFieldName, 4, 4) & "P" AS FID
FROM TableB
Important note - are all IDs found in both files, or do you have records in either table that are not in the other? If they don't always match, you may need additional logic or steps to make a master table from both tableA and tableB.
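To make the string arithmetic concrete, here is the same key derivation as a small Python sketch (Python is used only for illustration; the function name name_to_id and the lookup dictionary are hypothetical). Access's Left and Mid count from 1, so Mid(myFieldName, 4, 4) corresponds to the slice [3:7].

```python
def name_to_id(field_name: str) -> str:
    """Mirror Left(myFieldName, 2) & Mid(myFieldName, 4, 4) & "P"."""
    # First two letters, skip the hyphen, take the four digits, append "P".
    return field_name[:2] + field_name[3:7] + "P"

# Sample values from TableB in the question.
names = [
    "MH-0567 Materials Handling important Role",
    "MM-0001 Materials Management Minor Role",
    "HR-0003 Human Resources Super Important Role",
]

# Lookup from the TableA-style key to the TableB name.
lookup = {name_to_id(n): n for n in names}
```

The lookup plays the role of the helper table or query: each distinct name appears once, keyed by the value that TableA actually stores.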

deleting from two tables in single script in sql

I have three tables xx_1 , xx_2, xx_3 such that :
xx_1
id obj_version_num location
1 x ubudu
2 x bali
3 x india
xx_2
id name grade
1 abc band 1
2 xyz band 2
3 gdgd band 3
xx_3 has :
Name details col1 p_id
abc A HDHD 10
xyz B HDHD 20
gdgd C HDHD 30
smith D HDHD 40
I want to delete data from xx_1 and xx_2 if the name is smith in xx_3
Currently I am doing:
delete from xx_1
where id in (select distinct id from xx_2 t ,xx_3 k
where t.name=k.name
and k.name ='Smith')
and then
delete from xx_2
where name ='Smith'
Is there any way I can delete data from both of these tables together, without creating two separate scripts?
There is no way to delete from many tables with a single statement, but the better question is why do you need to delete from all tables at the same time? It sounds to me like you don't fully understand how transactions work in Oracle.
Let's say you log in and delete a row from table 1, but do not commit. As far as all other sessions are concerned, that row has not been deleted. If you open another connection and query for the row, it will still be there.
Then you delete from tables 2, 3 and then 4 in turn. You still have not committed the transaction, so all other sessions on the database can still see the deleted rows.
Then you commit.
All at the same time, the other sessions will no longer see the rows you deleted from the 4 tables, even though you did the deletes in 4 separate statements.
EDIT after edit in question:
You can define the foreign keys on the 3 child tables to "ON DELETE CASCADE". Then when you delete from the parent table, all associated rows from the 3 child tables are also deleted.
You cannot delete from multiple tables in a single statement, primary key or not.
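The commit behaviour described above can be tried out with a small sketch. It uses Python's sqlite3 module and a temporary file database instead of Oracle (an assumption made purely so the example runs anywhere), with two connections standing in for two sessions; the sample rows are hypothetical and include a 'smith' row in xx_2 so that something is actually deleted.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path)   # the session doing the deletes
reader = sqlite3.connect(path)   # some other session

writer.executescript("""
CREATE TABLE xx_1 (id INTEGER PRIMARY KEY, location TEXT);
CREATE TABLE xx_2 (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO xx_1 VALUES (1, 'ubudu'), (2, 'bali');
INSERT INTO xx_2 VALUES (1, 'abc'), (2, 'smith');
""")

def row_counts(conn):
    """Number of rows this session can currently see in xx_1 and xx_2."""
    n1 = conn.execute("SELECT COUNT(*) FROM xx_1").fetchall()[0][0]
    n2 = conn.execute("SELECT COUNT(*) FROM xx_2").fetchall()[0][0]
    return (n1, n2)

# Two separate DELETE statements, but a single (still open) transaction.
writer.execute(
    "DELETE FROM xx_1 WHERE id IN (SELECT id FROM xx_2 WHERE name = 'smith')"
)
writer.execute("DELETE FROM xx_2 WHERE name = 'smith'")

seen_before_commit = row_counts(reader)  # the other session still sees every row
writer.commit()
seen_after_commit = row_counts(reader)   # both deletes become visible together

writer.close()
reader.close()
```

From the point of view of the second connection, the two deletes happen at the same instant, which is usually what "deleting from both tables together" is really asking for.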

What is the best way to copy data from related tables to another related tables?

What is the best way to copy data from related tables to another set of related tables with the same schema? The tables are connected by a one-to-many relationship.
Consider the following schema
firm
id | name | city.id (FK)
employee
id | lastname | firm.id (FK)
firm2
id | name | city_id (FK)
employee2
id | lastname |firm2.id (FK)
What I want to do is copy the rows from firm with a specific city.id to firm2, and the employees associated with those firms to table employee2.
I use PostgreSQL 9.0, so I have to call SELECT nextval('seq_name') to get a new id for each table.
Right now I perform this by simply iterating over all rows in a Java backend server, but on a large amount of data (50,000 employees and 2,000 firms) it takes too much time (1-3 minutes).
I'm wondering whether there is a smarter way to do it, for example selecting the data into a temp table. Or should I use a stored procedure and iterate over the rows with a cursor to avoid buffering on my backend server?
This is one problem caused by simply using a sequence or identity value as your sole primary key in a table.
If there is a real-life unique index/primary key, then you can join on that. The other option would be to create a mapping table as you fill in the tables with sequences then you can insert into the children tables' FKs by joining to the mapping tables. It doesn't completely remove the need for looping, but at least some of the inserts get moved out into a set-based approach.
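A minimal sketch of the mapping-table idea, using Python's sqlite3 module instead of PostgreSQL so that it is self-contained. The mapping table firm_map, the column names city_id and firm_id, and the sample rows are assumptions made for illustration. Only the firms are looped over; all employees are copied by one set-based INSERT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE firm      (id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER);
CREATE TABLE employee  (id INTEGER PRIMARY KEY, lastname TEXT, firm_id INTEGER);
CREATE TABLE firm2     (id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER);
CREATE TABLE employee2 (id INTEGER PRIMARY KEY, lastname TEXT, firm_id INTEGER);
-- Hypothetical helper table: old firm id -> id of its copy in firm2.
CREATE TABLE firm_map  (old_id INTEGER PRIMARY KEY, new_id INTEGER NOT NULL);

INSERT INTO firm VALUES (1, 'Acme', 7), (2, 'Globex', 7), (3, 'Initech', 9);
INSERT INTO employee VALUES
    (1, 'Smith', 1), (2, 'Jones', 1), (3, 'Brown', 2), (4, 'Gray', 3);
INSERT INTO firm2 VALUES (1, 'Existing firm', 7);
""")

CITY_ID = 7  # copy only the firms located in this city

# Loop over the (comparatively few) firms and remember old id -> new id.
firms = conn.execute(
    "SELECT id, name, city_id FROM firm WHERE city_id = ?", (CITY_ID,)
).fetchall()
for old_id, name, city_id in firms:
    cur = conn.execute(
        "INSERT INTO firm2 (name, city_id) VALUES (?, ?)", (name, city_id)
    )
    conn.execute("INSERT INTO firm_map VALUES (?, ?)", (old_id, cur.lastrowid))

# Copy all matching employees in one statement by joining to the mapping.
conn.execute("""
    INSERT INTO employee2 (lastname, firm_id)
    SELECT e.lastname, m.new_id
    FROM employee e
    JOIN firm_map m ON m.old_id = e.firm_id
""")
conn.commit()

copied = conn.execute("""
    SELECT e.lastname, f.name
    FROM employee2 e JOIN firm2 f ON f.id = e.firm_id
    ORDER BY e.lastname
""").fetchall()
```

With 2,000 firms and 50,000 employees this replaces roughly 50,000 single-row inserts with one statement, which is where most of the time in a row-by-row loop tends to go.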

Need SQL to shift entries from one table to another

Here's the situation. I have 2 tables with the following schemas:
ID | COMPANY_NAME | DESC | CONTACT
ID | COMPANY_ID | X_COORDINATE | Y_COORDINATE
The first table contains a list of companies and the second contains the coordinates of those companies.
I want to merge the data in these tables with the data in another set of tables. The other tables have a similar structure but are already prepopulated with data. The IDs are auto-incrementing.
So if we have, let's say, companies numbered 1-1000 in table 1 and companies numbered 1-500 in table 2, we need them merged such that ID 1 in table 2 becomes ID 1001 when migrated to the other table. At the same time we also want to migrate the entries in the coordinates table in such a way that they map to the new IDs. Can this be done in SQL, or do I need to resort to a script for this kind of work?
I'm not sure I understand how many tables there are and which ones are table 1 and table 2, but the problem is pretty clear. I think the easy way is:
back up all your database before you start this process
add a column to the destination table that will contain the original id.
insert all the records you want to merge (source) into the destination table, putting the original id in the column you added.
now you can update the geo X,Y data using the old ID
after all is done and good you can remove the original id column.
EDIT: in reply to your comment, I'll add the code here, since it's more readable.
adapted from SQL Books Online: insert rows from another table
INSERT INTO MyNewTable (TheOriginalID, [Desc])
SELECT ID, [Desc]
FROM OldTable;
Then you can do an update to the new table based on values from the old table like so:
UPDATE MyNewTable SET X = OldTable.X, Y = OldTable.Y
FROM MyNewTable INNER JOIN OldTable ON MyNewTable.TheOriginalID = OldTable.ID
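Putting the steps together for the companies and coordinates tables from the question, a rough end-to-end sketch could look like this. It uses Python's sqlite3 module so that it is runnable as-is; the table names companies, coordinates, src_companies and src_coordinates, the helper column original_id, and the sample rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Destination tables (already populated) and source tables to merge in.
CREATE TABLE companies   (id INTEGER PRIMARY KEY AUTOINCREMENT, company_name TEXT);
CREATE TABLE coordinates (id INTEGER PRIMARY KEY AUTOINCREMENT,
                          company_id INTEGER, x REAL, y REAL);
CREATE TABLE src_companies   (id INTEGER PRIMARY KEY, company_name TEXT);
CREATE TABLE src_coordinates (id INTEGER PRIMARY KEY,
                              company_id INTEGER, x REAL, y REAL);

INSERT INTO companies (company_name) VALUES ('Dest A'), ('Dest B');
INSERT INTO coordinates (company_id, x, y) VALUES (1, 0.0, 0.0), (2, 1.0, 1.0);
INSERT INTO src_companies VALUES (1, 'Src A'), (2, 'Src B');
INSERT INTO src_coordinates VALUES (1, 1, 10.0, 20.0), (2, 2, 30.0, 40.0);

-- Helper column that remembers each migrated row's id in the source table.
ALTER TABLE companies ADD COLUMN original_id INTEGER;

-- Copy the companies; the destination assigns fresh auto-increment ids.
INSERT INTO companies (company_name, original_id)
SELECT company_name, id FROM src_companies ORDER BY id;

-- Copy the coordinates, translating the old company id into the new one.
INSERT INTO coordinates (company_id, x, y)
SELECT c.id, s.x, s.y
FROM src_coordinates s
JOIN companies c ON c.original_id = s.company_id;
""")

merged = conn.execute("""
    SELECT c.id, c.company_name, k.x, k.y
    FROM companies c JOIN coordinates k ON k.company_id = c.id
    ORDER BY c.id
""").fetchall()
# Once the result has been checked, the helper column can be removed again;
# the exact statement for that depends on the database in use.
```

Rows that already existed in the destination keep original_id as NULL, so they never match the join and their coordinates are left untouched.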