Merge two versions of database tables with conflicting keys - sql

I have been asked to merge 2 Access databases. They are conflicting versions of the same file.
A database was emailed to somebody. (I know.) Somebody added records to the 'main' copy while somebody else added records to their copy. I want to add the new records from the 'unauthorised' copy into the main version, before utterly destroying all other copies.
Unfortunately, the database has several related tables. As would naturally happen when records are added, records in different versions have conflicting primary keys. These conflicting keys are also used as foreign keys in the new records. A foreign key reference to ID x means different things in the 2 versions.
Is there any hope? I thought of maybe importing it all into Excel and using formulas to update the primary and foreign keys.
Is there any way to fix this programmatically?
EDIT: Here is a picture showing the full relationships. Tables teachers, tests, and test_results have been changed; the others are the same in both.

In the main database, add a Long field named [oldID] to each table into which you need to append data. Then create Linked Tables pointing to the relevant tables in the "other" database. Since the table names are the same, the linked tables will have a '1' appended to them.
For this example, we have
[teachers]
ID  teacher   oldID
--  --------  -----
1   TeacherA
2   TeacherB
3   TeacherX

[teachers1]
ID  teacher
--  --------
1   TeacherA
2   TeacherB
3   TeacherY

[tests]
ID  test_name       teacher  oldID
--  --------------  -------  -----
1   TeacherA_Test1  1
2   TeacherA_Test2  1
3   TeacherB_Test1  2
4   TeacherX_Test1  3

[tests1]
ID  test_name       teacher
--  --------------  -------
1   TeacherA_Test1  1
2   TeacherA_Test2  1
3   TeacherB_Test1  2
4   TeacherY_Test1  3
5   TeacherY_Test2  3
Make a note of where the tables diverge. In this case the [teachers] tables diverge after ID=2. So, insert the new rows from [teachers1] into [teachers], putting [teachers1].[ID] into [teachers].[oldID] so we can map old IDs to new ones:
INSERT INTO [teachers] ([teacher], [oldID])
SELECT [teacher], [ID] FROM [teachers1] WHERE [ID]>2
So now we have
[teachers]
ID  teacher   oldID
--  --------  -----
1   TeacherA
2   TeacherB
3   TeacherX
4   TeacherY  3
Now when we append the new rows from [tests1] into [tests] we can use an INNER JOIN on [teachers].[oldID] to adjust the foreign key values that get inserted:
INSERT INTO [tests] ([test_name], [teacher], [oldID])
SELECT [tests1].[test_name], [teachers].[ID], [tests1].[ID]
FROM [tests1] INNER JOIN [teachers] ON [tests1].[teacher]=[teachers].[oldID]
giving us
[tests]
ID  test_name       teacher  oldID
--  --------------  -------  -----
1   TeacherA_Test1  1
2   TeacherA_Test2  1
3   TeacherB_Test1  2
4   TeacherX_Test1  3
5   TeacherY_Test1  4        4
6   TeacherY_Test2  4        5
Notice how the [teacher] foreign key has been mapped from the value 3 in [tests1] to 4 in [tests], reflecting the new [teachers].[ID] value for 'TeacherY'.
You can then repeat the process for child tables of [tests].
(Once the cleanup is complete you can remove the table links and drop the [oldID] columns.)
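The two append steps above can be tried end to end with a small script. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for Access (an assumption, since Access SQL cannot be run standalone), and it adds an ORDER BY so the new IDs are assigned in a predictable order. Table and column names follow the example.

```python
import sqlite3

# SQLite stands in for Access here (assumption); names follow the example above.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE teachers  (ID INTEGER PRIMARY KEY AUTOINCREMENT, teacher TEXT, oldID INTEGER);
CREATE TABLE teachers1 (ID INTEGER PRIMARY KEY, teacher TEXT);
CREATE TABLE tests     (ID INTEGER PRIMARY KEY AUTOINCREMENT, test_name TEXT,
                        teacher INTEGER, oldID INTEGER);
CREATE TABLE tests1    (ID INTEGER PRIMARY KEY, test_name TEXT, teacher INTEGER);

INSERT INTO teachers (teacher) VALUES ('TeacherA'), ('TeacherB'), ('TeacherX');
INSERT INTO teachers1 VALUES (1, 'TeacherA'), (2, 'TeacherB'), (3, 'TeacherY');
INSERT INTO tests (test_name, teacher) VALUES
  ('TeacherA_Test1', 1), ('TeacherA_Test2', 1), ('TeacherB_Test1', 2), ('TeacherX_Test1', 3);
INSERT INTO tests1 VALUES
  (1, 'TeacherA_Test1', 1), (2, 'TeacherA_Test2', 1), (3, 'TeacherB_Test1', 2),
  (4, 'TeacherY_Test1', 3), (5, 'TeacherY_Test2', 3);

-- Step 1: append the diverged parent rows and remember their old IDs.
INSERT INTO teachers (teacher, oldID)
SELECT teacher, ID FROM teachers1 WHERE ID > 2 ORDER BY ID;

-- Step 2: append child rows, translating the foreign key through oldID.
INSERT INTO tests (test_name, teacher, oldID)
SELECT t1.test_name, t.ID, t1.ID
FROM tests1 AS t1 INNER JOIN teachers AS t ON t1.teacher = t.oldID
ORDER BY t1.ID;
""")

merged_teachers = con.execute(
    "SELECT ID, teacher, oldID FROM teachers ORDER BY ID").fetchall()
merged_tests = con.execute(
    "SELECT ID, test_name, teacher, oldID FROM tests ORDER BY ID").fetchall()
```

After the script runs, merged_tests holds six rows, and the two appended rows point at the new ID of 'TeacherY' rather than at the old value 3.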

Is there any way to fix this programmatically?
No. This must be done by a human capable of reading and understanding the data and making decisions.
Create three queries: one with an inner join between table one and table two, one with an outer join from table one to table two, and one with an outer join from table two to table one.
Now you can study the differences and decide which version of similar records to keep and which records are completely new and should be kept, some with a new primary key.
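As a rough illustration of those three comparison queries, here is a sketch using Python's sqlite3 module instead of Access (an assumption). The sample rows are hypothetical, including the extra 'TeacherZ' row, and the WHERE filters are an addition that narrows each join to the rows a person would need to review.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE teachers  (ID INTEGER PRIMARY KEY, teacher TEXT);  -- main copy
CREATE TABLE teachers1 (ID INTEGER PRIMARY KEY, teacher TEXT);  -- other copy
INSERT INTO teachers  VALUES (1, 'TeacherA'), (2, 'TeacherB'), (3, 'TeacherX');
INSERT INTO teachers1 VALUES (1, 'TeacherA'), (2, 'TeacherB'), (3, 'TeacherY'),
                             (4, 'TeacherZ');  -- hypothetical extra row
""")

# Inner join: the same key exists in both copies but the contents disagree.
conflicts = con.execute("""
    SELECT a.ID, a.teacher, b.teacher
    FROM teachers AS a INNER JOIN teachers1 AS b ON a.ID = b.ID
    WHERE a.teacher <> b.teacher
    ORDER BY a.ID
""").fetchall()

# Outer join from main to other: keys that exist only in the main copy.
only_in_main = con.execute("""
    SELECT a.ID, a.teacher
    FROM teachers AS a LEFT JOIN teachers1 AS b ON a.ID = b.ID
    WHERE b.ID IS NULL
    ORDER BY a.ID
""").fetchall()

# Outer join from other to main: keys that exist only in the other copy.
only_in_other = con.execute("""
    SELECT b.ID, b.teacher
    FROM teachers1 AS b LEFT JOIN teachers AS a ON a.ID = b.ID
    WHERE a.ID IS NULL
    ORDER BY b.ID
""").fetchall()
```

The conflicts list is the part that needs human judgement; the other two lists are candidates for a straight append.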

Related

Is it possible to create a many-to-many relationship with two columns that contain duplicated values?

I have two tables that I would like to create a relationship between them:
One is called Products and contains data like this; id is the primary key:
id product_id name env
1 1 Python prod
2 1 Python test
3 1 Python uat
4 2 Rusty test
5 2 Rusty prod
The Licence table looks like this; it has no primary key:
product_id name phase_type
1 Python Available
1 Python Extension
1 Python Obsolete
2 Rusty Available
2 Rusty Extension
2 Rusty Obsolete
I would like to create a relationship between these two tables; however, product_id is duplicated in the Products table, and the same is true in the Licence table. Is it possible to create a relationship between these tables at all?
The sample data in both tables appears to reference the same set of, say, ProductTypes. You can create a table
ProductType
Id (PK)  Name
1        Python
2        Rusty

and refactor your current tables, excluding the Name column (productType_id is a FK to ProductType.Id in both)

Product
id  productType_id  env
1   1               prod
2   1               test
3   1               uat
4   2               test
5   2               prod

Licence
productType_id  phase_type
1               Available
1               Extension
1               Obsolete
2               Available
2               Extension
2               Obsolete
Then you can combine above data in your queries at will, join on productType_id for example.
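A minimal sketch of that refactoring, run through Python's sqlite3 module (the choice of engine is an assumption, as is the composite primary key on Licence, which the original table did not have):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
con.executescript("""
CREATE TABLE ProductType (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Product (
    id             INTEGER PRIMARY KEY,
    productType_id INTEGER NOT NULL REFERENCES ProductType(Id),
    env            TEXT NOT NULL
);
CREATE TABLE Licence (
    productType_id INTEGER NOT NULL REFERENCES ProductType(Id),
    phase_type     TEXT NOT NULL,
    PRIMARY KEY (productType_id, phase_type)  -- assumption: one row per type and phase
);
INSERT INTO ProductType VALUES (1, 'Python'), (2, 'Rusty');
INSERT INTO Product VALUES (1, 1, 'prod'), (2, 1, 'test'), (3, 1, 'uat'),
                           (4, 2, 'test'), (5, 2, 'prod');
INSERT INTO Licence VALUES (1, 'Available'), (1, 'Extension'), (1, 'Obsolete'),
                           (2, 'Available'), (2, 'Extension'), (2, 'Obsolete');
""")

# Licence phases that apply to the Product row with id = 1, joined via the shared type.
phases_for_product_1 = con.execute("""
    SELECT pt.Name, p.env, l.phase_type
    FROM Product AS p
    JOIN ProductType AS pt ON pt.Id = p.productType_id
    JOIN Licence AS l      ON l.productType_id = p.productType_id
    WHERE p.id = 1
    ORDER BY l.phase_type
""").fetchall()
```

Product and Licence are never linked directly; both hang off ProductType, which is what removes the duplicated-key problem.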

Are Append Queries in a self joining table possible

I have tblEmployee that contains 3 fields:
ID: AutoNumber
Name: Text
Supervisor: Number [as a lookup in tblEmployee]
I wish to append new data to this table from tblNewEmployees that has the exact same structure as the previous table.
Can this be done if I have the ID field as an autonumber?
I have tried various queries (for example first appending only the Name field as step 1, and then trying with a second update query to get the supervisor) but all produced garbage, hence my question whether this is possible using AutoNumbers in the first place.
I suppose it can be done in your structure (adjacency list model) in an iterative way i.e. add the employee(s) at the top of the tree, query the database to get their auto-generated id(s), then add the employees in the next level down using the previously queried id(s), then repeat for each level down.
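That level-by-level approach might look roughly like the following. It is a sketch in Python with sqlite3 rather than Access (an assumption), the sample rows are hypothetical, and id_map is a helper name invented here to carry the old-to-new ID mapping between passes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tblEmployee (ID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT, Supervisor INTEGER);
CREATE TABLE tblNewEmployees (ID INTEGER PRIMARY KEY, Name TEXT, Supervisor INTEGER);
INSERT INTO tblEmployee (Name, Supervisor) VALUES
  ('Director A', NULL), ('Manager A', 1), ('Worker A', 2);
INSERT INTO tblNewEmployees VALUES
  (2, 'Director B', NULL), (5, 'Manager B', 2), (7, 'Worker B', 5);
""")

id_map = {}  # old ID in tblNewEmployees -> auto-generated ID in tblEmployee
pending = con.execute(
    "SELECT ID, Name, Supervisor FROM tblNewEmployees ORDER BY ID").fetchall()

while pending:
    # A row is ready once it has no supervisor or its supervisor is already inserted.
    ready = [row for row in pending if row[2] is None or row[2] in id_map]
    if not ready:
        raise ValueError("supervisor references are missing or form a cycle")
    for old_id, name, supervisor in ready:
        cur = con.execute(
            "INSERT INTO tblEmployee (Name, Supervisor) VALUES (?, ?)",
            (name, id_map.get(supervisor)),  # None for top-level employees
        )
        id_map[old_id] = cur.lastrowid  # query back the generated ID
    pending = [row for row in pending if row[0] not in id_map]
con.commit()

new_rows = con.execute(
    "SELECT ID, Name, Supervisor FROM tblEmployee WHERE ID > 3 ORDER BY ID").fetchall()
```

Each pass of the loop handles one level of the tree, so the number of passes equals the depth of the new hierarchy.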
While possible, is it desirable? Presumably every employee already has a unique id e.g. payroll number, social security number, etc. If unsure, ask the payroll person.
Removing the dependency on the database for generating employee ids will probably free you from the aforementioned iterative process. It is preferable for inserts to be deterministic, predictable, scriptable as a one-off, etc.
Another thing to consider is that you may be modelling a tree structure when you actually want a hierarchy. The examples Celko used to give: the army is a hierarchy because if you shoot your sergeant you still have to take orders from your captain; on the other hand, a river system is a tree because if you dam one tributary then all downstream tributaries run dry.
It seems to me that with your design, when a supervisor leaves (is deleted from the table) you are left with an unsupervised employee (missing data, therefore data integrity is corrupted), whereas you'd want the next senior employee to take their place (hierarchy). An update in your structure could be a lot of work, i.e. iterative again.
While the adjacency list model may be intuitive, it is not always the easiest to work with in SQL DML. Consider other models e.g. nested sets. That said, with Access, SQL DML is almost always painful because it doesn't support procedural SQL code in stored procs, triggers, etc; even a simple update can fail due to 'non-updatable query' (view) restrictions. So as usual, I must advise you to consider a more capable DBMS if at all possible.
Yes, it is possible to merge the two tables when the destination table has an AutoNumber ID. There are two possible scenarios:
Scenario 1: No overlap of ID values between the two tables
[tblEmployee]
ID  Name        Supervisor
--  ----------  ----------
1   Director A
2   Manager A   1
3   Worker A    2

[tblNewEmployees]
ID   Name        Supervisor
---  ----------  ----------
101  Director B
102  Manager B   101
103  Worker B    102
Since the Access Database Engine allows us to insert arbitrary values into an AutoNumber column, this case is trivial. Just ...
INSERT INTO tblEmployee (ID, [Name], Supervisor)
SELECT ID, [Name], Supervisor FROM tblNewEmployees
... and we're done:
[tblEmployee]
ID   Name        Supervisor
---  ----------  ----------
1    Director A
2    Manager A   1
3    Worker A    2
101  Director B
102  Manager B   101
103  Worker B    102
Scenario 2: Common ID values between the two tables
[tblEmployee]
ID  Name        Supervisor
--  ----------  ----------
1   Director A
2   Manager A   1
3   Worker A    2

[tblNewEmployees]
ID  Name        Supervisor
--  ----------  ----------
2   Director B
5   Manager B   2
7   Worker B    5
In this case we need to map the old ID values to the new ID values when the new rows are inserted. To do that, add a new column to [tblEmployee]
ALTER TABLE tblEmployee ADD oldID LONG
then insert the new rows, putting tblNewEmployees.ID into tblEmployee.oldID
INSERT INTO tblEmployee (oldID, [Name])
SELECT ID, [Name] FROM tblNewEmployees
giving us
[tblEmployee]
ID  Name        Supervisor  oldID
--  ----------  ----------  -----
1   Director A
2   Manager A   1
3   Worker A    2
4   Director B              2
5   Manager B               5
6   Worker B                7
Then we can update the Supervisor column with the new ID values
UPDATE
(
tblEmployee emp
INNER JOIN
tblNewEmployees new
ON emp.oldID = new.ID
)
INNER JOIN
tblEmployee emp2
ON new.Supervisor = emp2.oldID
SET emp.Supervisor = emp2.ID
producing
[tblEmployee]
ID  Name        Supervisor  oldID
--  ----------  ----------  -----
1   Director A
2   Manager A   1
3   Worker A    2
4   Director B              2
5   Manager B   4           5
6   Worker B    5           7
We can then drop the [oldID] column if desired.
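The UPDATE with joins shown for Scenario 2 is Access syntax. As a rough check of the same mapping on another engine, here is a sketch using Python's sqlite3 module (an assumption), where the update is expressed as a correlated subquery instead; the alias src is a name chosen here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tblEmployee (ID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT, Supervisor INTEGER);
CREATE TABLE tblNewEmployees (ID INTEGER PRIMARY KEY, Name TEXT, Supervisor INTEGER);
INSERT INTO tblEmployee (Name, Supervisor) VALUES
  ('Director A', NULL), ('Manager A', 1), ('Worker A', 2);
INSERT INTO tblNewEmployees VALUES
  (2, 'Director B', NULL), (5, 'Manager B', 2), (7, 'Worker B', 5);

ALTER TABLE tblEmployee ADD COLUMN oldID INTEGER;

-- Append the new rows, keeping their original IDs in oldID.
INSERT INTO tblEmployee (oldID, Name)
SELECT ID, Name FROM tblNewEmployees ORDER BY ID;

-- For each appended row, look up its old supervisor and translate it to the new ID.
UPDATE tblEmployee
SET Supervisor = (
    SELECT emp2.ID
    FROM tblNewEmployees AS src
    INNER JOIN tblEmployee AS emp2 ON src.Supervisor = emp2.oldID
    WHERE src.ID = tblEmployee.oldID
)
WHERE oldID IS NOT NULL;
""")

appended = con.execute("""
    SELECT ID, Name, Supervisor, oldID
    FROM tblEmployee WHERE oldID IS NOT NULL ORDER BY ID
""").fetchall()
```

The WHERE oldID IS NOT NULL clause keeps the update away from the rows that were already in the table.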

Join Table vs Foreign Key/Ref

Imagine I have two tables and they have a 1-to-Many relationship. Is it better to have a Join table storing the relationship, or issuing a foreign key in one of these tables? Take a look of these two situations:
Situation A:
Table 1: CreditCard
Table 2: Person
It seems to me quite making sense to put the creditCard_id as part of the Person table
Situation B:
Table 1: Order
Table 2: Person
This time I think I will put the order_id and person_id in a Join table?
Am I making a mistake in the above? Is there a standard/better way of determining this?
For a 1-to-Many relation, people usually put the foreign key in the heavier table, i.e. the "Many" table.
So in your example, the foreign key goes in the CreditCard table and in the Order table; doing so avoids duplicate data.
Consider which one is better:
FK goes to the "Many" table
Table People:
ID NAME
1 A
2 B
Table CreditCard:
ID PEOPLE_ID
1 1
2 1
FK goes to "1" table:
Table People:
ID NAME CreditCard_ID
1 A 1
1 A 2
2 B 3
Table CreditCard:
ID
1
2
3
Note: See how ID and NAME are repeated (ID=1, NAME=A) in the second example; that is what happens if you put the FK in the wrong table.
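To make the first layout concrete, here is a small sketch with the FK in the "Many" table. It uses Python's sqlite3 module (an assumption about the engine); the rejected row with PEOPLE_ID 99 is a hypothetical example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
con.executescript("""
CREATE TABLE People (ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL);
CREATE TABLE CreditCard (
    ID        INTEGER PRIMARY KEY,
    PEOPLE_ID INTEGER NOT NULL REFERENCES People(ID)  -- FK lives in the "Many" table
);
INSERT INTO People VALUES (1, 'A'), (2, 'B');
INSERT INTO CreditCard VALUES (1, 1), (2, 1);
""")

# Each person appears once; their cards are found through the FK.
cards_per_person = con.execute("""
    SELECT p.NAME, COUNT(c.ID)
    FROM People AS p LEFT JOIN CreditCard AS c ON c.PEOPLE_ID = p.ID
    GROUP BY p.ID
    ORDER BY p.ID
""").fetchall()

# A card pointing at a person who does not exist is rejected.
try:
    con.execute("INSERT INTO CreditCard VALUES (3, 99)")
    orphan_card_rejected = False
except sqlite3.IntegrityError:
    orphan_card_rejected = True
```

No join table is needed for this: a join table only becomes necessary when the relationship is Many-to-Many.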
I would make three tables; a person table with all their info( name, address, etc. ), a credit card table with all the info( expiration date, security number?, etc.. ) then another table connecting them with the PersonID and CreditCardID. But what do I know, I'm still in school lol so wait for someone else to answer you.

SQL - Linking two tables

I have two tables, specifically, they contain standard and specific parameters respectively.
Table1:
PKParameter Name Unit
1 Temperature K
2 Length mm
3 Pressure bar
Table2:
PKSpecParam Name Unit
1 Weight kg
2 Area m2
PKParameter and PKSpecParameter are primary keys.
I would like to combine these two tables into a third table which will keep track of the primary keys so I can reference any of the parameters, regardless of the table they are from.
For example:
PKCombined PKParameter PKSpecParameter
1 1 NULL
2 2 NULL
3 3 NULL
4 NULL 1
5 NULL 2
Now I would like to use PKCombined primary key to reference parameter
Maybe there is a better way to do this, but I've just started meddling with databases.
Select a.PKParameter, a.name, a.unit, b.PKSpecParam, b.name, b.unit
from table1 a full outer join table2 b on a.pkparameter=b.pkspecparam
However, this will produce NULL values if the number of entries in pkparameter and pkspecparam don't match.
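For comparison, the combined table described in the question could be sketched as follows. This uses Python's sqlite3 module (an assumption), and the CHECK that exactly one of the two keys is filled in is an addition that the question did not ask for.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Table1 (PKParameter INTEGER PRIMARY KEY, Name TEXT, Unit TEXT);
CREATE TABLE Table2 (PKSpecParam INTEGER PRIMARY KEY, Name TEXT, Unit TEXT);
CREATE TABLE Combined (
    PKCombined      INTEGER PRIMARY KEY AUTOINCREMENT,
    PKParameter     INTEGER REFERENCES Table1(PKParameter),
    PKSpecParameter INTEGER REFERENCES Table2(PKSpecParam),
    -- assumption: every row points at exactly one of the two source tables
    CHECK ((PKParameter IS NULL) <> (PKSpecParameter IS NULL))
);
INSERT INTO Table1 VALUES (1, 'Temperature', 'K'), (2, 'Length', 'mm'), (3, 'Pressure', 'bar');
INSERT INTO Table2 VALUES (1, 'Weight', 'kg'), (2, 'Area', 'm2');

INSERT INTO Combined (PKParameter)
SELECT PKParameter FROM Table1 ORDER BY PKParameter;
INSERT INTO Combined (PKSpecParameter)
SELECT PKSpecParam FROM Table2 ORDER BY PKSpecParam;
""")

# Resolve any PKCombined value to a name and unit, whichever table it came from.
resolved = con.execute("""
    SELECT c.PKCombined, COALESCE(a.Name, b.Name), COALESCE(a.Unit, b.Unit)
    FROM Combined AS c
    LEFT JOIN Table1 AS a ON a.PKParameter = c.PKParameter
    LEFT JOIN Table2 AS b ON b.PKSpecParam = c.PKSpecParameter
    ORDER BY c.PKCombined
""").fetchall()
```

Other tables can then reference PKCombined alone and leave the lookup to a query like the last one.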

SQL Constraint/Check on Join tables

I have three tables: store, product, storeproduct.
It doesn't really matter what's in the store and the product table, just know there is a storeID in the store table, and a productID in the product table. However the storeproduct table keeps track of the different products each store has. So the storeproduct table has two columns. The storeID column, and the productID column, both foreign keys from the store and the product table.
Is there a way to put a constraint or check on any of the tables to make sure that a store has more than 0 products and fewer than 50 products?
Note: I do not want a select statement to do this. I just want to know if there is a way to put a constraint or a check when creating the tables.
The point of this is so a user cannot insert into the storeproduct table if there are already 50 products(rows) with the same storeID, or delete from the storeproduct table if deleting a row will cause the last row with that storeID to be gone.
The storeproduct table might look like this
storeID productID
1 1
1 2
1 3
2 4
2 5
2 6
2 7
3 4
3 2
3 6
3 1
3 8
Actually, depending on your database you may be able to do this.
Oracle (and maybe others) provides materialized views to which you can apply constraints. So you could create the MV with a column PRODUCTS_IN_STORES (being something like select store.storeID, count(storeproduct.productID) as PRODUCTS_IN_STORES from store left outer join storeproduct on store.storeID=storeproduct.storeID group by store.storeID). Then put a constraint on it asserting that PRODUCTS_IN_STORES is between 0 and 50 or whatever.
http://www.sqlsnippets.com/en/topic-12896.html
and
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:21389386132607
Not a complete answer for you, but something to think about and hopefully set you on your way.
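On engines without constrainable materialized views, a different technique, triggers, can give similar behaviour. The sketch below is not the Oracle approach described above: it uses SQLite through Python's sqlite3 module (an assumption), the trigger names are invented here, and it takes the limit as "at most 50 rows per store". It also cannot stop a brand-new store from having zero products before its first insert.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE storeproduct (
    storeID   INTEGER NOT NULL,
    productID INTEGER NOT NULL,
    PRIMARY KEY (storeID, productID)
);

-- Reject an insert when the store already has 50 rows.
CREATE TRIGGER storeproduct_max BEFORE INSERT ON storeproduct
WHEN (SELECT COUNT(*) FROM storeproduct WHERE storeID = NEW.storeID) >= 50
BEGIN
    SELECT RAISE(ABORT, 'store already has 50 products');
END;

-- Reject a delete that would remove the store's last row.
CREATE TRIGGER storeproduct_min BEFORE DELETE ON storeproduct
WHEN (SELECT COUNT(*) FROM storeproduct WHERE storeID = OLD.storeID) <= 1
BEGIN
    SELECT RAISE(ABORT, 'store must keep at least one product');
END;
""")

def try_sql(sql):
    """Run one statement; return True if it succeeded, False if it was rejected."""
    try:
        con.execute(sql)
        return True
    except sqlite3.DatabaseError:
        return False

# Store 1 gets exactly 50 products; store 2 gets a single product.
con.executemany("INSERT INTO storeproduct VALUES (1, ?)", [(p,) for p in range(1, 51)])
con.execute("INSERT INTO storeproduct VALUES (2, 4)")

insert_51st_ok = try_sql("INSERT INTO storeproduct VALUES (1, 51)")
delete_last_ok = try_sql("DELETE FROM storeproduct WHERE storeID = 2")
delete_one_of_many_ok = try_sql(
    "DELETE FROM storeproduct WHERE storeID = 1 AND productID = 1")
```

Both triggers fire once per affected row, so a multi-row DELETE is stopped at the point where it would remove a store's last product.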