Merging databases with similar table structures into one single database - SQL

I have two databases with similar data structures. They are basically archived databases with the same tables, but one has slightly updated data. I want to merge them into one single database.
When I try to merge them, I get an error saying duplicate data cannot be copied, but I want to copy the duplicate data too. I believe this error is caused by the primary key constraint.
Can anyone suggest how to combine the two databases without losing the duplicate data?
For example:
Table1:
MemberID Name Class Year
120 Sam B 2005
121 Mark A 2005
122 John A 2005
Table2:
MemberID Name Class Year
120 Sam B 2006
121 Mark A 2006
123 David C 2006
Result table should be:
MemberID Name Class Year
120 Sam B 2005
120 Sam B 2006
121 Mark A 2005
121 Mark A 2006
122 John A 2005
123 David C 2006
Note: MemberID is the primary key.

Primary keys must be unique. So you need to change the design of the result table so it has some other primary key, such as a field called "ID" that is an identity field (autonumber).
Once MemberID is no longer the primary key, your insert should work.
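A minimal sketch of the surrogate-key fix, assuming Python's built-in sqlite3 module as a stand-in for SQL Server: SQLite's INTEGER PRIMARY KEY plays the role of an IDENTITY (autonumber) column. Table1 and Table2 hold the example data; the Result table and its ID column are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Source tables from the question; MemberID is the primary key in each one
conn.executescript("""
    CREATE TABLE Table1 (MemberID INTEGER PRIMARY KEY, Name TEXT, Class TEXT, Year INTEGER);
    CREATE TABLE Table2 (MemberID INTEGER PRIMARY KEY, Name TEXT, Class TEXT, Year INTEGER);
    INSERT INTO Table1 VALUES (120, 'Sam', 'B', 2005), (121, 'Mark', 'A', 2005), (122, 'John', 'A', 2005);
    INSERT INTO Table2 VALUES (120, 'Sam', 'B', 2006), (121, 'Mark', 'A', 2006), (123, 'David', 'C', 2006);

    -- Hypothetical result table: a surrogate ID is the key, so MemberID may repeat
    CREATE TABLE Result (
        ID       INTEGER PRIMARY KEY,  -- auto-assigned, like an IDENTITY column
        MemberID INTEGER NOT NULL,
        Name     TEXT,
        Class    TEXT,
        Year     INTEGER
    );

    -- UNION ALL keeps every row, including MemberIDs present in both tables
    INSERT INTO Result (MemberID, Name, Class, Year)
        SELECT MemberID, Name, Class, Year FROM Table1
        UNION ALL
        SELECT MemberID, Name, Class, Year FROM Table2;
""")

rows = conn.execute(
    "SELECT MemberID, Name, Class, Year FROM Result ORDER BY MemberID, Year"
).fetchall()
for row in rows:
    print(row)
```

UNION ALL (rather than UNION) matters here: it keeps rows even when they repeat across the two sources.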
You could also add another field to the result table that indicates which archive a row comes from, then make the primary key a combination of MemberID and that field. There are many ways to do this, but one way or another, MemberID alone cannot be the primary key in the result table.
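The composite-key alternative can be sketched the same way, again with SQLite standing in for SQL Server. Here a hypothetical SourceArchive column (assumed to hold the name of the table each row came from) is added, and the key becomes (MemberID, SourceArchive):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical result table: SourceArchive records where each row came from
    CREATE TABLE Result (
        MemberID      INTEGER NOT NULL,
        SourceArchive TEXT    NOT NULL,
        Name TEXT, Class TEXT, Year INTEGER,
        PRIMARY KEY (MemberID, SourceArchive)  -- composite key
    );
    INSERT INTO Result VALUES
        (120, 'Table1', 'Sam', 'B', 2005), (121, 'Table1', 'Mark', 'A', 2005),
        (122, 'Table1', 'John', 'A', 2005), (120, 'Table2', 'Sam', 'B', 2006),
        (121, 'Table2', 'Mark', 'A', 2006), (123, 'Table2', 'David', 'C', 2006);
""")

# The same MemberID coming from two archives is accepted...
count = conn.execute("SELECT COUNT(*) FROM Result WHERE MemberID = 120").fetchone()[0]
print(count)  # 2

# ...but loading the same archive row twice is still rejected by the key
try:
    conn.execute("INSERT INTO Result VALUES (120, 'Table1', 'Sam', 'B', 2005)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```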

Related

Snowflake having duplicates

In my use case, a scheduled job reads a CSV file and writes it to Snowflake.
When I schedule this job to run every hour, I see multiple duplicates in Snowflake, even though my ID column is a PRIMARY KEY (ALTER TABLE tablename ADD PRIMARY KEY (column1)).
I understand that Snowflake supports defining and maintaining constraints but does not enforce them, except for NOT NULL constraints, which are always enforced. I need help solving this issue.
To elaborate, let's consider this scenario:
Step 1: At 9AM insert data from CSV to Snowflake
ID Customer name Price
1111 John Mathew 10
1112 David Becham 20
Step 2: At 10AM I get one additional row, so my CSV is now:
ID Customer name Price
1111 John Mathew 10
1112 David Becham 20
1113 Hello World 40
Expected in Snowflake:
ID Customer name Price
1111 John Mathew 10
1112 David Becham 20
1113 Hello World 40
What I get instead are duplicates, as below:
ID Customer name Price
1111 John Mathew 10
1112 David Becham 20
1113 Hello World 40
1111 John Mathew 10
1112 David Becham 20
It would help if you provided your code. It looks like you are updating your CSV in place, which means Snowflake sees the whole file as a new file and loads all of it again. If you are just running a COPY INTO command with no downstream logic, that is what will happen.
Two options:
1) Don't update the CSV file; just create a new one containing only the new data. Then the COPY INTO command will work fine.
2) If you are also receiving updates to previous records, run COPY INTO into a temporary table and then MERGE that data into your final table on the primary key.
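A minimal sketch of option 2, using Python's sqlite3 as a stand-in for Snowflake. SQLite has no MERGE statement, so the two MERGE branches are written out as an UPDATE followed by an INSERT ... WHERE NOT EXISTS; in Snowflake the same logic would be a single MERGE INTO ... USING ... ON statement. The table names Target and Staging and the hourly_load function are hypothetical, and the third load (with a changed price) is invented to exercise the update path.

```python
import sqlite3

def hourly_load(conn, csv_rows):
    """Stage one CSV snapshot, then merge it into Target on ID (hypothetical helper).

    Emulates MERGE ... WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED THEN INSERT
    with an UPDATE followed by an INSERT ... WHERE NOT EXISTS.
    """
    conn.execute("DELETE FROM Staging")  # staging holds only the current file
    conn.executemany("INSERT INTO Staging VALUES (?, ?, ?)", csv_rows)
    # WHEN MATCHED: refresh rows whose ID is already in Target
    conn.execute("""
        UPDATE Target
        SET CustomerName = (SELECT s.CustomerName FROM Staging s WHERE s.ID = Target.ID),
            Price        = (SELECT s.Price FROM Staging s WHERE s.ID = Target.ID)
        WHERE ID IN (SELECT ID FROM Staging)
    """)
    # WHEN NOT MATCHED: insert only IDs that Target does not have yet
    conn.execute("""
        INSERT INTO Target (ID, CustomerName, Price)
        SELECT s.ID, s.CustomerName, s.Price FROM Staging s
        WHERE NOT EXISTS (SELECT 1 FROM Target t WHERE t.ID = s.ID)
    """)
    conn.commit()

conn = sqlite3.connect(":memory:")
# No PRIMARY KEY on purpose: like Snowflake, nothing here stops duplicate IDs
conn.execute("CREATE TABLE Target (ID INTEGER, CustomerName TEXT, Price INTEGER)")
conn.execute("CREATE TEMP TABLE Staging (ID INTEGER, CustomerName TEXT, Price INTEGER)")

hourly_load(conn, [(1111, "John Mathew", 10), (1112, "David Becham", 20)])   # first file
hourly_load(conn, [(1111, "John Mathew", 10), (1112, "David Becham", 20),
                   (1113, "Hello World", 40)])                               # next file
result = conn.execute("SELECT * FROM Target ORDER BY ID").fetchall()
print(result)

# Hypothetical later file in which the price for 1112 has changed
hourly_load(conn, [(1112, "David Becham", 25)])
updated = conn.execute("SELECT * FROM Target ORDER BY ID").fetchall()
print(updated)
```

Re-loading the full file no longer adds rows, because each ID is either updated in place or inserted once.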
Create a second table to store the de-duplicated records. The first table receives the data from your source (the CSV). Then create a stream on the first table to capture changes, and a task that reads the stream and merges (inserts/updates) the data into the second table.
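The stream-and-task pipeline can be approximated outside Snowflake to show the data flow. In this sketch (SQLite via Python's sqlite3, all names hypothetical) a rowid offset imitates a stream by exposing only rows added since the last run, and run_task imitates the scheduled task that merges them into the de-duplicated table. It is not Snowflake's API; real streams and tasks are created with CREATE STREAM and CREATE TASK.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# First table: every load is appended as-is, duplicates included
conn.execute("CREATE TABLE Landing (ID INTEGER, CustomerName TEXT, Price INTEGER)")
# Second table: de-duplicated copy (SQLite enforces this key; Snowflake would not)
conn.execute("CREATE TABLE Deduped (ID INTEGER PRIMARY KEY, CustomerName TEXT, Price INTEGER)")

def run_task(conn, offset):
    """Merge Landing rows newer than `offset` into Deduped; return the new offset.

    The rowid offset is a hypothetical stand-in for a stream, which only
    exposes changes made since it was last consumed.
    """
    new_rows = conn.execute(
        "SELECT rowid, ID, CustomerName, Price FROM Landing WHERE rowid > ? ORDER BY rowid",
        (offset,),
    ).fetchall()
    for rowid, row_id, name, price in new_rows:
        # upsert: the latest row for an ID replaces any earlier version
        conn.execute("INSERT OR REPLACE INTO Deduped VALUES (?, ?, ?)", (row_id, name, price))
        offset = rowid
    conn.commit()
    return offset

offset = 0
conn.executemany("INSERT INTO Landing VALUES (?, ?, ?)",
                 [(1111, "John Mathew", 10), (1112, "David Becham", 20)])
offset = run_task(conn, offset)
conn.executemany("INSERT INTO Landing VALUES (?, ?, ?)",
                 [(1111, "John Mathew", 10), (1112, "David Becham", 20), (1113, "Hello World", 40)])
offset = run_task(conn, offset)

landing_count = conn.execute("SELECT COUNT(*) FROM Landing").fetchone()[0]
deduped = conn.execute("SELECT * FROM Deduped ORDER BY ID").fetchall()
print(landing_count, deduped)
```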

SQL Server "pseudo/synthetic" composite Id(key)

Sorry, but I don't know what to call what I need, so the title may be off.
I want to create a unique key where every two digits of the number identify the PK of another table. Let's say I have the following PKs in these three tables:
Id Company Id Area Id Role
1 Abc 1 HR 1 Assistant
2 Xyz 2 Financial 2 Manager
3 Qwe 3 Sales 3 VP
Now I need to insert values into another table. I know I could use 3 columns and create a composite key to get integrity and uniqueness, as below:
Id_Company Id_Area Id_Role ...Other_Columns.....
1 2 1
1 1 2
2 2 2
3 3 3
But I was thinking of creating a single column where each group of X digits identifies one FK. The first three columns of the table above would then become the following (supposing each FK is one digit):
Id ...Other_Columns.....
121
112
222
333
I don't know what it's called, or whether it's a stupid idea, but it makes sense to me: I can select on a single column, and if I need a join I just split the number every X digits according to my definition.
It's called a "smart", "intelligent" or "concatenated" key. It's a bad idea. It is fragile, leads to update problems and impedes the DBMS. The DBMS and query language are designed for you to describe your application via base tables in a straightforward way. Use them as they were intended.
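To make "fragile" concrete, here is a small Python sketch of the proposed key, with hypothetical helper names. Plain concatenation becomes ambiguous as soon as an ID has more digits than expected, and a fixed-width version silently decodes to the wrong foreign keys once an ID outgrows its slot:

```python
def concat_key(company_id, area_id, role_id):
    """Hypothetical 'smart key': glue the three FK values together as digits."""
    return int(f"{company_id}{area_id}{role_id}")

def fixed_width_key(company_id, area_id, role_id, width=2):
    """Same idea, but each FK is zero-padded to `width` digits."""
    return int(f"{company_id:0{width}d}{area_id:0{width}d}{role_id:0{width}d}")

def split_fixed_width(key, width=2, parts=3):
    """Recover the FKs by cutting the number every `width` digits."""
    digits = str(key).zfill(width * parts)
    return tuple(int(digits[i:i + width]) for i in range(0, width * parts, width))

# Works while every ID fits its slot...
print(concat_key(1, 2, 1))                           # 121
print(split_fixed_width(fixed_width_key(1, 2, 1)))   # (1, 2, 1)

# ...but two different rows can get the same concatenated key
print(concat_key(1, 11, 2) == concat_key(11, 1, 2))  # True: both are 1112

# ...and an ID that outgrows its slot decodes to the wrong FKs
print(split_fixed_width(fixed_width_key(1, 100, 2)))  # (11, 0, 2), not (1, 100, 2)
```

A composite key of three separate columns has neither problem, and the DBMS can index and enforce each foreign key directly.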

How to design a database schema with type and subtype

I've read plenty of supertype/subtype threads and I'm pretty sure I am not asking the same one.
I have the following tables in my database. Note that:
1. Some security types, such as stocks and ETFs, only need a Type and no SubType.
2. Securities.TypeId is a foreign key pointing to Type.ID.
3. Securities.SubTypeId has no foreign key relationship to the BondType or DerivativeType tables; data integrity is currently maintained by C# code.
Since the missing foreign key relationship is bad, I want to refactor this DB to add one. Given that this DB is already in production, what's the best way to improve it while limiting the risk to the software? For example, one way is to combine all the XXXType tables into a single table and renumber all the SubTypeIds, but that clearly involves updating many records in the Securities table, so it is riskier than an approach that doesn't require changing values.
[Securities]
ID Name TypeId SubTypeId
1 Stock1 2 NULL
2 Fund1 3 NULL
3 Bond1 1 3
4 Deriv1 4 3
[Type]
ID Name
1 Bond
2 Stock
3 ETF
4 Derivative
[BondType]
ID Name
...
2 GovermentBond
3 CorporateBond
4 MunicipalBond
...
[DerivativeType]
ID Name
...
2 Future
3 Option
4 Swap
...

SQL Server 2012 Query to extract subsets of data

I'm trying to put some data into 2NF:
Refid | Reason
------|---------
1 | Admission
1 | Advice and Support
1 | Behaviour
As you can see, one person might have multiple reasons, so I need another table in the following format:
Refid | Reason1   | Reason2            | Reason3   | etc.
------|-----------|--------------------|-----------|------
1     | Admission | Advice and Support | Behaviour |
But I don't know how to write a query that extracts the data and writes it into a new table like this. The reasons have no dates or other criteria that would put them in any particular order; all reasons are assigned at the time of referral.
Thanks for your help. (SQL Server 2012)
You are modelling a many-to-many relationship.
You need three tables:
- One for Reasons (say ReasonID and Reason)
- One for each entity identified by RefID (say RefID and ReferenceOtherData)
- A junction (or intersection) table with the keys (RefID, ReasonID)
This way,
Multiple reasons can apply to one Ref entity
Multiple Refs can have the same reason
You turn repeated columns into rows.
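A minimal sketch of the three-table design, using Python's sqlite3 in place of SQL Server 2012. OldReferrals is an assumed name for the existing (Refid, Reason) table from the question; Reasons, Refs and RefReasons follow the description above. The three INSERT ... SELECT statements also show how the existing rows can be moved into the new tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical name for the question's existing flat table
    CREATE TABLE OldReferrals (Refid INTEGER, Reason TEXT);
    INSERT INTO OldReferrals VALUES
        (1, 'Admission'), (1, 'Advice and Support'), (1, 'Behaviour');

    CREATE TABLE Reasons (ReasonID INTEGER PRIMARY KEY, Reason TEXT NOT NULL UNIQUE);
    CREATE TABLE Refs (RefID INTEGER PRIMARY KEY, ReferenceOtherData TEXT);
    -- Junction table: one row per (ref, reason) pair
    CREATE TABLE RefReasons (
        RefID    INTEGER NOT NULL REFERENCES Refs (RefID),
        ReasonID INTEGER NOT NULL REFERENCES Reasons (ReasonID),
        PRIMARY KEY (RefID, ReasonID)
    );

    -- Populate all three tables from the flat table
    INSERT INTO Reasons (Reason) SELECT DISTINCT Reason FROM OldReferrals;
    INSERT INTO Refs (RefID) SELECT DISTINCT Refid FROM OldReferrals;
    INSERT INTO RefReasons (RefID, ReasonID)
        SELECT o.Refid, r.ReasonID
        FROM OldReferrals o JOIN Reasons r ON r.Reason = o.Reason;
""")

# Reasons for one referral come back as rows, however many there are
reasons = [reason for (reason,) in conn.execute("""
    SELECT r.Reason
    FROM RefReasons rr JOIN Reasons r ON r.ReasonID = rr.ReasonID
    WHERE rr.RefID = 1
    ORDER BY r.Reason
""")]
print(reasons)
```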

How to merge two identical database data to one?

Two of my customers are going to merge. Both use my application, each with its own database. In a few weeks they will merge (becoming one organisation), so they want all the data in one database.
The two database structures are identical; the problem is the data. For example, I have the tables Locations and Persons (just two of the 50 tables):
Database 1:
Locations:
Id Name Adress etc....
1 Location 1
2 Location 2
Persons:
Id LocationId Name etc...
1 1 Alex
2 1 Peter
3 2 Lisa
Database 2:
Locations:
Id Name Adress etc....
1 Location A
2 Location B
Persons:
Id LocationId Name etc...
1 1 Mark
2 2 Ashley
3 1 Ben
Each person is related to a location (column LocationId). Note that more tables refer to the Locations and Persons tables.
Each database contains its own locations and persons, but the Ids can be the same. For example, when I import everything into DB2, the locations of DB1 should be inserted into DB2 with Ids 3 and 4. The persons from DB1 should get the new Ids 4, 5 and 6, and their LocationId values must be changed to the new location Ids (3 and 4).
My plan is to write a query that handles everything, but I don't know where to begin.
What is the best way (in a query) to renumber the Id fields and cascade the changes to the child tables? The databases have no referential integrity (foreign keys are NOT defined in the database), and creating foreign keys with cascading is not an option.
I'm using SQL Server 2005.
You say that both customers are using your application, so I assume that it's some kind of "shrink-wrap" software that is used by more customers than just these two, correct?
If so, adding special columns to the tables or anything like that will probably cause pain in the future: either you would have to maintain a special version for these two customers that can deal with the additional columns, or you would have to add these columns to your main codebase, which means that all your other customers would get them as well.
I can think of an easier way to do this without changing any of your tables or adding any columns.
For this to work, you need to find the largest ID that exists across both databases (no matter which table or database it is in).
This may require some copy & paste to get a lot of queries that look like this:
select max(id) as maxlocationid from locations
select max(id) as maxpersonid from persons
-- and so on... (one query for each table)
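Instead of copying and pasting one query per table, the list of ID columns can be read from the database catalog. This sketch does so for SQLite through Python's sqlite3 (in SQL Server, INFORMATION_SCHEMA.COLUMNS could supply the same list); the largest_id helper and its rule that key columns end in "Id" are assumptions to adapt to your schema.

```python
import sqlite3

def largest_id(*dbs):
    """Largest value in any column whose name ends in 'Id', across all tables.

    Hypothetical helper: it reads SQLite's catalog (sqlite_master plus
    PRAGMA table_info) instead of hand-writing one MAX query per table.
    The name-suffix rule is a heuristic; adjust it to your naming scheme.
    """
    largest = 0
    for db in dbs:
        tables = [name for (name,) in db.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
        for table in tables:
            columns = [info[1] for info in db.execute(f'PRAGMA table_info("{table}")')]
            for column in columns:
                if column.endswith("Id"):
                    (value,) = db.execute(f'SELECT MAX("{column}") FROM "{table}"').fetchone()
                    if value is not None:
                        largest = max(largest, value)
    return largest

def make_db(locations, persons):
    """Build an in-memory copy of the example schema and data."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Locations (Id INTEGER PRIMARY KEY, Name TEXT)")
    db.execute("CREATE TABLE Persons (Id INTEGER PRIMARY KEY, LocationId INTEGER, Name TEXT)")
    db.executemany("INSERT INTO Locations VALUES (?, ?)", locations)
    db.executemany("INSERT INTO Persons VALUES (?, ?, ?)", persons)
    return db

db1 = make_db([(1, "Location 1"), (2, "Location 2")],
              [(1, 1, "Alex"), (2, 1, "Peter"), (3, 2, "Lisa")])
db2 = make_db([(1, "Location A"), (2, "Location B")],
              [(1, 1, "Mark"), (2, 2, "Ashley"), (3, 1, "Ben")])
print(largest_id(db1, db2))  # 3
```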
When you find the largest ID after running the query in both databases, take a number that's larger than that ID, and add it to all IDs in all tables in the second database.
It's very important that this number is larger than the largest ID that already exists in either database!
It's a bit difficult to explain, so here's an example:
Let's say that the largest ID in any table in both databases is 8000.
Then you run some SQL that adds 10000 to every ID in every table in the second database:
update Locations set Id = Id + 10000
update Persons set Id = Id + 10000, LocationId = LocationId + 10000
-- and so on, for each table
The queries are relatively simple, but this is the most work because you have to build a query like this manually for each table in the database, with the correct names of all the ID columns.
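The whole procedure on the example data, as a sketch using two in-memory SQLite databases through Python's sqlite3 in place of the two SQL Server 2005 databases. It assumes plain integer Id columns: SQL Server does not allow an IDENTITY column to be updated, so tables with IDENTITY keys would need a different approach.

```python
import sqlite3

def make_db(locations, persons):
    """Build an in-memory copy of the example schema and data."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Locations (Id INTEGER PRIMARY KEY, Name TEXT)")
    db.execute("CREATE TABLE Persons (Id INTEGER PRIMARY KEY, LocationId INTEGER, Name TEXT)")
    db.executemany("INSERT INTO Locations VALUES (?, ?)", locations)
    db.executemany("INSERT INTO Persons VALUES (?, ?, ?)", persons)
    return db

db1 = make_db([(1, "Location 1"), (2, "Location 2")],
              [(1, 1, "Alex"), (2, 1, "Peter"), (3, 2, "Lisa")])
db2 = make_db([(1, "Location A"), (2, "Location B")],
              [(1, 1, "Mark"), (2, 2, "Ashley"), (3, 1, "Ben")])

OFFSET = 10000  # must exceed the largest ID in either database (3 here)

# Shift every key column, and every column that points at one, in the second DB
db2.execute("UPDATE Locations SET Id = Id + ?", (OFFSET,))
db2.execute("UPDATE Persons SET Id = Id + ?, LocationId = LocationId + ?", (OFFSET, OFFSET))
db2.commit()

# Copy the shifted rows into the first DB; the primary keys can no longer collide
db1.executemany("INSERT INTO Locations VALUES (?, ?)",
                db2.execute("SELECT Id, Name FROM Locations").fetchall())
db1.executemany("INSERT INTO Persons VALUES (?, ?, ?)",
                db2.execute("SELECT Id, LocationId, Name FROM Persons").fetchall())
db1.commit()

merged = db1.execute("""
    SELECT p.Id, p.Name, l.Name
    FROM Persons p JOIN Locations l ON l.Id = p.LocationId
    ORDER BY p.Id
""").fetchall()
for row in merged:
    print(row)
```

Because LocationId was shifted by the same offset as Locations.Id, every person still joins to the right location after the copy.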
After running the query on the second database, the example data from your question will look like this:
Database 1: (exactly like before)
Locations:
Id Name Adress etc....
1 Location 1
2 Location 2
Persons:
Id LocationId Name etc...
1 1 Alex
2 1 Peter
3 2 Lisa
Database 2:
Locations:
Id Name Adress etc....
10001 Location A
10002 Location B
Persons:
Id LocationId Name etc...
10001 10001 Mark
10002 10002 Ashley
10003 10001 Ben
And that's it! Now you can import the data from one database into the other, without getting any primary key violations at all.
If this were my problem, I would probably add some columns to the tables in the database I was going to keep, to store the PK values from the other DB. Then I would insert the records from the other DB's tables; for columns that are foreign keys, I would use a known value. Then I would update those foreign keys as required and drop the columns I added.
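A sketch of this column-mapping approach, again with SQLite (via Python's sqlite3) standing in for SQL Server, and an attached database playing the role of the other DB. The helper columns OldId and OldLocationId are hypothetical names, and -1 is assumed to be the "known value" used as a temporary foreign key; the answer does not say which value it means.

```python
import sqlite3

db = sqlite3.connect(":memory:")                    # the database being kept
db.execute("ATTACH DATABASE ':memory:' AS other")   # the database being merged in
for schema in ("main", "other"):
    db.execute(f"CREATE TABLE {schema}.Locations (Id INTEGER PRIMARY KEY, Name TEXT)")
    db.execute(f"CREATE TABLE {schema}.Persons "
               "(Id INTEGER PRIMARY KEY, LocationId INTEGER, Name TEXT)")

db.executescript("""
    INSERT INTO main.Locations VALUES (1, 'Location 1'), (2, 'Location 2');
    INSERT INTO main.Persons VALUES (1, 1, 'Alex'), (2, 1, 'Peter'), (3, 2, 'Lisa');
    INSERT INTO other.Locations VALUES (1, 'Location A'), (2, 'Location B');
    INSERT INTO other.Persons VALUES (1, 1, 'Mark'), (2, 2, 'Ashley'), (3, 1, 'Ben');

    -- 1. Hypothetical helper columns that remember the other DB's key values
    ALTER TABLE main.Locations ADD COLUMN OldId INTEGER;
    ALTER TABLE main.Persons ADD COLUMN OldId INTEGER;
    ALTER TABLE main.Persons ADD COLUMN OldLocationId INTEGER;

    -- 2. Insert the other rows; new Ids are assigned automatically.
    --    LocationId gets the placeholder -1 (the assumed "known value").
    INSERT INTO main.Locations (Name, OldId)
        SELECT Name, Id FROM other.Locations ORDER BY Id;
    INSERT INTO main.Persons (LocationId, Name, OldId, OldLocationId)
        SELECT -1, Name, Id, LocationId FROM other.Persons ORDER BY Id;

    -- 3. Swap each placeholder for the new location Id, matched via OldId.
    --    OldLocationId here refers to the outer Persons row being updated.
    UPDATE main.Persons
    SET LocationId = (SELECT l.Id FROM main.Locations AS l
                      WHERE l.OldId = OldLocationId)
    WHERE LocationId = -1;
""")

# 4. Drop the helper columns again (SQLite supports DROP COLUMN from 3.35.0)
if sqlite3.sqlite_version_info >= (3, 35, 0):
    for table, column in [("Locations", "OldId"), ("Persons", "OldId"),
                          ("Persons", "OldLocationId")]:
        db.execute(f"ALTER TABLE main.{table} DROP COLUMN {column}")

merged = db.execute("""
    SELECT p.Id, p.Name, l.Name
    FROM main.Persons AS p JOIN main.Locations AS l ON l.Id = p.LocationId
    ORDER BY p.Id
""").fetchall()
for row in merged:
    print(row)
```

Unlike the offset approach, the imported rows get the next free Ids (locations 3 and 4, persons 4 to 6), which matches the numbering the question asked for.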