Migrate data between two SQL databases with identity columns

Here is the scenario: I have two databases (A and B) with the same schema but different records, and I'd like to transfer B's data into the corresponding tables in DB A.
Let's say we have tables named Question and Answer in both databases. DB A contains 10 records in the Question table and 30 in the Answer table. Both tables have an identity column Id starting at 1 (auto-increment), and there is a one-to-many relationship between Question and Answer.
In DB B, we have 5 entries in the Question table and 20 in Answer.
My requirement is to copy the data of both tables from the source DB B into the destination DB A without any conflicts in the identity columns, while maintaining the relationship between the two tables during the transfer.
Any solution or potential workaround would be highly appreciated.

I will not write the full SQL here, but here is what I think can be done. Make sure to use SET IDENTITY_INSERT ON and OFF.
Take the max IDs of both tables from DB A, e.g. A_maxidquestion and A_maxidanswer.
Select from B's Question table. In the select list, add a derived column QuestionID + A_maxidquestion; this will be your new ID.
Select from B's Answer table. In the select list, add a derived column AnswerID + A_maxidanswer, and as the foreign key, QuestionID + A_maxidquestion.
Note: make sure the destination tables are not being used by any other process for inserts while you are inserting.
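A minimal sketch of that approach in T-SQL, assuming both databases sit on the same server and the tables look like Question(Id, QuestionText) and Answer(Id, QuestionId, AnswerText) (the names are placeholders, not your actual schema):

-- Offsets taken from the destination so the copied ids cannot collide
DECLARE @MaxQuestionId int, @MaxAnswerId int;
SELECT @MaxQuestionId = ISNULL(MAX(Id), 0) FROM A.dbo.Question;
SELECT @MaxAnswerId   = ISNULL(MAX(Id), 0) FROM A.dbo.Answer;

-- Copy parents, shifting their identity values by the offset
SET IDENTITY_INSERT A.dbo.Question ON;
INSERT INTO A.dbo.Question (Id, QuestionText)
SELECT Id + @MaxQuestionId, QuestionText
FROM B.dbo.Question;
SET IDENTITY_INSERT A.dbo.Question OFF;

-- Copy children, shifting both their own id and the foreign key
SET IDENTITY_INSERT A.dbo.Answer ON;
INSERT INTO A.dbo.Answer (Id, QuestionId, AnswerText)
SELECT Id + @MaxAnswerId, QuestionId + @MaxQuestionId, AnswerText
FROM B.dbo.Answer;
SET IDENTITY_INSERT A.dbo.Answer OFF;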

One of the best approaches to something like this is to use the OUTPUT clause. https://learn.microsoft.com/en-us/sql/t-sql/queries/output-clause-transact-sql?view=sql-server-2017 You can insert the new parent and capture the newly inserted identity value which you can use to insert the children.
You can do this set-based if you also include a temp table to hold the mapping from the original identity value to the new identity value.
With no details of the tables, that is the best I can do.
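As a set-based sketch of that idea (again assuming placeholder tables Question(Id, QuestionText) and Answer(Id, QuestionId, AnswerText)), MERGE is used instead of a plain INSERT because its OUTPUT clause can see the source row's old Id, which lets you capture the old-to-new mapping in one statement:

DECLARE @QuestionMap TABLE (OldId int, NewId int);

-- The ON 1 = 0 predicate never matches, so every source row is inserted,
-- and OUTPUT records the old id alongside the newly generated identity value.
MERGE A.dbo.Question AS tgt
USING B.dbo.Question AS src
   ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT (QuestionText) VALUES (src.QuestionText)
OUTPUT src.Id, inserted.Id INTO @QuestionMap (OldId, NewId);

-- Children pick up their new parent ids from the mapping table.
INSERT INTO A.dbo.Answer (QuestionId, AnswerText)
SELECT m.NewId, b.AnswerText
FROM B.dbo.Answer AS b
JOIN @QuestionMap AS m ON m.OldId = b.QuestionId;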

Related

Adding columns to an existing Redshift table

I have a table which contains more than 30 million records, and I need to add two new columns to it. The problem is that I need these columns to be NOT NULL without a default value. I thought I would just add the columns without the NOT NULL constraint, fill them with data, and then add the constraint, but Redshift doesn't support that. I have another solution in mind, but I wonder whether there is a simpler one:
Create the two new columns with NOT NULL and a DEFAULT.
Fill the columns with data.
Create an empty table with the same columns as the target table (with the two new columns just NOT NULL, no default).
Insert everything from the target table into the new table.
Drop the target table.
Rename the new table to the target's name.
I would suggest:
Existing Table-A
Create a new Table-B that contains the new columns, plus an identity column (e.g. customer_id) that matches Table-A.
Insert data into Table-B (2 columns + identity column)
Use CREATE TABLE AS to simultaneously create a new Table-C (specifying DISTKEY and SORTKEY) while querying Table-A and Table-B via a JOIN on the identity column
Verify contents of Table-C
VACUUM Table-C (shouldn't be necessary, but just in case, and it should be quick)
Delete Table-A and Table-B
Rename Table-C to desired table name (which was probably the same as Table-A)
In Summary: Existing columns in Table-A + Extra columns in Table-B ➞ Table-C
Reasoning:
UPDATE statements do not run very well in Redshift. An UPDATE marks the existing rows in each column as 'deleted' and then appends new rows to the end of each column. Doing lots of UPDATEs will blow out the size of a table and leave it unsorted; it is also relatively slow. You would need to deep copy or VACUUM the table afterwards to fix things.
Using CREATE TABLE AS with a JOIN generates all the "final state" data in one query, and the resulting table will be sorted and in a 'clean' state.
The process gives you a chance to verify the content of Table-C before committing to the switchover. Very handy for debugging the process!
See also: Performing a Deep Copy - Amazon Redshift
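A rough sketch of the steps above in Redshift SQL; the table names, the customer_id join key, the new column definitions and the DISTKEY/SORTKEY choices are all placeholders to adapt:

-- Table-B: just the join key plus the two new NOT NULL columns
CREATE TABLE table_b (
    customer_id BIGINT      NOT NULL,
    new_col1    VARCHAR(50) NOT NULL,
    new_col2    INT         NOT NULL
);

-- ... load table_b with the values for the two new columns ...

-- Table-C: built in one pass with the desired distribution and sort keys
CREATE TABLE table_c
DISTKEY (customer_id)
SORTKEY (customer_id)
AS
SELECT a.*, b.new_col1, b.new_col2
FROM table_a a
JOIN table_b b ON a.customer_id = b.customer_id;

-- verify row counts and spot-check table_c, then switch over
DROP TABLE table_a;
DROP TABLE table_b;
ALTER TABLE table_c RENAME TO table_a;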

Renaming two columns or swapping the values? Which one is better?

I have a table with more than 1.5 million records, in which I have two columns, A and B. By mistake, the values meant for column A were inserted into column B, and column B's values were inserted into A.
We only discovered the issue recently. What is the best way to correct it: renaming the columns to swap them (I don't see how that would work, since renaming A to B fails while B already exists), or swapping the values contained in the two columns?
You can use the query below to swap the columns:
UPDATE table_name SET A = B, B = A;
You do have a huge amount of data, so renaming might look attractive, but renaming columns to work around a data issue is not the right solution. Use the update query above to correct the data instead.
Before updating, take a backup of the table you are updating with the following query:
CREATE TABLE table_name_bkp AS SELECT * FROM table_name;
Always take a backup before playing with the original data, so nothing gets messed up.
1.5 million rows is not a big deal for SQL Server. Renaming columns has many downsides in a relational database (indexes, foreign keys and other dependencies you would have to change), so I would suggest going the traditional route: simply do the update.
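A small sketch of that safe path in T-SQL; table_name, A and B are the placeholder names from above, and SELECT ... INTO replaces CREATE TABLE ... AS SELECT, which SQL Server does not support:

-- Backup copy first
SELECT * INTO table_name_bkp FROM table_name;

-- In T-SQL both assignments read the pre-update values,
-- so the swap works without a temporary column.
BEGIN TRANSACTION;
    UPDATE table_name SET A = B, B = A;
    -- spot-check a few rows here before committing
COMMIT TRANSACTION;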

Oracle SQL merge tables without specifying columns

I have a table people with less than 100,000 records and I have taken a backup of this table using the following:
create table people_backup as select * from people
I add some new records to my people table over time, but eventually I want to merge the records from my backup table into people. Unfortunately I cannot simply DROP my table as my new records will be lost!
So I want to update the records in my people table using the records from people_backup, based on their primary key id and I have found 2 ways to do this:
MERGE the tables together
use some sort of fancy correlated update
Great! However, both of these methods use SET and make me specify which columns I want to update. Unfortunately I am lazy, and the structure of people may change over time; while my CTAS statement doesn't need to be updated, my update/merge script would need changes, which feels like unnecessary work.
Is there a way to merge entire rows without having to specify columns? I see here that not specifying columns during an INSERT directs SQL to insert values by position; can the same approach be applied here, and is it safe?
NB: The structure of the table will not change between backups
Given that your table is small, you could simply
DELETE FROM table t
 WHERE EXISTS ( SELECT 1
                  FROM backup b
                 WHERE t.key = b.key );

INSERT INTO table
SELECT *
  FROM backup;
That is slow and not particularly elegant (especially if most of the data in the backup hasn't changed), but assuming the columns in the two tables match, it does allow you to avoid listing out the columns. Personally, I'd much prefer writing out the column names (presumably those don't change all that often) so that I could do an update.
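For contrast, a minimal MERGE with the columns written out, assuming people has columns id, first_name and last_name (adjust to the real table):

MERGE INTO people t
USING people_backup b
   ON (t.id = b.id)
WHEN MATCHED THEN UPDATE SET
     t.first_name = b.first_name,
     t.last_name  = b.last_name
WHEN NOT MATCHED THEN INSERT (id, first_name, last_name)
     VALUES (b.id, b.first_name, b.last_name);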

One way syncing of 4 columns in SQL Server

This is my problem:
I have an old database with no constraints whatsoever. There are a handful of tables I copy from the old database to the new database. This copy is simple, and I run it nightly in a job.
In my new database (which does have constraints) I put all these loose tables under foreign-key constraints referencing a main table. All these loose tables have a key made of 3 IDs and a string.
My main table translates these 3 IDs and a string into 1 ID, so it has 5 columns.
In the loose tables some records can be duplicated, so to insert the keys into the main table I take a DISTINCT of the 3 IDs and the string and insert those.
The tables in the old database are updated daily. The copy runs daily, and from the main table I'd like to make a one-to-many relationship with the copied tables.
This gives me the problem:
How do I update the main table so that it leaves the already-inserted keys alone and adds only the new keys? Old keys should not be removed, even if they were removed in the old database.
I was thinking of making a DISTINCT view of all keys in the old database, but how would I apply that to the main table? This would need to run before the daily copy of the other tables (or the copy would fail on the constraints).
One other idea is to run this update of the main table with LINQ to SQL in my website, but that doesn't seem very clean.
So in short:
Old DB is SQL Server 2000
New DB is SQL Server 2008
The old DB has no constraints; a copy of some tables happens daily.
There should be a main table translating the 3-IDs-plus-string key into a single-ID key, with constraints to the other tables.
The main table must be updated before the copy job, or the constraints will fail. The main table will be a DISTINCT of a few columns of one table; this DISTINCT will be in a view on the old DB.
Only new rows may be added to the main table.
Does anyone have some ideas, some guidance?
Visualize the DB:
These loose tables hold details about a company: one table has the address(es), one has the contact person, and another has the company's username and login for our system (of which a single company could have more than one).
A company is identified by the 3 IDs and 1 string. The main table lists these unique IDs and strings so they can be translated to 1 ID, and this 1 ID is then used in the rest of my DB. The one-to-many relationships are then made from the main table to all those loose tables. I hope this clears it up a bit :)
I think you could use EXCEPT to insert the IDs that aren't in your main table yet: http://msdn.microsoft.com/en-us/library/ms188055.aspx
So for example:
insert into MainTable
select Id1,Id2,Id3,String1,NewId from DistinctOldTable
except
select Id1,Id2,Id3,String1,NewId from MainTable
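If the single ID in MainTable is an identity column, you would insert only the composite key and let the new ID generate itself; a hedged variant of the statement above, with guessed column names:

INSERT INTO MainTable (Id1, Id2, Id3, String1)
SELECT Id1, Id2, Id3, String1 FROM DistinctOldTable
EXCEPT
SELECT Id1, Id2, Id3, String1 FROM MainTable;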

SQL Server 2005: stored procedure to move rows from one table to another

I have 2 tables with identical schemas. I need to move rows older than 90 days (based on a datetime column present in the table) from table A to table B. Here is the pseudo code for what I want to do:
DECLARE @Criteria datetime
SET @Criteria = GETDATE() - 90

INSERT INTO TableB
SELECT * FROM TableA
WHERE ColumnX < @Criteria

-- now clean up the records we just moved to table B, in table A
DELETE FROM TableA WHERE ColumnX < @Criteria
My questions are:
What is the most efficient way to do this (will an INSERT ... SELECT perform well at high volume)? Table A will have ~180,000,000 rows in it, and I will need to move ~4,000,000 rows at a time to table B.
How do I encapsulate this in one transaction so that I will not delete rows from Table A if there was an error inserting them into Table B? I just want to make sure that I don't accidentally delete a row from table A unless I have successfully written it to table B.
Are there any good SQL Server 2005 books that you recommend?
Thanks,
Chris
I think that SSIS is probably the best solution for your needs.
I think you can just use SSIS tasks like the Data Flow task to achieve this. There doesn't seem to be any need to create a separate procedure for the logic.
Transactions can be set for any Data Flow task using the TransactionOption property. Check out this article on how to use transactions in SSIS.
Some basic tutorials on SSIS packages and how to create them can be found here and here.
Regarding:
How do I encapsulate this under one transaction so that I will not delete rows from Table A if there was an error inserting them to Table B.
You can delete all rows from A that are also in B by using a join. Then, if the copy to B failed, nothing will be deleted from A.
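A hedged sketch of the whole move inside one transaction (SQL Server 2005 T-SQL; TableA, TableB, ColumnX and PrimaryKeyId are placeholder names): if the INSERT fails, the CATCH block rolls back and nothing is deleted from TableA.

DECLARE @Criteria datetime;
SET @Criteria = DATEADD(day, -90, GETDATE());

BEGIN TRY
    BEGIN TRANSACTION;

    -- copy the old rows to the archive table
    INSERT INTO TableB
    SELECT *
    FROM TableA
    WHERE ColumnX < @Criteria;

    -- delete from A only what is now present in B
    DELETE a
    FROM TableA AS a
    INNER JOIN TableB AS b ON a.PrimaryKeyId = b.PrimaryKeyId
    WHERE a.ColumnX < @Criteria;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
END CATCH;

For ~4,000,000 rows per run you would probably also batch this (for example in chunks keyed on PrimaryKeyId) to keep the transaction log manageable.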