How to repeat a multi-step schema change (ETL schema changes?)

I'm new to DBA and not much of a SQL person, so be gentle please.
I'd like to restructure a database, which requires adding new columns, tables, and relationships, followed by removing old tables, columns, and relationships. A three-step process seems to be in order:

1. Change the schema to add the new stuff.
2. Run SSIS to hook up the new data using some of the old data.
3. Change the schema to drop the old stuff.

I'm using a SQL Server Database Project in VS 2015 to maintain the schema, and schema compare to update the DB schema. I'd like to make the process repeatable or automatic, if possible, so I can test it on a non-production database and get the flow right: change schema -> run ETL -> change schema. Is there a way to apply schema changes from within the ETL, or does this require manual operations? Is there a way to store the two schemas in files and then apply them, other than VS publish or schema compare?

SSIS has an Execute SQL Task that allows you to do what you want: alter the table (to add the new columns), move the data from the old columns to the new columns, then drop the old columns.
1) ALTER TABLE tableA ADD ...
2) UPDATE tableA SET ...
3) ALTER TABLE tableA DROP COLUMN ...
Please test your code carefully before running it.
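As a minimal sketch of those three batches (the NewCol/OldCol names and types are invented for illustration; in SSIS each batch would typically be its own Execute SQL Task, since the UPDATE cannot compile in the same batch that adds the columns):

-- 1) Add the new columns (hypothetical names and types)
ALTER TABLE tableA ADD NewCol1 VARCHAR(100) NULL, NewCol2 INT NULL;
GO
-- 2) Copy the old data into the new columns
UPDATE tableA SET NewCol1 = OldCol1, NewCol2 = OldCol2;
GO
-- 3) Drop the old columns only after verifying the copy
ALTER TABLE tableA DROP COLUMN OldCol1, OldCol2;
GO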

It worked! Here is an example of the ETL. Note that it's important to set DelayValidation to true on the data flows, and to disable ValidateExternalMetadata on some of the operations within the data flows, because the database schema is not static while the package runs.

Related

Preserve table data during test data 'refresh'

We have a Postgres database in a test system which we want to 'refresh' with the data from the production system. However, there are some tables with test configuration data that I want to preserve in the test database. Note that the tables I want to preserve are referred to by foreign key constraints from other tables that are not preserved.
To refresh the test database, we usually rename it to '..._old' and then re-create the database from a dump of the production data.
Now there are a few ways to try to preserve the test configuration data, but I'm wondering if anyone has any brilliant ideas that are better/faster. I'm hoping we can script this somehow to make it easy each time we do this.
A straight pg_dump/pg_restore won't work, because it will only INSERT, not UPDATE, the matching records. Or am I missing something?
I had thought about doing it by:

1. Renaming the tables involved with a '..._test' suffix
2. Using pg_dump to dump just those renamed tables to a file
3. Re-creating the 'refreshed' database
4. Restoring the renamed tables from the file into the new database
5. Performing an UPDATE table_a SET (......) = (SELECT * FROM table_a_test) to overwrite the refreshed data with the preserved test data
Note that the number of records in the production data and the test data may not be the same.
The content of these tables is not huge, so I had also thought about generating UPDATE scripts for all the records within the preserved data.
Can anyone think of a better way to do this?
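For concreteness, step 5 might look something like this sketch (assuming a hypothetical preserved table with columns (id, setting, value) where id is the key; the multi-column sub-select form needs PostgreSQL 9.5+, and the real column lists would come from your config tables):

-- Overwrite refreshed rows with the preserved test rows, matched on the key
UPDATE table_a
SET (setting, value) = (SELECT t.setting, t.value
                        FROM table_a_test t
                        WHERE t.id = table_a.id)
WHERE EXISTS (SELECT 1 FROM table_a_test t WHERE t.id = table_a.id);

-- Rows that exist only in the preserved data still need an INSERT
INSERT INTO table_a (id, setting, value)
SELECT t.id, t.setting, t.value
FROM table_a_test t
WHERE NOT EXISTS (SELECT 1 FROM table_a a WHERE a.id = t.id);

That would at least handle the differing row counts explicitly.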

Copy Table Constraints/Keys along with Data and Structure

I have a table "TableA" and I want to make a copy of it, "TableA_Copy". When I use the script below, it creates the table and copies the data, but the constraints are not copied. Is it possible to copy the constraints along with the structure and data?
SELECT * INTO TableA_Copy FROM TableA
Note: I am using SQL Server 2016.
Right-click on the database and go to Tasks -> Generate Scripts.
Select the table you want (TableA), go to the next step, and under the advanced options select both schema and data.
Save the script or open it in a new query window.
Once the script is generated, replace TableA with TableA_Copy.
This way you will get the data, the schema, and all the constraints. Remember to change the names of the constraints to avoid errors.
If you mean programmatically, then yes, there are a number of ways to accomplish this in T-SQL using the INFORMATION_SCHEMA views. The ones you will need for the table and columns are INFORMATION_SCHEMA.TABLES and INFORMATION_SCHEMA.COLUMNS. For the constraints you can use INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE, which will give you the PKs and FKs, and INFORMATION_SCHEMA.CHECK_CONSTRAINTS for all the check constraints.
Using these you can rebuild the table completely, for use as you indicate above.
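As a rough sketch of those views in use (TableA from the question; filtering by schema as well would be safer in a real script):

-- Columns of the table, in ordinal order
SELECT c.COLUMN_NAME, c.DATA_TYPE, c.IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS c
WHERE c.TABLE_NAME = 'TableA'
ORDER BY c.ORDINAL_POSITION;

-- Constraints and the columns they involve
SELECT tc.CONSTRAINT_TYPE, ccu.CONSTRAINT_NAME, ccu.COLUMN_NAME
FROM INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE ccu
JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
  ON tc.CONSTRAINT_NAME = ccu.CONSTRAINT_NAME
 AND tc.CONSTRAINT_SCHEMA = ccu.CONSTRAINT_SCHEMA
WHERE ccu.TABLE_NAME = 'TableA';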

How to update table definitions for many tables?

We want to update the out-of-sync tables in our database to match a different SQL Server database instance. We want to preserve the data in the tables but will need to update constraints and column definitions. What is the easiest technique for accomplishing this?
Brute force, but fairly easy to script, would be to:

1. On the current database (the schemas you want), right-click on the DB and select Tasks > Generate Scripts...
2. Change the relevant parameters for what you need and save the script file (make sure you select the options to script all the indexes, triggers, etc.).
3. Create a fresh staging DB and run the script there.
4. Export all the data from the out-of-sync DB to the staging DB (a per-table sketch is below).
5. Drop all the tables on the out-of-sync DB.
6. Run the script on the out-of-sync DB.
7. Import all the data into the out-of-sync DB from the staging DB.
8. Delete the staging DB.
Obviously, you'll need to verify your data at the various steps before you go dropping tables or databases.
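For steps 4 and 7, the per-table copy can follow a pattern like this sketch (StagingDB, OutOfSyncDB, dbo.SomeTable, and the column list are placeholder names; the IDENTITY_INSERT lines only apply to tables with identity columns):

-- Repeat per table; the statements can be generated from INFORMATION_SCHEMA.TABLES
SET IDENTITY_INSERT StagingDB.dbo.SomeTable ON;

INSERT INTO StagingDB.dbo.SomeTable (Id, Col1, Col2)
SELECT Id, Col1, Col2
FROM OutOfSyncDB.dbo.SomeTable;

SET IDENTITY_INSERT StagingDB.dbo.SomeTable OFF;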

Order SQL Azure Table Columns via SSMS

I know you can go into the design view of a table in SQL Server Management Studio and reorder columns as they appear in the design view; however, this isn't possible with SQL Azure, as the option is disabled. Is there a way to modify SQL Azure tables so that you can reorder their columns as they appear in the design view?
I have been running a number of database upgrades over the last few months to support new requirements, and I would like to reorder the way the columns appear in design view so they're easier to read, i.e. so they start with the primary key, followed by foreign keys, then normal columns, and end with the added-by and modified-by fields. It's purely to make the tables more readable as I manage them over time.
Just run a script against the table. It's a bit of pseudocode, but you should get the idea.

-- Create the replacement table with the columns in the desired order,
-- using the real column definitions from OriginalTable (types here are placeholders)
CREATE TABLE TableWithDesiredOrder (PK INT PRIMARY KEY, FK1 INT, FK2 INT, COL1 VARCHAR(50), COL2 VARCHAR(50) /* .... */);

INSERT INTO TableWithDesiredOrder (PK, FK1, FK2, COL1, COL2 /* .... */)
SELECT PK, FK1, FK2, COL1, COL2 /* .... */ FROM OriginalTable;

DROP TABLE OriginalTable;

-- Finally, rename the new table to the original name
EXEC sp_rename 'TableWithDesiredOrder', 'OriginalTable';
Just another option: I use SQL Delta to propagate my DB changes from the dev DB up to the Azure DB. In this case, I just change the column order locally using the SSMS GUI, and SQL Delta does the create-new > copy-to-new > drop-old dance for me, along with my other local changes. (In Project Options, I set Preserve Column Order = Yes.)
I experienced the same with Azure SQL Database: changes I made to a view with ALTER were not picked up when I did a SELECT * from the view, or the column headers were mixed up with the column values.
To fix it, I dropped the view and re-created it. That worked.

Keep table downtime to a minimum by renaming old table, then filling a new version?

I have a handful or so of permanent tables that need to be re-built on a nightly basis.
In order to keep these tables "live" for as long as possible, and also to offer the possibility of having a backup of just the previous day's data, another developer vaguely suggested taking a route similar to this when the nightly build happens:
create a permanent table (a build version; e.g., tbl_build_Client)
re-name the live table (tbl_Client gets re-named to tbl_Client_old)
rename the build version to become the live version (tbl_build_Client gets re-named to tbl_Client)
To rename the tables, sp_rename would be used.
http://msdn.microsoft.com/en-us/library/ms188351.aspx
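In other words, something like this sketch (wrapped in a transaction so readers see either the old table or the new one):

BEGIN TRANSACTION;
EXEC sp_rename 'tbl_Client', 'tbl_Client_old';
EXEC sp_rename 'tbl_build_Client', 'tbl_Client';
COMMIT TRANSACTION;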
Do you see any more efficient ways to go about this, or any serious pitfalls in the approach? Thanks in advance.
Update
Trying to flesh out gbn's answer and the recommendation to use synonyms, would this be a rational approach, or am I getting some part horribly wrong?
Three real tables for "Client":
1. dbo.build_Client
2. dbo.hold_Client
3. dbo.prev_Client
Because "Client" is how other procs reference the "Client" data, the default synonym is
CREATE SYNONYM Client
FOR dbo.hold_Client
Then take these steps to refresh the data while keeping access uninterrupted.
(1.a.) TRUNCATE dbo.prev_Client (it held yesterday's data)
(1.b.) INSERT INTO dbo.prev_Client the records from dbo.build_Client, as dbo.build_Client still holds yesterday's data
(2.a.) TRUNCATE dbo.build_Client
(2.b.) INSERT INTO dbo.build_Client the new data from the nightly build process
(2.c.) change the synonym
DROP SYNONYM Client
CREATE SYNONYM Client
FOR dbo.build_Client
(3.a.) TRUNCATE dbo.hold_Client
(3.b.) INSERT INTO dbo.hold_Client the records from dbo.build_Client
(3.c.) change the synonym
DROP SYNONYM Client
CREATE SYNONYM Client
FOR dbo.hold_Client
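Put together, the whole refresh would be something like this sketch (the SELECT feeding step 2.b stands in for the real build query):

-- (1) Preserve yesterday's data, which is still in build_Client
TRUNCATE TABLE dbo.prev_Client;
INSERT INTO dbo.prev_Client SELECT * FROM dbo.build_Client;

-- (2) Load the new build, then point readers at it
TRUNCATE TABLE dbo.build_Client;
INSERT INTO dbo.build_Client SELECT ... ; -- the new data build process
DROP SYNONYM Client;
CREATE SYNONYM Client FOR dbo.build_Client;

-- (3) Copy the new data into the hold table and point readers back at it
TRUNCATE TABLE dbo.hold_Client;
INSERT INTO dbo.hold_Client SELECT * FROM dbo.build_Client;
DROP SYNONYM Client;
CREATE SYNONYM Client FOR dbo.hold_Client;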
Use indirection to avoid manipulating tables directly:
Have 3 tables: Client1, Client2, Client3, with all indexes, constraints, triggers, etc.
Use synonyms to hide the real tables, e.g. Client, ClientOld, ClientToLoad.
To generate the new table, you truncate/write to "ClientToLoad"
Then you DROP and CREATE the synonyms in a transaction so that
Client -> what was ClientToLoad
ClientOld -> what was Client
ClientToLoad -> what was ClientOld
You can use SELECT base_object_name FROM sys.synonyms WHERE name = 'Client' to work out what the current indirection is
This works on all editions of SQL Server; the other way is "partition switching", which requires Enterprise Edition.
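For example, if Client currently points at Client1, ClientOld at Client2, and ClientToLoad at Client3 (which has just been loaded), the swap is a sketch along these lines:

BEGIN TRANSACTION;

DROP SYNONYM Client;
DROP SYNONYM ClientOld;
DROP SYNONYM ClientToLoad;

CREATE SYNONYM Client FOR dbo.Client3;       -- the freshly loaded table goes live
CREATE SYNONYM ClientOld FOR dbo.Client1;    -- what was live becomes the backup
CREATE SYNONYM ClientToLoad FOR dbo.Client2; -- the old backup becomes the next load target

COMMIT TRANSACTION;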
Some things to keep in mind:
Replication - if you use replication, I don't believe you'll be able to easily implement this strategy
Indexes - make sure that any indexes you have on the tables are carried over to your new/old tables as needed
Logging - I don't remember whether or not sp_rename is fully logged, so you may want to test that in case you need to be able to roll back, etc.
Those are the possible drawbacks I can think of off the top of my head. It otherwise seems to be an effective way to handle the situation.
Except for the missing step 0 (drop tbl_Client_old if it exists), the solution seems fine, especially if you run it in an explicit transaction. There is no backup of any earlier data, however.
The other solution, without renames and drops, and which I personally would prefer, is to:

1. Copy all rows from tbl_Client to tbl_Client_old;
2. Truncate tbl_Client;
3. (Optional) Remove obsolete records from tbl_Client_old.

It's better in that you can control how much of the old data you keep in tbl_Client_old. Which solution will be faster depends on how much data is stored in the tables and what indexes they have.
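A sketch of that alternative (tbl_Client and tbl_Client_old as in the question, assuming both tables share the same column layout; the LoadDate column and the 7-day window are invented for illustration):

BEGIN TRANSACTION;

-- Keep a copy of the current data before rebuilding
INSERT INTO tbl_Client_old
SELECT * FROM tbl_Client;

TRUNCATE TABLE tbl_Client;

-- Optional: trim history older than the retention window
DELETE FROM tbl_Client_old
WHERE LoadDate < DATEADD(DAY, -7, GETDATE());

COMMIT TRANSACTION;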
If you use SQL Server 2008, why not try horizontal partitioning? All the data is contained in one table, but the new and old data live in separate partitions.