Database upgrade issues with SSDT

I've been using SSDT for my database design for the past 9 months on a project and I'm about to abandon it for DbUp. Hopefully, there's a simpler solution...
Here's the problem. I have a database with the following table:
Persons
-----------
Id (PK)
Name
Email
I would like to upgrade my database to allow a person to have multiple email addresses:
Persons
-----------
ID (PK)
Name

EmailAddresses
----------------
ID (PK)
PersonId (FK)
Email
To do this all within SSDT without data loss, I would need to do some fancy pre- and post-deployment scripting:
-- PreDeployment.sql
IF EXISTS (SELECT 1 FROM sys.tables WHERE [name] = 'Persons')
BEGIN
    -- SELECT ... INTO creates the TMP tables and copies the data in one step
    SELECT Id, Name INTO TMP_Persons FROM Persons;
    SELECT Id AS PersonId, Email INTO TMP_EmailAddresses FROM Persons;
    DELETE FROM Persons;
END
-- PostDeployment.sql
IF EXISTS (SELECT 1 FROM sys.tables WHERE [name] = 'TMP_Persons')
BEGIN
    -- If Id is an IDENTITY column, wrap these in SET IDENTITY_INSERT Persons ON/OFF
    INSERT INTO Persons (Id, Name)
    SELECT Id, Name FROM TMP_Persons;
    INSERT INTO EmailAddresses (PersonId, Email)
    SELECT PersonId, Email FROM TMP_EmailAddresses;
    DROP TABLE TMP_Persons;
    DROP TABLE TMP_EmailAddresses;
END
This (although tricky) is doable, and I've been doing it for the majority of my changes. However, the problem comes where you have multiple versions of your database. For example, I have the following scenarios:
New Developers - No prior database
Dev Machine - Database is very current
Production - Database is a week or more old
In the event that Production is more out-of-date than the dev machine (possibly from not deploying for a while or from needing to rollback) the above script may fail. This means that the Dev would need to know and take into account prior versions of the database.
For example, say that the Persons table was previously named Users. I would have to account for this possibility in my Pre-Deployment script.
-- PreDeployment.sql
IF EXISTS (SELECT 1 FROM sys.tables WHERE [name] = 'Users')
BEGIN
    SELECT Id, Name INTO TMP_Persons FROM Users;
    SELECT Id AS PersonId, Email INTO TMP_EmailAddresses FROM Users;
    DELETE FROM Users; -- the old table is Users here, not Persons
END
IF EXISTS (SELECT 1 FROM sys.tables WHERE [name] = 'Persons')
BEGIN
    SELECT Id, Name INTO TMP_Persons FROM Persons;
    SELECT Id AS PersonId, Email INTO TMP_EmailAddresses FROM Persons;
    DELETE FROM Persons;
END
As time goes on and more variations occur the PreDeployment script is going to get very chaotic and error-prone. This just seems unmanageable to me. Aside from switching to DbUp or something else, is there a better way to do this within SSDT?

How to add a row and timestamp one SQL Server table based on a change in a single column of another SQL Server table

[UPDATE: 2/20/19]
I figured out a pretty trivial solution to solve this problem.
CREATE TRIGGER TriggerClaims_History ON Claims
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON
    -- Append only the rows not already present in the history table
    INSERT INTO Claims_History
    SELECT name, status, claim_date
    FROM Claims
    EXCEPT
    SELECT name, status, claim_date FROM Claims_History
END
GO
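The INSERT ... EXCEPT trick the trigger relies on can be exercised end-to-end in a quick sketch (Python with sqlite3 standing in for SQL Server; names follow the question, data is made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Claims (name TEXT, status TEXT, claim_date TEXT)")
cur.execute("CREATE TABLE Claims_History (name TEXT, status TEXT, claim_date TEXT)")

def reload_claims(rows):
    # Simulate the nightly drop-and-reload, then append only unseen rows,
    # exactly as the trigger's INSERT ... EXCEPT does.
    cur.execute("DELETE FROM Claims")
    cur.executemany("INSERT INTO Claims VALUES (?, ?, ?)", rows)
    cur.execute("""INSERT INTO Claims_History
                   SELECT name, status, claim_date FROM Claims
                   EXCEPT SELECT name, status, claim_date FROM Claims_History""")

reload_claims([("jane doe", "received", "2019-02-19")])
reload_claims([("jane doe", "processed", "2019-02-20")])

# Both statuses survive in history even though Claims only holds the latest load.
print(cur.execute("SELECT status FROM Claims_History ORDER BY claim_date").fetchall())
```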
I am standing up a SQL Server database for a project I am working on. Important info: I have 3 tables - enrollment, cancel, and claims. There are files located on a server that populate these tables every day. These files are NOT deltas (i.e. each new file placed on server every day contains data from all previous files) and because of this, I am able to simply drop all tables, create tables, and then populate tables from files each day. My question is regarding my claims table - since tables will be dropped and created each night, I need a way to keep track of all the different status changes.
I'm struggling to figure out the best way to go about this.
I was thinking of creating a claims_history table that is NOT dropped each night. Essentially I'd want my claims_history table to be populated each time an initial new record is added to the claims table. Then I'd want to scan the claims table and add a row to the claims_history table if and only if there was a change in the status column (i.e. claims.status != claims_history.status).
Day 1:
select * from claims
id | name | status
1 | jane doe | received
select * from claims_history
id | name | status | timestamp
1 | jane doe | received | datetime
Day 2:
select * from claims
id | name | status
1 | jane doe | processed
select * from claims_history
id | name | status | timestamp
1 | jane doe | received | datetime
1 | jane doe | processed | datetime
Is there a SQL script that can do this? I'd also like the timestamp field in the claims_history table to populate automatically each time a new row is added (status change). I know I could write a Python script to handle something like this, but I'd like to keep it in SQL if at all possible. Thank you.
According to your question, you need to create a trigger that fires after an update of the claims.status column; this is straightforward to set up (see the SQL Server documentation on DML triggers for a simple example).
Also, since manipulating datetime values in queries can be fiddly, I would suggest storing Unix time as a BIGINT instead of a DATETIME. Note that SELECT UNIX_TIMESTAMP() is MySQL syntax; in SQL Server the current Unix time can be computed with DATEDIFF(SECOND, '1970-01-01', GETUTCDATE()).
A very common approach is to use a staging table and a production (or final) table. All your ETLs truncate and load the staging table (volatile), and then you execute a stored procedure that adds only the new records to your final table. This requires that all the data you handle this way has some form of key that uniquely identifies a row.
What happens if your files suddenly change format or are badly formatted? You would drop your table and be unable to load it back until you fix your ETL. This approach saves you from that, since the process fails while loading the staging table and never touches the final table. You can also keep deleted records for historical reasons instead of having them deleted.
I prefer to separate the staging tables into their proper schema, for example:
CREATE SCHEMA Staging
GO
CREATE TABLE Staging.Claims (
ID INT,
Name VARCHAR(100),
Status VARCHAR(100))
Now you do all your loads from your files into these staging tables, truncating them first:
TRUNCATE TABLE Staging.Claims
BULK INSERT Staging.Claims
FROM '\\SomeFile.csv'
WITH
--...
Once this table is loaded you execute a specific SP that adds your delta between the staging content and your final table. You can add whichever logic you want here, like doing only inserts for new records, or inserting already existing values that were updated on another table. For example:
CREATE TABLE dbo.Claims (
ClaimAutoID INT IDENTITY PRIMARY KEY,
ClaimID INT,
Name VARCHAR(100),
Status VARCHAR(100),
WasDeleted BIT DEFAULT 0,
ModifiedDate DATETIME,
CreatedDate DATETIME DEFAULT GETDATE())
GO
CREATE PROCEDURE Staging.UpdateClaims
AS
BEGIN
BEGIN TRY
BEGIN TRANSACTION
-- Update changed values
UPDATE C SET
Name = S.Name,
Status = S.Status,
ModifiedDate = GETDATE()
FROM
Staging.Claims AS S
INNER JOIN dbo.Claims AS C ON S.ID = C.ClaimID -- This has to be by the key columns
WHERE
ISNULL(C.Name, '') <> ISNULL(S.Name, '') OR -- OR, so a change in either column triggers the update
ISNULL(C.Status, '') <> ISNULL(S.Status, '')
-- Insert new records
INSERT INTO dbo.Claims (
ClaimID,
Name,
Status)
SELECT
ClaimID = S.ID,
Name = S.Name,
Status = S.Status
FROM
Staging.Claims AS S
WHERE
NOT EXISTS (SELECT 'not yet loaded' FROM dbo.Claims AS C WHERE S.ID = C.ClaimID) -- This has to be by the key columns
-- Mark deleted records as deleted
UPDATE C SET
WasDeleted = 1,
ModifiedDate = GETDATE()
FROM
dbo.Claims AS C
WHERE
NOT EXISTS (SELECT 'not anymore on files' FROM Staging.Claims AS S WHERE S.ID = C.ClaimID) -- This has to be by the key columns
COMMIT
END TRY
BEGIN CATCH
DECLARE @v_ErrorMessage VARCHAR(MAX) = ERROR_MESSAGE()
IF @@TRANCOUNT > 0
ROLLBACK
RAISERROR (@v_ErrorMessage, 16, 1)
END CATCH
END
This way you always work with dbo.Claims and the records are never lost (just updated or inserted).
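A minimal sketch of the staging-then-merge flow, in Python with sqlite3 rather than T-SQL (the table and column names are simplified stand-ins for Staging.Claims and dbo.Claims):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE staging_claims (id INTEGER, name TEXT, status TEXT)")
cur.execute("""CREATE TABLE claims (claim_auto_id INTEGER PRIMARY KEY,
               claim_id INTEGER, name TEXT, status TEXT, was_deleted INTEGER DEFAULT 0)""")

def merge_staging():
    # Update rows whose name or status changed.
    cur.execute("""UPDATE claims SET
                     name   = (SELECT s.name   FROM staging_claims s WHERE s.id = claims.claim_id),
                     status = (SELECT s.status FROM staging_claims s WHERE s.id = claims.claim_id)
                   WHERE EXISTS (SELECT 1 FROM staging_claims s WHERE s.id = claims.claim_id
                                 AND (s.name <> claims.name OR s.status <> claims.status))""")
    # Insert records not yet loaded.
    cur.execute("""INSERT INTO claims (claim_id, name, status)
                   SELECT s.id, s.name, s.status FROM staging_claims s
                   WHERE NOT EXISTS (SELECT 1 FROM claims c WHERE c.claim_id = s.id)""")
    # Flag records that disappeared from the files instead of deleting them.
    cur.execute("""UPDATE claims SET was_deleted = 1
                   WHERE NOT EXISTS (SELECT 1 FROM staging_claims s
                                     WHERE s.id = claims.claim_id)""")

# Day 1 load, then a day 2 load where the status changed.
cur.execute("INSERT INTO staging_claims VALUES (1, 'jane doe', 'received')")
merge_staging()
cur.execute("DELETE FROM staging_claims")
cur.execute("INSERT INTO staging_claims VALUES (1, 'jane doe', 'processed')")
merge_staging()
print(cur.execute("SELECT claim_id, status, was_deleted FROM claims").fetchall())
```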
If you need to check the last status of a particular claim you can create a view:
CREATE VIEW dbo.vClaimLastStatus
AS
WITH ClaimsOrdered AS
(
SELECT
C.ClaimAutoID,
C.ClaimID,
C.Name,
C.Status,
C.ModifiedDate,
C.CreatedDate,
DateRanking = ROW_NUMBER() OVER (PARTITION BY C.ClaimID ORDER BY C.CreatedDate DESC)
FROM
dbo.Claims AS C
)
SELECT
C.ClaimAutoID,
C.ClaimID,
C.Name,
C.Status,
C.ModifiedDate,
C.CreatedDate
FROM
ClaimsOrdered AS C
WHERE
DateRanking = 1
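The latest-status-per-claim view can be verified with a small sketch (Python with sqlite3, which also supports ROW_NUMBER() from version 3.25; simplified column names and made-up data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("""CREATE TABLE claims (claim_auto_id INTEGER PRIMARY KEY,
               claim_id INTEGER, status TEXT, created TEXT)""")
cur.executemany("INSERT INTO claims (claim_id, status, created) VALUES (?, ?, ?)",
                [(1, "received", "2019-02-19"),
                 (1, "processed", "2019-02-20"),
                 (2, "received", "2019-02-20")])

# Rank each claim's rows newest-first and keep only rank 1.
cur.execute("""CREATE VIEW v_claim_last_status AS
               SELECT claim_id, status FROM (
                 SELECT claim_id, status,
                        ROW_NUMBER() OVER (PARTITION BY claim_id
                                           ORDER BY created DESC) AS rn
                 FROM claims)
               WHERE rn = 1""")
print(cur.execute("SELECT claim_id, status FROM v_claim_last_status ORDER BY claim_id").fetchall())
```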

Split one large, denormalized table into a normalized database

I have a large (5 million row, 300+ column) csv file I need to import into a staging table in SQL Server, then run a script to split each row up and insert data into the relevant tables in a normalized db. The format of the source table looks something like this:
(fName, lName, licenseNumber1, licenseIssuer1, licenseNumber2, licenseIssuer2..., specialtyName1, specialtyState1, specialtyName2, specialtyState2..., identifier1, identifier2...)
There are 50 licenseNumber/licenseIssuer columns, 15 specialtyName/specialtyState columns, and 15 identifier columns. There is always at least one of each of those, but the remaining 49 or 14 could be null. The first identifier is unique, but is not used as the primary key of the Person in our schema.
My database schema looks like this
People(ID int Identity(1,1))
Names(ID int, personID int, lName varchar, fName varchar)
Licenses(ID int, personID int, number varchar, issuer varchar)
Specialties(ID int, personID int, name varchar, state varchar)
Identifiers(ID int, personID int, value)
The database will already be populated with some People before adding the new ones from the csv.
What is the best way to approach this?
I have tried iterating over the staging table one row at a time with select top 1:
DECLARE @LastInsertedID INT
WHILE EXISTS (SELECT TOP 1 * FROM staging)
BEGIN
INSERT INTO People DEFAULT VALUES
SET @LastInsertedID = SCOPE_IDENTITY() -- might use the OUTPUT clause to get this instead
INSERT INTO Names (personID, lName, fName)
SELECT TOP 1 @LastInsertedID, lName, fName FROM staging
INSERT INTO Licenses (personID, number, issuer)
SELECT TOP 1 @LastInsertedID, licenseNumber1, licenseIssuer1 FROM staging
IF (SELECT TOP 1 licenseNumber2 FROM staging) IS NOT NULL
BEGIN
INSERT INTO Licenses (personID, number, issuer)
SELECT TOP 1 @LastInsertedID, licenseNumber2, licenseIssuer2 FROM staging
END
-- Repeat the above 49 times, etc...
DELETE TOP (1) FROM staging
END
One problem with this approach is that it is prohibitively slow, so I refactored it to use a cursor. This works and is significantly faster, but has me declaring 300+ variables for Fetch INTO.
Is there a set-based approach that would work here? That would be preferable, as I understand that cursors are frowned upon, but I'm not sure how to get the identity from the INSERT into the People table for use as a foreign key in the others without going row-by-row from the staging table.
Also, how could I avoid copy and pasting the insert into the Licenses table? With a cursor approach I could try:
FETCH INTO ...@LicenseNumber1, @LicenseIssuer1, @LicenseNumber2, @LicenseIssuer2...
INSERT INTO #LicenseTemp (number, issuer) VALUES
(@LicenseNumber1, @LicenseIssuer1),
(@LicenseNumber2, @LicenseIssuer2),
... Repeat 48 more times...
.
.
.
INSERT INTO Licenses (personID, number, issuer)
SELECT @LastInsertedID, number, issuer
FROM #LicenseTemp
WHERE number IS NOT NULL
There still seems to be some redundant copy and pasting there, though.
To summarize the questions, I'm looking for idiomatic approaches to:
Break up one large staging table into a set of normalized tables, retrieving the Primary Key/identity from one table and using it as the foreign key in the others
Insert multiple rows into the normalized tables that come from many repeated columns in the staging table with less boilerplate/copy and paste (Licenses and Specialties above)
Short of discrete answers, I'd also be very happy with pointers towards resources and references that could assist me in figuring this out.
Ok, I'm not an SQL Server expert, but here's the "strategy" I would suggest.
Calculate the personId on the staging table
As @Shnugo suggested before me, calculating the personId in the staging table will ease the next steps
Use a sequence for the personID
From SQL Server 2012 you can define sequences. If you use one for every person insert, you'll never risk overlapping IDs. If you have (as it seems) personIds that were loaded before the sequence existed, you can create the sequence with the first free personID as its starting value
Create a numbers table
Create a utility table holding numbers from 1 to n (you need n to be at least 50; you can look at this question for some implementations)
Use set logic to do the insert
I'd avoid cursors and row-by-row logic: you are right that it is better to limit the number of accesses to the table, but I'd say you should strive to limit it to one access per target table.
You could proceed like these:
People:
INSERT INTO People (personID)
SELECT personId from staging;
Names:
INSERT INTO Names (personID, lName, fName)
SELECT personId, lName, fName from staging;
Licenses:
here we'll need the Number table
INSERT INTO Licenses (personId, number, issuer)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then licenseNumber1
when 2 then licenseNumber2
...
when 50 then licenseNumber50
end as licenseNumber,
case nbrs.n
when 1 then licenseIssuer1
when 2 then licenseIssuer2
...
when 50 then licenseIssuer50
end as licenseIssuer
from staging
cross join
(select n from numbers where n>=1 and n<=50) nbrs
) AS unpvt WHERE licenseNumber is not null; -- SQL Server requires an alias on the derived table
Specialties:
INSERT INTO Specialties(personId, name, state)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then specialtyName1
when 2 then specialtyName2
...
when 15 then specialtyName15
end as specialtyName,
case nbrs.n
when 1 then specialtyState1
when 2 then specialtyState2
...
when 15 then specialtyState15
end as specialtyState
from staging
cross join
(select n from numbers where n>=1 and n<=15) nbrs
) AS unpvt WHERE specialtyName is not null;
Identifiers:
INSERT INTO Identifiers(personId, value)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then identifier1
when 2 then identifier2
...
when 15 then identifier15
end as value
from staging
cross join
(select n from numbers where n>=1 and n<=15) nbrs
) AS unpvt WHERE value is not null;
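Here is the numbers-table unpivot reduced to two license slots so it can be run end-to-end (a Python/sqlite3 sketch with made-up sample data; the CASE/CROSS JOIN shape is the same as above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("""CREATE TABLE staging (personId INTEGER,
               licenseNumber1 TEXT, licenseIssuer1 TEXT,
               licenseNumber2 TEXT, licenseIssuer2 TEXT)""")
cur.execute("INSERT INTO staging VALUES (1, 'L-100', 'NY', 'L-200', 'CA')")
cur.execute("INSERT INTO staging VALUES (2, 'L-300', 'TX', NULL, NULL)")

# The numbers table: one row per repeated-column slot.
cur.execute("CREATE TABLE numbers (n INTEGER)")
cur.executemany("INSERT INTO numbers VALUES (?)", [(1,), (2,)])

cur.execute("CREATE TABLE licenses (personId INTEGER, number TEXT, issuer TEXT)")
cur.execute("""INSERT INTO licenses (personId, number, issuer)
               SELECT personId, licenseNumber, licenseIssuer FROM (
                 SELECT s.personId,
                        CASE nbrs.n WHEN 1 THEN s.licenseNumber1
                                    WHEN 2 THEN s.licenseNumber2 END AS licenseNumber,
                        CASE nbrs.n WHEN 1 THEN s.licenseIssuer1
                                    WHEN 2 THEN s.licenseIssuer2 END AS licenseIssuer
                 FROM staging s CROSS JOIN numbers nbrs)
               WHERE licenseNumber IS NOT NULL""")
print(cur.execute("SELECT personId, number FROM licenses ORDER BY personId, number").fetchall())
```

Person 1's two license slots become two rows; person 2's empty second slot is filtered out by the IS NOT NULL check.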
Hope it helps.
You say: but the staging table could be modified
I would
add a PersonID INT NOT NULL column and fill it with DENSE_RANK() OVER(ORDER BY fname,lname)
add an index to this PersonID
use this ID in combination with GROUP BY to fill your People table
do the same with your names table
And then use this ID for a set-based insert into your three side tables
Do it like this
SELECT AllTogether.PersonID, AllTogether.TheValue
FROM
(
SELECT PersonID,SomeValue1 AS TheValue FROM StagingTable
UNION ALL SELECT PersonID,SomeValue2 FROM StagingTable
UNION ALL ...
) AS AllTogether
WHERE AllTogether.TheValue IS NOT NULL
UPDATE
You say: might cause a conflict with IDs that already exist in the People table
You did not tell anything about existing People...
Is there any sure and unique mark to identify them? Use a simple
UPDATE StagingTable SET PersonID=xyz WHERE ...
to set existing PersonIDs into your staging table and then use something like
UPDATE StagingTable
SET PersonID=DENSE_RANK() OVER(...) + MaxExistingID
WHERE PersonID IS NULL
to set new IDs for PersonIDs still being NULL.
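A small runnable sketch of the DENSE_RANK() ID assignment (Python with sqlite3; the MaxExistingID offset is simulated with a constant, and all data is illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE staging (fname TEXT, lname TEXT, personID INTEGER)")
cur.executemany("INSERT INTO staging (fname, lname) VALUES (?, ?)",
                [("jane", "doe"), ("jane", "doe"), ("john", "roe")])

max_existing = 100  # highest PersonID already present in the People table

# DENSE_RANK over (fname, lname) gives every distinct person the same rank,
# so duplicate staging rows share one PersonID; the offset avoids collisions.
cur.execute("""WITH ranked AS (
                 SELECT rowid AS rid,
                        DENSE_RANK() OVER (ORDER BY fname, lname) AS rk
                 FROM staging WHERE personID IS NULL)
               UPDATE staging
               SET personID = (SELECT rk FROM ranked WHERE rid = staging.rowid) + ?
               WHERE personID IS NULL""", (max_existing,))
print(cur.execute("SELECT DISTINCT fname, personID FROM staging ORDER BY personID").fetchall())
```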

How to insert multiple instances based on an array

We have two roles: Admin and Customer. There are a number of default users with email addresses following the pattern:
An Admin - admin1#.com, admin2#.com etc.
A Customer - user1#.com, user2#.com etc.
Then, we run the script for each combination (and in case with admins, it's done twice, because they're customers too).
insert into AspNetUserRoles values(
(select Id from AspNetUsers where Email = 'AAA'),
(select Id from AspNetRoles where Name = 'BBB'))
Now, based on my question, you can take a guess how it's resolved right now. For each new email, we add a statement or two. If we'd add a new role, we'd have to add a number of statement, possibly as many as the number of registered emails.
I sense there's a way to declare a matrix of the form:
a#.com, role1, role2
b#.com, role1,
c#.com
d#.com, role1, role3, role4
I've tried for a while but couldn't figure out the syntax, though. The actual DBA says it's not (easily) doable and that the script we have right now is as it's supposed to be done.
I suspect he's full of Christmas candy having been processed but, not being a DBA myself, I can't really argue, unless I have something that works. I also suspect that I didn't google the right way (i.e. I used wrong terms to describe what I want, due to my ignorance).
Edit
Realizing that the question might be misleading, I'll give an example in pseudo-code to illustrate my intention.
List<Link> links = new List<Link> {
new {a1,b1}, new {a1,b2},
new {a2,b2},
new {a3,b1}, new {a3,b3}, new {a3,b4} }
foreach(Link in links)
ExecuteSql(
"insert into Links values(
(select Id from FirstTable where Name = link.A),
(select Id from SecondTable where Name = link.B))"
);
The part I can't figure out is how to declare such a list and how to loop through it.
1) Say we start by creating a temp table.
-- Create temp table for user and roles
CREATE TABLE #temp(
AspNetUser varchar(1000) ,
AspNetRoles varchar(1000));
2a) populate it from a File (eg userroles.csv)
a#.com,role1|b#.com,role1|c#.com,|d#.com,role1 role3 role4
Like this
-- Read from csv
BULK INSERT #temp FROM 'D:\userroles.csv'
WITH (
FIELDTERMINATOR =','
,ROWTERMINATOR ='|');
2b) OR do your own inserts in the script
INSERT INTO #temp
(AspNetUser, AspNetRoles)
VALUES
('a#.com','role1'),
('b#.com','role1'),
('c#.com',null),
('d#.com','role1 role3 role4')
3) Insert all combinations into the table by looking up the id's
-- Insert all found combinations
INSERT INTO AspNetUserRoles
SELECT users.Id, roles.Id
FROM
(
SELECT AspNetUser,
CAST ('<Role>' + REPLACE(AspNetRoles, ' ', '</Role><Role>') + '</Role>' AS XML) AS Data
FROM #temp
) AS A
CROSS APPLY Data.nodes ('/Role') AS Split(a)
INNER JOIN AspNetUsers users ON users.Email = AspNetUser
INNER JOIN AspNetRoles roles ON roles.Name = Split.a.value('.', 'VARCHAR(100)')
-- Clean up
drop table #temp;
You can change the delimiters (space, comma, and |) to whatever you like.
You may want to add error checking for typos!
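SQLite has no XML shredding, so an equivalent sketch splits the role list client-side; the outcome (one AspNetUserRoles row per user/role pair) is the same. The sample emails and role names below are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE AspNetUsers (Id INTEGER PRIMARY KEY, Email TEXT)")
cur.execute("CREATE TABLE AspNetRoles (Id INTEGER PRIMARY KEY, Name TEXT)")
cur.execute("CREATE TABLE AspNetUserRoles (UserId INTEGER, RoleId INTEGER)")
cur.executemany("INSERT INTO AspNetUsers (Email) VALUES (?)",
                [("a@example.com",), ("d@example.com",)])
cur.executemany("INSERT INTO AspNetRoles (Name) VALUES (?)",
                [("role1",), ("role3",), ("role4",)])

# The "matrix": each email with its space-delimited role list, as in the CSV.
matrix = [("a@example.com", "role1"),
          ("d@example.com", "role1 role3 role4")]

for email, roles in matrix:
    for role in roles.split():
        # Look up both ids and insert the combination, like the original script.
        cur.execute("""INSERT INTO AspNetUserRoles
                       SELECT u.Id, r.Id FROM AspNetUsers u, AspNetRoles r
                       WHERE u.Email = ? AND r.Name = ?""", (email, role))
print(cur.execute("SELECT COUNT(*) FROM AspNetUserRoles").fetchone()[0])
```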

How can I insert rows in a new table from another table based on some condition in Sybase?

I need to create a stored procedure to insert data into one table from another based on some conditions. In the existing table, if two columns (primary role and secondary role) have the same value, then I want just one row in the new table with role as primary role.
In case if a row in the old table has different values in primary role and secondary role, I want two rows in the new table, one having the value of role as primary role of old table, and another as secondary.
What is the best way to achieve this?
Right now my query looks something like this
create procedure proc as
begin
insert into newTable (role)
select primary_role as role from oldTable
where primary_role = secondary_role
end
This does not handle the case where primary role is not the same as secondary role.
Sample
sample row oldTable
PrimaryRole | SecondaryRole | Name
admin | analyst | Sara
sample row newTable
Role | Name
admin | Sara
analyst | Sara
I'm not a Sybase expert, but I would do something like this (the syntax may need amending). Also, if you are after performance, this can probably be done much more cleverly.
insert into newTable (role, name)
select primary_role as role, name from oldTable
where primary_role + name not in (select distinct role + name from newTable)

insert into newTable (role, name)
select secondary_role as role, name from oldTable
where secondary_role + name not in (select distinct role + name from newTable)
This will obviously run 2 different inserts, which is why it could probably be made more performant, but it will essentially try to add all primary roles and all secondary roles, checking to see if the role already exists from the previous run. So no need to check for the case where primary = secondary.
EDIT
Alternatively, you may be able to use UNION ALL:
insert into newTable (role, name)
select distinct role, name from (
select primary_role as role, name from oldTable
union all
select secondary_role as role, name from oldTable
) as allroles
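The UNION ALL variant is easy to sanity-check (a Python/sqlite3 sketch with sample data; note how DISTINCT collapses the primary = secondary case automatically):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE oldTable (primary_role TEXT, secondary_role TEXT, name TEXT)")
cur.executemany("INSERT INTO oldTable VALUES (?, ?, ?)",
                [("admin", "analyst", "Sara"),   # different roles -> two rows
                 ("admin", "admin", "Tom")])     # same role twice -> one row

cur.execute("CREATE TABLE newTable (role TEXT, name TEXT)")
cur.execute("""INSERT INTO newTable (role, name)
               SELECT DISTINCT role, name FROM (
                 SELECT primary_role AS role, name FROM oldTable
                 UNION ALL
                 SELECT secondary_role AS role, name FROM oldTable)""")
print(cur.execute("SELECT role, name FROM newTable ORDER BY name, role").fetchall())
```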

Performing multiple inserts for a single row in a query

I have a table containing data that i need to migrate into another table with a linking table. This is a one time migration as part of an upgrade.
I have a company table that contains records relating to a company and a contact person.
I want to migrate the contact details into another table and link the new person with a linking table
Consider I have this table which is already populated
tblCompany
CompanyId
CompanyName
RegNo
ContactForename
ContactSurname
And i want to migrate the contact person data to
tblPerson
PersonID (identitycolumn)
Forename
Surname
and use the identity column resulting and insert it into the linking table
tblCompanyPerson
CompanyId
PersonId
I've tried a few different ways to approach this using cursors and output variables into a temp table but none seem right to me (or give me the solution...)
The closest I have got is to have a companyID on tblPerson, insert the companyId into it, and output the new personId and the companyId into a temp table. Then loop through the temp table to create the tblCompanyPerson rows.
example
declare @companycontact TABLE (companyId int, PersonId int)
insert into tblPerson
(Forename,
Surname,
CompanyID)
output inserted.CompanyID, INSERTED.PersonID into @companycontact
select
ContactForename,
ContactSurname,
CompanyID
from tblCompany c
insert into tblCompanyPerson
(CompanyID,
PersonID)
select c.companyId, PersonId from @companycontact c
Background
Im using MS SQL Server 2008 R2
The tblPerson is already populated with hundreds of thousands of
records.
There is a 'trick' using MERGE statement to achieve mapping between newly inserted and source values:
MERGE tblPerson trgt
USING tblCompany src ON 1=0
WHEN NOT MATCHED
THEN INSERT
(Forename, Surname)
VALUES (src.ContactForename, src.ContactSurname)
OUTPUT src.CompanyID, INSERTED.PersonID
INTO tblCompanyPerson (CompanyId, PersonID);
That 1=0 condition is to always get everything from source. You might want to replace it or even whole source with some sub-query to actually check whatever you already have same person mapped.
EDIT: Here is some reading about using MERGE and OUTPUT
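Engines without MERGE ... OUTPUT have to capture the generated keys another way; here is a Python/sqlite3 sketch of the same goal using lastrowid per insert (a row-by-row fallback, not the set-based MERGE trick itself; names mirror the question, data is made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("""CREATE TABLE tblCompany (CompanyId INTEGER PRIMARY KEY,
               ContactForename TEXT, ContactSurname TEXT)""")
cur.executemany("INSERT INTO tblCompany (ContactForename, ContactSurname) VALUES (?, ?)",
                [("Ann", "Smith"), ("Bob", "Jones")])
cur.execute("CREATE TABLE tblPerson (PersonID INTEGER PRIMARY KEY, Forename TEXT, Surname TEXT)")
cur.execute("CREATE TABLE tblCompanyPerson (CompanyId INTEGER, PersonId INTEGER)")

# Capture each generated PersonID via lastrowid and link it immediately.
for company_id, fname, sname in cur.execute(
        "SELECT CompanyId, ContactForename, ContactSurname FROM tblCompany").fetchall():
    cur.execute("INSERT INTO tblPerson (Forename, Surname) VALUES (?, ?)", (fname, sname))
    cur.execute("INSERT INTO tblCompanyPerson VALUES (?, ?)", (company_id, cur.lastrowid))
print(cur.execute("SELECT * FROM tblCompanyPerson ORDER BY CompanyId").fetchall())
```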
Because I don't know which SQL dialect you are using, it's difficult to decide if this is correct. I also don't know if you already tried this, but it's the best idea I have:
insert into tblPerson
(Forename, Surname)
select ContactForename, ContactSurname
from tblCompany

insert into tblCompanyPerson
(CompanyID, PersonID)
select CompanyId, PersonID
from tblPerson, tblCompany
where ContactForename = Forename and ContactSurname = Surname
Sarajog