Inserting SCOPE_IDENTITY() into a junction table - sql

Consider the following little script:
create table #test
(testId int identity
,testColumn varchar(50)
)
go
create table #testJunction
(testId int
,otherId int
)
insert into #test
select 'test data'
insert into #testJunction(testId,otherId)
select SCOPE_IDENTITY(),(select top 10 OtherId from OtherTable)
--The second query here signifies some business logic to resolve a many-to-many
--fails
This, however, will work:
insert into #test
select 'test data'
insert into #testJunction(otherId,testId)
select top 10 OtherId ,(select SCOPE_IDENTITY())
from OtherTable
--insert order of columns is switched in #testJunction
--SCOPE_IDENTITY() repeated for each OtherId
The second solution works and all is well. I know it doesn't matter, but for continuity's sake I like having the insert done in the order in which the columns are present in the database table. How can I achieve that? The following attempt gives a "subquery returned more than 1 value" error:
insert into #test
select 'test data'
insert into #testJunction(otherId,testId)
values ((select SCOPE_IDENTITY()),(select top 10 drugId from Drugs))
EDIT:
On a webpage a new row is entered into a table with a structure like
QuizId,StudentId,DateTaken
(QuizId is an identity column)
I have another table with Quiz Questions like
QuestionId,Question,CorrectAnswer
Any number of quizzes can have any number of questions, so in this example testJunction
resolves that many to many. Ergo, I need the SCOPE_IDENTITY repeated for however many questions are on the quiz.

The version that fails
insert into #testJunction(testId,otherId)
select SCOPE_IDENTITY(),(select top 10 OtherId from OtherTable)
tries to insert one row with scope_identity() in the first column and a set of 10 values in the second column. A column cannot hold a set, so that one fails.
The one that works
insert into #testJunction(otherId,testId)
select top 10 OtherId ,(select SCOPE_IDENTITY())
from OtherTable
will insert 10 rows from OtherTable with OtherId in the first column and the scalar value of scope_identity() in the second column.
If you need to switch places of the columns it would look like this instead.
insert into #testJunction(testId,otherId)
select top 10 SCOPE_IDENTITY(), OtherId
from OtherTable

You need the output clause. Look it up in BOL.
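For what it's worth, here is a minimal sketch of what the OUTPUT clause approach could look like with the temp tables from the question. The table variable @newIds is a name made up for this example, and OtherTable is assumed to exist as in the question:

```sql
-- Hypothetical table variable that receives the generated identity value(s)
DECLARE @newIds TABLE (testId int);

-- OUTPUT ... INTO captures the identity of every row inserted by this statement
INSERT INTO #test (testColumn)
OUTPUT inserted.testId INTO @newIds (testId)
SELECT 'test data';

-- Repeat each captured testId for every OtherId (columns in table order)
INSERT INTO #testJunction (testId, otherId)
SELECT n.testId, o.OtherId
FROM @newIds AS n
CROSS JOIN (SELECT TOP (10) OtherId FROM OtherTable) AS o;
```

Unlike SCOPE_IDENTITY(), this would also cope with the first insert adding more than one row, since every generated testId lands in @newIds.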

Try this way:
Declare @var int
insert into #test
select 'test data'
select @var=scope_identity()
insert into #testJunction(testId,otherId)
select top 10 @var,drugId from Drugs

Related

Foreach insert statement based on where clause

I have a scenario where I have thousands of Ids (1-1000), I need to insert each of these Ids once into a table with another record.
For example, UserCars - has columns CarId and UserId
I want to INSERT each user in my Id WHERE clause against CarId 1.
INSERT INTO [dbo].[UserCars]
([CarId]
,[UserId])
VALUES
(
1,
**My list of Ids**
)
I'm just not sure of the syntax for running this kind of insert or if it is at all possible.
Since you write in the comments that your list of Ids comes from another table, you can simply use insert into with a select clause:
insert into UserCars (CarID, UserID)
select CarID, UserID
from othertable
In the select part you can use joins and whatever you need, complex queries are allowed as long as the columns in the result match the columns (CarID, UserID)
Or even this, to stay closer to your example (User is a reserved word in SQL Server, so it needs brackets):
insert into UserCars (CarID, UserID)
select 1, UserID
from dbo.[User]
If your data is in a file, you can use the BULK INSERT command, for example:
BULK INSERT UserCars
FROM '\\path\to\your\folder\users-cars.csv'
WITH (FIELDTERMINATOR = ',');
Just make sure the file and the table have the same column structure (e.g. CarId,UserId). The FIELDTERMINATOR option is needed for a comma-separated file because the default field terminator is a tab.
Otherwise, follow @GuidoG's comment to insert your data from another table:
insert into UserCars (CarID, UserID) select CarID, UserID from othertable

Loop through each column name from one table and insert that name into another table?

I have two tables. One table has a list of 500 columns. Another table references each column name like this
Select Top 1 * from MyReferenceTable
Which returns the results
(69, 'FirstName', 1, NULL)
(69, 'LastName', 2, NULL)
Where 'FirstName' is the name of the column from an actual table.
So I want to fill this reference table with the column names from the other table as so
Insert Into MyReferenceTable
FileId, ColumnName1, ColumnOrder, DefaultValue
Values(69, Select ColumnName From OtherTable? ,
Select Next Sequential Identity?, NULL)
My issue is: how can I loop through the other table to get the column name for each row, and also insert a sequential number as ColumnOrder?
Typing out that insert statement manually for over 500 columns would take many moons.
This is a terrible idea, but the answer to your question is straightforward:
INSERT INTO MyReferenceTable (FileId, ColumnName1, ColumnOrder, DefaultValue)
SELECT 69, [name], [column_id], NULL
FROM sys.columns
WHERE [object_id] = OBJECT_ID('MyOtherTable')
Basically you craft a SELECT statement that returns the values you want, and then just add the INSERT statement over it.
But again, this smells of a terrible design choice that will bite you in the end. But it's still good to know how to get this information, so I'm posting this example here.
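If ColumnOrder has to be strictly sequential, keep in mind that column_id values can have gaps (for example after a column has been dropped). A hedged variant that numbers the columns itself, assuming the same table and column names as in the question:

```sql
-- ROW_NUMBER() yields 1, 2, 3, ... in column_id order, without gaps
INSERT INTO MyReferenceTable (FileId, ColumnName1, ColumnOrder, DefaultValue)
SELECT 69,
       c.[name],
       ROW_NUMBER() OVER (ORDER BY c.column_id),
       NULL
FROM sys.columns AS c
WHERE c.[object_id] = OBJECT_ID('MyOtherTable');
```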

Using ignore_duplicate on non primary key

I've got table:
ID (identity, PK), TaskNr, OfferNr
I want to do an insert ignore statement, but sadly that doesn't work on MSSQL, so there's the IGNORE_DUP_KEY switch. But I need to check for duplicates using the TaskNr column. Is there any chance to do that?
Edit:
Sample data:
ID (identity, PK), TaskNr, OfferNr
1 BP1234 XAS
2 BD123 JFRT
3 1122AH JDA33
4 22345_a MD_3
Trying to do:
insert ignore into Sample_table (TaskNr, OfferNr) values (BP1234, DFD,)
Should ignore that row and go to next value of insert statement. ID is autoincremented but unique value should be checked using TaskNr column.
SQL Server does not support insert ignore. That is MySQL functionality.
You can do what you want as:
insert into Sample_table (TaskNr, OfferNr)
select x.TaskNr, x.OfferNr
from (select 'BP1234' as TaskNr, 'DFD' as OfferNr) x
where not exists (select 1
from Sample_Table st
where st.TaskNr = x.TaskNr -- duplicates are decided on TaskNr only
);
You can try two options:
insert into ... where not exists ()
t-sql merge statement (https://learn.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql)
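A rough sketch of the MERGE option, using the sample values from the question. It only inserts when no row with the same TaskNr exists; the aliases target and source are just names chosen for this example:

```sql
MERGE Sample_table AS target
USING (VALUES ('BP1234', 'DFD')) AS source (TaskNr, OfferNr)
    ON target.TaskNr = source.TaskNr          -- duplicate check on TaskNr only
WHEN NOT MATCHED BY TARGET THEN
    INSERT (TaskNr, OfferNr)
    VALUES (source.TaskNr, source.OfferNr);   -- MERGE must end with a semicolon
```

More rows can be added to the VALUES list, as long as the list itself does not contain the same TaskNr twice.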

Split one large, denormalized table into a normalized database

I have a large (5 million row, 300+ column) csv file I need to import into a staging table in SQL Server, then run a script to split each row up and insert data into the relevant tables in a normalized db. The format of the source table looks something like this:
(fName, lName, licenseNumber1, licenseIssuer1, licenseNumber2, licenseIssuer2..., specialtyName1, specialtyState1, specialtyName2, specialtyState2..., identifier1, identifier2...)
There are 50 licenseNumber/licenseIssuer columns, 15 specialtyName/specialtyState columns, and 15 identifier columns. There is always at least one of each of those, but the remaining 49 or 14 could be null. The first identifier is unique, but is not used as the primary key of the Person in our schema.
My database schema looks like this
People(ID int Identity(1,1))
Names(ID int, personID int, lName varchar, fName varchar)
Licenses(ID int, personID int, number varchar, issuer varchar)
Specialties(ID int, personID int, name varchar, state varchar)
Identifiers(ID int, personID int, value)
The database will already be populated with some People before adding the new ones from the csv.
What is the best way to approach this?
I have tried iterating over the staging table one row at a time with select top 1:
WHILE EXISTS (Select top 1 * from staging)
BEGIN
INSERT INTO People Default Values
SET @LastInsertedID = SCOPE_IDENTITY() -- might use the output clause to get this instead
INSERT INTO Names (personID, lName, fName)
SELECT top 1 @LastInsertedID, lName, fName from staging
INSERT INTO Licenses(personID, number, issuer)
SELECT top 1 @LastInsertedID, licenseNumber1, licenseIssuer1 from staging
IF (select top 1 licenseNumber2 from staging) is not null
BEGIN
INSERT INTO Licenses(personID, number, issuer)
SELECT top 1 @LastInsertedID, licenseNumber2, licenseIssuer2 from staging
END
-- Repeat the above 49 times, etc...
DELETE top (1) from staging
END
One problem with this approach is that it is prohibitively slow, so I refactored it to use a cursor. This works and is significantly faster, but has me declaring 300+ variables for Fetch INTO.
Is there a set-based approach that would work here? That would be preferable, as I understand that cursors are frowned upon, but I'm not sure how to get the identity from the INSERT into the People table for use as a foreign key in the others without going row-by-row from the staging table.
Also, how could I avoid copy and pasting the insert into the Licenses table? With a cursor approach I could try:
FETCH INTO ...@LicenseNumber1, @LicenseIssuer1, @LicenseNumber2, @LicenseIssuer2...
INSERT INTO #LicenseTemp (number, issuer) Values
(@LicenseNumber1, @LicenseIssuer1),
(@LicenseNumber2, @LicenseIssuer2),
... Repeat 48 more times...
INSERT INTO Licenses(personID, number, issuer)
SELECT @LastInsertedID, number, issuer
FROM #LicenseTemp
WHERE number is not null
There still seems to be some redundant copy and pasting there, though.
To summarize the questions, I'm looking for idiomatic approaches to:
Break up one large staging table into a set of normalized tables, retrieving the Primary Key/identity from one table and using it as the foreign key in the others
Insert multiple rows into the normalized tables that come from many repeated columns in the staging table with less boilerplate/copy and paste (Licenses and Specialties above)
Short of discrete answers, I'd also be very happy with pointers towards resources and references that could assist me in figuring this out.
Ok, I'm not an SQL Server expert, but here's the "strategy" I would suggest.
Calculate the personId on the staging table
As @Shnugo suggested before me, calculating the personId in the staging table will ease the next steps.
Use a sequence for the personID
From SQL Server 2012 on you can define sequences. If you use one for every person insert, you'll never risk overlapping IDs. If (as it seems) some personIds were loaded before the sequence existed, you can create the sequence with the first free personID as its starting value.
Create a numbers table
Create a utility table holding the numbers from 1 to n (you need n to be at least 50; you can look at this question for some implementations).
Use set logic to do the insert
I'd avoid cursors and row-by-row logic: you are right that it is better to limit the number of accesses to the table, but I'd say you should strive to limit it to one access per target table.
You could proceed like this:
People:
INSERT INTO People (personID)
SELECT personId from staging;
Names:
INSERT INTO Names (personID, lName, fName)
SELECT personId, lName, fName from staging;
Licenses:
Here we'll need the numbers table:
INSERT INTO Licenses (personId, number, issuer)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then licenseNumber1
when 2 then licenseNumber2
...
when 50 then licenseNumber50
end as licenseNumber,
case nbrs.n
when 1 then licenseIssuer1
when 2 then licenseIssuer2
...
when 50 then licenseIssuer50
end as licenseIssuer
from staging
cross join
(select n from numbers where n>=1 and n<=50) nbrs
) AS lic WHERE licenseNumber is not null; -- SQL Server requires an alias on a derived table
Specialties:
INSERT INTO Specialties(personId, name, state)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then specialtyName1
when 2 then specialtyName2
...
when 15 then specialtyName15
end as specialtyName,
case nbrs.n
when 1 then specialtyState1
when 2 then specialtyState2
...
when 15 then specialtyState15
end as specialtyState
from staging
cross join
(select n from numbers where n>=1 and n<=15) nbrs
) AS spec WHERE specialtyName is not null; -- SQL Server requires an alias on a derived table
Identifiers:
INSERT INTO Identifiers(personId, value)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then identifier1
when 2 then identifier2
...
when 15 then identifier15
end as value
from staging
cross join
(select n from numbers where n>=1 and n<=15) nbrs
) AS ident WHERE value is not null; -- SQL Server requires an alias on a derived table
Hope it helps.
You say: but the staging table could be modified
I would
add a PersonID INT NOT NULL column and fill it with DENSE_RANK() OVER(ORDER BY fname,lname)
add an index to this PersonID
use this ID in combination with GROUP BY to fill your People table
do the same with your names table
And then use this ID for a set-based insert into your three side tables
Do it like this
SELECT AllTogether.PersonID, AllTogether.TheValue
FROM
(
SELECT PersonID,SomeValue1 AS TheValue FROM StagingTable
UNION ALL SELECT PersonID,SomeValue2 FROM StagingTable
UNION ALL ...
) AS AllTogether
WHERE AllTogether.TheValue IS NOT NULL
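A different way to get the same unpivoting effect, which scans the staging table only once, is CROSS APPLY with a VALUES list. This is a sketch rather than a tested solution; it assumes the PersonID column has already been added to StagingTable as described above, and it shows only the first two of the 50 license column pairs:

```sql
INSERT INTO Licenses (personID, number, issuer)
SELECT s.PersonID, v.licenseNumber, v.licenseIssuer
FROM StagingTable AS s
CROSS APPLY (VALUES
    (s.licenseNumber1, s.licenseIssuer1),
    (s.licenseNumber2, s.licenseIssuer2)
    -- add one pair per remaining licenseNumberN / licenseIssuerN column
) AS v (licenseNumber, licenseIssuer)
WHERE v.licenseNumber IS NOT NULL;   -- skip the unused column pairs
```

The same pattern would apply to Specialties and Identifiers, with 15 entries each.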
UPDATE
You say: might cause a conflict with IDs that already exist in the People table
You did not tell us anything about the existing People...
Is there any sure and unique mark to identify them? Use a simple
UPDATE StagingTable SET PersonID=xyz WHERE ...
to set existing PersonIDs into your staging table and then use something like
UPDATE StagingTable
SET PersonID=DENSE_RANK() OVER(...) + MaxExistingID
WHERE PersonID IS NULL
to set new IDs for PersonIDs still being NULL.
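One caveat: SQL Server does not accept a window function directly in the SET clause of an UPDATE, so the statement above is best read as pseudocode. A common workaround is to update through a CTE. A minimal sketch, assuming People.ID is the existing key, PersonID is still nullable at this point, and the staging table has fName and lName columns; @MaxExistingID and Ranked are names invented for this example:

```sql
-- Highest ID already present in People (0 if the table is empty)
DECLARE @MaxExistingID int = (SELECT ISNULL(MAX(ID), 0) FROM People);

WITH Ranked AS (
    SELECT PersonID,
           DENSE_RANK() OVER (ORDER BY fName, lName) AS rnk
    FROM StagingTable
    WHERE PersonID IS NULL            -- only rows not matched to an existing person
)
UPDATE Ranked
SET PersonID = rnk + @MaxExistingID;  -- new IDs start right after the existing ones
```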

Add rows to a table then loop back and add more rows for a different userid

I have a table with 4 columns - ID, ClubID, FitnessTestNameID and DisplayName
I have another table called Club and it has ID and Name
I want to add two rows of data to the 1st table for each club
I can write a statement like this, but can someone tell me how to create a loop so that I can insert the two rows, set @clubid to @clubid + 1 and then loop back again?
declare @clubid int
set @clubid = 1
insert FitnessTestsByClub (ClubID,FitnessTestNameID,DisplayName)
values (@clubid,'1','Height (cm)')
insert FitnessTestsByClub (ClubID,FitnessTestNameID,DisplayName)
values (@clubid,'2','Weight (kg)')
You can probably do this with one statement only. No need for loops:
INSERT INTO FitnessTestsByClub
(ClubID, FitnessTestNameID, DisplayName)
SELECT
c.ID, v.FitnessTestNameID, v.DisplayName
FROM
Club AS c
CROSS JOIN
( VALUES
(1, 'Height (cm)'),
(2, 'Weight (kg)')
) AS v (FitnessTestNameID, DisplayName)
WHERE
NOT EXISTS -- a condition so no duplicates
( SELECT * -- are inserted
FROM FitnessTestsByClub AS f -- and the statement can be run again
WHERE f.ClubID = c.ID -- in the future, when more clubs
) -- have been added.
;
The Table Value Constructor syntax above (the (VALUES ...) construction) is valid from version 2008 and later.
There is a nice article with lots of useful examples of how to use them, by Robert Sheldon: Table Value Constructors in SQL Server 2008