Inserting multiple records in database table using PK from another table - sql

I have DB2 table "organization" which holds organizations data including the following columns
organization_id (PK), name, description
Some organizations are deleted so lot of "organization_id" (i.e. rows) doesn't exist anymore so it is not continuous like 1,2,3,4,5... but more like 1, 2, 5, 7, 11,12,21....
Then there is another table "title" with some other data, and there is organization_id from organization table in it as FK.
Now there is some data which I have to insert for all organizations, some title it is going to be shown for all of them in web app.
In total there is approximately 3000 records to be added.
If I would do it one by one it would look like this:
INSERT INTO title
(
name,
organization_id,
datetime_added,
added_by,
special_fl,
title_type_id
)
VALUES
(
'This is new title',
XXXX,
CURRENT TIMESTAMP,
1,
1,
1
);
where XXXX represent "organization_id" which I should get from table "organization" so that insert do it only for existing organization_id.
So only "organization_id" is changing matching to "organization_id" from table "organization".
What would be best way to do it?
I checked several similar qustions but none of them seems to be equal to this?
SQL Server 2008 Insert with WHILE LOOP
While loop answer interates over continuous IDs, other answer also assumes that ID is autoincremented.
Same here:
How to use a SQL for loop to insert rows into database?
Not sure about this one (as question itself is not quite clear)
Inserting a multiple records in a table with while loop
Any advice on this? How should I do it?

If you seriously want a row for every organization record in Title with the exact same data something like this should work:
INSERT INTO title
(
name,
organization_id,
datetime_added,
added_by,
special_fl,
title_type_id
)
SELECT
'This is new title' as name,
o.organization_id,
CURRENT TIMESTAMP as datetime_added,
1 as added_by,
1 as special_fl,
1 as title_type_id
FROM
organizations o
;
you shouldn't need the column aliases in the select but I am including for readability and good measure.
https://www.ibm.com/support/knowledgecenter/ssw_i5_54/sqlp/rbafymultrow.htm
and for good measure in case you process errors out or whatever... you can also do something like this to only insert a record in title if that organization_id and title does not exist.
INSERT INTO title
(
name,
organization_id,
datetime_added,
added_by,
special_fl,
title_type_id
)
SELECT
'This is new title' as name,
o.organization_id,
CURRENT TIMESTAMP as datetime_added,
1 as added_by,
1 as special_fl,
1 as title_type_id
FROM
organizations o
LEFT JOIN Title t
ON o.organization_id = t.organization_id
AND t.name = 'This is new title'
WHERE
t.organization_id IS NULL
;

Related

Best approach to populate new tables in a database

I have a problem I have been working on the past several hours. It is complex (for me) and I don't expect someone to do it for me. I just need the right direction.
Problem: We had the tables (below) added to our database and I need to update them based off of data already in our DailyCosts table. The tricky part is that I need to take DailyCosts.Notes and move it to PurchaseOrder.PoNumber. Notes is where we currenlty have the PONumbers.
I started with the Insert below, testing it out on one WellID. This is Inserting records from our DailyCosts table to the new PurchaseOrder table:
Insert Into PurchaseOrder (PoNumber,WellId,JObID,ID)
Select Distinct Cast(Notes As nvarchar(20)), WellID, JOBID,
DailyCosts.DailyCostID
From DailyCosts
Where WellID = '24A-23'
It affected 1973 rows (The Notes are in Ntext)
However, I need to update the other new tables because we need to see the actual PONumbers in the application.
This next Insert is Inserting records from our DailyCost table and new PurchaseOrder table (from above) to a new table called PurchaseOrderDailyCost
Insert Into PurchaseOrderDailyCost (WellID, JobID, ReportNo, AccountCode, PurchaseOrderID,ID,DailyCostSeqNo, DailyCostID)
Select Distinct DailyCosts.WellID,DailyCosts.JobID,DailyCosts.ReportNo,DailyCosts.AccountCode,
PurchaseOrder.ID,NEWID(),0,DailyCosts.DailyCostID
From DailyCosts join
PurchaseOrder ON DailyCosts.WellID = PurchaseOrder.WellID
Where DailyCosts.WellID = '24A-23'
Unfortunately, this produces 3,892,729 records. The Notes field contains the same list of PONumbers each day. This is by design so that the people inputting the data out in the field can easily track their PO numbers. The new PONumber column that we are moving the Notes to would store just unique POnumbers. I modified the query by replacing NEWID() with DailyCostID and the Join to ON DailyCosts.DailyCostID = PurchaseOrder.ID
This affected 1973 rows the same as the first Insert.
The next Insert looks like this:
Insert Into PurchaseOrderAccount (WellID, JobID, PurchaseOrderID, ID, AccountCode)
Select PurchaseOrder.WellID, PurchaseOrder.JobID, PurchaseOrder.ID, PurchaseOrderDailyCost.DailyCostID,PurchaseOrderDailyCost.AccountCode
From PurchaseOrder Inner Join
PurchaseOrderDailyCost ON PurchaseOrder.ID = PurchaseOrderDailyCost.DailyCostID
Where PurchaseOrder.WellID = '24A-23'
The page in the application now shows the PONumbers in the correct column. Everything looks like I want it to.
Unfortunately, it slows down the application to an unacceptable level. I need to figure out how to either modify my Insert or delete duplicate records. The problem is that there are multiple foreign key constraints. I have some more information below for reference.
This shows the application after the inserts. These are all duplicate records that I am hoping to elminate
Here is some additional information I received from the vendor about the tables:
-- add a new purchase order
INSERT INTO PurchaseOrder
(WellID, JobID, ID, PONumber, Amount, Description)
VALUES ('MyWell', 'MyJob', NEWID(), 'PO444444', 500.0, 'A new Purchase Order')
-- link a purchase order with id 'A356FBF4-A19B-4466-9E5C-20C5FD0E95C3' to a DailyCost record with SeqNo 0 and AccountCode 'MyAccount'
INSERT INTO PurchaseOrderDailyCost
(WellID, JobID, ReportNo, AccountCode, DailyCostSeqNo, PurchaseOrderID, ID)
VALUES ('MyWell', 'MyJob', 4, 'MyAccount', 0, 'A356FBF4-A19B-4466-9E5C-20C5FD0E95C3', NEWID())
-- link a purchase order with id 'A356FBF4-A19B-4466-9E5C-20C5FD0E95C3' to an account code 'MyAccount'
-- (i.e. make it choosable from the DailyCost PO-column dropdown for any DailyCost record whose account code is 'MyAccount')
INSERT INTO PurchaseOrderAccount
(WellID, JobID, PurchaseOrderID, ID, AccountCode)
VALUES ('MyWell', 'MyJob', 'A356FBF4-A19B-4466-9E5C-20C5FD0E95C3', NEWID(), 'MyAccount')
-- link a purchase order with id 'A356FBF4-A19B-4466-9E5C-20C5FD0E95C3' to an AFE No. 'MyAFENo'
-- (same behavior as with the account codes above)
INSERT INTO PurchaseOrderAFE
(WellID, JobID, PurchaseOrderID, ID, AFENo)
VALUES ('MyWell', 'MyJob', 'A356FBF4-A19B-4466-9E5C-20C5FD0E95C3', NEWID(), 'MyAFENo')
So it turns out I missed some simple joining principles. The better I get the more silly mistakes I seem to make. Basically, on my very first insert, I did not include a Group By. Adding this took my INSERT from 1973 to 93. Then on my next insert, I joined DailyCosts.Notes on PurchaseOrder.PONumber since these are the only records from DailyCosts I needed. This was previously INSERT 2 on my question. From there basically, everything came together. Two steps forward an one step back. Thanks to everyone that responded to this.

Split one large, denormalized table into a normalized database

I have a large (5 million row, 300+ column) csv file I need to import into a staging table in SQL Server, then run a script to split each row up and insert data into the relevant tables in a normalized db. The format of the source table looks something like this:
(fName, lName, licenseNumber1, licenseIssuer1, licenseNumber2, licenseIssuer2..., specialtyName1, specialtyState1, specialtyName2, specialtyState2..., identifier1, identifier2...)
There are 50 licenseNumber/licenseIssuer columns, 15 specialtyName/specialtyState columns, and 15 identifier columns. There is always at least one of each of those, but the remaining 49 or 14 could be null. The first identifier is unique, but is not used as the primary key of the Person in our schema.
My database schema looks like this
People(ID int Identity(1,1))
Names(ID int, personID int, lName varchar, fName varchar)
Licenses(ID int, personID int, number varchar, issuer varchar)
Specialties(ID int, personID int, name varchar, state varchar)
Identifiers(ID int, personID int, value)
The database will already be populated with some People before adding the new ones from the csv.
What is the best way to approach this?
I have tried iterating over the staging table one row at a time with select top 1:
WHILE EXISTS (Select top 1 * from staging)
BEGIN
INSERT INTO People Default Values
SET #LastInsertedID = SCOPE_IDENTITY() -- might use the output clause to get this instead
INSERT INTO Names (personID, lName, fName)
SELECT top 1 #LastInsertedID, lName, fName from staging
INSERT INTO Licenses(personID, number, issuer)
SELECT top 1 #LastInsertedID, licenseNumber1, licenseIssuer1 from staging
IF (select top 1 licenseNumber2 from staging) is not null
BEGIN
INSERT INTO Licenses(personID, number, issuer)
SELECT top 1 #LastInsertedID, licenseNumber2, licenseIssuer2 from staging
END
-- Repeat the above 49 times, etc...
DELETE top 1 from staging
END
One problem with this approach is that it is prohibitively slow, so I refactored it to use a cursor. This works and is significantly faster, but has me declaring 300+ variables for Fetch INTO.
Is there a set-based approach that would work here? That would be preferable, as I understand that cursors are frowned upon, but I'm not sure how to get the identity from the INSERT into the People table for use as a foreign key in the others without going row-by-row from the staging table.
Also, how could I avoid copy and pasting the insert into the Licenses table? With a cursor approach I could try:
FETCH INTO ...#LicenseNumber1, #LicenseIssuer1, #LicenseNumber2, #LicenseIssuer2...
INSERT INTO #LicenseTemp (number, issuer) Values
(#LicenseNumber1, #LicenseIssuer1),
(#LicenseNumber2, #LicenseIssuer2),
... Repeat 48 more times...
.
.
.
INSERT INTO Licenses(personID, number, issuer)
SELECT #LastInsertedID, number, issuer
FROM #LicenseTEMP
WHERE number is not null
There still seems to be some redundant copy and pasting there, though.
To summarize the questions, I'm looking for idiomatic approaches to:
Break up one large staging table into a set of normalized tables, retrieving the Primary Key/identity from one table and using it as the foreign key in the others
Insert multiple rows into the normalized tables that come from many repeated columns in the staging table with less boilerplate/copy and paste (Licenses and Specialties above)
Short of discreet answers, I'd also be very happy with pointers towards resources and references that could assist me in figuring this out.
Ok, I'm not an SQL Server expert, but here's the "strategy" I would suggest.
Calculate the personId on the staging table
As #Shnugo suggested before me, calculating the personId in the staging table will ease the next steps
Use a sequence for the personID
From SQL Server 2012 you can define sequences. If you use it for every person insert, you'll never risk an overlapping of IDs. If you have (as it seems) personId that were loaded before the sequence you can create the sequence with the first free personID as starting value
Create a numbers table
Create an utility table keeping numbers from 1 to n (you need n to be at least 50.. you can look at this question for some implementations)
Use set logic to do the insert
I'd avoid cursor and row-by-row logic: you are right that it is better to limit the number of accesses to the table, but I'd say that you should strive to limit it to one access for target table.
You could proceed like these:
People:
INSERT INTO People (personID)
SELECT personId from staging;
Names:
INSERT INTO Names (personID, lName, fName)
SELECT personId, lName, fName from staging;
Licenses:
here we'll need the Number table
INSERT INTO Licenses (personId, number, issuer)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then licenseNumber1
when 2 then licenseNumber2
...
when 50 then licenseNumber50
end as licenseNumber,
case nbrs.n
when 1 then licenseIssuer1
when 2 then licenseIssuer2
...
when 50 then licenseIssuer50
end as licenseIssuer
from staging
cross join
(select n from numbers where n>=1 and n<=50) nbrs
) WHERE licenseNumber is not null;
Specialties:
INSERT INTO Specialties(personId, name, state)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then specialtyName1
when 2 then specialtyName2
...
when 15 then specialtyName15
end as specialtyName,
case nbrs.n
when 1 then specialtyState1
when 2 then specialtyState2
...
when 15 then specialtyState15
end as specialtyState
from staging
cross join
(select n from numbers where n>=1 and n<=15) nbrs
) WHERE specialtyName is not null;
Identifiers:
INSERT INTO Identifiers(personId, value)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then identifier1
when 2 then identifier2
...
when 15 then identifier15
end as value
from staging
cross join
(select n from numbers where n>=1 and n<=15) nbrs
) WHERE value is not null;
Hope it helps.
You say: but the staging table could be modified
I would
add a PersonID INT NOT NULL column and fill it with DENSE_RANK() OVER(ORDER BY fname,lname)
add an index to this PersonID
use this ID in combination with GROUP BY to fill your People table
do the same with your names table
And then use this ID for a set-based insert into your three side tables
Do it like this
SELECT AllTogether.PersonID, AllTogether.TheValue
FROM
(
SELECT PersonID,SomeValue1 AS TheValue FROM StagingTable
UNION ALL SELECT PersonID,SomeValue2 FROM StagingTable
UNION ALL ...
) AS AllTogether
WHERE AllTogether.TheValue IS NOT NULL
UPDATE
You say: might cause a conflict with IDs that already exist in the People table
You did not tell anything about existing People...
Is there any sure and unique mark to identify them? Use a simple
UPDATE StagingTable SET PersonID=xyz WHERE ...
to set existing PersonIDs into your staging table and then use something like
UPDATE StagingTable
SET PersonID=DENSE RANK() OVER(...) + MaxExistingID
WHERE PersonID IS NULL
to set new IDs for PersonIDs still being NULL.

I want to make SQL tables that are updated daily yet retain every single day's contents for later lookup. What is the best practice for this?

Basically I'm trying to create a database schema based around multiple unrelated tables that will not need to reference each other AFAIK.
Each table will be a different "category" that will have the same columns in each table - name, date, two int values and then a small string value.
My issue is that each one will need to be "updated" daily, but I want to keep a record of the items for every single day.
What's the best way to go about doing this? Would it be to make the composite key the combination of the date and the name? Or use something called a "trigger"?
Sorry I'm somewhat new to database design, I can be more specific if I need to be.
Yes, you have to create a trigger for each category table
I'm assuming name is PK for each table? If isnt the case, you will need create a PK.
Lets say you have
table categoryA
name, date, int1, int2, string
table categoryB
name, date, int1, int2, string
You will create another table to store changes log.
table category_history
category_table, name, date, int1, int2, string, changeDate
You create two trigger, one for each category table
Where you save what table gerate the update and what time was made.
create trigger before update for categoryA
INSERT INTO category_history VALUES
('categoryA', OLD.name, OLD.date, OLD.int1, Old.int2, OLD.string, NOW());
This is pseudo code, you need write trigger using your rdbms syntaxis, and check how get system date now().
As has already been pointed out, it is poor design to have different identical tables for each category. Better would be a Categories table with one entry for each category and then a Dailies table with the daily information.
create table Categories(
ID smallint not null auto_generated,
Name varchar( 20 ) not null,
..., -- other information about each category
constraint UQ_Category_Name unique( Name ),
constraint PK_Categories( ID )
);
create table Dailies(
CatID smallint not null,
UpdDate date not null,
..., -- Daily values
constraint PK_Dailies( CatID, UpdDate ),
constraint FK_Dailies_Category foreign key( CatID )
references Categories( ID )
);
This way, adding a new category involves inserting a row into the Categories table rather than creating an entirely new table.
If the database has a Date type distinct from a DateTime -- no time data -- then fine. Otherwise, the time part must be removed such as by Oracle's trunc function. This allows only one entry for each category per day.
Retrieving all the values for all the posted dates is easy:
select C.Name as Category, d.UpdDate, d.<daily values>
from Categories C
join Dailies D
on D.CatID = C.ID;
This can be made into a view, DailyHistory. To see the complete history for Category Cat1:
select *
from DailyHistory
where Name = 'Cat1';
To see all the category information as it was updated on a specific date:
select *
from DailyHistory
where UpdDate = date '2014-05-06';
Most queries will probably be interested in the current values -- that is, the last update made (assuming some categories are not updated every day). This is a little more complicated but still very fast if you are worried about performance.
select C.Name as Category, d.UpdDate as "Date", d.<daily values>
from Categories C
join Dailies D
on D.CatID = C.ID
and D.UpdDate =(
select Max( UpdDate )
from Dailies
where CatID = D.CatID );
Of course, if every category is updated every day, the query is simplified:
select C.Name as Category, d.UpdDate as "Date", d.<daily values>
from Categories C
join Dailies D
on D.CatID = C.ID
and D.UpdDate = <today's date>;
This can also be made into a view. To see today's (or the latest) updates for Category Cat1:
select *
from DailyCurrent
where Name = 'Cat1';
Suppose now that updates are not necessarily made every day. The history view would show all the updates that were actually made. So the query shown for all categories as they were on a particular day would actually show only those categories that were actually updated on that day. What if you wanted to show the data that was "current" as of a particular date, even if the actual update was several days before?
That can be provided with a small change to the "current" query (just the last line added):
select C.Name as Category, d.UpdDate as "Date", d.<daily values>
from Categories C
join Dailies D
on D.CatID = C.ID
and D.UpdDate =(
select Max( UpdDate )
from Dailies
where CatID = D.CatID
and UpdDate <= date '2014-05-06' );
Now this shows all categories with the data updated on that date if it exists otherwise the latest update made previous to that date.
As you can see, this is a very flexible design which allows access the data just about any way desired.

Insert into table some values which are selected from other table

I have my database structure like this ::
Database structure ::
ATT_table- ActID(PK), assignedtoID(FK), assignedbyID(FK), Env_ID(FK), Product_ID(FK), project_ID(FK), Status
Product_table - Product_ID(PK), Product_name
Project_Table- Project_ID(PK), Project_Name
Environment_Table- Env_ID(PK), Env_Name
Employee_Table- Employee_ID(PK), Name
Employee_Product_projectMapping_Table -Emp_ID(FK), Project_ID(FK), Product_ID(FK)
Product_EnvMapping_Table - Product_ID(FK), Env_ID(FK)
I want to insert values in ATT_Table. Now in that table I have some columns like assignedtoID, assignedbyID, envID, ProductID, project_ID which are FK in this table but primary key in other tables they are simply numbers).
Now when I am inputting data from the user I am taking that in form of string like a user enters Name (Employee_Table), product_Name (Product_table) and not ID directly. So I want to first let the user enter the name (of Employee or product or Project or Env) and then value of its primary key (Emp_ID, product_ID, project_ID, Env_ID) are picked up and then they are inserted into ATT_table in place of assignedtoID, assignedbyID, envID, ProductID, project_ID.
Please note that assignedtoID, assignedbyID are referenced from Emp_ID in Employee_Table.
How to do this ? I have got something like this but its not working ::
INSERT INTO ATT_TABLE(Assigned_To_ID,Assigned_By_ID,Env_ID,Product_ID,Project_ID)
VALUES (A, B, Env_Table.Env_ID, Product_Table.Product_ID, Project_Table.Project_ID)
SELECT Employee_Table.Emp_ID AS A,Employee_Table.Emp_ID AS B, Env_Table.Env_ID, Project_Table.Project_ID, Product_Table.Product_ID
FROM Employee_Table, Env_Table, Product_Table, Project_Table
WHERE Employee_Table.F_Name= "Shantanu" or Employee_Table.F_Name= "Kapil" or Env_Table.Env_Name= "SAT11A" or Product_Table.Product_Name = "ABC" or Project_Table.Project_Name = "Project1";
The way this is handled is by using drop down select lists. The list consists of (at least) two columns: one holds the Id's teh database works with, the other(s) store the strings the user sees. Like
1, "CA", "Canada"
2, "USA", 'United States"
...
The user sees
CA | Canada
USA| United States
...
The value that gets stored in the database is 1, 2, ... whatever row the user selected.
You can never rely on the exact, correct input of users. Sooner or later they will make typo's.
I extend my answer, based on your remark.
The problem with the given solution (get the Id's from the parent tables by JOINing all those parent tables together by the entered text and combining those with a number of AND's) is that as soon as one given parameter has a typo, you will get not a single record back. Imagine the consequences when the real F_name of the employee is "Shant*anu*" and the user entered "Shant*aun*".
The best way to cope with this is to get those Id's one by one from the parent tables. Suppose some FK's have a NOT NULL constraint. You can check if the F_name is filled in and inform the user when he didn't fill that field. Suppose the user eneterd "Shant*aun*" as name, the program will not warn the user, as something is filled in. But that is not the check the database will do, because the NOT NULL constraints are defined on the Id's (FK). When you get the Id's one by one from the parent tables. You can verify if they are NOT NULL or not. When the text is filled in, like "Shant*aun*", but the returned Id is NULL, you can inform the user of a problem and let him correct his input: "No employee by the name 'Shantaun' could be found."
SELECT $Emp_ID_A = Emp_ID
FROM Employee_Table
WHERE F_Name= "Shantanu"
SELECT $Emp_ID_B = Emp_ID
FROM Employee_Table
WHERE B.F_Name= "Kapil"
SELECT $Env_ID = Env_ID
FROM Env_Table
WHERE Env_Table.Env_Name= "SAT11A"
SELECT $Product_ID = Product_ID
FROM Product_Table
WHERE Product_Table.Product_Name = "ABC"
SELECT $Project_ID = Project_ID
FROM Project_Table
WHERE Project_Name = "Project1"
Please use AND instead of OR.
INSERT INTO ATT_TABLE(Assigned_To_ID,Assigned_By_ID,Env_ID,Product_ID,Project_ID)
SELECT A.Emp_ID, B.Emp_ID, Env_Table.Env_ID, Project_Table.Project_ID, Product_Table.Product_ID
FROM Employee_Table A, Employee_Table B, Env_Table, Product_Table, Project_Table
WHERE A.F_Name= "Shantanu"
AND B.F_Name= "Kapil"
AND Env_Table.Env_Name= "SAT11A"
AND Product_Table.Product_Name = "ABC"
AND Project_Table.Project_Name = "Project1";
But it is best practice to use drop down list in your scenario, i guess.

Normalizing a table, from one to the other

I'm trying to normalize a mysql database....
I currently have a table that contains 11 columns for "categories". The first column is a user_id and the other 10 are category_id_1 - category_id_10. Some rows may only contain a category_id up to category_id_1 and the rest might be NULL.
I then have a table that has 2 columns, user_id and category_id...
What is the best way to transfer all of the data into separate rows in table 2 without adding a row for columns that are NULL in table 1?
thanks!
You can create a single query to do all the work, it just takes a bit of copy and pasting, and adjusting the column name:
INSERT INTO table2
SELECT * FROM (
SELECT user_id, category_id_1 AS category_id FROM table1
UNION ALL
SELECT user_id, category_id_2 FROM table1
UNION ALL
SELECT user_id, category_id_3 FROM table1
) AS T
WHERE category_id IS NOT NULL;
Since you only have to do this 10 times, and you can throw the code away when you are finished, I would think that this is the easiest way.
One table for users:
users(id, name, username, etc)
One for categories:
categories(id, category_name)
One to link the two, including any extra information you might want on that join.
categories_users(user_id, category_id)
-- or with extra information --
categories_users(user_id, category_id, date_created, notes)
To transfer the data across to the link table would be a case of writing a series of SQL INSERT statements. There's probably some awesome way to do it in one go, but since there's only 11 categories, just copy-and-paste IMO:
INSERT INTO categories_users
SELECT user_id, 1
FROM old_categories
WHERE category_1 IS NOT NULL