SQL query to fillter and update table - sql

i have an employee database table with a column NAME
in the NAME field we have names of employees like this -> LI-MING (ALLEN)
this is there real first name and there English nick name in ()
i would like to know if i can swap this around in an SQL UPDATE query
FROM: LI-MING (ALLEN) TO: ALLEN (LI-MING)
the reason why i would like this is Users want to have it sort this column by nick name

Try this
UPDATE Employee
SET NAME =
SUBSTRING(name,CHARINDEX('(',name)+1,(CHARINDEX(')',name)-CHARINDEX('(',name)-1))+
' ('+SUBSTRING(name,1,CHARINDEX('(',name)-1)+')'
FROM Employee

If I were you I would create seperate colums both for name and nick name. Trying to get a string portion on the fly prevent sql server from using indexes which might be really importand from performance perspective.
So there are basicly two options:
Parse values for seperate columns every time you update or insert a new employee (via TRIGGER, application code, etc).
Or just create two calculated columns but make sure they are marked as PERSISTED.
Hope it helps!

I had worked on several project and I have done it my way to update same issue that you been through in 3 steps:
1) Create table with ID or Name field and Insert the values to the table
2) Trim the values with different functions and insert the final value to different table.
3) Update the old table with the new value
I don't say this is the only way to do thing but there might be other ways as well.
Create table #Employee(
EmployeeName varchar(200)
)
Insert into #Employee
Select 'LI-MING (ALLEN)' union all
Select 'Jio-Kio (Smith)'
Select
substring(employeename,1,patindex('%(%',employeename)-1),
--Len(substring(employeename,1,patindex('%(%',employeename)-1)),
Right(employeename,len(employeename)-(len(substring(employeename,1,patindex('%(%',employeename)))))
from #Employee
Create table #EmployeeNew(
Employeename1 varchar(200),
Employeename2 varchar(200)
)
Insert into #EmployeeNew(Employeename1, Employeename2)
Select
ltrim(rtrim(substring(employeename,1,patindex('%(%',employeename)-1))),
ltrim(rtrim(Right(Employeename,charindex('(',employeename,1)-3)))
from #Employee
Select * from #Employee
Select * from #EmployeeNew
Select cast('('+Employeename1+')'+left(employeename2,len(employeename2)-1) as varchar(200)) from #EmployeeNew
Update e
Set e.EmployeeName = cast('('+e1.Employeename1+')'+left(e1.employeename2,len(e1.employeename2)-1) as varchar(200))
from #Employee e
left outer join #EmployeeNew e1 on ltrim(rtrim(substring(e.employeename,1,patindex('%(%',e.employeename)-1))) =e1.Employeename1

Related

Call function that returns table in a view in SQL Server 2000

SQL Server - Compatibility Level 2000
Person table - PersonId, PersonName, etc.. (~1200 records)
Two user functions - GetPersonAddress(#PersonId), GetPaymentAddress(#PersonId)
These two functions return data in a table with Street, City etc...(one record in the return table for the PersonId)
I have to create a view that joins the person table with these two user functions by passing in the person id.
Limitations:
Cross Apply is not supported on a function in SQL Server 2000
Cursor, temp table and temp variables are not supported in views so that I can loop upon the person table and call the functions.
Can someone help?
You could create functions GetPeopleAddresses() and GetPaymentsAddresses() which return PersonId as a field and then you can use them in JOIN:
SELECT t.PersonId, PersonName, etc..., a1.Address, a2.Address
FROM YourTable t
LEFT JOIN GetPeopleAddresses() a1 ON a1.PersonId = t.PersonId
LEFT JOIN GetPaymentsAddresses() a2 ON a2.PersonId = t.PersonId
Of course, your functions have to return only unique records
I'm afraid that you can't do that with a view in SQL Server 2000 because of the limitations you listed. The next best option as suggested in the comments is a stored procedure that returns the rows that would return the view.
If you need to use the results of the procedure in another query, you can insert the values returned by the procedure in a temporal table. Is not pretty, and you have to make two DB calls (one for creating/populating the temp table, and the other for using it), but it works. For example:
create table #TempResults (
PersonID int not null,
Name varchar(100),
Street varchar(100),
City varchar(100),
<all the other fields>
constraint primary key PK_TempResults (PersonID)
)
insert into #TempResults
exec spTheProcedureThatReplaceTheView #thePersonID
go -- end of the first DB call
select <fields>
from AnotherTable
join #TempResults on <condition>
-- don't forget to drop table when you don't need its current data anymore
drop table #TempResults

Split one large, denormalized table into a normalized database

I have a large (5 million row, 300+ column) csv file I need to import into a staging table in SQL Server, then run a script to split each row up and insert data into the relevant tables in a normalized db. The format of the source table looks something like this:
(fName, lName, licenseNumber1, licenseIssuer1, licenseNumber2, licenseIssuer2..., specialtyName1, specialtyState1, specialtyName2, specialtyState2..., identifier1, identifier2...)
There are 50 licenseNumber/licenseIssuer columns, 15 specialtyName/specialtyState columns, and 15 identifier columns. There is always at least one of each of those, but the remaining 49 or 14 could be null. The first identifier is unique, but is not used as the primary key of the Person in our schema.
My database schema looks like this
People(ID int Identity(1,1))
Names(ID int, personID int, lName varchar, fName varchar)
Licenses(ID int, personID int, number varchar, issuer varchar)
Specialties(ID int, personID int, name varchar, state varchar)
Identifiers(ID int, personID int, value)
The database will already be populated with some People before adding the new ones from the csv.
What is the best way to approach this?
I have tried iterating over the staging table one row at a time with select top 1:
WHILE EXISTS (Select top 1 * from staging)
BEGIN
INSERT INTO People Default Values
SET #LastInsertedID = SCOPE_IDENTITY() -- might use the output clause to get this instead
INSERT INTO Names (personID, lName, fName)
SELECT top 1 #LastInsertedID, lName, fName from staging
INSERT INTO Licenses(personID, number, issuer)
SELECT top 1 #LastInsertedID, licenseNumber1, licenseIssuer1 from staging
IF (select top 1 licenseNumber2 from staging) is not null
BEGIN
INSERT INTO Licenses(personID, number, issuer)
SELECT top 1 #LastInsertedID, licenseNumber2, licenseIssuer2 from staging
END
-- Repeat the above 49 times, etc...
DELETE top 1 from staging
END
One problem with this approach is that it is prohibitively slow, so I refactored it to use a cursor. This works and is significantly faster, but has me declaring 300+ variables for Fetch INTO.
Is there a set-based approach that would work here? That would be preferable, as I understand that cursors are frowned upon, but I'm not sure how to get the identity from the INSERT into the People table for use as a foreign key in the others without going row-by-row from the staging table.
Also, how could I avoid copy and pasting the insert into the Licenses table? With a cursor approach I could try:
FETCH INTO ...#LicenseNumber1, #LicenseIssuer1, #LicenseNumber2, #LicenseIssuer2...
INSERT INTO #LicenseTemp (number, issuer) Values
(#LicenseNumber1, #LicenseIssuer1),
(#LicenseNumber2, #LicenseIssuer2),
... Repeat 48 more times...
.
.
.
INSERT INTO Licenses(personID, number, issuer)
SELECT #LastInsertedID, number, issuer
FROM #LicenseTEMP
WHERE number is not null
There still seems to be some redundant copy and pasting there, though.
To summarize the questions, I'm looking for idiomatic approaches to:
Break up one large staging table into a set of normalized tables, retrieving the Primary Key/identity from one table and using it as the foreign key in the others
Insert multiple rows into the normalized tables that come from many repeated columns in the staging table with less boilerplate/copy and paste (Licenses and Specialties above)
Short of discreet answers, I'd also be very happy with pointers towards resources and references that could assist me in figuring this out.
Ok, I'm not an SQL Server expert, but here's the "strategy" I would suggest.
Calculate the personId on the staging table
As #Shnugo suggested before me, calculating the personId in the staging table will ease the next steps
Use a sequence for the personID
From SQL Server 2012 you can define sequences. If you use it for every person insert, you'll never risk an overlapping of IDs. If you have (as it seems) personId that were loaded before the sequence you can create the sequence with the first free personID as starting value
Create a numbers table
Create an utility table keeping numbers from 1 to n (you need n to be at least 50.. you can look at this question for some implementations)
Use set logic to do the insert
I'd avoid cursor and row-by-row logic: you are right that it is better to limit the number of accesses to the table, but I'd say that you should strive to limit it to one access for target table.
You could proceed like these:
People:
INSERT INTO People (personID)
SELECT personId from staging;
Names:
INSERT INTO Names (personID, lName, fName)
SELECT personId, lName, fName from staging;
Licenses:
here we'll need the Number table
INSERT INTO Licenses (personId, number, issuer)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then licenseNumber1
when 2 then licenseNumber2
...
when 50 then licenseNumber50
end as licenseNumber,
case nbrs.n
when 1 then licenseIssuer1
when 2 then licenseIssuer2
...
when 50 then licenseIssuer50
end as licenseIssuer
from staging
cross join
(select n from numbers where n>=1 and n<=50) nbrs
) WHERE licenseNumber is not null;
Specialties:
INSERT INTO Specialties(personId, name, state)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then specialtyName1
when 2 then specialtyName2
...
when 15 then specialtyName15
end as specialtyName,
case nbrs.n
when 1 then specialtyState1
when 2 then specialtyState2
...
when 15 then specialtyState15
end as specialtyState
from staging
cross join
(select n from numbers where n>=1 and n<=15) nbrs
) WHERE specialtyName is not null;
Identifiers:
INSERT INTO Identifiers(personId, value)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then identifier1
when 2 then identifier2
...
when 15 then identifier15
end as value
from staging
cross join
(select n from numbers where n>=1 and n<=15) nbrs
) WHERE value is not null;
Hope it helps.
You say: but the staging table could be modified
I would
add a PersonID INT NOT NULL column and fill it with DENSE_RANK() OVER(ORDER BY fname,lname)
add an index to this PersonID
use this ID in combination with GROUP BY to fill your People table
do the same with your names table
And then use this ID for a set-based insert into your three side tables
Do it like this
SELECT AllTogether.PersonID, AllTogether.TheValue
FROM
(
SELECT PersonID,SomeValue1 AS TheValue FROM StagingTable
UNION ALL SELECT PersonID,SomeValue2 FROM StagingTable
UNION ALL ...
) AS AllTogether
WHERE AllTogether.TheValue IS NOT NULL
UPDATE
You say: might cause a conflict with IDs that already exist in the People table
You did not tell anything about existing People...
Is there any sure and unique mark to identify them? Use a simple
UPDATE StagingTable SET PersonID=xyz WHERE ...
to set existing PersonIDs into your staging table and then use something like
UPDATE StagingTable
SET PersonID=DENSE RANK() OVER(...) + MaxExistingID
WHERE PersonID IS NULL
to set new IDs for PersonIDs still being NULL.

SQL query with pivot tables?

I'm trying to wrap by brain around how to use pivot tables for a query I need. I have 3 database tables. Showing relevant columns:
TableA: Columns = pName
TableB: Columns = GroupName, GroupID
TableC: Columns = pName, GroupID
TableA contains a list of names (John, Joe, Jack, Jane)
TableB contains a list of groups with an ID#. (Soccer|1, Hockey|2, Basketball|3)
TableC contains a list of the names and the group they belong to (John|1, John|3, Joe|2, Jack|1, Jack|2, Jack|3, Jane|3)
I need to create a matrix like grid view using a SQL query that would return a list of all the names from TableA (Y-axis) and a list of all the possible groups (X-axis). The cell values would be either true or false if they belong to the group.
Any help would be appreciated. I couldn't quite find an existing answer that helped.
You might try it like this
Here I set up a MCVE, please try to create this in your next question yourself...
DECLARE #Name TABLE (pName VARCHAR(100));
INSERT INTO #Name VALUES('John'),('Joe'),('Jack'),('Jane');
DECLARE #Group TABLE(gName VARCHAR(100),gID INT);
INSERT INTO #Group VALUES ('Soccer',1),('Hockey',2),('Basketball',3);
DECLARE #map TABLE(pName VARCHAR(100),gID INT);
INSERT INTO #map VALUES
('John',1),('John',3)
,('Joe',2)
,('Jack',1),('Jack',2),('Jack',3)
,('Jane',3);
This quer will collect the values and perform PIVOT
SELECT p.*
FROM
(
SELECT n.pName
,g.gName
,'x' AS IsInGroup
FROM #map AS m
INNER JOIN #Name AS n ON m.pName=n.pName
INNER JOIN #Group AS g ON m.gID=g.gID
) AS x
PIVOT
(
MAX(IsInGroup) FOR gName IN(Soccer,Hockey,Basketball)
) as p
This is the result.
pName Soccer Hockey Basketball
Jack x x x
Jane NULL NULL x
Joe NULL x NULL
John x NULL x
Some hints:
You might use 1 and 0 instead of x as SQL Server does not know a real boolean
You should add a pID to your names. Never join tables on real data (unless it is something unique and unchangeable [which means never acutally!!!])
UPDATE dynamic SQL (thx to #djlauk)
If you want a query which deals with any amount of groups you have to to this dynamically. But please be aware, that you loose the chance to use this in ad-hoc-SQL like in VIEW or inline TVF, which is quite a big backdraw...
CREATE TABLE #Name(pName VARCHAR(100));
INSERT INTO #Name VALUES('John'),('Joe'),('Jack'),('Jane');
CREATE TABLE #Group(gName VARCHAR(100),gID INT);
INSERT INTO #Group VALUES ('Soccer',1),('Hockey',2),('Basketball',3);
CREATE TABLE #map(pName VARCHAR(100),gID INT);
INSERT INTO #map VALUES
('John',1),('John',3)
,('Joe',2)
,('Jack',1),('Jack',2),('Jack',3)
,('Jane',3);
DECLARE #ListOfGroups VARCHAR(MAX)=
(
STUFF
(
(
SELECT DISTINCT ',' + QUOTENAME(gName)
FROM #Group
FOR XML PATH('')
),1,1,''
)
);
DECLARE #sql VARCHAR(MAX)=
(
'SELECT p.*
FROM
(
SELECT n.pName
,g.gName
,''x'' AS IsInGroup
FROM #map AS m
INNER JOIN #Name AS n ON m.pName=n.pName
INNER JOIN #Group AS g ON m.gID=g.gID
) AS x
PIVOT
(
MAX(IsInGroup) FOR gName IN(' + #ListOfGroups + ')
) as p');
EXEC(#sql);
GO
DROP TABLE #map;
DROP TABLE #Group;
DROP TABLE #Name;
I suspect it may be laborious to keep the pivot up to date if categories are added. Or maybe I just prefer Excel (if you ignore one major advantage). The following approach could be helpful too, assuming you do have Office 365.
I added the three tables using 3 CREATE TABLE statements and 3 INSERT statements based on the code I saw above. (The solutions make use of temporary tables to insert specific values, but I believe you already have the data in your three tables, called TableA, TableB, TableC).
CREATE TABLE TestName (pName VARCHAR(100));
INSERT INTO TestName VALUES('John'),('Joe'),('Jack'),('Jane');
CREATE TABLE TestGroup (gName VARCHAR(100),gID INT);
INSERT INTO TestGroup VALUES ('Soccer',1),('Hockey',2),('Basketball',3);
CREATE TABLE Testmap (pName VARCHAR(100),gID INT);
INSERT INTO Testmap VALUES
('John',1),('John',3)
,('Joe',2)
,('Jack',1),('Jack',2),('Jack',3)
,('Jane',3);
Then, in MS Excel, I added (there may be a shorter sequence but I'm still exploring) the three tables as queries from database > sql server database. After adding them, I added all three to the Data Model (I can elaborate if you ask).
I then inserted PivotTable from the ribbon, chose External data source, but opened the Tables tab (instead of Connections tab), to find my data model (mine was top of the list) and I clicked Open. At some point Excel prompted me to create relationships between tables and it did a good job of auto generating them for me.
After minor tweaks my PivotTable came out like this (I could also ask Excel to show the data as a PivotChart).
Pivot showing groups as columns and names as rows.
The advantage is that you don't have to revisit the PIVOT code in SQL if the list (of groups) changes. As I think someone else mentioned, consider using ids for pName as well, or another way to ensure that you are not stuck the next day if you have two persons named John or Jack.
In Excel you can choose when to refresh the data (or the pivot) and, after refresh, any additional categories will be added and counted.

How to perform a mass SQL insert to one table with rows from two seperate tables

I need some T-SQL help. We have an application which tracks Training Requirements assigned to each employee (such as CPR, First Aid, etc.). There are certain minimum Training Requirements which all employees must be assigned and my HR department wants me to give them the ability to assign those minimum Training Requirements to all personnel with the click of a button. So I have created a table called TrainingRequirementsForAllEmployees which has the TrainingRequirementID's of those identified minimum TrainingRequirements.
I want to insert rows into table Employee_X_TrainingRequirements for every employee in the Employees table joined with every row from TrainingRequirementsForAllEmployees.
I will add abbreviated table schema for clarity.
First table is Employees:
EmployeeNumber PK char(6)
EmployeeName varchar(50)
Second Table is TrainingRequirementsForAllEmployees:
TrainingRequirementID PK int
Third table (the one I need to Insert Into) is Employee_X_TrainingRequirements:
TrainingRequirementID PK int
EmployeeNumber PK char(6)
I don't know what the Stored Procedure should look like to achieve the results I need. Thanks for any help.
cross join operator is suitable when cartesian product of two sets of data is needed. So in the body of your stored procedure you should have something like:
insert into Employee_X_TrainingRequirements (TrainingRequirementID, EmployeeNumber)
select r.TrainingRequirementID, e.EmployeeNumber
from Employees e
cross join TrainingRequirementsForAllEmployees r
where not exists (
select 1 from Employee_X_TrainingRequirements
where TrainingRequirementID = r.TrainingRequirementID
and EmployeeNumber = e.EmployeeNumber
)

Performing multiple inserts for a single row in a query

I have a table containing data that i need to migrate into another table with a linking table. This is a one time migration as part of an upgrade.
I have a company table that contains records relating to a company and a contact person.
I want to migrate the contact details into another table and link the new person with a linking table
Consider I have this table which is already populated
tblCompany
CompanyId
CompanyName
RegNo
ContactForename
ContactSurname
And i want to migrate the contact person data to
tblPerson
PersonID (identitycolumn)
Forename
Surname
and use the identity column resulting and insert it into the linking table
tblCompanyPerson
CompanyId
PersonId
I've tried a few different ways to approach this using cursors and output variables into a temp table but none seem right to me (or give me the solution...)
The closest i have got is to have a companyID on tblPerson and insert companyId into it and output the new personId and the companyId into a temp table. Then loop through the temp table to create the tblCompanyContact.
example
declare #companycontact TABLE (companyId int, PersonId int)
insert into tblPerson
(Forename,
Surname,
CompanyID)
output inserted.CompanyID, INSERTED.PersonID into #companycontact
select
ContactPersonForeName,
ContactPersonSurename,
CompanyID
from tblCompany c
insert into tblCompanyPerson
(CompanyID,
PersonID)
select c.companyId, PersonId from #companycontact c
Background
Im using MS SQL Server 2008 R2
The tblPerson is already populated with hundreds of thousands of
records.
There is a 'trick' using MERGE statement to achieve mapping between newly inserted and source values:
MERGE tblPerson trgt
USING tblCompany src ON 1=0
WHEN NOT MATCHED
THEN INSERT
(Forename, Surename)
VALUES (src.ContactPersonForeName, src.ContactPersonSurename)
OUTPUT src.CompanyID, INSERTED.PersonID
INTO tblCompanyPerson (CompanyId, PersonID);
That 1=0 condition is to always get everything from source. You might want to replace it or even whole source with some sub-query to actually check whatever you already have same person mapped.
EDIT: Here is some reading about using MERGE and OUTPUT
Because I don't know what SQL you are using its difficult to decide if this is correct. i also don't know if you already tried this but it's the best idea i have:
insert into tblPerson
(Forename, Surename)
Select ContactForename, ContactPersonSurename
from tblCompany
insert into tblCompanyPerson
(CompanyID, PersonID)
select CompanyId, PersonID
from tblPerson, tblCompany
where ContactForename = Forename and ContactPersonSurename = Surename
Sarajog