A Simple Sql Select Query - sql

I know I am sounding dumb but I really need help on this.
I have a Table (let's say Meeting) which Contains a column Participants.
The Participants dataType is varchar(Max) and it stores Participant's Ids in comma separated form like 1,2.
Now my problem is that I am passing a parameter called @ParticipantsID to my stored procedure and want to do something like this:
Select Participants from Meeting where Participants in (@ParticipantsID)
Unfortunately I am missing something crucial here.
Can someone point that out?

I've been there before... I changed the DB design to have one record contain a single reference to the other table. If you can't change your DB structures and you have to live with this, I found this solution on CodeProject.
New Function
IF EXISTS(SELECT * FROM sysobjects WHERE ID = OBJECT_ID('UF_CSVToTable'))
DROP FUNCTION UF_CSVToTable
GO
CREATE FUNCTION UF_CSVToTable
(
    @psCSString VARCHAR(8000)
)
RETURNS @otTemp TABLE(sID VARCHAR(20))
AS
BEGIN
    DECLARE @sTemp VARCHAR(10)
    WHILE LEN(@psCSString) > 0
    BEGIN
        -- Take everything before the first comma (or the whole string if there is none)
        SET @sTemp = LEFT(@psCSString, ISNULL(NULLIF(CHARINDEX(',', @psCSString) - 1, -1),
            LEN(@psCSString)))
        -- Drop the extracted value and its comma from the front of the string
        SET @psCSString = SUBSTRING(@psCSString, ISNULL(NULLIF(CHARINDEX(',', @psCSString), 0),
            LEN(@psCSString)) + 1, LEN(@psCSString))
        INSERT INTO @otTemp VALUES (@sTemp)
    END
    RETURN
END
GO
New Sproc
SELECT *
FROM
TblJobs
WHERE
iCategoryID IN (SELECT * FROM UF_CSVToTable(@sCategoryID))
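For illustration only, the same split-then-filter idea can be tried outside SQL Server. The sketch below uses Python's built-in sqlite3 module, with a recursive CTE standing in for the UF_CSVToTable function. The TblJobs rows, the JobName column and the helper name jobs_in_categories are invented for the example and are not part of the CodeProject solution.

```python
import sqlite3

# Hypothetical stand-in for TblJobs; the rows are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TblJobs (JobName TEXT, iCategoryID INTEGER)")
conn.executemany(
    "INSERT INTO TblJobs VALUES (?, ?)",
    [("Plumber", 1), ("Painter", 2), ("Welder", 3)],
)

SPLIT_AND_FILTER = """
WITH RECURSIVE split(id, rest) AS (
    -- Seed row: nothing extracted yet, whole list (plus a trailing comma) remaining.
    SELECT '', :csv || ','
    UNION ALL
    -- Each step peels off the text before the first comma.
    SELECT substr(rest, 1, instr(rest, ',') - 1),
           substr(rest, instr(rest, ',') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT JobName
FROM TblJobs
WHERE iCategoryID IN (SELECT CAST(id AS INTEGER) FROM split WHERE id <> '')
ORDER BY JobName
"""


def jobs_in_categories(csv_ids):
    """Return job names whose category id appears in the comma-separated list."""
    rows = conn.execute(SPLIT_AND_FILTER, {"csv": csv_ids}).fetchall()
    return [name for (name,) in rows]


print(jobs_in_categories("1,3"))
```

The list is passed as a single bound parameter, just as the stored procedure receives it, and is only turned into rows inside the query.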

You would not typically organise your SQL database in quite this way. What you are describing are two entities (Meeting & Participant) that have a many-to-many relationship, i.e. a meeting can have zero or more participants, and a participant can attend zero or more meetings. To model this in SQL you would use three tables: a meeting table, a participant table and a MeetingParticipant table. The MeetingParticipant table holds the links between meetings & participants. So, you might have something like this (excuse any sql syntax errors)
create table Meeting
(
MeetingID int,
Name varchar(50),
Location varchar(100)
)
create table Participant
(
ParticipantID int,
FirstName varchar(50),
LastName varchar(50)
)
create table MeetingParticipant
(
MeetingID int,
ParticipantID int
)
To populate these tables you would first create some Participants:
insert into Participant(ParticipantID, FirstName, LastName) values(1, 'Tom', 'Jones')
insert into Participant(ParticipantID, FirstName, LastName) values(2, 'Dick', 'Smith')
insert into Participant(ParticipantID, FirstName, LastName) values(3, 'Harry', 'Windsor')
and create a Meeting or two
insert into Meeting(MeetingID, Name, Location) values(10, 'SQL Training', 'Room 1')
insert into Meeting(MeetingID, Name, Location) values(11, 'SQL Training', 'Room 2')
and now add some participants to the meetings
insert into MeetingParticipant(MeetingID, ParticipantID) values(10, 1)
insert into MeetingParticipant(MeetingID, ParticipantID) values(10, 2)
insert into MeetingParticipant(MeetingID, ParticipantID) values(11, 2)
insert into MeetingParticipant(MeetingID, ParticipantID) values(11, 3)
Now you can select all the meetings and the participants for each meeting with
select m.MeetingID, p.ParticipantID, m.Location, p.FirstName, p.LastName
from Meeting m
join MeetingParticipant mp on m.MeetingID=mp.MeetingID
join Participant p on mp.ParticipantID=p.ParticipantID
the above should produce
MeetingID  ParticipantID  Location  FirstName  LastName
10         1              Room 1    Tom        Jones
10         2              Room 1    Dick       Smith
11         2              Room 2    Dick       Smith
11         3              Room 2    Harry      Windsor
If you want to find out all the meetings that "Dick Smith" is in you would write something like this
select m.MeetingID, m.Location
from Meeting m join MeetingParticipant mp on m.MeetingID=mp.MeetingID
where
mp.ParticipantID=2
and get
MeetingID  Location
10         Room 1
11         Room 2
I have omitted important things like indexes, primary keys and missing attributes such as meeting dates, but it is clearer without all the goo.
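As a rough sketch of what the schema might look like with the omitted primary and foreign keys filled in, here is a version that runs against an in-memory SQLite database from Python. The key constraints and the helper name meetings_for are assumptions added for the example; the table names and sample rows are the ones used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE Meeting (
    MeetingID INTEGER PRIMARY KEY,
    Name      TEXT,
    Location  TEXT
);
CREATE TABLE Participant (
    ParticipantID INTEGER PRIMARY KEY,
    FirstName     TEXT,
    LastName      TEXT
);
-- Link table: one row per (meeting, participant) pair, each pair stored once.
CREATE TABLE MeetingParticipant (
    MeetingID     INTEGER NOT NULL REFERENCES Meeting(MeetingID),
    ParticipantID INTEGER NOT NULL REFERENCES Participant(ParticipantID),
    PRIMARY KEY (MeetingID, ParticipantID)
);
INSERT INTO Participant VALUES (1, 'Tom', 'Jones'), (2, 'Dick', 'Smith'), (3, 'Harry', 'Windsor');
INSERT INTO Meeting VALUES (10, 'SQL Training', 'Room 1'), (11, 'SQL Training', 'Room 2');
INSERT INTO MeetingParticipant VALUES (10, 1), (10, 2), (11, 2), (11, 3);
""")


def meetings_for(participant_id):
    """Return (MeetingID, Location) for every meeting the participant attends."""
    return conn.execute(
        """SELECT m.MeetingID, m.Location
           FROM Meeting m
           JOIN MeetingParticipant mp ON m.MeetingID = mp.MeetingID
           WHERE mp.ParticipantID = ?
           ORDER BY m.MeetingID""",
        (participant_id,),
    ).fetchall()


print(meetings_for(2))
```

The composite primary key on MeetingParticipant stops the same participant being added to a meeting twice, and the foreign keys reject links to meetings or participants that do not exist.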

Your table is not normalized. If you want to query for individual participants, they should be split into their own table, along the lines of:
Meeting
MeetingId primary key
Other stuff
Persons
PersonId primary key
Other stuff
Participants
MeetingId foreign key Meeting(MeetingId)
PersonId foreign key Persons(PersonId)
primary key MeetingId,PersonId
Otherwise, you have to resort to all sorts of trickery (what I call SQL gymnastics) to find out what you want. That trickery never scales well - your queries become slow very quickly as the table grows.
With a properly normalized database, the queries can remain fast well into the multi-millions of records (I work with DB2/z where we are used to truly huge tables).
There are valid reasons for sometimes reverting to second normal form (or even first) for performance, but that should be a carefully considered decision based on actual performance data. All databases should initially start off in 3NF.

SELECT * FROM Meeting WHERE Participants = '12' OR Participants LIKE '%,12,%' OR Participants LIKE '12,%' OR Participants LIKE '%,12'
where 12 is the ID you are looking for....
Ugly, what a nasty model.
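A common variation, sketched here with Python's sqlite3 module, is to wrap the stored list in commas so that a single pattern covers the first, middle, last and only-value positions. The sample rows and the helper name meetings_with are made up for the example. SQLite concatenates with ||, whereas in T-SQL the same expression would be written ',' + Participants + ','.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Meeting (MeetingID INTEGER, Participants TEXT)")
# Invented rows: 12 alone, first, last, in the middle, and a near miss (112,120).
conn.executemany(
    "INSERT INTO Meeting VALUES (?, ?)",
    [(1, "12"), (2, "12,7"), (3, "7,12"), (4, "3,12,7"), (5, "112,120")],
)


def meetings_with(participant_id):
    """Return ids of meetings whose comma-separated list contains participant_id."""
    # int() guards against anything other than a plain number reaching the pattern.
    pattern = f"%,{int(participant_id)},%"
    rows = conn.execute(
        "SELECT MeetingID FROM Meeting "
        "WHERE ',' || Participants || ',' LIKE ? "
        "ORDER BY MeetingID",
        (pattern,),
    ).fetchall()
    return [meeting_id for (meeting_id,) in rows]


print(meetings_with(12))
```

This still scans every row and cannot use an index on Participants, so it is a workaround for the existing model rather than a fix for it.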

If I understand your question correctly, you are trying to pass in a comma separated list of participant ids and see if it is in your list. This link lists several ways to do such a thing:
http://vyaskn.tripod.com/passing_arrays_to_stored_procedures.htm
codezy.blogspot.com

If you store the participant ids in a comma-separated list (as text) in the database, you cannot easily query it (as a list) using SQL. You would have to resort to string-operations.
You should consider changing your schema to use another table to map meetings to participants:
create table meeting_participants (
meeting_id integer not null , -- foreign key
participant_id integer not null
);
That table would have multiple rows per meeting (one for each participant).
You can then query that table for individual participants, or number of participants, and such.

If participants is a separate data type you should be storing it as a child table of your meeting table. e.g.
MEETING
PARTICIPANT 1
PARTICIPANT 2
PARTICIPANT 3
Each participant would hold the meeting ID so you can do a query
SELECT * FROM participants WHERE meeting_id = 1
However, if you must store a comma separated list (for some external reason) then you can do a string search to find the appropriate record. This would be a very inefficient way to do a query though.

That is not the best way to store the information you have.
If it is all you have got then you need to be doing a contains (not an IN). The best answer is to have another table that links Participants to Meetings.
Try SELECT Meeting, Participants FROM Meeting WHERE CONTAINS(Participants, @ParticipantId). Note that CONTAINS only works on a column that is covered by a full-text index.

Related

How to update multiple comma separated values in a single column in sql

Question on SQL
Suppose there is a table.
I can't reproduce your syntax error with the information you have provided so I suspect you have mistyped something somewhere.
However, see the comments - this is the wrong way to store your data. Perhaps these code snippets will help.
You need a table to contain the Team and a table to contain the People. You then need a separate table to link the two together.
create table #Teams (TeamId int identity(1,1), TeamName nvarchar(50));
create table #Members (MemberId int identity(1,1), MemberName nvarchar(50));
create table #TeamMembers (MemberId int, TeamId int);
E.g.
-- create your team first
insert into #Teams (TeamName) values ('Warriors');
-- create your people next
insert into #Members (MemberName) values
('John'),('Alexa'),('Tony');
-- Now (and only now) link members to teams
insert into #TeamMembers (MemberId, TeamId) values
(1, 1),(2,1),(3,1)
To get your data all reported together start with these joins
select t.TeamName, m.MemberName
from #Teams t
join #TeamMembers tm on t.TeamId = tm.TeamId
join #Members m on tm.MemberId = m.MemberId;
Things you may need to do your own research for:
One to Many, Many to Many relationships
Database normalisation
If you really want a comma separated list then "sql generate comma separated list"

PostgreSQL Insert into table with subquery selecting from multiple other tables

I am learning SQL (postgres) and am trying to insert a record into a table that references records from two other tables, as foreign keys.
Below is the syntax I am using for creating the tables and records:
-- Create a person table + insert single row
CREATE TABLE person (
pname VARCHAR(255) NOT NULL,
PRIMARY KEY (pname)
);
INSERT INTO person VALUES ('personOne');
-- Create a city table + insert single row
CREATE TABLE city (
cname VARCHAR(255) NOT NULL,
PRIMARY KEY (cname)
);
INSERT INTO city VALUES ('cityOne');
-- Create a employee table w/ForeignKey reference
CREATE TABLE employee (
ename VARCHAR(255) REFERENCES person(pname) NOT NULL,
ecity VARCHAR(255) REFERENCES city(cname) NOT NULL,
PRIMARY KEY(ename, ecity)
);
-- create employee entry referencing existing records
INSERT INTO employee VALUES(
SELECT pname FROM person
WHERE pname='personOne' AND <-- ISSUE
SELECT cname FROM city
WHERE cname='cityOne
);
Notice in the last block of code, where I'm doing an INSERT into the employee table, I don't know how to string together multiple SELECT sub-queries to get both the existing records from the person and city table such that I can create a new employee entry with attributes as such:
ename='personOne'
ecity='cityOne'
The textbook I have for class doesn't dive into sub-queries like this and I can't find any examples similar enough to mine such that I can understand how to adapt them for this use case.
Insight will be much appreciated.
There doesn’t appear to be any obvious relationship between city and person which will make your life hard
The general pattern for turning a select that has two base tables giving info, into an insert is:
INSERT INTO table(column,list,here)
SELECT column,list,here
FROM
a
JOIN b ON a.x = b.y
In your case there isn’t really anything to join on because your one-column tables have no column in common. Provide eg a cityname in Person (because it seems more likely that one city has many person) then you can do
INSERT INTO employee(ename, ecity)
SELECT p.pname, c.cname
FROM
person p
JOIN city c ON p.cityname = c.cname
But even then, the tables are related between themselves and don’t need the third table so it’s perhaps something of an academic exercise only, not something you’d do in the real world
If you just want to mix every person with every city you can do:
INSERT INTO employee(ename, ecity)
SELECT pname, cname
FROM
person p
CROSS JOIN city c
But be warned, two people and two cities will cause 4 rows to be inserted, and so on (20 people and 40 cities, 800 rows. Fairly useless imho)
However, I trust that the general pattern shown first will suffice for your learning; write a SELECT that shows the data you want to insert, then simply write INSERT INTO table(columns) above it. The number of columns inserted to must match the number of columns selected. Don’t forget that you can select fixed values if no column from the query has the info (INSERT INTO X(p,c,age) SELECT personname, cityname, 23 FROM ...)
The following will work for you:
INSERT INTO employee
SELECT pname, cname FROM person, city
WHERE pname='personOne' AND cname='cityOne';
This is a cross join producing a cartesian product of the two tables (since there is nothing to link the two). It reads slightly oddly, given that you could just as easily have inserted the values directly. But I assume this is because it is a learning exercise.
Please note that there is a typo in your create employee. You are missing a comma before the primary key.
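To see the INSERT ... SELECT pattern end to end, here is a small sketch that runs against an in-memory SQLite database from Python rather than PostgreSQL. The tables mirror the ones in the question; the extra rows personTwo and cityTwo are added only to show that the WHERE clause narrows the cross join to a single pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE person (pname TEXT NOT NULL PRIMARY KEY);
CREATE TABLE city   (cname TEXT NOT NULL PRIMARY KEY);
CREATE TABLE employee (
    ename TEXT NOT NULL REFERENCES person(pname),
    ecity TEXT NOT NULL REFERENCES city(cname),
    PRIMARY KEY (ename, ecity)
);
-- personTwo and cityTwo are invented so the cross join has more than one pair.
INSERT INTO person VALUES ('personOne'), ('personTwo');
INSERT INTO city   VALUES ('cityOne'), ('cityTwo');
""")

# Write the SELECT that produces the row you want, then put INSERT INTO above it.
conn.execute(
    """INSERT INTO employee (ename, ecity)
       SELECT p.pname, c.cname
       FROM person p
       CROSS JOIN city c
       WHERE p.pname = ? AND c.cname = ?""",
    ("personOne", "cityOne"),
)

rows = conn.execute("SELECT ename, ecity FROM employee").fetchall()
print(rows)
```

Without the WHERE clause the same statement would insert all four person and city combinations.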

Split one large, denormalized table into a normalized database

I have a large (5 million row, 300+ column) csv file I need to import into a staging table in SQL Server, then run a script to split each row up and insert data into the relevant tables in a normalized db. The format of the source table looks something like this:
(fName, lName, licenseNumber1, licenseIssuer1, licenseNumber2, licenseIssuer2..., specialtyName1, specialtyState1, specialtyName2, specialtyState2..., identifier1, identifier2...)
There are 50 licenseNumber/licenseIssuer columns, 15 specialtyName/specialtyState columns, and 15 identifier columns. There is always at least one of each of those, but the remaining 49 or 14 could be null. The first identifier is unique, but is not used as the primary key of the Person in our schema.
My database schema looks like this
People(ID int Identity(1,1))
Names(ID int, personID int, lName varchar, fName varchar)
Licenses(ID int, personID int, number varchar, issuer varchar)
Specialties(ID int, personID int, name varchar, state varchar)
Identifiers(ID int, personID int, value)
The database will already be populated with some People before adding the new ones from the csv.
What is the best way to approach this?
I have tried iterating over the staging table one row at a time with select top 1:
WHILE EXISTS (Select top 1 * from staging)
BEGIN
INSERT INTO People Default Values
SET @LastInsertedID = SCOPE_IDENTITY() -- might use the output clause to get this instead
INSERT INTO Names (personID, lName, fName)
SELECT top 1 @LastInsertedID, lName, fName from staging
INSERT INTO Licenses(personID, number, issuer)
SELECT top 1 @LastInsertedID, licenseNumber1, licenseIssuer1 from staging
IF (select top 1 licenseNumber2 from staging) is not null
BEGIN
INSERT INTO Licenses(personID, number, issuer)
SELECT top 1 @LastInsertedID, licenseNumber2, licenseIssuer2 from staging
END
-- Repeat the above 49 times, etc...
DELETE top 1 from staging
END
One problem with this approach is that it is prohibitively slow, so I refactored it to use a cursor. This works and is significantly faster, but has me declaring 300+ variables for Fetch INTO.
Is there a set-based approach that would work here? That would be preferable, as I understand that cursors are frowned upon, but I'm not sure how to get the identity from the INSERT into the People table for use as a foreign key in the others without going row-by-row from the staging table.
Also, how could I avoid copy and pasting the insert into the Licenses table? With a cursor approach I could try:
FETCH INTO ...@LicenseNumber1, @LicenseIssuer1, @LicenseNumber2, @LicenseIssuer2...
INSERT INTO #LicenseTemp (number, issuer) Values
(@LicenseNumber1, @LicenseIssuer1),
(@LicenseNumber2, @LicenseIssuer2),
... Repeat 48 more times...
.
.
.
INSERT INTO Licenses(personID, number, issuer)
SELECT @LastInsertedID, number, issuer
FROM #LicenseTEMP
WHERE number is not null
There still seems to be some redundant copy and pasting there, though.
To summarize the questions, I'm looking for idiomatic approaches to:
Break up one large staging table into a set of normalized tables, retrieving the Primary Key/identity from one table and using it as the foreign key in the others
Insert multiple rows into the normalized tables that come from many repeated columns in the staging table with less boilerplate/copy and paste (Licenses and Specialties above)
Short of discrete answers, I'd also be very happy with pointers towards resources and references that could assist me in figuring this out.
Ok, I'm not an SQL Server expert, but here's the "strategy" I would suggest.
Calculate the personId on the staging table
As @Shnugo suggested before me, calculating the personId in the staging table will ease the next steps
Use a sequence for the personID
From SQL Server 2012 you can define sequences. If you use it for every person insert, you'll never risk an overlapping of IDs. If you have (as it seems) personId that were loaded before the sequence you can create the sequence with the first free personID as starting value
Create a numbers table
Create a utility table holding the numbers from 1 to n (you need n to be at least 50; you can look at this question for some implementations)
Use set logic to do the insert
I'd avoid cursor and row-by-row logic: you are right that it is better to limit the number of accesses to the table, but I'd say that you should strive to limit it to one access for target table.
You could proceed like this:
People:
INSERT INTO People (personID)
SELECT personId from staging;
Names:
INSERT INTO Names (personID, lName, fName)
SELECT personId, lName, fName from staging;
Licenses:
here we'll need the Number table
INSERT INTO Licenses (personId, number, issuer)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then licenseNumber1
when 2 then licenseNumber2
...
when 50 then licenseNumber50
end as licenseNumber,
case nbrs.n
when 1 then licenseIssuer1
when 2 then licenseIssuer2
...
when 50 then licenseIssuer50
end as licenseIssuer
from staging
cross join
(select n from numbers where n>=1 and n<=50) nbrs
) AS l WHERE licenseNumber is not null;
Specialties:
INSERT INTO Specialties(personId, name, state)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then specialtyName1
when 2 then specialtyName2
...
when 15 then specialtyName15
end as specialtyName,
case nbrs.n
when 1 then specialtyState1
when 2 then specialtyState2
...
when 15 then specialtyState15
end as specialtyState
from staging
cross join
(select n from numbers where n>=1 and n<=15) nbrs
) AS s WHERE specialtyName is not null;
Identifiers:
INSERT INTO Identifiers(personId, value)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then identifier1
when 2 then identifier2
...
when 15 then identifier15
end as value
from staging
cross join
(select n from numbers where n>=1 and n<=15) nbrs
) AS i WHERE value is not null;
Hope it helps.
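To make the numbers-table technique concrete, here is a cut-down sketch that runs on an in-memory SQLite database from Python. It assumes only three license column pairs instead of fifty, and the staging rows are invented. The shape of the query is otherwise the same as the Licenses insert above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Cut-down staging table: three license pairs instead of fifty (assumption).
CREATE TABLE staging (
    personId INTEGER,
    licenseNumber1 TEXT, licenseIssuer1 TEXT,
    licenseNumber2 TEXT, licenseIssuer2 TEXT,
    licenseNumber3 TEXT, licenseIssuer3 TEXT
);
INSERT INTO staging VALUES
    (1, 'A1', 'NY', 'A2', 'NJ', NULL, NULL),
    (2, 'B1', 'CA', NULL, NULL, NULL, NULL);

CREATE TABLE numbers (n INTEGER PRIMARY KEY);
INSERT INTO numbers VALUES (1), (2), (3);

CREATE TABLE Licenses (personId INTEGER, number TEXT, issuer TEXT);

-- Each staging row is repeated once per n; CASE picks the n-th column pair.
INSERT INTO Licenses (personId, number, issuer)
SELECT personId, licenseNumber, licenseIssuer
FROM (
    SELECT s.personId,
           CASE nbrs.n WHEN 1 THEN s.licenseNumber1
                       WHEN 2 THEN s.licenseNumber2
                       WHEN 3 THEN s.licenseNumber3 END AS licenseNumber,
           CASE nbrs.n WHEN 1 THEN s.licenseIssuer1
                       WHEN 2 THEN s.licenseIssuer2
                       WHEN 3 THEN s.licenseIssuer3 END AS licenseIssuer
    FROM staging s
    CROSS JOIN numbers nbrs
) AS unpivoted
WHERE licenseNumber IS NOT NULL;
""")

licenses = conn.execute(
    "SELECT personId, number, issuer FROM Licenses ORDER BY personId, number"
).fetchall()
print(licenses)
```

The staging table is read once for the whole insert, and the empty column pairs are filtered out by the final WHERE clause.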
You say: but the staging table could be modified
I would
add a PersonID INT NOT NULL column and fill it with DENSE_RANK() OVER(ORDER BY fname,lname)
add an index to this PersonID
use this ID in combination with GROUP BY to fill your People table
do the same with your names table
And then use this ID for a set-based insert into your three side tables
Do it like this
SELECT AllTogether.PersonID, AllTogether.TheValue
FROM
(
SELECT PersonID,SomeValue1 AS TheValue FROM StagingTable
UNION ALL SELECT PersonID,SomeValue2 FROM StagingTable
UNION ALL ...
) AS AllTogether
WHERE AllTogether.TheValue IS NOT NULL
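A small runnable version of this UNION ALL approach, using Python's sqlite3 module, might look like the following. The two identifier columns and the sample values are assumptions chosen to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Two repeated columns stand in for the fifteen identifier columns (assumption).
CREATE TABLE StagingTable (PersonID INTEGER, identifier1 TEXT, identifier2 TEXT);
INSERT INTO StagingTable VALUES (1, 'X-100', 'X-200'), (2, 'Y-100', NULL);
""")

# One SELECT per repeated column, stacked with UNION ALL, then NULLs filtered out.
identifiers = conn.execute(
    """SELECT AllTogether.PersonID, AllTogether.TheValue
       FROM (
           SELECT PersonID, identifier1 AS TheValue FROM StagingTable
           UNION ALL
           SELECT PersonID, identifier2 FROM StagingTable
       ) AS AllTogether
       WHERE AllTogether.TheValue IS NOT NULL
       ORDER BY AllTogether.PersonID, AllTogether.TheValue"""
).fetchall()
print(identifiers)
```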
UPDATE
You say: might cause a conflict with IDs that already exist in the People table
You did not say anything about existing People...
Is there any sure and unique mark to identify them? Use a simple
UPDATE StagingTable SET PersonID=xyz WHERE ...
to set existing PersonIDs into your staging table and then use something like
UPDATE StagingTable
SET PersonID=DENSE_RANK() OVER(...) + MaxExistingID
WHERE PersonID IS NULL
to set new IDs for PersonIDs still being NULL.

How to perform a mass SQL insert to one table with rows from two separate tables

I need some T-SQL help. We have an application which tracks Training Requirements assigned to each employee (such as CPR, First Aid, etc.). There are certain minimum Training Requirements which all employees must be assigned and my HR department wants me to give them the ability to assign those minimum Training Requirements to all personnel with the click of a button. So I have created a table called TrainingRequirementsForAllEmployees which has the TrainingRequirementID's of those identified minimum TrainingRequirements.
I want to insert rows into table Employee_X_TrainingRequirements for every employee in the Employees table joined with every row from TrainingRequirementsForAllEmployees.
I will add abbreviated table schema for clarity.
First table is Employees:
EmployeeNumber PK char(6)
EmployeeName varchar(50)
Second Table is TrainingRequirementsForAllEmployees:
TrainingRequirementID PK int
Third table (the one I need to Insert Into) is Employee_X_TrainingRequirements:
TrainingRequirementID PK int
EmployeeNumber PK char(6)
I don't know what the Stored Procedure should look like to achieve the results I need. Thanks for any help.
The cross join operator is suitable when you need the cartesian product of two sets of data. So in the body of your stored procedure you should have something like:
insert into Employee_X_TrainingRequirements (TrainingRequirementID, EmployeeNumber)
select r.TrainingRequirementID, e.EmployeeNumber
from Employees e
cross join TrainingRequirementsForAllEmployees r
where not exists (
select 1 from Employee_X_TrainingRequirements
where TrainingRequirementID = r.TrainingRequirementID
and EmployeeNumber = e.EmployeeNumber
)
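The sketch below replays this statement on an in-memory SQLite database from Python, to show that the NOT EXISTS filter makes the operation safe to run more than once. The employee names, the pre-existing assignment and the helper names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeNumber TEXT PRIMARY KEY, EmployeeName TEXT);
CREATE TABLE TrainingRequirementsForAllEmployees (TrainingRequirementID INTEGER PRIMARY KEY);
CREATE TABLE Employee_X_TrainingRequirements (
    TrainingRequirementID INTEGER,
    EmployeeNumber TEXT,
    PRIMARY KEY (TrainingRequirementID, EmployeeNumber)
);
INSERT INTO Employees VALUES ('000001', 'Ann'), ('000002', 'Bob');
INSERT INTO TrainingRequirementsForAllEmployees VALUES (1), (2);
-- One assignment already exists, so the NOT EXISTS filter has something to skip.
INSERT INTO Employee_X_TrainingRequirements VALUES (1, '000001');
""")

ASSIGN_ALL = """
INSERT INTO Employee_X_TrainingRequirements (TrainingRequirementID, EmployeeNumber)
SELECT r.TrainingRequirementID, e.EmployeeNumber
FROM Employees e
CROSS JOIN TrainingRequirementsForAllEmployees r
WHERE NOT EXISTS (
    SELECT 1 FROM Employee_X_TrainingRequirements x
    WHERE x.TrainingRequirementID = r.TrainingRequirementID
      AND x.EmployeeNumber = e.EmployeeNumber
)
"""


def assignment_count():
    return conn.execute(
        "SELECT COUNT(*) FROM Employee_X_TrainingRequirements"
    ).fetchone()[0]


def assign_minimum_training():
    """Insert the missing employee/requirement pairs; return how many were added."""
    before = assignment_count()
    conn.execute(ASSIGN_ALL)
    return assignment_count() - before


first_run = assign_minimum_training()
second_run = assign_minimum_training()
print(first_run, second_run, assignment_count())
```

Two employees and two requirements give four pairs. One pair already exists, so the first run adds the other three and the second run adds nothing.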

Select Value Of CSV in MasterTable From Reference Table

Consider these two tables:
--Subscriber_File---
ID GenreId FileName
01 1,2 TestFile.pdf
--MasterGenre--
ID Genrename
1 TEst1
2 Test2
When I issue this query, I'd like the result to be formatted as follows
Select * From Subscriber_File
ID GenreId FileName GenreName
1 1,2 TestFile.pdf TEst1,Test2
How can this be done?
Your data is not normalized. Specifically, in the one row for Subscriber_File, you have two facts in one place: the fact that the one entry is related to both MasterGenre 1 and MasterGenre 2. What if it were related to three MasterGenres? What if 10? The code required to associate your facts quickly escalates into an unmanageable mess.
The standard solution, when using relational database systems, is to normalize your data, such that each "repeating fact" is represented by one row in a table. (Google "database normalization" and you'll find thousands of articles on the subject. Really.) Here, you might end up with:
Table Subscriber
SubscriberId
FileName
(01, TestFile.pdf)
Table Genre
GenreId
GenreName
(1, Test1)
(2, Test2)
Table SubscriberGenre
SubscriberId
GenreId
(01, 1)
(01, 2)
At which point querying the data becomes trivial:
SELECT sub.SubscriberId, gen.GenreId, sub.FileName, gen.GenreName
From Subscriber sub
Inner join SubscriberGenre subgen
On subgen.SubscriberId = sub.SubscriberId
Inner join Genre gen
On gen.GenreId = subgen.GenreId
This should produce the result set
(01, 1, TestFile.pdf, Test1)
(01, 2, TestFile.pdf, Test2)
Hmm, you’re still challenged with converting those two lines into the one with a “1,2” value. I’ll let someone else answer that; my main point is that without normalized table structures, you’ll have trouble getting anything done.
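One possible way to collapse those two rows back into a single comma separated row is a string aggregate over the normalized tables. The sketch below uses SQLite's group_concat from Python purely as an illustration. On SQL Server 2017 and later the corresponding function is STRING_AGG, and older versions need a different technique. SQLite does not promise the order of the concatenated items, so the example makes no claim about it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Subscriber (SubscriberId INTEGER PRIMARY KEY, FileName TEXT);
CREATE TABLE Genre (GenreId INTEGER PRIMARY KEY, GenreName TEXT);
CREATE TABLE SubscriberGenre (SubscriberId INTEGER, GenreId INTEGER);
INSERT INTO Subscriber VALUES (1, 'TestFile.pdf');
INSERT INTO Genre VALUES (1, 'Test1'), (2, 'Test2');
INSERT INTO SubscriberGenre VALUES (1, 1), (1, 2);
""")

# Join as before, then aggregate the genre columns per subscriber.
row = conn.execute(
    """SELECT sub.SubscriberId,
              group_concat(gen.GenreId, ',')   AS GenreIds,
              sub.FileName,
              group_concat(gen.GenreName, ',') AS GenreNames
       FROM Subscriber sub
       JOIN SubscriberGenre subgen ON subgen.SubscriberId = sub.SubscriberId
       JOIN Genre gen ON gen.GenreId = subgen.GenreId
       GROUP BY sub.SubscriberId, sub.FileName"""
).fetchone()
print(row)
```

The comma separated form then exists only in the query output, while the stored data stays one fact per row.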