How to insert values from column A of table X to column B of table Y - and order them randomly - sql

I need to collect the values from the column "EmployeeID" of the table "Employees" and insert them into the column "EmployeeID" of the table "Incident".
At the end, the Values in the rows of the column "EmployeeID" should be arranged randomly.
More precisely;
I created 10 employees with their ID's, counting from 1 up to 10.
Those Employees, in fact the ID's, should receive random Incidents to work on.
So ... there are 10 ID's to spread on all Incidents - which might be 1000s.
How do i do this?
It's just for personal exercise on the local maschine.
I googled, but didn't find an explicit answer to my problem.
Should be simple to solve for you champs. :)
May anyone help me, please?
NOTES:
1) I've already created a column called "EmployeeID" in the table "Incident", therefore I'll need an update statement, won't I?
2) Schema:
[dbo].[EmployeeType]
[dbo].[Company]
[dbo].[Division]
[dbo].[Team]
[dbo].[sysdiagrams]
[dbo].[Incident]
[dbo].[Employees]
3) 1. Pre-solution:
CREATE TABLE IncidentToEmployee
(
IncidentToEmployeeID BIGINT IDENTITY(1,1) NOT NULL,
EmployeeID BIGINT NULL,
Incident FLOAT NULL
PRIMARY KEY CLUSTERED (IncidentToEmployeeID)
)
INSERT INTO IncidentToEmployee
SELECT
EmployeeID,
Incident
FROM dbo.Employees,
dbo.Incident
ORDER BY NEWID()
SELECT * FROM IncidentToEmployee
GO
3) 2. Output by INNER JOIN ON
In case you are wondering about the "Alias" column;
Nobody really knows which persons are behind the ID's - that's why I used an Alias column.
SELECT Employees.Alias,
IncidentToEmployee.Incident
FROM Employees
INNER JOIN
IncidentToEmployee ON
Employees.EmployeeID = IncidentToEmployee.EmployeeID
ORDER BY Alias
4) Final Solution
As I mentioned, I added at first a column called "EmployeeID" already to my "Incident" table. That's why I couldn't use an INSERT INTO statement at first and had to use an UPDATE statement. I found the most suitable solution now - without creating a new table as I did as a pre-solution.
Take a look at the following code:
ALTER Table Incident
ADD EmployeeID BIGINT NULL
UPDATE Incident
SET Incident.EmployeeID = EmployeeID
FROM Incident INNER JOIN Employees
ON Incident = EmployeeID
SELECT
EmployeeID,
Incident
FROM dbo.Employees,
dbo.Incident
ORDER BY NEWID()
Thank you all for your help - It took way longer to find a solution as I thought it would take; but I finally made it. Thanks!

UPDATE
I think you need to allocate different task to different user, a better approach will be to create a new table let's say EmployeeIncidents having columns Id(primary) , EmployeeID and IncidentID .
Now you can insert random EmployeesID and random IncidentID to new table, this way you will be able to keep records also ,
Updating Incident table will not be a smart choice.
INSERT INTO EmployeeIncidents
SELECT TOP ( 10 )
EmployeesID ,
IncidentID
FROM dbo.Employees,
dbo.Incident
ORDER BY NEWID()

Written by hand, so may need to tweak syntax, but something like this should do it. The Rand() function will give the same value unless seeded, so you can see with something like date to get randomness.
Insert Into Incidents
Select Top 10
EmployeeID
From Employees
Order By
Rand(GetDate())

Related

creating the sql query for company supervisors

I have four tables
create table emp (emp_ss int, emp_name nvarchar(20));
create table comp(comp_name nvarchar(20), comp_address nvarchar(20));
create table works (emp_ss int, comp_name nvarchar(20));
create table supervises (spv_ss int, emp_ss int );
Here SUPRVISER_SS and EMP_SS are subset of SS. Now I have to find:
the name of all the companies who have more than 4 supervisors
I have made a query for the above problem but not sure whether it is correct or not
SELECT COMP_NAME , COUNT(EMP_SS) FROM WORKS
WHERE EMP_SS IN (SELECT DISTINCT SPV_SS FROM supervises)
GROUP BY COMP_NAME
HAVING COUNT(EMP_SS) > 4;
the name of supervisors who have the largest number of employees
but unable to get the required result of the above condition
SELECT SPV_SS, COUNT(*) max_ FROM supervises GROUP BY SPV_SS
You don't need to have a seperate table for supervisors unless they come with extra information that doesn't belong in the employee table, just add an extra field (foreign key) in Employee table that links to the primary key in the same table.
First question: select company just use a group by companyid clause and then check if the count of supervisors is larger than 4 for.
Second question: select count(empid) and supervisor, use group by supervisor clause and add order by clause on the count column
I explained the logic, as for the actual sql code, you're gonna have to figure that out yourself.

How do I insert data from one table to another when there is unequal number of rows in one another?

I have a table named People that has 19370 rows with playerID being the primary column. There is another table named Batting, which has playerID as a foreign key and has 104324 rows.
I was told to add a new column in the People table called Total_HR, which is included in the Batting table. So, I have to insert that column data from the Batting table into the People table.
However, I get the error:
Msg 515, Level 16, State 2, Line 183 Cannot insert the value NULL into
column 'playerID', table 'Spring_2019_BaseBall.dbo.People'; column
does not allow nulls. INSERT fails. The statement has been terminated.
I have tried UPDATE and INSERT INTO SELECT, however got the same error
insert into People (Total_HR)
select sum(HR) from Batting group by playerID
I expect the output to populate the column Total_HR in the People table using the HR column from the Batting table.
You could use a join
BEGIN TRAN
Update People
Set Total_HR = B.HR_SUM
from PEOPLE A
left outer join
(Select playerID, sum(HR) HR_SUM
from Batting
group by playerID) B on A.playerID = B.playerID
Select * from People
ROLLBACK
Notice that I've put this code in a transaction block so you can test the changes before you commit
From the error message, it seems that playerID is a required field in table People.
You need to specify all required fields of table People in the INSERT INTO clause and provide corresponding values in the SELECT clause.
I added field playerID below, but you might need to add additional required fields as well.
insert into People (playerID, Total_HR)
select playerID, sum(HR) from Batting group by playerID
It is strange, however, that you want to insert rows in a table that should already be there. Otherwise, you could not have a valid foreign key on field playerID in table Batting... If you try to insert such rows from table Batting into table People, you might get another error (violation of PRIMARY KEY constraint)... Unless... you are creating the database just now and you want to populate empty table People from filled/imported table Batting before adding the actual foreign key constraint to table People. Sorry, I will not question your intentions. I personally would consider to update the query somewhat so that it will not attempt to insert any rows that already exist in table People:
insert into People (playerID, Total_HR)
select Batting.playerID, sum(Batting.HR)
from Batting
left join People on People.playerID = Batting.playerID
where People.playerID is null and Batting.playerID is not null
group by playerID
You need to calculate the SUM and then join this result to the People table to bring into the same rows both Total_HR column from People and the corresponding SUM calculated from Batting.
Here is one way to write it. I used CTE to make is more readable.
WITH
CTE_Sum
AS
(
SELECT
Batting.playerID
,SUM(Batting.HR) AS TotalHR_Src
FROM
Batting
GROUP BY
Batting.playerID
)
,CTE_Update
AS
(
SELECT
People.playerID
,People.Total_HR
,CTE_Sum.TotalHR_Src
FROM
CTE_Sum
INNER JOIN People ON People.playerID = CTE_Sum.playerID
)
UPDATE CTE_Update
SET
Total_HR = TotalHR_Src
;

Split one large, denormalized table into a normalized database

I have a large (5 million row, 300+ column) csv file I need to import into a staging table in SQL Server, then run a script to split each row up and insert data into the relevant tables in a normalized db. The format of the source table looks something like this:
(fName, lName, licenseNumber1, licenseIssuer1, licenseNumber2, licenseIssuer2..., specialtyName1, specialtyState1, specialtyName2, specialtyState2..., identifier1, identifier2...)
There are 50 licenseNumber/licenseIssuer columns, 15 specialtyName/specialtyState columns, and 15 identifier columns. There is always at least one of each of those, but the remaining 49 or 14 could be null. The first identifier is unique, but is not used as the primary key of the Person in our schema.
My database schema looks like this
People(ID int Identity(1,1))
Names(ID int, personID int, lName varchar, fName varchar)
Licenses(ID int, personID int, number varchar, issuer varchar)
Specialties(ID int, personID int, name varchar, state varchar)
Identifiers(ID int, personID int, value)
The database will already be populated with some People before adding the new ones from the csv.
What is the best way to approach this?
I have tried iterating over the staging table one row at a time with select top 1:
WHILE EXISTS (Select top 1 * from staging)
BEGIN
INSERT INTO People Default Values
SET #LastInsertedID = SCOPE_IDENTITY() -- might use the output clause to get this instead
INSERT INTO Names (personID, lName, fName)
SELECT top 1 #LastInsertedID, lName, fName from staging
INSERT INTO Licenses(personID, number, issuer)
SELECT top 1 #LastInsertedID, licenseNumber1, licenseIssuer1 from staging
IF (select top 1 licenseNumber2 from staging) is not null
BEGIN
INSERT INTO Licenses(personID, number, issuer)
SELECT top 1 #LastInsertedID, licenseNumber2, licenseIssuer2 from staging
END
-- Repeat the above 49 times, etc...
DELETE top 1 from staging
END
One problem with this approach is that it is prohibitively slow, so I refactored it to use a cursor. This works and is significantly faster, but has me declaring 300+ variables for Fetch INTO.
Is there a set-based approach that would work here? That would be preferable, as I understand that cursors are frowned upon, but I'm not sure how to get the identity from the INSERT into the People table for use as a foreign key in the others without going row-by-row from the staging table.
Also, how could I avoid copy and pasting the insert into the Licenses table? With a cursor approach I could try:
FETCH INTO ...#LicenseNumber1, #LicenseIssuer1, #LicenseNumber2, #LicenseIssuer2...
INSERT INTO #LicenseTemp (number, issuer) Values
(#LicenseNumber1, #LicenseIssuer1),
(#LicenseNumber2, #LicenseIssuer2),
... Repeat 48 more times...
.
.
.
INSERT INTO Licenses(personID, number, issuer)
SELECT #LastInsertedID, number, issuer
FROM #LicenseTEMP
WHERE number is not null
There still seems to be some redundant copy and pasting there, though.
To summarize the questions, I'm looking for idiomatic approaches to:
Break up one large staging table into a set of normalized tables, retrieving the Primary Key/identity from one table and using it as the foreign key in the others
Insert multiple rows into the normalized tables that come from many repeated columns in the staging table with less boilerplate/copy and paste (Licenses and Specialties above)
Short of discreet answers, I'd also be very happy with pointers towards resources and references that could assist me in figuring this out.
Ok, I'm not an SQL Server expert, but here's the "strategy" I would suggest.
Calculate the personId on the staging table
As #Shnugo suggested before me, calculating the personId in the staging table will ease the next steps
Use a sequence for the personID
From SQL Server 2012 you can define sequences. If you use it for every person insert, you'll never risk an overlapping of IDs. If you have (as it seems) personId that were loaded before the sequence you can create the sequence with the first free personID as starting value
Create a numbers table
Create an utility table keeping numbers from 1 to n (you need n to be at least 50.. you can look at this question for some implementations)
Use set logic to do the insert
I'd avoid cursor and row-by-row logic: you are right that it is better to limit the number of accesses to the table, but I'd say that you should strive to limit it to one access for target table.
You could proceed like these:
People:
INSERT INTO People (personID)
SELECT personId from staging;
Names:
INSERT INTO Names (personID, lName, fName)
SELECT personId, lName, fName from staging;
Licenses:
here we'll need the Number table
INSERT INTO Licenses (personId, number, issuer)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then licenseNumber1
when 2 then licenseNumber2
...
when 50 then licenseNumber50
end as licenseNumber,
case nbrs.n
when 1 then licenseIssuer1
when 2 then licenseIssuer2
...
when 50 then licenseIssuer50
end as licenseIssuer
from staging
cross join
(select n from numbers where n>=1 and n<=50) nbrs
) WHERE licenseNumber is not null;
Specialties:
INSERT INTO Specialties(personId, name, state)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then specialtyName1
when 2 then specialtyName2
...
when 15 then specialtyName15
end as specialtyName,
case nbrs.n
when 1 then specialtyState1
when 2 then specialtyState2
...
when 15 then specialtyState15
end as specialtyState
from staging
cross join
(select n from numbers where n>=1 and n<=15) nbrs
) WHERE specialtyName is not null;
Identifiers:
INSERT INTO Identifiers(personId, value)
SELECT * FROM (
SELECT personId,
case nbrs.n
when 1 then identifier1
when 2 then identifier2
...
when 15 then identifier15
end as value
from staging
cross join
(select n from numbers where n>=1 and n<=15) nbrs
) WHERE value is not null;
Hope it helps.
You say: but the staging table could be modified
I would
add a PersonID INT NOT NULL column and fill it with DENSE_RANK() OVER(ORDER BY fname,lname)
add an index to this PersonID
use this ID in combination with GROUP BY to fill your People table
do the same with your names table
And then use this ID for a set-based insert into your three side tables
Do it like this
SELECT AllTogether.PersonID, AllTogether.TheValue
FROM
(
SELECT PersonID,SomeValue1 AS TheValue FROM StagingTable
UNION ALL SELECT PersonID,SomeValue2 FROM StagingTable
UNION ALL ...
) AS AllTogether
WHERE AllTogether.TheValue IS NOT NULL
UPDATE
You say: might cause a conflict with IDs that already exist in the People table
You did not tell anything about existing People...
Is there any sure and unique mark to identify them? Use a simple
UPDATE StagingTable SET PersonID=xyz WHERE ...
to set existing PersonIDs into your staging table and then use something like
UPDATE StagingTable
SET PersonID=DENSE RANK() OVER(...) + MaxExistingID
WHERE PersonID IS NULL
to set new IDs for PersonIDs still being NULL.

Query trying to select but get ambiguous error?

It runs but I select all the columns. Can someone explain to me why my first query doesn't work? I don't think I need a join. If I can get some help that would be good. To be quite honest I've never seen the error before. If it works with SELECT*, I don't understand why I have issues with select specific columns.
These are my tables:
create table product
(
pdt# varchar(10) not null,
pdt_name varchar(30) not null,
pdt_label varchar(30) not null,
constraint product_pk primary key (pdt#));
create table orders
(
pdt# varchar(10) not null,
qty number(11,0) not null,
city varchar(30) not null
);
And these are the values
insert into product values ([111,chair,chr]);
insert into product values ([222,stool,stl]);
insert into product values ([333,table,tbl]);
insert into orders values ([111,22,Ottawa]);
insert into orders values ([222,22,Ottawa]);
insert into orders values ([333,22,Toronto]);
Question is this:
c. List all [pdt#,pdt_name,qty] when the order is from [Ottawa]
I tried:
SELECT pdt#, pdt_name, qty FROM orders, product WHERE city='Ottawa';
I get column is ambiguously defined error. But when I run:
SELECT *, qty FROM orders, product WHERE city='Ottawa';
It runs but I select all the columns. Can someone explain to me why my first query doesn't work? I don't think I need a join. If I can get some help that would be good. To be quite honest I've never seen the error before. If it works with SELECT*, I don't understand why I have issues with select specific columns.
This is because both the tables have pdt# in common and you are selecting it in your query. In cases like these, you have to explicitly specify the table from which the column should be picked up.
You should also join the tables. Else you would get a cross-joined result.
SELECT p.pdt#, p.pdt_name, o.qty
FROM orders o join product p on o.pdt# = p.pdt#
WHERE o.city='Ottawa';
Your second query works because you are selecting all the columns from both the tables and ideally it should not be done. Always specify the columns you need when you are selecting from more than one table.

How to perform a mass SQL insert to one table with rows from two seperate tables

I need some T-SQL help. We have an application which tracks Training Requirements assigned to each employee (such as CPR, First Aid, etc.). There are certain minimum Training Requirements which all employees must be assigned and my HR department wants me to give them the ability to assign those minimum Training Requirements to all personnel with the click of a button. So I have created a table called TrainingRequirementsForAllEmployees which has the TrainingRequirementID's of those identified minimum TrainingRequirements.
I want to insert rows into table Employee_X_TrainingRequirements for every employee in the Employees table joined with every row from TrainingRequirementsForAllEmployees.
I will add abbreviated table schema for clarity.
First table is Employees:
EmployeeNumber PK char(6)
EmployeeName varchar(50)
Second Table is TrainingRequirementsForAllEmployees:
TrainingRequirementID PK int
Third table (the one I need to Insert Into) is Employee_X_TrainingRequirements:
TrainingRequirementID PK int
EmployeeNumber PK char(6)
I don't know what the Stored Procedure should look like to achieve the results I need. Thanks for any help.
cross join operator is suitable when cartesian product of two sets of data is needed. So in the body of your stored procedure you should have something like:
insert into Employee_X_TrainingRequirements (TrainingRequirementID, EmployeeNumber)
select r.TrainingRequirementID, e.EmployeeNumber
from Employees e
cross join TrainingRequirementsForAllEmployees r
where not exists (
select 1 from Employee_X_TrainingRequirements
where TrainingRequirementID = r.TrainingRequirementID
and EmployeeNumber = e.EmployeeNumber
)