How to perform a mass SQL insert to one table with rows from two seperate tables - sql

I need some T-SQL help. We have an application which tracks Training Requirements assigned to each employee (such as CPR, First Aid, etc.). There are certain minimum Training Requirements which all employees must be assigned and my HR department wants me to give them the ability to assign those minimum Training Requirements to all personnel with the click of a button. So I have created a table called TrainingRequirementsForAllEmployees which has the TrainingRequirementID's of those identified minimum TrainingRequirements.
I want to insert rows into table Employee_X_TrainingRequirements for every employee in the Employees table joined with every row from TrainingRequirementsForAllEmployees.
I will add abbreviated table schema for clarity.
First table is Employees:
EmployeeNumber PK char(6)
EmployeeName varchar(50)
Second Table is TrainingRequirementsForAllEmployees:
TrainingRequirementID PK int
Third table (the one I need to Insert Into) is Employee_X_TrainingRequirements:
TrainingRequirementID PK int
EmployeeNumber PK char(6)
I don't know what the Stored Procedure should look like to achieve the results I need. Thanks for any help.

cross join operator is suitable when cartesian product of two sets of data is needed. So in the body of your stored procedure you should have something like:
insert into Employee_X_TrainingRequirements (TrainingRequirementID, EmployeeNumber)
select r.TrainingRequirementID, e.EmployeeNumber
from Employees e
cross join TrainingRequirementsForAllEmployees r
where not exists (
select 1 from Employee_X_TrainingRequirements
where TrainingRequirementID = r.TrainingRequirementID
and EmployeeNumber = e.EmployeeNumber
)

Related

PostgreSQL Insert into table with subquery selecting from multiple other tables

I am learning SQL (postgres) and am trying to insert a record into a table that references records from two other tables, as foreign keys.
Below is the syntax I am using for creating the tables and records:
-- Create a person table + insert single row
CREATE TABLE person (
pname VARCHAR(255) NOT NULL,
PRIMARY KEY (pname)
);
INSERT INTO person VALUES ('personOne');
-- Create a city table + insert single row
CREATE TABLE city (
cname VARCHAR(255) NOT NULL,
PRIMARY KEY (cname)
);
INSERT INTO city VALUES ('cityOne');
-- Create a employee table w/ForeignKey reference
CREATE TABLE employee (
ename VARCHAR(255) REFERENCES person(pname) NOT NULL,
ecity VARCHAR(255) REFERENCES city(cname) NOT NULL,
PRIMARY KEY(ename, ecity)
);
-- create employee entry referencing existing records
INSERT INTO employee VALUES(
SELECT pname FROM person
WHERE pname='personOne' AND <-- ISSUE
SELECT cname FROM city
WHERE cname='cityOne
);
Notice in the last block of code, where I'm doing an INSERT into the employee table, I don't know how to string together multiple SELECT sub-queries to get both the existing records from the person and city table such that I can create a new employee entry with attributes as such:
ename='personOne'
ecity='cityOne'
The textbook I have for class doesn't dive into sub-queries like this and I can't find any examples similar enough to mine such that I can understand how to adapt them for this use case.
Insight will be much appreciated.
There doesn’t appear to be any obvious relationship between city and person which will make your life hard
The general pattern for turning a select that has two base tables giving info, into an insert is:
INSERT INTO table(column,list,here)
SELECT column,list,here
FROM
a
JOIN b ON a.x = b.y
In your case there isn’t really anything to join on because your one-column tables have no column in common. Provide eg a cityname in Person (because it seems more likely that one city has many person) then you can do
INSERT INTO employee(personname,cityname)
SELECT p.pname, c.cname
FROM
person p
JOIN city c ON p.cityname = c.cname
But even then, the tables are related between themselves and don’t need the third table so it’s perhaps something of an academic exercise only, not something you’d do in the real world
If you just want to mix every person with every city you can do:
INSERT INTO employee(personname,cityname)
SELECT pname, cname
FROM
person p
CROSS JOIN city c
But be warned, two people and two cities will cause 4 rows to be inserted, and so on (20 people and 40 cities, 800 rows. Fairly useless imho)
However, I trust that the general pattern shown first will suffice for your learning; write a SELECT that shows the data you want to insert, then simply write INSERT INTO table(columns) above it. The number of columns inserted to must match the number of columns selected. Don’t forget that you can select fixed values if no column from the query has the info (INSERT INTO X(p,c,age) SELECT personname, cityname, 23 FROM ...)
The following will work for you:
INSERT INTO employee
SELECT pname, cname FROM person, city
WHERE pname='personOne' AND cname='cityOne';
This is a cross join producing a cartesian product of the two tables (since there is nothing to link the two). It reads slightly oddly, given that you could just as easily have inserted the values directly. But I assume this is because it is a learning exercise.
Please note that there is a typo in your create employee. You are missing a comma before the primary key.

How do I load the data from the first 2 hive tables into the 3rd one below?

The below is a simplified version of the problem I am facing
Let's say I have an employee and a department table in Hive. My goal is to load the data from these 2 tables into a 3rd one below. However, the 3rd table has a few dummy columns set to null and will not be filled by data from either of the employee or department tables. Is it possible to still load the employee and department data and just set the other fields to null?
Employee table(id,first_name,last_name,age,department_id,salary)
1,John,Smith,23,1,40000
2,Bob,Wilson,25,1,45000
3,Fred,Krug,37,2,75000
4,Jeremy,Fisher,41,3,110000
Department table(id,name)
1,Sales
2,IT
3,Marketing
End result(dummy_column0,employeeID,first_name,last_name,age,salary,department_name,dummy_column1)
null,1,John,Smith,23,40000,Sales,null
null,2,Bob,Wilson,25,45000,Sales,null
null,3,Fred,Krug,37,75000,IT,null
null,4,Jeremy,Fisher,41,110000,Marketing,null
Question is given the schema of the end result, how do I load the rest of the non-null data into the 3rd table? Any help would be much appreciated! The end results table already exists at this point so I cannot just recreate it from scratch
Yes. Hive doesn't care of the column names. Its just position of the columns that matter the most. you just have to structure your query in a way so dummy columns have nulls.
insert overwrite table tablename
select null, employeeID, first_name,last_name, age, salary, dept.deptName, null
from employee e join dept d on e.dept_id = d.dept_id;

creating the sql query for company supervisors

I have four tables
create table emp (emp_ss int, emp_name nvarchar(20));
create table comp(comp_name nvarchar(20), comp_address nvarchar(20));
create table works (emp_ss int, comp_name nvarchar(20));
create table supervises (spv_ss int, emp_ss int );
Here SUPRVISER_SS and EMP_SS are subset of SS. Now I have to find:
the name of all the companies who have more than 4 supervisors
I have made a query for the above problem but not sure whether it is correct or not
SELECT COMP_NAME , COUNT(EMP_SS) FROM WORKS
WHERE EMP_SS IN (SELECT DISTINCT SPV_SS FROM supervises)
GROUP BY COMP_NAME
HAVING COUNT(EMP_SS) > 4;
the name of supervisors who have the largest number of employees
but unable to get the required result of the above condition
SELECT SPV_SS, COUNT(*) max_ FROM supervises GROUP BY SPV_SS
You don't need to have a seperate table for supervisors unless they come with extra information that doesn't belong in the employee table, just add an extra field (foreign key) in Employee table that links to the primary key in the same table.
First question: select company just use a group by companyid clause and then check if the count of supervisors is larger than 4 for.
Second question: select count(empid) and supervisor, use group by supervisor clause and add order by clause on the count column
I explained the logic, as for the actual sql code, you're gonna have to figure that out yourself.

Return Complex data types in SQL server

I have two tables as below: One is department, the other is employee. Every department have several employees. Here is table definition:
Create table [dbo].[Department]
(
ID int not null,
Name varchar(100) not null,
revenue int not null
)
Create table [dbo].[Employee]
(
ID int not null,
DepartmentID int not null,--This is foreign key to Department
Name varchar(100) not null,
Level int not null
)
Now, I want to query all the department with employee if department revenue is more than a value. So the result will contain a department list. For every department, it should also contain employee list.
In our client, we use c# to retrieve the data, here is the interface:
public List<Department> GetHighRevenueDepartmentWithEmployee(int revenueBar)
{
throw new NotImplementedException();
}
My question here is: is it possible to get the compound result in one query/stored procedure. It is OK use one stored procedure to query multiple times, but the whole data must be return back to client in one time.(This is performance reason) How can we achieve this? I know we can define table values type here, but I cannot define a table value type which contain another table value as column inside.
Typically, you'd just return a result set with all of the columns from both tables:
SELECT
d.ID as DepartmentID,d.Name as DepartmentName,d.Revenue,
e.ID as EmployeeID,e.Name as EmployeeName,e.Level
FROM
Department d
inner join
Employee e
on d.ID = e.DepartmentID
WHERE
d.revenue > #value
ORDER BY
d.ID
Now, the data returned for the department will be repeated multiple times - once for every employee. But that's fine. Just use the first row for each new department and ignore it in the remaining rows. That would be for the first row of the result set, and for any row where the DepartmentID is different from the previous row.

How to insert values from column A of table X to column B of table Y - and order them randomly

I need to collect the values from the column "EmployeeID" of the table "Employees" and insert them into the column "EmployeeID" of the table "Incident".
At the end, the Values in the rows of the column "EmployeeID" should be arranged randomly.
More precisely;
I created 10 employees with their ID's, counting from 1 up to 10.
Those Employees, in fact the ID's, should receive random Incidents to work on.
So ... there are 10 ID's to spread on all Incidents - which might be 1000s.
How do i do this?
It's just for personal exercise on the local maschine.
I googled, but didn't find an explicit answer to my problem.
Should be simple to solve for you champs. :)
May anyone help me, please?
NOTES:
1) I've already created a column called "EmployeeID" in the table "Incident", therefore I'll need an update statement, won't I?
2) Schema:
[dbo].[EmployeeType]
[dbo].[Company]
[dbo].[Division]
[dbo].[Team]
[dbo].[sysdiagrams]
[dbo].[Incident]
[dbo].[Employees]
3) 1. Pre-solution:
CREATE TABLE IncidentToEmployee
(
IncidentToEmployeeID BIGINT IDENTITY(1,1) NOT NULL,
EmployeeID BIGINT NULL,
Incident FLOAT NULL
PRIMARY KEY CLUSTERED (IncidentToEmployeeID)
)
INSERT INTO IncidentToEmployee
SELECT
EmployeeID,
Incident
FROM dbo.Employees,
dbo.Incident
ORDER BY NEWID()
SELECT * FROM IncidentToEmployee
GO
3) 2. Output by INNER JOIN ON
In case you are wondering about the "Alias" column;
Nobody really knows which persons are behind the ID's - that's why I used an Alias column.
SELECT Employees.Alias,
IncidentToEmployee.Incident
FROM Employees
INNER JOIN
IncidentToEmployee ON
Employees.EmployeeID = IncidentToEmployee.EmployeeID
ORDER BY Alias
4) Final Solution
As I mentioned, I added at first a column called "EmployeeID" already to my "Incident" table. That's why I couldn't use an INSERT INTO statement at first and had to use an UPDATE statement. I found the most suitable solution now - without creating a new table as I did as a pre-solution.
Take a look at the following code:
ALTER Table Incident
ADD EmployeeID BIGINT NULL
UPDATE Incident
SET Incident.EmployeeID = EmployeeID
FROM Incident INNER JOIN Employees
ON Incident = EmployeeID
SELECT
EmployeeID,
Incident
FROM dbo.Employees,
dbo.Incident
ORDER BY NEWID()
Thank you all for your help - It took way longer to find a solution as I thought it would take; but I finally made it. Thanks!
UPDATE
I think you need to allocate different task to different user, a better approach will be to create a new table let's say EmployeeIncidents having columns Id(primary) , EmployeeID and IncidentID .
Now you can insert random EmployeesID and random IncidentID to new table, this way you will be able to keep records also ,
Updating Incident table will not be a smart choice.
INSERT INTO EmployeeIncidents
SELECT TOP ( 10 )
EmployeesID ,
IncidentID
FROM dbo.Employees,
dbo.Incident
ORDER BY NEWID()
Written by hand, so may need to tweak syntax, but something like this should do it. The Rand() function will give the same value unless seeded, so you can see with something like date to get randomness.
Insert Into Incidents
Select Top 10
EmployeeID
From Employees
Order By
Rand(GetDate())