Underlying rows in Group By - sql

I have a table with a certain number of columns and a primary key column (suppose OriginalKey). I perform a GROUP BY on a certain sub-set of those columns and store them in a temporary table with primary key (suppose GroupKey). At a later stage, I may need to get more details about one or more of those groupings (which can be found in the temporary table) i.e. I need to know which were the rows from the original table that formed that group. Simply put, I need to know the mappings between GroupKey and OriginalKey. What's the best way to do this? Thanks in advance.
Example:
Table Student(
StudentID INT PRIMARY KEY,
Level INT, --Grade/Class/Level depending on which country you are from)
HomeTown TEXT,
Gender CHAR)
INSERT INTO TempTable SELECT HomeTown, Gender, COUNT(*) AS NumStudents FROM Student GROUP BY HomeTown, Gender
On a later date, I would like to find out details about all towns that have more than 50 male students and know details of every one of them.

How about joining the 2 tables using the GroupKey, which, you say, are the same?
Or how about doing:
select * from OriginalTable where
GroupKey in (select GroupKey from my_temp_table)

You'd need to store the fields you grouped on in your temporary table, so you can join back to the original table. e.g. if you grouped on fieldA, fieldB, and fieldC, you'd need something like:
select original.id
from original
inner join temptable on
temptable.fieldA = original.fieldA and
temptable.fieldB = original.fieldB and
temptable.fieldC = original.fieldC

Related

How to group by one column and limit to rows where another column has the same value for all rows in group?

I have a table like this
CREATE TABLE userinteractions
(
userid bigint,
dobyr int,
-- lots more fields that are not relevant to the question
);
My problem is that some of the data is polluted with multiple dobyr values for the same user.
The table is used as the basis for further processing by creating a new table. These cases need to be removed from the pipeline.
I want to be able to create a clean table that contains unique userid and dobyr limited to the cases where there is only one value of dobyr for the userid in userinteractions.
For example I start with data like this:
userid,dobyr
1,1995
1,1995
2,1999
3,1990 # dobyr values not equal
3,1999 # dobyr values not equal
4,1989
4,1989
And I want to select from this to get a table like this:
userid,dobyr
1,1995
2,1999
4,1989
Is there an elegant, efficient way to get this in a single sql query?
I am using postgres.
EDIT: I do not have permissions to modify the userinteractions table, so I need a SELECT solution, not a DELETE solution.
Clarified requirements: your aim is to generate a new, cleaned-up version of an existing table, and the clean-up means:
If there are many rows with the same userid value but also the same dobyr value, one of them is kept (doesn't matter which one), rest gets discarded.
All rows for a given userid are discarded if it occurs with different dobyr values.
create table userinteractions_clean as
select distinct on (userid,dobyr) *
from userinteractions
where userid in (
select userid
from userinteractions
group by userid
having count(distinct dobyr)=1 )
order by userid,dobyr;
This could also be done with an not in, not exists or exists conditions. Also, select which combination to keep by adding columns at the end of order by.
Updated demo with tests and more rows.
If you don't need the other columns in the table, only something you'll later use as a filter/whitelist, plain userid's from records with (userid,dobyr) pairs matching your criteria are enough, as they already uniquely identify those records:
create table userinteractions_whitelist as
select userid
from userinteractions
group by userid
having count(distinct dobyr)=1
Just use a HAVING clause to assert that all rows in a group must have the same dobyr.
SELECT
userid,
MAX(dobyr) AS dobyr
FROM
userinteractions
GROUP BY
userid
HAVING
COUNT(DISTINCT dobyr) = 1

PostgreSQL Insert into table with subquery selecting from multiple other tables

I am learning SQL (postgres) and am trying to insert a record into a table that references records from two other tables, as foreign keys.
Below is the syntax I am using for creating the tables and records:
-- Create a person table + insert single row
CREATE TABLE person (
pname VARCHAR(255) NOT NULL,
PRIMARY KEY (pname)
);
INSERT INTO person VALUES ('personOne');
-- Create a city table + insert single row
CREATE TABLE city (
cname VARCHAR(255) NOT NULL,
PRIMARY KEY (cname)
);
INSERT INTO city VALUES ('cityOne');
-- Create a employee table w/ForeignKey reference
CREATE TABLE employee (
ename VARCHAR(255) REFERENCES person(pname) NOT NULL,
ecity VARCHAR(255) REFERENCES city(cname) NOT NULL,
PRIMARY KEY(ename, ecity)
);
-- create employee entry referencing existing records
INSERT INTO employee VALUES(
SELECT pname FROM person
WHERE pname='personOne' AND <-- ISSUE
SELECT cname FROM city
WHERE cname='cityOne
);
Notice in the last block of code, where I'm doing an INSERT into the employee table, I don't know how to string together multiple SELECT sub-queries to get both the existing records from the person and city table such that I can create a new employee entry with attributes as such:
ename='personOne'
ecity='cityOne'
The textbook I have for class doesn't dive into sub-queries like this and I can't find any examples similar enough to mine such that I can understand how to adapt them for this use case.
Insight will be much appreciated.
There doesn’t appear to be any obvious relationship between city and person which will make your life hard
The general pattern for turning a select that has two base tables giving info, into an insert is:
INSERT INTO table(column,list,here)
SELECT column,list,here
FROM
a
JOIN b ON a.x = b.y
In your case there isn’t really anything to join on because your one-column tables have no column in common. Provide eg a cityname in Person (because it seems more likely that one city has many person) then you can do
INSERT INTO employee(personname,cityname)
SELECT p.pname, c.cname
FROM
person p
JOIN city c ON p.cityname = c.cname
But even then, the tables are related between themselves and don’t need the third table so it’s perhaps something of an academic exercise only, not something you’d do in the real world
If you just want to mix every person with every city you can do:
INSERT INTO employee(personname,cityname)
SELECT pname, cname
FROM
person p
CROSS JOIN city c
But be warned, two people and two cities will cause 4 rows to be inserted, and so on (20 people and 40 cities, 800 rows. Fairly useless imho)
However, I trust that the general pattern shown first will suffice for your learning; write a SELECT that shows the data you want to insert, then simply write INSERT INTO table(columns) above it. The number of columns inserted to must match the number of columns selected. Don’t forget that you can select fixed values if no column from the query has the info (INSERT INTO X(p,c,age) SELECT personname, cityname, 23 FROM ...)
The following will work for you:
INSERT INTO employee
SELECT pname, cname FROM person, city
WHERE pname='personOne' AND cname='cityOne';
This is a cross join producing a cartesian product of the two tables (since there is nothing to link the two). It reads slightly oddly, given that you could just as easily have inserted the values directly. But I assume this is because it is a learning exercise.
Please note that there is a typo in your create employee. You are missing a comma before the primary key.

creating the sql query for company supervisors

I have four tables
create table emp (emp_ss int, emp_name nvarchar(20));
create table comp(comp_name nvarchar(20), comp_address nvarchar(20));
create table works (emp_ss int, comp_name nvarchar(20));
create table supervises (spv_ss int, emp_ss int );
Here SUPRVISER_SS and EMP_SS are subset of SS. Now I have to find:
the name of all the companies who have more than 4 supervisors
I have made a query for the above problem but not sure whether it is correct or not
SELECT COMP_NAME , COUNT(EMP_SS) FROM WORKS
WHERE EMP_SS IN (SELECT DISTINCT SPV_SS FROM supervises)
GROUP BY COMP_NAME
HAVING COUNT(EMP_SS) > 4;
the name of supervisors who have the largest number of employees
but unable to get the required result of the above condition
SELECT SPV_SS, COUNT(*) max_ FROM supervises GROUP BY SPV_SS
You don't need to have a seperate table for supervisors unless they come with extra information that doesn't belong in the employee table, just add an extra field (foreign key) in Employee table that links to the primary key in the same table.
First question: select company just use a group by companyid clause and then check if the count of supervisors is larger than 4 for.
Second question: select count(empid) and supervisor, use group by supervisor clause and add order by clause on the count column
I explained the logic, as for the actual sql code, you're gonna have to figure that out yourself.

How do I insert data from one table to another when there is unequal number of rows in one another?

I have a table named People that has 19370 rows with playerID being the primary column. There is another table named Batting, which has playerID as a foreign key and has 104324 rows.
I was told to add a new column in the People table called Total_HR, which is included in the Batting table. So, I have to insert that column data from the Batting table into the People table.
However, I get the error:
Msg 515, Level 16, State 2, Line 183 Cannot insert the value NULL into
column 'playerID', table 'Spring_2019_BaseBall.dbo.People'; column
does not allow nulls. INSERT fails. The statement has been terminated.
I have tried UPDATE and INSERT INTO SELECT, however got the same error
insert into People (Total_HR)
select sum(HR) from Batting group by playerID
I expect the output to populate the column Total_HR in the People table using the HR column from the Batting table.
You could use a join
BEGIN TRAN
Update People
Set Total_HR = B.HR_SUM
from PEOPLE A
left outer join
(Select playerID, sum(HR) HR_SUM
from Batting
group by playerID) B on A.playerID = B.playerID
Select * from People
ROLLBACK
Notice that I've put this code in a transaction block so you can test the changes before you commit
From the error message, it seems that playerID is a required field in table People.
You need to specify all required fields of table People in the INSERT INTO clause and provide corresponding values in the SELECT clause.
I added field playerID below, but you might need to add additional required fields as well.
insert into People (playerID, Total_HR)
select playerID, sum(HR) from Batting group by playerID
It is strange, however, that you want to insert rows in a table that should already be there. Otherwise, you could not have a valid foreign key on field playerID in table Batting... If you try to insert such rows from table Batting into table People, you might get another error (violation of PRIMARY KEY constraint)... Unless... you are creating the database just now and you want to populate empty table People from filled/imported table Batting before adding the actual foreign key constraint to table People. Sorry, I will not question your intentions. I personally would consider to update the query somewhat so that it will not attempt to insert any rows that already exist in table People:
insert into People (playerID, Total_HR)
select Batting.playerID, sum(Batting.HR)
from Batting
left join People on People.playerID = Batting.playerID
where People.playerID is null and Batting.playerID is not null
group by playerID
You need to calculate the SUM and then join this result to the People table to bring into the same rows both Total_HR column from People and the corresponding SUM calculated from Batting.
Here is one way to write it. I used CTE to make is more readable.
WITH
CTE_Sum
AS
(
SELECT
Batting.playerID
,SUM(Batting.HR) AS TotalHR_Src
FROM
Batting
GROUP BY
Batting.playerID
)
,CTE_Update
AS
(
SELECT
People.playerID
,People.Total_HR
,CTE_Sum.TotalHR_Src
FROM
CTE_Sum
INNER JOIN People ON People.playerID = CTE_Sum.playerID
)
UPDATE CTE_Update
SET
Total_HR = TotalHR_Src
;

How to perform a mass SQL insert to one table with rows from two seperate tables

I need some T-SQL help. We have an application which tracks Training Requirements assigned to each employee (such as CPR, First Aid, etc.). There are certain minimum Training Requirements which all employees must be assigned and my HR department wants me to give them the ability to assign those minimum Training Requirements to all personnel with the click of a button. So I have created a table called TrainingRequirementsForAllEmployees which has the TrainingRequirementID's of those identified minimum TrainingRequirements.
I want to insert rows into table Employee_X_TrainingRequirements for every employee in the Employees table joined with every row from TrainingRequirementsForAllEmployees.
I will add abbreviated table schema for clarity.
First table is Employees:
EmployeeNumber PK char(6)
EmployeeName varchar(50)
Second Table is TrainingRequirementsForAllEmployees:
TrainingRequirementID PK int
Third table (the one I need to Insert Into) is Employee_X_TrainingRequirements:
TrainingRequirementID PK int
EmployeeNumber PK char(6)
I don't know what the Stored Procedure should look like to achieve the results I need. Thanks for any help.
cross join operator is suitable when cartesian product of two sets of data is needed. So in the body of your stored procedure you should have something like:
insert into Employee_X_TrainingRequirements (TrainingRequirementID, EmployeeNumber)
select r.TrainingRequirementID, e.EmployeeNumber
from Employees e
cross join TrainingRequirementsForAllEmployees r
where not exists (
select 1 from Employee_X_TrainingRequirements
where TrainingRequirementID = r.TrainingRequirementID
and EmployeeNumber = e.EmployeeNumber
)