Aggregation Sum in SQL (Joins) - sql

I'm having some trouble with summing some column in my query with aggregation.
It's a bit difficult to describe what is happening but I'll try my best:
I have 3 tables - details, extra details and places.
Places is a table that contains places in the world. Details contains details about events that happened, and extra details provides some more data on the events. Each place has an ID and a ParentID (Like New York has an ID and it's parent ID is the US. Something like that). The ID of the event(details) appears a number of times as a column in the extra details table. The extra details table also holds the ID of the place that that event occurred at.
OK after all of that, what I'm trying to achieve is, for each place, the sum of the events that happened there. I know it sounds very specific, but it's what the client asked. Anyhow, example of what I'm trying to get to:
NewYork 60, Chicago 20, Houston 10 Then the US will have 90. And it has several levels.
So this is what I was trying to do:
With C(ID, NAME, COUNTT, ROOT_ID) as
(
SELECT d.ID, d.NAME,
(SELECT COUNT(LX.ID) as COUNTT
FROM EXTRA LX
RIGHT JOIN d ON LX.PLACE_ID = d.ID -- ****
GROUP BY d.ID, d.NAME),
d.ID as ROOT_ID
FROM PLACES d
UNION ALL
SELECT d.ID, d.NAME,
(SELECT COUNT(LX.ID) as COUNTT
FROM EXTRA LX
RIGHT JOIN d ON LX.PLACE_ID = d.ID
GROUP BY d.ID, d.NAME),
C.ROOT_ID
FROM PLACES dx
INNER JOIN C ON dx.PARENT_ID = C.ID
)
SELECT p.ID, p.NAME, S.SumIncludingChildren
FROM places p
INNER JOIN (
SELECT ROOT_ID, SUM(COUNTT) as SumIncludingChildren
FROM C
GROUP BY ROOT_ID
) S
ON p.ID = S.ROOT_ID
ORDER BY p.ID;
The details table is only for showing their data. I'll add that later. It's only comparing the respective columns. To making it work I don't need that. Only for the site data.
It doesn't work because it doesn't recognizes the 'd' where the '****' is. If I'll put a 'new instance' of that table, it won't work either. So I tried to replicate what the right join by doing 'NOT EXISTS IN' on a query that gets all the places instead of the right join...on. Same problem.
Maybe I don't get something. But I'm really seeking a solution and some explanation. I know my code isn't perfect.
Thanks in advance.
EDIT: I'm using OracleSQL on Toad 10.6

create table p(id number, up number, name varchar2(100));
create table e(id number, pid number, dsc varchar2(100));
insert into p values (1, null, 'country');
insert into p values (2, 1, 'center');
insert into p values (3, 1, 'province');
insert into p values (4, 2, 'capital');
insert into p values (5, 2, 'suburb');
insert into p values (6, 3, 'forest');
insert into p values (7, 3, 'village');
insert into p values (8, 7, 'shed');
insert into p values (9, 2, 'gov');
insert into e values (1, 8, 'moo');
insert into e values (2, 8, 'clank');
insert into e values (3, 7, 'sowing');
insert into e values (4, 6, 'shot');
insert into e values (5, 6, 'felling');
insert into e values (6, 5, 'train');
insert into e values (7, 5, 'cottage');
insert into e values (8, 5, 'rest');
insert into e values (9, 4, 'president');
insert into e values (10,1, 'GNP');
commit;
with
places as
(select id,
up,
connect_by_root id as root,
level lvl
from p
connect by prior id = up),
ev_stats as
(select root as place, max(lvl) as max_lvl, count(e.id) as ev_count
from places left outer join e
on places.id = e.pid
group by root)
select max_lvl, p.name, ev_count
from ev_stats inner join p on p.id = ev_stats.place
order by max_lvl desc;

Related

How many elements in one column are linked to an element other column?

Consider I have two tables
Courses Program
---------------------------
course_ID program_id
course_title program_name
program_ID
Now, I want to check no of courses(by course_id) offered by each program (program_id).
select c.program_id ,p.program_name, count(course_id)
from courses c
join Program p on c.Program_id =p.Program_id
group by program_id,program_name
If I understood you correctly, you're searching for a GROUP BY and a corresponding aggregate.
--Creating sample tables and data
SELECT course_ID, course_title, program_ID
INTO #courses
FROM (
VALUES (0, 'course_0', 0),
(1, 'course_1', 0),
(2, 'course_2', 0),
(3, 'course_3', 0),
(4, 'course_4', 1),
(5, 'course_5', 1),
(NULL, 'course_6', 1)
) AS C (course_ID, course_title, program_ID)
SELECT program_ID, program_title
INTO #programs
FROM (
VALUES (0, 'program_0'),
(1, 'program_1')
) AS P (program_ID, program_title)
and after that execute the query
SELECT P.program_title, COUNT(C.course_ID) AS courses_amount
FROM #courses C
INNER JOIN #programs P ON C.program_ID = P.program_ID
GROUP BY P.program_ID, P.program_title
So you basically GROUP BY the value to which you to aggregate to and COUNT the 'course_id'.
COUNT(C.course_ID) only counts actual values and will ignore NULLs.
If you want to count the NULLs as well, just use COUNT(*).
EDIT: Forgot the result...
So it'll look like this:
program_title
courses_amount
program_0
4
program_1
2

How to I find the person who has taught the most classes

I want to try and find the employee who has taught the most classes as the position Teacher. So in this I want to print out Nick, as he has taught the most classes as a Teacher.
However, I am getting the error:
ERROR: column "e.name" must appear in the GROUP BY clause or be used in an aggregate function Position: 24
CREATE TABLE employees (
id integer primary key,
name text
);
CREATE TABLE positions (
id integer primary key,
name text
);
CREATE TABLE teaches (
id integer primary key,
class text,
employee integer,
position integer,
foreign key (employee) references employees(id),
foreign key (position) references positions(id)
);
INSERT INTO employees (id, name) VALUES
(1, 'Clive'), (2, 'Johnny'), (3, 'Sam'), (4, 'Nick');
INSERT INTO positions (id, name) VALUES
(1, 'Assistant'), (2, 'Teacher'), (3, 'CEO'), (4, 'Manager');
INSERT INTO teaches (id, class, employee, position) VALUES
(1, 'Dancing', 1, 1), (2, 'Gardening', 1, 2),
(3, 'Dancing', 1, 2), (4, 'Baking', 4, 2),
(5, 'Gardening', 4, 2), (6, 'Gardening', 4, 2),
(7, 'Baseball', 4, 1), (8, 'Baseball', 2, 1),
(9, 'Baseball', 4, 2);
The SQL statement I am trying to use:
SELECT count(t.class), e.name
FROM positions p
JOIN teaches t
ON p.id = t.position
JOIN employees e
ON e.id = t.employee
WHERE p.name = 'Teacher'
GROUP BY t.employee;
I've been working on this on a sql fiddle:
http://www.sqlfiddle.com/#!17/a8e19c/3
Your query looks pretty good. You just need to fix the GROUP BY clause, so it is consistent with the columns in the SELECT clause. Then ORDER BY and LIMIT:
SELECT count(*) cnt_classes, e.name
FROM positions p
INNER JOIN teaches t ON p.id = t.position
INNER JOIN employees e ON e.id = t.employee
WHERE p.name = 'Teacher'
GROUP BY e.id --> primary key of "employees"
ORDER BY cnt_classes DESC --> order by descending count of classes
LIMIT 1 --> keep the first row only
In your select you are using aggregate COUNT that counts all lines in each group (GROUP BY t.employee) but you don't aggregate e.name.
So for Nick you basically select 4 rows each for one class that have two columns - class name and teacher name. Then you ask server to count class names in Nicks group (by his employee id), that aggregates 4 rows into one with value 4 but you don't do anything about teacher name so you are left with invalid structure where you have 1 row for classes count column and 4 rows for teacher name. Same for other teachers. And that's what server is complaining about. Easiest way to fix that is to add e.name to GROUP BY, that will squeeze those 4 rows of same value into one.
To get teacher that teaches most classes you then only need to sort results by class count descending order and limit result count to 1. That will give you result row with highest class count.
Updated fiddle: http://www.sqlfiddle.com/#!17/a8e19c/7
You're getting the error because you need to need to have every column you're selecting (e.name in this example in the GROUP BY clause, otherwise SQL doesn't know how to group and return a count for that column. You'll also want to use TOP(1) and order by if you want to return the person with the most.
SELECT TOP(1) count(*), e.name
FROM teaches t
INNER JOIN positions p ON t.position = p.id
INNER JOIN employees e ON e.id = t.employee
WHERE p.name = 'Teacher'
GROUP BY e.name
ORDER BY count(*) DESC;

HAVING clause with subquery -- Checking if group has at least one row matching conditions

Suppose I have the following table
DROP TABLE IF EXISTS #toy_example
CREATE TABLE #toy_example
(
Id int,
Pet varchar(10)
);
INSERT INTO #toy
VALUES (1, 'dog'),
(1, 'cat'),
(1, 'emu'),
(2, 'cat'),
(2, 'turtle'),
(2, 'lizard'),
(3, 'dog'),
(4, 'elephant'),
(5, 'cat'),
(5, 'emu')
and I want to fetch all Ids that have certain pets (for example either cat or emu, so Ids 1, 2 and 5).
DROP TABLE IF EXISTS #Pets
CREATE TABLE #Pets
(
Animal varchar(10)
);
INSERT INTO #Pets
VALUES ('cat'),
('emu')
SELECT Id
FROM #toy_example
GROUP BY Id
HAVING COUNT(
CASE
WHEN Pet IN (SELECT Animal FROM #Pets)
THEN 1
END
) > 0
The above gives me the error Cannot perform an aggregate function on an expression containing an aggregate or a subquery. I have two questions:
Why is this an error? If I instead hard code the subquery in the HAVING clause, i.e. WHEN Pet IN ('cat','emu') then this works. Is there a reason why SQL server (I've checked with SQL server 2017 and 2008) does not allow this?
What would be a nice way to do this? Note that the above is just a toy example. The real problem has many possible "Pets", which I do not want to hard code. It would be nice if the suggested method could check for multiple other similar conditions too in a single query.
If I followed you correctly, you can just join and aggregate:
select t.id, count(*) nb_of_matches
from #toy_example t
inner join #pets p on p.animal = t.pet
group by t.id
The inner join eliminates records from #toy_example that have no match in #pets. Then, we aggregate by id and count how many recors remain in each group.
If you want to retain records that have no match in #pets and display them with a count of 0, then you can left join instead:
select t.id, count(*) nb_of_records, count(p.animal) nb_of_matches
from #toy_example t
left join #pets p on p.animal = t.pet
group by t.id
How about this approach?
SELECT e.Id
FROM #toy_example e JOIN
#pets p
ON e.pet = p.animal
GROUP BY e.Id
HAVING COUNT(DISTINCT e.pet) = (SELECT COUNT(*) FROM #pets);

SQL Query to satisfy 2 conditions

I'm new to SQL and after designing the database, i'm having trouble with some queries. The query i'm currently struggling with states:
"A list of the customers who have ordered at least one project with a higher than average expected duration."
SELECT Customer.name
FROM Project, Customer
WHERE Project.c_id = Customer.c_id AND Project.exp_duration > AVG(Project.exp_duration)
I tried to implement this code but i keep gettin the following error message : "An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference."
Can someone help me with this? I've thought about using joins but i can't get it to work either.
Thanks in advance!
Replace the table variables (#Project & #Customer) with your real tables (Project & Customer).
DECLARE #Project TABLE
(
p_id INT,
exp_duration DECIMAL(18,2),
c_id INT
)
DECLARE #Customer TABLE
(
c_id INT,
name VARCHAR(20)
)
INSERT #Project VALUES (1, 10, 1), (2, 5, 1), (3, 20, 1), (4, 10, 2), (5, 15, 2), (6, 20, 1)
INSERT #Customer VALUES (1, 'C1'), (2, 'C2')
-- average duration
-- SELECT AVG(exp_duration) FROM #Project
SELECT DISTINCT C.name
FROM #Customer C INNER JOIN #Project P ON C.c_id = P.c_id
WHERE p.exp_duration > (SELECT AVG(exp_duration) FROM #Project)
The following query gives the list of Customers who have ordered at least one project (i.e. being a part of one or more projects) and whose ExpectedDuration is greater than the Average ExpectedDuration.
I have used left outer join, group by, count and avg aggregate functions.
Select
C.CustomerID,
C.Name
From SampleCustomer C
Left Join SampleProject P
On C.CustomerID = P.CustomerID
Where P.ExpectedDuration > (Select Avg(ExpectedDuration) From SampleProject Where CustomerID = C.CustomerID)
Group By C.CustomerID, C.Name
Having Count(P.ProjectID) >= 1
Order By C.CustomerID;

how to do get multiple columns + count in a single query?

I usually don't ask for "scripts" but for mechanisms but I think that in this case if i'll see an example I would understand the principal.
I have three tables as shown below:
and I want to get the columns from all three, plus a count of the number of episodes in each series and to get a result like this:
Currently, I am opening multiple DB threads and I am afraid that as I get more visitors on my site it will eventually respond really slowly.
Any ideas?
Thanks a lot!
First join all the tables together to get the columns. Then, to get a count, use a window function:
SELECT count(*) over (partition by seriesID) as NumEpisodesInSeries,
st.SeriesId, st.SeriesName, et.episodeID, et.episodeName,
ct.createdID, ct.CreatorName
FROM series_table st join
episode_table et
ON et.ofSeries = st.seriesID join
creator_table ct
ON ct.creatorID = st.byCreator;
Do your appropriate joins between the tables and their IDs as you would expect, and also join onto the result of a subquery that determines the total episode count using the Episodes table.
SELECT SeriesCount.NumEpisodes AS #OfEpisodesInSeries,
S.id AS SeriesId,
S.name AS SeriesName,
E.id AS EpisodeId,
E.name AS EpisodeName,
C.id AS CreatorId,
C.name AS CreatorName
FROM
Series S
INNER JOIN
Episodes E
ON E.seriesId = S.id
INNER JOIN
Creators C
ON S.creatorId = C.id
INNER JOIN
(
SELECT seriesId, COUNT(id) AS NumEpisodes
FROM Episodes
GROUP BY seriesId
) SeriesCount
ON SeriesCount.seriesId = S.id
SQL Fiddle Schema:
CREATE TABLE Series (id int, name varchar(20), creatorId int)
INSERT INTO Series VALUES(1, 'Friends', 1)
INSERT INTO Series VALUES(2, 'Family Guy', 2)
INSERT INTO Series VALUES(3, 'The Tonight Show', 1)
CREATE TABLE Episodes (id int, name varchar(20), seriesId int)
INSERT INTO Episodes VALUES(1, 'Joey', 1)
INSERT INTO Episodes VALUES(2, 'Ross', 1)
INSERT INTO Episodes VALUES(3, 'Phoebe', 1)
INSERT INTO Episodes VALUES(4, 'Stewie', 2)
INSERT INTO Episodes VALUES(5, 'Kevin Kostner', 3)
INSERT INTO Episodes VALUES(6, 'Brad Pitt', 3)
INSERT INTO Episodes VALUES(7, 'Tom Hanks', 3)
INSERT INTO Episodes VALUES(8, 'Morgan Freeman', 3)
CREATE TABLE Creators (id int, name varchar(20))
INSERT INTO Creators VALUES(1, 'Some Guy')
INSERT INTO Creators VALUES(2, 'Seth McFarlane')
Try this:
http://www.sqlfiddle.com/#!3/5f938/17
select min(ec.num) as NumEpisodes,s.Id,S.Name,
Ep.ID as EpisodeID,Ep.name as EpisodeName,
C.ID as CreatorID,C.Name as CreatorName
from Episodes ep
join Series s on s.Id=ep.SeriesID
join Creators c on c.Id=s.CreatorID
join (select seriesId,count(*) as Num from Episodes
group by seriesId) ec on s.id=ec.seriesID
group by s.Id,S.Name,Ep.ID,Ep.name,C.ID,C.Name
Thanks Gordon
I would do the following:
SELECT (SELECT Count(*)
FROM episodetbl e1
WHERE e1.ofseries = s.seriesid) AS "#ofEpisodesInSeries",
s.seriesid,
s.seriesname,
e.episodeid,
e.episodename,
c.creatorid,
c.creatorname
FROM seriestbl s
INNER JOIN creatortbl c
ON s.bycreator = c.creatorid
INNER JOIN episodetbl e
ON e.ofseries = s.seriesid