I'm trying to get the number of workers on each project in SQL, but the GROUP BY part of the query gives an error. I have a separate table (workersonprojects) that matches project IDs with the workers working on them. The query should select every project column plus the number of workers on that project.
SELECT projectid, name, developerleader, consultantleader, projectleader, budget, count(workerid) AS workers
FROM projects
JOIN workersonprojects on (projectid=project)
JOIN workers on (worker=workerid)
GROUP BY projectid;
That's because you are not including all the non-aggregated columns from the SELECT list in your GROUP BY clause; your query only has GROUP BY projectid;. One fix is to count the workers per project in a derived table and join that to projects:
SELECT p.projectid,
p.name,
p.developerleader,
p.consultantleader,
p.projectleader,
p.budget,
xxx.workers
FROM projects p
-- one row per project, holding its worker count
JOIN (SELECT wp.project, COUNT(w.workerid) AS workers
FROM workersonprojects wp
JOIN workers w ON wp.worker = w.workerid
GROUP BY wp.project) xxx ON p.projectid = xxx.project;
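The derived-table approach can be tried on throwaway data. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in database; the rows are hypothetical and the projects table is trimmed to projectid, name and budget. Because the derived table returns exactly one row per project, the outer query needs no GROUP BY at all.

```python
import sqlite3

# Hypothetical sample data; table/column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (projectid INTEGER PRIMARY KEY, name TEXT, budget INTEGER);
    CREATE TABLE workers (workerid INTEGER PRIMARY KEY);
    CREATE TABLE workersonprojects (project INTEGER, worker INTEGER);
    INSERT INTO projects VALUES (1, 'Alpha', 100), (2, 'Beta', 200);
    INSERT INTO workers VALUES (10), (11), (12);
    INSERT INTO workersonprojects VALUES (1, 10), (1, 11), (2, 12);
""")

# The derived table wc holds one row per project with its worker count,
# so joining it back to projects needs no outer GROUP BY.
rows = conn.execute("""
    SELECT p.projectid, p.name, p.budget, wc.workers
    FROM projects p
    JOIN (SELECT wp.project, COUNT(w.workerid) AS workers
          FROM workersonprojects wp
          JOIN workers w ON wp.worker = w.workerid
          GROUP BY wp.project) wc ON p.projectid = wc.project
    ORDER BY p.projectid
""").fetchall()
print(rows)  # [(1, 'Alpha', 100, 2), (2, 'Beta', 200, 1)]
```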
You are using an aggregate function, COUNT(), so you have to group by every other selected column, i.e. each column that is not inside COUNT().
As the other answers say:
SELECT projectid, name, developerleader, consultantleader, projectleader, budget, count(workerid) AS workers
FROM projects
JOIN workersonprojects on (projectid=project)
JOIN workers on (worker=workerid)
GROUP BY projectid, name, developerleader, consultantleader, projectleader, budget
In a GROUP BY query it also helps to qualify each column with its table name, so it is clear which table projectid (and every other column) comes from:
SELECT projects.projectid, projects.name,
projects.developerleader, projects.consultantleader, projects.projectleader,
projects.budget,
count(workers.workerid) AS workers
FROM projects
JOIN workersonprojects on
(workersonprojects.project=projects.projectid)
JOIN workers on (workers.workerid=workersonprojects.worker)
GROUP BY projects.projectid, projects.name,
projects.developerleader, projects.consultantleader, projects.projectleader,
projects.budget;
Basically, the GROUP BY clause must include every column that is not used inside an aggregate function. In this case you have selected projectid, name, developerleader, consultantleader, projectleader and budget, but only workerid is inside the COUNT() function. So add these columns to the GROUP BY clause and you have a working query.
SELECT projectid,
name,
developerleader,
consultantleader,
projectleader,
budget,
COUNT(workerid) AS workers
FROM projects
JOIN workersonprojects
ON (projectid=project)
JOIN workers
ON (worker=workerid)
GROUP BY PROJECTID,
name,
developerleader,
consultantleader,
projectleader,
budget;
You have to group by every attribute in the SELECT statement that is not aggregated, so list all of those attributes in the GROUP BY clause.
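A quick way to check the fully grouped query is to run it against throwaway data. The sketch below uses Python's sqlite3 module with hypothetical rows and only the projectid, name and budget project columns. Note that SQLite permits bare non-grouped columns, so it would also accept the question's GROUP BY projectid; the sketch therefore shows the portable form that SQL Server, Oracle and standard SQL require rather than reproducing the error.

```python
import sqlite3

# Hypothetical sample data with the question's join columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (projectid INTEGER PRIMARY KEY, name TEXT, budget INTEGER);
    CREATE TABLE workers (workerid INTEGER PRIMARY KEY);
    CREATE TABLE workersonprojects (project INTEGER, worker INTEGER);
    INSERT INTO projects VALUES (1, 'Alpha', 100), (2, 'Beta', 200);
    INSERT INTO workers VALUES (10), (11), (12);
    INSERT INTO workersonprojects VALUES (1, 10), (1, 11), (2, 12);
""")

# Every selected column outside COUNT() also appears in GROUP BY.
rows = conn.execute("""
    SELECT projectid, name, budget, COUNT(workerid) AS workers
    FROM projects
    JOIN workersonprojects ON projectid = project
    JOIN workers ON worker = workerid
    GROUP BY projectid, name, budget
    ORDER BY projectid
""").fetchall()
print(rows)  # [(1, 'Alpha', 100, 2), (2, 'Beta', 200, 1)]
```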
Related
In SQL Server, I'm trying to group some information by salesperson, as follows:
I have 2 tables: Positions and Clients
In the Positions table I have the columns Client_ID, Balance and Acquisition_Cost, and in the Clients table I use the columns Client_ID and Sales_person.
I want to group Client_ID, Balance and Acquisition_Cost (Positions table) by Sales_person (Clients table).
I tried this:
SELECT Positions.Client_ID, Positions.Balance, Positions.Acquisition_cost
FROM Positions
INNER JOIN Clients ON Positions.Client_ID = Clients.Client_ID
GROUP BY Sales_person
It gives me "Positions.Client_ID is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause".
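I should mention that I'm pretty new to SQL, so that message doesn't mean much to me.
You need to add Positions.Client_ID to the GROUP BY clause, because it appears in the SELECT list; if you don't actually want Positions.Client_ID in the output, remove it from the SELECT and there is no need to add it to the GROUP BY. Balance and Acquisition_cost need the same treatment: each must either be grouped or wrapped in an aggregate such as SUM().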
For any column in your SELECT that you aren't including in the GROUP BY, you need to use some kind of aggregate function (MAX, SUM, etc.) on the column. So you could write it like this:
SELECT Positions.Client_ID
, Clients.Sales_person
, SUM(Positions.Balance) Balance_Sum
, SUM(Positions.Acquisition_cost) Acquisition_Cost_Sum
FROM Positions
INNER JOIN Clients ON Positions.Client_ID = Clients.Client_ID
GROUP BY Positions.Client_ID
, Clients.Sales_person
If you only want the totals by Sales_person and not by Client_ID, you can just remove Client_ID from both the SELECT and the GROUP BY:
SELECT Clients.Sales_person
, SUM(Positions.Balance) Balance_Sum
, SUM(Positions.Acquisition_cost) Acquisition_Cost_Sum
FROM Positions
INNER JOIN Clients ON Positions.Client_ID = Clients.Client_ID
GROUP BY Clients.Sales_person
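To see how the two groupings differ, here is a minimal sketch with Python's sqlite3 module and hypothetical rows (two clients for Ann, one for Bob). Grouping by salesperson and client keeps one row per client; dropping Client_ID from both SELECT and GROUP BY folds those rows into one total per salesperson.

```python
import sqlite3

# Hypothetical data: Ann has clients 1 and 2, Bob has client 3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Clients (Client_ID INTEGER PRIMARY KEY, Sales_person TEXT);
    CREATE TABLE Positions (Client_ID INTEGER, Balance INTEGER, Acquisition_cost INTEGER);
    INSERT INTO Clients VALUES (1, 'Ann'), (2, 'Ann'), (3, 'Bob');
    INSERT INTO Positions VALUES (1, 100, 90), (1, 50, 40), (2, 10, 5), (3, 70, 60);
""")

# Both non-aggregated columns are grouped: one row per (salesperson, client).
per_client = conn.execute("""
    SELECT Clients.Sales_person, Positions.Client_ID,
           SUM(Positions.Balance) AS Balance_Sum,
           SUM(Positions.Acquisition_cost) AS Acquisition_Cost_Sum
    FROM Positions
    INNER JOIN Clients ON Positions.Client_ID = Clients.Client_ID
    GROUP BY Clients.Sales_person, Positions.Client_ID
    ORDER BY Clients.Sales_person, Positions.Client_ID
""").fetchall()

# Client_ID removed from SELECT and GROUP BY: one row per salesperson.
per_person = conn.execute("""
    SELECT Clients.Sales_person,
           SUM(Positions.Balance) AS Balance_Sum,
           SUM(Positions.Acquisition_cost) AS Acquisition_Cost_Sum
    FROM Positions
    INNER JOIN Clients ON Positions.Client_ID = Clients.Client_ID
    GROUP BY Clients.Sales_person
    ORDER BY Clients.Sales_person
""").fetchall()
print(per_client)  # [('Ann', 1, 150, 130), ('Ann', 2, 10, 5), ('Bob', 3, 70, 60)]
print(per_person)  # [('Ann', 160, 135), ('Bob', 70, 60)]
```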
SELECT a.Client_ID, b.Sales_person,
SUM(a.Balance) AS Balance_Sum,
SUM(a.Acquisition_cost) AS Acquisition_Cost_Sum
FROM Positions AS a
INNER JOIN Clients AS b ON a.Client_ID = b.Client_ID
GROUP BY a.Client_ID, b.Sales_person
Thanks all of you guys, here's the code that works for me in this case:
SELECT Positions.Client_Id, Clients.Sales_person, SUM(Positions.Balance) as sum_balance_impacted, SUM(Positions.Acquisition_cost) as sum_acquisition_cost
FROM Positions
INNER JOIN Clients ON Positions.Client_Id= Clients.Client_Id
GROUP BY Clients.Sales_person, Positions.Client_Id
The result has many duplicate rows (every column repeats).
How can I use GROUP BY without an inner join?
I need to use a subquery in this case.
This is a fictional example, so don't worry about its logic or meaning. I need to use GROUP BY in a subquery over several tables.
I can use GROUP BY with an inner join, but in this case I can't use an inner join.
select
NAME,
AGE,
JOB
from (
select
pe.name NAME,
pe.age AGE,
jb.work JOB
from
pearson pe,
job jb
)
group by NAME, AGE, JOB
Yes, you can use GROUP BY in this case, but you need to give the inner query (the derived table) an alias, as shown below:
SELECT A.NAME, A.AGE, A.JOB
FROM (
SELECT pe.name NAME, pe.age AGE, jb.work JOB
FROM pearson pe, job jb
) A
GROUP BY A.NAME, A.AGE, A.JOB;
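The sketch below, using Python's sqlite3 module and hypothetical rows, shows why the question sees repeated rows and how grouping the aliased derived table removes them: the comma join is a cross join, so two people times two identical job rows give four rows. Be aware that SQLite happens to accept a derived table without an alias, whereas MySQL rejects it ("Every derived table must have its own alias") and SQL Server also requires one, so the alias A is kept here for portability.

```python
import sqlite3

# Hypothetical data; the duplicate 'clerk' rows cause repeated output.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pearson (name TEXT, age INTEGER);
    CREATE TABLE job (work TEXT);
    INSERT INTO pearson VALUES ('Ann', 30), ('Bob', 40);
    INSERT INTO job VALUES ('clerk'), ('clerk');
""")

# The comma join is a cross join: 2 people x 2 job rows = 4 rows.
inner = "SELECT pe.name AS NAME, pe.age AS AGE, jb.work AS JOB FROM pearson pe, job jb"
raw = conn.execute(inner).fetchall()

# Grouping the aliased derived table collapses identical rows.
grouped = conn.execute(f"""
    SELECT A.NAME, A.AGE, A.JOB
    FROM ({inner}) A
    GROUP BY A.NAME, A.AGE, A.JOB
    ORDER BY A.NAME
""").fetchall()
print(len(raw))  # 4
print(grouped)   # [('Ann', 30, 'clerk'), ('Bob', 40, 'clerk')]
```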
I'm new to SQL and I'm currently learning how to make reports in Visual Studio. I need to make a table, a graph and a few other things. I decided to do the matrix last, and now I'm stuck. I write my queries in SQL Server.
I have two tables: Staff (empID, StaffLevel, Surname) and WorkOfArt (artID, name, curator, helpingCurator). The Curator and HelpingCurator columns contain empID values.
I'd like my matrix to show every empID with the number of paintings where that employee is the Curator and the number where they are the Helping Curator (so I want three columns: empID, count(curator), count(helpingCurator)).
Select Staff.empID, count(WorkOfArt.Curator) as CuratorTotal
FROM Staff, WorkOfArt
WHERE Staff.empID=WorkOfArt.Curator
and Staff.StaffLevel<7
group by Staff.empID;
Select Staff.empID, count(WorkOfArt.HelpingCurator) as HelpingCuratorTotal
FROM Staff, WorkOfArt
WHERE Staff.empID=WorkOfArt.HelpingCurator
and Staff.StaffLevel<7
group by Staff.empID;
I created those two queries and they work perfectly fine, but I need it in one query.
I tried:
Select Staff.empID, count(WorkOfArt.Curator) as CuratorTotal,
COUNT(WorkOfArt.HelpingCurator) as HelpingCuratorTotal
FROM Staff FULL OUTER JOIN WorkOfArt on Staff.empID=WorkOfArt.Curator
and Staff.empID=WorkOfArt.HelpingCurator
WHERE Staff.StaffLevel<7
group by Staff.empID;
(I also tried LEFT and RIGHT OUTER JOIN.)
This one gives me a table with the empIDs, but both count columns contain only 0s. I also tried:
Select Staff.empID, count(WorkOfArt.Curator) as CuratorTotal,
COUNT(WorkOfArt.HelpingCurator) as HelpingCuratorTotal
FROM Staff, WorkOfArt
WHERE Staff.empID=WorkOfArt.Curator
and Staff.empID=WorkOfArt.HelpingCurator
and Staff.StaffLevel<7
group by Staff.empID;
And this one returns no rows, only the column names.
I have no idea what to do next. I tried to find the answer on Google, but all the explanations I found were far too advanced for me, so I couldn't understand them... Could you please help me? Hints are fine as well.
The easiest way to do this is probably with correlated subqueries (inner SELECTs) in the SELECT clause, something like this:
Select
S.empID,
(select count(*) from WorkOfArt C where C.Curator = S.empID)
as CuratorTotal,
(select count(*) from WorkOfArt H where H.HelpingCurator = S.empID)
as HelpingCuratorTotal
FROM Staff S
WHERE S.StaffLevel<7
group by S.empID;
This way, rows where the employee has the other role don't interfere with the count. If the tables are really large or you have many different roles, a more complex query that groups the WorkOfArt rows first might perform better, since this version reads the WorkOfArt table twice.
From a performance perspective, the following query is probably a little more efficient (note that the inner joins leave out employees who have no works in one of the two roles):
select s.EmpId, CuratorForCount, HelpingCuratorForCount
from Staff s
inner join ( select Curator, count(*) as CuratorForCount
from WorkOfArt
group by Curator) mainCurator on s.EmpId = mainCurator.Curator
inner join ( select HelpingCurator, count(*) as HelpingCuratorForCount
from WorkOfArt
group by HelpingCurator) secondaryCurator on s.EmpId = secondaryCurator.HelpingCurator
One method, useful if you want more than one aggregated value from the WorkOfArt table, is to pre-aggregate the results:
Select s.empID, COALESCE(woac.cnt, 0) as CuratorTotal,
COALESCE(woahc.cnt, 0) as HelpingCuratorTotal
FROM Staff s LEFT JOIN
(SELECT woa.Curator, COUNT(*) as cnt
FROM WorkOfArt woa
GROUP BY woa.Curator
) woac
ON s.empID = woac.Curator LEFT JOIN
(SELECT woa.HelpingCurator, COUNT(*) as cnt
FROM WorkOfArt woa
GROUP BY woa.HelpingCurator
) woahc
ON s.empID = woahc.HelpingCurator
WHERE s.StaffLevel < 7;
Notice that the aggregation on the outer level is not needed.
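The pre-aggregated query and the correlated-subquery version can be compared on sample data. This is a minimal sketch with Python's sqlite3 module and hypothetical rows; employee 4 has no works, and employee 3 is filtered out by StaffLevel. The last query reproduces the question's AND condition, which requires the same person to be both Curator and HelpingCurator of one work, so it matches nothing.

```python
import sqlite3

# Hypothetical staff and works; HelpingCurator may be NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Staff (empID INTEGER PRIMARY KEY, StaffLevel INTEGER, Surname TEXT);
    CREATE TABLE WorkOfArt (artID INTEGER PRIMARY KEY, name TEXT,
                            Curator INTEGER, HelpingCurator INTEGER);
    INSERT INTO Staff VALUES (1, 5, 'Smith'), (2, 3, 'Jones'), (3, 8, 'Brown'), (4, 2, 'Green');
    INSERT INTO WorkOfArt VALUES (100, 'A', 1, 2), (101, 'B', 1, NULL), (102, 'C', 2, 1);
""")

# Count each role separately, then LEFT JOIN so staff without works still appear.
pre_aggregated = conn.execute("""
    SELECT s.empID,
           COALESCE(woac.cnt, 0) AS CuratorTotal,
           COALESCE(woahc.cnt, 0) AS HelpingCuratorTotal
    FROM Staff s
    LEFT JOIN (SELECT Curator, COUNT(*) AS cnt
               FROM WorkOfArt GROUP BY Curator) woac
           ON s.empID = woac.Curator
    LEFT JOIN (SELECT HelpingCurator, COUNT(*) AS cnt
               FROM WorkOfArt GROUP BY HelpingCurator) woahc
           ON s.empID = woahc.HelpingCurator
    WHERE s.StaffLevel < 7
    ORDER BY s.empID
""").fetchall()

# Correlated subqueries: one COUNT per role, no outer GROUP BY needed.
correlated = conn.execute("""
    SELECT S.empID,
           (SELECT COUNT(*) FROM WorkOfArt C WHERE C.Curator = S.empID),
           (SELECT COUNT(*) FROM WorkOfArt H WHERE H.HelpingCurator = S.empID)
    FROM Staff S
    WHERE S.StaffLevel < 7
    ORDER BY S.empID
""").fetchall()

# The question's join: both roles on the same work, which never happens here.
both_roles = conn.execute("""
    SELECT COUNT(*) FROM Staff s
    JOIN WorkOfArt w ON s.empID = w.Curator AND s.empID = w.HelpingCurator
""").fetchone()[0]
print(pre_aggregated)  # [(1, 2, 1), (2, 1, 1), (4, 0, 0)]
```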
I have two tables:
Teams (Name, Team ID, Max Size)
Members (Name, Team ID)
I need to figure out how many slots are available on each team. The closest I got was counting the occurrences of Name grouped by Team ID in the Members table, but after that I have no idea how to subtract that count_of_Name from Max Size. I understand this is a rudimentary question, but I assure you I have been working on and researching it for well over an hour. Thank you in advance.
No subqueries needed:
select t.[Team ID], t.Name, t.[Max Size] - COUNT(m.[Team ID]) as SpotsLeft
from Teams t
left join Members m on m.[Team ID] = t.[Team ID]
group by t.[Team ID], t.Name, t.[Max Size]
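Which expression goes inside COUNT() matters with a LEFT JOIN. In this minimal sketch (Python's sqlite3 module, hypothetical rows, and column names TeamID and MaxSize assumed without spaces), the Green team has no members, so the join produces one row of NULLs for it: COUNT(*) counts that row as 1, while COUNT(m.TeamID) skips the NULL and gives 0.

```python
import sqlite3

# Hypothetical teams; Green has no members.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Teams (Name TEXT, TeamID INTEGER PRIMARY KEY, MaxSize INTEGER);
    CREATE TABLE Members (Name TEXT, TeamID INTEGER);
    INSERT INTO Teams VALUES ('Red', 1, 5), ('Blue', 2, 3), ('Green', 3, 4);
    INSERT INTO Members VALUES ('a', 1), ('b', 1), ('c', 2);
""")

def spots_left(count_expr):
    # count_expr is pasted into the SQL text, so only pass trusted literals.
    return conn.execute(f"""
        SELECT t.TeamID, t.Name, t.MaxSize - COUNT({count_expr}) AS SpotsLeft
        FROM Teams t
        LEFT JOIN Members m ON m.TeamID = t.TeamID
        GROUP BY t.TeamID, t.Name, t.MaxSize
        ORDER BY t.TeamID
    """).fetchall()

print(spots_left("m.TeamID"))  # [(1, 'Red', 3), (2, 'Blue', 2), (3, 'Green', 4)]
print(spots_left("*"))         # Green wrongly shows 3: the NULL row is counted
```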
You could achieve this with a subquery. This version does not use the Max Size column; it treats the size of the largest team as every team's capacity:
select teamId,
(select max(team_size)
from (select count(*) as team_size
from members
group by teamId) as sizes) - count(*) as available_slots
from members
group by teamId;
Note that it would be a lot easier if the maximum number of members per team were stored, because you could then subtract it directly instead of using a subquery.
SELECT t.MaxSize - (SELECT COUNT(*)
FROM Members m
WHERE m.team_id = t.team_id)
FROM Teams t
Or, to avoid (multiple) correlated subqueries, join to a pre-aggregated count:
SELECT t.Team_ID, t.MaxSize - COALESCE(q.Team_Count, 0) AS SpotsLeft
FROM Teams t
LEFT JOIN (SELECT m.Team_ID, COUNT(*) as Team_Count
FROM Members m
GROUP BY m.Team_ID) as q
ON q.Team_ID = t.Team_ID
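Both the correlated-subquery form and the pre-aggregated join can be checked against each other. This is a minimal sketch with Python's sqlite3 module and hypothetical rows, with the column names TeamID and MaxSize assumed. The COALESCE in the join version matters: a team with no members gets NULL from the LEFT JOIN, and MaxSize - NULL would be NULL.

```python
import sqlite3

# Hypothetical teams; Green has no members.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Teams (Name TEXT, TeamID INTEGER PRIMARY KEY, MaxSize INTEGER);
    CREATE TABLE Members (Name TEXT, TeamID INTEGER);
    INSERT INTO Teams VALUES ('Red', 1, 5), ('Blue', 2, 3), ('Green', 3, 4);
    INSERT INTO Members VALUES ('a', 1), ('b', 1), ('c', 2);
""")

# Correlated subquery: one COUNT per team row.
correlated = conn.execute("""
    SELECT t.TeamID,
           t.MaxSize - (SELECT COUNT(*) FROM Members m
                        WHERE m.TeamID = t.TeamID) AS SpotsLeft
    FROM Teams t
    ORDER BY t.TeamID
""").fetchall()

# Pre-aggregated join: count once per team, then subtract.
pre_aggregated = conn.execute("""
    SELECT t.TeamID, t.MaxSize - COALESCE(q.Team_Count, 0) AS SpotsLeft
    FROM Teams t
    LEFT JOIN (SELECT m.TeamID, COUNT(*) AS Team_Count
               FROM Members m
               GROUP BY m.TeamID) AS q
           ON q.TeamID = t.TeamID
    ORDER BY t.TeamID
""").fetchall()
print(correlated)  # [(1, 3), (2, 2), (3, 4)]
```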
Using the following schema:
Supplier (sid, name, status, city)
Part (pid, name, color, weight, city)
Project (jid, name, city)
Supplies (sid, pid, jid, quantity)
1. Get supplier numbers and names for suppliers of parts supplied to at least two different projects.
2. Get supplier numbers and names for suppliers of the same part to at least two different projects.
These were my answers:
1.
SELECT s.sid, s.name
FROM Supplier s, Supplies su, Project pr
WHERE s.sid = su.sid AND su.jid = pr.jid
GROUP BY s.sid, s.name
HAVING COUNT (DISTINCT pr.jid) >= 2
2.
SELECT s.sid, s.name
FROM Supplier s, Supplies su, Project pr, Part p
WHERE s.sid = su.sid AND su.pid = p.pid AND su.jid = pr.jid
GROUP BY s.sid, s.name
HAVING COUNT (DISTINCT pr.jid)>=2
Can anyone confirm whether I wrote these correctly? I'm a little confused about how the GROUP BY and HAVING clauses work.
The semantics of Having
To better understand having, you need to see it from a theoretical point of view.
A group by is a query that takes a table and summarizes it into another table. You summarize the original table by grouping the original table into subsets (based upon the attributes that you specify in the group by). Each of these groups will yield one tuple.
HAVING is simply equivalent to a WHERE clause that is applied after the GROUP BY has executed and before the SELECT part of the query is computed.
Let's say your query is:
select a, b, count(*)
from Table
where c > 100
group by a, b
having count(*) > 10;
The evaluation of this query can be seen as the following steps:
1. Perform the WHERE, eliminating rows that do not satisfy it.
2. Group the table into subsets based upon the values of a and b (each tuple in each subset has the same values of a and b).
3. Eliminate subsets that do not satisfy the HAVING condition.
4. Process each subset, outputting the values indicated in the SELECT part of the query. This creates one output tuple per subset left after step 3.
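The four steps can be traced in code. This is a minimal Python sketch, assuming a small hypothetical table t(a, b, c) (renamed from Table, which is an SQL keyword) and a HAVING threshold of 1 instead of 10 so the tiny data set keeps some groups. It performs the steps by hand and compares the result with what SQLite returns for the same query.

```python
import sqlite3
from collections import defaultdict

# Hypothetical rows for t(a, b, c).
ROWS = [(1, 'x', 150), (1, 'x', 200), (1, 'x', 50), (1, 'y', 120),
        (2, 'x', 300), (2, 'x', 400), (2, 'x', 500)]

def evaluate_by_hand(rows, min_count):
    # Step 1: WHERE c > 100 removes individual rows.
    filtered = [r for r in rows if r[2] > 100]
    # Step 2: GROUP BY a, b puts rows with equal (a, b) into one subset.
    groups = defaultdict(list)
    for a, b, c in filtered:
        groups[(a, b)].append((a, b, c))
    # Step 3: HAVING count(*) > min_count removes whole subsets.
    kept = {key: g for key, g in groups.items() if len(g) > min_count}
    # Step 4: SELECT a, b, count(*) emits one tuple per remaining subset.
    return sorted((a, b, len(g)) for (a, b), g in kept.items())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT, c INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
by_sql = conn.execute("""
    SELECT a, b, count(*) FROM t
    WHERE c > 100
    GROUP BY a, b
    HAVING count(*) > ?
    ORDER BY a, b
""", (1,)).fetchall()

print(evaluate_by_hand(ROWS, 1))             # [(1, 'x', 2), (2, 'x', 3)]
print(by_sql == evaluate_by_hand(ROWS, 1))   # True
```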
You can extend this to any complex query, where Table can be any expression that returns a table (a cross product, a join, a UNION, etc.).
In fact, HAVING is syntactic sugar and does not extend the power of SQL. Any given query:
SELECT list
FROM table
GROUP BY attrList
HAVING condition;
can be rewritten as:
SELECT list from (
SELECT listatt
FROM table
GROUP BY attrList) as Name
WHERE condition;
The listatt is a list that includes the GROUP BY attributes and the expressions used in list and condition. It might be necessary to name some expressions in this list (with AS). For instance, the example query above can be rewritten as:
select a, b, count
from (select a, b, count(*) as count
from Table
where c > 100
group by a, b) as someName
where count > 10;
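The equivalence can be checked by running both forms on the same data. This minimal sketch uses Python's sqlite3 module with hypothetical rows, a threshold of 1 instead of 10 to suit the small data set, and the alias cnt instead of count to steer clear of any clash with the function name.

```python
import sqlite3

# Hypothetical rows for t(a, b, c).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT, c INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, 'x', 150), (1, 'x', 200), (1, 'x', 50), (1, 'y', 120),
    (2, 'x', 300), (2, 'x', 400), (2, 'x', 500)])

with_having = conn.execute("""
    SELECT a, b, count(*)
    FROM t
    WHERE c > 100
    GROUP BY a, b
    HAVING count(*) > 1
    ORDER BY a, b
""").fetchall()

# HAVING replaced by WHERE over a grouped derived table; the aggregate is
# named (cnt) so the outer WHERE can refer to it.
with_subquery = conn.execute("""
    SELECT a, b, cnt
    FROM (SELECT a, b, count(*) AS cnt
          FROM t
          WHERE c > 100
          GROUP BY a, b) AS someName
    WHERE cnt > 1
    ORDER BY a, b
""").fetchall()
print(with_having == with_subquery)  # True
```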
The solution you need
Your solution seems to be correct:
SELECT s.sid, s.name
FROM Supplier s, Supplies su, Project pr
WHERE s.sid = su.sid AND su.jid = pr.jid
GROUP BY s.sid, s.name
HAVING COUNT (DISTINCT pr.jid) >= 2
You join the three tables, then use sid as the grouping attribute (s.name is functionally dependent on it, so it does not change the number of groups, but you must include it, otherwise it cannot be part of the SELECT list). Then you remove the groups that do not satisfy your condition, i.e. those with fewer than two distinct pr.jid values, which is what you wanted originally.
Best solution to your problem
I personally prefer a simpler, cleaner solution:
You only need to group Supplies (sid, pid, jid, quantity) to
find the sid of each supplier that supplies at least two projects.
Then join that result to the Supplier table to get the supplier name.
SELECT sid, name from
(SELECT sid from Supplies
GROUP BY sid
HAVING count(DISTINCT jid) >= 2
) AS T1
NATURAL JOIN
Supplier;
It may also be faster to execute, because the join is done only for the suppliers that pass the HAVING filter, not for every Supplies row.
--dmg
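The asker's first query and the group-first version can be compared on a tiny data set. This is a minimal sketch with Python's sqlite3 module; the rows are hypothetical and the schema keeps only the columns the queries touch. S1 ships different parts to two projects, S2 ships the same part to two projects, and S3 ships two parts to a single project, so only S1 and S2 qualify for question 1.

```python
import sqlite3

# Hypothetical data covering the three interesting cases.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Supplier (sid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Project (jid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Supplies (sid INTEGER, pid INTEGER, jid INTEGER, quantity INTEGER);
    INSERT INTO Supplier VALUES (1, 'S1'), (2, 'S2'), (3, 'S3');
    INSERT INTO Project VALUES (1, 'J1'), (2, 'J2');
    INSERT INTO Supplies VALUES (1, 1, 1, 10), (1, 2, 2, 10),
                                (2, 1, 1, 10), (2, 1, 2, 10),
                                (3, 1, 1, 10), (3, 2, 1, 10);
""")

# The asker's query 1: join everything, then group per supplier.
asker_q1 = conn.execute("""
    SELECT s.sid, s.name
    FROM Supplier s, Supplies su, Project pr
    WHERE s.sid = su.sid AND su.jid = pr.jid
    GROUP BY s.sid, s.name
    HAVING COUNT(DISTINCT pr.jid) >= 2
    ORDER BY s.sid
""").fetchall()

# Group Supplies alone first, then NATURAL JOIN (on sid) to fetch names.
grouped_first = conn.execute("""
    SELECT sid, name
    FROM (SELECT sid FROM Supplies
          GROUP BY sid
          HAVING COUNT(DISTINCT jid) >= 2) AS T1
    NATURAL JOIN Supplier
    ORDER BY sid
""").fetchall()
print(asker_q1)  # [(1, 'S1'), (2, 'S2')]
```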
Because you cannot use aggregate functions such as COUNT(), MIN() or SUM() in a WHERE clause, SQL provides the HAVING clause to filter on them. For an example of the HAVING clause, see this link:
http://www.sqlfundamental.com/having-clause.php
First of all, you should use the JOIN syntax rather than FROM table1, table2, and you should always limit the grouping to as few fields as you need.
Although I haven't tested it, your first query seems fine to me, but it could be rewritten as:
SELECT s.sid, s.name
FROM
Supplier s
INNER JOIN (
SELECT su.sid
FROM Supplies su
GROUP BY su.sid
HAVING COUNT(DISTINCT su.jid) > 1
) g
ON g.sid = s.sid
Or simplified as:
SELECT sid, name
FROM Supplier s
WHERE (
SELECT COUNT(DISTINCT su.jid)
FROM Supplies su
WHERE su.sid = s.sid
) > 1
However, your second query seems wrong to me, because you should also GROUP BY pid.
SELECT s.sid, s.name
FROM
Supplier s
INNER JOIN (
SELECT DISTINCT su.sid
FROM Supplies su
GROUP BY su.sid, su.pid
HAVING COUNT(DISTINCT su.jid) > 1
) g
ON g.sid = s.sid
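As you may have noticed, the query above uses the INNER JOIN syntax to perform the filtering. It can also be written with a correlated subquery; because grouping by pid can return several rows per supplier, use EXISTS rather than comparing the subquery's result with a number:
SELECT s.sid, s.name
FROM Supplier s
WHERE EXISTS (
SELECT 1
FROM Supplies su
WHERE su.sid = s.sid
GROUP BY su.pid
HAVING COUNT(DISTINCT su.jid) > 1
)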
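The difference between the two questions shows up on the same hypothetical data as before (Python's sqlite3 module, trimmed schema). The asker's second query still groups per supplier only, so it returns S1 even though S1 never ships the same part to two projects; grouping by part inside EXISTS returns only S2.

```python
import sqlite3

# Hypothetical data: S1 = different parts to two projects,
# S2 = same part to two projects, S3 = two parts to one project.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Supplier (sid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Project (jid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Part (pid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Supplies (sid INTEGER, pid INTEGER, jid INTEGER, quantity INTEGER);
    INSERT INTO Supplier VALUES (1, 'S1'), (2, 'S2'), (3, 'S3');
    INSERT INTO Project VALUES (1, 'J1'), (2, 'J2');
    INSERT INTO Part VALUES (1, 'P1'), (2, 'P2');
    INSERT INTO Supplies VALUES (1, 1, 1, 10), (1, 2, 2, 10),
                                (2, 1, 1, 10), (2, 1, 2, 10),
                                (3, 1, 1, 10), (3, 2, 1, 10);
""")

# The asker's query 2: still one group per supplier, so parts are mixed.
asker_q2 = conn.execute("""
    SELECT s.sid, s.name
    FROM Supplier s, Supplies su, Project pr, Part p
    WHERE s.sid = su.sid AND su.pid = p.pid AND su.jid = pr.jid
    GROUP BY s.sid, s.name
    HAVING COUNT(DISTINCT pr.jid) >= 2
    ORDER BY s.sid
""").fetchall()

# One group per (supplier, part): some single part must reach 2+ projects.
per_part = conn.execute("""
    SELECT s.sid, s.name
    FROM Supplier s
    WHERE EXISTS (SELECT 1
                  FROM Supplies su
                  WHERE su.sid = s.sid
                  GROUP BY su.pid
                  HAVING COUNT(DISTINCT su.jid) > 1)
    ORDER BY s.sid
""").fetchall()
print(asker_q2)  # [(1, 'S1'), (2, 'S2')]
print(per_part)  # [(2, 'S2')]
```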
What type of SQL database are you using (MSSQL, Oracle, etc.)?
I believe what you have written is correct.
You could also write the first query like this:
SELECT s.sid, s.name
FROM Supplier s
WHERE (SELECT COUNT(DISTINCT pr.jid)
FROM Supplies su, Project pr
WHERE su.sid = s.sid
AND pr.jid = su.jid) >= 2
It's a little more readable, and less mind-bending than trying to do it with GROUP BY. Performance may differ though.
1. Get supplier numbers and names for suppliers of parts supplied to at least two different projects.
SELECT S.SID, S.NAME
FROM SUPPLIER S
WHERE S.SID IN
(SELECT SID FROM SUPPLIES GROUP BY SID HAVING COUNT(DISTINCT JID) >= 2)
I am not clear about your second question.