Using group by and having clause - sql

Using the following schema:
Supplier (sid, name, status, city)
Part (pid, name, color, weight, city)
Project (jid, name, city)
Supplies (sid, pid, jid**, quantity)
Get supplier numbers and names for suppliers of parts supplied to at least two different projects.
Get supplier numbers and names for suppliers of the same part to at least two different projects.
These were my answers:
1.
SELECT s.sid, s.name
FROM Supplier s, Supplies su, Project pr
WHERE s.sid = su.sid AND su.jid = pr.jid
GROUP BY s.sid, s.name
HAVING COUNT (DISTINCT pr.jid) >= 2
2.
SELECT s.sid, s.name
FROM Suppliers s, Supplies su, Project pr, Part p
WHERE s.sid = su.sid AND su.pid = p.pid AND su.jid = pr.jid
GROUP BY s.sid, s.name
HAVING COUNT (DISTINCT pr.jid)>=2
Can anyone confirm if I wrote this correctly? I'm a little confused as to how the Group By and Having clause works

The semantics of Having
To better understand having, you need to see it from a theoretical point of view.
A group by is a query that takes a table and summarizes it into another table. You summarize the original table by grouping the original table into subsets (based upon the attributes that you specify in the group by). Each of these groups will yield one tuple.
The Having is simply equivalent to a WHERE clause after the group by has executed and before the select part of the query is computed.
Lets say your query is:
select a, b, count(*)
from Table
where c > 100
group by a, b
having count(*) > 10;
The evaluation of this query can be seen as the following steps:
Perform the WHERE, eliminating rows that do not satisfy it.
Group the table into subsets based upon the values of a and b (each tuple in each subset has the same values of a and b).
Eliminate subsets that do not satisfy the HAVING condition
Process each subset outputting the values as indicated in the SELECT part of the query. This creates one output tuple per subset left after step 3.
You can extend this to any complex query there Table can be any complex query that return a table (a cross product, a join, a UNION, etc).
In fact, having is syntactic sugar and does not extend the power of SQL. Any given query:
SELECT list
FROM table
GROUP BY attrList
HAVING condition;
can be rewritten as:
SELECT list from (
SELECT listatt
FROM table
GROUP BY attrList) as Name
WHERE condition;
The listatt is a list that includes the GROUP BY attributes and the expressions used in list and condition. It might be necessary to name some expressions in this list (with AS). For instance, the example query above can be rewritten as:
select a, b, count
from (select a, b, count(*) as count
from Table
where c > 100
group by a, b) as someName
where count > 10;
The solution you need
Your solution seems to be correct:
SELECT s.sid, s.name
FROM Supplier s, Supplies su, Project pr
WHERE s.sid = su.sid AND su.jid = pr.jid
GROUP BY s.sid, s.name
HAVING COUNT (DISTINCT pr.jid) >= 2
You join the three tables, then using sid as a grouping attribute (sname is functionally dependent on it, so it does not have an impact on the number of groups, but you must include it, otherwise it cannot be part of the select part of the statement). Then you are removing those that do not satisfy your condition: the satisfy pr.jid is >= 2, which is that you wanted originally.
Best solution to your problem
I personally prefer a simpler cleaner solution:
You need to only group by Supplies (sid, pid, jid**, quantity) to
find the sid of those that supply at least to two projects.
Then join it to the Suppliers table to get the supplier same.
SELECT sid, sname from
(SELECT sid from supplies
GROUP BY sid
HAVING count(DISTINCT jid) >= 2
) AS T1
NATURAL JOIN
Supliers;
It will also be faster to execute, because the join is only done when needed, not all the times.
--dmg

Because we can not use Where clause with aggregate functions like count(),min(), sum() etc. so having clause came into existence to overcome this problem in sql. see example for having clause go through this link
http://www.sqlfundamental.com/having-clause.php

First of all, you should use the JOIN syntax rather than FROM table1, table2, and you should always limit the grouping to as little fields as you need.
Altought I haven't tested, your first query seems fine to me, but could be re-written as:
SELECT s.sid, s.name
FROM
Supplier s
INNER JOIN (
SELECT su.sid
FROM Supplies su
GROUP BY su.sid
HAVING COUNT(DISTINCT su.jid) > 1
) g
ON g.sid = s.sid
Or simplified as:
SELECT sid, name
FROM Supplier s
WHERE (
SELECT COUNT(DISTINCT su.jid)
FROM Supplies su
WHERE su.sid = s.sid
) > 1
However, your second query seems wrong to me, because you should also GROUP BY pid.
SELECT s.sid, s.name
FROM
Supplier s
INNER JOIN (
SELECT su.sid
FROM Supplies su
GROUP BY su.sid, su.pid
HAVING COUNT(DISTINCT su.jid) > 1
) g
ON g.sid = s.sid
As you may have noticed in the query above, I used the INNER JOIN syntax to perform the filtering, however it can be also written as:
SELECT s.sid, s.name
FROM Supplier s
WHERE (
SELECT COUNT(DISTINCT su.jid)
FROM Supplies su
WHERE su.sid = s.sid
GROUP BY su.sid, su.pid
) > 1

What type of sql database are using (MSSQL, Oracle etc)?
I believe what you have written is correct.
You could also write the first query like this:
SELECT s.sid, s.name
FROM Supplier s
WHERE (SELECT COUNT(DISTINCT pr.jid)
FROM Supplies su, Projects pr
WHERE su.sid = s.sid
AND pr.jid = su.jid) >= 2
It's a little more readable, and less mind-bending than trying to do it with GROUP BY. Performance may differ though.

1.Get supplier numbers and names for suppliers of parts supplied to at least two different projects.
SELECT S.SID, S.NAME
FROM SUPPLIES SP
JOIN SUPPLIER S
ON SP.SID = S.SID
WHERE PID IN
(SELECT PID FROM SUPPPLIES GROUP BY PID, JID HAVING COUNT(*) >= 2)
I am not slear about your second question

Related

Join with count

I need to write SQL query like:
Show all countries with more than 1000 users, sorted by user count.
The country with the most users should be at the top.
I have tables:
● Table users (id, email, citizenship_country_id)
● Table countries (id, name, iso)
Users with columns: id, email, citizenship_country_id
Countries with columns: id, name, iso
SELECT countries.name,
Count(users.citiizenship_country_id) AS W1
FROM countries
LEFT JOIN users ON countries.id = users.citizenship_country_id
GROUP BY users.citiizenship_country_id, countries.name
HAVING ((([users].[citiizenship_country_id])>2));
But this does not work - I get an empty result set.
Could you please tell me what I'm doing wrong?
A LEFT JOIN is superfluous for this purpose. To have 1000 users, you need at least one match:
SELECT c.name, Count(*) AS W1
FROM countries c JOIN
users u
ON c.id = u.citizenship_country_id
GROUP BY c.name
HAVING COUNT(*) > 1000;
Notice that table aliases also make the query easier to write and to read.
Group by country name and use HAVING Count(u.citiizenship_country_id)>1000, it filters rows after aggregation:
SELECT c.name,
Count(u.citiizenship_country_id) AS W1
FROM countries c
INNER JOIN users u ON c.id = u.citizenship_country_id
GROUP BY c.name
HAVING Count(u.citiizenship_country_id)>1000
ORDER BY W1 desc --Order top counts first
;
As #GordonLinoff pointed, you can use INNER JOIN instead of LEFT JOIN, because anyway this query does not return counries without users and INNER JOIN performs better because no need to pass not joined records to the aggregation.

Subtracting values of columns from two different tables

I would like to take values from one table column and subtract those values from another column from another table.
I was able to achieve this by joining those tables and then subtracting both columns from each other.
Data from first table:
SELECT max_participants FROM courses ORDER BY id;
Data from second table:
SELECT COUNT(id) FROM participations GROUP BY course_id ORDER BY course_id;
Here is some code:
SELECT max_participants - participations AS free_places FROM
(
SELECT max_participants, COUNT(participations.id) AS participations
FROM courses
INNER JOIN participations ON participations.course_id = courses.id
GROUP BY courses.max_participants, participations.course_id
ORDER BY participations.course_id
) AS course_places;
In general, it works, but I was wondering, if there is some way to make it simplier or maybe my approach isn't correct and this code will not work in some conditions? Maybe it needs to be optimized.
I've read some information about not to rely on natural order of result set in databases and that information made my doubts to appear.
If you want the values per course, I would recommend:
SELECT c.id, (c.max_participants - COUNT(p.id)) AS free_places
FROM courses c LEFT JOIN
participations p
ON p.course_id = c.id
GROUP BY c.id, c.max_participants
ORDER BY 1;
Note the LEFT JOIN to be sure all courses are included, even those with no participants.
The overall number is a little tricker. One method is to use the above as a subquery. Alternatively, you can pre-aggregate each table:
select c.max_participants - p.num_participants
from (select sum(max_participants) as max_participants from courses) c cross join
(select count(*) as num_participants from participants from participations) p;

Use result of multiple rows to do arithmetic operation

I'm writing a query to multiply the count that I receive from subquery to fees amount, But I don't know how to do that. Any help/suggestion?
Oracle query is:
select courseid,coursename,fees*tmp
from course c join registration r on
r.courseid=c.courseid
and tmp IN (select count(*)
from course c join registration r on
r.courseid=c.courseid group by coursename);
I tried to use like a variable tmp ,But i don't think it works in oracle query. Is there an alternative way to do so?
You can't do that, because you can only select data from tables that appeared between FROM and WHERE. The IN operator is a quick way to save having to write a bunch of OR statements, it is not something that can establish a variable in the outer query.
Instead do something like:
select courseid,coursename,fees * COUNT(r.courseID) OVER(PARTITION BY c.coursename)
from course c join registration r on
r.courseid=c.courseid
Edit/update: you noted that this query produces too many rows and you only want to see distinct course names. In that case it would be better to just use the registrations table to count the number of people on the course and then multiply the fees:
SELECT
c.courseid, c.coursename, c.fees * COALESCE(r.numberOfstudents, 0) as courseWorth
FROM
course c
LEFT OUTER JOIN
(select courseid, COUNT(*) as numberofstudents FROM registration GROUP BY courseid) r
ON c.courseID = r.courseid
You can use a windowing function like Caius or you can use a join like this:
select courseid,coursename, fees * COALESCE(sub.cnt,0)
from course c
join registration r on r.courseid=c.courseid
left join (
select coursename, count(*) as cnt
from course c2
join registration r2 on r2.courseid=c2.courseid
group by coursename
) as sub;
note: I make no claim your joins are correct -- I'm basing this query off of your example not on any knowledge of your data model.

How to join two SQL queries into one?

I'm new to SQL and I'm currently trying to learn how to make reports in Visual Studio. I need to make a table, graph and few other things. I decided to do matrix as the last part and now I'm stuck. I write my queries in SQL Server.
I have two tables: Staff (empID, StaffLevel, Surname) and WorkOfArt (artID, name, curator, helpingCurator). In the columns Curator and HelpingCurator I used numbers from empID.
I'd like my matrix to show every empID and the number of paintings where they're acting as a Curator and the number of paintings where they're acting as a Helping Curator (so I want three columns: empID, count(curator), count(helpingCurator).
Select Staff.empID, count(WorkOfArt.Curator) as CuratorTotal
FROM Staff, WorkOfArt
WHERE Staff.empID=WorkOfArt.Curator
and Staff.StaffLevel<7
group by Staff.empID;
Select Staff.empID, count(WorkOfArt.HelpingCurator) as HelpingCuratorTotal
FROM Staff, WorkOfArt
WHERE Staff.empID=WorkOfArt.HelpingCurator
and Staff.StaffLevel<7
group by Staff.empID;
I created those two queries and they work perfectly fine, but I need it in one query.
I tried:
Select Staff.empID, count(WorkOfArt.Curator) as CuratorTotal,
COUNT(WorkOfArt.HelpingCurator) as HelpingCuratorTotal
FROM Staff FULL OUTER JOIN WorkOfArt on Staff.empID=WorkOfArt.Curator
and Staff.empID=WorkOfArt.HelpingCurator
WHERE Staff.StaffLevel<7
group by Staff.empID;
(as well as using left or right outer join)
- this one gives me a table with empID, but in both count columns there are only 0s - and:
Select Staff.empID, count(WorkOfArt.Curator) as CuratorTotal,
COUNT(WorkOfArt.HelpingCurator) as HelpingCuratorTotal
FROM Staff, WorkOfArt
WHERE Staff.empID=WorkOfArt.Curator
and Staff.empID=WorkOfArt.HelpingCurator
and Staff.StaffLevel<7
group by Staff.empID;
And this one gives me just the names of the columns.
I have no idea what to do next. I tried to find the answer in google, but all explanations I found were far more advanced for me, so I couldn't understand them... Could you please help me? Hints are fine as well.
The easiest way to do this is most likely with inner select in the select clause, with something like this:
Select
S.empID,
(select count(*) from WorkOfArt C where C.Curator = S.empID)
as CuratorTotal,
(select count(*) from WorkOfArt H where H.HelpingCurator = S.empID)
as HelpingCuratorTotal
FROM Staff S
WHERE S.StaffLevel<7
group by S.empID;
This way the rows with different role aren't causing problems with the calculation. If the tables are really large or you have a lot of different roles, then most likely more complex query with grouping the items first in the WorkOfArt table might have better performance since this requires reading the rows twice.
From a performance perspective, the following query is probably a little more efficient
select e.EmpId, CuratorForCount, HelpingCuratorForCount
from Staff s
inner join ( select Curator, count(*) as CuratorForCount
from WorkOfArt
group by Curator) mainCurator on s.EmpId = mainCurator.Curator
inner join ( select HelpingCurator, count(*) as HelpingCuratorForCount
from WorkOfArt
group by HelpingCurator) secondaryCurator on s.EmpId = secondaryCurator.HelpingCurator
One method, that can be useful if you want to get more than one value aggregated value from the WorkOfArt table is to pre-aggregate the results:
Select s.empID, COALESCE(woac.cnt, 0) as CuratorTotal,
COALESCE(woahc.cnt) as HelpingCuratorTotal
FROM Staff s LEFT JOIN
(SELECT woa.Curator, COUNT(*) as cnt
FROM WorkOfArt woa
GROUP BY woa.Curator
) woac
ON s.empID = woac.Curator LEFT JOIN
(SELECT woa.HelpingCurator, COUNT(*) as cnt
FROM WorkOfArt woa
GROUP BY woa.HelpingCurator
) woahc
ON s.empID = woahc.HelpingCurator
WHERE s.StaffLevel < 7;
Notice that the aggregation on the outer level is not needed.

PostgreSQL - Query with aggregate functions

I need some help for a PostgreSQL query.
I have 4 tables involved on it: customer, organization_complete, entity and address. I retrieve some data from everyone and with this query:
SELECT distinct ON (c.customer_number, trim(lower(o.name)), a.street, a.zipcode, a.area, a.country)
c.xid AS customer_xid, o.xid AS entity_xid, c.customer_number, c.deleted, o.name, o.vat, 'organisation' AS customer_type, a.street, a.zipcode, a.city, a.country
FROM customer c
INNER JOIN organisation_complete o ON (c.xid = o.customer_xid AND c.deleted = 'FALSE')
INNER JOIN entity e ON e.customer_xid = o.customer_xid
INNER JOIN address a ON (a.contact_info_xid = e.contact_info_xid and a.address_type = 'delivery')
WHERE c.account_xid = "<value>"
I get a distinct of all the customers splitted by customer_number, name, street, zipcode, area and country (what's specified after the DISTINCT ON statement).
What I need to retrieve now is a distinct of all customers having a doubled row on DB but I also need to retrieve the customer_xid and the entity_xid, that are primary keys of the respective tables and so are unique. For this reason they can't be included into an aggregate function. All I need is to count how many rows with the same customer_number, name, street, zipcode, area and country I have for each distinct tuple and to select only tuples with a count bigger than 1.
For each selected tuple I need also to take a customer_xid and an entity_xid, at random, like MySQL would do with a_key in a query like this:
SELECT COUNT(*), tab.a_key, tab.b, tab.c from tab
WHERE 1
GROUP BY tab.b
I know MySQL is quite an exception regarding this, I just want to know if may be possible to obtain the same result on PostgreSQL.
Thanks,
L.
This query in MySql is using a nonstandard (see note below) "MySql group by extension": http://dev.mysql.com/doc/refman/5.0/en/group-by-extensions.html
SELECT COUNT(*), tab.a_key, tab.b, tab.c
from tab
WHERE 1
GROUP BY tab.b
Note: This is a feature definied in SQL:2003 Standard as T301 Functional dependencies, it is not required by the standard, and many RDBMS don't support it, including PostgreSql (see this link for version 9.3 - unsupported features: http://www.postgresql.org/docs/9.3/static/unsupported-features-sql-standard.html ).
The above query could be expressed in PostgreSQL in this way:
SELECT tab.a_key, tab.b, tab.c,
q.cnt
FROM (
SELECT tab.b,
COUNT(*) As cnt,
MIN(tab.unique_id) As unique_id /* could be also MAX */
from tab
WHERE 1
GROUP BY tab.b
) q
JOIN tab ON tab.unique_id = q.unique_id
where unique_id is a column that uniquely identifies each row in tab (usually a primary key).
Min or Max functions choose one row from the table in a pseudo-random manner.