I am working on a website and I need to execute a pretty complex SELECT query. I managed to write it, but it looks too long to me and I want to make it shorter. Here is my query:
select friend_id, name, surname, mail, birth_date, address
from (select friend_id, name, surname, mail, birth_date, address, count(friend_id) as rents
from (select friend.friend_id, name, surname, mail, birth_date, address
from friend
left join profile p on friend.profile_id = p.profile_id
left join friend_group_record fgr on friend.friend_id = fgr.friend_id
left join meeting m on fgr.friend_group_id = m.friend_group_id
where m.date between '2019-09-01' and '2020-01-01') as filtered_friend_profiles
group by friend_id, name, surname, mail, birth_date, address) as counted_friend_profiles
where counted_friend_profiles.rents >= 25;
Does anyone have an idea how to make it shorter?
Shorter? (And correct?!)
SELECT f.friend_id, name, surname, mail, birth_date, address -- ⑤
FROM friend f
JOIN profile p USING (profile_id) -- ①
JOIN friend_group_record fgr ON f.friend_id = fgr.friend_id -- also USING?
JOIN meeting m ON fgr.friend_group_id = m.friend_group_id -- also USING?
WHERE m.date >= '2019-09-01' -- ②
AND m.date < '2020-01-01'
GROUP BY 1, 2, 3, 4, 5, 6 -- ③
HAVING count(*) >= 25; -- ④
① Changed all three instances of LEFT JOIN to [INNER] JOIN, since the WHERE condition on the last table in the food chain (meeting) forces all joins to behave like plain joins anyway. (As @wildplasser pointed out.) See:
PostgreSQL: Using AND statement in LEFT JOIN is not working as expected
SQL / PostgreSQL left join ignores "on = constant" predicate, on left table
Also, if column names are identical and distinct among all tables left of the join, USING is convenient short syntax. (Returning only one of each pair of joined columns, which has no bearing on the query at hand.)
Not knowing the exact table definitions, I only applied it in the first join, where ambiguities are impossible because there is only a single table left of the join. For persisted queries, it is generally only advisable if the involved column names are stable and distinct across all joined tables.
② Typically, you'd want to exclude the upper bound and BETWEEN is the wrong tool. Related:
How to add a day/night indicator to a timestamp column?
POSTGRES select n equally distributed rows by time over millions of records
How do I write a function in plpgsql that compares a date with a timestamp without time zone?
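A minimal sketch of the difference, using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table and rows are hypothetical, and dates are stored as ISO-8601 text, which compares correctly as strings in SQLite. With a date column, BETWEEN also picks up rows dated on the upper bound itself:

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meeting (meeting_id INTEGER PRIMARY KEY, date TEXT)")
conn.executemany(
    "INSERT INTO meeting (date) VALUES (?)",
    [("2019-09-01",), ("2019-12-31",), ("2020-01-01",)],
)

# BETWEEN is inclusive on both ends, so the 2020-01-01 meeting is counted too.
between = conn.execute(
    "SELECT count(*) FROM meeting WHERE date BETWEEN '2019-09-01' AND '2020-01-01'"
).fetchone()[0]

# Half-open range: lower bound included, upper bound excluded.
half_open = conn.execute(
    "SELECT count(*) FROM meeting WHERE date >= '2019-09-01' AND date < '2020-01-01'"
).fetchone()[0]

print(between, half_open)  # 3 2
```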
③ About this short syntax:
Select first row in each GROUP BY group?
Possibly shorter if there are functional dependencies with PK columns. See:
Why can't I exclude dependent columns from `GROUP BY` when I aggregate by a key?
④ count(*) is shorter and faster (and equivalent in this case, since friend_id cannot be NULL). It can also be used in the HAVING clause without being listed in the SELECT list. See:
Postgres: count(*) vs count(id)
⑤ All source column names should be table-qualified, if only for documentation. Also avoids breakage and confusion from later changes to underlying tables.
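The single-level HAVING version can be checked against the nested version from the question on a toy data set. This sketch runs on SQLite through Python's sqlite3 module instead of PostgreSQL, drops the profile join (assumed not to change the count) and the extra columns, and lowers the threshold from 25 to 2 so the hypothetical sample rows stay small:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friend (friend_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE friend_group_record (friend_id INTEGER, friend_group_id INTEGER);
CREATE TABLE meeting (meeting_id INTEGER PRIMARY KEY, friend_group_id INTEGER, date TEXT);

INSERT INTO friend VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
-- Ann is in groups 10 and 20, Bob only in 10, Cid in no group.
INSERT INTO friend_group_record VALUES (1, 10), (1, 20), (2, 10);
INSERT INTO meeting (friend_group_id, date) VALUES
  (10, '2019-10-05'), (20, '2019-11-12'), (20, '2020-02-01');
""")

# Single level: filter rows with WHERE, group, then filter groups with HAVING.
having_version = conn.execute("""
SELECT f.friend_id, f.name
FROM friend f
JOIN friend_group_record fgr ON f.friend_id = fgr.friend_id
JOIN meeting m ON fgr.friend_group_id = m.friend_group_id
WHERE m.date >= '2019-09-01' AND m.date < '2020-01-01'
GROUP BY f.friend_id, f.name
HAVING count(*) >= 2
ORDER BY f.friend_id
""").fetchall()

# Nested style from the question: count in a derived table, filter outside it.
nested_version = conn.execute("""
SELECT friend_id, name
FROM (
  SELECT f.friend_id, f.name, count(*) AS rents
  FROM friend f
  JOIN friend_group_record fgr ON f.friend_id = fgr.friend_id
  JOIN meeting m ON fgr.friend_group_id = m.friend_group_id
  WHERE m.date >= '2019-09-01' AND m.date < '2020-01-01'
  GROUP BY f.friend_id, f.name
) AS counted
WHERE rents >= 2
ORDER BY friend_id
""").fetchall()

print(having_version)  # [(1, 'Ann')]
```

Both forms return the same rows; the HAVING clause simply removes the need for the outer query.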
How about just doing this?
select friend.friend_id, name, surname, mail, birth_date, address
from friend
left join profile p on friend.profile_id = p.profile_id
left join friend_group_record fgr on friend.friend_id = fgr.friend_id
left join meeting m on fgr.friend_group_id = m.friend_group_id
where m.date between '2019-09-01' and '2020-01-01'
group by friend.friend_id, name, surname, mail, birth_date, address
having count(friend.friend_id) >= 25;
Note the added HAVING clause.
I would like to take values from one table column and subtract those values from another column from another table.
I was able to achieve this by joining those tables and then subtracting one column from the other.
Data from first table:
SELECT max_participants FROM courses ORDER BY id;
Data from second table:
SELECT COUNT(id) FROM participations GROUP BY course_id ORDER BY course_id;
Here is some code:
SELECT max_participants - participations AS free_places FROM
(
SELECT max_participants, COUNT(participations.id) AS participations
FROM courses
INNER JOIN participations ON participations.course_id = courses.id
GROUP BY courses.max_participants, participations.course_id
ORDER BY participations.course_id
) AS course_places;
In general, it works, but I was wondering if there is some way to make it simpler, or whether my approach is wrong and this code will fail under some conditions. Maybe it needs to be optimized.
I've read that you should not rely on the natural order of a result set in databases, and that is what raised my doubts.
If you want the values per course, I would recommend:
SELECT c.id, (c.max_participants - COUNT(p.id)) AS free_places
FROM courses c LEFT JOIN
participations p
ON p.course_id = c.id
GROUP BY c.id, c.max_participants
ORDER BY 1;
Note the LEFT JOIN to be sure all courses are included, even those with no participants.
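To see what the LEFT JOIN buys you, here is a small sketch on SQLite via Python's sqlite3 module (hypothetical rows; course 3 has no participations). COUNT(p.id) ignores the NULL produced by the LEFT JOIN, so an empty course reports all of its places as free, while the INNER JOIN version drops it entirely:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE courses (id INTEGER PRIMARY KEY, max_participants INTEGER);
CREATE TABLE participations (id INTEGER PRIMARY KEY, course_id INTEGER);
INSERT INTO courses VALUES (1, 10), (2, 5), (3, 8);
-- Course 3 has no participations yet.
INSERT INTO participations (course_id) VALUES (1), (1), (1), (2);
""")

def free_places(join_kind):
    # join_kind is "LEFT" or "INNER"; nothing else in the query changes.
    return conn.execute(f"""
        SELECT c.id, c.max_participants - COUNT(p.id) AS free_places
        FROM courses c
        {join_kind} JOIN participations p ON p.course_id = c.id
        GROUP BY c.id, c.max_participants
        ORDER BY c.id
    """).fetchall()

left_rows = free_places("LEFT")
inner_rows = free_places("INNER")
print(left_rows)   # [(1, 7), (2, 4), (3, 8)]
print(inner_rows)  # [(1, 7), (2, 4)]
```

The explicit ORDER BY is what guarantees the row order; the order of rows coming out of a subquery or a join is not something to rely on.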
The overall number is a little trickier. One method is to use the above as a subquery. Alternatively, you can pre-aggregate each table:
select c.max_participants - p.num_participants as free_places
from (select sum(max_participants) as max_participants from courses) c cross join
(select count(*) as num_participants from participations) p;
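A sketch comparing the two approaches on hypothetical data, using SQLite through Python's sqlite3 module. It also shows why the tables are pre-aggregated: joining first and summing afterwards repeats each course's max_participants once per participation row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE courses (id INTEGER PRIMARY KEY, max_participants INTEGER);
CREATE TABLE participations (id INTEGER PRIMARY KEY, course_id INTEGER);
INSERT INTO courses VALUES (1, 10), (2, 5), (3, 8);
INSERT INTO participations (course_id) VALUES (1), (1), (1), (2);
""")

# Pre-aggregate each table down to one row, then combine the two rows.
pre_aggregated = conn.execute("""
SELECT c.max_participants - p.num_participants AS free_places
FROM (SELECT sum(max_participants) AS max_participants FROM courses) c
CROSS JOIN (SELECT count(*) AS num_participants FROM participations) p
""").fetchone()[0]

# Alternative: sum the per-course query used as a subquery.
via_subquery = conn.execute("""
SELECT sum(free_places) FROM (
  SELECT c.max_participants - COUNT(p.id) AS free_places
  FROM courses c LEFT JOIN participations p ON p.course_id = c.id
  GROUP BY c.id, c.max_participants
) AS per_course
""").fetchone()[0]

# Pitfall: after the join, course 1's max_participants appears three times.
join_then_sum = conn.execute("""
SELECT sum(c.max_participants) - count(p.id)
FROM courses c LEFT JOIN participations p ON p.course_id = c.id
""").fetchone()[0]

print(pre_aggregated, via_subquery, join_then_sum)  # 19 19 39
```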
I'm new to SQL and I'm currently trying to learn how to make reports in Visual Studio. I need to make a table, a graph and a few other things. I decided to do a matrix as the last part and now I'm stuck. I write my queries in SQL Server.
I have two tables: Staff (empID, StaffLevel, Surname) and WorkOfArt (artID, name, curator, helpingCurator). The Curator and HelpingCurator columns hold empID values.
I'd like my matrix to show every empID, the number of paintings where they're acting as a Curator and the number of paintings where they're acting as a Helping Curator (so I want three columns: empID, count(curator), count(helpingCurator)).
Select Staff.empID, count(WorkOfArt.Curator) as CuratorTotal
FROM Staff, WorkOfArt
WHERE Staff.empID=WorkOfArt.Curator
and Staff.StaffLevel<7
group by Staff.empID;
Select Staff.empID, count(WorkOfArt.HelpingCurator) as HelpingCuratorTotal
FROM Staff, WorkOfArt
WHERE Staff.empID=WorkOfArt.HelpingCurator
and Staff.StaffLevel<7
group by Staff.empID;
I created those two queries and they work perfectly fine, but I need it in one query.
I tried:
Select Staff.empID, count(WorkOfArt.Curator) as CuratorTotal,
COUNT(WorkOfArt.HelpingCurator) as HelpingCuratorTotal
FROM Staff FULL OUTER JOIN WorkOfArt on Staff.empID=WorkOfArt.Curator
and Staff.empID=WorkOfArt.HelpingCurator
WHERE Staff.StaffLevel<7
group by Staff.empID;
(as well as using LEFT or RIGHT OUTER JOIN). This one gives me a table with empID, but both count columns contain only 0s. I also tried:
Select Staff.empID, count(WorkOfArt.Curator) as CuratorTotal,
COUNT(WorkOfArt.HelpingCurator) as HelpingCuratorTotal
FROM Staff, WorkOfArt
WHERE Staff.empID=WorkOfArt.Curator
and Staff.empID=WorkOfArt.HelpingCurator
and Staff.StaffLevel<7
group by Staff.empID;
And this one gives me just the column names, with no rows.
I have no idea what to do next. I tried to find the answer on Google, but all the explanations I found were too advanced for me, so I couldn't understand them. Could you please help me? Hints are fine as well.
The easiest way to do this is most likely with correlated subqueries in the SELECT clause, something like this:
Select
S.empID,
(select count(*) from WorkOfArt C where C.Curator = S.empID)
as CuratorTotal,
(select count(*) from WorkOfArt H where H.HelpingCurator = S.empID)
as HelpingCuratorTotal
FROM Staff S
WHERE S.StaffLevel<7
group by S.empID;
This way, rows where the employee has the other role don't interfere with the counts. If the tables are really large or you have a lot of different roles, a more complex query that groups the rows in the WorkOfArt table first might perform better, since this version reads the WorkOfArt rows twice.
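A runnable sketch of the correlated-subquery approach, using SQLite through Python's sqlite3 module with hypothetical rows. It also reproduces the question's second attempt, which requires the same employee to be both Curator and HelpingCurator of one painting and therefore returns no rows. The outer GROUP BY is left out here, which assumes empID is unique in Staff:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (empID INTEGER PRIMARY KEY, StaffLevel INTEGER, Surname TEXT);
CREATE TABLE WorkOfArt (artID INTEGER PRIMARY KEY, name TEXT,
                        Curator INTEGER, HelpingCurator INTEGER);
INSERT INTO Staff VALUES (1, 3, 'Smith'), (2, 5, 'Jones'), (3, 9, 'Brown');
INSERT INTO WorkOfArt VALUES
  (100, 'A', 1, 2), (101, 'B', 1, 2), (102, 'C', 2, 1), (103, 'D', 3, 1);
""")

# One correlated subquery per role; each one counts independently.
per_role = conn.execute("""
SELECT S.empID,
       (SELECT count(*) FROM WorkOfArt C WHERE C.Curator = S.empID) AS CuratorTotal,
       (SELECT count(*) FROM WorkOfArt H WHERE H.HelpingCurator = S.empID) AS HelpingCuratorTotal
FROM Staff S
WHERE S.StaffLevel < 7
ORDER BY S.empID
""").fetchall()

# The question's attempt: both conditions must hold on the same painting row.
both_roles = conn.execute("""
SELECT Staff.empID, count(WorkOfArt.Curator), count(WorkOfArt.HelpingCurator)
FROM Staff, WorkOfArt
WHERE Staff.empID = WorkOfArt.Curator
  AND Staff.empID = WorkOfArt.HelpingCurator
  AND Staff.StaffLevel < 7
GROUP BY Staff.empID
""").fetchall()

print(per_role)    # [(1, 2, 2), (2, 1, 2)]
print(both_roles)  # []
```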
From a performance perspective, the following query is probably a little more efficient (note that the inner joins drop staff members who have no paintings in one of the two roles):
select s.EmpId, CuratorForCount, HelpingCuratorForCount
from Staff s
inner join ( select Curator, count(*) as CuratorForCount
from WorkOfArt
group by Curator) mainCurator on s.EmpId = mainCurator.Curator
inner join ( select HelpingCurator, count(*) as HelpingCuratorForCount
from WorkOfArt
group by HelpingCurator) secondaryCurator on s.EmpId = secondaryCurator.HelpingCurator
where s.StaffLevel < 7;
One method that can be useful if you want to get more than one aggregated value from the WorkOfArt table is to pre-aggregate the results:
Select s.empID, COALESCE(woac.cnt, 0) as CuratorTotal,
COALESCE(woahc.cnt, 0) as HelpingCuratorTotal
FROM Staff s LEFT JOIN
(SELECT woa.Curator, COUNT(*) as cnt
FROM WorkOfArt woa
GROUP BY woa.Curator
) woac
ON s.empID = woac.Curator LEFT JOIN
(SELECT woa.HelpingCurator, COUNT(*) as cnt
FROM WorkOfArt woa
GROUP BY woa.HelpingCurator
) woahc
ON s.empID = woahc.HelpingCurator
WHERE s.StaffLevel < 7;
Notice that the aggregation on the outer level is not needed.
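A sketch of the pre-aggregation approach on SQLite via Python's sqlite3 module, with hypothetical rows. Employee 4 has no paintings at all; the LEFT JOINs keep that employee and COALESCE turns the missing counts into 0. Each derived table has at most one row per employee, which is why the joins cannot multiply rows and no outer GROUP BY is needed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (empID INTEGER PRIMARY KEY, StaffLevel INTEGER);
CREATE TABLE WorkOfArt (artID INTEGER PRIMARY KEY, Curator INTEGER, HelpingCurator INTEGER);
INSERT INTO Staff VALUES (1, 3), (2, 5), (3, 9), (4, 2);
INSERT INTO WorkOfArt VALUES (100, 1, 2), (101, 1, 2), (102, 2, 1), (103, 3, 1);
""")

rows = conn.execute("""
SELECT s.empID,
       COALESCE(woac.cnt, 0) AS CuratorTotal,
       COALESCE(woahc.cnt, 0) AS HelpingCuratorTotal
FROM Staff s
LEFT JOIN (SELECT Curator, COUNT(*) AS cnt
           FROM WorkOfArt GROUP BY Curator) woac
       ON s.empID = woac.Curator
LEFT JOIN (SELECT HelpingCurator, COUNT(*) AS cnt
           FROM WorkOfArt GROUP BY HelpingCurator) woahc
       ON s.empID = woahc.HelpingCurator
WHERE s.StaffLevel < 7
ORDER BY s.empID
""").fetchall()

print(rows)  # [(1, 2, 2), (2, 1, 2), (4, 0, 0)]
```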
I was wondering if someone could cast their eye over the query I am trying to execute; I can't quite work out the best way to do it.
I need the Email, FirstName and Surname from the Contact table and the HotlineID and LastAction from the Hotline table. I want to filter on the Flag column stored in the Hotline table to only show rows where the value is 1. I have achieved this with this query:
select Email, FirstName, Surname, HotlineID, LastAction
from Hotline
left join contact on contact.companyid=hotline.CompanyID
and contact.ContactID=hotline.ContactID
where
hotline.Flag = 1
Now the bit I can't do. The Actions table has 3 columns: HotlineID, Comment and Date. HotlineID in the Actions table references HotlineID in the Hotline table. Multiple comments can be added for each hotline, and the date each one is posted is recorded in the Date column.
Of the rows returned by the first query, I want to further filter out any rows where the max Date (the last recorded comment) is less than 48 hours before the current date. I am using AddWithValue in Visual Studio to populate the date variable, but for testing purposes I use '2014-12-04'.
I've come up with this, which fails, but I am unsure why:
Select Email, FirstName, Surname, hotline.HotlineID, LastAction
from Hotline
left join Contact on Contact.CompanyID=Hotline.CompanyID
and Contact.ContactID=Hotline.ContactID
inner join Actions on actions.HotlineID=hotline.HotlineID
where hotline.flag=1 and CONVERT(VARCHAR(25), Max(Date), 126) LIKE '2014-12-03%'
I'm using SQL Server.
MAX() is an aggregate function over a group of rows. If it appeared in the select list, it would turn your ordinary query into an aggregate query, which does not appear to be what you want. SQL Server does not accept an aggregate function directly in a WHERE clause at all; it belongs in a HAVING clause or inside a subquery.
It seems like you want something like this instead:
SELECT
Contact.Email,
Contact.FirstName,
Contact.Surname,
recent.HotlineID,
Hotline.LastAction
FROM
(SELECT HotlineID, MAX([Date]) as maxDate
FROM Hotline
GROUP BY HotlineID) recent
INNER JOIN Hotline
ON recent.HotlineId = Hotline.HotlineId
LEFT JOIN Contact
ON Contact.CompanyID = Hotline.CompanyID AND Contact.ContactID = Hotline.ContactID
WHERE
datediff(hour, recent.maxDate, GetDate()) < 48
AND Hotline.Flag = 1
Possibly you want to put the WHERE clause inside the subquery. The resulting query would have a slightly different meaning than the one above, and I'm not sure which you really want.
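The rule that aggregates cannot be filtered in WHERE is not specific to SQL Server. A small sketch on SQLite via Python's sqlite3 module (hypothetical rows, and a fixed cutoff string in place of the 48-hour calculation) shows the WHERE form being rejected and the GROUP BY / HAVING form working:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Actions (HotlineID INTEGER, Comment TEXT, Date TEXT);
INSERT INTO Actions VALUES
  (1, 'first call', '2014-12-01 09:00:00'),
  (1, 'follow-up',  '2014-12-01 15:00:00'),
  (2, 'opened',     '2014-11-30 10:00:00'),
  (2, 'reply',      '2014-12-03 08:00:00');
""")

# An aggregate in WHERE is rejected when the statement is prepared.
try:
    conn.execute("SELECT HotlineID FROM Actions WHERE max(Date) < '2014-12-02'")
    where_rejected = False
except sqlite3.OperationalError:
    where_rejected = True

# Group first, then filter on the aggregate with HAVING.
stale = conn.execute("""
SELECT HotlineID, max(Date) AS LastComment
FROM Actions
GROUP BY HotlineID
HAVING max(Date) < '2014-12-02'
""").fetchall()

print(where_rejected, stale)  # True [(1, '2014-12-01 15:00:00')]
```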
You can try this (the aggregate has to move out of the WHERE clause into a HAVING clause after grouping):
Select Email, FirstName, Surname, hotline.HotlineID, LastAction
from Hotline
left join Contact on Contact.CompanyID=Hotline.CompanyID
and Contact.ContactID=Hotline.ContactID
inner join Actions on actions.HotlineID=hotline.HotlineID
where hotline.flag=1
group by Email, FirstName, Surname, hotline.HotlineID, LastAction
having Max(actions.[Date]) < GetDate() - 2
John's query is good, except that it uses your Hotline table in the derived table instead of your Actions table.
SELECT Email, FirstName, Surname, h.HotlineID, LastAction
FROM Hotline h
INNER JOIN
(SELECT hotlineID, max(date) as Date FROM actions a1 GROUP BY hotlineID) a
ON h.hotlineID = a.hotlineID
LEFT JOIN contact c
ON c.companyid=h.CompanyID and c.ContactID=h.ContactID
WHERE
h.Flag = 1
and datediff(hour,a.[Date],getdate()) > 48
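A runnable sketch of the derived-table approach, using SQLite through Python's sqlite3 module. Two assumptions: the rows are hypothetical, and SQLite's datetime(?, '-48 hours') stands in for SQL Server's DATEDIFF/GETDATE, with a fixed "current time" so the result is reproducible. Note that SQL Server's DATEDIFF(hour, ...) counts hour boundaries crossed rather than exact elapsed time, so comparing against a computed cutoff is slightly stricter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contact (CompanyID INTEGER, ContactID INTEGER,
                      Email TEXT, FirstName TEXT, Surname TEXT);
CREATE TABLE Hotline (HotlineID INTEGER PRIMARY KEY, CompanyID INTEGER,
                      ContactID INTEGER, Flag INTEGER, LastAction TEXT);
CREATE TABLE Actions (HotlineID INTEGER, Comment TEXT, Date TEXT);
INSERT INTO Contact VALUES (1, 1, 'ann@example.com', 'Ann', 'Lee'),
                           (1, 2, 'bob@example.com', 'Bob', 'Ray');
INSERT INTO Hotline VALUES (100, 1, 1, 1, 'called'),
                           (101, 1, 2, 1, 'emailed'),
                           (102, 1, 1, 0, 'closed');
INSERT INTO Actions VALUES (100, 'c1', '2014-12-01 10:00:00'),
                           (100, 'c2', '2014-12-01 18:00:00'),
                           (101, 'c3', '2014-12-01 09:00:00'),
                           (101, 'c4', '2014-12-03 16:00:00'),
                           (102, 'c5', '2014-11-20 10:00:00');
""")

now = "2014-12-04 12:00:00"  # fixed "current time" (assumption, for reproducibility)

rows = conn.execute("""
SELECT c.Email, c.FirstName, c.Surname, h.HotlineID, h.LastAction
FROM Hotline h
INNER JOIN (SELECT HotlineID, max(Date) AS LastDate
            FROM Actions GROUP BY HotlineID) a
        ON h.HotlineID = a.HotlineID
LEFT JOIN Contact c
       ON c.CompanyID = h.CompanyID AND c.ContactID = h.ContactID
WHERE h.Flag = 1
  AND a.LastDate < datetime(?, '-48 hours')  -- last comment older than 48 hours
ORDER BY h.HotlineID
""", (now,)).fetchall()

print(rows)  # [('ann@example.com', 'Ann', 'Lee', 100, 'called')]
```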
I need some help for a PostgreSQL query.
I have 4 tables involved: customer, organisation_complete, entity and address. I retrieve some data from each of them with this query:
SELECT distinct ON (c.customer_number, trim(lower(o.name)), a.street, a.zipcode, a.area, a.country)
c.xid AS customer_xid, o.xid AS entity_xid, c.customer_number, c.deleted, o.name, o.vat, 'organisation' AS customer_type, a.street, a.zipcode, a.city, a.country
FROM customer c
INNER JOIN organisation_complete o ON (c.xid = o.customer_xid AND c.deleted = 'FALSE')
INNER JOIN entity e ON e.customer_xid = o.customer_xid
INNER JOIN address a ON (a.contact_info_xid = e.contact_info_xid and a.address_type = 'delivery')
WHERE c.account_xid = '<value>'
I get distinct customers, split by customer_number, name, street, zipcode, area and country (the expressions specified in the DISTINCT ON clause).
What I need to retrieve now is the distinct customers that have duplicate rows in the DB, but I also need to retrieve customer_xid and entity_xid, which are the primary keys of their respective tables and are therefore unique. For this reason they can't simply be included in the aggregation. All I need is to count how many rows with the same customer_number, name, street, zipcode, area and country I have for each distinct tuple, and to select only the tuples with a count greater than 1.
For each selected tuple I also need to take a customer_xid and an entity_xid at random, like MySQL would do with a_key in a query like this:
SELECT COUNT(*), tab.a_key, tab.b, tab.c from tab
WHERE 1
GROUP BY tab.b
I know MySQL is quite an exception in this regard; I just want to know whether it is possible to obtain the same result in PostgreSQL.
This query in MySQL uses a nonstandard (see note below) "MySQL GROUP BY extension": http://dev.mysql.com/doc/refman/5.0/en/group-by-extensions.html
SELECT COUNT(*), tab.a_key, tab.b, tab.c
from tab
WHERE 1
GROUP BY tab.b
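Note: This is a feature defined in the SQL:2003 standard as T301 Functional dependencies. It is not required by the standard, and many RDBMSs don't fully support it, including PostgreSQL, which (since 9.1) only recognizes dependencies on a table's primary key (see this link for version 9.3, unsupported features: http://www.postgresql.org/docs/9.3/static/unsupported-features-sql-standard.html ). Strictly, T301 only allows non-grouped columns that are functionally dependent on the GROUP BY columns, which a_key and c are not when grouping by b alone.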
The above query could be expressed in PostgreSQL in this way:
SELECT tab.a_key, tab.b, tab.c,
q.cnt
FROM (
SELECT tab.b,
COUNT(*) As cnt,
MIN(tab.unique_id) As unique_id /* could be also MAX */
from tab
GROUP BY tab.b
) q
JOIN tab ON tab.unique_id = q.unique_id
where unique_id is a column that uniquely identifies each row in tab (usually a primary key).
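MIN or MAX picks one row per group: the one with the smallest (or largest) unique_id. The choice is deterministic rather than random, but it is arbitrary with respect to the other columns.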
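A runnable sketch of the MIN(unique_id) pattern on SQLite via Python's sqlite3 module, with hypothetical rows. It adds the HAVING COUNT(*) > 1 filter the question asks for, so only duplicated groups come back, each represented by the row with the smallest unique_id. In the real query, GROUP BY b would be replaced by the six grouping expressions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tab (unique_id INTEGER PRIMARY KEY, a_key TEXT, b TEXT, c TEXT);
INSERT INTO tab VALUES (1, 'k1', 'x', 'c1'), (2, 'k2', 'x', 'c2'),
                       (3, 'k3', 'y', 'c3'), (4, 'k4', 'x', 'c4');
""")

rows = conn.execute("""
SELECT tab.a_key, tab.b, tab.c, q.cnt
FROM (
  SELECT b, COUNT(*) AS cnt, MIN(unique_id) AS unique_id
  FROM tab
  GROUP BY b
  HAVING COUNT(*) > 1          -- keep only groups that occur more than once
) q
JOIN tab ON tab.unique_id = q.unique_id
ORDER BY tab.b
""").fetchall()

print(rows)  # [('k1', 'x', 'c1', 3)]
```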
Using the following schema:
Supplier (sid, name, status, city)
Part (pid, name, color, weight, city)
Project (jid, name, city)
Supplies (sid, pid, jid, quantity)
Get supplier numbers and names for suppliers of parts supplied to at least two different projects.
Get supplier numbers and names for suppliers of the same part to at least two different projects.
These were my answers:
1.
SELECT s.sid, s.name
FROM Supplier s, Supplies su, Project pr
WHERE s.sid = su.sid AND su.jid = pr.jid
GROUP BY s.sid, s.name
HAVING COUNT (DISTINCT pr.jid) >= 2
2.
SELECT s.sid, s.name
FROM Suppliers s, Supplies su, Project pr, Part p
WHERE s.sid = su.sid AND su.pid = p.pid AND su.jid = pr.jid
GROUP BY s.sid, s.name
HAVING COUNT (DISTINCT pr.jid)>=2
Can anyone confirm whether I wrote these correctly? I'm a little confused about how the GROUP BY and HAVING clauses work.
The semantics of Having
To better understand having, you need to see it from a theoretical point of view.
A group by is a query that takes a table and summarizes it into another table. You summarize the original table by grouping the original table into subsets (based upon the attributes that you specify in the group by). Each of these groups will yield one tuple.
HAVING is simply equivalent to a WHERE clause applied after the GROUP BY has executed and before the SELECT part of the query is computed.
Let's say your query is:
select a, b, count(*)
from Table
where c > 100
group by a, b
having count(*) > 10;
The evaluation of this query can be seen as the following steps:
1. Perform the WHERE, eliminating rows that do not satisfy it.
2. Group the table into subsets based upon the values of a and b (each tuple in each subset has the same values of a and b).
3. Eliminate subsets that do not satisfy the HAVING condition.
4. Process each subset, outputting the values as indicated in the SELECT part of the query. This creates one output tuple per subset left after step 3.
You can extend this to any complex query where Table can be any complex query that returns a table (a cross product, a join, a UNION, etc.).
In fact, having is syntactic sugar and does not extend the power of SQL. Any given query:
SELECT list
FROM table
GROUP BY attrList
HAVING condition;
can be rewritten as:
SELECT list from (
SELECT listatt
FROM table
GROUP BY attrList) as Name
WHERE condition;
The listatt is a list that includes the GROUP BY attributes and the expressions used in list and condition. It might be necessary to name some expressions in this list (with AS). For instance, the example query above can be rewritten as:
select a, b, count
from (select a, b, count(*) as count
from Table
where c > 100
group by a, b) as someName
where count > 10;
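The equivalence can be checked on a toy table with SQLite through Python's sqlite3 module. The rows are hypothetical, the table is named t (Table is a reserved word), the alias is cnt rather than count, and the threshold is lowered from 10 to 1 so a handful of rows is enough:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER);
INSERT INTO t VALUES (1, 1, 150), (1, 1, 200), (1, 1, 50), (1, 2, 300),
                     (2, 1, 120), (2, 1, 130), (2, 1, 140);
""")

# WHERE filters rows, GROUP BY forms groups, HAVING filters groups.
with_having = conn.execute("""
SELECT a, b, count(*)
FROM t
WHERE c > 100
GROUP BY a, b
HAVING count(*) > 1
ORDER BY a, b
""").fetchall()

# Same result with the group filter moved to an outer WHERE.
with_subquery = conn.execute("""
SELECT a, b, cnt
FROM (SELECT a, b, count(*) AS cnt
      FROM t
      WHERE c > 100
      GROUP BY a, b) AS someName
WHERE cnt > 1
ORDER BY a, b
""").fetchall()

print(with_having)  # [(1, 1, 2), (2, 1, 3)]
```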
The solution you need
Your solution seems to be correct:
SELECT s.sid, s.name
FROM Supplier s, Supplies su, Project pr
WHERE s.sid = su.sid AND su.jid = pr.jid
GROUP BY s.sid, s.name
HAVING COUNT (DISTINCT pr.jid) >= 2
You join the three tables, then group using sid as the grouping attribute (name is functionally dependent on it, so it does not affect the number of groups, but you must include it, otherwise it cannot appear in the SELECT part of the statement). Then you remove the groups that do not satisfy your condition: the number of distinct pr.jid values must be >= 2, which is what you wanted originally.
Best solution to your problem
I personally prefer a simpler cleaner solution:
You only need to group Supplies (sid, pid, jid, quantity) to
find the sid of those that supply at least two projects.
Then join it to the Supplier table to get the supplier name.
SELECT sid, name from
(SELECT sid from supplies
GROUP BY sid
HAVING count(DISTINCT jid) >= 2
) AS T1
NATURAL JOIN
Supplier;
It may also be faster to execute, because the aggregation runs on Supplies alone and only the resulting sids are joined to Supplier.
Because aggregate functions like count(), min() and sum() cannot be used in a WHERE clause, the HAVING clause exists in SQL to filter on them. For an example of the HAVING clause, see this link:
http://www.sqlfundamental.com/having-clause.php
First of all, you should use the JOIN syntax rather than FROM table1, table2, and you should always limit the grouping to as few fields as you need.
Although I haven't tested it, your first query seems fine to me, but it could be rewritten as:
SELECT s.sid, s.name
FROM
Supplier s
INNER JOIN (
SELECT su.sid
FROM Supplies su
GROUP BY su.sid
HAVING COUNT(DISTINCT su.jid) > 1
) g
ON g.sid = s.sid
Or simplified as:
SELECT sid, name
FROM Supplier s
WHERE (
SELECT COUNT(DISTINCT su.jid)
FROM Supplies su
WHERE su.sid = s.sid
) > 1
However, your second query seems wrong to me, because you should also GROUP BY pid.
SELECT s.sid, s.name
FROM
Supplier s
INNER JOIN (
SELECT DISTINCT su.sid
FROM Supplies su
GROUP BY su.sid, su.pid
HAVING COUNT(DISTINCT su.jid) > 1
) g
ON g.sid = s.sid
As you may have noticed in the query above, I used the INNER JOIN syntax to perform the filtering; however, it can also be written as:
SELECT s.sid, s.name
FROM Supplier s
WHERE EXISTS (
SELECT 1
FROM Supplies su
WHERE su.sid = s.sid
GROUP BY su.pid
HAVING COUNT(DISTINCT su.jid) > 1
)
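A sketch that runs both supplier questions on hypothetical data with SQLite through Python's sqlite3 module (the Project table is left out, assuming every Supplies.jid refers to an existing project). Supplier S1 ships two different parts, each to two projects, which is exactly the case where grouping by (sid, pid) without DISTINCT lists the supplier twice:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Supplier (sid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Supplies (sid INTEGER, pid INTEGER, jid INTEGER, quantity INTEGER);
INSERT INTO Supplier VALUES (1, 'S1'), (2, 'S2'), (3, 'S3');
INSERT INTO Supplies VALUES
  (1, 10, 100, 5), (1, 10, 200, 5),   -- S1: part 10 to two projects
  (1, 20, 100, 1), (1, 20, 300, 1),   -- S1: part 20 to two projects as well
  (2, 10, 100, 1), (2, 20, 200, 1),   -- S2: different parts to two projects
  (3, 10, 100, 1), (3, 10, 100, 2);   -- S3: only one project
""")

# Question 1: at least two distinct projects, any parts.
q1 = conn.execute("""
SELECT s.sid, s.name
FROM Supplier s
JOIN (SELECT sid FROM Supplies GROUP BY sid HAVING COUNT(DISTINCT jid) >= 2) g
  ON g.sid = s.sid
ORDER BY s.sid
""").fetchall()

# Question 2: the same part to at least two distinct projects.
q2 = conn.execute("""
SELECT s.sid, s.name
FROM Supplier s
WHERE EXISTS (SELECT 1 FROM Supplies su
              WHERE su.sid = s.sid
              GROUP BY su.pid
              HAVING COUNT(DISTINCT su.jid) >= 2)
ORDER BY s.sid
""").fetchall()

# Joining to the (sid, pid) groups without DISTINCT repeats S1 once per part.
q2_no_distinct = conn.execute("""
SELECT s.sid, s.name
FROM Supplier s
JOIN (SELECT su.sid FROM Supplies su
      GROUP BY su.sid, su.pid
      HAVING COUNT(DISTINCT su.jid) > 1) g
  ON g.sid = s.sid
ORDER BY s.sid
""").fetchall()

print(q1)              # [(1, 'S1'), (2, 'S2')]
print(q2)              # [(1, 'S1')]
print(q2_no_distinct)  # [(1, 'S1'), (1, 'S1')]
```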
What type of SQL database are you using (MSSQL, Oracle, etc.)?
I believe what you have written is correct.
You could also write the first query like this:
SELECT s.sid, s.name
FROM Supplier s
WHERE (SELECT COUNT(DISTINCT pr.jid)
FROM Supplies su, Project pr
WHERE su.sid = s.sid
AND pr.jid = su.jid) >= 2
It's a little more readable, and less mind-bending than trying to do it with GROUP BY. Performance may differ though.
1. Get supplier numbers and names for suppliers of parts supplied to at least two different projects.
SELECT S.SID, S.NAME
FROM SUPPLIER S
WHERE S.SID IN
(SELECT SID FROM SUPPLIES GROUP BY SID HAVING COUNT(DISTINCT JID) >= 2)
I am not clear about your second question.