I have the following tables:
Animals (animal_id, producer_id, ...)
Producers (prod_id, ...)
BoughtAnimals (animal_id (fk), ...)
and I'd like to make a query that tells me for each producer, how many animals it has, and how many of those animals were bought. After much thought, I tried the following approach:
select Producers.name, count (distinct A1.animal_id), count(distinct BoughtAnimals.animal_id)
from Producers, Animals A1, Animals A2, BoughtAnimals
where
Producers.nif = A1.producer_id and
Producers.nif = A2.producer_id and
BoughtAnimals.animal_id = A2.animal_id
group by Producers.name;
but I did it only by trial and error, and I find it hard to reason about several Animal tables at once. Is there any other approach to make this query? Or is this the usual way of doing it?
Try something like this
select p.name,
sum(case when ba.anyfield is not null then 1 else 0 end) bought_count,
count(1) total_count
from Producers p join Animals a on (p.nif = a.producer_id)
left join BoughtAnimals ba using (animal_id)
group by p.name;
Use a simple JOIN; if you need to filter on the count, you can then put the COUNT condition in a HAVING clause. See the documentation for LEFT / INNER JOIN and HAVING for your DBMS.
I'm ignoring the producers table for now; all the critical data you need is in the other two tables. Once this part is right, you can just do an inner join on the producers table to get the other details you need.
select a1.producer_id,
count(a1.animal_id) as num_animals,
count(b1.animal_id) as num_bought
from animals a1
left join boughtanimals b1 on (b1.animal_id = a1.animal_id)
group by producer_id;
It's not clear to me whether that last column is better named "num_bought" or "num_sold". Also not clear is what it means for a producer to "have" an animal, given that some animals are either bought or sold.
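To see why the LEFT JOIN version gives both counts in one pass, here is a minimal sketch using Python's sqlite3 module. The sample rows are made up, and it assumes each animal appears at most once in BoughtAnimals. COUNT(column) skips NULLs, so the animals without a match in BoughtAnimals are counted in the total but not in the bought count.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Animals (animal_id INTEGER PRIMARY KEY, producer_id INTEGER);
    CREATE TABLE BoughtAnimals (animal_id INTEGER REFERENCES Animals(animal_id));
    INSERT INTO Animals VALUES (1, 10), (2, 10), (3, 10), (4, 20);
    INSERT INTO BoughtAnimals VALUES (2), (3);
""")

rows = conn.execute("""
    SELECT a.producer_id,
           COUNT(a.animal_id) AS num_animals,  -- one per animal row
           COUNT(b.animal_id) AS num_bought    -- NULLs from unmatched rows are skipped
    FROM Animals a
    LEFT JOIN BoughtAnimals b ON b.animal_id = a.animal_id
    GROUP BY a.producer_id
    ORDER BY a.producer_id
""").fetchall()

print(rows)  # [(10, 3, 2), (20, 1, 0)]
```

If an animal could be bought more than once, the join would duplicate its row, and you would need COUNT(DISTINCT ...) on both columns.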
I have a basic question regarding a problem I faced:
Let's say I have this model with these tables :
Food(Quantity, Animal_id)
Race(Race_code, Race_name)
Animal(Animal_id, Race_code)
I have been asked to find the total quantity eaten by each race with select query.
(Of course by using SUM function. Race_name is also required for the display)
But I don't know how to link the attributes of these tables to go from the quantity to the race name (I only know that my reasoning will be like this: Quantity -> animal_id -> race_code -> race_name). Any help?
Looks like a join, doesn't it? Columns that aren't aggregated (race_name in this case) have to be put into the group by clause.
select r.race_name,
sum(f.quantity) sum_quantity
from race r join animal a on a.race_code = r.race_code
join food f on f.animal_id = a.animal_id
group by r.race_name;
Using subquery:
select race_name ,(select sum(quantity) from food where animal_id in (select animal_id from animal a where r.race_code = a.race_code))
from race r
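The two versions are not quite interchangeable. A small sketch with made-up data (run through Python's sqlite3 module) shows the difference: the inner join drops a race that has no food rows, while the correlated subquery keeps it with a NULL sum.

```python
import sqlite3

# Hypothetical sample data; 'Ferret' has no animals and therefore no food rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Race (Race_code TEXT PRIMARY KEY, Race_name TEXT);
    CREATE TABLE Animal (Animal_id INTEGER PRIMARY KEY, Race_code TEXT);
    CREATE TABLE Food (Quantity INTEGER, Animal_id INTEGER);
    INSERT INTO Race VALUES ('C', 'Cat'), ('D', 'Dog'), ('F', 'Ferret');
    INSERT INTO Animal VALUES (1, 'C'), (2, 'C'), (3, 'D');
    INSERT INTO Food VALUES (5, 1), (7, 2), (4, 3), (6, 3);
""")

# Join version: only races that reach at least one food row appear.
join_rows = conn.execute("""
    SELECT r.Race_name, SUM(f.Quantity)
    FROM Race r
    JOIN Animal a ON a.Race_code = r.Race_code
    JOIN Food f ON f.Animal_id = a.Animal_id
    GROUP BY r.Race_name
    ORDER BY r.Race_name
""").fetchall()

# Subquery version: every race appears; the sum is NULL when nothing matches.
subquery_rows = conn.execute("""
    SELECT r.Race_name,
           (SELECT SUM(f.Quantity)
            FROM Food f
            WHERE f.Animal_id IN (SELECT a.Animal_id
                                  FROM Animal a
                                  WHERE a.Race_code = r.Race_code))
    FROM Race r
    ORDER BY r.Race_name
""").fetchall()

print(join_rows)      # [('Cat', 12), ('Dog', 10)]
print(subquery_rows)  # [('Cat', 12), ('Dog', 10), ('Ferret', None)]
```

If you want the join version to list every race as well, switching both joins to LEFT JOIN should give the same rows as the subquery.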
I find myself in a bit of an unworkable situation with a SQL query and I'm hoping that I'm missing something or might learn something new. The structure of the DB2 database I'm working with isn't exactly built for this sort of query, but I'm tasked with this...
Let's say we have Table People and Table Groups. Groups can contain multiple people, and one person can be part of multiple groups. Yeah, it's already messy. In any case, there are a couple of intermediary tables linking the two. The problem is that I need to start with a list of groups, get all of the people in those groups, and then get all of the groups with which the people are affiliated, which would be a superset of the initial group set. This would mean starting with groups, joining down to the people, and then going BACK and joining to the groups again. I need information from both tables in the result set, too, so that rules out a number of techniques.
I have to join this with a number of other tables for additional information and the query is getting enormous, cumbersome, and slow. I'm wondering if there's some way that I could start with People, join it to Groups, and then specify that if a person has one group that is in the supplied set of groups (which is done via a subquery), then ALL groups for that person should be returned. I don't know of a way to make this happen, but I'm thinking (hoping) that there's a relatively clean way to make this happen in SQL.
A quick and dirty example:
SELECT ...
FROM GROUPS g
JOIN LINKING_A a
ON g.GROUPID = a.GROUPID
AND g.GROUPID IN (subquery)
JOIN LINKING_B b
ON a.GROUPLIST = b.GROUPLIST
JOIN PEOPLE p
ON b.PERSONID = p.PERSONID
--This gets me all people affiliated with groups,
-- but now I need all groups affiliated with those people...
JOIN LINKING_B b2
ON p.PERSONID = b2.PERSONID
JOIN LINKING_A a2
ON b2.GROUPLIST = a2.GROUPLIST
JOIN GROUPS g2
ON a2.GROUPID = g2.GROUPID
And then I can return information from p and g2 in the result set. You can see where I'm having trouble. That's a lot of joining on some large tables, not to mention a number of other joins that are performed in this query as well. I need to be able to query by joining PEOPLE to GROUPS, then specify that if any person has an associated group that is in the subquery, it should return ALL groups affiliated with that entry in PEOPLE. I'm thinking that GROUP BY might be just the thing, but I haven't used that one enough to really know. So if Bill is part of group A, B, and C, and our subquery returns a set containing Group A, the result set should include Bill along with groups A, B, and C.
The following is a shorter way to get all the groups that people in the supplied group list are in. Does this help?
Select g.*
From Linking_B b
Join Linking_B b2
On b2.PersonId = b.PersonId
Join Groups g
On g.GroupId = b2.GroupId
Where b.Groupid in (SubQuery)
I'm not clear why you have both Linking_A and Linking_B. Generally all you should need to represent a many-to-many relationship between two master tables is a single association table with GroupID and PersonId.
I often recommend using "common table expressions" (CTEs) to help you break a problem up into chunks that are easier to understand. CTEs are specified using a WITH clause, which can contain several CTEs before the main SELECT query starts.
I'm going to assume that the list of groups you want to start with is specified by your subquery, so that will be the 1st CTE. The next one selects people who belong to those groups. The final part of the query then selects groups those people belong to, and returns the columns from both master tables.
WITH g1 as
(subquery)
, p1 as
(SELECT p.*
from g1
join Linking a1 on g1.groupID=a1.groupID
join People p on p.personID=a1.personID )
SELECT p1.*, g2.*
from p1
join Linking a2 on p1.personID=a2.personID
join Groups g2 on g2.groupID=a2.groupID
I think I'd build the list of people you want to pull records for first, then use that to query out all the groups for those people. This will work across any number of link tables with the appropriate joins added:
with persons_wanted as
(
--figure out which people are in a group you want to include
select p.person_key
from person p
join link l1
on p.person_key = l1.person_key
join groups g
on l1.group_key = g.group_key
where g.name in ('GROUP_I_WANT_PEOPLE_FROM', 'THIS_ONE_TOO')
group by p.person_key --we only want each person_key once
)
--now pull all the groups for the list of people in at least one group we want
select p.name as person_name, g.name as group_name, ...
from person p
join link l1
on p.person_key = l1.person_key
join groups g
on l1.group_key = g.group_key
where p.person_key in (select person_key from persons_wanted);
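Here is a small runnable sketch of that CTE pattern using the "Bill in groups A, B, and C" example from the question. It assumes a single link table, as the answers above do. The table names person, link, and grp and all of the rows are made up for illustration (grp instead of groups just to avoid any keyword clash), and it runs through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical schema with a single link table between people and groups.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE grp (group_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE link (person_key INTEGER, group_key INTEGER);
    INSERT INTO person VALUES (1, 'Bill'), (2, 'Ann'), (3, 'Cal');
    INSERT INTO grp VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
    INSERT INTO link VALUES (1, 1), (1, 2), (1, 3), (2, 2), (3, 3), (3, 4);
""")

rows = conn.execute("""
    WITH persons_wanted AS (
        -- people who belong to at least one of the starting groups
        SELECT DISTINCT l.person_key
        FROM link l
        JOIN grp g ON g.group_key = l.group_key
        WHERE g.name IN ('A')
    )
    -- every group of those people, not only the starting groups
    SELECT p.name, g.name
    FROM person p
    JOIN link l ON l.person_key = p.person_key
    JOIN grp g ON g.group_key = l.group_key
    WHERE p.person_key IN (SELECT person_key FROM persons_wanted)
    ORDER BY p.name, g.name
""").fetchall()

print(rows)  # [('Bill', 'A'), ('Bill', 'B'), ('Bill', 'C')]
```

Ann and Cal share groups with Bill but are not in group A themselves, so they are left out.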
I'm taking a database course this semester, and we're learning SQL. I understand most simple queries, but I'm having some difficulty using the count aggregate function.
I'm supposed to relate an advertisement number to a property number to a branch number so that I can tally up the number of advertisements by branch number and compute their cost. I set up what I think are two appropriate new views, but I'm clueless as to what to write for the select statement. Am I approaching this the correct way? I have a feeling I'm overcomplicating this big time...
with ad_prop(ad_no, property_no, overseen_by) as
(select a.ad_no, a.property_no, p.overseen_by
from advertisement as a, property as p
where a.property_no = p.property_no)
with prop_branch(property_no, overseen_by, allocated_to) as
(select p.property_no, p.overseen_by, s.allocated_to
from property as p, staff as s
where p.overseen_by = s.staff_no)
select distinct pb.allocated_to as branch_no, count( ??? ) * 100 as ad_cost
from prop_branch as pb, ad_prop as ap
where ap.property_no = pb.property_no
group by branch_no;
Any insight would be greatly appreciated!
You could simplify it like this:
advertisement
- ad_no
- property_no
property
- property_no
- overseen_by
staff
- staff_no
- allocated_to
SELECT s.allocated_to AS branch, COUNT(*) as num_ads, COUNT(*)*100 as ad_cost
FROM advertisement AS a
INNER JOIN property AS p ON a.property_no = p.property_no
INNER JOIN staff AS s ON p.overseen_by = s.staff_no
GROUP BY s.allocated_to;
Update: changed above to match your schema needs
You can condense your WITH clauses into a single statement. Then, the piece I think you are missing is that columns referenced in the column definition have to be aggregated if they aren't included in the GROUP BY clause. So you GROUP BY your distinct column then apply your aggregation and math in your column definitions.
SELECT
s.allocated_to AS branch_no
,COUNT(a.ad_no) AS ad_count
,(COUNT(a.ad_no) * 100) AS ad_cost
...
GROUP BY s.allocated_to
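As a rough check of the grouping idea, here is a sketch with made-up rows run through Python's sqlite3 module. The table layout follows the question, and the 100 per advertisement comes from the `* 100` in the question's query. Note that the aggregate is repeated in the cost expression rather than referenced through its alias, since most databases don't let one select-list item refer to another's alias.

```python
import sqlite3

# Hypothetical sample data; the schema follows the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (staff_no INTEGER PRIMARY KEY, allocated_to TEXT);
    CREATE TABLE property (property_no INTEGER PRIMARY KEY, overseen_by INTEGER);
    CREATE TABLE advertisement (ad_no INTEGER PRIMARY KEY, property_no INTEGER);
    INSERT INTO staff VALUES (1, 'B1'), (2, 'B1'), (3, 'B2');
    INSERT INTO property VALUES (100, 1), (101, 2), (102, 3);
    INSERT INTO advertisement VALUES (1, 100), (2, 100), (3, 101), (4, 102);
""")

rows = conn.execute("""
    SELECT s.allocated_to        AS branch_no,
           COUNT(a.ad_no)        AS ad_count,
           COUNT(a.ad_no) * 100  AS ad_cost   -- aggregate repeated, not the alias
    FROM advertisement a
    JOIN property p ON p.property_no = a.property_no
    JOIN staff s ON s.staff_no = p.overseen_by
    GROUP BY s.allocated_to
    ORDER BY s.allocated_to
""").fetchall()

print(rows)  # [('B1', 3, 300), ('B2', 1, 100)]
```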
I can tell you that you are making it way too complicated. It should be a select statement with a couple of joins. You should re-read the chapter on joins or take a look at the following link:
http://www.sql-tutorial.net/SQL-JOIN.asp
A join allows you to "combine" the data from two tables based on a common key between the two tables (you can chain more tables together with more joins). Once you have this "joined" table, you can pretend that it is really one table (aliases are used to indicate which table each column came from). You understand how aggregates work on a single table, right?
I'd prefer not to give you the answer so that you can actually learn :)
Let's look at a very simple example with 3 tables:
dbo.Person(PersonId, Name, Surname)
dbo.Pet(PetId, Name, Breed)
dbo.PersonPet(PersonPetId, PersonId, PetId)
I need to select all persons with their pets, if the person has any. For example, in the final application it should look something like:
What's the most efficient way?
Select all persons and then, in the data access layer, fill each person's pet list with a separate select?
Use a join at the SQL level and then, in the data access layer, filter out all person duplicates by adding each person to the result list only once and using the other rows just to fill the pet list?
Any other ideas?
The most efficient way is to select them all at once:
select p.*, pt.*
from Person p
left outer join PersonPet pp on p.PersonId = pp.PersonId
left outer join Pet pt on pp.PetId = pt.PetId
Need to select all persons with their pets if the person has any...
Use:
SELECT per.name,
per.surname,
pt.name
FROM dbo.PERSON per
LEFT JOIN dbo.PERSONPET perpet ON perpet.personid = per.personid
LEFT JOIN dbo.Pet pt ON pt.petid = perpet.petid
Personally I would do it as a stored proc on the sql server. Whichever way you do it though, for display purposes you're going to have to filter out the duplicate Name and Surname.
For small result sets, a large share of the time taken to retrieve records often goes to per-query overhead: the round trip to the database plus setting up and tearing down the query. Adding a table or two to a single query usually costs far less than issuing many separate queries, so it will generally be more efficient to use a single query to get all the data. If your data access layer fetches each person's pets separately, you'll get poor speed. Definitely use a join.
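For the "filter the duplicates in the data access layer" part, a minimal sketch of what that could look like, here in Python with the sqlite3 module. The rows are made up, the dbo. schema prefix is dropped because SQLite has no such schema, and the dict-of-lists shape is just one possible result structure. Each person appears once per pet in the joined rows; the loop keeps one entry per PersonId and uses the other rows only to extend the pet list.

```python
import sqlite3

# Hypothetical sample data; table layout follows the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (PersonId INTEGER PRIMARY KEY, Name TEXT, Surname TEXT);
    CREATE TABLE Pet (PetId INTEGER PRIMARY KEY, Name TEXT, Breed TEXT);
    CREATE TABLE PersonPet (PersonPetId INTEGER PRIMARY KEY,
                            PersonId INTEGER, PetId INTEGER);
    INSERT INTO Person VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
    INSERT INTO Pet VALUES (1, 'Rex', 'Beagle'), (2, 'Tom', 'Siamese');
    INSERT INTO PersonPet VALUES (1, 1, 1), (2, 1, 2);
""")

people = {}
query = """
    SELECT p.PersonId, p.Name, p.Surname, pt.Name
    FROM Person p
    LEFT JOIN PersonPet pp ON pp.PersonId = p.PersonId
    LEFT JOIN Pet pt ON pt.PetId = pp.PetId
    ORDER BY p.PersonId, pt.PetId
"""
for person_id, name, surname, pet_name in conn.execute(query):
    # First row for a person creates the entry; later rows reuse it.
    person = people.setdefault(
        person_id, {"name": name, "surname": surname, "pets": []}
    )
    if pet_name is not None:  # NULL means the person has no pets
        person["pets"].append(pet_name)

print(people[1]["pets"])  # ['Rex', 'Tom']
print(people[2]["pets"])  # []
```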
If your client and back end support multiple result sets, you could also do something like this (assuming it's MSSQL):
Create Proc usp_GetPeopleAndPets
AS
BEGIN
SELECT PersonId, Name, Surname
FROM
dbo.Person p;
SELECT
pp.PersonID,
p.PetId, p.Name, p.Breed
FROM
dbo.PersonPet pp
INNER JOIN dbo.Pet p
ON pp.PetId = p.PetId
Order BY pp.PersonId
END
The data retrieval time would be roughly equivalent since it's one trip to the DB. Some clients support relationships between results, which allow you to do something like
Person.GetPets() or Pet.GetPerson()
The main advantage to this approach is that if you're working with people you don't have to worry about the "filter[ing] all person duplicate[s]"
I have a table B with cids and cities. I also have a table C that has these cids with extra information. I want to list all the cids in table C that are associated with ALL appearances of a given city in Table B.
My current solution relies on counting the number of times the given city appears in Table B and selecting only the cids that appear that many times. I don't know all the SQL syntax yet, but is there a way to select for this kind of pattern?
My current solution:
SELECT Agents.aid
FROM Agents, Customers, Orders
WHERE (Customers.city='Duluth')
AND (Agents.aid = Orders.aid)
AND (Customers.cid = Orders.cid)
GROUP BY Agents.aid
HAVING count(Agents.aid) > 1
It only works because I happen to know the right count to put in the HAVING clause at the moment.
Thanks for the help. I wasn't sure how to google this problem, since it's pretty specific.
EDIT: I'm pinpointing my problem a bit. I need to know how to determine if EVERY row in a table has a certain value for a field. Declaring a variable, counting the rows in a sub-selection, and filtering my results down to the IDs that appear that many times works, but it's really ugly.
There HAS to be a way to do this without explicitly count()ing rows. I hope.
Not an answer to your question, but a general improvement.
I'd recommend using JOIN syntax to join your tables together.
This would change your query to be:
SELECT Agents.aid
FROM Agents
INNER JOIN Orders
ON Agents.aid = Orders.aid
INNER JOIN Customers
ON Customers.cid = Orders.cid
WHERE Customers.city='Duluth'
GROUP BY Agents.aid
HAVING count(Agents.aid) > 1
What variant of SQL are you using?
To start with, you can (and should) use JOIN instead of doing it in the WHERE clause, e.g.,
select Agents.aid
from Agents
join Orders on Agents.aid = Orders.aid
join Customers on Customers.cid = Orders.cid
where Customers.city = 'Duluth'
group by Agents.aid
having count(Agents.aid) > 1
After that, I'm afraid I might be a little lost. Using the table names in your example query, what (in English, not pseudocode) are you trying to retrieve? For example, I think your sample query is retrieving the PK for all Agents that have been involved in at least 2 Orders involving Customers in Duluth.
Also, some table definitions for Agents, Orders, and Customers might help (then again, they might be irrelevant).
I'm not sure if I understood your problem, but I think the following query is what you want:
SELECT *
FROM customers b
INNER JOIN orders c USING (cid)
WHERE b.city = 'Duluth'
AND NOT EXISTS (SELECT 1
FROM customers b2
WHERE b2.city = b.city
AND b2.cid <> b.cid);
Probably you will need some indexes on these columns.
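On the "every row" part of the question: one common way to express "for all" without counting is a double NOT EXISTS (sometimes called relational division). The sketch below assumes my reading of the question, namely "agents who have an order for every customer in Duluth". The data is made up, and it runs through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical sample data; c1 and c2 are in Duluth, c3 is not.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Agents (aid TEXT PRIMARY KEY);
    CREATE TABLE Customers (cid TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE Orders (aid TEXT, cid TEXT);
    INSERT INTO Agents VALUES ('a1'), ('a2'), ('a3');
    INSERT INTO Customers VALUES ('c1', 'Duluth'), ('c2', 'Duluth'), ('c3', 'Dallas');
    INSERT INTO Orders VALUES ('a1', 'c1'), ('a1', 'c2'),
                              ('a2', 'c1'), ('a2', 'c3'),
                              ('a3', 'c1'), ('a3', 'c2'), ('a3', 'c3');
""")

rows = conn.execute("""
    SELECT a.aid
    FROM Agents a
    WHERE NOT EXISTS (
        -- a Duluth customer ...
        SELECT 1
        FROM Customers c
        WHERE c.city = 'Duluth'
          AND NOT EXISTS (
              -- ... for whom this agent has no order
              SELECT 1
              FROM Orders o
              WHERE o.aid = a.aid AND o.cid = c.cid
          )
    )
    ORDER BY a.aid
""").fetchall()

print(rows)  # [('a1',), ('a3',)]
```

Read it as "there is no Duluth customer this agent has not ordered for". One caveat: if there are no Duluth customers at all, every agent qualifies.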