SQL return based on Aggregate Function

I have a query that works, but it returns a row for every city. I only want one row per country, based on the largest city population in that country, but aggregate functions can't be used in the WHERE clause. How can I limit my results to one row per country?
SELECT lab6.country.name, max(lab6.city.population) AS largest_pop
FROM lab6.country, lab6.city
WHERE lab6.country.country_code = lab6.city.country_code
GROUP BY lab6.country.name, lab6.city.name

PostgreSQL supports window functions, which you can take advantage of here.
SELECT countryName, cityName, largest_pop
FROM
(
SELECT a.name countryName,
b.name cityName,
b.population AS largest_pop,
DENSE_RANK() OVER (PARTITION BY a.name
ORDER BY b.population DESC) rn
FROM lab6.country a, lab6.city b
WHERE a.country_code = b.country_code
) x
WHERE rn = 1
Window Functions

Maybe I'm misunderstanding, but do you just want to return the largest city in each country?
If so, you can simply group by country instead of by country and city. You'll need to include both the attribute that identifies a country and the name of that country in your GROUP BY clause. Your query will end up looking like:
SELECT lab6.country.name AS cName, max(lab6.city.population) AS largest_pop
FROM lab6.country, lab6.city
WHERE lab6.country.country_code = lab6.city.country_code
GROUP BY lab6.country.country_code, lab6.country.name
If you want to also include the name of the largest city, you'll first need to decide what to do if there are multiple largest cities (countries where there are two or more cities with the same, largest, population). I'm going to assume you're okay with including them all. In that case, you can simply do a sub-query in your FROM clause, joined on cities with the same population:
SELECT lc.cName, lab6.city.name, lc.largest_pop
FROM (
SELECT lab6.country.country_code AS cCode,
lab6.country.name AS cName,
max(lab6.city.population) AS largest_pop
FROM lab6.country, lab6.city
WHERE lab6.country.country_code = lab6.city.country_code
GROUP BY lab6.country.country_code, lab6.country.name
) AS lc
JOIN lab6.city ON lc.cCode = lab6.city.country_code
WHERE lab6.city.population = lc.largest_pop
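As a minimal sketch of how the tie-inclusive query above behaves, the following runs the same shape of query against an in-memory SQLite database. The table contents and country/city names are made up for illustration, the lab6 schema prefix is dropped because SQLite has no such schema here, and the inner comma join is written as an explicit JOIN.

```python
import sqlite3

# Hypothetical sample data; the names and populations are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (country_code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE city (name TEXT, country_code TEXT, population INTEGER);
    INSERT INTO country VALUES ('NL', 'Northland'), ('SL', 'Southland');
    INSERT INTO city VALUES
        ('Alpha', 'NL', 500), ('Beta', 'NL', 300),
        ('Gamma', 'SL', 200), ('Delta', 'SL', 200), ('Epsilon', 'SL', 50);
""")

# Step 1 (subquery lc): largest population per country.
# Step 2 (outer join): every city whose population equals that maximum.
query = """
    SELECT lc.cName, city.name, lc.largest_pop
    FROM (
        SELECT country.country_code AS cCode,
               country.name AS cName,
               MAX(city.population) AS largest_pop
        FROM country
        JOIN city ON country.country_code = city.country_code
        GROUP BY country.country_code, country.name
    ) AS lc
    JOIN city ON lc.cCode = city.country_code
    WHERE city.population = lc.largest_pop
    ORDER BY lc.cName, city.name
"""
largest_cities = conn.execute(query).fetchall()
print(largest_cities)
```

With this sample data, Southland has two cities tied at 200, so it contributes two rows while Northland contributes one.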

Related

SQL: Select inside select?

I have a table of car accident in a major city, and the structure is like:
accident_table has the following columns:
id, caseno, date_of_occurrence, street, iucr, primary_type,
description, district, community_area, year, updated_on
I want to write a query that finds the street with the most accidents in each district. (I take the count for a street to be the number of accidents that happened on that street.)
Here is what I have:
SELECT DISTINCT on (street)
street,
district
FROM
(
SELECT
count(street) as street_cnt,
street,
district
FROM accident_table
)
WHERE street_count = (SELECT max(street_cnt))
It did not give me a syntax error, but it timed out, so I guess it took too long to run.
What's wrong, and how do I fix it?
Thanks,
Philip
First aggregate to get the count of accidents for each street. Then use the rank() window function to rank the streets within a district by the count of accidents in them. Then only select the ones that were ranked at the top.
SELECT x.district,
x.street,
x.accidents
FROM (SELECT a.district,
a.street,
count(*) accidents,
rank() OVER (PARTITION BY a.district
ORDER BY count(*) DESC) r
FROM accident_table a
GROUP BY a.district,
a.street) x
WHERE x.r = 1;
Your code looks like Postgres. In that database, you can express this without a subquery:
SELECT DISTINCT ON (a.district)
a.district, a.street, COUNT(*) as accidents
FROM accident_table a
GROUP BY a.district, a.street
ORDER BY a.district, COUNT(*) DESC;
That said, your problem is performance, which is probably not affected by subqueries. An index on accident_table(district, street) might help performance.
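The aggregate, rank, filter steps described above can be sketched against an in-memory SQLite database (window functions need SQLite 3.25 or newer). The sample rows are invented, only three of the table's columns are created, and the steps are split into two CTEs for readability rather than combined in one SELECT. Unlike DISTINCT ON, which keeps a single row per district, rank() keeps every street tied for first place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accident_table (id INTEGER PRIMARY KEY, street TEXT, district TEXT)"
)
# Hypothetical accidents: D1 has a clear winner, D2 has a two-way tie.
sample = [("Main St", "D1"), ("Main St", "D1"), ("Oak Ave", "D1"),
          ("Elm St", "D2"), ("Pine Rd", "D2")]
conn.executemany(
    "INSERT INTO accident_table (street, district) VALUES (?, ?)", sample
)

query = """
    WITH per_street AS (          -- 1. count accidents per street
        SELECT district, street, COUNT(*) AS accidents
        FROM accident_table
        GROUP BY district, street
    ),
    ranked AS (                   -- 2. rank streets within each district
        SELECT district, street, accidents,
               RANK() OVER (PARTITION BY district
                            ORDER BY accidents DESC) AS r
        FROM per_street
    )
    SELECT district, street, accidents   -- 3. keep the top-ranked streets
    FROM ranked
    WHERE r = 1
    ORDER BY district, street
"""
top_streets = conn.execute(query).fetchall()
print(top_streets)
```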

SQL, finding the minimum of the maximum of people living in housing sorted by city

I have a relation housing_complex with the following columns:
city
building_name
inhabitants
I want to create a query that finds the minimum of all the maximum inhabitants grouped by the city.
So far I can find a table of maximums with:
select max(inhabitants)
from housing_complex
group by city
How would I find the minimum of the output?
You can use a subquery:
SELECT MIN(inhabitants)
FROM housing_complex
WHERE inhabitants IN (
SELECT MAX(inhabitants)
FROM housing_complex
GROUP BY city
)
or a Common Table Expression (CTE):
WITH tmp AS (
SELECT MAX(inhabitants) AS m, city AS c
FROM housing_complex
GROUP BY city
)
SELECT MIN(m)
FROM tmp
How would I find the minimum of the output?
Use order by and limit:
select max(inhabitants)
from housing_complex
group by city
order by max(inhabitants) asc
limit 1;
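To check that the approaches agree, here is a small sketch using an in-memory SQLite database with invented rows. It computes the minimum of the per-city maximums once with a derived table and once with ORDER BY and LIMIT; the second form also selects the city, which is an addition for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE housing_complex (city TEXT, building_name TEXT, inhabitants INTEGER)"
)
# Hypothetical data: per-city maximums are Avon 120, Brook 95, Cove 200.
conn.executemany("INSERT INTO housing_complex VALUES (?, ?, ?)", [
    ("Avon", "A1", 120), ("Avon", "A2", 80),
    ("Brook", "B1", 60), ("Brook", "B2", 95),
    ("Cove", "C1", 200),
])

# Minimum over a derived table of per-city maximums.
min_of_max = conn.execute("""
    SELECT MIN(m)
    FROM (SELECT MAX(inhabitants) AS m
          FROM housing_complex
          GROUP BY city) AS per_city
""").fetchone()[0]

# Same value via ORDER BY ... LIMIT 1, also reporting which city it is.
smallest_max_row = conn.execute("""
    SELECT city, MAX(inhabitants) AS m
    FROM housing_complex
    GROUP BY city
    ORDER BY m ASC
    LIMIT 1
""").fetchone()

print(min_of_max, smallest_max_row)
```

The ORDER BY and LIMIT form returns only one row, so if two cities share the smallest maximum, only one of them is reported.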

Get the first 3 elements for each value of an attribute SQL

I'm using the mondial database that can be queried here http://www.semwebtech.org/sqlfrontend/
I am trying to get the 3 most practiced religions on each continent. I've come up with this:
select religion.name, sum(religion.percentage) as total, continent
from religion join encompasses on religion.country = encompasses.country
group by name, continent order by continent, total DESC
This gives me a list of religions for each continent, ordered by popularity, but how do I get the first 3 results for each continent?
I have looked up cursors, but I don't see how to apply them to my case, and it looks like there should be a simple answer.
I would use window functions:
select rc.*
from (select r.name, sum(r.percentage) as total, e.continent,
row_number() over (partition by e.continent order by sum(r.percentage) desc) as seqnum
from religion r join
encompasses e
on r.country = e.country
group by r.name, e.continent
) rc
where seqnum <= 3;
Having explained how to modify your query to answer the question, let me now point out that your query is wrong. The sum of the percentage of people is not the same as the total number of people. For instance, I think Bhutan is pretty close to 100% Buddhist (Buddhism is the state religion). But there are more Buddhists in India (~0.7% Buddhist, according to one source).
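The point about percentages can be illustrated with a little arithmetic. The figures below are hypothetical, not real statistics and not taken from the mondial database; they only show that a country with a high percentage can still contribute fewer followers than a much larger country with a low percentage, so percentages need to be weighted by population before they are summed.

```python
# Hypothetical rows: (country, population, percent following one religion).
rows = [
    ("Smallland", 1_000_000, 90.0),
    ("Bigland", 500_000_000, 1.0),
]

# What the original query adds up: raw percentages, ignoring population.
sum_of_percentages = sum(pct for _, _, pct in rows)

# What the question is really after: estimated followers per country.
followers = {name: pop * pct / 100 for name, pop, pct in rows}
total_followers = sum(followers.values())

print(sum_of_percentages)  # dominated by Smallland's 90%
print(followers)           # but Bigland has far more actual followers
print(total_followers)
```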

SQL Query: Find the name of the company that has been assigned the highest number of patents

Using this query I can find the Company Assignee number for company with most patents but I can't seem to print the company name.
SELECT count(*), patent.assignee
FROM Patent
GROUP BY patent.assignee
HAVING count(*) =
(SELECT max(count(*))
FROM Patent
Group by patent.assignee);
COUNT(*) --- ASSIGNEE
9 19715
9 27895
Nesting above query into
SELECT company.compname
FROM company
WHERE ( company.assignee = ( *above query* ) );
would give an error "too many values" since there are two companies with most patents but above query takes only one assignee number in the WHERE clause. How do I solve this problem? I need to print name of BOTH companies with assignee number 19715 and 27895. Thank you.
You have started down the path of using nested queries. All you need to do is remove COUNT(*) from the SELECT list of the inner query and compare with IN instead of =:
SELECT company.compname
FROM company
WHERE company.assignee IN
(SELECT patent.assignee
FROM Patent
GROUP BY patent.assignee
HAVING count(*) = (SELECT max(count(*))
FROM Patent
GROUP BY patent.assignee
)
);
I wouldn't write the query this way. The use of max(count(*)) is particularly jarring, but it is valid Oracle syntax.
Applying an aggregate function on another aggregate function (like max(count(*))) is illegal in many databases but I believe using the ALL operator instead and a join to get the company name would solve your problem.
Try this:
SELECT COUNT(*), p.assignee, c.compname
FROM Patent p
JOIN Company c ON c.assignee = p.assignee
GROUP BY p.assignee, c.compname
HAVING COUNT(*) >= ALL -- this predicate will return those rows
( -- for which the comparison holds true
SELECT COUNT(*) -- for all instances.
FROM Patent -- it can only be true for the highest count
GROUP BY assignee
);
Assuming you have Oracle, I thought about this a bit differently:
select
c.compname
from
company c
join
(
select
assignee,
dense_rank() over (order by count(1) desc) rnk
from
patent
group by
assignee
) p
on p.assignee = c.assignee
where
p.rnk = 1
;
I like this because it lets you find any rank. For example, if you want the top 3, you would just change p.rnk = 1 to p.rnk <= 3. If you want 10th place, you just change it to p.rnk = 10. Adding the total count and rank to the results would be easy from here too. Overall I think it's more versatile.
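A rough sketch of that versatility, run against an in-memory SQLite database rather than Oracle (window functions need SQLite 3.25 or newer). The company names, the patent_id column, and the helper companies_up_to_rank are all hypothetical, and the count is computed in a derived table before DENSE_RANK is applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (assignee INTEGER PRIMARY KEY, compname TEXT);
    CREATE TABLE patent (patent_id INTEGER PRIMARY KEY, assignee INTEGER);
    INSERT INTO company VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    -- Acme and Globex tie with two patents each; Initech has one.
    INSERT INTO patent (assignee) VALUES (1), (1), (2), (2), (3);
""")

def companies_up_to_rank(conn, max_rank):
    """Return (compname, patents, rnk) for companies ranked <= max_rank."""
    query = """
        SELECT c.compname, p.patents, p.rnk
        FROM company c
        JOIN (
            SELECT assignee, patents,
                   DENSE_RANK() OVER (ORDER BY patents DESC) AS rnk
            FROM (SELECT assignee, COUNT(*) AS patents
                  FROM patent
                  GROUP BY assignee) AS counts
        ) p ON p.assignee = c.assignee
        WHERE p.rnk <= ?
        ORDER BY p.rnk, c.compname
    """
    return conn.execute(query, (max_rank,)).fetchall()

top = companies_up_to_rank(conn, 1)      # both tied leaders
top_two = companies_up_to_rank(conn, 2)  # leaders plus second place
print(top)
print(top_two)
```

Because DENSE_RANK leaves no gaps after ties, the company with one patent is rank 2 here, not rank 3.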

not a single-group group function with MAX in select

Select sg_gameno, Max(sg_Year), sg_end, sg_hostcity, country_olympic_name
from Summergames s, Country co
where s.country_isocode = co.country_isocode
I don't know what's wrong with this. I want to get the latest year. Should I use MAX or something else?
If you want to aggregate one column (sg_year) and to not aggregate others, you need a GROUP BY clause.
Select sg_gameno, Max(sg_Year), sg_end, sg_hostcity, country_olympic_name
from Summergames s,
Country co
where s.country_isocode = co.country_isocode
group by sg_gameno, sg_end, sg_hostcity, country_olympic_name
is syntactically valid. Whether it provides you the results you want is another question-- you'd need to tell us what your tables look like, what data is in them, what result you want, etc.
In Oracle, you can't have aggregate functions and individual columns in the SELECT list, unless the individual columns are included in GROUP BY clause.
You can use the RANK or DENSE_RANK function to rank the records by year and then select the top-ranked rows from the result set.
select * from (
select sg_gameno, sg_Year, sg_end, sg_hostcity, country_olympic_name,
rank() over (order by sg_year desc) as ranking
from Summergames s, Country co
where s.country_isocode = co.country_isocode
)
where ranking = 1;
You can also use following query to get the same result. You will have to select the one that performs best for you.
select sg_gameno, sg_Year, sg_end, sg_hostcity, country_olympic_name
from Summergames s, Country co
where s.country_isocode = co.country_isocode
and sg_Year = (select max(sg_Year)
from Summergames s, Country co
where s.country_isocode = co.country_isocode);