How to avoid using nested aggregate functions? - sql

I need some help here, I'm sure you guys know how to do it:
Let's start with table structure:
author(name, nationality, Gender);
article(title, year, conference);
publication(articleTitle, authorName);
I need to know the gender of the authors who have the highest number of publications. By the way, I'm using PostgreSQL; I don't know if that matters.
Here's my idea:
select gender from author
join publication on(name = authorName)
having (count(articleTitle) = max(count(articleTitle)))
group by gender
Now, I know I cannot use nested aggregate functions, and that's why I'm trying to use nested selects, something like select gender where gender in (another select). But I did not manage to avoid the aggregate function issue.
Hope you can help me. Thank you!

This query gets you the authors, ordered by the number of publications:
select a.name, a.gender, count(*) as num_publications
from author a join
publication p
on a.name = p.authorName
group by a.name, a.gender
order by num_publications desc;
If you want the top three, then use fetch first or limit:
select a.name, a.gender, count(*) as num_publications
from author a join
publication p
on a.name = p.authorName
group by a.name, a.gender
order by num_publications desc
fetch first 3 rows only;
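The queries above rank authors, but the question asks for the gender of whoever is tied for the most publications. One way to get that without nesting aggregates is to compute the maximum count in a subquery and compare against it in HAVING. The following is only a minimal sketch: it runs through Python's sqlite3 module as a stand-in for PostgreSQL, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (name TEXT, nationality TEXT, gender TEXT);
    CREATE TABLE publication (articleTitle TEXT, authorName TEXT);
    -- hypothetical sample data: Ann and Bob are tied with two publications each
    INSERT INTO author VALUES ('Ann', 'US', 'F'), ('Bob', 'UK', 'M'), ('Cyd', 'DE', 'F');
    INSERT INTO publication VALUES
        ('t1', 'Ann'), ('t2', 'Ann'),
        ('t3', 'Bob'), ('t4', 'Bob'),
        ('t5', 'Cyd');
""")

# The inner derived table counts publications per author, the scalar
# subquery takes the maximum of those counts, and HAVING keeps only the
# authors whose own count equals that maximum (so ties are preserved).
top_authors = conn.execute("""
    SELECT a.name, a.gender, COUNT(*) AS num_publications
    FROM author a
    JOIN publication p ON a.name = p.authorName
    GROUP BY a.name, a.gender
    HAVING COUNT(*) = (
        SELECT MAX(cnt)
        FROM (SELECT COUNT(*) AS cnt
              FROM publication
              GROUP BY authorName) AS per_author
    )
    ORDER BY a.name
""").fetchall()

print(top_authors)  # [('Ann', 'F', 2), ('Bob', 'M', 2)]
```

Unlike fetch first 1 row only, this keeps every author tied at the top, which matters if several authors share the highest count.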

Related

SQL Collect duplicates to one place? PostgreSQL

Sorry I'm new here and I'm also new with SQL and can't really explain my problem in the title...
So I have a TV show database, and there I have a Genre column, but for a TV show there are multiple Genres stored, so when I'm selecting all my TV Shows how can I combine them?
It needs to look like this:
https://i.stack.imgur.com/3EhBj.png
So I have to combine the strings together. Here is the code I have written so far:
SELECT title,
year,
runtime,
MIN(name) as name,
ROUND(rating, 1) as rating,
trailer,
homepage
FROM shows
JOIN show_genres
on shows.id = show_genres.show_id
JOIN genres
on show_genres.genre_id = genres.id
GROUP BY title,
year,
runtime,
rating,
trailer,
homepage
ORDER BY rating DESC
LIMIT 15;
I also have some other stuff here; those are my exercise tasks. Thanks!
Also here is the relationship model:
https://i.stack.imgur.com/M89ho.png
Basically you need string aggregation - in Postgres, you can use string_agg() for this.
For efficiency, I would recommend moving the aggregation to a correlated subquery or a lateral join rather than aggregating in the outer query, so:
SELECT
s.title,
s.year,
s.runtime,
g.genre_names,
ROUND(s.rating, 1) as rating,
s.trailer,
s.homepage
FROM shows s
LEFT JOIN LATERAL (
SELECT string_agg(g.name, ', ') genre_names
FROM show_genres sg
INNER JOIN genres g ON g.id = sg.genre_id
WHERE sg.show_id = s.id
) g ON 1 = 1
ORDER BY s.rating DESC
LIMIT 15
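To see how the per-show aggregation behaves, here is a small sketch of the correlated-subquery variant. It is run through Python's sqlite3 module as an assumption for the sake of a runnable example: SQLite's group_concat() stands in for Postgres's string_agg(), and the tables are cut down to a few invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT, rating REAL);
    CREATE TABLE genres (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE show_genres (show_id INTEGER, genre_id INTEGER);
    -- hypothetical sample data; 'Show C' deliberately has no genres
    INSERT INTO shows VALUES (1, 'Show A', 9.26), (2, 'Show B', 8.5), (3, 'Show C', 7.0);
    INSERT INTO genres VALUES (1, 'Drama'), (2, 'Crime');
    INSERT INTO show_genres VALUES (1, 1), (1, 2), (2, 1);
""")

# The subquery is evaluated once per show and collapses that show's
# genres into one string, so the outer query needs no GROUP BY at all.
rows = conn.execute("""
    SELECT s.title,
           ROUND(s.rating, 1) AS rating,
           (SELECT group_concat(g.name, ', ')
            FROM show_genres sg
            JOIN genres g ON g.id = sg.genre_id
            WHERE sg.show_id = s.id) AS genre_names
    FROM shows s
    ORDER BY s.rating DESC
    LIMIT 15
""").fetchall()

for title, rating, genre_names in rows:
    print(title, rating, genre_names)
```

A show without any genre still appears, with NULL (None in Python) as its genre list; that is the same behavior the LEFT JOIN LATERAL ... ON 1 = 1 form gives in Postgres. The order of names inside the concatenated string is not guaranteed unless the aggregate is given an explicit ordering.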

SQL Not a GROUP BY expression

I'm still new to SQL.
I've got a query to count the number of students that attend a certain lecture and I've been trying to group the records by the lectureid so I don't have 10 records for the same lecture.
SELECT ATTENDANCESHEET.LECTUREID,TOPIC, (
SELECT COUNT(STUDENTID) AS ATTENDANCE
FROM ATTENDANCESHEET
WHERE ATTENDANCESHEET.STUDENTID = LECTURE.STUDENTID
)
FROM ATTENDANCESHEET,LECTURE
WHERE ATTENDANCESHEET.LECTUREID = LECTURE.LECTUREID
GROUP BY ATTENDANCESHEET.LECTUREID;
I'm getting the error "not a GROUP BY expression". Can someone help me, please?
The error is because you have a correlated query. The correlation clause (the where in the subquery) is using a column from the outer query that is not aggregated. In addition, you have a column topic that is not in the group by.
I believe the query you want is more simply written as:
select a.lectureid, count(*) as attendance
from attendancesheet a
group by a.lectureid;
I notice that you have topic in the select. That is also an issue. Perhaps you want:
select l.lectureid, l.topic, count(*) as attendance
from attendancesheet a join
lecture l
on a.lectureid = l.lectureid
group by l.lectureid, l.topic;
Or, if you have studentid in lecture, perhaps:
select l.lectureid, l.topic, count(*) as attendance
from lecture l
group by l.lectureid, l.topic;
EDIT:
The data structure doesn't make sense to me, but perhaps you need both keys for the join:
select l.lectureid, l.topic, count(*) as attendance
from attendancesheet a join
lecture l
on a.lectureid = l.lectureid and a.studentid = l.studentid
group by l.lectureid, l.topic;
To fix the GROUP BY error without knowing the expected result, add the missing columns to the GROUP BY:
SELECT ATTENDANCESHEET.LECTUREID,TOPIC, (
SELECT COUNT(STUDENTID) AS ATTENDANCE
FROM ATTENDANCESHEET
WHERE ATTENDANCESHEET.STUDENTID = LECTURE.STUDENTID
)
FROM ATTENDANCESHEET,LECTURE
WHERE ATTENDANCESHEET.LECTUREID = LECTURE.LECTUREID
GROUP BY ATTENDANCESHEET.LECTUREID,TOPIC,LECTURE.STUDENTID; -- added the topic and studentid from lecture table
But I think what the asker is trying to do is:
SELECT ATTENDANCESHEET.LECTUREID,TOPIC, count(LECTURE.STUDENTID) cntstudent
FROM ATTENDANCESHEET,LECTURE
WHERE ATTENDANCESHEET.LECTUREID = LECTURE.LECTUREID
GROUP BY ATTENDANCESHEET.LECTUREID,TOPIC
Try adding TOPIC to the group by :)

How to use GROUP BY in SQL subquery without arithmetic count

The result has many repeated rows (all columns repeat).
How can I use GROUP BY without an inner join?
I need to use a subquery in this case.
This is a fictional example, so don't worry about its logic or sense. I need to use GROUP BY on a subquery over many tables.
I can use GROUP BY with an inner join, but in this case I can't use an inner join.
select
NAME,
AGE,
JOB
from (
select
pe.name NAME,
pe.age AGE,
jb.work JOB
from
pearson pe,
job jb
)
group by NAME, AGE, JOB
Yes, you can use GROUP BY in this case, but you need to give the inner query an alias, as shown below:
SELECT A.NAME, A.AGE, A.JOB
FROM (
SELECT pe.name NAME, pe.age AGE, jb.work JOB
FROM pearson pe, job jb
) A
GROUP BY A.NAME, A.AGE, A.JOB;

how to count two values from three dataset

I have 3 datasets: company, post, postedited.
I want to count each company's posts and edited posts. Some companies have posts that were never edited.
Here is my query:
SELECT company.name, company.id, count(*),
( select count(*)
from post, postedited
where post.id=postedited.post_id)
from company, post as p
where company.id=p.company_id
group by company_id
The post count is right, but the postedited column shows the same value on every row. What's wrong with my query?
Your subquery is completely unrelated to the main query. It selects post and postedited and counts. You are showing this result for every row of the main query.
You want the subquery to relate to the main query's post. So remove the post table from the subquery's from clause:
(select count(*) from postedited where postedited.post_id = p.id)
Now this subquery returns the edit count for the post_id of each record in the main query. Finally, you must sum these counts:
select
c.name, c.id, count(*) as posts,
sum((select count(*) from postedited pe where pe.post_id = p.id)) as edits -- a scalar subquery needs its own parentheses inside sum()
from company c
join post p on p.company_id = c.id
group by c.id;
You can achieve the same thus:
select
c.name, c.id, count(distinct p.id) as posts, count(pe.post_id) as edits
from company c
join post p on p.company_id = c.id
left join postedited pe on pe.post_id = p.id
group by c.id;
SELECT c.name AS companyName
, c.id AS companyID
, COUNT(DISTINCT p.id) AS postCount
, COUNT(DISTINCT pe.post_id) AS postEditCount
FROM company c
LEFT OUTER JOIN post p ON p.Company_ID = c.ID
LEFT OUTER JOIN postEdited pe ON pe.post_id = p.id
GROUP BY c.id, c.name
That will give you a list of all companies in your company table with a count of each of their posts and edited posts. If you need to further query against that dataset, you can. Or you can add a WHERE clause to the above query to filter it.
And I agree, please don't use comma syntax. It's very easy to produce unintended results, and it doesn't give a good representation of what you're actually querying against. Plus, explicit JOIN syntax has been part of the SQL standard since SQL-92, and it keeps the join conditions separate from the filters in the WHERE clause. Good JOIN syntax will make your life much easier.
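The reason the joined versions use COUNT(DISTINCT p.id) rather than COUNT(*) is that the left join to postedited repeats a post once per edit. A minimal sketch of that effect, assuming a cut-down schema with invented rows and run through Python's sqlite3 module for convenience:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY, company_id INTEGER);
    CREATE TABLE postedited (id INTEGER PRIMARY KEY, post_id INTEGER);
    -- hypothetical sample data: post 10 was edited twice, posts 11 and 12 never
    INSERT INTO company VALUES (1, 'Acme'), (2, 'Beta');
    INSERT INTO post VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO postedited VALUES (100, 10), (101, 10);
""")

# joined_rows shows how many rows each group holds after both joins;
# it exceeds the number of posts as soon as any post has more than one edit.
counts = conn.execute("""
    SELECT c.name,
           c.id,
           COUNT(DISTINCT p.id) AS posts,
           COUNT(pe.post_id)    AS edits,
           COUNT(*)             AS joined_rows
    FROM company c
    JOIN post p ON p.company_id = c.id
    LEFT JOIN postedited pe ON pe.post_id = p.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

print(counts)  # [('Acme', 1, 2, 2, 3), ('Beta', 2, 1, 0, 1)]
```

For Acme the join produces three rows from two posts, so a plain COUNT(*) would report three posts. COUNT(pe.post_id) skips the NULL rows produced by the left join, which is why unedited posts contribute zero edits.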

Select the countries with fewest number of tuples

At http://www.dofactory.com/sql/sandbox I'm experimenting with submitting my own SQL queries against their sample database to become better at SQL. What I want to do is to select all countries from Customer that have exactly the fewest number of tuples. Here is my query attempt:
SELECT a.Country
FROM [Customer] a, (SELECT COUNT(*) AS Tot
FROM [Customer]
GROUP BY Country) b
GROUP BY a.Country
HAVING COUNT(*) = MIN(b.Tot)
However, the website returns an empty table instead of the correct result which is (Ireland, Norway, Poland). The correct result is easily realized by grouping the table by country and using COUNT(*), and then looking at the countries that have the smallest COUNT(*) value out of all COUNT(*) values. I would like some advice on how to generate the correct result without any assumptions about the table's data.
I would do this using SELECT TOP 1 WITH TIES:
SELECT TOP 1 WITH TIES c.Country
FROM Customer c
GROUP BY c.Country
ORDER BY COUNT(*) ASC;
Two notes:
When using table aliases, make them abbreviations for the tables. This makes the query much easier to follow.
Never use commas in the FROM clause. Always use proper, explicit JOIN syntax.
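TOP 1 WITH TIES is SQL Server syntax. On engines without it, a similar result can be had by ranking the per-country counts and keeping rank 1. The sketch below assumes SQLite 3.25 or later (for window functions), runs through Python's sqlite3 module, and uses an invented Customer table rather than the sandbox's real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Country TEXT);
    -- hypothetical sample data: three countries tied at one customer each
    INSERT INTO Customer (Country) VALUES
        ('Ireland'), ('Norway'), ('Poland'),
        ('Spain'), ('Spain'),
        ('France'), ('France'), ('France');
""")

# Step 1: count customers per country.
# Step 2: RANK() gives every country tied for the smallest count rank 1.
# Step 3: keep only rank 1.
rows = conn.execute("""
    WITH per_country AS (
        SELECT Country, COUNT(*) AS Tot
        FROM Customer
        GROUP BY Country
    ),
    ranked AS (
        SELECT Country, RANK() OVER (ORDER BY Tot ASC) AS rnk
        FROM per_country
    )
    SELECT Country FROM ranked WHERE rnk = 1 ORDER BY Country
""").fetchall()

fewest = [country for (country,) in rows]
print(fewest)  # ['Ireland', 'Norway', 'Poland']
```

RANK() is used instead of ROW_NUMBER() because tied rows must share the same rank; ROW_NUMBER() would arbitrarily keep just one of them.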
Learned something new (WITH TIES) from Gordon Linoff, again...
Here is my solution without it:
Select a.Country from [Customer] a
group by a.Country
having count(*) = (select min(b.Tot) from (SELECT COUNT(*) AS Tot FROM [Customer] GROUP BY Country) b)
If you are not using sql 2012 then,
declare @Fewer int=2
;With CTE as
(
select c.*
,ROW_NUMBER()over(partition by countryid order by customerid)rn
from dbo.Customers C
)
select * from cte
where rn<=@Fewer