SQL GROUP BY not returning accurate answer

This is my current query:
select cat.sms_schoolcategoryid as categoryId,
cat.sms_name as category ,
count(sch.sms_name) as schoolname,
count(stu.accountnumber)NoofStudent
from Filteredsms_schoolcategory cat
inner join Filteredsms_school sch
on cat.sms_schoolcategoryid=sch.sms_schoolcategoryid
inner join FilteredAccount stu
on sch.sms_schoolid=stu.sms_schoolid
group by cat.sms_schoolcategoryid,
cat.sms_name
;
I have three tables: Category, Schools, and Students. I just want to count the schools per category. When I join only the category and school tables, the result is accurate, but when I also join the students table to the schools table, the result is wrong. Please explain how this is possible.

There is guesswork involved unless you provide more information on your data model. However, it appears that your issue is rooted in the following:
When you join the student table, each combination of school category and school produces one row per matching student. That is, your SQL no longer counts schools per school category, but students.
For a concrete solution and some good advice, see @Nick.McDermaid's comment.
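The row multiplication described above can be reproduced on toy data. This is a minimal sketch using Python's built-in sqlite3 module. The tables category, school, and student and their columns are simplified stand-ins (assumptions), not the asker's real Filteredsms_* schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified schema: one category, two schools, four students.
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE school   (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
    CREATE TABLE student  (id INTEGER PRIMARY KEY, school_id INTEGER);
    INSERT INTO category VALUES (1, 'Primary');
    INSERT INTO school   VALUES (10, 'North', 1), (11, 'South', 1);
    INSERT INTO student  VALUES (100, 10), (101, 10), (102, 10), (103, 11);
""")

# Category joined to school only: one row per school, so COUNT gives schools.
schools_only = conn.execute("""
    SELECT COUNT(sch.name)
    FROM category cat
    JOIN school sch ON sch.category_id = cat.id
    GROUP BY cat.id
""").fetchone()[0]

# Adding the student join: one row per student, so the same COUNT
# expression now effectively counts students.
with_students = conn.execute("""
    SELECT COUNT(sch.name)
    FROM category cat
    JOIN school sch ON sch.category_id = cat.id
    JOIN student stu ON stu.school_id = sch.id
    GROUP BY cat.id
""").fetchone()[0]

print(schools_only, with_students)
```

With this data the first query sees two rows for the category and the second sees four, which is why the "school" count changes as soon as students are joined in.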

Try this one
select count(stu.accountnumber) as NoofStudent
, catsch.categoryId
, catsch.category
, catsch.schoolname
from FilteredAccount stu
inner join (
select cat.sms_schoolcategoryid as categoryId
, cat.sms_name as category
, count(sch.sms_name) as schoolname
, sch.sms_schoolid
from Filteredsms_schoolcategory cat
inner join Filteredsms_school sch
on cat.sms_schoolcategoryid = sch.sms_schoolcategoryid
group by cat.sms_schoolcategoryid, cat.sms_name, sch.sms_schoolid) catsch
on catsch.sms_schoolid = stu.sms_schoolid
group by catsch.categoryId
, catsch.category
, catsch.schoolname

count() returns the number of non-NULL values. So, assuming neither column contains NULLs, your two count() calls will return the same value. You can quickly fix the query using count(distinct):
select cat.sms_schoolcategoryid as categoryId,
cat.sms_name as category ,
count(distinct sch.sms_name) as schoolname,
count(distinct stu.accountnumber) as NoofStudent
from Filteredsms_schoolcategory cat inner join
Filteredsms_school sch
on cat.sms_schoolcategoryid = sch.sms_schoolcategoryid inner join
FilteredAccount stu
on sch.sms_schoolid = stu.sms_schoolid
group by cat.sms_schoolcategoryid, cat.sms_name ;
Actually, you probably don't need the second count distinct. Just count(stu.accountnumber) should count the students.
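As a hedged illustration of the count(distinct) fix, here is a small sqlite3 sketch on made-up data. The schema is a simplified assumption, and the data deliberately contains two schools with the same name to show one caveat: counting distinct names can undercount, so counting a distinct key column (here the hypothetical sch.id) may be the safer choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema; two different schools share the name 'North'.
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE school   (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
    CREATE TABLE student  (id INTEGER PRIMARY KEY, school_id INTEGER);
    INSERT INTO category VALUES (1, 'Primary');
    INSERT INTO school   VALUES (10, 'North', 1), (11, 'North', 1);
    INSERT INTO student  VALUES (100, 10), (101, 10), (102, 11);
""")

by_name, by_id, students = conn.execute("""
    SELECT COUNT(DISTINCT sch.name),  -- distinct school names
           COUNT(DISTINCT sch.id),    -- distinct school keys
           COUNT(stu.id)              -- one joined row per student
    FROM category cat
    JOIN school  sch ON sch.category_id = cat.id
    JOIN student stu ON stu.school_id   = sch.id
    GROUP BY cat.id
""").fetchone()

print(by_name, by_id, students)
```

The plain count on the student column is enough because the join already yields exactly one row per student.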

Related

Join with count

I need to write SQL query like:
Show all countries with more than 1000 users, sorted by user count.
The country with the most users should be at the top.
I have tables:
● Table users (id, email, citizenship_country_id)
● Table countries (id, name, iso)
SELECT countries.name,
Count(users.citizenship_country_id) AS W1
FROM countries
LEFT JOIN users ON countries.id = users.citizenship_country_id
GROUP BY users.citizenship_country_id, countries.name
HAVING ((([users].[citizenship_country_id])>2));
But this does not work - I get an empty result set.
Could you please tell me what I'm doing wrong?
A LEFT JOIN is superfluous for this purpose. To have 1000 users, you need at least one match:
SELECT c.name, Count(*) AS W1
FROM countries c JOIN
users u
ON c.id = u.citizenship_country_id
GROUP BY c.name
HAVING COUNT(*) > 1000;
Notice that table aliases also make the query easier to write and to read.
Group by country name and use HAVING Count(u.citizenship_country_id) > 1000. HAVING filters groups after aggregation:
SELECT c.name,
Count(u.citizenship_country_id) AS W1
FROM countries c
INNER JOIN users u ON c.id = u.citizenship_country_id
GROUP BY c.name
HAVING Count(u.citizenship_country_id) > 1000
ORDER BY W1 DESC -- highest counts first
;
As @GordonLinoff pointed out, you can use INNER JOIN instead of LEFT JOIN, because this query never returns countries without users anyway, and an INNER JOIN avoids passing unmatched rows to the aggregation.
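To see HAVING and ORDER BY working together, here is a small sqlite3 sketch. The data is invented, the helper name countries_over is hypothetical, and a threshold of 2 stands in for the 1000 in the question so the toy data stays short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT, iso TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT,
                        citizenship_country_id INTEGER);
    INSERT INTO countries VALUES
        (1, 'Alpha', 'AA'), (2, 'Beta', 'BB'), (3, 'Gamma', 'GG'), (4, 'Delta', 'DD');
""")
# Alpha gets 1 user, Beta 4, Gamma 3, Delta none (made-up data).
rows = [(i, f"user{i}@example.com", cid)
        for i, cid in enumerate([1, 2, 2, 2, 2, 3, 3, 3], start=1)]
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)


def countries_over(conn, min_users):
    """Return (name, user_count) for countries with more than min_users users."""
    return conn.execute("""
        SELECT c.name, COUNT(*) AS user_count
        FROM countries c
        JOIN users u ON u.citizenship_country_id = c.id
        GROUP BY c.id, c.name
        HAVING COUNT(*) > ?          -- filter applied after aggregation
        ORDER BY user_count DESC     -- most users first
    """, (min_users,)).fetchall()


result = countries_over(conn, 2)
print(result)
```

The filter belongs in HAVING rather than WHERE because it tests the aggregated count, which does not exist until the groups have been formed.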

how to count two values from three dataset

I have 3 datasets: company, post, and postedited.
I want to count each company's posts and edited posts. Some companies posted but did not edit.
Here is my query:
SELECT company.name, company.id, count(*),
( select count(*)
from post, postedited
where post.id=postedited.post_id)
from company, post as p
where company.id=p.company_id
group by company_id
The post count is right, but the postedited column shows the same value on every row. What's wrong with my query?
Your subquery is completely unrelated to the main query. It selects post and postedited and counts. You are showing this result for every row of the main query.
You want the subquery to relate to the main query's post, so remove the post table from the subquery's from clause:
(select count(*) from postedited where postedited.post_id = p.id)
Now this subquery selects a count for the post_id of the main query's records. Finally, you must sum those counts (note that a subquery used as a function argument needs its own parentheses):
select
c.name, c.id, count(*) as posts,
sum((select count(*) from postedited pe where pe.post_id = p.id)) as edits
from company c
join post p on p.company_id = c.id
group by c.id;
You can achieve the same thus:
select
c.name, c.id, count(distinct p.id) as posts, count(pe.post_id) as edits
from company c
join post p on p.company_id = c.id
left join postedited pe on pe.post_id = p.id
group by c.id;
SELECT c.name AS companyName
, c.id AS companyID
, COUNT(DISTINCT p.id) AS postCount
, COUNT(DISTINCT pe.post_id) AS postEditCount
FROM company c
LEFT OUTER JOIN post p ON p.Company_ID = c.ID
LEFT OUTER JOIN postEdited pe ON pe.Company_ID = c.ID
GROUP BY c.id, c.name
That will give you a list of all companies in your company table with a count of each of their posts and edited posts. If you need to further query against that dataset, you can. Or you can add a WHERE clause to the above query to filter it.
And I agree, please don't use comma syntax. It's very easy to produce unintended results (such as an accidental cross join), and it doesn't give a good representation of what you're actually querying against. Explicit JOIN syntax has been part of the SQL standard since SQL-92, and it will make your life much easier.
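The two approaches from the answers (a correlated subquery that is summed, and a LEFT JOIN with COUNT(DISTINCT ...)) can be compared on toy data. This is a sketch with sqlite3. The column layout of company, post, and postedited is assumed from the question, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post       (id INTEGER PRIMARY KEY, company_id INTEGER);
    CREATE TABLE postedited (id INTEGER PRIMARY KEY, post_id INTEGER);
    INSERT INTO company VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO post VALUES (1, 1), (2, 1), (3, 2);
    -- Post 1 was edited twice, post 2 once; Globex never edited anything.
    INSERT INTO postedited VALUES (1, 1), (2, 1), (3, 2);
""")

# Variant 1: correlated subquery per post, summed per company.
via_subquery = conn.execute("""
    SELECT c.name, COUNT(*) AS posts,
           SUM((SELECT COUNT(*) FROM postedited pe WHERE pe.post_id = p.id)) AS edits
    FROM company c
    JOIN post p ON p.company_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

# Variant 2: LEFT JOIN; DISTINCT undoes the row multiplication for posts,
# and COUNT(pe.post_id) skips the NULLs produced for unedited posts.
via_join = conn.execute("""
    SELECT c.name, COUNT(DISTINCT p.id) AS posts, COUNT(pe.post_id) AS edits
    FROM company c
    JOIN post p ON p.company_id = c.id
    LEFT JOIN postedited pe ON pe.post_id = p.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

print(via_subquery)
print(via_join)
```

Both variants count edits rather than edited posts. To count each edited post only once, COUNT(DISTINCT pe.post_id) would be the expression to use in the join variant.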

What Join to use against 2 Tables for All Data

Hi, I am looking to find out which join to use to combine 2 tables. I currently have a list of all students (25 students in 1 class), and the other table only shows 7 of those names with their test results.
What I would like is a 1:1 join for the students with test results, with the students without results shown underneath, so that all in all I have 20 records.
Could somebody please advise on how I could achieve this?
Thanks in advance.
It sounds like you want an OUTER JOIN.
For this example, we'll assume that there is a table named student and that it contains a column named id which is a UNIQUE (or PRIMARY) KEY.
We'll also assume that there is another table named test_result which contains a column named student_id, and that column is a foreign key referencing the id column in student.
For demonstration purposes, we'll just make up some names for the other columns that might appear in these tables, name and score.
SELECT s.id
, s.name
, r.score
FROM student s
LEFT
JOIN test_result r
ON r.student_id = s.id
ORDER
BY r.student_id IS NULL
, r.score DESC
, s.id
Note that if student_id is not unique in test_result, there is potential to return multiple rows that match a row in student.
To get (at most) one row returned from test_result per student, we could use an inline view.
SELECT s.id
, s.name
, r.score
FROM student s
LEFT
JOIN ( SELECT t.student_id
, MAX(t.score) AS score
FROM test_result t
GROUP BY t.student_id
) r
ON r.student_id = s.id
ORDER
BY r.student_id IS NULL
, r.score DESC
, s.id
The expressions in the ORDER BY clause are designed to return the students that have matching row(s) in test_result first, followed by students that don't.
This is just a demonstration, and very likely excludes some important criteria, such as which test a score should be returned for. But without a sample schema and some example data, we're just guessing.
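The ordering trick can be tried out with sqlite3. The sketch below uses the same assumed student and test_result tables with invented rows, and sorts on r.score since the score column lives in the inline view, not in student.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student     (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE test_result (student_id INTEGER, score INTEGER);
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    -- Bob has two results, Cy has one, Ann has none.
    INSERT INTO test_result VALUES (2, 70), (2, 90), (3, 80);
""")

rows = conn.execute("""
    SELECT s.id, s.name, r.score
    FROM student s
    LEFT JOIN (SELECT t.student_id, MAX(t.score) AS score
               FROM test_result t
               GROUP BY t.student_id) r
           ON r.student_id = s.id
    ORDER BY r.student_id IS NULL   -- 0 for matched students, 1 for unmatched
           , r.score DESC
           , s.id
""").fetchall()

print(rows)
```

Every student appears exactly once: those with results come first, best score on top, and those without results follow with a NULL score (None in Python).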
You are looking for a left outer join or a full outer join.
The left outer join will show all students and their tests if they have them.
select *
from Students as s
left outer join Tests as t
on s.StudentId = t.StudentId
The full outer join will show all students with their tests if they have them, and tests even if they do not have students.
select *
from Students as s
full outer join Tests as t
on s.StudentId = t.StudentId

SQL Access: how to obtain output involving multiple tables without running 2 queries?

I would like to find out the most popular genre of film for a certain age group, for example 20-30 year-olds. I'm quite new to SQL and would appreciate any help I can get, apologies if this is too minor.
The relevant tables for this query are:
FILM {FID (PK), ..., Film_Title}
MEMBER {MID (PK), ..., Date_of_Birth}
LIST {MID (FK), FID (FK)}
GENRE {GID (PK), Genre}
FILM_ACTOR_DIRECTOR_GENRE {FID (FK), ..., GID (FK)}
FILM and MEMBER table should be quite self-explanatory, while a LIST is a selection of films a MEMBER wishes to rent. It's like a shopping basket. Each member only has one list and each list can contain many films. FILM_ACTOR_DIRECTOR_GENRE contains Genre belonging to each film. Each film can only have one genre.
So far I have managed to get an output which shows:
Genre # People Aged 20-30
------- -------------------
Action 5
Comedy 4
Horror 2
etc. etc.
However it involves creating a table and then running another query. Is there a way to obtain the most popular genre within a particular age group without having to run 2 separate queries?
The 2 queries I've used are:
SELECT DISTINCT Genre.Genre_Name, Member.Date_of_Birth
INTO Genre_by_Age
FROM
((((Genre
INNER JOIN Film_Actor_Director_Genre ON Genre.GID = Film_Actor_Director_Genre.GID)
INNER JOIN Film ON Film_Actor_Director_Genre.FID = Film.FID)
INNER JOIN List ON Film.FID = List.FID)
INNER JOIN Member ON Member.MID = List.MID)
WHERE (((Member.[Date_of_Birth]) Between #4/16/1995# And #4/16/1985#));
for creating the new table with information I want, and:
SELECT Genre_Name, COUNT(*) as Number_of_People_aged_20_to_30
FROM Genre_by_Age
GROUP BY Genre_Name
ORDER BY COUNT(*) DESC;
to obtain the output shown above.
Is there a way to obtain the above result without running 2 separate queries? Thanks for your time!
How about using a subquery?
SELECT Genre_Name, COUNT(*) as Number_of_People_aged_20_to_30
FROM (SELECT DISTINCT Genre.Genre_Name, Member.Date_of_Birth
FROM ((((Genre
INNER JOIN Film_Actor_Director_Genre ON Genre.GID = Film_Actor_Director_Genre.GID)
INNER JOIN Film ON Film_Actor_Director_Genre.FID = Film.FID)
INNER JOIN List ON Film.FID = List.FID)
INNER JOIN Member ON Member.MID = List.MID)
WHERE (((Member.[Date_of_Birth]) Between #4/16/1995# And #4/16/1985#))
) as t
GROUP BY Genre_Name
ORDER BY COUNT(*) DESC;
I think this should work:
SELECT Genre.Genre_Name, count(Member.MID) as Number_of_People_aged_20_to_30
FROM
((((Genre
INNER JOIN Film_Actor_Director_Genre ON Genre.GID = Film_Actor_Director_Genre.GID)
INNER JOIN Film ON Film_Actor_Director_Genre.FID = Film.FID)
INNER JOIN List ON Film.FID = List.FID)
INNER JOIN Member ON Member.MID = List.MID)
WHERE (((Member.[Date_of_Birth]) Between #4/16/1995# And #4/16/1985#))
GROUP BY Genre.Genre_Name
ORDER BY count(Member.MID) DESC;
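The two answers are not necessarily equivalent: the subquery version counts distinct rows, while the direct version counts every list entry. A hedged sqlite3 sketch on invented data shows the difference. The schema is simplified (film_genre and wish_list are hypothetical names standing in for FILM_ACTOR_DIRECTOR_GENRE and LIST), dates are stored as ISO strings, and the inner DISTINCT uses the member id rather than the date of birth so that two members sharing a birthday are not merged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE genre      (gid INTEGER PRIMARY KEY, genre_name TEXT);
    CREATE TABLE film_genre (fid INTEGER, gid INTEGER);
    CREATE TABLE wish_list  (mid INTEGER, fid INTEGER);
    CREATE TABLE member     (mid INTEGER PRIMARY KEY, date_of_birth TEXT);
    INSERT INTO genre VALUES (1, 'Action'), (2, 'Comedy');
    INSERT INTO film_genre VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO member VALUES (1, '1990-01-01'), (2, '1992-05-05'), (3, '1970-03-03');
    -- Member 1 lists two Action films; member 3 is outside the age range.
    INSERT INTO wish_list VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 3);
""")

JOINED = """
    FROM genre g
    JOIN film_genre fg ON fg.gid = g.gid
    JOIN wish_list l   ON l.fid = fg.fid
    JOIN member m      ON m.mid = l.mid
    WHERE m.date_of_birth BETWEEN '1985-04-16' AND '1995-04-16'
"""

# Direct aggregation: counts list entries, so member 1 counts twice for Action.
by_rows = conn.execute(
    "SELECT g.genre_name, COUNT(m.mid) AS n" + JOINED +
    "GROUP BY g.genre_name ORDER BY n DESC, g.genre_name"
).fetchall()

# DISTINCT subquery: each (genre, member) pair counts once.
by_people = conn.execute(
    "SELECT genre_name, COUNT(*) AS n FROM ("
    "SELECT DISTINCT g.genre_name, m.mid" + JOINED +
    ") AS t GROUP BY genre_name ORDER BY n DESC, genre_name"
).fetchall()

print(by_rows)
print(by_people)
```

Which of the two numbers is wanted depends on whether "popularity" means number of people or number of listed films.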

Why do the results always turn out to be the same even if I change the parameter inside the function COUNT()?

The results of the following code fragments turn out to be the same no matter what parameter is inside the parentheses of the function COUNT(). Why?
SELECT Category.Category, Category.CategoryID, COUNT(Category) AS Popularity
FROM FavCategory INNER JOIN Category
ON FavCategory.CategoryID= Category.CategoryID
GROUP BY Category, Category.CategoryID
HAVING COUNT(FavCategory.MemberID)>=2;
SELECT Category.Category, Category.CategoryID, COUNT(FavCategory.CategoryID) AS Popularity
FROM FavCategory INNER JOIN Category
ON FavCategory.CategoryID= Category.CategoryID
GROUP BY Category, Category.CategoryID
HAVING COUNT(FavCategory.CategoryID)>=4;
SELECT Category.Category, Category.CategoryID, COUNT(FavCategory.MemberID) AS Popularity
FROM FavCategory INNER JOIN Category
ON FavCategory.CategoryID= Category.CategoryID
GROUP BY Category, Category.CategoryID
HAVING COUNT(FavCategory.MemberID)>=2;
SELECT Category.Category, Category.CategoryID, COUNT(FavCategory.MemberID+Category.CategoryID) AS Popularity
FROM FavCategory INNER JOIN Category
ON FavCategory.CategoryID= Category.CategoryID
GROUP BY Category, Category.CategoryID
HAVING COUNT(FavCategory.MemberID)>=2;
Here are the records on the Category and FavCategory table
You are asking why. They are all the same because COUNT counts a row depending on whether the field/expression is null. If the value is not null, COUNT counts it; if it is null, COUNT ignores it.
You don't have any nulls in your tables, hence all your queries report the same value. Try COUNT('DRACULA'), COUNT(42), COUNT(0) or even COUNT(-1): they will count 3 for CategoryID 3 and 2 for CategoryID 1, and they will work the same as your queries.
And of course, you can also use COUNT(*) if you are using INNER JOIN, and it is advisable. If you are using LEFT JOIN, it's incorrect to use COUNT(*); you must do this: COUNT(secondTable.foreignKeyColumnHere). Or, if Access supports counting based on cardinality (like in PostgreSQL), just do this: COUNT(secondTable.*)
For a primer on counting and enlightenment regarding its proper use (plug alert), read my post about count at http://www.ienablemuch.com/2010/04/debunking-myth-that-countdracula-is.html
@JDein
Given this data:
create table Person
(
PersonId int not null primary key,
Name varchar(100) not null,
Middlename varchar(100) null
);
insert into Person(PersonId,Name,MiddleName) values
(1,'John','Winston'),
(2,'Paul','James'),
(3,'George',NULL),
(4,'Ringo','Parkin');
All of these would return 4:
select count(PersonID) from Person;
select count(Name) from Person;
select count(*) from Person;
select count(1) from Person;
select count(0) from Person;
select count(2) from Person;
select count(-1) from Person;
select count(42) from Person;
select count('Dracula') from Person;
Except for the following, this returns 3:
select count(MiddleName) from Person;
Live test: http://www.sqlfiddle.com/#!3/c1b1e/8
My guess is that you're actually after distinct values for the column, in which case use:
COUNT(DISTINCT (FavCategory.CategoryID))
(etc).
From the SQL Server documentation for COUNT (you haven't specified which database you're using):
COUNT(ALL expression) evaluates expression for each row in a group and returns the number of nonnull values.
(I believe ALL is the default, as opposed to DISTINCT.)
Given that none of the values are null in your tables, just using an expression is equivalent to COUNT(*) - i.e. it'll return the row count for the group. That's why every expression is giving the same result.
If you weren't after distinct results, please explain what you're trying to achieve, and we may be able to suggest an alternative. (Well, someone else may be able to - I suspect I won't, being a SQL beginner.)
If you want the popularity to be easier to see in the results, you should probably add an ORDER BY clause to sort the results by the COUNT column:
SELECT
Category.Category,
Category.CategoryID,
COUNT(FavCategory.MemberID) AS Popularity
FROM FavCategory INNER JOIN Category
ON FavCategory.CategoryID= Category.CategoryID
GROUP BY Category, Category.CategoryID
HAVING COUNT(FavCategory.MemberID)>=2
ORDER BY Popularity DESC;
Perhaps you would also like to include categories that are not among the favourite ones. In that case you would need to replace INNER JOIN with LEFT JOIN and swap the sides of the join:
SELECT
Category.Category,
Category.CategoryID,
COUNT(FavCategory.MemberID) AS Popularity
FROM Category LEFT JOIN FavCategory
ON FavCategory.CategoryID= Category.CategoryID
GROUP BY Category, Category.CategoryID
ORDER BY Popularity DESC;
Note also that in this case it is vital that you count values of one of the joined table's (FavCategory) columns (MemberID in the above example). If some categories do not have matches in FavCategories, MemberID would be NULL and, as a result, not counted by COUNT.
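The difference between COUNT(*) and COUNT(column) under a LEFT JOIN is easy to check with sqlite3. The sketch below uses made-up rows for the Category and FavCategory tables (the real data from the question is not available here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Category    (CategoryID INTEGER PRIMARY KEY, Category TEXT);
    CREATE TABLE FavCategory (MemberID INTEGER, CategoryID INTEGER);
    INSERT INTO Category VALUES (1, 'Books'), (2, 'Music'), (3, 'Film');
    -- Nobody has picked 'Music' as a favourite.
    INSERT INTO FavCategory VALUES (1, 1), (2, 1), (3, 3);
""")

rows = conn.execute("""
    SELECT c.Category,
           COUNT(*)          AS star_count,  -- counts the NULL-extended row too
           COUNT(f.MemberID) AS Popularity   -- skips NULLs from unmatched rows
    FROM Category c
    LEFT JOIN FavCategory f ON f.CategoryID = c.CategoryID
    GROUP BY c.CategoryID, c.Category
    ORDER BY Popularity DESC, c.CategoryID
""").fetchall()

print(rows)
```

For 'Music', COUNT(*) reports 1 because the LEFT JOIN still produces one row for the category, while COUNT(f.MemberID) correctly reports 0.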
You are getting the row count for each combination of Category and Category.CategoryID. This means the database builds every unique combination of these two columns and returns the row count for each one. If the GROUP BY columns and the WHERE clause stay the same, the row count will not change. HAVING can affect which groups are returned, but without the data it is very difficult to tell whether it has any effect here.