SQL SUM Aggregation with DISTINCT Statement

Table 1: Schema for the bookworm database. Primary keys are underlined. There are some foreign key references to link the tables together; you can make use of these with natural joins.
Author(aid, alastname, afirstname, acountry, aborn, adied).
Book(bid, btitle, pid, bdate, bpages, bprice).
City(cid, cname, cstate, ccountry).
Publisher(pid, pname).
Author_Book(aid, bid).
Publisher_City(pid, cid).
This question has me a little confused, any help is much appreciated.
Find the number of authors of each country.
So far I have tried
SELECT
SUM(DISTINCT acountry) AS total,
COUNT(DISTINCT acountry)AS N
FROM Author;
And received:
ERROR: function sum(character varying) does not exist
LINE 1: select sum(distinct acountry) as total, count(distinct acoun...
HINT: No function matches the given name and argument types. You might need to add explicit type casts.

You can't sum strings, and acountry holds a string value (presumably the country name), which is why SUM fails. To get the number of authors per country, group by acountry and count the rows within each group:
SELECT COUNT(*) AS totalAuthor,
acountry AS Country
FROM Author
GROUP BY acountry

You need to group by country, and count the number of records for each group
SELECT acountry, COUNT(1) as NumberOfAuthors FROM Author GROUP BY acountry
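For anyone who wants to try the grouping locally, here is a minimal sketch using Python's built-in sqlite3 module. The three sample authors and the reduced column list are made up for illustration; only the GROUP BY query itself comes from the answers above.

```python
import sqlite3

# In-memory database with a cut-down Author table (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Author (aid INTEGER PRIMARY KEY, alastname TEXT, acountry TEXT)"
)
conn.executemany(
    "INSERT INTO Author (aid, alastname, acountry) VALUES (?, ?, ?)",
    [(1, "Austen", "UK"), (2, "Orwell", "UK"), (3, "Twain", "USA")],
)

# One output row per country, with the number of authors in that group.
# ORDER BY is added only to make the printed order predictable.
rows = conn.execute(
    "SELECT acountry, COUNT(*) AS NumberOfAuthors "
    "FROM Author GROUP BY acountry ORDER BY acountry"
).fetchall()
print(rows)  # [('UK', 2), ('USA', 1)]
```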

Related

Retrieving column value in table2 via same ID in table1

I have this SQL query that returns overdue assignments
SELECT DUE_DATE,
SUBJECT,
ASSIGNMENT,
STUDENT_NAME,
TEACHER_NAME
FROM(SELECT DISTINCT
a.due_date AS due_date,
a.subject AS subject,
a.assignment AS assignment,
a.student_name AS student_name,
a.student_id AS student_id,
a.teacher_name AS teacher_name,
a.teacher_id AS teacher_id
FROM DB.ASSIGNMENT a,
DB.ALL b,
WHERE (trunc(a.DATE_CREATED) >= trunc(db.utc_sysdate)))
WHERE((trunc(due_date) < trunc(db.utc_sysdate));
and I want to include both the teacher and student emails as additional columns in my SQL query - I was wondering how to map their id in table ASSIGNMENT in order to get their respective emails in table ALL with the existing query I have?
We do lack some information, but - wouldn't your query be like this?
select distinct
a.due_date,
a.subject,
a.assignment,
a.student_name,
a.student_email,
a.teacher_name,
a.teacher_email
from db.assignment a join db.all b
on trunc(a.date_created) >= trunc(b.utc_sysdate)
and trunc(a.due_date) < trunc(b.utc_sysdate);
What's the difference, if compared to your query?
your query is invalid: there is a stray comma after db.all b
the final where clause references db. "alias" (although it is probably schema name, according to inline view's from clause)
there's no point in aliasing column names using exactly the same name; what's the difference between a.due_date as due_date and a.due_date itself? None. So don't use it, you're just causing confusion
as you want to include student's and teacher's e-mail addresses, why don't you just do that? Add those columns into the query ...
it seems that you don't need an inline view; put both where conditions into the same query and remove columns you don't need (both IDs)

Selecting fields that are not in GROUP BY when nested SELECTS aren't allowed

I have the tables:
Product(code (PK), pname, (....), sid (FK)),
Supplier(sid(PK), sname, (....))
The assignment is:
Find Suppliers that supply only one product. Display their name (sname) and product name (pname).
It seems to me like a GROUP BY problem, so I used:
SELECT sid FROM
Product GROUP BY sid
HAVING CAST(COUNT(*) AS INTEGER) = 1;
This query found the list of sids that supply only one product, but now I have encountered a problem:
The assignment forbids any form of nested SELECT queries.
The result of the query I have written has only one column. (The sid column)
Thus, I am unable to access the product name, as it is not in the query result table. If I added it to the GROUP BY clause, the grouping would be based on product name as well, which is unwanted behavior.
How should I approach the problem, then?
Note: I use PostgreSQL
You can phrase the query as:
SELECT s.sid, s.sname, MAX(p.pname) as pname
FROM Product p JOIN
Supplier s
ON p.sid = s.sid
GROUP BY s.sid, s.sname
HAVING COUNT(*) = 1;
You don't need to cast COUNT(*) to an integer. It already returns an integer type (bigint in PostgreSQL).
You could put
max(pname)
in the SELECT list. That's an aggregate, so it would be fine.
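Here is a small sketch of why the MAX(pname) trick works: once HAVING COUNT(*) = 1 has kept only single-product groups, MAX(pname) is just that one product's name. The question is about PostgreSQL; this uses Python's sqlite3 as a stand-in, and the supplier and product rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Supplier (sid INTEGER PRIMARY KEY, sname TEXT);
CREATE TABLE Product (code INTEGER PRIMARY KEY, pname TEXT,
                      sid INTEGER REFERENCES Supplier(sid));
-- Invented sample data: Acme supplies one product, Globex supplies two.
INSERT INTO Supplier VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO Product VALUES (10, 'Anvil', 1), (20, 'Rocket', 2), (21, 'Magnet', 2);
""")

# No nested SELECT: join, group per supplier, keep groups of exactly one row.
single = conn.execute("""
    SELECT s.sname, MAX(p.pname) AS pname
    FROM Product p JOIN Supplier s ON p.sid = s.sid
    GROUP BY s.sid, s.sname
    HAVING COUNT(*) = 1
""").fetchall()
print(single)  # [('Acme', 'Anvil')]
```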

Mixing column names with built-in functions while using the GROUP BY clause

I have constructed a relational database in SQLite to store coronavirus data of countries and regions around the world. The database schema is as follows:
Country (Name, Population)
DemographicData (CountryName*, AgeGroup, Gender, Cases, Deaths, Hospitalisations)
CountryData (CountryName*, Date, DailyNewCases, DailyNewDeaths, CumulativeCases, CumulativeDeaths,
CumulativeRecoveries)
Region (CountryName*, RegionName, Description)
RegionData (RegionName*, Date, DailyNewCases, DailyNewDeaths, CumulativeCases, CumulativeDeaths, CumulativeRecoveries)
The primary keys are underlined. The foreign keys are denoted by asterisks (*).
I wrote the following code to display the number of deaths per million population by country:
SELECT CountryName,
MAX(CumulativeDeaths) AS ConfirmedDeaths,
(ConfirmedDeaths*1000000/Population) AS DeathsPerMillion
FROM Country c
INNER JOIN CountryData d ON (c.Name=d.CountryName)
GROUP BY d.CountryName
When I executed the query in SQLite, it returned the error message: "no such column:confirmed deaths". Why did it return this error message? How can I fix it to get what I want?
Correct. In standard SQL you cannot reuse a column alias in the same SELECT list where it is defined (nor in the FROM or WHERE clause).
The calculation is simple. So just repeat it:
SELECT CountryName,
MAX(CumulativeDeaths) AS ConfirmedDeaths,
(MAX(CumulativeDeaths)*1000000/Population) AS DeathsPerMillion
FROM Country c INNER JOIN
CountryData d
ON c.Name = d.CountryName
GROUP BY d.CountryName, Population;
Note that you are referring to Population in the SELECT as well. It should be part of the GROUP BY, or be the argument to an aggregation function (such as MAX() or SUM()).
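To see both behaviours side by side, here is a hedged sketch with Python's sqlite3 module. The country 'Atlantis' and its figures are invented, and the tables are trimmed to the columns the query needs. The first query reuses the alias and is expected to fail; the second repeats the aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (Name TEXT PRIMARY KEY, Population INTEGER);
CREATE TABLE CountryData (CountryName TEXT, Date TEXT, CumulativeDeaths INTEGER);
-- Invented sample data.
INSERT INTO Country VALUES ('Atlantis', 2000000);
INSERT INTO CountryData VALUES ('Atlantis', '2020-04-01', 10),
                               ('Atlantis', '2020-04-02', 30);
""")

# Reusing the alias ConfirmedDeaths inside the same SELECT list is rejected.
try:
    conn.execute("""
        SELECT CountryName, MAX(CumulativeDeaths) AS ConfirmedDeaths,
               ConfirmedDeaths * 1000000 / Population AS DeathsPerMillion
        FROM Country c JOIN CountryData d ON c.Name = d.CountryName
        GROUP BY d.CountryName, Population
    """)
    alias_error = None
except sqlite3.OperationalError as exc:
    alias_error = str(exc)

# Repeating the aggregate expression works.
fixed = conn.execute("""
    SELECT CountryName, MAX(CumulativeDeaths) AS ConfirmedDeaths,
           MAX(CumulativeDeaths) * 1000000 / Population AS DeathsPerMillion
    FROM Country c JOIN CountryData d ON c.Name = d.CountryName
    GROUP BY d.CountryName, Population
""").fetchall()
print(alias_error)
print(fixed)  # [('Atlantis', 30, 15)]
```

Both operands are integers here, so the division is integer division; multiply by 1000000.0 if fractional results are wanted.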

How does GROUP BY use COUNT(*)

I have this query which finds the number of properties handled by each staff member along with their branch number:
SELECT s.branchNo, s.staffNo, COUNT(*) AS myCount
FROM Staff s, PropertyForRent p
WHERE s.staffNo=p.staffNo
GROUP BY s.branchNo, s.staffNo
The two relations are:
Staff{staffNo, fName, lName, position, sex, DOB, salary, branchNO}
PropertyToRent{propertyNo, street, city, postcode, type, rooms, rent, ownerNo, staffNo, branchNo}
How does SQL know what COUNT(*) is referring to? Why does it count the number of properties and not (say for example), the number of staff per branch?
This is a bit long for a comment.
COUNT(*) counts the number of rows in each group. It is not counting any particular column. What happens is that the join produces one row per property, so the properties are what create multiple rows for given values of s.branchNo and s.staffNo.
It gets even a little more "confusing" if you include a column name. The following would all typically return the same value:
COUNT(*)
COUNT(s.branchNo)
COUNT(s.staffNo)
COUNT(p.propertyNo)
With a column name, COUNT() determines the number of rows that do not have a NULL value in the column.
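The difference only shows up when a column actually contains NULLs. A quick sketch with Python's sqlite3, using a cut-down PropertyForRent table and three invented rows, one of which has no staff member assigned:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PropertyForRent (propertyNo TEXT PRIMARY KEY, staffNo TEXT);
-- Invented sample data: P3 has no staff member assigned.
INSERT INTO PropertyForRent VALUES ('P1', 'S1'), ('P2', 'S1'), ('P3', NULL);
""")

# COUNT(*) counts rows, COUNT(col) skips NULLs, COUNT(DISTINCT col) also dedupes.
all_rows, with_staff, distinct_staff = conn.execute(
    "SELECT COUNT(*), COUNT(staffNo), COUNT(DISTINCT staffNo) FROM PropertyForRent"
).fetchone()
print(all_rows, with_staff, distinct_staff)  # 3 2 1
```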
And finally, you should learn to use proper, explicit join syntax in your queries. Put join conditions in the on clause, not the where clause:
SELECT s.branchNo, s.staffNo, COUNT(*) AS myCount
FROM Staff s JOIN
PropertyForRent p
ON s.staffNo = p.staffNO
GROUP BY s.branchNo, s.staffNo;
GROUP BY clauses partition your result set. These partitions are all the sql engine needs to know - it simply counts their sizes.
Try your query with only count(*) in the select part.
In particular, COUNT(*) does not produce the number of distinct rows/columns in your result set!
Some people might think that COUNT(*) really counts all the columns, but the SQL optimizer is smarter than that.
COUNT(*) counts rows without removing duplicates, which also means that you can't combine DISTINCT with COUNT(*).
COUNT(*) returns the cardinality (number of rows) of each group, or of the whole result set when there is no GROUP BY.
What you have to remember is that when counting a specific column, rows where that column is NULL are not counted, while COUNT(*) counts every row regardless of NULLs.
How does SQL know what COUNT(*) is referring to?
I'm pretty sure, though not 100% sure as I can't find it in the docs, that the SQL optimizer simply does a count on the primary key (which is never null) instead of trying to handle nulls in rows.

Can’t figure out Query and Sub-Queries

I’m having trouble figuring this problem out.
I’m doing some revision exercises for university and would like to understand this BEFORE my exam in 2 days.
I’ve attempted some things (which I’ll post at the end). Please be kind, this is my first Database subject so my attempts may seem very stupid to you.
The question is as follows:
Which artist/s has/have the largest number of shows on at the moment?
Show the First & Last Name of the artist/s and their Address.
ORDER BY clause cannot be used.
Write a single SQL Statement.
Use Sub-Queries.
Relevant tables in the database:
Shows (ShowName, ArtistId, ShowStartDate, ShowEndDate)
Artists (ArtistId, FirstName, FamilyName, Address, PhoneNum)
We assume ArtistId, ShowStartDate, FirstName, FamilyName and Address cannot be null.
Now, I think that I have to count the number of shows each artist has on at the moment. Then, get the ArtistId for the artist/s that has/have the most. Use the ArtistId to retrieve the artist details (names and address).
I got as far as this (which is very wrong):
SELECT FirstName, FamilyName, Address
FROM Artists
WHERE ArtistId = (SELECT ArtistId
FROM Shows
WHERE ArtistId = (SELECT MAX(Counted)
FROM (SELECT ArtistId, COUNT(ArtistId) AS Counted
FROM Shows
WHERE ShowEndDate IS null
GROUP BY ArtistId)
GROUP BY ArtistId));
Well, I know
SELECT ArtistId, COUNT(ArtistId)
FROM Shows
WHERE ShowEndDate IS null
GROUP BY ArtistId
gives me a table with the count of how many times each ArtistId is listed.
Which is good.
But from this results table, I need to get the ArtistId/’s of the ones that have the highest count.
And this is where I’m lost.
Anyone can shed some light?
(As for which DBMS I am using: We have to use one created and supplied by the university. It’s very basic SQL. Simpler than Access 2010).
Thank you
(If you provide an answer [thank you thank you] could you also briefly explain the reasoning behind it?)
You need to find the maximum of the count of shows by artist, then find out which artists have that count by re-running the count query with a having clause that matches the maximum just found.
select FirstName, FamilyName, Address
from Artists
where ArtistId in -- use an in() to select the artists
  (select ArtistId from -- just select the artist id from the results
    (select ArtistId, count(*) c -- re-run the count query, but see having clause
     from Shows
     where current_date between ShowStartDate and ShowEndDate
     group by ArtistId
     having count(*) = -- use a having clause to only select those with the max count
       (select max(c) from -- this is simply the maximum count
         (select ArtistId, count(*) c -- find all counts by artist
          from Shows
          where current_date between ShowStartDate and ShowEndDate
          group by ArtistId
         ) counts
       )
    ) top_artists -- this derived table needs an alias as well
  )
Some syntax notes:
count(*) c means the column (with value count(*)) is given the alias c, so it can be referred to by an outer query. You can't refer to it as count(*), because that would be interpreted as an attempt at aggregation.
max(c) gets the maximum of the column named (or aliased) c (AFAIK you can't code max(count(*)) - maybe you could try it - I just typed this in without a console to test it)
counts is a table alias, which is a syntactic requirement when selecting from a result set
You haven't specified which database you're using, so you may have to replace current_date with your database's equivalent.
Some dbs allow you to reuse a query in a query (using a with clause), which would avoid rerunning the count subquery.
This query uses only subselects, but you can do it with a join too.
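Here is a runnable sketch of the same idea in Python's sqlite3, slightly flattened so the HAVING sits directly in the IN subquery. Two things are assumptions rather than part of the answer above: the sample artists and shows are invented, and "on at the moment" is taken to mean ShowEndDate IS NULL (as in the question's own attempt) instead of the current_date test, so the result does not depend on today's date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artists (ArtistId INTEGER PRIMARY KEY, FirstName TEXT,
                      FamilyName TEXT, Address TEXT);
CREATE TABLE Shows (ShowName TEXT, ArtistId INTEGER,
                    ShowStartDate TEXT, ShowEndDate TEXT);
-- Invented sample data: artists 1 and 2 each have two open shows, artist 3 has one.
INSERT INTO Artists VALUES (1, 'Ada', 'Ng', '1 First St'),
                           (2, 'Bo', 'Li', '2 Second St'),
                           (3, 'Cy', 'Roe', '3 Third St');
INSERT INTO Shows VALUES
  ('A1', 1, '2024-01-01', NULL), ('A2', 1, '2024-02-01', NULL),
  ('B1', 2, '2024-01-01', NULL), ('B2', 2, '2024-03-01', NULL),
  ('C1', 3, '2024-01-01', NULL), ('C2', 3, '2023-01-01', '2023-06-01');
""")

# Keep the artists whose open-show count equals the maximum open-show count.
top = conn.execute("""
    SELECT FirstName, FamilyName, Address
    FROM Artists
    WHERE ArtistId IN (
        SELECT ArtistId FROM Shows
        WHERE ShowEndDate IS NULL
        GROUP BY ArtistId
        HAVING COUNT(*) = (
            SELECT MAX(c) FROM (
                SELECT COUNT(*) AS c FROM Shows
                WHERE ShowEndDate IS NULL
                GROUP BY ArtistId
            ) counts
        )
    )
""").fetchall()
print(sorted(top))  # both tied artists: Ada Ng and Bo Li
```

Ties are handled for free: every artist whose count equals the maximum comes back, which is what the "artist/s" wording in the exercise asks for.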
Try this:
SELECT FirstName, FamilyName, Address
FROM Artists
WHERE ArtistId IN (
    SELECT ArtistId
    FROM (
        SELECT ArtistId, COUNT(ArtistId) AS Counted
        FROM Shows
        WHERE ShowEndDate IS null
        GROUP BY ArtistId) S1
    WHERE Counted = (
        -- no GROUP BY here, so MAX returns a single value
        SELECT MAX(Counted)
        FROM (
            SELECT ArtistId, COUNT(ArtistId) AS Counted
            FROM Shows
            WHERE ShowEndDate IS null
            GROUP BY ArtistId) S2
    )
);
It is simple and should work in your case.