Can’t figure out Query and Sub-Queries - sql

I’m having trouble figuring this problem out.
I’m doing some revision exercises for university and would like to understand this BEFORE my exam in 2 days.
I’ve attempted some things, which I’ll post at the end. Please be kind: this is my first database subject, so my attempts may seem very stupid to you.
The question is as follows:
Which artist/s has/have the largest number of shows on at the moment?
Show the First & Last Name of the artist/s and their Address.
ORDER BY clause cannot be used.
Write a single SQL Statement.
Use Sub-Queries.
Relevant tables in the database:
Shows (ShowName, ArtistId, ShowStartDate, ShowEndDate)
Artists (ArtistId, FirstName, FamilyName, Address, PhoneNum)
We assume ArtistId, ShowStartDate, FirstName, FamilyName and Address cannot be null.
Now, I think that I have to count the number of shows each artist has on at the moment. Then, get the ArtistId for the artist/s that has/have the most. Use the ArtistId to retrieve the artist details (names and address).
I got as far as this (which is very wrong):
SELECT FirstName, FamilyName, Address
FROM Artists
WHERE ArtistId = (SELECT ArtistId
FROM Shows
WHERE ArtistId = (SELECT MAX(Counted)
FROM (SELECT ArtistId, COUNT(ArtistId) AS Counted
FROM Shows
WHERE ShowEndDate IS null
GROUP BY ArtistId)
GROUP BY ArtistId));
Well, I know
SELECT ArtistId, COUNT(ArtistId)
FROM Shows
WHERE ShowEndDate IS null
GROUP BY ArtistId
gives me a table with the number of times each ArtistId is listed, which is good.
But from this result table, I need to get the ArtistId(s) with the highest count.
And this is where I’m lost.
Can anyone shed some light?
(As for which DBMS I am using: we have to use one created and supplied by the university. It supports only very basic SQL, simpler than Access 2010.)
Thank you
(If you provide an answer [thank you thank you] could you also briefly explain the reasoning behind it?)

First find the maximum of the per-artist show counts. Then find which artists have that count by re-running the count query with a HAVING clause that matches the maximum you just found.
select FirstName, FamilyName, Address
from Artists
where ArtistId in                          -- use an in() to select the artists
  (select ArtistId from                    -- just select the artist id from the results
    (select ArtistId, count(*) c           -- re-run the count query, but see the having clause
     from Shows
     where ShowStartDate <= current_date   -- the show has started...
       and (ShowEndDate is null or ShowEndDate >= current_date)  -- ...and has not ended
     group by ArtistId
     having count(*) =                     -- use a having clause to keep only those with the max count
       (select max(c) from                 -- this is simply the maximum count
         (select ArtistId, count(*) c      -- find all counts by artist
          from Shows
          where ShowStartDate <= current_date
            and (ShowEndDate is null or ShowEndDate >= current_date)
          group by ArtistId
         ) counts
       )
    ) maxed
  )
Some syntax notes:
count(*) c gives the column (whose value is count(*)) the alias c, so an outer query can refer to it. You can't refer to it as count(*) there, because that would be interpreted as a new aggregation.
max(c) gets the maximum of the column named (or aliased) c. Most databases don't allow nesting aggregates such as max(count(*)), which is why the counts are computed in a derived table first. (I typed this in without a console to test it.)
counts is a table alias. Many databases require one when selecting from a derived table (a subquery in the FROM clause).
Because the exact database product isn't known, you may have to replace current_date with your database's equivalent (for example SYSDATE in Oracle or GETDATE() in SQL Server).
Some databases let you define a query once and reuse it (a WITH clause, also called a common table expression), which avoids repeating the count subquery.
This query uses only subselects, but you can do it with a join too.
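Here is a minimal sketch of the WITH-clause idea from the notes. It runs against an in-memory SQLite database through Python's standard sqlite3 module, so the query can actually be executed. The table contents are hypothetical, and a NULL ShowEndDate is assumed to mean that the show is still running. In place of current_date, a fixed date is passed in as the :today parameter so that the output is reproducible.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Artists (ArtistId INTEGER PRIMARY KEY, FirstName TEXT, FamilyName TEXT,
                      Address TEXT, PhoneNum TEXT);
CREATE TABLE Shows (ShowName TEXT, ArtistId INTEGER, ShowStartDate TEXT, ShowEndDate TEXT);
"""

# The per-artist count is defined once in a WITH clause and reused twice.
# Dates are ISO 'YYYY-MM-DD' strings, which compare correctly as text.
TOP_ARTISTS_SQL = """
WITH counts AS (
    SELECT ArtistId, COUNT(*) AS c
    FROM Shows
    WHERE ShowStartDate <= :today
      AND (ShowEndDate IS NULL OR ShowEndDate >= :today)  -- NULL end = still running (assumed)
    GROUP BY ArtistId
)
SELECT FirstName, FamilyName, Address
FROM Artists
WHERE ArtistId IN (SELECT ArtistId FROM counts
                   WHERE c = (SELECT MAX(c) FROM counts))
"""

def top_artists(conn, today):
    # No ORDER BY (as the exercise requires), so row order is unspecified.
    return conn.execute(TOP_ARTISTS_SQL, {"today": today}).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical sample data.
conn.executemany("INSERT INTO Artists VALUES (?, ?, ?, ?, ?)", [
    (1, "Ada", "Lee", "1 Main St", None),
    (2, "Ben", "Ray", "2 High St", None),
    (3, "Cy", "Orr", "3 Low St", None),
])
conn.executemany("INSERT INTO Shows VALUES (?, ?, ?, ?)", [
    ("A1", 1, "2024-01-01", None),          # open-ended, still running
    ("A2", 1, "2024-02-01", "2024-12-31"),
    ("B1", 2, "2024-03-01", "2024-06-30"),
    ("B2", 2, "2024-01-15", None),
    ("C1", 3, "2024-01-01", "2024-12-31"),
    ("C2", 3, "2023-01-01", "2023-12-31"),  # already finished
])
print(sorted(top_artists(conn, "2024-05-01")))
```

On 2024-05-01, Ada and Ben tie with two running shows each, so both are returned. By 2024-07-01, B1 has ended and only Ada remains.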

Try this:
SELECT FirstName, FamilyName, Address
FROM Artists
WHERE ArtistId IN (
    SELECT ArtistId
    FROM (
        SELECT ArtistId, COUNT(ArtistId) AS Counted
        FROM Shows
        WHERE ShowEndDate IS null
        GROUP BY ArtistId) S1
    WHERE Counted = (
        SELECT MAX(Counted)
        FROM (
            SELECT ArtistId, COUNT(ArtistId) AS Counted
            FROM Shows
            WHERE ShowEndDate IS null
            GROUP BY ArtistId) S2)
);
It is simple and should work in your case. Note that the MAX subquery is not grouped. It must return a single value, the overall maximum count, for the = comparison to work.
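To check this query on a tie, the sketch below runs it on hypothetical data in SQLite through Python's sqlite3 module. It uses the question's convention that a show is on at the moment when ShowEndDate IS NULL.

```python
import sqlite3

# The query above, unchanged apart from layout.
QUERY = """
SELECT FirstName, FamilyName, Address
FROM Artists
WHERE ArtistId IN (
    SELECT ArtistId
    FROM (SELECT ArtistId, COUNT(ArtistId) AS Counted
          FROM Shows
          WHERE ShowEndDate IS NULL
          GROUP BY ArtistId) S1
    WHERE Counted = (
        SELECT MAX(Counted)
        FROM (SELECT ArtistId, COUNT(ArtistId) AS Counted
              FROM Shows
              WHERE ShowEndDate IS NULL
              GROUP BY ArtistId) S2))
"""

def build_demo_db():
    # Hypothetical data: Cy has the most shows overall, but only one is still running.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Artists (ArtistId INTEGER PRIMARY KEY, FirstName TEXT,
                              FamilyName TEXT, Address TEXT, PhoneNum TEXT);
        CREATE TABLE Shows (ShowName TEXT, ArtistId INTEGER,
                            ShowStartDate TEXT, ShowEndDate TEXT);
    """)
    conn.executemany("INSERT INTO Artists VALUES (?, ?, ?, ?, ?)", [
        (1, "Ada", "Lee", "1 Main St", None),
        (2, "Ben", "Ray", "2 High St", None),
        (3, "Cy", "Orr", "3 Low St", None),
    ])
    conn.executemany("INSERT INTO Shows VALUES (?, ?, ?, ?)", [
        ("A1", 1, "2024-01-01", None),
        ("A2", 1, "2024-02-01", None),
        ("B1", 2, "2024-03-01", None),
        ("B2", 2, "2024-04-01", None),
        ("C1", 3, "2023-01-01", "2023-06-30"),
        ("C2", 3, "2023-07-01", "2023-12-31"),
        ("C3", 3, "2024-01-01", None),
    ])
    return conn

conn = build_demo_db()
for row in conn.execute(QUERY):
    print(row)
```

Ada and Ben each have two open shows, so both are returned. Using IN rather than = in the outer WHERE is what lets both tied artists through.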

Related

Selecting fields that are not in GROUP BY when nested SELECTS aren't allowed

I have the tables:
Product(code (PK), pname, (....), sid (FK)),
Supplier(sid(PK), sname, (....))
The assignment is:
Find Suppliers that supply only one product. Display their name (sname) and product name (pname).
It seemed to me like a GROUP BY problem, so I used:
SELECT sid FROM
Product GROUP BY sid
HAVING CAST(COUNT(*) AS INTEGER) = 1;
This query found the list of sids that supply only one product, but now I have run into a problem:
The assignment forbids any form of nested SELECT queries.
The result of the query I have written has only one column. (The sid column)
Thus, I can't access the product name, because it is not in the query result. If I added it to the GROUP BY clause, the grouping would be based on product name as well, which is not what I want.
How should I approach the problem, then?
Note: I use PostgreSQL
You can phrase the query as:
SELECT s.sid, s.sname, MAX(p.pname) as pname
FROM Product p JOIN
Supplier s
ON p.sid = s.sid
GROUP BY s.sid, s.sname
HAVING COUNT(*) = 1;
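You don't need to convert COUNT(*) to an integer. It already returns an integer type (bigint in PostgreSQL). Because HAVING COUNT(*) = 1 leaves exactly one product row per supplier, MAX(p.pname) simply returns that product's name.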
You could put max(pname) in the SELECT list. That's an aggregate, so it is allowed alongside GROUP BY.
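The sketch below uses SQLite as a stand-in for PostgreSQL to show the join-and-aggregate query on hypothetical suppliers and products. The query itself uses only standard GROUP BY/HAVING features.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Supplier (sid INTEGER PRIMARY KEY, sname TEXT);
    CREATE TABLE Product (code INTEGER PRIMARY KEY, pname TEXT,
                          sid INTEGER REFERENCES Supplier(sid));
    -- Hypothetical data: Globex supplies two products, the others one each.
    INSERT INTO Supplier VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO Product VALUES (10, 'Anvil', 1),
                               (20, 'Widget', 2), (21, 'Gadget', 2),
                               (30, 'Stapler', 3);
""")

# One row per supplier. MAX(pname) is allowed in the SELECT list because it is
# an aggregate, and with exactly one product per group it is that product's name.
rows = conn.execute("""
    SELECT s.sid, s.sname, MAX(p.pname) AS pname
    FROM Product p
    JOIN Supplier s ON p.sid = s.sid
    GROUP BY s.sid, s.sname
    HAVING COUNT(*) = 1
""").fetchall()
print(sorted(rows))
```

Globex is filtered out by the HAVING clause. Acme and Initech each come back with their single product name, and no nested SELECT is needed.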

How to modify query to walk entire table rather than a single

I wrote several SQL queries and executed them against my table. Each individual query worked. I kept adding functionality until I got a really ugly working query. The problem is that I have to manually change a value every time I want to use it. Can you assist in making this query automatic rather than “manual”?
I am working with DB2.
The table below (yrb_purchase) shows customers (cid) 1 to 3. 'club' is a book seller, and 'qnty' is the number of books the customer bought from each 'club'. The full table has 45 customers.
The image below shows all the table rows for the first 3 customers (cid=1 OR cid=2 OR cid=3). The final purpose of all my queries (once combined) is to find the single 'club' with the largest 'qnty' for each 'cid'. So for 'cid=1' the 'club' is Readers Digest with a 'qnty' of 3. For 'cid=2' the 'club' is YRB Gold with a 'qnty' of 5. And so on, up to cid 45.
To give you some background, here are my queries:
(Query 1-starting point for cid=1)
SELECT * FROM yrb_purchase WHERE cid=1
(Query 2 - find the 'club' with the highest 'qnty' for cid=1)
SELECT *
FROM
(SELECT club,
sum(qnty) AS t_qnty
FROM yrb_purchase
WHERE cid=1
GROUP BY club)results
ORDER BY t_qnty DESC
(Query 3 - combine the record from the above query with its cid)
SELECT cid,
temp.club,
temp.t_qnty
FROM yrb_purchase AS p,
(SELECT *
FROM
(SELECT club,
sum(qnty) AS t_qnty
FROM yrb_purchase
WHERE cid=1
GROUP BY club)results
ORDER BY t_qnty DESC FETCH FIRST 1 ROWS ONLY) AS TEMP
WHERE p.cid=1
AND p.club=temp.club
(Query 4 - make sure there is only one record for cid=1)
SELECT cid,
temp.club,
temp.t_qnty
FROM yrb_purchase AS p,
(SELECT *
FROM
(SELECT club,
sum(qnty) AS t_qnty
FROM yrb_purchase
WHERE cid=1
GROUP BY club)results
ORDER BY t_qnty DESC FETCH FIRST 1 ROWS ONLY) AS TEMP
WHERE p.cid=1
AND p.club=temp.club FETCH FIRST ROWS ONLY
To get the 'club' with the highest 'qnty' for customer 2, I simply change cid=1 to cid=2 in the last query above. My query seems to always produce the correct results. My question is: how do I modify my query to get the results for all 'cid's from 1 to 45 in a single table? That is, how do I get one table with every cid value, the club that sold that cid the most books, and how many books it sold? Please keep in mind that I am hoping you can modify my query rather than provide a better one.
If you decide that my query is way too ugly (I agree) and choose to provide another query, please be aware that I just started learning SQL and may not be able to understand it. I already asked this question ("For common elements, how to find the value based on two columns? SQL"), but I was not able to make the answer work. That was due to my SQL limitations, not because the answer wasn't good, and without a working answer I could not reverse engineer it to understand how it works.
Thanks in advance
You could use OLAP/Window Functions to achieve this:
SELECT
cid,
club,
qnty
FROM
(
SELECT
cid,
club,
qnty,
ROW_NUMBER() OVER (PARTITION BY cid order by qnty desc) as cid_club_rank
FROM
(
SELECT
cid,
club,
sum(qnty) as qnty
FROM yrb_purchase
GROUP BY cid, club
) as sub1
) as sub2
WHERE cid_club_rank = 1
The innermost statement (sub1) just gets the total quantity for each cid/club combination. The next statement out (sub2) numbers the cid/club combinations within each cid, ordered by quantity from highest to lowest. Then the outermost query keeps only the rows where that row_number() is 1, which is the top club for each cid.
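Here is a sketch of the window-function answer running in SQLite through Python's sqlite3 module. It assumes SQLite 3.25 or later, the first version with window functions. The rows are hypothetical, chosen to reproduce the cid=1 and cid=2 results described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yrb_purchase (cid INTEGER, club TEXT, qnty INTEGER)")
# Hypothetical purchases: cid 1 totals 3 from Readers Digest, cid 2 totals 5 from YRB Gold.
conn.executemany("INSERT INTO yrb_purchase VALUES (?, ?, ?)", [
    (1, "Readers Digest", 2), (1, "Readers Digest", 1), (1, "YRB Gold", 2),
    (2, "YRB Gold", 5), (2, "Readers Digest", 1),
    (3, "Club A", 1), (3, "Club B", 4),
])

rows = conn.execute("""
    SELECT cid, club, qnty
    FROM (
        SELECT cid, club, qnty,
               ROW_NUMBER() OVER (PARTITION BY cid ORDER BY qnty DESC) AS cid_club_rank
        FROM (
            SELECT cid, club, SUM(qnty) AS qnty   -- total per cid/club (sub1)
            FROM yrb_purchase
            GROUP BY cid, club
        ) AS sub1
    ) AS sub2
    WHERE cid_club_rank = 1                       -- keep the top club per cid
""").fetchall()
for row in sorted(rows):
    print(row)
```

ROW_NUMBER() picks exactly one club per cid even when two clubs tie on quantity, and which one wins a tie is not defined. Using RANK() instead would return every tied club.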

How does GROUP BY use COUNT(*)

I have this query which finds the number of properties handled by each staff member along with their branch number:
SELECT s.branchNo, s.staffNo, COUNT(*) AS myCount
FROM Staff s, PropertyForRent p
WHERE s.staffNo=p.staffNo
GROUP BY s.branchNo, s.staffNo
The two relations are:
Staff{staffNo, fName, lName, position, sex, DOB, salary, branchNo}
PropertyForRent{propertyNo, street, city, postcode, type, rooms, rent, ownerNo, staffNo, branchNo}
How does SQL know what COUNT(*) is referring to? Why does it count the number of properties and not (say for example), the number of staff per branch?
This is a bit long for a comment.
COUNT(*) counts the number of rows in each group. It is not counting any particular column. The join produces one row per (staff member, property) pair. So for given values of s.branchNo and s.staffNo, there is one row per property that the staff member handles, and counting the rows in each group therefore counts properties.
It gets even a little more "confusing" if you include a column name. The following would all typically return the same value:
COUNT(*)
COUNT(s.branchNo)
COUNT(s.staffNo)
COUNT(p.propertyNo)
With a column name, COUNT() determines the number of rows that do not have a NULL value in the column.
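With the inner join in the query above, none of the counted columns is NULL, so all the COUNT variants agree. The sketch below deliberately switches to a LEFT JOIN, which is not part of the original query, on hypothetical, trimmed-down tables. As a result, a staff member with no properties produces a NULL propertyNo, which shows the difference between COUNT(*) and COUNT(column).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, trimmed-down versions of the two tables.
    CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, branchNo TEXT);
    CREATE TABLE PropertyForRent (propertyNo TEXT PRIMARY KEY, staffNo TEXT);
    INSERT INTO Staff VALUES ('S1', 'B1'), ('S2', 'B1'), ('S3', 'B2');
    INSERT INTO PropertyForRent VALUES ('P1', 'S1'), ('P2', 'S1'), ('P3', 'S2');
""")

# The LEFT JOIN keeps S3 (no properties) as one row whose p.* columns are NULL,
# so COUNT(*) counts that row while COUNT(p.propertyNo) skips it.
rows = conn.execute("""
    SELECT s.branchNo, s.staffNo,
           COUNT(*)            AS all_rows,
           COUNT(p.propertyNo) AS properties
    FROM Staff s
    LEFT JOIN PropertyForRent p ON s.staffNo = p.staffNo
    GROUP BY s.branchNo, s.staffNo
    ORDER BY s.staffNo
""").fetchall()
for row in rows:
    print(row)
```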
And finally, you should learn to use proper, explicit join syntax in your queries. Put join conditions in the on clause, not the where clause:
SELECT s.branchNo, s.staffNo, COUNT(*) AS myCount
FROM Staff s JOIN
PropertyForRent p
ON s.staffNo = p.staffNo
GROUP BY s.branchNo, s.staffNo;
GROUP BY clauses partition your result set. These partitions are all the sql engine needs to know - it simply counts their sizes.
Try your query with only count(*) in the select part.
In particular, COUNT(*) does not produce the number of distinct rows/columns in your result set!
Some people might think that COUNT(*) really counts all the columns, but the SQL optimizer is smarter than that.
COUNT(*) returns the number of rows (in the table, or in each group when there is a GROUP BY) without removing duplicates. This means you can't write COUNT(DISTINCT *); to count distinct values you need COUNT(DISTINCT column).
COUNT(*) returns the cardinality (number of rows) of the set it is applied to.
Remember that COUNT(column) skips rows where that column is NULL, whereas COUNT(*) counts every row, even one in which some fields are NULL.
How does SQL know what COUNT(*) is referring to?
I'm fairly sure, though not 100% sure since I can't find it in the documentation, that the optimizer simply counts rows (for example via the primary key, which can't be NULL) rather than checking every column for NULLs.

How to perform these queries?

I have these three tables:
create table albums(sernum number primary key,
Albname varchar2(30) not null,
Artist varchar2(20) not null,
Pdate number(4),
Recompany varchar2(10),
Media char(2) not null);
create table tracks(sernum number not null,
song varchar2(50) not null,
primary key(sernum, song),
foreign key(sernum) references albums(sernum));
create table performers(sernum number not null,
Artist varchar2(30) not null,
Instrument varchar2(50) not null,
primary key(sernum, Artist, Instrument),
foreign key (sernum) references albums(sernum));
I want to perform two queries in Oracle SQL:
list the names of the artists that used all instruments.
list the names of the albums containing the maximum number of songs.
Here are my attempts:
select distinct(a.Artist) from albums a where a.Artist like (select p.Artist, distinct(p.Instrument) from performers p) group by a.Artist, p.Instrument;
select a.Albname from albums a, inner join tracks t on where a.sernum in(select max(t.sernum) group by t.sernum);
Query 1 - get artists who have played all instruments:
SELECT
p.Artist
FROM
(
SELECT Artist, count(distinct Instrument) as InstrumentCount
FROM performers
GROUP BY artist
) p
JOIN
(
SELECT COUNT(DISTINCT Instrument) as InstrumentCount
FROM performers
) i
ON p.InstrumentCount = i.InstrumentCount
Explanation: the first subquery gets the number of distinct instruments played by each artist. The second subquery gets the total number of distinct instruments in the performers table. Joining the two on this instrument count keeps only those artists whose instrument count equals the total, that is, artists who have played every instrument.
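Below is a runnable sketch of Query 1 using SQLite through Python's sqlite3 module, with hypothetical performer rows (the albums table is omitted because the query doesn't use it). Tom plays guitar on three albums, which shows why COUNT(DISTINCT Instrument) is needed. A plain COUNT(Instrument) would give him 3 and wrongly match the total of 3 distinct instruments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE performers (sernum INTEGER NOT NULL, Artist TEXT NOT NULL,
                             Instrument TEXT NOT NULL,
                             PRIMARY KEY (sernum, Artist, Instrument));
    -- Hypothetical rows: 3 distinct instruments overall (guitar, drums, piano).
    INSERT INTO performers VALUES
        (1, 'Mia', 'guitar'), (1, 'Mia', 'drums'), (2, 'Mia', 'piano'),
        (1, 'Tom', 'guitar'), (2, 'Tom', 'guitar'), (3, 'Tom', 'guitar'),
        (2, 'Zoe', 'piano'), (2, 'Zoe', 'drums'), (3, 'Zoe', 'guitar');
""")

rows = conn.execute("""
    SELECT p.Artist
    FROM (SELECT Artist, COUNT(DISTINCT Instrument) AS InstrumentCount
          FROM performers
          GROUP BY Artist) p
    JOIN (SELECT COUNT(DISTINCT Instrument) AS InstrumentCount
          FROM performers) i
      ON p.InstrumentCount = i.InstrumentCount
""").fetchall()
print(sorted(rows))
```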
--
Query 2 - Get albums containing the maximum number of songs:
WITH AlbumTrackCount AS
(
    SELECT
        sernum,
        COUNT(1) AS TrackCount
    FROM tracks
    GROUP BY sernum
)
SELECT
    a.Albname
FROM albums a
JOIN AlbumTrackCount atc
    ON a.sernum = atc.sernum
    AND atc.TrackCount =
    (
        SELECT MAX(TrackCount)
        FROM AlbumTrackCount
    )
Explanation: the WITH clause at the top defines a subquery we'll reuse. It gives the track count for each album. Below it, we join the albums to these track counts and add a filter that keeps only the albums whose track count equals the maximum track count of any album. Note that this differs from the first query, which simply counted every distinct instrument. Here it is important to first count the tracks within each album and then take the maximum of those counts.
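The same WITH pattern can be tried out in SQLite through Python's sqlite3 module. The sketch below uses hypothetical albums, two of which tie for the most tracks, so it also shows that the MAX comparison keeps ties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE albums (sernum INTEGER PRIMARY KEY, Albname TEXT NOT NULL);
    CREATE TABLE tracks (sernum INTEGER NOT NULL REFERENCES albums(sernum),
                         song TEXT NOT NULL, PRIMARY KEY (sernum, song));
    -- Hypothetical data: 'First' and 'Third' both have 3 tracks.
    INSERT INTO albums VALUES (1, 'First'), (2, 'Second'), (3, 'Third');
    INSERT INTO tracks VALUES (1, 'a'), (1, 'b'), (1, 'c'),
                              (2, 'x'),
                              (3, 'p'), (3, 'q'), (3, 'r');
""")

rows = conn.execute("""
    WITH AlbumTrackCount AS (
        SELECT sernum, COUNT(1) AS TrackCount   -- tracks per album
        FROM tracks
        GROUP BY sernum
    )
    SELECT a.Albname
    FROM albums a
    JOIN AlbumTrackCount atc
      ON a.sernum = atc.sernum
     AND atc.TrackCount = (SELECT MAX(TrackCount) FROM AlbumTrackCount)
""").fetchall()
print(sorted(rows))
```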
Below are some of the issues with your queries:
SELECT DISTINCT (a.artist)
FROM albums a
WHERE a.artist LIKE (SELECT p.artist,
distinct(p.Instrument)
from performers p)
group by a.Artist, p.Instrument;
LIKE indicates that you're going to use a wildcard pattern. When comparing against a subquery in the WHERE clause, you typically use IN as the operator.
DISTINCT is not a function. It always applies to all of the columns in a SELECT statement.
DISTINCT and GROUP BY serve very similar purposes. You would rarely use both in the same statement.
You can't reference a column of a subquery in the WHERE clause (here p.Instrument) from the outer query, for example in its GROUP BY.
SELECT a.albname
FROM albums a,
inner join tracks t
on
where a.sernum in(select max(t.sernum) group by t.sernum);
You're using both a comma and INNER JOIN to connect two tables. The comma is the older (pre-SQL-92) join syntax, whereas explicit INNER JOIN syntax was introduced in SQL-92. While you can technically use both in a single FROM clause, you can't use both between the same pair of tables. In any case, you shouldn't mix them: stick to explicit JOIN syntax.
Your ON clause is empty. You should probably be joining your two tables here (on sernum). If you really don't want a join condition, change the join to CROSS JOIN (to reiterate: you almost certainly don't want this).
You have a SELECT statement without a FROM clause. That is not allowed.

SQL Database SELECT question

Need some help with a homework assignment on SQL
Problem
Find out who (first name and last name) has played the most games in the chess tournament with ID 41
Background information
I have a table called Games, which contains information...
game ID
tournament ID
start_time
end_time
white_pieces_player_id
black_pieces_player_id
white_result
black_result
...about all the separate chess games that have taken place in three different tournaments (with IDs 41, 42 and 47)...
...and the first and last names of the players are stored in a table called People:
person ID (the same ID that appears in Games as white_pieces_player_id and black_pieces_player_id)
first_name
last_name
How can I write a SELECT statement in SQL that gives me the answer?
It sounds like you need to limit by tournament ID in your WHERE clause, join with the People table on white_pieces_player_id and black_pieces_player_id, and use the max function on the count of games in which each player appears as either white or black.
Interesting problem.
What do you have so far?
Hmm... responding to your comment:
SELECT isik.eesnimi
FROM partii JOIN isik ON partii.valge=isik.id
WHERE turniir='41'
group by isik.eesnimi
having count(*)>4
Consider using the max() function instead of having count(*) > some fixed number.
you can add the last name to the select clause if you also add it to the group by clause
Sorry, I only speak American. What language is this code in?
I would join the People table to a derived table that aggregates the games, like this:
SELECT a.last_name, a.first_name, b.gamecount AS totalcount
FROM People a
JOIN (SELECT p.person_id, COUNT(*) AS gamecount   -- person_id, tournament_id: assumed column names
      FROM People p
      JOIN Games g
        ON g.white_pieces_player_id = p.person_id
        OR g.black_pieces_player_id = p.person_id
      WHERE g.tournament_id = 41
      GROUP BY p.person_id
     ) b
  ON b.person_id = a.person_id
ORDER BY totalcount DESC
Something like this counts, for each player, every game in which they played either white or black, and then joins back to the People table to get their names.
Then, if you only want the top player, add TOP 1 (SQL Server), LIMIT 1 (PostgreSQL, MySQL, SQLite) or FETCH FIRST 1 ROW ONLY (standard SQL). Note that this silently drops anyone tied for first place.
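Here is a tie-preserving variation, sketched in SQLite through Python's sqlite3 module. Instead of taking the top row, it stacks the white and black player columns with UNION ALL. This is a swapped-in technique, not the OR join used above. It then counts appearances per player and keeps every player whose count equals the maximum, mirroring the MAX-subquery pattern from the first question. The column names person_id, game_id and tournament_id are assumptions, since the question only gives descriptive names, and the data is hypothetical.

```python
import sqlite3

# Hypothetical schema; column names are assumed from the question's descriptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (person_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE Games (game_id INTEGER PRIMARY KEY, tournament_id INTEGER,
                        white_pieces_player_id INTEGER, black_pieces_player_id INTEGER);
    INSERT INTO People VALUES (1, 'Ann', 'Smith'), (2, 'Bob', 'Jones'), (3, 'Cat', 'Brown');
    INSERT INTO Games VALUES
        (1, 41, 1, 2), (2, 41, 2, 3), (3, 41, 3, 1), (4, 41, 1, 3),
        (5, 42, 2, 1), (6, 42, 2, 3);
""")

MOST_GAMES_SQL = """
WITH appearances AS (
    -- two rows per game (white player and black player); UNION ALL keeps
    -- duplicate player ids, which is exactly what counting needs
    SELECT white_pieces_player_id AS player_id FROM Games WHERE tournament_id = :t
    UNION ALL
    SELECT black_pieces_player_id FROM Games WHERE tournament_id = :t
),
game_counts AS (
    SELECT player_id, COUNT(*) AS games_played
    FROM appearances
    GROUP BY player_id
)
SELECT p.first_name, p.last_name, gc.games_played
FROM People p
JOIN game_counts gc ON gc.player_id = p.person_id
WHERE gc.games_played = (SELECT MAX(games_played) FROM game_counts)
"""

def most_games(conn, tournament_id):
    # Returns every player tied for the most games in the given tournament.
    return conn.execute(MOST_GAMES_SQL, {"t": tournament_id}).fetchall()

print(sorted(most_games(conn, 41)))
```

In tournament 41, Ann and Cat both played three games, so both are returned. Using UNION instead of UNION ALL would collapse repeated player ids and make every count 1.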