SQLZoo "More Join Operations" #15

Edit:
SQLZoo More Join Operations problem 15 has changed since I asked this question. It now states: "List the films released in the year 1978 ordered by the number of actors in the cast, then by title."
Thanks to everyone who tried to help with the original phrasing. I've updated the accepted answer to match the current problem.
Original Question:
I'm trying to solve problem number 15 under SQLZoo More Join Operations (I'm brushing up for an interview tomorrow).
The question is: "List the 1978 films by order of cast list size."
My answer is:
SELECT movie.title, count(casting.actorid)
FROM movie INNER JOIN casting
ON movie.id=casting.movieid
WHERE movie.yr=1978
GROUP BY movie.id
ORDER BY count(casting.actorid) desc
This is essentially identical to the answer given by Gideon Dsouza except that my solution does not assume titles are unique:
SELECT m.title, Count(c.actorid)
FROM casting c JOIN movie m ON
m.id = c.movieid
WHERE m.yr = 1978
GROUP BY m.title
ORDER BY Count(c.actorid) DESC
Neither my solution nor his is marked correct.
The results from my solution and the "correct" solution are given at the end. My list has two movies ("Piranha" and "The End") that the "correct" solution lacks. And the "correct" solution has two movies ("Force 10 From Navarone" and "Midnight Express") that mine lacks.
Since these movies all have the smallest cast size, I hypothesized that SQLZoo cuts the query off at 50 rows and that an ordering irregularity among the ties causes the difference. However, I tried appending ,fieldname to the end of my ORDER BY clause for every value of fieldname, and none yielded an identical answer.
Am I doing something wrong or is SQLZoo broken?
Result Listings
My solution yields (after using LibreOffice to format it as fixed-width columns):
The Bad News Bears Go to Japan 50
The Swarm 37
Grease 28
American Hot Wax 27
The Boys from Brazil 26
Heaven Can Wait 25
Big Wednesday 21
Orchestra Rehearsal 19
A Night Full of Rain 19
A Wedding 19
The Cheap Detective 19
Go Tell the Spartans 18
Superman 17
Movie Movie 17
The Driver 17
The Cat from Outer Space 17
Death on the Nile 17
The Star Wars Holiday Special 17
Blue Collar 16
J.R.R. Tolkien's The Lord of the 16
Ice Castles 16
International Velvet 16
Coming Home 15
Revenge of the Pink Panther 15
The Brink's Job 15
David 15
The Chant of Jimmie Blacksmith 15
The Water Babies 15
Violette Nozière 15
Occupation in 26 Pictures 15
Without Anesthesia 15
Bye Bye Monkey 15
Alexandria... Why? 15
Who'll Stop The Rain 15
Gray Lady Down 15
Damien: Omen II 14
The Empire of Passion 14
Bread and Chocolate 14
I Wanna Hold Your Hand 14
Closed Circuit 14
Almost Summer 13
Goin' South 13
An Unmarried Woman 13
The Left-Handed Woman 13
Foul Play 13
The End 12
California Suite 12
In Praise of Older Women 12
Jaws 2 12
Piranha 12
The correct answer is given as:
The Bad News Bears Go to Japan 50
The Swarm 37
Grease 28
American Hot Wax 27
The Boys from Brazil 26
Heaven Can Wait 25
Big Wednesday 21
A Wedding 19
A Night Full of Rain 19
Orchestra Rehearsal 19
The Cheap Detective 19
Go Tell the Spartans 18
Superman 17
The Star Wars Holiday Special 17
Death on the Nile 17
The Cat from Outer Space 17
Movie Movie 17
The Driver 17
Blue Collar 16
Ice Castles 16
J.R.R. Tolkien's The Lord of the 16
International Velvet 16
Coming Home 15
The Brink's Job 15
Gray Lady Down 15
Bye Bye Monkey 15
Without Anesthesia 15
Violette Nozière 15
The Water Babies 15
Revenge of the Pink Panther 15
Who'll Stop The Rain 15
Alexandria... Why? 15
Occupation in 26 Pictures 15
David 15
The Chant of Jimmie Blacksmith 15
The Empire of Passion 14
Damien: Omen II 14
Closed Circuit 14
Bread and Chocolate 14
I Wanna Hold Your Hand 14
An Unmarried Woman 13
Almost Summer 13
Goin' South 13
Foul Play 13
The Left-Handed Woman 13
Jaws 2 12
California Suite 12
In Praise of Older Women 12
Force 10 From Navarone 12
Midnight Express 12

There is no problem with SQLZoo; you just need to add title to the ORDER BY clause,
because the requirement is to order by count, then by title. Below is the modified version of your SQL:
SELECT m.title, Count(c.actorid)
FROM casting c JOIN movie m ON
m.id = c.movieid
WHERE m.yr = 1978
GROUP BY m.title
ORDER BY Count(c.actorid) DESC, title
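A minimal sketch of why the extra sort key matters, run in Python's built-in sqlite3 rather than on SQLZoo (the tiny movie/casting rows below are hypothetical, not SQLZoo's data). Sorting only by the count leaves the relative order of tied films up to the engine, so which tied films fall inside a row cutoff can vary; adding title fixes the order. Grouping by m.id as well keeps two different films with the same title apart, as the OP intended.

```python
import sqlite3

# Hypothetical toy data (not SQLZoo's tables): two 1978 films tie on cast size.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT, yr INTEGER);
CREATE TABLE casting (movieid INTEGER, actorid INTEGER);
INSERT INTO movie VALUES (1, 'Piranha', 1978), (2, 'Jaws 2', 1978),
                         (3, 'Grease', 1978), (4, 'Other Film', 1979);
INSERT INTO casting VALUES (1, 10), (1, 11), (2, 12), (2, 13),
                           (3, 14), (3, 15), (3, 16), (4, 17);
""")

# Group by the primary key so duplicate titles stay separate, then use the
# title as a tie-breaker so rows with equal counts come out in a fixed order.
rows = conn.execute("""
SELECT m.title, COUNT(c.actorid) AS actors
FROM movie m
JOIN casting c ON m.id = c.movieid
WHERE m.yr = 1978
GROUP BY m.id, m.title
ORDER BY actors DESC, m.title
""").fetchall()

for title, actors in rows:
    print(title, actors)
```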

As far as I can tell, SQLZoo codes its answer a little differently from the OP, which leaves ties in no particular order. I was stuck on this problem as well, with an answer similar to the OP's.
I did try different combinations of GROUP BY and JOIN (per the comments) before I resorted to custom ordering; maybe I missed the correct combination.
So, to get SQLZoo's answer (the "smiley face"), I had to use CASE title WHEN to impose a custom order on ties:
SELECT title, COUNT(actorid)
FROM movie JOIN casting
ON (movieid=movie.id)
WHERE yr=1978
GROUP BY title
ORDER BY COUNT(actorid) DESC,
CASE title
WHEN 'A Wedding' THEN 1
WHEN 'A Night Full of Rain' THEN 2
WHEN 'Orchestra Rehearsal' THEN 3
WHEN 'The Cheap Detective' THEN 4
WHEN 'The Driver' THEN 1
WHEN 'Movie Movie' THEN 2
WHEN 'Superman' THEN 3
WHEN 'The Star Wars Holiday Special' THEN 4
WHEN 'Death on the Nile' THEN 5
WHEN 'The Cat from Outer Space' THEN 6
WHEN 'Blue Collar' THEN 1
WHEN 'Ice Castles' THEN 2
WHEN 'J.R.R. Tolkien''s The Lord of the Rings' THEN 3
WHEN 'International Velvet' THEN 4
WHEN 'Alexandria... Why?' THEN 1
WHEN 'Occupation in 26 Pictures' THEN 2
WHEN 'The Chant of Jimmie Blacksmith' THEN 3
WHEN 'David' THEN 4
WHEN 'The Brink''s Job' THEN 5
WHEN 'Coming Home' THEN 6
WHEN 'Gray Lady Down' THEN 7
WHEN 'Bye Bye Monkey' THEN 8
WHEN 'Without Anesthesia' THEN 9
WHEN 'Violette Nozière' THEN 10
WHEN 'The Water Babies' THEN 11
WHEN 'Revenge of the Pink Panther' THEN 12
WHEN 'Who''ll Stop The Rain' THEN 13
WHEN 'The Empire of Passion' THEN 1
WHEN 'Damien: Omen II' THEN 2
WHEN 'Closed Circuit' THEN 3
WHEN 'Bread and Chocolate' THEN 4
WHEN 'I Wanna Hold Your Hand' THEN 5
WHEN 'Foul Play' THEN 1
WHEN 'The Left-Handed Woman' THEN 2
WHEN 'An Unmarried Woman' THEN 3
WHEN 'Almost Summer' THEN 4
WHEN 'Goin'' South' THEN 5
WHEN 'Piranha' THEN 1
WHEN 'Jaws 2' THEN 2
WHEN 'California Suite' THEN 3
WHEN 'In Praise of Older Women' THEN 4
WHEN 'Force 10 From Navarone' THEN 5
WHEN 'Midnight Express' THEN 6
WHEN 'The End' THEN 7
END ASC
It is very cumbersome; make sure you understand the OP's answer before you copy and paste the CASE title WHEN to get SQLZoo's "smiley face".
A helpful resource on sorting: tutorialspoint.com. Scroll down to:
"To fetch the rows with own preferred order, the SELECT query would as follows:"
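The custom-order idea can be tried outside SQLZoo; below is a small sketch in Python's sqlite3 with hypothetical rows. It adds an explicit ELSE branch: without one, an unmatched title (for instance a misspelled literal) makes the CASE return NULL, which SQLite and MySQL both sort first in ascending order, so that row lands at the top of its tie group instead of where you intended.

```python
import sqlite3

# Hypothetical rows: three films tied at 17 actors plus one larger cast.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (title TEXT, actors INTEGER)")
conn.executemany(
    "INSERT INTO film VALUES (?, ?)",
    [("Movie Movie", 17), ("Superman", 17), ("The Driver", 17), ("Grease", 28)],
)

# First key: cast size. Second key: a hand-assigned rank per title.
# Any title the CASE does not list (e.g. because of a typo in a literal)
# falls through to ELSE 99 and sorts after the listed titles in its group.
rows = conn.execute("""
SELECT title, actors
FROM film
ORDER BY actors DESC,
         CASE title
             WHEN 'Superman'    THEN 1
             WHEN 'Movie Movie' THEN 2
             ELSE 99
         END
""").fetchall()

print(rows)
```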

I had the same problem with ordering the answer to this exercise, so I tried each column in turn. This works for the site:
ORDER BY count(c.actorid) DESC, budget DESC

It seems that sqlzoo.net is really picky about using an alias for the count of actors in the cast. Here is the query I finally arrived at, and it worked. I used actors as the alias, but the site accepts the answer as correct with cast or any other alias for the count.
SELECT title, count(actorid) AS actors
FROM movie
JOIN casting ON movie.id = movieid
WHERE yr = 1978
GROUP BY title
ORDER BY actors DESC
Please note that sqlzoo.net may not accept the correct answer right away. Try it a few times; you might even need to navigate away and then back to the page to get it working and see the smiley face. The page behaves oddly, to say the least, but I hope this helps others looking for the correct answer. :)

SELECT movie.title, COUNT(*) AS actors
FROM movie
JOIN casting
ON movie.id = casting.movieid
WHERE movie.yr = 1978
GROUP BY title
ORDER BY actors DESC, title

Related

Postgres rank() without duplicates

I'm ranking race data for a series of cycling events. Racers win various amounts of points for their position in races. I want to retain the discrete event scoring, but also rank each racer in the series. For example, consider a subquery that returns this:
License #  Rider Name  Total Points  Race Points  Race ID
123        Joe         25            5            567
123        Joe         25            12           234
123        Joe         25            8            987
456        Ahmed       20            12           567
456        Ahmed       20            8            234
You can see Joe has 25 points, as he won 5, 12, and 8 points in three races. Ahmed has 20 points, as he won 12 and 8 points in two races.
Now for the ranking, what I'd like is:
Place  License #  Rider Name  Total Points  Race Points  Race ID
1      123        Joe         25            5            567
1      123        Joe         25            12           234
1      123        Joe         25            8            987
2      456        Ahmed       20            12           567
2      456        Ahmed       20            8            234
But if I use rank() and order by "Total Points", I get:
Place  License #  Rider Name  Total Points  Race Points  Race ID
1      123        Joe         25            5            567
1      123        Joe         25            12           234
1      123        Joe         25            8            987
4      456        Ahmed       20            12           567
4      456        Ahmed       20            8            234
Which makes sense, since there are three "ties" at 25 points.
dense_rank() solves this problem, but if there are legitimate ties across different racers, I want there to be gaps in the rank (e.g., if Joe and Ahmed both had 25 points, the next racer would be in third place, not second).
I think the easiest way to solve this would be to issue two queries: one with the "duplicate" racers eliminated, and a second one that retains the individual race data, which I need for the points breakdown display.
I can also probably, given enough effort, think of a way to do this in a single query, but I'm wondering if I'm not just missing something really obvious that could accomplish this in a single, relatively simple query.
Any suggestions?
You have to break this into steps to get what you want, but that can be done in a single query with common table expressions:
with riders as ( -- get individual riders
select distinct license, rider, total_points
from racers
), places as ( -- calculate non-dense rankings
select license, rider, rank() over (order by total_points desc) as place
from riders
)
select p.place, r.* -- join rankings into main table
from places p
join racers r on (r.license, r.rider) = (p.license, p.rider);
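The same CTE can be tried locally with Python's sqlite3 (SQLite 3.25 or later is needed for window functions). This is a sketch under assumptions: the table name racers and its column names are made up, since the question never names them, and the row-value join is written as two equality tests. The second query adds a hypothetical rider tied with Joe to show that RANK() over distinct riders leaves a gap, as the question asks.

```python
import sqlite3

# Hypothetical table/column names; the rows are the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE racers (license INTEGER, rider TEXT, total_points INTEGER,
                     race_points INTEGER, race_id INTEGER);
INSERT INTO racers VALUES
  (123, 'Joe',   25,  5, 567),
  (123, 'Joe',   25, 12, 234),
  (123, 'Joe',   25,  8, 987),
  (456, 'Ahmed', 20, 12, 567),
  (456, 'Ahmed', 20,  8, 234);
""")

QUERY = """
WITH riders AS (   -- one row per rider, so race rows cannot create fake ties
    SELECT DISTINCT license, rider, total_points FROM racers
), places AS (     -- RANK() keeps gaps for genuine ties between riders
    SELECT license, rider,
           RANK() OVER (ORDER BY total_points DESC) AS place
    FROM riders
)
SELECT p.place, r.license, r.rider, r.total_points, r.race_points, r.race_id
FROM places p
JOIN racers r ON r.license = p.license AND r.rider = p.rider
ORDER BY p.place, r.license, r.race_id
"""

rows = conn.execute(QUERY).fetchall()
places = [row[0] for row in rows]
print(places)

# Add a hypothetical rider tied with Joe: the next rider should then be 3rd.
conn.execute("INSERT INTO racers VALUES (789, 'Ana', 25, 25, 567)")
tied = {rider: place for place, _license, rider, *_rest in conn.execute(QUERY)}
print(tied)
```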

Snowflake/SQL Duplicate Records based on values within comma-separated list

I have a dataframe of User IDs and Tags, shown below under 'Current Data'.
The Goal:
I want to duplicate records for each value in the tags column. As you can see in the target output, the LA record for user ID 21 is repeated three times, once for each of the three tags in the source TAGS column. Everything is duplicated except the tag column: one record per item in the comma-separated list.
Issue:
I looked at using Snowflake's SPLIT_TO_TABLE function, but it doesn't work for my use case, because the tags are not consistently in any particular order and in some cases the cell is blank.
Current Data:
USER_ID CITY STATUS PPL TAGS
21 LA checked 6 bad ui/ux,dashboards/reporting,pricing
32 SD checked 9 buggy,laggy
21 ATL checked 9
234 MIA checked 5 glitchy, bad ui/ux, horrible
The target:
USER_ID CITY STATUS PPL TAGS
21 LA checked 6 bad ui/ux
21 LA checked 6 dashboards/reporting
21 LA checked 6 pricing
32 SD checked 9 buggy
32 SD checked 9 laggy
21 ATL checked 9
234 MIA checked 5 glitchy
234 MIA checked 5 bad ui/ux
234 MIA checked 5 horrible
SQL:
select table1.value
from table(split_to_table('a.b', '.')) as table1
SPLIT_TO_TABLE works. Below is the query using your sample data:
select USER_ID, CITY, STATUS, PPL, trim(VALUE) as VALUE -- trim the space that follows some commas
from (values
(21,'LA','checked',6,'bad ui/ux,dashboards/reporting,pricing')
,(32,'SD','checked',9,'buggy,laggy')
,(21,'ATL','checked',9,'')
,(234,'MIA','checked',5,'glitchy, bad ui/ux, horrible')
) as tbl (USER_ID,CITY,STATUS,PPL,TAGS)
, lateral split_to_table(tbl.tags,',');
Result:
USER_ID CITY STATUS PPL VALUE
21 LA checked 6 bad ui/ux
21 LA checked 6 dashboards/reporting
21 LA checked 6 pricing
32 SD checked 9 buggy
32 SD checked 9 laggy
21 ATL checked 9
234 MIA checked 5 glitchy
234 MIA checked 5 bad ui/ux
234 MIA checked 5 horrible
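As a language-neutral model of what LATERAL SPLIT_TO_TABLE does here, the Python sketch below repeats each input row once per comma-separated item. It is not Snowflake code; explode_tags is a hypothetical helper, and its strip() call is an added step, since splitting on ',' alone would keep the space that follows each comma in the MIA row.

```python
from typing import Iterable, Iterator, Tuple

Row = Tuple[int, str, str, int, str]


def explode_tags(rows: Iterable[Row], sep: str = ",") -> Iterator[Row]:
    """Yield one copy of each row per item in its TAGS field.

    Every column except TAGS is repeated unchanged. An empty TAGS string
    still yields one row (with an empty tag), because "".split(",") == [""].
    """
    for user_id, city, status, ppl, tags in rows:
        for tag in tags.split(sep):
            # strip() drops the space that follows some commas, e.g. "glitchy, bad ui/ux"
            yield (user_id, city, status, ppl, tag.strip())


current = [
    (21, "LA", "checked", 6, "bad ui/ux,dashboards/reporting,pricing"),
    (32, "SD", "checked", 9, "buggy,laggy"),
    (21, "ATL", "checked", 9, ""),
    (234, "MIA", "checked", 5, "glitchy, bad ui/ux, horrible"),
]

for row in explode_tags(current):
    print(*row)
```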

Adding rows in a table from data that is not in a column

I'm trying to create a table that totals all medals won by the participating countries in the Olympics.
I scraped the data from Wikipedia and have something similar to this:
Year  Country_Name  Host_city    Host_Country   Gold  Silver  Bronze
1932  146           Los Angeles  United States  41    32      30
1932  67            Los Angeles  United States  12    12      12
And so on
I double-checked the data for some years, and it seems very accurate. Country_Name holds an ID because I created a Country_ID table and replaced the names with their IDs:
Country_ID  Country_Name
1986        1
1986        2
So far so good. Now I want to create a new table with every country in a given year and that country's total medals. I managed to do that easily for the countries that participated in an edition; here's an example for the 1896 edition:
INSERT INTO Cumultative_Medals_by_Year(Country_ID, Year, Culmutative_Gold, Culmutative_Silver, Culmutative_Bronze, Total_Medals)
SELECT a.Country_Name, a.Year, SUM(a.Gold) As Cumultative_Gold, SUM(a.Silver) As Cumultative_Silver, SUM(a.Bronze) As Cumultative_Bronze, SUM(a.Gold) + SUM(a.Silver) + SUM(a.Bronze) AS Total_Medals
FROM Country_Medals a
Where a.Year >= 1896 AND Year < 1900
Group By a.Country_Name, a.Year
And I'll have this table:
Country_ID  Year  Cumultative_Gold  Cumultative_Silver  Cumultative_Bronze  Total_Medals
6           1896  2                 0                   0                   5
7           1896  2                 1                   2                   5
35          1896  1                 2                   3                   6
46          1896  5                 4                   2                   11
49          1896  6                 5                   2                   13
51          1896  2                 3                   2                   7
52          1896  10                18                  19                  47
58          1896  2                 1                   3                   6
85          1896  1                 0                   1                   2
131         1896  1                 2                   0                   3
146         1896  11                7                   2                   20
To add the other editions I just have to edit the dates, "Where a.Year >= 1900 AND Year < 1904", for example.
INSERT INTO Cumultative_Medals_by_Year(Country_ID, Year, Culmutative_Gold, Culmutative_Silver, Culmutative_Bronze, Total_Medals)
SELECT a.Country_Name, a.Year, SUM(a.Gold) As Cumultative_Gold, SUM(a.Silver) As Cumultative_Silver, SUM(a.Bronze) As Cumultative_Bronze, SUM(a.Gold) + SUM(a.Silver) + SUM(a.Bronze) AS Total_Medals
FROM Country_Medals a
Where a.Year >= 1900 AND Year < 1904
Group By a.Country_Name, a.Year
And the table will grow.
But I'd also like to add all the other countries for the year 1896, so that I have a full record of all countries. For example, Country 1 has no medals in the 1896 Olympic edition, but I'd like to add it there too, even if the sum becomes NULL (which I'll then update to 0).
Why do I want that? I'd like to make an Animated Bar Chart Race, and with the data I have, some countries drop out of the race. For example, the US didn't participate in the 1980 Olympics, so the US bar briefly disappears from the chart, only to return in 1984 when it participated again. Another example is the Soviet Union: even though it no longer participates, it has the second-most medals of any participant (behind only the US), but because it has no participations after 1988, its bar disappears after that year. Keeping a record of medals for all countries in all editions would prevent that.
I'm pretty sure there are lots of countries that have won medals that were not around in 1896. But if you want a row for every country and every year, generate the rows you want using a cross join, then join in the available information:
select c.Country_Name, y.Year,
SUM(cm.Gold) As Cumulative_Gold,
SUM(cm.Silver) As Cumulative_Silver,
SUM(cm.Bronze) As Cumulative_Bronze,
COALESCE(SUM(cm.Gold), 0) + COALESCE(SUM(cm.Silver), 0) + COALESCE(SUM(cm.Bronze), 0) AS Total_Medals
from (select distinct year from Country_Medals) y cross join
(select distinct country_name from country_medals) c left join
country_medals cm
on cm.year = y.year and
cm.country_name = c.country_name
group By c.Country_Name, y.Year
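A small sketch of the cross-join-then-left-join pattern in Python's sqlite3, assuming lowercase column names and a handful of illustrative rows (not the asker's full data). Every (country, year) pair appears in the output, and pairs with no medal rows get a total of 0 through COALESCE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country_medals (year INTEGER, country_name INTEGER,
                             gold INTEGER, silver INTEGER, bronze INTEGER);
-- Illustrative rows: country 1 won nothing in 1896, country 146 has no 1980 row.
INSERT INTO country_medals VALUES
  (1896, 146, 11, 7, 2),
  (1900,   1,  1, 0, 0),
  (1900, 146,  2, 1, 0),
  (1980,   1,  0, 1, 0);
""")

rows = conn.execute("""
SELECT c.country_name, y.year,
       COALESCE(SUM(cm.gold), 0) + COALESCE(SUM(cm.silver), 0)
         + COALESCE(SUM(cm.bronze), 0) AS total_medals
FROM (SELECT DISTINCT year FROM country_medals) y                -- every edition
CROSS JOIN (SELECT DISTINCT country_name FROM country_medals) c  -- every country
LEFT JOIN country_medals cm                                      -- medals, if any
  ON cm.year = y.year AND cm.country_name = c.country_name
GROUP BY c.country_name, y.year
ORDER BY y.year, c.country_name
""").fetchall()

for row in rows:
    print(row)
```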

How to query DBpedia online using SQL?

DBpedia just released their data as tables, suitable to import into a relational database. How can I query this data online using SQL?
Dataset:
http://wiki.dbpedia.org/DBpediaAsTables
I took the raw data, uploaded it to BigQuery, and made it public. So far I've done it with the 'person' and the 'place' table. Check them at https://bigquery.cloud.google.com/table/fh-bigquery:dbpedia.person.
Now it's easy to find the most popular alma maters, for example:
SELECT COUNT(*), almaMater_label
FROM [fh-bigquery:dbpedia.person]
WHERE almaMater_label != 'NULL'
GROUP BY 2
ORDER BY 1 DESC
It's a little more complicated than that, because some people have more than one alma mater, and because of the particular way DBpedia encodes that. I left the complete query at http://www.reddit.com/r/bigquery/comments/1rjee7/query_wikipedia_in_bigquery_the_dbpedia_dataset/.
By the way, the top alma maters are:
494 Harvard University
320 University of Cambridge
314 University of Michigan
267 Yale University
216 Trinity College Cambridge
You can also do joins between tables.
For example, for each building (from the place table) that has an architect: What year was that architect born? How many buildings with an architect born that year are listed in DBpedia?
SELECT COUNT(*), LEFT(b.birthDate, 4) birthYear
FROM [fh-bigquery:dbpedia.place] a
JOIN EACH [fh-bigquery:dbpedia.person] b
ON a.architect = b.URI
WHERE a.architect != 'NULL'
AND birthDate != 'NULL'
GROUP BY 2
ORDER BY 2
Results:
...
8 1934
13 1935
9 1937
7 1938
17 1939
7 1941
1 1943
15 1944
10 1945
12 1946
7 1947
9 1950
20 1951
1 1952
...
(Google BigQuery has a free monthly query quota of up to 100 GB.)
(DBpedia data from version 3.4 on is licensed under the terms of the Creative Commons Attribution-ShareAlike 3.0 license and the GNU Free Documentation License. http://dbpedia.org/Datasets#h338-24)

Retrieve top 48 unique records from database based on a sorted Field

I have a database table that I need some SQL for (which is defeating me so far!).
Imagine there are 192 athletic clubs that each take part in 12 track meets per season.
So that is 2304 individual performances per season (for example, in the 100 metres).
I would like to find the top 48 (unique) individual performances in the table; these 48 athletes will then take part in the end-of-season World Championships.
So imagine the 2 fastest times are both set by "John Smith", but he can only be entered once in the World Championships. I would then look for the next fastest time not set by "John Smith", and so on until I have 48 unique athletes.
Hope that makes sense.
Thanks in advance if anyone can help.
PS
I did create a nice screenshot that would explain it much better, but as a newish user I cannot post images.
I'll try a copy-and-paste version instead...
ID AthleteName AthleteID Time
1 Josh Lewis 3 11.99
2 Joe Dundee 4 11.31
3 Mark Danes 5 13.44
4 Josh Lewis 3 13.12
5 John Smith 1 11.12
6 John Smith 1 12.18
7 John Smith 1 11.22
8 Adam Bennett 6 11.33
9 Ronny Bower 7 12.88
10 John Smith 1 13.49
11 Adam Bennett 6 12.55
12 Mark Danes 5 12.12
13 Carl Tompkins 2 13.11
14 Joe Dundee 4 11.28
15 Ronny Bower 7 12.14
16 Carl Tompkins 2 11.88
17 Nigel Downs 8 14.14
18 Nigel Downs 8 12.19
Top 4 unique individual performances
1 John Smith 1 11.12
3 Joe Dundee 4 11.28
5 Adam Bennett 6 11.33
6 Carl Tompkins 2 11.88
Basically something like this:
select top 48 *
from (
select athleteId,min(time) as bestTime
from theRaces
where raceId = '123' -- e.g., 123=100 meters
group by athleteId
) x
order by bestTime
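Here is that query run against the question's sample rows with Python's sqlite3, as a sketch: SQLite has no TOP, so LIMIT stands in for it, the raceId filter is dropped because the sample has no race column, and the table layout is assumed from the pasted listing. With LIMIT 4 it returns the same four athletes as the expected "Top 4" list.

```python
import sqlite3

# The sample rows from the question: (ID, AthleteName, AthleteID, Time).
PERFORMANCES = [
    (1, "Josh Lewis", 3, 11.99), (2, "Joe Dundee", 4, 11.31),
    (3, "Mark Danes", 5, 13.44), (4, "Josh Lewis", 3, 13.12),
    (5, "John Smith", 1, 11.12), (6, "John Smith", 1, 12.18),
    (7, "John Smith", 1, 11.22), (8, "Adam Bennett", 6, 11.33),
    (9, "Ronny Bower", 7, 12.88), (10, "John Smith", 1, 13.49),
    (11, "Adam Bennett", 6, 12.55), (12, "Mark Danes", 5, 12.12),
    (13, "Carl Tompkins", 2, 13.11), (14, "Joe Dundee", 4, 11.28),
    (15, "Ronny Bower", 7, 12.14), (16, "Carl Tompkins", 2, 11.88),
    (17, "Nigel Downs", 8, 14.14), (18, "Nigel Downs", 8, 12.19),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE theRaces (ID INTEGER, AthleteName TEXT, AthleteID INTEGER, Time REAL)")
conn.executemany("INSERT INTO theRaces VALUES (?, ?, ?, ?)", PERFORMANCES)

# One row per athlete (their best time), fastest first; LIMIT plays the role of TOP.
top = conn.execute("""
SELECT AthleteID, MIN(Time) AS bestTime
FROM theRaces
GROUP BY AthleteID
ORDER BY bestTime
LIMIT ?
""", (4,)).fetchall()

print(top)
```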
Try this:
select x.ID, x.AthleteName, x.AthleteID, x.Time
from (
select rownum tr_count, v.AthleteID AthleteID, v.AthleteName AthleteName, v.Time Time, v.id id
from
(
-- each athlete's best time; min(id) keeps one row if an athlete set that time twice
select
tr1.AthleteName AthleteName, tr1.Time Time, min(tr1.id) id, tr1.AthleteID AthleteID
from theRaces tr1
where tr1.Time =
(select min(tr2.Time) from theRaces tr2 where tr2.AthleteID = tr1.AthleteID)
group by tr1.AthleteName, tr1.AthleteID, tr1.Time
order by tr1.Time
) v
) x
where x.tr_count <= 48 -- rownum starts at 1, so <= 48 keeps 48 rows
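The Oracle answer's steps (each athlete's best time, min(id) to pick one row when an athlete repeats that time, then the first 48 rows) can also be written with the ROW_NUMBER() window function, which Oracle, SQL Server, PostgreSQL and SQLite all support. A minimal sketch in Python's sqlite3 with hypothetical rows; LIMIT replaces rownum/TOP, and the secondary ORDER BY on ID mirrors the answer's min(id) choice while also returning the full row.

```python
import sqlite3

# Hypothetical rows: athlete 1 sets their best time twice (IDs 1 and 2).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE theRaces (ID INTEGER, AthleteName TEXT, AthleteID INTEGER, Time REAL)")
conn.executemany("INSERT INTO theRaces VALUES (?, ?, ?, ?)", [
    (1, "Athlete A", 1, 11.00), (2, "Athlete A", 1, 11.00),
    (3, "Athlete B", 2, 11.50), (4, "Athlete C", 3, 10.90),
    (5, "Athlete C", 3, 12.00),
])

# ROW_NUMBER() numbers each athlete's runs from fastest to slowest (lowest ID first
# on equal times); keeping rn = 1 leaves one full row per athlete, ID included.
rows = conn.execute("""
SELECT ID, AthleteName, AthleteID, Time
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY AthleteID ORDER BY Time, ID) AS rn
    FROM theRaces t
)
WHERE rn = 1
ORDER BY Time
LIMIT 48
""").fetchall()

print(rows)
```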