How to query DBpedia online using SQL? - sql

DBpedia just released their data as tables, suitable to import into a relational database. How can I query this data online using SQL?
Dataset:
http://wiki.dbpedia.org/DBpediaAsTables

I took the raw data, uploaded it to BigQuery, and made it public. So far I've done this for the 'person' and 'place' tables. Check them at https://bigquery.cloud.google.com/table/fh-bigquery:dbpedia.person.
Now it's easy to find the most popular alma maters, for example:
SELECT COUNT(*), almaMater_label
FROM [fh-bigquery:dbpedia.person]
WHERE almaMater_label != 'NULL'
GROUP BY 2
ORDER BY 1 DESC
It's a little more complicated than that, because some people have more than one alma mater and DBpedia encodes multiple values in its own particular way. I left the complete query at http://www.reddit.com/r/bigquery/comments/1rjee7/query_wikipedia_in_bigquery_the_dbpedia_dataset/.
Btw, the top alma maters are:
494 Harvard University
320 University of Cambridge
314 University of Michigan
267 Yale University
216 Trinity College Cambridge
You can also do joins between tables.
For example, for each building (from the place table) that has an architect: What year was that architect born? How many buildings with an architect born that year are listed in DBpedia?
SELECT COUNT(*), LEFT(b.birthDate, 4) birthYear
FROM [fh-bigquery:dbpedia.place] a
JOIN EACH [fh-bigquery:dbpedia.person] b
ON a.architect = b.URI
WHERE a.architect != 'NULL'
AND birthDate != 'NULL'
GROUP BY 2
ORDER BY 2
Results:
...
8 1934
13 1935
9 1937
7 1938
17 1939
7 1941
1 1943
15 1944
10 1945
12 1946
7 1947
9 1950
20 1951
1 1952
...
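To experiment with the shape of this join without a BigQuery account, here is a minimal sketch using Python's built-in sqlite3 module. The rows are made up for illustration (they are not DBpedia data), and SQLite's substr stands in for LEFT; only the table and column names follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (URI TEXT, birthDate TEXT);
    CREATE TABLE place  (URI TEXT, architect TEXT);
    -- Made-up rows; as in the DBpedia tables, missing values are the string 'NULL'
    INSERT INTO person VALUES
        ('p/a', '1934-05-01'), ('p/b', '1934-11-30'), ('p/c', 'NULL');
    INSERT INTO place VALUES
        ('b/1', 'p/a'), ('b/2', 'p/a'), ('b/3', 'p/b'),
        ('b/4', 'p/c'), ('b/5', 'NULL');
""")

# Count buildings per architect birth year, skipping 'NULL' placeholders
rows = conn.execute("""
    SELECT COUNT(*) AS buildings, substr(b.birthDate, 1, 4) AS birthYear
    FROM place a
    JOIN person b ON a.architect = b.URI
    WHERE a.architect != 'NULL' AND b.birthDate != 'NULL'
    GROUP BY birthYear
    ORDER BY birthYear
""").fetchall()
print(rows)
```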
(Google BigQuery has a free monthly query quota of up to 100 GB.)
(DBpedia data from version 3.4 on is licensed under the terms of the Creative Commons Attribution-ShareAlike 3.0 license and the GNU Free Documentation License. http://dbpedia.org/Datasets#h338-24)

Related

Postgres rank() without duplicates

I'm ranking race data for series of cycling events. Racers win various amounts of points for their position in races. I want to retain the discrete event scoring, but also rank the racer in the series. For example, considering a sub-query that returns this:
License #  Rider Name  Total Points  Race Points  Race ID
123        Joe         25            5            567
123        Joe         25            12           234
123        Joe         25            8            987
456        Ahmed       20            12           567
456        Ahmed       20            8            234
You can see Joe has 25 points, as he won 5, 12, and 8 points in three races. Ahmed has 20 points, as he won 12 and 8 points in two races.
Now for the ranking, what I'd like is:
Place  License #  Rider Name  Total Points  Race Points  Race ID
1      123        Joe         25            5            567
1      123        Joe         25            12           234
1      123        Joe         25            8            987
2      456        Ahmed       20            12           567
2      456        Ahmed       20            8            234
But if I use rank() and order by "Total Points", I get:
Place  License #  Rider Name  Total Points  Race Points  Race ID
1      123        Joe         25            5            567
1      123        Joe         25            12           234
1      123        Joe         25            8            987
4      456        Ahmed       20            12           567
4      456        Ahmed       20            8            234
Which makes sense, since there are three "ties" at 25 points.
dense_rank() solves this problem, but if there are legitimate ties across different racers, I want there to be gaps in the rank (e.g., if Joe and Ahmed both had 25 points, the next racer would be in third place, not second).
I think the easiest way to solve this would be to issue two queries: one with the "duplicate" racers eliminated, and a second one that retains the individual race data, which I need for the points breakdown display.
I can also probably, given enough effort, think of a way to do this in a single query, but I'm wondering if I'm not just missing something really obvious that could accomplish this in a single, relatively simple query.
Any suggestions?
You have to break this into steps to get what you want, but that can be done in a single query with common table expressions:
with riders as ( -- get individual riders
select distinct license, rider, total_points
from racists
), places as ( -- calculate non-dense rankings
select license, rider, rank() over (order by total_points desc) as place
from riders
)
select p.place, r.* -- join rankings into main table
from places p
join racists r on (r.license, r.rider) = (p.license, p.rider);
db<>fiddle here
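The CTE approach can also be tried locally. The following is a sketch using Python's sqlite3 module, assuming a SQLite build with window functions (3.25 or later). The table name results is a stand-in for the sub-query in the question, and the third rider, Kim, is made up to show the gap after a genuine tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE results (license INT, rider TEXT, total_points INT,
                          race_points INT, race_id INT);
    INSERT INTO results VALUES
        (123, 'Joe',   25,  5, 567),
        (123, 'Joe',   25, 12, 234),
        (123, 'Joe',   25,  8, 987),
        (456, 'Ahmed', 20, 12, 567),
        (456, 'Ahmed', 20,  8, 234),
        (789, 'Kim',   25, 25, 567);  -- hypothetical rider tied with Joe
""")

ranked = conn.execute("""
    WITH riders AS (            -- one row per rider
        SELECT DISTINCT license, rider, total_points FROM results
    ), places AS (              -- rank riders, leaving gaps after ties
        SELECT license, RANK() OVER (ORDER BY total_points DESC) AS place
        FROM riders
    )
    SELECT p.place, r.license, r.rider, r.total_points,
           r.race_points, r.race_id
    FROM places p
    JOIN results r ON r.license = p.license
    ORDER BY p.place, r.license, r.race_id
""").fetchall()

for row in ranked:
    print(row)
```

Joe and Kim share first place, and Ahmed lands in third rather than second, which is the gap behaviour the question asks for.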

Correlating a Three Column query with no duplicated results

I'm trying to achieve a result where only one result for each TEAM and each PLACE is returned.
The twist is that the highest result from each place should have priority.
My table currently looks something like this:
ENTRY_ID TEAM_ID DATE PLACE SCORE
1 1 2021-10-12 Ireland 64
2 2 2021-10-12 Ireland 31
3 3 2021-10-12 France 137
4 2 2021-10-12 France 61
5 5 2021-10-12 France 38
6 1 2021-10-12 France 66
7 2 2021-10-12 Italy 17
8 3 2021-10-12 Italy 61
9 1 2021-10-12 Italy 74
The competition is held at three different places at the same time, with technically all teams being able to have people playing in all of them at the same time.
Each team however can only win one point so, in the example, it's possible to see that Team 1 would win both in Italy and Ireland, but it should be awarded only one point for the highest score, so only Italy. The point in Ireland should go to the second place.
I've tried over 30 queries I've found in several correlated questions, but none of them seems to be applicable to my situation.
Basically:
"Return the highest score in each PLACE, but only call each TEAM once.
If that TEAM was already called, ignore it and take the second place."
That way I could retrieve all three winners with no further processing. The results I'm trying to achieve should repeat neither the TEAM_ID nor the PLACE; in this particular example the output should be:
3 FRANCE (Since it has the highest score in France at 137)
1 ITALY (For the highest score in Italy at 74)
2 IRELAND (For the second-highest score in Ireland, since Team 1 already won in Italy)
The production model of this table has far more entries so it's unlikely there would be any clashes with too many second-places.
How can I achieve that?
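One way to read the rule above is as a greedy pass: walk the entries from highest to lowest score and award a place to a team only if neither has been used yet. The sketch below does this in plain Python on the sample rows; pick_winners is a hypothetical helper name, and this is only one interpretation of the requirement, not a confirmed solution. It would normally run on rows fetched from the table.

```python
# (entry_id, team_id, place, score) rows copied from the sample table
entries = [
    (1, 1, "Ireland", 64),
    (2, 2, "Ireland", 31),
    (3, 3, "France", 137),
    (4, 2, "France", 61),
    (5, 5, "France", 38),
    (6, 1, "France", 66),
    (7, 2, "Italy", 17),
    (8, 3, "Italy", 61),
    (9, 1, "Italy", 74),
]


def pick_winners(rows):
    """Greedily assign one winning team per place, highest scores first."""
    winners = {}        # place -> team_id
    used_teams = set()  # teams that already hold a point
    for _entry_id, team, place, _score in sorted(
        rows, key=lambda r: r[3], reverse=True
    ):
        if place in winners or team in used_teams:
            continue  # place already decided, or team already awarded
        winners[place] = team
        used_teams.add(team)
    return winners


print(pick_winners(entries))
```

On the sample data this awards France to team 3, Italy to team 1, and Ireland to team 2, matching the expected output in the question.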

Loop through a table based on multiple conditions

Students table
student_id student_name
1 John
2 Mary
Grades table
student_id year grade_level school Course Mark
1 2015 10 Smith High Algebra 95
1 2015 10 Smith High English 96
1 2016 11 Smith High Geometry 85
1 2016 11 Smith High Science 88
2 2015 10 Smith High Algebra 98
2 2015 10 Smith High English 93
2 2016 11 Smith High Geometry 97
2 2016 11 Smith High Science 86
I'm trying to show, for each year, which classes a student took and the mark received.
So the final output I'm looking for is something like:
[student_id1] [year1] [grade1] [school1]
[course1] [mark1]
[course2] [mark2]
[course3] [mark3]...
[student_id1] [year2] [grade2] [school1]
[course1] [mark1]
[course2] [mark2]
[course3] [mark3]...
[student_id2] [year1] [grade1] [school1]
[course1] [mark1]
[course2] [mark2]
[course3] [mark3]...
This would all go in one column/row. So in this particular example, this would be my result:
1 2015 10 Smith High
Algebra 95
English 96
1 2016 11 Smith High
Geometry 85
Science 88
2 2015 10 Smith High
Algebra 98
English 93
2 2016 11 Smith High
Geometry 97
Science 86
So anytime a student id, year, grade, or school name changes, I would have a line for that and loop through the classes taken within that group. And all of this would be in one column/row.
This is what I have so far but I'm not sure how I can properly loop through course and grades for each group. I'd appreciate it if I can be pointed in the right direction.
select s.student_id + '' + g.year + '' + g.grade_level + '' + g.school
from students s
join grades g on s.student_id = g.student_id
If you want to do it in your SQL environment, the approach depends on the database management system you are using.
For example, if you are using Transact-SQL you can try to look at this link.
Generally, these kinds of loops and iterations are done in the programming language that is coupled with the SQL database.
Anyway, you should look at stored procedures and cursors if you really want to do this in SQL.
You are trying to mix presentation with retrieval of data from database tables. Looping through the result set in SQL can be achieved with a cursor, but that isn't advised. You are better off pulling the required data with two queries and then printing it using a language of your choice.
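As a rough illustration of keeping presentation out of SQL, here is a sketch in Python with sqlite3. It is a variant of the suggestion above: instead of two queries it uses a single ordered query and groups the rows in application code with itertools.groupby. Table and column names follow the question; the choice of SQLite is an assumption made only so the example is self-contained.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INT, student_name TEXT);
    CREATE TABLE grades (student_id INT, year INT, grade_level INT,
                         school TEXT, course TEXT, mark INT);
    INSERT INTO students VALUES (1, 'John'), (2, 'Mary');
    INSERT INTO grades VALUES
        (1, 2015, 10, 'Smith High', 'Algebra', 95),
        (1, 2015, 10, 'Smith High', 'English', 96),
        (1, 2016, 11, 'Smith High', 'Geometry', 85),
        (1, 2016, 11, 'Smith High', 'Science', 88),
        (2, 2015, 10, 'Smith High', 'Algebra', 98),
        (2, 2015, 10, 'Smith High', 'English', 93),
        (2, 2016, 11, 'Smith High', 'Geometry', 97),
        (2, 2016, 11, 'Smith High', 'Science', 86);
""")

# Retrieval: one flat, ordered result set
rows = conn.execute("""
    SELECT s.student_id, g.year, g.grade_level, g.school, g.course, g.mark
    FROM students s
    JOIN grades g ON g.student_id = s.student_id
    ORDER BY s.student_id, g.year, g.grade_level, g.school, g.course
""").fetchall()

# Presentation: a header line per (student, year, grade, school) group,
# followed by one line per course in that group
lines = []
for header, group in groupby(rows, key=lambda r: r[:4]):
    lines.append(" ".join(str(value) for value in header))
    for row in group:
        lines.append(f"{row[4]} {row[5]}")

print("\n".join(lines))
```

Because the rows arrive sorted by the grouping columns, groupby starts a new header exactly when the student, year, grade level, or school changes.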

SQLZoo "More Join Operations" #15

Edit:
SQLZoo More Join Operations problem 15 has changed since I asked this question. It now states: "List the films released in the year 1978 ordered by the number of actors in the cast, then by title."
I give thanks to all who tried to help with the original phrasing. I've updated the accepted answer to match the current problem.
Original Question:
I'm trying to solve problem number 15 under SQLZoo More Join Operations (I'm brushing up for an interview tomorrow)
The question is: "List the 1978 films by order of cast list size. "
My answer is:
SELECT movie.title, count(casting.actorid)
FROM movie INNER JOIN casting
ON movie.id=casting.movieid
WHERE movie.yr=1978
GROUP BY movie.id
ORDER BY count(casting.actorid) desc
This is essentially identical to the answer given by Gideon Dsouza except that my solution does not assume titles are unique:
SELECT m.title, Count(c.actorid)
FROM casting c JOIN movie m ON
m.id = c.movieid
WHERE m.yr = 1978
GROUP BY m.title
ORDER BY Count(c.actorid) DESC
Neither my solution nor his is marked correct.
The results from my solution and the "correct" solution are given at the end. My list has two movies ("Piranha" and "The End") that the "correct" solution lacks. And the "correct" solution has two movies ("Force 10 From Navarone" and "Midnight Express") that mine lacks.
Since these movies are all in the smallest cast size, I hypothesized that SQLZoo is cutting off the query at 50 rows and it was an ordering irregularity that causes the difference. However, I tried adding ,fieldname to the end of my order by clause for all values of fieldname but none yielded an identical answer.
Am I doing something wrong or is SQLZoo broken?
Result Listings
My solution yields (after using libreoffice to make a fixed width column):
The Bad News Bears Go to Japan 50
The Swarm 37
Grease 28
American Hot Wax 27
The Boys from Brazil 26
Heaven Can Wait 25
Big Wednesday 21
Orchestra Rehearsal 19
A Night Full of Rain 19
A Wedding 19
The Cheap Detective 19
Go Tell the Spartans 18
Superman 17
Movie Movie 17
The Driver 17
The Cat from Outer Space 17
Death on the Nile 17
The Star Wars Holiday Special 17
Blue Collar 16
J.R.R. Tolkien's The Lord of the 16
Ice Castles 16
International Velvet 16
Coming Home 15
Revenge of the Pink Panther 15
The Brink's Job 15
David 15
The Chant of Jimmie Blacksmith 15
The Water Babies 15
Violette Nozière 15
Occupation in 26 Pictures 15
Without Anesthesia 15
Bye Bye Monkey 15
Alexandria... Why? 15
Who'll Stop The Rain 15
Gray Lady Down 15
Damien: Omen II 14
The Empire of Passion 14
Bread and Chocolate 14
I Wanna Hold Your Hand 14
Closed Circuit 14
Almost Summer 13
Goin' South 13
An Unmarried Woman 13
The Left-Handed Woman 13
Foul Play 13
The End 12
California Suite 12
In Praise of Older Women 12
Jaws 2 12
Piranha 12
The correct answer is given as:
The Bad News Bears Go to Japan 50
The Swarm 37
Grease 28
American Hot Wax 27
The Boys from Brazil 26
Heaven Can Wait 25
Big Wednesday 21
A Wedding 19
A Night Full of Rain 19
Orchestra Rehearsal 19
The Cheap Detective 19
Go Tell the Spartans 18
Superman 17
The Star Wars Holiday Special 17
Death on the Nile 17
The Cat from Outer Space 17
Movie Movie 17
The Driver 17
Blue Collar 16
Ice Castles 16
J.R.R. Tolkien's The Lord of the 16
International Velvet 16
Coming Home 15
The Brink's Job 15
Gray Lady Down 15
Bye Bye Monkey 15
Without Anesthesia 15
Violette Nozière 15
The Water Babies 15
Revenge of the Pink Panther 15
Who'll Stop The Rain 15
Alexandria... Why? 15
Occupation in 26 Pictures 15
David 15
The Chant of Jimmie Blacksmith 15
The Empire of Passion 14
Damien: Omen II 14
Closed Circuit 14
Bread and Chocolate 14
I Wanna Hold Your Hand 14
An Unmarried Woman 13
Almost Summer 13
Goin' South 13
Foul Play 13
The Left-Handed Woman 13
Jaws 2 12
California Suite 12
In Praise of Older Women 12
Force 10 From Navarone 12
Midnight Express 12
SQLZoo isn't broken; you just need to add title to the ORDER BY clause, because the requirement is to order by count and then by title. Here is the modified version of your SQL:
SELECT m.title, Count(c.actorid)
FROM casting c JOIN movie m ON
m.id = c.movieid
WHERE m.yr = 1978
GROUP BY m.title
ORDER BY Count(c.actorid) DESC, title
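To see why the secondary sort key matters, here is a small sketch in Python with sqlite3. The movie and casting rows are a toy data set with invented cast sizes, not the SQLZoo data; the point is only that adding the title to ORDER BY makes the order of tied rows deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (id INT, title TEXT, yr INT);
    CREATE TABLE casting (movieid INT, actorid INT);
    -- Toy rows with invented cast sizes
    INSERT INTO movie VALUES
        (1, 'Superman', 1978), (2, 'Grease', 1978),
        (3, 'The Driver', 1978), (4, 'Alien', 1979);
    INSERT INTO casting VALUES
        (1, 10), (1, 11),
        (2, 10), (2, 12), (2, 13),
        (3, 14), (3, 15),
        (4, 16);
""")

# Sort by cast size first, then by title to break ties predictably
rows = conn.execute("""
    SELECT m.title, COUNT(c.actorid) AS actors
    FROM movie m
    JOIN casting c ON m.id = c.movieid
    WHERE m.yr = 1978
    GROUP BY m.id, m.title
    ORDER BY actors DESC, m.title
""").fetchall()
print(rows)
```

Without the second sort key, the two films with two cast members could come back in either order, which is the kind of difference a row-by-row answer checker would flag.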
As far as I can tell, SQLZoo codes its answer a little differently from the OP's, which leaves ties in an unspecified order. I was stuck on this problem as well, with an answer similar to the OP's.
I tried different combinations of GROUP BY and JOIN (per the comments) before resorting to custom ordering; maybe I missed the correct combination.
So, to get SQLZoo's answer (the "smiley face"), I had to use CASE title WHEN to impose a custom order on ties:
SELECT title, COUNT(actorid)
FROM movie JOIN casting
ON (movieid=movie.id)
WHERE yr=1978
GROUP BY title
ORDER BY COUNT(actorid) DESC,
CASE title
WHEN 'A Wedding' THEN 1
WHEN 'A Night Full of Rain' THEN 2
WHEN 'Orchestra Rehearsal' THEN 3
WHEN 'The Cheap Detective' THEN 4
WHEN 'The Driver' THEN 1
WHEN 'Movie Movie' THEN 2
WHEN 'Superman' THEN 3
WHEN 'The Star Wars Holiday Special' THEN 4
WHEN 'Death on the Nile' THEN 5
WHEN 'The Cat from Outer Space' THEN 6
WHEN 'Blue Collar' THEN 1
WHEN 'Ice Castles' THEN 2
WHEN "J.R.R Tolkien's The Lord of the Rings" THEN 3
WHEN 'International Velvet' THEN 4
WHEN 'Alexandria... Why?' THEN 1
WHEN 'Occupation in 26 Pictures' THEN 2
WHEN 'The Chant of Jimmie Blacksmith' THEN 3
WHEN 'David' THEN 4
WHEN "The Brink's Job" THEN 5
WHEN 'Coming Home' THEN 6
WHEN 'Gray Lady Down' THEN 7
WHEN 'Bye Bye Monkey' THEN 8
WHEN 'Without Anesthesia' THEN 9
WHEN 'Violette Nozière' THEN 10
WHEN 'The Water Babies' THEN 11
WHEN 'Revenge of the Pink Panther' THEN 12
WHEN "Who'll Stop The Rain" THEN 13
WHEN 'The Empire of Passion' THEN 1
WHEN 'Damien: Omen II' THEN 2
WHEN 'Closed Circuit' THEN 3
WHEN 'Bread and Chocolate' THEN 4
WHEN 'I Wanna Hold Your Hand' THEN 5
WHEN 'Foul Play' THEN 1
WHEN 'The Left-Handed Woman' THEN 2
WHEN 'An Unmarried Woman' THEN 3
WHEN 'Almost Summer' THEN 4
WHEN "Goin' South" THEN 5
WHEN 'Piranha' THEN 1
WHEN 'Jaws 2' THEN 2
WHEN 'California Suite' THEN 3
WHEN 'In Praise of Older Women' THEN 4
WHEN 'Force 10 From Navarone' THEN 5
WHEN 'Midnight Express' THEN 6
WHEN 'The End' THEN 7
END ASC
It is super cumbersome. You should understand the OP's answer before you copy and paste the CASE title WHEN to get SQLZoo's "smiley face".
A helpful SORT resource: tutorialspoint.com -- scroll down to:
To fetch the rows with own preferred order, the SELECT query would be as follows:
I had the same ordering problem with this exercise, so I tried each column in turn. This works for the site:
ORDER BY count(c.actorid) DESC, budget DESC
It seems that sqlzoo.net is being really cerebral about using an alias for the count of actors on the casting crew. Here is the query I've finally arrived at, and it's worked. I've used actors as alias, but it will accept the answer as correct with cast or any other alias for the count.
SELECT title, count(actorid) AS actors
FROM movie
JOIN casting ON movie.id = movieid
WHERE yr = 1978
GROUP BY title
ORDER BY actors DESC
Please note that sqlzoo.net may or may not accept the correct answer right away. Try it a few times; you might even need to navigate away and then back to that page to get it working and see the smiley face. The page behaves weirdly to say the least, but I hope it will help others looking for the correct answer. :)
SELECT movie.title, COUNT(*) AS actors
FROM movie
JOIN casting
ON movie.id = casting.movieid
WHERE movie.yr = 1978
GROUP BY title
ORDER BY actors DESC, title

Query to run through each instance. MS-Access

I have an MS-Access database with two tables that I would like to query; the basic table schema is shown below. I am looking to pull out the details of the earliest parish church in each parish, and where no church has ‘parish’ in its name, I would like to pull out the earliest church.
SITEDETAIL:
Site Reference No. | Civil Parish | Site Name | NGR East | NGR North
1 | Assynt | Old Parish Church | 6137 | 3172
2 | Assynt | St. Marys | 6097 | 3870
3 | Assynt | New Parish Church | 6249 | 3490
4 | Bower | Grimbister | 2095 | 4067
5 | Bower | St. Andrews | 2304 | 3194
6 | Halkirk | Firth Parish Church | 7136 | 3450
7 | Holm | Strath Parish Church | 4586 | 2045
8 | Holm | St Nicholas Parish | 4132 | 3146
SITEDATES:
Site Reference No. | Date
1 | 1812
2 | 1300
3 | 1900
4 | 1760
5 | 1750
6 | 1838
7 | 1619
8 | 1774
I have written a query that pulls out all the instances of ‘parish’:
SELECT SITEDETAIL.SITEREFNO, SITEDETAIL.CIVPARBUR_CDE, SITEDETAIL.SITENAME, SITEDETAIL.NGRE, SITEDETAIL.NGRN, SITEDATES.DATE
FROM SITEDETAIL INNER JOIN SITEDATES ON SITEDETAIL.SITEREFNO = SITEDATES.SITEREFNO
WHERE (((SITEDETAIL.SITENAME) Like "*par*"));
However, this does not take into account the instances of multiple/no churches with ‘par*’ in the name.
Is it possible to create an SQL query that runs through each civil parish and selects the earliest ‘parish’ church or, failing that, the earliest church? Or is it necessary to write a Perl script to run through them, and is that possible using DBI?
Desired output:
Site Reference No. | Civil Parish | Site Name | NGR East | NGR North | Date
1 | Assynt | Old Parish Church | 6137 | 3172 | 1812
5 | Bower | St. Andrews | 2304 | 3194 | 1750
6 | Halkirk | Firth Parish Church | 7136 | 3450 | 1838
7 | Holm | Strath Parish Church | 4586 | 2045 | 1619
NB: In the case of Assynt, 'Old Parish Church' is selected even though St. Marys is older, because it has 'parish' in the name.
The following query should get you what you need. It's a little long, but it does the trick:
select LIST.Civil_Parish, SD.Site_name, LIST.MSite_date
from
(
    select Civil_Parish, min(Site_date) as MSite_date
    from SiteDetail
    where Boolean = 1
    group by Civil_Parish
    union
    select Civil_Parish, min(Site_date) as MSite_date
    from SiteDetail
    where Civil_Parish not in
        (select Civil_Parish
         from SiteDetail
         where Boolean = 1)
    group by Civil_Parish
) as LIST
left join SiteDetail SD on LIST.Civil_Parish = SD.Civil_Parish and LIST.MSite_date = SD.Site_date
Please note the following:
1) I am using PowerUser's boolean suggestion. If the Boolean column has value 1, then the row is a Parish Church, and 0 if it is not.
2) I combined the tables "SiteDates" and "SiteDetails" for the purpose of this example, as they are 1 to 1.
The core of the query is A) finding the oldest Parish church in each Parish, then B) finding the oldest church in each Parish that has no Parish church.
The code for A) is as follows:
select Civil_Parish, min(Site_date) as MSite_date
from SiteDetail
where Boolean = 1
group by Civil_Parish
We then union that with the oldest churches in parishes that do not have a parish church:
select Civil_Parish, min(Site_date) as MSite_date
from SiteDetail
where Civil_Parish not in
    (select Civil_Parish
     from SiteDetail
     where Boolean = 1)
group by Civil_Parish
We then join the union query (named "LIST" here) with our original "SITEDETAIL" table on Parish and Date to bring in the church name.
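The union-then-join approach can be checked against the sample data from the question. The sketch below uses Python's sqlite3 rather than MS-Access, so syntax details may differ; the is_parish column is a stand-in for the Boolean flag described above and is derived here from the site name, and the combined SiteDetail table mirrors the answer's simplification.

```python
import sqlite3

# (site ref, civil parish, site name, date) from the question's two tables
sites = [
    (1, "Assynt", "Old Parish Church", 1812),
    (2, "Assynt", "St. Marys", 1300),
    (3, "Assynt", "New Parish Church", 1900),
    (4, "Bower", "Grimbister", 1760),
    (5, "Bower", "St. Andrews", 1750),
    (6, "Halkirk", "Firth Parish Church", 1838),
    (7, "Holm", "Strath Parish Church", 1619),
    (8, "Holm", "St Nicholas Parish", 1774),
]

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE SiteDetail (Site_ref INT, Civil_Parish TEXT,
                             Site_name TEXT, Site_date INT, is_parish INT)
""")
# is_parish is an assumed flag: 1 when the name contains 'parish'
conn.executemany(
    "INSERT INTO SiteDetail VALUES (?, ?, ?, ?, ?)",
    [(ref, parish, name, year, int("parish" in name.lower()))
     for ref, parish, name, year in sites],
)

result = conn.execute("""
    SELECT sd.Site_ref, earliest.Civil_Parish, sd.Site_name, earliest.MSite_date
    FROM (
        -- A) earliest parish church per parish
        SELECT Civil_Parish, MIN(Site_date) AS MSite_date
        FROM SiteDetail
        WHERE is_parish = 1
        GROUP BY Civil_Parish
        UNION
        -- B) earliest church in parishes with no parish church
        SELECT Civil_Parish, MIN(Site_date) AS MSite_date
        FROM SiteDetail
        WHERE Civil_Parish NOT IN
            (SELECT Civil_Parish FROM SiteDetail WHERE is_parish = 1)
        GROUP BY Civil_Parish
    ) AS earliest
    LEFT JOIN SiteDetail sd
        ON earliest.Civil_Parish = sd.Civil_Parish
       AND earliest.MSite_date = sd.Site_date
    ORDER BY sd.Site_ref
""").fetchall()

for row in result:
    print(row)
```

One caveat with joining back on parish and date: if two churches in the same parish share the selected date, both rows are returned.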