How to exclude values from output in SPARQL query for Wikidata? - sparql

I'm trying to write a SPARQL query using Wikidata Query Service to retrieve all prime ministers of the Netherlands from 1970. However, I want to filter the output by checking if a prime minister worked for a university. If a minister worked for a university, it should not be in the output.
I think I must use the FILTER NOT EXISTS expression, but do not know how to properly write this line. Can someone please help me out?
See below for my query and output:
SELECT ?pmLabel ?start ?companyLabel
WHERE
{
?pm wdt:P39 wd:Q3058109.
?pm p:P39 ?posHeld.
?pm wdt:P108 ?company.
?posHeld ps:P39 wd:Q3058109.
?posHeld pq:P580 ?start.
FILTER(year(?start) > 1970)
# FILTER NOT EXISTS(?company (something) "Universit")
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } # labels
}
ORDER BY DESC(?start)
+----------------------+------------------+------------------------------+
| primeMinisterLabel | start | companyLabel |
+----------------------+------------------+------------------------------+
| Mark Rutte | 14 October 2010 | Unilever |
| Mark Rutte | 14 October 2010 | Calvé |
| Jan Peter Balkenende | 22 July 2002 | Erasmus University Rotterdam |
| Jan Peter Balkenende | 22 July 2002 | Vrije Universiteit Amsterdam |
| Ruud Lubbers | 4 November 1982 | United Nations |
| Ruud Lubbers | 4 November 1982 | Harvard University |
| Ruud Lubbers | 4 November 1982 | Tilburg University |
| Ruud Lubbers | 4 November 1982 | Hollandia |
| Dries van Agt | 19 December 1977 | Kyoto University |
| Dries van Agt | 19 December 1977 | Radboud University Nijmegen |
| Dries van Agt | 19 December 1977 | Kwansei Gakuin University |
| Dries van Agt | 19 December 1977 | Ritsumeikan University |
+----------------------+------------------+------------------------------+
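One possible way to write the exclusion, as a minimal sketch rather than a definitive answer: it assumes "worked for a university" can be read as "has an employer (P108) that is an instance of university (Q3918) or of one of its subclasses", instead of matching the text "Universit" in the label. The `?pm wdt:P108 ?company` pattern is dropped from the main body, because it both requires an employer and produces one row per employer. The helper `build_url` is a hypothetical name used only for illustration.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Assumption: a university employer is any P108 value that is an instance of
# university (Q3918) or of a subclass of it.
QUERY = """
SELECT ?pmLabel ?start
WHERE
{
  ?pm p:P39 ?posHeld.
  ?posHeld ps:P39 wd:Q3058109.
  ?posHeld pq:P580 ?start.
  FILTER(YEAR(?start) > 1970)
  # Drop every ?pm for which at least one university employer exists.
  FILTER NOT EXISTS {
    ?pm wdt:P108 ?employer.
    ?employer wdt:P31/wdt:P279* wd:Q3918.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?start)
"""


def build_url(query, endpoint="https://query.wikidata.org/sparql"):
    """Return a GET URL that asks the endpoint for JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


url = build_url(QUERY)
```

The query text can also be pasted directly into the Wikidata Query Service editor; the URL is only needed when calling the endpoint from a script.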

Related

Pandas - Pivot and Rearrange Table With Multiple Labels in Same Header

I have an xlsx file with tabs for multiple years of data. Each tab contains a table with many columns and the table is structured like this:
+-----------+-------+-------------------------+----------------------+
| City | State | Number of Drivers, 2019 | Number of Cars, 2019 |
+-----------+-------+-------------------------+----------------------+
| LA | CA | 123 | 10.0 |
| San Diego | CA | 456 | 2345 |
+-----------+-------+-------------------------+----------------------+
I would like to rearrange the table to look like this, and do it for each tab in the xlsx:
+-----------+-------+------+-------------------+---------------+
| City | State | Year | Measure Name | Measure Value |
+-----------+-------+------+-------------------+---------------+
| LA | CA | 2019 | Number of Drivers | 123 |
| San Diego | CA | 2019 | Number of Drivers | 456 |
| LA | CA | 2019 | Number of Cars | 10 |
| San Diego | CA | 2019 | Number of Cars | 2345 |
+-----------+-------+------+-------------------+---------------+
There are a lot of moving pieces to this, and it has been a little tricky to get the final formatting correct.
We can melt, then join with the result of str.split:
s=df.melt(['City','State'])
s=s.join(s.variable.str.split(',',expand=True))
Out[120]:
City State variable value 0 1
0 LA CA NumberofDrivers,2019 123.0 NumberofDrivers 2019
1 SanDiego CA NumberofDrivers,2019 456.0 NumberofDrivers 2019
2 LA CA NumberofCars,2019 10.0 NumberofCars 2019
3 SanDiego CA NumberofCars,2019 2345.0 NumberofCars 2019
# if you need to change the column names, add .rename(columns={}) at the end
This is how I was able to apply Yoben's solution to every tab in the xlsx file, append them together and write the full table to a .csv:
import pandas as pd

# sheet_name=None loads every tab into a dict of {tab name: DataFrame}
sheets_dict = pd.read_excel(r'file.xlsx', sheet_name=None)
frames = []
for name, sheet in sheets_dict.items():
    sheet['sheet'] = name
    # keep 'sheet' as an id column so the tab name is not melted into the values
    sheet = sheet.melt(['City', 'State', 'sheet'])
    sheet = sheet.join(sheet.variable.str.split(',', expand=True))
    frames.append(sheet)
# DataFrame.append was removed in pandas 2.0; concatenate once instead
full_table = pd.concat(frames, ignore_index=True)
full_table.to_csv('Full Table.csv')
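To get from the melted frame to the exact target layout (City, State, Year, Measure Name, Measure Value), something like the following might work. It is a sketch on the two-row sample from the question; the variable names `tidy` and `parts` are illustrative, and it assumes every measure column follows the "name, year" pattern.

```python
import pandas as pd

# Sample table from the question (one tab).
df = pd.DataFrame({
    "City": ["LA", "San Diego"],
    "State": ["CA", "CA"],
    "Number of Drivers, 2019": [123, 456],
    "Number of Cars, 2019": [10.0, 2345],
})

# Wide -> long: one row per (City, State, original column header).
tidy = df.melt(id_vars=["City", "State"], value_name="Measure Value")

# Split "Number of Drivers, 2019" into its name and year parts.
parts = tidy["variable"].str.split(",", expand=True)
tidy["Measure Name"] = parts[0].str.strip()
tidy["Year"] = parts[1].str.strip().astype(int)

# Reorder to the requested layout and drop the helper column.
tidy = tidy[["City", "State", "Year", "Measure Name", "Measure Value"]]
```

Stripping whitespace after the split matters here because the headers have a space after the comma.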

Get all Wikidata items that have more than 10 languages?

I'm trying to get the most famous movies in the world from Wikidata with SPARQL.
I have the following query:
SELECT ?item WHERE {
?item wdt:P31 wd:Q11424.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
which returns ALL movies (about 214143).
I basically only need movies that have, let's say, more than 10 language entries on wikipedia, as I'm guessing these will be the most famous ones.
Is there a way to do this inside the query itself, without checking all entries ?
A naive answer to your question is:
SELECT ?movie (count(?wikipage) AS ?count) WHERE {
hint:Query hint:optimizer "None" .
?movie wdt:P31 wd:Q11424 .
?wikipage schema:about ?movie .
?wikipage schema:isPartOf/wikibase:wikiGroup "wikipedia"
} GROUP BY ?movie HAVING (?count > 10) ORDER BY DESC(?count)
Alternatively, you could consider total number of sitelinks. Sitelinks include links to Wikipedia and also links to Wikiquote, Wikivoyage etc. The advantage is that total number of sitelinks is precomputed.
SELECT ?movie ?sitelinks WHERE {
?movie wdt:P31 wd:Q11424 .
?movie wikibase:sitelinks ?sitelinks .
FILTER (?sitelinks > 10)
} ORDER BY DESC(?sitelinks)
See also these questions:
Get Wikipedia URLs (sitelinks) in Wikidata SPARQL query
Wikidata results sorted by something similar to a PageRank
As @TallTed and @AKSW have pointed out, the number of labels in different languages may differ from the number of Wikipedia articles in different languages. A comparison follows.
Top 5 movies by Wikipedia articles
| title | articles | sitelinks | labels |
|---------------------|----------|-----------|--------|
| Avatar | 92 | 103 | 99 |
| Titanic | 86 | 100 | 101 |
| The Godfather | 79 | 103 | 82 |
| Slumdog Millionaire | 72 | 75 | 80 |
| Forrest Gump | 71 | 101 | 84 |
Top 5 movies by sitelinks
| title | articles | sitelinks | labels |
|---------------|----------|-----------|--------|
| Avatar | 92 | 103 | 99 |
| The Godfather | 79 | 103 | 82 |
| Forrest Gump | 71 | 101 | 84 |
| Titanic | 86 | 100 | 101 |
| The Matrix | 67 | 94 | 77 |
Top 5 movies by labels
| title | articles | sitelinks | labels |
|------------------------------|----------|-----------|--------|
| The 25th Reich | 2 | 2 | 227 |
| Time Is But Brief | 0 | 0 | 224 |
| Michael Moore in TrumpLand | 6 | 6 | 222 |
| Magnus - The Mozart of Chess | 1 | 1 | 221 |
| Lee Chong Wei | 1 | 1 | 196 |

SQL Group by Client Location

Sample of Data I am trying to manipulate
Order | OrderDate | ClientName| ClientAddress | City | State| Zip |
-------|-----------|-----------|---------------|--------|------|-------|
C0101 | 1/5/2015 | Client ABC| 101 Park Drive| Boston | MA | 02134 |
C0102 | 2/6/2015 | Client ABC| 101 Park Drive| Boston | MA | 02134 |
C0103 | 1/7/2015 | Client ABC| 354 Foo Pkwy | Dallas | TX | 75001 |
C0104 | 3/7/2015 | Client ABC| 354 Foo Pkwy | Dallas | TX | 75001 |
C0105 | 5/7/2015 | Client XYZ| 1 Binary Road | Austin | TX | 73301 |
C0106 | 1/8/2015 | Client XYZ| 1 Binary Road | Austin | TX | 73301 |
C0107 | 7/9/2015 | Client XYZ| 51 Testing Rd | Austin | TX | 73301 |
I have a database setup in MS-SQL Server with all client orders for the past two year period. Some clients only have one location, others have multiple locations. I would like to write a script that will show me the number of orders a customer placed by location over the total number of weeks there was at least one order.
Based on the results of this script, I would like to be able to deduce every customer location's summary of unique orders (placed at various times). For example:
Client ABC has placed 45 orders over 35 total weeks at location A
Client ABC has placed 35 orders over 15 total weeks at location B
Client ABC has placed 15 orders over 15 total weeks at location C
I would like see this information for each unique location for each client. I am not sure how to aggregate the data in such a way. Here is where I am at with my script:
SELECT t1.ClientName, (SELECT DISTINCT t2.ClientAddress), COUNT(DISTINCT t2.Orders) AS TotalOrders,
DATEPART(week, t1.OrderDate) AS Week
FROM database t1
INNER JOIN database t2 on t1.Orders = t2.Orders
GROUP BY DATEPART(week, t1.OrderDate), t1.ClientAddress, t2.ClientAddress
HAVING COUNT(DISTINCT t2.SalesOrder) > 1
ORDER BY TotalOrders DESC
The results show the unique orders by location by week, but I'm not sure how to count the number of weeks in the way that I need. I have tried writing subqueries but keep running into issues. I realize that this script shows the number of orders by location for each individual week; what I would like is the total number of weeks within the time frame in which there is at least one order.
The results are structured as follows:
| ClientName| ClientAddress | TotalOrders | Week |
|-----------|---------------|--------------|------|
|Client ABC |101 Park Drive | 30 | 21 |
|Client ABC |101 Park Drive | 29 | 13 |
|Client ABC |101 Park Drive | 28 | 10 |
|Client XYZ |1 Binary Road | 27 | 19 |
|Client XYZ |1 Binary Road | 25 | 7 |
|Client XYZ |51 Testing Rd | 22 | 9 |
Any and all help would be greatly appreciated; thank you in advance.
Isn't this what you want?
SELECT t1.ClientName, t1.ClientAddress, COUNT(DISTINCT t1.Orders) AS TotalOrders,
COUNT(DISTINCT DATEPART(week, t1.OrderDate)) AS Weeks
FROM database t1
INNER JOIN database t2 ON t1.Orders = t2.Orders
GROUP BY t1.ClientName, t1.ClientAddress
HAVING COUNT(DISTINCT t2.SalesOrder) > 1
ORDER BY TotalOrders DESC
I don't really follow why you're doing a self-join. It seems useless to me, but I left it in, just in case, and to focus only on the change I made to get your result.
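The grouping idea can be tried on the sample rows with a small sketch. This one uses SQLite through Python as a stand-in for SQL Server, so `strftime` replaces `DATEPART`; the table and column names are hypothetical, and the dates are rewritten in ISO form on the assumption that the sample uses month/day/year. It also counts year-week pairs rather than bare week numbers, since over a two-year period the same week number occurs in both years.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id TEXT, order_date TEXT, client_name TEXT, client_address TEXT)"
)
# Sample rows from the question, dates converted to ISO (assumed M/D/YYYY).
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("C0101", "2015-01-05", "Client ABC", "101 Park Drive"),
        ("C0102", "2015-02-06", "Client ABC", "101 Park Drive"),
        ("C0103", "2015-01-07", "Client ABC", "354 Foo Pkwy"),
        ("C0104", "2015-03-07", "Client ABC", "354 Foo Pkwy"),
        ("C0105", "2015-05-07", "Client XYZ", "1 Binary Road"),
        ("C0106", "2015-01-08", "Client XYZ", "1 Binary Road"),
        ("C0107", "2015-07-09", "Client XYZ", "51 Testing Rd"),
    ],
)

# One row per client location: distinct orders and distinct year-weeks.
summary = conn.execute(
    """
    SELECT client_name,
           client_address,
           COUNT(DISTINCT order_id) AS total_orders,
           COUNT(DISTINCT strftime('%Y-%W', order_date)) AS weeks
    FROM orders
    GROUP BY client_name, client_address
    ORDER BY client_name, client_address
    """
).fetchall()

for client, address, total_orders, weeks in summary:
    print(f"{client} has placed {total_orders} orders over {weeks} total weeks at {address}")
```

In SQL Server the same effect would presumably need both the year and the week part of `OrderDate` inside the distinct count; the exact expression is left open here.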

Get all Countries Name from DBpedia using SPARQL

I want a list of country names from DBpedia.
I am using http://dbpedia.org/snorql/ to execute my query, but so far I have not found all the country names that are available in DBpedia.
For example: dbr:United_Kingdom, dbr:India, dbr:United_States, etc.
Is the problem that
You don't know how to write SPARQL in general? (That's OK, it's hard to get started.)
You don't know about the classes and predicates in DBpedia? (That's OK too. I have to check each time.)
Something else?
This gets all UN member nations. Getting "all countries" is probably just a matter of finding the right class in their ontology.
select distinct ?s
where { ?s a <http://dbpedia.org/class/yago/WikicatMemberStatesOfTheUnitedNations> }
I happen to know that New York City is the largest city in the United States, and that DBpedia has a largestCity predicate and a New_York_City instance. So I wrote a query that should only get the United States as the subject, and then asked for all connected predicates and objects. You should look in that for an object that meets your expectations for defining "all countries." If you don't find one, you may have to union a few other triple patterns into one query.
I have also restricted the objects to those that contain either of two terms that may be relevant for you: "country" or "nation"
select distinct *
where { ?s a <http://dbpedia.org/class/yago/WikicatMemberStatesOfTheUnitedNations> ;
dbo:largestCity dbr:New_York_City ;
?p ?o
filter(isURI(?o))
filter((regex(lcase(str(?o)), "country")) || (regex(lcase(str(?o)), "nation")))
}
This gives the following, which should help you write a follow-up question that isn't specific to the United States.
+--------------------------------------------+---------------------------------------------------+------------------------------------------------------------------------------+
| s | p | o |
+--------------------------------------------+---------------------------------------------------+------------------------------------------------------------------------------+
| http://dbpedia.org/resource/United_States | http://dbpedia.org/ontology/wikiPageExternalLink | http://www.ifs.du.edu/ifs/frm_CountryProfile.aspx?Country=US |
| http://dbpedia.org/resource/United_States | http://dbpedia.org/ontology/wikiPageExternalLink | http://nationalatlas.gov/ |
| http://dbpedia.org/resource/United_States | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://dbpedia.org/class/yago/Country108544813 |
| http://dbpedia.org/resource/United_States | http://purl.org/dc/terms/subject | http://dbpedia.org/resource/Category:G7_nations |
| http://dbpedia.org/resource/United_States | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://schema.org/Country |
| http://dbpedia.org/resource/United_States | http://www.w3.org/2000/01/rdf-schema#seeAlso | http://dbpedia.org/resource/Anti-miscegenation_laws |
| http://dbpedia.org/resource/United_States | http://purl.org/dc/terms/subject | http://dbpedia.org/resource/Category:Member_states_of_the_United_Nations |
| http://dbpedia.org/resource/United_States | http://www.w3.org/2002/07/owl#sameAs | http://transparency.270a.info/classification/country/US |
| http://dbpedia.org/resource/United_States | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://dbpedia.org/ontology/Country |
| http://dbpedia.org/resource/United_States | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://dbpedia.org/class/yago/WikicatMemberStatesOfTheUnitedNations |
| http://dbpedia.org/resource/United_States | http://purl.org/dc/terms/subject | http://dbpedia.org/resource/Category:G8_nations |
| http://dbpedia.org/resource/United_States | http://www.w3.org/2002/07/owl#sameAs | http://linked-web-apis.fit.cvut.cz/resource/united_states_of_america_country |
| http://dbpedia.org/resource/United_States | http://dbpedia.org/ontology/wikiPageExternalLink | http://www.nationalcenter.org/HistoricalDocuments.html |
| http://dbpedia.org/resource/United_States | http://dbpedia.org/ontology/wikiPageExternalLink | http://news.bbc.co.uk/2/hi/americas/country_profiles/1217752.stm |
| http://dbpedia.org/resource/United_States | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://umbel.org/umbel/rc/Country |
| http://dbpedia.org/resource/United_States | http://purl.org/dc/terms/subject | http://dbpedia.org/resource/Category:G20_nations |
+--------------------------------------------+---------------------------------------------------+------------------------------------------------------------------------------+
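One way to act on the result table above is to pull out the rdf:type rows, since those name classes that other countries could share, and then reuse the first query's shape with one of them. The sketch below does that on a subset of the rows; `members_query` is a hypothetical helper, and whether `dbo:Country` matches the asker's idea of "all countries" is an assumption that still needs checking against the data.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# (predicate, object) pairs copied from the result table (subset).
rows = [
    ("http://dbpedia.org/ontology/wikiPageExternalLink", "http://nationalatlas.gov/"),
    (RDF_TYPE, "http://dbpedia.org/class/yago/Country108544813"),
    ("http://purl.org/dc/terms/subject", "http://dbpedia.org/resource/Category:G7_nations"),
    (RDF_TYPE, "http://schema.org/Country"),
    (RDF_TYPE, "http://dbpedia.org/ontology/Country"),
    (RDF_TYPE, "http://dbpedia.org/class/yago/WikicatMemberStatesOfTheUnitedNations"),
    (RDF_TYPE, "http://umbel.org/umbel/rc/Country"),
]

# Only rdf:type rows point at a class; links and categories are skipped.
candidate_classes = [obj for pred, obj in rows if pred == RDF_TYPE]


def members_query(class_uri):
    """Build a query listing every resource typed with class_uri."""
    return f"select distinct ?s\nwhere {{ ?s a <{class_uri}> }}"


query = members_query("http://dbpedia.org/ontology/Country")
print(query)
```

Running the generated query for each candidate class and comparing the counts would show which class comes closest to the intended list.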

Flattening a One to Many Relationship with SQL Query

I have a very simple data set that I would like to be able to query and get the results as a single record.
Members Table
ID | FirstName | LastName | HeroName
42 | Bruce | Wayne | Batman
1337 | Bruce | Banner | Hulk
1033 | Clark | Kent | Newspaper Boy
Skills Table
ID | Skill
42 | Martial Arts
42 | Engineering
42 | Intimidation
1337 | Anger Management
1337 | Thermo Nuclear Dynamics
1033 | NULL
I want the result to be
ID | FirstName | LastName | HeroName | Skill1 | Skill2 | Skill3 | ... | Skilln
42 | Bruce | Wayne | Batman | Martial Arts | Engineering | Intimidation
The query I have so far is
SELECT m.ID, m.FirstName, m.LastName, m.HeroName, s.Skill
FROM Members m
JOIN Skills s
ON m.ID = s.ID
WHERE m.ID = 42 and s.Skill IS NOT NULL
which returns
ID | FirstName | LastName | HeroName | Skill
42 | Bruce | Wayne | Batman | Martial Arts
42 | Bruce | Wayne | Batman | Engineering
42 | Bruce | Wayne | Batman | Intimidation
Short of iterating over the results and only extracting the fields I want is there a way to return this as a single record? I've seen topics on PIVOT, and XmlPath but from what I've read neither of these does quite what I want it to. I'd like an arbitrary number of Skills to be returned and no nulls are returned.
EDIT:
The problem with PIVOT is that it will turn one of the rows into a column header. If there is a way to fill in a generic column header, then it might work.
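A common way to get generic headers such as Skill1, Skill2, Skill3 is to number each member's skills and then aggregate conditionally on that number. The sketch below tries this on the sample data using SQLite through Python (window functions need SQLite 3.25 or later); it is an illustration under assumptions, not a complete answer. The skills are numbered alphabetically because the sample table has no column that records their order, and the number of Skill columns is fixed at three, so a truly arbitrary count would still need dynamically generated SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Members (ID INTEGER, FirstName TEXT, LastName TEXT, HeroName TEXT);
    CREATE TABLE Skills (ID INTEGER, Skill TEXT);
    INSERT INTO Members VALUES
        (42, 'Bruce', 'Wayne', 'Batman'),
        (1337, 'Bruce', 'Banner', 'Hulk'),
        (1033, 'Clark', 'Kent', 'Newspaper Boy');
    INSERT INTO Skills VALUES
        (42, 'Martial Arts'), (42, 'Engineering'), (42, 'Intimidation'),
        (1337, 'Anger Management'), (1337, 'Thermo Nuclear Dynamics'),
        (1033, NULL);
    """
)

query = """
WITH numbered AS (
    -- Give each member's non-null skills a position 1, 2, 3, ...
    SELECT ID, Skill,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Skill) AS rn
    FROM Skills
    WHERE Skill IS NOT NULL
)
SELECT m.ID, m.FirstName, m.LastName, m.HeroName,
       -- Each CASE keeps only the skill at one position; MAX collapses the group.
       MAX(CASE WHEN n.rn = 1 THEN n.Skill END) AS Skill1,
       MAX(CASE WHEN n.rn = 2 THEN n.Skill END) AS Skill2,
       MAX(CASE WHEN n.rn = 3 THEN n.Skill END) AS Skill3
FROM Members m
LEFT JOIN numbered n ON n.ID = m.ID
GROUP BY m.ID, m.FirstName, m.LastName, m.HeroName
ORDER BY m.ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Members with fewer than three skills get NULL in the unused columns, which is the trade-off of a fixed column list.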