Retrieving Multiple Property Rows + Labels from Wikidata - sparql

I'm trying to get a list of all the landmark buildings in NYC / Manhattan. The problem is that when a building has multiple owners, my query returns the name of only one of them, and I need them all. For example, for the Chrysler Building I only see "Signa Holding", when I should get "Signa Holding", "Tishman Speyer", and "Abu Dhabi Investment Council".
Another question: I'm trying to get the building's architectural height (wdt:P2048/wd:Q24192182). How do I add that as an optional value?
My query:
SELECT DISTINCT ?skyscraperLabel ?skyscraperDescription ?inception ?link ?coord ?lat ?lon ?postalCode ?ownedBy ?ownedByLabel ?floorsAboveGround ?floorsBelowGround ?geonamesID
WHERE
{
  ?skyscraper wdt:P1435* wd:Q19825927. # NYC Landmark
  ?skyscraper wdt:P131* wd:Q11299. # Located in Manhattan
  OPTIONAL {?skyscraper wdt:P571 ?inception}
  OPTIONAL {?skyscraper wdt:P856 ?link.} # official website
  OPTIONAL {?skyscraper wdt:P625 ?coord .} # geographic coord
  OPTIONAL {
    ?skyscraper p:P625 ?statement.
    ?statement psv:P625 ?node.
    ?node wikibase:geoLatitude ?lat.
    ?node wikibase:geoLongitude ?lon.
  }
  OPTIONAL {?skyscraper wdt:P281 ?postalCode.} # Postal Code
  OPTIONAL {
    ?skyscraper wdt:P127 ?ownedBy.
    ?ownedBy rdfs:label ?ownedByLabel filter (lang(?ownedByLabel) = "en").
  } # Owner
  OPTIONAL {?skyscraper wdt:P1101 ?floorsAboveGround.} # Floors above ground
  OPTIONAL {?skyscraper wdt:P1139 ?floorsBelowGround.} # Floors below ground
  OPTIONAL {?skyscraper wdt:P1566 ?geonamesID.} # GeoNamesID
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
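A likely cause of the missing owners: wdt: ("truthy") triples only expose the best-ranked statements. If one P127 statement is marked preferred (for example the current owner), the normal-rank owner statements are not returned. The following is a hedged sketch of a fix, not a tested answer. It walks every P127 statement through p:P127/ps:P127, skipping deprecated ones, and folds the owners into one row per building with GROUP_CONCAT. For the height, it assumes the architectural height is a P2048 statement carrying some qualifier whose value is wd:Q24192182. Since the question does not say which qualifier property is used, the pattern accepts any predicate on the statement node with that value. The column list is trimmed to the relevant variables. The other OPTIONAL lines from the question can be added back, wrapped in SAMPLE() because the query groups by building.

```sparql
SELECT ?skyscraper ?skyscraperLabel
       (GROUP_CONCAT(DISTINCT ?ownerLabel; SEPARATOR = ", ") AS ?owners)
       (SAMPLE(?height) AS ?architecturalHeight)
WHERE
{
  ?skyscraper wdt:P1435 wd:Q19825927 .   # heritage designation: NYC Landmark
  ?skyscraper wdt:P131* wd:Q11299 .      # located in Manhattan
  OPTIONAL {
    # p:/ps: reaches every owner statement, not only the best-ranked ones wdt: returns
    ?skyscraper p:P127 ?ownerStmt .
    ?ownerStmt ps:P127 ?owner ;
               wikibase:rank ?ownerRank .
    FILTER(?ownerRank != wikibase:DeprecatedRank)
    # explicit English label, since automatic labels do not feed reliably into aggregates
    ?owner rdfs:label ?ownerLabel .
    FILTER(LANG(?ownerLabel) = "en")
  }
  OPTIONAL {
    ?skyscraper p:P2048 ?heightStmt .
    ?heightStmt ps:P2048 ?height .
    # assumption: some qualifier on this statement has the value wd:Q24192182
    ?heightStmt ?anyQualifier wd:Q24192182 .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?skyscraper ?skyscraperLabel
```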


Return cities in Wikidata SPARQL Query, similar to a Wikipedia page

I'm not sure what I'm doing wrong. I get a nice list, but the cities are duplicated, and I'm unsure how they are classified as cities. I would expect London to appear in the results, and the results to be similar to this Wikipedia page, but they are quite different.
I want to get a list of cities with their first-level administrative country subdivision (province/state/region), similar to this Wikipedia page, while avoiding duplicate cities.
SELECT ?city ?cityLabel ?country ?population ?countryLabel ?region ?regionLabel ?lat ?long
WHERE
{
  ?city wdt:P31/wdt:P279 wd:Q515 . # find instances of subclasses of city
  ?city (wdt:P131) ?region.
  ?region wdt:P31/wdt:P279 wd:Q10864048 .
  ?city wdt:P1082 ?population .
  ?city wdt:P17 ?country . # Also find the country of the city
  ?city p:P625 ?statement . # coordinate-location statement
  ?statement psv:P625 ?coordinate_node .
  OPTIONAL { ?coordinate_node wikibase:geoLatitude ?lat. }
  OPTIONAL { ?coordinate_node wikibase:geoLongitude ?long.}
  FILTER (?population > 100000) .
  # choose language
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
LIMIT 8000
Update:
This is not an answer to this specific question, but anyone trying to get similar data should have a look here.
Update 2:
With help in the comments from @UninformedUser, the query is now:
SELECT DISTINCT ?city ?cityLabel ?country ?population ?countryLabel ?region ?regionLabel ?lat ?long
WHERE
{
  ?city wdt:P31/wdt:P279 wd:Q515 . # find instances of subclasses of city
  ?city (wdt:P131) ?region.
  ?region wdt:P31/wdt:P279 wd:Q10864048 .
  ?city p:P1082 ?populationStmt .
  ?populationStmt ps:P1082 ?population ; pq:P585 ?pop_date .
  ?city wdt:P17 ?country . # Also find the country of the city
  ?city p:P625 ?statement . # coordinate-location statement
  ?statement psv:P625 ?coordinate_node .
  OPTIONAL { ?coordinate_node wikibase:geoLatitude ?lat. }
  OPTIONAL { ?coordinate_node wikibase:geoLongitude ?long.}
  FILTER NOT EXISTS {
    ?city p:P1082/pq:P585 ?pop_date_ .
    FILTER (?pop_date_ > ?pop_date)
  }
  FILTER (?population > 100000) .
  # choose language
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
LIMIT 8000
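Two details in the query may explain both symptoms. First, the path wdt:P31/wdt:P279 wd:Q515 requires exactly one "subclass of" hop. An item whose "instance of" is wd:Q515 itself, or a class two or more levels below it, is therefore not matched, while wdt:P31/wdt:P279* allows zero or more hops. The same applies to the region pattern. Second, a city with several P131 regions, several countries, or several coordinate statements yields one row per combination, and DISTINCT cannot merge rows that differ. The following is a hedged sketch, not a tested answer. It widens both class paths and groups by city, folding regions and countries into one column each through explicit English labels. Coordinates are made optional here, which is a change from the question. With the wider paths the query is heavier and may hit the query service's time limit.

```sparql
SELECT ?city ?cityLabel
       (GROUP_CONCAT(DISTINCT ?countryName; SEPARATOR = "; ") AS ?countries)
       (GROUP_CONCAT(DISTINCT ?regionName; SEPARATOR = "; ") AS ?regions)
       (SAMPLE(?population) AS ?pop)
       (SAMPLE(?lat) AS ?latitude)
       (SAMPLE(?long) AS ?longitude)
WHERE
{
  ?city wdt:P31/wdt:P279* wd:Q515 .          # instance of city or of any subclass of it
  ?city wdt:P131 ?region .
  ?region wdt:P31/wdt:P279* wd:Q10864048 .   # first-level administrative division, any depth
  ?city p:P1082 ?populationStmt .
  ?populationStmt ps:P1082 ?population ; pq:P585 ?pop_date .
  FILTER NOT EXISTS {                        # keep only the most recent population figure
    ?city p:P1082/pq:P585 ?pop_date_ .
    FILTER (?pop_date_ > ?pop_date)
  }
  FILTER (?population > 100000)
  ?city wdt:P17 ?country .
  OPTIONAL { ?country rdfs:label ?countryName . FILTER(LANG(?countryName) = "en") }
  OPTIONAL { ?region rdfs:label ?regionName . FILTER(LANG(?regionName) = "en") }
  OPTIONAL {
    ?city p:P625/psv:P625 ?coordinate_node .
    ?coordinate_node wikibase:geoLatitude ?lat ;
                     wikibase:geoLongitude ?long .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?city ?cityLabel
LIMIT 8000
```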

Identify entity of a Wikipedia page

My question is related to a similar question/comment which unfortunately never received an answer.
Given a list of multiple Wikipedia pages, e.g.:
https://en.wikipedia.org/wiki/Donald_Trump
https://en.wikipedia.org/wiki/The_Matrix
https://en.wikipedia.org/wiki/Tiger
...
how can I find out what type of entity each article refers to? Ideally I want something higher-level, e.g. person, movie, animal, etc.
My best guess so far was to query Wikidata with SPARQL and walk up the "instance of" (P31) or "subclass of" (P279) tree. However, this did not lead to meaningful results.
SELECT ?lemma ?item ?itemLabel ?itemDescription ?instance ?instanceLabel ?subclassLabel WHERE {
  VALUES ?lemma {
    "Donald Trump"@en
    "The Matrix"@en
    "Tiger"@en
  }
  ?sitelink schema:about ?item;
            schema:isPartOf <https://en.wikipedia.org/>;
            schema:name ?lemma.
  ?item wdt:P31* ?instance.
  ?item wdt:P279* ?subclass.
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en,da,sv".
  }
}
The result can be seen here: https://w.wiki/ZmQ
Another option would be to look at ?itemDescription, but I'm afraid it is too granular to build meaningful groups from larger lists and count frequencies later on.
Does anyone have a hint or idea on how to get more general entity categories? Maybe from the MediaWiki API?
Any input would be highly appreciated!
Here are three possibilities, side-by-side:
SELECT ?lemma ?item (GROUP_CONCAT(DISTINCT ?instanceLabel; SEPARATOR = " ") AS ?a) (GROUP_CONCAT(DISTINCT ?subclassLabel; SEPARATOR = " ") AS ?b) (GROUP_CONCAT(DISTINCT ?isaLabel; SEPARATOR = " ") AS ?c) WHERE {
  VALUES ?lemma {
    "Donald Trump"@en
    "The Matrix"@en
    "Tiger"@en
  }
  ?sitelink schema:about ?item;
            schema:isPartOf <https://en.wikipedia.org/>;
            schema:name ?lemma.
  OPTIONAL { ?item (wdt:P31/(wdt:P279*)) ?instance. }
  OPTIONAL { ?item wdt:P279 ?subclass. }
  OPTIONAL { ?item wdt:P31 ?isa. }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en,da,sv".
    ?instance rdfs:label ?instanceLabel.
    ?subclass rdfs:label ?subclassLabel.
    ?isa rdfs:label ?isaLabel.
  }
  # Here, you could add FILTER(?instanceLabel IN ("mammal"@en, "movie"@en, "musical"@en, ...)), listing as many labels as needed
}
GROUP BY ?lemma ?item
If you're looking at labels such as "film" and "mammal", i.e. a couple dozen at most, you could list them explicitly in order of preference, then use the first one that occurs.
Note that you may be running into this bug: https://www.wikidata.org/wiki/Wikidata:SPARQL_tutorial#wikibase:Label_and_aggregations_bug
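The preference-list idea can be sketched as follows, under stated assumptions. The three classes and their ranks are illustrative placeholders: wd:Q5 (human), wd:Q11424 (film) and wd:Q16521 (taxon, which taxon items such as animal species are typically instances of). Replace them with your own list of a couple dozen. The inner query finds, per article, the lowest rank among the listed classes that the item reaches via wdt:P31/wdt:P279*. The outer VALUES maps that rank back to its class. Articles whose item reaches none of the listed classes drop out. Labels are requested only outside the aggregating subquery, which is intended to sidestep the label/aggregation issue linked above.

```sparql
SELECT ?lemma ?item ?class ?classLabel WHERE {
  {
    # Inner query: best (lowest) preference rank reached by each article's item
    SELECT ?lemma ?item (MIN(?rank) AS ?bestRank) WHERE {
      VALUES ?lemma { "Donald Trump"@en "The Matrix"@en "Tiger"@en }
      # Illustrative preference list (assumed): human, film, taxon
      VALUES (?class ?rank) { (wd:Q5 1) (wd:Q11424 2) (wd:Q16521 3) }
      ?sitelink schema:about ?item ;
                schema:isPartOf <https://en.wikipedia.org/> ;
                schema:name ?lemma .
      ?item wdt:P31/wdt:P279* ?class .
    }
    GROUP BY ?lemma ?item
  }
  # Map the winning rank back to its class (same list as in the inner query)
  VALUES (?class ?bestRank) { (wd:Q5 1) (wd:Q11424 2) (wd:Q16521 3) }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```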

Making items optional in Wikidata SPARQL query

Please forgive me if I am using the wrong terminology to describe my problem.
I want to extract information on the world's island regions via a Wikidata SPARQL query, including coordinates, the country they belong to, the archipelago they belong to, and their GeoNames IDs. Of course, this information is not provided for every island, so if I include it in my query, I limit my results to items that already have these properties:
SELECT ?item ?itemLabel ?coords ?GeoNamesID
WHERE {
  ?item wdt:P31 wd:Q23442.
  ?item wdt:P625 ?coords.
  ?item wdt:P1566 ?GeoNamesID.
  ?item wdt:P17 ?country.
  ?item wdt:P706 ?terrain.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
How can I make some of these properties "optional", so that their values are displayed if they exist but items that lack them are still included?
I could not find any similar issue in the long list of Wikidata SPARQL examples and would appreciate your help.
Here is my updated query including several optional properties:
SELECT ?item ?itemLabel ?coords ?GeoNamesID ?country ?continent ?terrain ?date ?named ?archipelago
WHERE {
  ?item wdt:P31 wd:Q23442.
  ?item wdt:P625 ?coords.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  OPTIONAL {?item wdt:P1566 ?GeoNamesID.}
  OPTIONAL {?item wdt:P17 ?country.}
  OPTIONAL {?item wdt:P30 ?continent.}
  OPTIONAL {?item wdt:P706 ?terrain.}
  OPTIONAL {?item wdt:P571 ?date.}
  OPTIONAL {?item wdt:P138 ?named.}
  OPTIONAL {?item wdt:P361 ?archipelago.}
}
Note that I got a time-out the first time I ran the query with all the optional properties and had to retry, whereas with a single optional property it worked on the first try:
SELECT ?item ?itemLabel ?coords ?GeoNamesID
WHERE {
  ?item wdt:P31 wd:Q23442.
  ?item wdt:P625 ?coords.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  OPTIONAL {?item wdt:P1566 ?GeoNamesID.}
}
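Each OPTIONAL that matches several values multiplies the rows for that island. For example, two countries and three "part of" values give six rows. This is one plausible contributor to the time-out with many OPTIONALs. The following is a hedged sketch, not a tested fix. It groups by island and folds multi-valued properties into single columns with GROUP_CONCAT. It fetches English labels explicitly with rdfs:label (English-only is an assumption) because the automatic label service does not reliably feed labels into aggregates. Over the full set of islands it may still be slow.

```sparql
SELECT ?item ?itemLabel
       (SAMPLE(?coords) AS ?coord)
       (SAMPLE(?GeoNamesID) AS ?geoNamesID)
       (GROUP_CONCAT(DISTINCT ?countryName; SEPARATOR = "; ") AS ?countries)
       (GROUP_CONCAT(DISTINCT ?partOfName; SEPARATOR = "; ") AS ?partOf)
WHERE {
  ?item wdt:P31 wd:Q23442 ;           # instance of island
        wdt:P625 ?coords .            # coordinates (still required)
  OPTIONAL { ?item wdt:P1566 ?GeoNamesID . }
  OPTIONAL {
    ?item wdt:P17 ?country .          # country
    ?country rdfs:label ?countryName .
    FILTER(LANG(?countryName) = "en")
  }
  OPTIONAL {
    ?item wdt:P361 ?archipelago .     # part of (e.g. an archipelago)
    ?archipelago rdfs:label ?partOfName .
    FILTER(LANG(?partOfName) = "en")
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?item ?itemLabel
```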

Wikidata SPARQL - get company entities and the location of their headquarters

I'm having trouble extracting location attributes of company HQs.
My query finds all companies (including instances of subclasses of company) and returns some basic properties such as ISIN and URL, plus the headquarters location.
I have tried to use this example to extend the headquarters part of the query to return location information such as city, country, and latitude/longitude coordinates. However, I'm stuck on getting the values or labels through.
Thank you
SELECT
  ?item ?itemLabel ?web ?isin ?hq ?hqloc ?inception
# valueLabel is only useful for properties with item-datatype
WHERE
{
  ?item p:P31/ps:P31/wdt:P279* wd:Q783794.
  OPTIONAL{?item wdt:P856 ?web.} # get item
  OPTIONAL{?item wdt:P946 ?isin.} # get item
  OPTIONAL{?item wdt:P571 ?inception.} # get item
  OPTIONAL{?item wdt:P159 ?hq.}
  OPTIONAL{?item p:P159 ?hqItem. # get property
    ?hqItem ps:P159 wd:Q515. # get property-statement wikidata-entity
    ?hqItem pq:P17 ?hqloc. # get country of city
  }
  ?article schema:about ?item .
  ?article schema:inLanguage "en" .
  ?article schema:isPartOf <https://en.wikipedia.org/>.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
A simpler query that selects some of the values you mentioned:
SELECT
  ?company ?companyLabel ?isin ?web ?country ?countryLabel ?inception
WHERE
{
  ?article schema:inLanguage "en" .
  ?article schema:isPartOf <https://en.wikipedia.org/>.
  ?article schema:about ?company .
  ?company p:P31/ps:P31/wdt:P279* wd:Q783794.
  ?company wdt:P946 ?isin.
  OPTIONAL {?company wdt:P856 ?web.}
  OPTIONAL {?company wdt:P571 ?inception.}
  OPTIONAL {?company wdt:P17 ?country.}
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} LIMIT 10
What I changed:
renamed some variables to be more explicit (e.g. ?item -> ?company)
used P17 (country) to select the country directly
removed the OPTIONAL on ISIN to show that some values do exist. You probably saw no ISIN values because many company items on Wikidata seem to lack that information.
From here, selecting the other values should be easy.
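For the headquarters details asked about in the question, one possible extension is the following sketch. It assumes the headquarters location (P159) points to an item such as a city. P17 on that item then gives its country, and the item's coordinate value node gives latitude and longitude. The inner OPTIONALs are nested so that a company still appears when its headquarters item lacks a country or coordinates. A company with several headquarters statements will produce several rows.

```sparql
SELECT ?company ?companyLabel ?hq ?hqLabel ?hqCountry ?hqCountryLabel ?lat ?lon
WHERE
{
  ?article schema:about ?company ;
           schema:isPartOf <https://en.wikipedia.org/> .
  ?company p:P31/ps:P31/wdt:P279* wd:Q783794 .
  OPTIONAL {
    ?company wdt:P159 ?hq .                  # headquarters location (an item, e.g. a city)
    OPTIONAL { ?hq wdt:P17 ?hqCountry . }    # country of that location
    OPTIONAL {
      ?hq p:P625/psv:P625 ?coordNode .       # full coordinate value node
      ?coordNode wikibase:geoLatitude ?lat ;
                 wikibase:geoLongitude ?lon .
    }
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
```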

SPARQL query for finding films originating from and released in the United States

I have the following SPARQL query, which appears to correctly return the films produced in the US (country of origin) and released in the US (place of publication) in 2018. The issue is that it produces one row for each release, even when the other releases are outside the US. I've added a LIMIT to reduce the size of the response.
Here is the query:
SELECT ?item ?name ?publication_date ?placeLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?item rdfs:label ?name;
        wdt:P31 wd:Q11424;
        wdt:P495 wd:Q30; # -> country of origin US
        wdt:P577 ?publication_date.
  ?item p:P577 ?publication_statement.
  ?publication_statement pq:P291 ?place.
  FILTER(xsd:date(?publication_date) > "2018-01-01"^^xsd:date)
  FILTER(
    (LANG(?name)) = "en"
    && ?place=wd:Q30) # -> place of publication
}
ORDER BY ?name
LIMIT 10
I would like to change it so that it produces one row per film, provided the film had a release in the US in 2018.
Thanks for your help. Comments on the use of FILTER or other non-idiomatic SPARQL are also welcome.
You can use GROUP BY:
SELECT ?item (SAMPLE(?name) as ?Name) (SAMPLE(?publication_date) as ?Date) WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?item rdfs:label ?name;
        wdt:P31 wd:Q11424;
        wdt:P495 wd:Q30; # -> country of origin US
        wdt:P577 ?publication_date.
  ?item p:P577 ?publication_statement.
  ?publication_statement pq:P291 ?place.
  FILTER(xsd:date(?publication_date) > "2018-01-01"^^xsd:date)
  FILTER(
    (LANG(?name)) = "en"
    && ?place=wd:Q30) # -> place of publication
}
GROUP BY ?item
ORDER BY ?Name
LIMIT 10
You also need to fix the SELECT line: with GROUP BY, you can't project non-grouped variables without explicitly aggregating them (here with SAMPLE). See the similar question.
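Regarding idiomatic SPARQL: in the question's query, ?publication_date comes from wdt:P577 independently of the statement that carries the US qualifier. The date in a row can therefore belong to a non-US release, and the > 2018-01-01 filter also admits later years while excluding 1 January 2018. The following is a hedged alternative, not the answerer's query. It takes the date and the place from the same P577 statement and writes the constant wd:Q30 directly into the triple pattern instead of a FILTER. It uses YEAR() to restrict to 2018 and groups by film so each film appears once with its earliest US release in 2018.

```sparql
SELECT ?item ?itemLabel (MIN(?usDate) AS ?firstUSRelease2018) WHERE {
  ?item wdt:P31 wd:Q11424 ;      # instance of film
        wdt:P495 wd:Q30 ;        # country of origin: United States
        p:P577 ?pubStmt .        # each publication-date statement
  ?pubStmt ps:P577 ?usDate ;     # the date of that particular release
           pq:P291 wd:Q30 .      # qualified with place of publication = United States
  FILTER(YEAR(?usDate) = 2018)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?item ?itemLabel
ORDER BY ?itemLabel
LIMIT 10
```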