Returning all images on a Wikipedia page

I am trying to write a SPARQL query that will return all possible image URLs associated with a resource.
I can return the foaf:depiction, if there is one, but often when I visit the corresponding page on Wikipedia I see other pictures that I cannot 'get at'. For example, video game articles often have cover and box art (for some games, not all), but I don't know how to get those image URLs returned by my queries.
An example showing exactly how to return, say, a box cover and cartridge picture for a game like Super Mario Bros. would answer this question perfectly.

As far as I know, DBpedia only extracts the first image from each Wikipedia article, and it's not possible to get at the other images through DBpedia.
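Given that limitation, the most you can ask DBpedia for is the single extracted image. The sketch below builds such a query; it assumes the public endpoint at `https://dbpedia.org/sparql` and uses `foaf:depiction` plus `dbo:thumbnail` (a resized copy of the same image). The `format` query parameter is a Virtuoso convention; sending an `Accept: application/sparql-results+json` header is an alternative. Resource names containing characters that are not valid in an IRI would need escaping first.

```python
from urllib.parse import urlencode

# Assumed public DBpedia SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def image_query(resource: str) -> str:
    """Build a SPARQL query for the image properties DBpedia exposes for one resource.

    `resource` is the DBpedia resource name, e.g. "Super_Mario_Bros.".
    Both properties derive from the same single extracted image.
    """
    return f"""
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo:  <http://dbpedia.org/ontology/>
SELECT ?depiction ?thumbnail WHERE {{
  OPTIONAL {{ <http://dbpedia.org/resource/{resource}> foaf:depiction ?depiction }}
  OPTIONAL {{ <http://dbpedia.org/resource/{resource}> dbo:thumbnail ?thumbnail }}
}}
""".strip()


def request_url(query: str) -> str:
    """GET URL asking the endpoint for the standard SPARQL JSON results format."""
    return DBPEDIA_ENDPOINT + "?" + urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )


print(request_url(image_query("Super_Mario_Bros.")))
```

Whatever you fetch with this URL, expect at most one image per article; the box art and cartridge pictures further down the Wikipedia page are not in DBpedia.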

I agree with Cygri; however, there are a few extra notes:
1) DBpedia extracts the first image from each Wikipedia article unless that image has a "non-free" licence template.
2) You can also use Flickr images (not from Wikipedia) from the Flickr dataset [1] via the http://dbpedia.org/property/hasPhotoCollection property.
[1] http://wiki.dbpedia.org/Downloads37#linkstoflickrwrappr
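A sketch of point 2: ask for both `foaf:depiction` and `dbpprop:hasPhotoCollection` in one query, then collect every bound URI from the response. The parser follows the standard SPARQL 1.1 JSON results layout (`head.vars`, `results.bindings`); the sample response and its URLs are hypothetical placeholders, not real DBpedia output.

```python
import json

# Query for the Wikipedia-derived image and the Flickr photo collection link.
PHOTO_QUERY = """
PREFIX foaf:    <http://xmlns.com/foaf/0.1/>
PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT ?depiction ?photos WHERE {
  OPTIONAL { <http://dbpedia.org/resource/Super_Mario_Bros.> foaf:depiction ?depiction }
  OPTIONAL { <http://dbpedia.org/resource/Super_Mario_Bros.> dbpprop:hasPhotoCollection ?photos }
}
""".strip()


def image_urls(results_json: str) -> list[str]:
    """Collect every bound URI from a SPARQL 1.1 JSON results document."""
    doc = json.loads(results_json)
    urls = []
    for binding in doc["results"]["bindings"]:
        for var in doc["head"]["vars"]:
            cell = binding.get(var)  # unbound OPTIONAL variables are simply absent
            if cell is not None and cell["type"] == "uri":
                urls.append(cell["value"])
    return urls


# Hypothetical response; the URLs are placeholders.
SAMPLE = json.dumps({
    "head": {"vars": ["depiction", "photos"]},
    "results": {"bindings": [{
        "depiction": {"type": "uri", "value": "http://example.org/Super_Mario_Bros_box.png"},
        "photos": {"type": "uri", "value": "http://example.org/flickrwrappr/photos/Super_Mario_Bros."},
    }]},
})

print(image_urls(SAMPLE))
```

The photo collection link points to a page of Flickr photos rather than to an image file, so it needs a further request to resolve into individual image URLs.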

Related

Why is some information from the Wikipedia infobox missing on DBpedia?

For example, the star Alpha Librae has a distance-from-Earth property in its infobox, but it isn't a property of the Alpha Librae DBpedia resource. On the other hand, the DBpedia resource for Betelgeuse does have this piece of information. Many other stars also have this distance information in their infobox without any matching property in the DBpedia resource.
Is there a way to extract this missing information from DBpedia using SPARQL, or is web scraping the wiki page the only way?

The DBpedia pages present all the data DBpedia has; no SPARQL or other query can retrieve data that isn't there.
DBpedia is updated periodically. It may not reflect the latest changes on Wikipedia.
Also, the extractors are an evolving project and may not capture every property you're interested in.
Looking at Betelgeuse on Wikipedia, I see one distance in the infobox. Looking at Alpha Librae, I see two distances. Which one should DBpedia have? Perhaps you have the niche knowledge that can help ensure the extractors do the right thing.
As @JoshuaTaylor suggests, you will probably get more satisfactory answers from the DBpedia discussion list and/or the DBpedia development list.
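To confirm what DBpedia actually holds for a resource, you can list its properties directly and filter by a fragment of the property IRI. This is a sketch: the fragment `"dist"` is an illustrative guess at how a distance property might be named, and the fragment is interpolated unescaped, so it should be a plain word. An empty result means the data simply is not in DBpedia.

```python
def properties_matching(resource: str, fragment: str) -> str:
    """Build a SPARQL query listing the properties of one DBpedia resource whose
    IRI contains `fragment` (case-insensitive)."""
    return f"""
SELECT ?property ?value WHERE {{
  <http://dbpedia.org/resource/{resource}> ?property ?value .
  FILTER(CONTAINS(LCASE(STR(?property)), "{fragment.lower()}"))
}}
ORDER BY ?property
""".strip()


# Compare the two stars from the question.
for star in ("Alpha_Librae", "Betelgeuse"):
    print(properties_matching(star, "dist"))
```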

Look at en.wikipedia.org/wiki/Volkswagen_Golf_Mk3:
In the infobox you have:
height = 1991-95 & Cabrio: {{convert|1422|mm|in|1|abbr=on}}1996-99: {{convert|1428|mm|in|1|abbr=on}}
In DBpedia you get height=1991-95
instead of
height=1422
height=1428
This happens because there is no standard way to define properties conditionally. For this reason, DBpedia properties are sometimes wrong or missing.
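If you work with the raw wikitext yourself, the numeric values can still be recovered from such a conditional field. The sketch below pulls the number and unit out of each `{{convert|...}}` template in the height field quoted above; it only handles this simple template shape (a plain number as the first argument) and discards the period labels, so it illustrates the idea rather than replacing a real extractor.

```python
import re

# Matches {{convert|<number>|<unit>|...}} and captures the number and the unit.
CONVERT = re.compile(r"\{\{\s*convert\s*\|\s*([\d.]+)\s*\|\s*([^|}]+)", re.IGNORECASE)


def convert_values(wikitext: str) -> list[tuple[float, str]]:
    """Return (value, unit) for every convert template in the wikitext."""
    return [(float(num), unit.strip()) for num, unit in CONVERT.findall(wikitext)]


HEIGHT = ("1991-95 & Cabrio: {{convert|1422|mm|in|1|abbr=on}}"
          "1996-99: {{convert|1428|mm|in|1|abbr=on}}")
print(convert_values(HEIGHT))  # [(1422.0, 'mm'), (1428.0, 'mm')]
```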

How does Google power the box at the top of my search results?

What powers the little box that sometimes shows up at the top of search results with things like definitions of words, weather, movie times, and sometimes even the precise steps of a cooking recipe?
I ask because I recently searched for a recipe, and Google showed me the steps for making it right at the top of my results.
Curious to know how they did this, I checked the source of the content and, to my surprise, there was no structured data or rich snippets markup. There were no special meta tags either, and the page didn't even use HTML5 elements.
There was nothing in the markup that would signify the relationship between a step in the recipe and the details within that step; we're talking plain old divs, p's, and h tags. There were also no class or div names that Google could have used to piece it together.
So, how do they do this?

Google does this using the Knowledge Graph. You can help get your data into it by using structured data markup (see http://schema.org).
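As a sketch of that markup for the recipe case, the snippet below generates schema.org JSON-LD using the `Recipe` type with each step as a `HowToStep`. The recipe content is made up, and Google's own guidelines may require further properties before anything is shown; adding the markup makes the structure explicit but does not guarantee a box in the results.

```python
import json


def recipe_jsonld(name: str, steps: list[str]) -> str:
    """Build a schema.org Recipe as JSON-LD, one HowToStep per instruction."""
    data = {
        "@context": "https://schema.org",
        "@type": "Recipe",
        "name": name,
        "recipeInstructions": [{"@type": "HowToStep", "text": s} for s in steps],
    }
    return json.dumps(data, indent=2)


snippet = recipe_jsonld(
    "Pancakes",
    ["Whisk the flour, eggs and milk.", "Fry ladlefuls in a hot pan."],
)
# The JSON-LD is embedded in the page inside a script element.
print(f'<script type="application/ld+json">\n{snippet}\n</script>')
```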

Searching multiple labels in google CSE

I know this has been asked before (1, 2), but while reading the documentation I saw that it might be possible.
I need to add multiple refinements to a query in Google Custom Search Engine (CSE). If I have multiple labels on my CSE, I would like to query a subset of them. Say I have three labels: news, articles and stories. I would like to search for a query like earth in all the pages included in news or stories.
For a single label, a query like this works fine: earth more:news. But it does not work when I add a second label.
According to Google's documentation, you can use OR between multiple refinements, but it does not work for me. I'm using JavaScript and the RESTful APIs, and I have tried many combinations:
earth more:news more:stories
earth more:news OR more:stories
earth [more:news OR more:stories]
earth more:news,stories
Does anyone have any idea how this would work?

It seems the AND and OR operators work only on PageMap data and meta tags; they don't work on directory structures or labels.
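Since OR is not accepted between labels, one possible client-side workaround (not from the answer above) is to send one request per label to the Custom Search JSON API and merge the hits yourself. In this sketch `key` and `cx` are placeholders for your API key and engine ID; it costs one request per label against your quota, and relevance scores across separate responses are not comparable, so the merged list simply keeps first-seen order.

```python
from urllib.parse import urlencode

API = "https://www.googleapis.com/customsearch/v1"


def label_query_urls(terms: str, labels: list[str], key: str, cx: str) -> list[str]:
    """One request URL per label, each using a single more:<label> refinement."""
    return [
        API + "?" + urlencode({"key": key, "cx": cx, "q": f"{terms} more:{label}"})
        for label in labels
    ]


def merge_results(responses: list[dict]) -> list[dict]:
    """Merge the 'items' lists of several responses, dropping duplicate links."""
    seen = set()
    merged = []
    for resp in responses:
        for item in resp.get("items", []):  # responses with no hits lack 'items'
            if item["link"] not in seen:
                seen.add(item["link"])
                merged.append(item)
    return merged


for url in label_query_urls("earth", ["news", "stories"], "YOUR_KEY", "YOUR_CX"):
    print(url)
```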

Filter Wikipedia geosearch per region

I found the Wikipedia API (the MediaWiki GeoData extension) for searching wiki pages around fixed coordinates. An example call is:
https://it.wikipedia.org/w/api.php?action=query&list=geosearch&gsradius=10000&gscoord=37.786971|-122.399677
I also saw that GeoData, in its extra parameters, has the concept of a region, accepting an ISO 3166-2 region code.
How can I search for pages, filtering by this region code? For example, if I am searching around coordinates near the border between two regions, can I filter for only the pages in one region?

Short answer: you can't.
Longer answer: we currently lack two features, which I just filed in the issue tracker for you:
populating the region parameter in GeoData: most pages specify the region only in their free text (which is useless here), not as a GeoData parameter; the only structured data we have is in Wikidata;
adding an option to filter the results by region.
For now, you'll have to do everything yourself client-side: figure out the coordinates of each region and filter by those; or search the region in Wikidata statements and then fetch corresponding articles in the language desired. As you are a developer you could also help import country data in Wikidata ;-).
(Expanded from MaxSem's answer, hence wiki.)
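The first client-side option can be sketched as follows: take the `lat`/`lon` of each hit returned by `list=geosearch` and keep only those inside a region boundary. The square polygon here is a hypothetical stand-in; real ISO 3166-2 boundaries would have to come from a boundary dataset of your choice. The ray-casting test treats latitude and longitude as planar coordinates, which is a reasonable approximation for small regions but breaks for polygons crossing the antimeridian.

```python
def point_in_polygon(lat: float, lon: float, polygon: list[tuple[float, float]]) -> bool:
    """Ray-casting test; polygon is a list of (lat, lon) vertices, closed implicitly."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can cross an eastward ray;
        # this also guarantees lat2 != lat1 in the division below.
        if (lat1 > lat) != (lat2 > lat):
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside
    return inside


def filter_geosearch(results: list[dict], polygon: list[tuple[float, float]]) -> list[dict]:
    """Keep geosearch hits (dicts with 'lat' and 'lon') that fall inside the polygon."""
    return [r for r in results if point_in_polygon(r["lat"], r["lon"], polygon)]


# Hypothetical region and hits, for illustration only.
REGION = [(0.0, 0.0), (0.0, 10.0), (10.0, 10.0), (10.0, 0.0)]
hits = [{"title": "A", "lat": 5.0, "lon": 5.0}, {"title": "B", "lat": 5.0, "lon": 15.0}]
print([r["title"] for r in filter_geosearch(hits, REGION)])
```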

MediaWiki api for Wikipedia - is it possible to search by title on ALL languages?

I know that to search for the page ID of a Wikipedia article with a known title, I can do:
https://en.wikipedia.org/w/api.php?action=query&titles=7_Studios
However, in this case, 7_Studios is a French Wikipedia article, so the above link would not work. Instead I need to try
https://fr.wikipedia.org/w/api.php?action=query&titles=7_Studios
My question is: if I do not know what language the article is in, but only its title, how can I make sure I can find it using the API?

As Bergi mentioned, you can use Wikidata for this: it contains the database of interwiki links. Some article titles may not be in it, but most should.
To do this, you can use the wbgetentities module: you specify the title to search for and a list of wikis to search. For example:
https://www.wikidata.org/w/api.php?action=wbgetentities&titles=7_Studios&sites=enwiki|frwiki|nlwiki|dewiki
You can specify up to 50 wikis in one query. Currently, there are around 300 Wikipedias, so if you really need to query all of them, you may need up to 6 requests for each title.
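The batching described above can be sketched like this: split the list of site IDs into groups of 50 and build one `wbgetentities` URL per group. You would need to supply the full list of site IDs yourself (for example from the SiteMatrix API, `action=sitematrix`); the IDs in the usage line are just examples. `props=sitelinks` limits the response to the interwiki links.

```python
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"
MAX_SITES = 50  # limit per query, as stated in the answer above


def wbgetentities_urls(title: str, sites: list[str]) -> list[str]:
    """One wbgetentities request URL per batch of at most MAX_SITES site IDs."""
    urls = []
    for start in range(0, len(sites), MAX_SITES):
        batch = sites[start:start + MAX_SITES]
        params = {
            "action": "wbgetentities",
            "titles": title,
            "sites": "|".join(batch),
            "props": "sitelinks",
            "format": "json",
        }
        urls.append(API + "?" + urlencode(params))
    return urls


for url in wbgetentities_urls("7_Studios", ["enwiki", "frwiki", "nlwiki", "dewiki"]):
    print(url)
```

With around 300 site IDs this yields 6 URLs per title, matching the estimate above.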
