How can I query the Wikidata API to get details of all the Korean films?

If possible, I want to return the results in JSON or XML format. Is there any way to do so? Earlier I did this using freebase.com, but it is now deprecated. Please help.

This query would look a lot like the one to get the list of all films on Wikidata, but with one more filter:
instead of http://wdq.wmflabs.org/api?q=CLAIM[31:11424] (which returns all the entities marked as instances of film (Q11424)), you would do
http://wdq.wmflabs.org/api?q=CLAIM[31:11424] AND CLAIM[495:884] (which returns all the entities marked as instances of film with South Korea (Q884) as country of origin (P495))
http://wdq.wmflabs.org/api?q=CLAIM[31:11424] AND CLAIM[495:423] (the same for North Korea (Q423))
Then, to parse the results and get the entity data, it would be the same as for the list of all films.
Remarks:
you will probably need to URL-encode those queries, to get something that looks like: http://wdq.wmflabs.org/api?q=CLAIM%5B31%3A11424%5D%20AND%20CLAIM%5B495%3A884%5D
here is the full API documentation. Notice that this is an experimental API, which might be replaced in the coming year
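For completeness, here is a minimal Python sketch of that flow, assuming the WDQ endpoint is still reachable (as noted above it is experimental and may have been retired since):

import requests

# WDQ query: instances of film (Q11424) with South Korea (Q884)
# as country of origin (P495). requests URL-encodes the query for us.
WDQ_ENDPOINT = "http://wdq.wmflabs.org/api"
query = "CLAIM[31:11424] AND CLAIM[495:884]"

response = requests.get(WDQ_ENDPOINT, params={"q": query})
response.raise_for_status()

# WDQ returns JSON with the matching numeric item ids under "items"
item_ids = response.json()["items"]
print(len(item_ids), "films found, e.g. Q%d" % item_ids[0])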

The overview on Wikipedia may be more complete than Wikidata, as you've noticed yourself. However, I could only find overviews per year, such as https://en.wikipedia.org/wiki/List_of_South_Korean_films_of_2015.
To get a list of titles from that page, you would first retrieve the raw wikicode of the page: https://en.wikipedia.org/w/index.php?action=raw&title=List_of_South_Korean_films_of_2015, and then run a regular expression such as /\{lang\|[^\|]+\|([^\}]+)/g on the code.
This returns a list of 149 titles.
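A short Python sketch of that two-step approach (fetch the raw wikicode, then pull the titles out with the regex above):

import re
import requests

# Fetch the raw wikicode of the year overview page
url = "https://en.wikipedia.org/w/index.php"
params = {"action": "raw", "title": "List_of_South_Korean_films_of_2015"}
wikicode = requests.get(url, params=params).text

# The {{lang|ko|...}} templates hold the original titles
titles = re.findall(r"\{lang\|[^|]+\|([^}]+)", wikicode)
print(len(titles), "titles found")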

Related

Using an API to Extract All Comments from a Reddit Post

I am using the Reddit API (Pushshift): https://github.com/pushshift/api
Using the documentation, I understand how I can use this to extract every comment containing the word "covid" that was left in a certain time period:
https://api.pushshift.io/reddit/search/comment?q=covid&after=3h&before=2h&size=1
The output looks something like this:
{"data":[{"subreddit_id":"t5_2qh6p","author_is_blocked":false,"comment_type":null,"edited":false,"author_flair_type":"richtext","total_awards_received":0,"subreddit":"Conservative","author_flair_template_id":null,"id":"j98zf27","gilded":0,"archived":false,"collapsed_reason_code":null,"no_follow":false,"author":"VamboRoolOkay","send_replies":true,"parent_id":41917615743,"score":1,"author_fullname":"t2_7uxkru5f","all_awardings":[],"body":"I will never believe that election fraud wasn't a significant factor. Go ahead - call it a conspiracy theory. But I also maintained that Covid was lab-created. Truth is the Daughter of Time.","top_awarded_type":null,"author_flair_css_class":null,"author_patreon_flair":false,"collapsed":false,"author_flair_richtext":[{"e":"text","t":"Conservative"}],"is_submitter":false,"gildings":{},"collapsed_reason":null,"associated_award":null,"stickied":false,"author_premium":false,"can_gild":true,"link_id":"t3_116l7ct","unrepliable_reason":null,"author_flair_text_color":"dark","score_hidden":true,"permalink":"/r/Conservative/comments/116l7ct/kamala_harris_plans_on_running_with_biden_in_2024/j98zf27/","subreddit_type":"public","locked":false,"author_flair_text":"Conservative","treatment_tags":[],"created_utc":1676866031,"subreddit_name_prefixed":"r/Conservative","controversiality":0,"author_flair_background_color":"","collapsed_because_crowd_control":null,"distinguished":null,"retrieved_utc":1676866047,"updated_utc":1676866048,"body_sha1":"328df3784d15f77b98a84418c4ce720822227cfe","utc_datetime_str":"2023-02-20 04:07:11"}],"error":null,"metadata":{"es":{"took":98,"timed_out":false,"_shards":{"total":828,"successful":828,"skipped":824,"failed":0},"hits":{"total":{"value":573,"relation":"eq"},"max_score":null}},"es_query":{"size":1,"query":{"bool":{"must":[{"bool":{"must":[{"simple_query_string":{"fields":["body"],"query":"covid","default_operator":"and"}},{"range":{"created_utc":{"gte":1676862433000}}},{"range":{"created_utc":{"lt":1676866033000}}}]}}]}},"aggs":{},"sort":{"created_utc":"desc"}},"es_query2":"{\"size\":1,\"query\":{\"bool\":{\"must\":[{\"bool\":{\"must\":[{\"simple_query_string\":{\"fields\":[\"body\"],\"query\":\"covid\",\"default_operator\":\"and\"}},{\"range\":{\"created_utc\":{\"gte\":1676862433000}}},{\"range\":{\"created_utc\":{\"lt\":1676866033000}}}]}}]}},\"aggs\":{},\"sort\":{\"created_utc\":\"desc\"}}","api_launch_time":1673017478.254743,"api_request_start":1676873233.6143198,"api_request_end":1676873233.7406816,"api_total_time":0.12636184692382812}}
My Question: Suppose I identify a post that contains the word "covid" - now, I want to retrieve every comment on this post : Is this possible to do?
For instance, based on the output of these results, I see that:
link_id: t3_116l7ct
parent_id: 41917615743
Can I somehow use this information to write an API query to retrieve all comments from this post?
I tried the following query but got an empty result: https://api.pushshift.io/reddit/comment/search/?link_id=t3_116cjib
Thanks!
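(For reference, a hedged Python sketch of what such a query might look like, per the Pushshift docs linked above: filter the comment search endpoint by link_id instead of q. Whether it returns data depends on Pushshift's current availability; the link_id below is the one from the output above.)

import requests

# Search comments by their parent submission instead of by keyword.
# Pushshift's comment search accepts a link_id filter per its docs;
# the service's availability varies, so treat this as a sketch.
url = "https://api.pushshift.io/reddit/search/comment"
params = {
    "link_id": "t3_116l7ct",  # submission id taken from the output above
    "size": 100,              # page size
}
comments = requests.get(url, params=params).json()["data"]
for c in comments:
    print(c["author"], c["body"][:80])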

How to get table info and the summary of a page using the Wikipedia API?

I want to get minimal information about a Wikipedia page using the MediaWiki API, like DuckDuckGo does. For example, for Steve Carell: https://duckduckgo.com/?q=steve+carell&t=hp&ia=news&iax=about
How can I get this information from a Wikipedia URL (e.g. https://en.wikipedia.org/wiki/Steve_Carell) in HTML format?
You can use the MediaWiki API for that. There's an extension, TextExtracts, which is exactly for that (and it is installed on Wikipedia).
In your case, e.g.:
https://en.wikipedia.org/w/api.php?action=query&prop=extracts&exsentences=1&titles=Steve%20Carell
will return something like:
<p class=\"mw-empty-elt\">\n</p>\n\n<p class=\"mw-empty-elt\">\n \n</p>\n<p><b>Steven John Carell</b> (<span></span>; born August 16, 1962) is an American actor, comedian, producer, writer and director.</p>
You can customize how many sentences (or characters) the API returns as well; please consult the API documentation for that.
There's also a way to retrieve the short description, which is stored at Wikidata (and visible in the mobile view of Wikipedia). That call would be:
https://en.wikipedia.org/w/api.php?action=query&prop=pageprops&titles=Steve_Carell
This returns the following property in the pageprops of the page:
"wikibase-shortdesc": "American actor"
This may fit better depending on your use case.
You can even get both results with a single, combined request:
https://en.wikipedia.org/w/api.php?action=query&prop=extracts|pageprops&exsentences=1&titles=Steve_Carell
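A small Python sketch of that combined request (JSON output with formatversion=2 for easier parsing):

import requests

# One query for both the first-sentence extract and the short description
params = {
    "action": "query",
    "prop": "extracts|pageprops",
    "exsentences": 1,
    "titles": "Steve Carell",
    "format": "json",
    "formatversion": 2,
}
resp = requests.get("https://en.wikipedia.org/w/api.php", params=params).json()

page = resp["query"]["pages"][0]
print(page["extract"])                          # first sentence, as HTML
print(page["pageprops"]["wikibase-shortdesc"])  # "American actor"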

How can I get the page id and Wikidata id of a title, along with multiple languages, in a single API call?

I have been trying to call the Wikipedia API to retrieve the page id and Wikidata item id using the call below, and it works fine.
https://en.wikipedia.org/w/api.php?action=query&prop=pageprops&ppprop=wikibase_item&redirects=1&format=xml&titles=Cat
but I need to retrieve the same information for other languages of my choice. For example, if I mention German and French in my call, it should look up their translations of the word Cat and retrieve their page info. There is a langlinks property in the Wikipedia API, but somehow it doesn't work with the query action together with pageprops.
So ideally, I want something like this:
https://en.wikipedia.org/w/api.php?action=query&prop=pageprops&ppprop=wikibase_item&prop=langlinks&lllang=de&lllang=fr&titles=Cat
Any help would be appreciated.
Using lllang twice will just result in the second value overwriting the first one. You'll have to omit the parameter, and then you get all the links:
https://en.wikipedia.org/w/api.php?action=query&prop=pageprops|langlinks&ppprop=wikibase_item&titles=Cat
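Since the API won't filter on two languages at once, one option is to fetch all the langlinks and filter client-side. A sketch in Python:

import requests

params = {
    "action": "query",
    "prop": "pageprops|langlinks",
    "ppprop": "wikibase_item",
    "lllimit": "max",
    "redirects": 1,
    "titles": "Cat",
    "format": "json",
    "formatversion": 2,
}
resp = requests.get("https://en.wikipedia.org/w/api.php", params=params).json()

page = resp["query"]["pages"][0]
print(page["pageprops"]["wikibase_item"])  # e.g. "Q146"

# Keep only the languages we care about
wanted = {"de", "fr"}
for link in page["langlinks"]:
    if link["lang"] in wanted:
        print(link["lang"], link["title"])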

Query Wikipedia pages with properties

I need to use the Wikipedia query API, or any other API such as OpenSearch, to query for a simple list of pages with some properties.
Input: a list of page (article) titles or ids.
Output: a list of pages that contain the following properties each:
page id
title
snippet/description (like in the OpenSearch API)
page url
image url (like in the OpenSearch API)
A result similar to this:
http://en.wikipedia.org/w/api.php?action=opensearch&search=miles%20davis&limit=20&format=xml
Only with page ids, and not for a search, but rather for an exact list of pages given either titles or pageids.
This should be a fairly simple thing, but I have been stuck on it for quite some time, trying all kinds of URL combinations from the MediaWiki API manual without success.
I don't think there is any way other than the OpenSearch API to fetch OpenSearch data, but depending on which Wikipedia you are interested in, there might be other extensions installed to help you. Taking English Wikipedia as an example, we can make use of the MobileFrontend and PageImages extensions, which happen to be installed there.
Title and url are available from the native MediaWiki API. To get the url, you can use prop=info, and specify with inprop=url that it is the url you are interested in.
The prominent image of a page is returned by prop=pageimages, thanks to PageImages.
MobileFrontend adds a property called extracts (nowadays provided by the TextExtracts extension mentioned above), which you can use with the directive exintro to get the first paragraph. Note however that MediaWiki markup is complex, and the result might not always be perfect. If we put it all together in one single query, it would be something like this:
http://en.wikipedia.org/w/api.php?action=query&pageids=21482&prop=pageimages|info|extracts&inprop=url&exintro
giving this:
<api>
  <query>
    <pages>
      <page pageid="21482" ns="0" title="Nairobi" pageimage="Nairobi_Montage.jpg" contentmodel="wikitext" pagelanguage="en" touched="2014-02-06T06:10:01Z" lastrevid="594161616" counter="" length="89157" fullurl="http://en.wikipedia.org/wiki/Nairobi" editurl="http://en.wikipedia.org/w/index.php?title=Nairobi&action=edit">
        <thumbnail source="http://upload.wikimedia.org/wikipedia/commons/thumb/6/66/Nairobi_Montage.jpg/45px-Nairobi_Montage.jpg" width="45" height="50" />
        <extract xml:space="preserve">
          <p><b>Nairobi</b> /naɪˈroʊbi/ is the [...]
        </extract>
      </page>
    </pages>
  </query>
</api>
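The same combined query as a Python sketch, using JSON output instead of XML:

import requests

params = {
    "action": "query",
    "pageids": 21482,  # or "titles": "Nairobi|Mombasa" for an exact list of titles
    "prop": "pageimages|info|extracts",
    "inprop": "url",
    "exintro": 1,
    "format": "json",
    "formatversion": 2,
}
resp = requests.get("https://en.wikipedia.org/w/api.php", params=params).json()

for page in resp["query"]["pages"]:
    print(page["pageid"], page["title"])
    print(page["fullurl"])
    print(page.get("thumbnail", {}).get("source"))  # image url, if any
    print(page.get("extract", "")[:120])            # intro snippet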
Here is a multistep process to first get a list of Wikipedia article titles and properties, and then get the page IDs and URLs.
Please note: It does use a portion of a previous answer: "Title and url are available from the native MediaWiki API. To get the url, you can use prop=info, and specify with inprop=url that it is the url you are interested in."
If you would like to use the Wikipedia API in your own applications and search Wikipedia for a list of articles about a certain topic, and you want the answer in JSON format, then you could use the following URL:
https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=REPLACE_ME_WITH_SEARCH_TOPIC&format=json&callback=?
And if the results from that are hard to read, then replace "format=json&callback=?" with "formatversion=2", as in the following example:
https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=REPLACE_ME_WITH_SEARCH_TOPIC&formatversion=2
The following example will give me a batch list of article titles and properties for "Thailand" in JSON format; after that, I will use the resulting titles to find the page IDs and URLs of those articles.
URL step 1:
https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=thailand&format=json&callback=?
From step 1, I get the list of titles I need from the resulting JSON. In step 2, I use those titles in another API query to get the page IDs and URLs of those articles from the resulting JSON.
Here are the Wikipedia article titles from the resulting JSON of step 1:
Thailand
Outline of Thailand
Geography of Thailand
Economy of Thailand
Football in Thailand
Southern Thailand
Government of Thailand
Northern Thailand
Culture of Thailand
Cinema of Thailand
URL step 2:
https://en.wikipedia.org/w/api.php?action=query&titles=Thailand|Outline%20of%20Thailand|Geography%20of%20Thailand|Economy%20of%20Thailand|Football%20in%20Thailand|Southern%20Thailand|Government%20of%20Thailand|Northern%20Thailand|Culture%20of%20Thailand|Cinema%20of%20Thailand&prop=info&inprop=url&format=json&callback=?
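The two steps together in a Python sketch:

import requests

API = "https://en.wikipedia.org/w/api.php"

# Step 1: search for articles about the topic and collect their titles
search = requests.get(API, params={
    "action": "query",
    "list": "search",
    "srsearch": "thailand",
    "format": "json",
}).json()
titles = [hit["title"] for hit in search["query"]["search"]]

# Step 2: look up the page IDs and URLs for those exact titles
info = requests.get(API, params={
    "action": "query",
    "titles": "|".join(titles),
    "prop": "info",
    "inprop": "url",
    "format": "json",
}).json()
for page in info["query"]["pages"].values():
    print(page["pageid"], page["fullurl"])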

How to get the results of "all pages with prefix" using the Wikipedia API?

I wish to use the Wikipedia API to extract the results of this page:
http://en.wikipedia.org/wiki/Special:PrefixIndex
When searching "something" on it, for example this:
http://en.wikipedia.org/w/index.php?title=Special%3APrefixIndex&prefix=tal&namespace=4
Then, I would like to access each of the resulting pages and extract their information.
What API call might I use?
You can use list=allpages and specify apprefix. For example:
http://en.wikipedia.org/w/api.php?format=xml&action=query&list=allpages&apprefix=tal&aplimit=max
This query will give you the id and title of each article that starts with tal. If you want to get more information about each page, you can use this list as a generator:
http://en.wikipedia.org/w/api.php?format=xml&action=query&generator=allpages&gapprefix=tal&gaplimit=max&prop=info
You can give different values to the prop parameter to get different information about the page.
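A Python sketch of the generator form, including continuation (each response is capped, at 500 pages for normal users, so larger prefixes need the continue parameter):

import requests

API = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "generator": "allpages",
    "gapprefix": "tal",
    "gaplimit": "max",
    "prop": "info",
    "format": "json",
}

# Follow the "continue" token until the result set is exhausted
while True:
    resp = requests.get(API, params=params).json()
    for page in resp["query"]["pages"].values():
        print(page["pageid"], page["title"])
    if "continue" not in resp:
        break
    params.update(resp["continue"])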