import.io Magic extractor: how to use regionText? - import.io

The API doc and support article are a bit unclear about the exact usage of the regionText parameter.
Is it supposed to be a simple string or an XPath? For example, look at http://www.circlecount.com/community/114481059214254340537 - I would like to extract the table in the middle-right. My current API request looks like this:
https://api.import.io/store/data/_magic?url=http://www.circlecount.com/community/114481059214254340537&regionText=//*[#id=follower_table_114481059214254340537]&_apikey=XXX&_user=YYY

regionText should be a simple string, not an XPath.
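The Python sketch below builds that request. It takes the endpoint and parameter names (url, regionText, _apikey, _user) from the question. The regionText value is hypothetical and is not confirmed to appear on the circlecount.com page. It also assumes you should pass visible text from the target region; the answer itself only says the value is a plain string. urlencode escapes the value, which the hand-written URL in the question does not do.

```python
from urllib.parse import urlencode

# Endpoint and parameter names copied from the question's request.
MAGIC_ENDPOINT = "https://api.import.io/store/data/_magic"

def magic_url(page_url: str, region_text: str, api_key: str, user: str) -> str:
    """Build a Magic extractor request with regionText as plain text (no XPath)."""
    params = {
        "url": page_url,
        "regionText": region_text,  # plain string, per the answer above
        "_apikey": api_key,
        "_user": user,
    }
    # urlencode percent-escapes reserved characters and turns spaces into '+'.
    return MAGIC_ENDPOINT + "?" + urlencode(params)

# "Top followers" is a hypothetical region text; use text visible in your table.
print(magic_url(
    "http://www.circlecount.com/community/114481059214254340537",
    "Top followers",
    "XXX",
    "YYY",
))
```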

Related

How can I get page id, wikidata id of some title along with multiple languages in a single API call?

I have been trying to call the Wikipedia API to retrieve the page ID and Wikidata item ID using the call below, and it works fine:
https://en.wikipedia.org/w/api.php?action=query&prop=pageprops&ppprop=wikibase_item&redirects=1&format=xml&titles=Cat
However, I need to retrieve the same information for other languages of my choice. For example, if I specify German and French in my call, it should look up their translations of the word Cat and retrieve those pages' info. There is a langlinks property in the Wikipedia API, but somehow I can't get it to work with action=query together with prop=pageprops.
So ideally, I want something like this:
https://en.wikipedia.org/w/api.php?action=query&prop=pageprops&ppprop=wikibase_item&prop=langlinks&lllang=de&lllang=fr&titles=Cat
Any help would be appreciated.
Using lllang twice will just result in the second value overwriting the first one. You'll have to omit the parameter, and then you get all the links (add lllimit=max, because only the first 10 language links are returned by default):
https://en.wikipedia.org/w/api.php?action=query&prop=pageprops|langlinks&ppprop=wikibase_item&lllimit=max&titles=Cat
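lllang accepts only one language, so one way to get German and French in a single call is to fetch all links and filter them client-side. The Python sketch below does that. It assumes format=json with formatversion=2, where pages come back as a list and each link is shaped like {"lang": ..., "title": ...}. The helper names are illustrative, and the sample response in the usage check is made up.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def build_query(title: str) -> str:
    """One request for page id, Wikidata item and all language links."""
    params = {
        "action": "query",
        "prop": "pageprops|langlinks",
        "ppprop": "wikibase_item",
        "lllimit": "max",      # otherwise only the first 10 links are returned
        "redirects": 1,
        "titles": title,
        "format": "json",
        "formatversion": 2,    # assumed: pages as a list, links as {"lang", "title"}
    }
    return API + "?" + urlencode(params)

def pick_languages(response: dict, wanted: list) -> dict:
    """Keep only the requested languages (filtering done on the client)."""
    page = response["query"]["pages"][0]
    links = {ll["lang"]: ll["title"] for ll in page.get("langlinks", [])}
    return {
        "pageid": page.get("pageid"),
        "wikibase_item": page.get("pageprops", {}).get("wikibase_item"),
        "translations": {lang: links.get(lang) for lang in wanted},
    }
```

Languages without a link come back as None, so missing translations remain visible to the caller.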

Date range search using Google Custom Search API

I am using the Google Custom Search API to search for images. My implementation is using Java, and this is how I build my search string:
URL url = new URL("https://ajax.googleapis.com/ajax/services/search/images?"
+ "v=1.0&q=barack%20obama&userip=INSERT-USER-IP");
How would I modify the URL to limit search results to a date range, for example between 2014-08-15 and 2014-09-31?
You can specify a date range using the sort parameter. For your example, you would add this to your query string: sort=date:r:20140815:20140931.
This is documented at https://developers.google.com/custom-search/docs/structured_data#page_dates
Also if you use Google's Java API you can use the Query class and its setSort() method rather than building the URL by hand.
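The sort value follows the date:r:START:END pattern shown above. The Python sketch below builds it. It assumes the Custom Search JSON API endpoint (https://www.googleapis.com/customsearch/v1) rather than the older ajax.googleapis.com URL in the question, and KEY and CX are placeholders. Using date objects has a side benefit: an impossible date such as 2014-09-31 raises ValueError, because September has only 30 days.

```python
from datetime import date
from urllib.parse import urlencode

def date_range_sort(start: date, end: date) -> str:
    """Format the restrict-to-range sort value date:r:YYYYMMDD:YYYYMMDD."""
    return f"date:r:{start:%Y%m%d}:{end:%Y%m%d}"

def search_url(api_key: str, cx: str, query: str, start: date, end: date) -> str:
    params = {"key": api_key, "cx": cx, "q": query,
              "sort": date_range_sort(start, end)}
    return "https://www.googleapis.com/customsearch/v1?" + urlencode(params)

print(search_url("KEY", "CX", "barack obama", date(2014, 8, 15), date(2014, 9, 30)))
```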
I think the better way is to put this into the query itself. The q parameter supports an after: operator, which can be used like this:
https://customsearch.googleapis.com/customsearch/v1?
key=<api_key>&
cx=<search_engine_id>&
q="<your_search_word> after:<YYYY-MM-DD>"
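The Python sketch below builds the template above. It appends after: to the search terms and lets urlencode escape the result. The endpoint comes from the answer. KEY and CX are placeholders, and the helper names are illustrative.

```python
from datetime import date
from urllib.parse import urlencode

ENDPOINT = "https://customsearch.googleapis.com/customsearch/v1"

def query_with_after(term: str, after: date) -> str:
    """Append the after:YYYY-MM-DD operator to the search terms."""
    return f"{term} after:{after:%Y-%m-%d}"

def search_url(api_key: str, cx: str, term: str, after: date) -> str:
    params = {"key": api_key, "cx": cx, "q": query_with_after(term, after)}
    return ENDPOINT + "?" + urlencode(params)

print(search_url("KEY", "CX", "barack obama", date(2014, 8, 15)))
```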

How to specify multiple values on siteSearch in google custom search api?

I'm using the Google Custom Search API and want to restrict a search with the siteSearch parameter:
https://www.googleapis.com/customsearch/v1?key=k&cx=cx&q=cocos2d&siteSearch=www.cocos2d-iphone.org&siteSearchFilter=i
and it works fine (it returns results only from the given site).
Then I wanted to search two sites, so I tried changing:
siteSearch=www.cocos2d-iphone.org
to
siteSearch=www.cocos2d-iphone.org www.XXXXXXXX.org
siteSearch=www.cocos2d-iphone.org|www.XXXXXXXX.org
siteSearch=www.cocos2d-iphone.org||www.XXXXXXXX.org
but none of these works.
hope someone can help here, thanks:)
Currently I don't believe you can specify more than one site through the siteSearch query parameter.
Nevertheless, you can configure your Custom Search Engine here: https://www.google.com/cse/manage/all
in the "Site to search" area.
This also works for excluding, as you can read here: https://support.google.com/customsearch/bin/answer.py?hl=en&answer=2631038&topic=2601037&ctx=topic
You cannot do this with the as_sitesearch parameter, as that only accepts a single value. But you can achieve what you want with the as_q parameter by setting it to a value like "site:google.com OR site:microsoft.com". That works much like running the same site: OR query as a regular Google search.
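The Python sketch below builds the "site:a OR site:b" clause this answer describes. The as_q parameter name comes from the answer. Whether a particular endpoint (for example the customsearch/v1 JSON API in the question) accepts as_q is an assumption you should check against that API's reference.

```python
from urllib.parse import urlencode

def site_or_clause(sites: list) -> str:
    """Join sites into one 'site:a OR site:b' restriction."""
    return " OR ".join(f"site:{s}" for s in sites)

def build_params(term: str, sites: list) -> str:
    # as_q is the parameter named in the answer; endpoint support is assumed.
    return urlencode({"q": term, "as_q": site_or_clause(sites)})

print(build_params("cocos2d", ["www.cocos2d-iphone.org", "www.XXXXXXXX.org"]))
```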
The as_q parameter is documented here as:
The as_q parameter provides search terms to check for in a document.
This parameter is also commonly used to allow users to specify
additional terms to search for within a set of search results.
Examples q=president&as_q=John+Adams
Use a space as the separator.
Below is sample PHP code that works for me:
$url = "https://www.googleapis.com/customsearch/v1?key=k&cx=cx&q=cocos2d&siteSearch=".urlencode("www.cocos2d-iphone.org www.XXXXXXXX.org")."&siteSearchFilter=i";
Thanks,
Ojal Suthar

How to get the result of "all pages with prefix" using Wikipedia api?

I wish to use Wikipedia api to extract the result of this page:
http://en.wikipedia.org/wiki/Special:PrefixIndex
When searching for something on it, for example:
http://en.wikipedia.org/w/index.php?title=Special%3APrefixIndex&prefix=tal&namespace=4
Then, I would like to access each of the resulting pages and extract their information.
What api call might I use?
You can use list=allpages and specify apprefix. For example:
http://en.wikipedia.org/w/api.php?format=xml&action=query&list=allpages&apprefix=tal&aplimit=max
This query will give you the id and title of each article that starts with tal. If you want to get more information about each page, you can use this list as a generator:
http://en.wikipedia.org/w/api.php?format=xml&action=query&generator=allpages&gapprefix=tal&gaplimit=max&prop=info
You can give different values to the prop parameter to get different information about the page.
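When a prefix matches more than one batch of pages, the response includes a continue block that you merge into the next request. The Python sketch below follows those blocks.

It makes a few assumptions:

- Responses use format=json with formatversion=2.
- The User-Agent string is a hypothetical placeholder. Wikimedia asks clients to send a descriptive User-Agent.
- gapnamespace is included because the Special:PrefixIndex link in the question used namespace=4, while the API examples above default to the main namespace (0).

The fetch argument is there so you can swap in a different HTTP function.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"

def prefix_params(prefix: str, namespace: int = 0) -> dict:
    """generator=allpages + prop=info, as in the answer above."""
    return {
        "action": "query",
        "format": "json",
        "formatversion": 2,
        "generator": "allpages",
        "gapprefix": prefix,
        "gapnamespace": namespace,  # the PrefixIndex link used namespace=4
        "gaplimit": "max",
        "prop": "info",
    }

def http_fetch(params: dict) -> dict:
    # Hypothetical User-Agent; replace with one that identifies your tool.
    req = Request(API + "?" + urlencode(params),
                  headers={"User-Agent": "prefix-example/0.1 (example only)"})
    with urlopen(req) as resp:
        return json.load(resp)

def iter_pages(prefix: str, namespace: int = 0, fetch=http_fetch):
    """Yield every page dict, following the API's 'continue' blocks."""
    base = prefix_params(prefix, namespace)
    params = dict(base)
    while True:
        data = fetch(params)
        yield from data.get("query", {}).get("pages", [])
        if "continue" not in data:
            break
        params = {**base, **data["continue"]}
```

For example, list(iter_pages("tal", namespace=4)) would walk every page in the Wikipedia: namespace that starts with "tal".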

How can I get the Infobox from a Wikipedia article by the MediaWiki API? [duplicate]

This question already has answers here:
How to get the Infobox data from Wikipedia?
(8 answers)
Wikipedia articles may have Infobox templates. With the following call I can get the first section of an article, which includes the Infobox:
http://en.wikipedia.org/w/api.php?action=parse&pageid=568801&section=0&prop=wikitext
I want a query which will return only Infobox data. Is this possible?
You can do it with a URL call to the Wikipedia API like this:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xmlfm&titles=Scary%20Monsters%20and%20Nice%20Sprites&rvsection=0
Replace the value of titles= with your page title, and change format=xmlfm to format=json if you want the result as JSON.
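If you want to cut the infobox out of the wikitext that call returns, the core step is matching nested {{ }} pairs. The Python sketch below does only that step. It assumes the infobox is a template whose name starts with "Infobox", which is common but not universal. It also ignores edge cases such as <nowiki> and {{{parameter}}} syntax, which is part of why doing this robustly is hard.

```python
import re
from typing import Optional

def extract_infobox(wikitext: str) -> Optional[str]:
    """Return the first {{Infobox ...}} call, matching nested {{...}} pairs."""
    m = re.search(r"\{\{\s*infobox", wikitext, re.IGNORECASE)
    if m is None:
        return None
    start = i = m.start()
    depth = 0
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                return wikitext[start:i]
        else:
            i += 1
    return None  # braces never balanced
```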
Instead of parsing infoboxes yourself, which is quite complicated, take a look at DBpedia, which has Wikipedia infobox data extracted as database objects.
Building on garry's answer, you can have Wikipedia parse the infobox into HTML for you via the rvparse parameter, like so:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=Scary%20Monsters%20and%20Nice%20Sprites&rvsection=0&rvparse
Note that neither method returns just the infobox. But from the HTML content you can extract the table with class infobox (e.g., with Beautiful Soup).
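In Python, you can do something like the following:
import requests
url = ("https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content"
       "&format=json&titles=Scary%20Monsters%20and%20Nice%20Sprites&rvsection=0&rvparse")
resp = requests.get(url).json()
page_one = next(iter(resp['query']['pages'].values()))  # 'pages' is keyed by page id
revisions = page_one.get('revisions', [])
html = revisions[0]['*'] if revisions else ''  # in format=json the content is under '*'
# Now parse the HTML, e.g. find the table with class "infobox"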
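For the "parse the HTML" step, the Python sketch below uses the standard library's html.parser instead of Beautiful Soup, so it needs no third-party install. It collects (label, value) pairs from the first table whose class list contains infobox. It assumes each data row has a th label and a td value. Rows with only a header are skipped, and text from any nested tables is flattened into the enclosing cell.

```python
from html.parser import HTMLParser

class InfoboxRows(HTMLParser):
    """Collect (label, value) pairs from the first <table class="infobox ...">."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # table nesting level inside the infobox; 0 = outside
        self.done = False   # stop once the first infobox has closed
        self.cell = None    # 'th' or 'td' while reading a top-level cell
        self.buf = []
        self.row = {}
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if self.depth:
                self.depth += 1
            elif "infobox" in classes:
                self.depth = 1
        elif self.depth == 1 and tag == "tr":
            self.row = {}
        elif self.depth == 1 and tag in ("th", "td"):
            self.cell, self.buf = tag, []

    def handle_endtag(self, tag):
        if self.done or not self.depth:
            return
        if tag == "table":
            self.depth -= 1
            self.done = self.depth == 0
        elif self.depth == 1 and tag == self.cell:
            # Collapse whitespace across the cell's text fragments.
            self.row[tag] = " ".join("".join(self.buf).split())
            self.cell = None
        elif self.depth == 1 and tag == "tr":
            if "th" in self.row and "td" in self.row:
                self.rows.append((self.row["th"], self.row["td"]))
            self.row = {}

    def handle_data(self, data):
        if self.cell:
            self.buf.append(data)

def infobox_rows(html: str) -> list:
    parser = InfoboxRows()
    parser.feed(html)
    return parser.rows
```

With the html string from the snippet above, infobox_rows(html) returns a list of (label, value) tuples.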
If the infobox is kept on its own template page, as it is for the chemical elements, you can use this URL to obtain it as raw wikitext.
My example uses the element hydrogen; replace "hydrogen" with another element's name to get its infobox.
https://en.wikipedia.org/w/index.php?action=raw&title=Template:Infobox%20hydrogen
If you want JSON instead, use this URL, although the output is not pretty:
https://en.wikipedia.org/w/api.php?action=parse&page=Template:Infobox%20hydrogen&format=json