How to get more than default 10 results from Wikipedia API? - wikipedia-api

I am using the Wikipedia API to get images for a search string that I enter.
It always returns 10 results, but I want more than that, approximately 50.
https://en.wikipedia.org/w/api.php?action=query&prop=pageimages&format=json&piprop=thumbnail&pithumbsize=500&pilimit=50&generator=prefixsearch&gpssearch=game of thrones
How do I get 50 results?

Solved: I also needed to add
&gpslimit=100
The value should be in the range 1-100.

By default, most of the generators and props used in the query action have a limit of 10. Whenever you need to raise the limit for your query, you have to set the corresponding limit value for all of them, because the resulting query limit is equal to the smallest of them.
So, if your query uses generator=geosearch with prop=links|extracts|categories|images and you need 20 results, you must set the limit parameters for geosearch, links, extracts, categories and images to 20.
https://en.wikipedia.org/w/api.php?...&ggslimit=20&pllimit=20&exlimit=20&cllimit=20&imlimit=20
Of course, this has to comply with the allowed maximum for each parameter. For example, for extracts the allowed maximum is 20 (default: 1), which means that you can't get more than 20 pages in your final response, even if the other limits are higher than 20. This also means that in your case above the effect of gpslimit=100 will be the same as gpslimit=50, because pilimit=50.
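The rule above (every limit must be raised, and the smallest one wins) can be sketched in Python. This is only an illustration built from the parameters in the question's URL; the helper names build_pageimages_query and effective_limit are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_pageimages_query(search, wanted):
    """Build a prefixsearch + pageimages URL with every limit set to `wanted`."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "prefixsearch",
        "gpssearch": search,
        "gpslimit": wanted,  # limit of the generator
        "prop": "pageimages",
        "piprop": "thumbnail",
        "pithumbsize": 500,
        "pilimit": wanted,  # limit of the prop
    }
    return API_URL + "?" + urlencode(params)


def effective_limit(*limits):
    """Per the answer above, the result count is bounded by the smallest limit."""
    return min(limits)


if __name__ == "__main__":
    print(build_pageimages_query("game of thrones", 50))
    print(effective_limit(100, 50))
```

Setting both limits from one argument avoids the situation in the question, where pilimit was raised but gpslimit stayed at its default.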

Related

Azure Search Facet Sub Elements Limitation

I have implemented Azure Search in my application.
I am facing one challenge: I have a facet that should have around 50 sub-elements, but it returns only 10.
I am looking for a way to configure the maximum number of sub-elements.
For each faceted field in the navigation tree, there is a default limit of 10 values. This default makes sense for navigation structures because it keeps the values list to a manageable size. You can override the default by assigning a value to count.
For example: &facet=City,count:50
In a facet query, you can set count to a value. You can set it higher or lower. Setting count:50 gets the top 50 matches in facet results by document count.
However, when document counts are high, there is a performance penalty, so use this option judiciously.
For more details, you could refer to this article.
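As a small illustration, the facet expression can be assembled in Python before it is added to the query string. The helper name facet_expression is hypothetical; only the field,count:N syntax is taken from the answer above.

```python
from urllib.parse import urlencode


def facet_expression(field, count=None):
    """Return a facet expression such as 'City,count:50'."""
    if count is None:
        return field  # no override, so the default limit of 10 values applies
    if count < 1:
        raise ValueError("count must be a positive integer")
    return f"{field},count:{count}"


if __name__ == "__main__":
    # urlencode percent-encodes the comma and the colon in the value.
    print(urlencode({"facet": facet_expression("City", 50)}))
```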

Bing Image Search API returns duplicate results

The Bing Image Search API returns only duplicate results for offset > 200 or 300. This costs money because API calls are wasted. It should stop returning results if it doesn't have any more.
It would be nice if the Bing Image Search API stopped returning results when the offset value is greater than the number of results available, but that's not how the API works. If you look at the Image Search API Reference, users are expected to check the totalEstimatedMatches parameter from the first request and make sure that the offset value has an acceptable value before making subsequent requests:
The offset should be less than (totalEstimatedMatches - count).
So if you perform this check, you can decide when to stop making new requests. If offset exceeds the number of results, it looks like the API just returns the last count results which would explain the "duplicate results" you're getting.
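The check described above can be sketched as follows. This assumes the first request is always made at offset 0 to learn totalEstimatedMatches; the function names are hypothetical and no call to the API is made here.

```python
def is_valid_offset(offset, count, total_estimated_matches):
    """Apply the rule quoted above: offset < totalEstimatedMatches - count."""
    return offset < total_estimated_matches - count


def later_offsets(count, total_estimated_matches):
    """Yield the offsets worth requesting after the first page (offset 0)."""
    offset = count
    while is_valid_offset(offset, count, total_estimated_matches):
        yield offset
        offset += count


if __name__ == "__main__":
    # Example: the first response reported 230 estimated matches, page size 50.
    for offset in later_offsets(50, 230):
        print("request page at offset", offset)
```

Stopping as soon as the check fails avoids paying for requests that would only repeat the last page.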

How can I control the order of results? Lucene range queries in Cloudant

I've got a simple index which outputs a "score" from 1000 to 12000 in increments of 1000. I want to get a range of results from a low to a high score, for example:
q=score:[1000 TO 3000]
However, this always returns a list of matches starting at 3000 and depending on the limit (and number of matches) it might never return any 1000 matches, even though they exist. I've tried to use sort:+- and grouping but nothing seems to have any impact on the returned result.
So; how can the order of results returned be controlled?
What I ideally want is a selection of matches from the range but I assume this isn't possible, given that the query just starts filling the results in from the top?
For reference the index looks like this;
function(doc) {
  var score = doc.score;
  index("score", score, {
    "store": "yes"
  });
  ...
I cannot comment on this so posting an answer here:
Based on the Cloudant documentation on Lucene queries, there isn't a way to sort the results of a query. The sort options given there are for grouping, and even for grouped results I never saw sort work. In any case, it is supposed to sort the sequence of the groups themselves, not the data within them.
@pal2ie you are correct, and Cloudant has come back to me confirming it. It does make sense, in some way, but I was hoping I could at least control the direction (lo->hi, hi->lo). The solution I have implemented to get a better distribution across the range is to not use range queries but instead:
1. create a distribution of the number of desired results for each score in the range (a simple, discrete Gaussian, for example)
2. execute individual queries for each score in the range, with limit set to the number of desired results for that score
3. execute step 2 from min to max, filling up the result
It's not the most efficient approach, since it means multiple round trips to the server, but at least it gives me full control over the distribution in the range.
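The steps above could look roughly like this in Python. It is a sketch under assumptions: the Gaussian parameters are arbitrary, run_query is a hypothetical callable that sends one search request and returns a list of rows, and the query string "score:<value>" is assumed to match a single score.

```python
import math


def gaussian_allocation(scores, total, mean, sigma):
    """Split `total` results across `scores` in proportion to a discrete Gaussian."""
    weights = [math.exp(-((s - mean) ** 2) / (2 * sigma ** 2)) for s in scores]
    norm = sum(weights)
    raw = [total * w / norm for w in weights]
    counts = [int(r) for r in raw]
    # Give the slots lost to rounding to the scores with the largest fractions.
    leftovers = total - sum(counts)
    order = sorted(range(len(scores)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:leftovers]:
        counts[i] += 1
    return dict(zip(scores, counts))


def collect(allocation, run_query):
    """Run one query per score, from min to max, and concatenate the results."""
    results = []
    for score in sorted(allocation):
        limit = allocation[score]
        if limit > 0:
            results.extend(run_query(f"score:{score}", limit))
    return results


if __name__ == "__main__":
    allocation = gaussian_allocation([1000, 2000, 3000], 10, mean=2000, sigma=1000)
    # Stand-in for a real search call: returns `limit` placeholder rows.
    fake_query = lambda query, limit: [query] * limit
    print(allocation)
    print(collect(allocation, fake_query))
```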

What is the maximum permitted response data size?

In the API Docs section Browsing Table Data, there is a reference to the "permitted response data size"; however, that link is dead. Experimentation revealed that requests with maxResults=50000 are usually successful, but as I near maxResults=100000 I begin to get errors from the BigQuery server.
This is happening while I page through a large table (or set of query results), so after each page is received, I request the next one; it thus doesn't matter to me what the page size is, but it does affect the communication with BigQuery.
What is the optimal value for this parameter?
Here is some explanation: https://developers.google.com/bigquery/docs/reference/v2/jobs/query?hl=en
The maximum number of rows of data to return per page of results. Setting this flag to a small value such as 1000 and then paging through results might improve reliability when the query result set is large. In addition to this limit, responses are also limited to 10 MB. By default, there is no maximum row count, and only the byte limit applies.
To sum up: max size is 10MB, no row count limit.
You can choose the value of the maxResults parameter based on how your app uses the data.
If you want to show data in a report, set a low value so that the first page is displayed quickly.
If you need to load data into another app, you can use the largest possible value (record size * row count < 10 MB).
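The arithmetic above (record size * row count < 10 MB) can be turned into a rough page-size estimate. This is a sketch under assumptions: the average row size has to be measured by the caller, "10 MB" is read here as 10 * 1024 * 1024 bytes, and rows_per_page is a hypothetical helper, not part of any BigQuery client.

```python
# Assumption: the documented "10 MB" limit is treated as 10 * 1024 * 1024 bytes.
RESPONSE_BYTE_LIMIT = 10 * 1024 * 1024


def rows_per_page(avg_row_bytes, byte_limit=RESPONSE_BYTE_LIMIT, cap=None):
    """Estimate a maxResults value that keeps one page under the byte limit."""
    if avg_row_bytes <= 0:
        raise ValueError("avg_row_bytes must be positive")
    rows = max(1, byte_limit // avg_row_bytes)
    # An optional cap keeps pages small when a fast first page matters more.
    return min(rows, cap) if cap is not None else rows


if __name__ == "__main__":
    print(rows_per_page(1024))            # bulk load: as many rows as fit
    print(rows_per_page(1024, cap=1000))  # report: small, fast first page
```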
You say that when you manually set maxResults = 100000 to page through the result set, you get errors from the BigQuery server. Which errors do you get? Could you paste the error message?

Search API - HTTP Query Argument Format

I've created a search API for a site that I work on. For example, some of the queries it supports are:
/api/search - returns popular search results
/api/search?q=car - returns results matching the term "car"
/api/search?start=50&limit=50 - returns 50 results starting at offset 50
/api/search?user_id=3987 - returns results owned by the user with ID 3987
These query arguments can be mixed and matched. It's implemented under the hood using Solr's faceted search.
I'm working on adding query arguments that can filter results based on a numeric attribute. For example, I might want to only return results where the view count is greater than 100. I'm wondering what the best practice is for specifying this.
Solr uses this way:
/api/search?views:[100 TO *]
Google seems to do something like this:
/api/search?viewsisgt:100
Neither of these seems very appealing to me. Is there a best practice for specifying this kind of query term? Any suggestions?
Simply use ',' as separator for from/to, it reads the best and in my view is intuitive:
# fixed from/to
/search?views=4,2
# upper wildcard
/search?views=4,
# lower wildcard
/search?views=,4
I treat the values as inclusive. In most circumstances you won't need the additional syntactic sugar for exclusive/inclusive bounds.
Binding even works out of the box in some frameworks (such as Spring MVC), which bind ','-separated values to an array of values. You could then wrap the internal array with specific accessors (getMin(), getMax()).
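For frameworks without such binding, the comma-separated form is easy to parse by hand. A minimal Python sketch, assuming integer values and inclusive bounds; the names Range and parse_range are made up for this example.

```python
from typing import NamedTuple, Optional


class Range(NamedTuple):
    lower: Optional[int]  # None means "no lower bound" (value like ",4")
    upper: Optional[int]  # None means "no upper bound" (value like "4,")


def parse_range(value):
    """Parse 'min,max', 'min,' or ',max' into a Range with inclusive bounds."""
    low, sep, high = value.partition(",")
    if not sep:
        raise ValueError("expected a value of the form 'min,max'")
    return Range(int(low) if low else None, int(high) if high else None)


if __name__ == "__main__":
    print(parse_range("2,4"))
    print(parse_range("4,"))
    print(parse_range(",4"))
```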
Google's approach is good. Why is it not appealing?
Here comes my suggestion:
/api/search?viewsgt=100
I think the mathematical notation for intervals is suitable.
[x the value must be at least x (inclusive lower limit)
x] the value must be at most x (inclusive upper limit)
(x the value must be strictly greater than x
x) the value must be strictly less than x
Hence,
q=cats&range=(100,200) - the results from 100 to 200, but not including 100 and 200
q=cats&range=[100,200) - the results from 100 to 200, including 100 but not 200
q=cats&range=[100 - any number from 100 onwards
q=cats&range=(100 - any number greater than 100
q=cats&range=100,200 - default, same as [100,200]
Sure, its aesthetics are still questionable, but it seems (IMO) the most intuitive for the human eye, and the parser is still easy.
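To show that the parser really is easy, here is a minimal Python sketch of the notation above. It assumes integer bounds and that a missing bracket means an inclusive bound; Interval and parse_interval are hypothetical names.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    lower: Optional[int]
    upper: Optional[int]
    lower_inclusive: bool
    upper_inclusive: bool

    def contains(self, x):
        """Return True if x lies inside the interval."""
        if self.lower is not None:
            if x < self.lower or (x == self.lower and not self.lower_inclusive):
                return False
        if self.upper is not None:
            if x > self.upper or (x == self.upper and not self.upper_inclusive):
                return False
        return True


def parse_interval(text):
    """Parse values such as '(100,200)', '[100,200)', '[100', '(100' or '100,200'."""
    lower_inclusive = upper_inclusive = True  # bare '100,200' means [100,200]
    if text and text[0] in "[(":
        lower_inclusive = text[0] == "["
        text = text[1:]
    if text and text[-1] in "])":
        upper_inclusive = text[-1] == "]"
        text = text[:-1]
    low, _, high = text.partition(",")
    return Interval(
        int(low) if low else None,
        int(high) if high else None,
        lower_inclusive,
        upper_inclusive,
    )


if __name__ == "__main__":
    print(parse_interval("[100,200)"))
    print(parse_interval("(100").contains(100))
```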
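As per http://en.wikipedia.org/wiki/Percent-encoding, the characters =, &, [, ], ( and ) are reserved, so they may need to be percent-encoded when they appear in a parameter value.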