How to get the total number of edits for a given Wikipedia page from the API?

I do not want to list each edit, only to get the number of edits.
This data is available for every article through the link in the left panel:
https://en.wikipedia.org/w/index.php?title=Wikipedia&action=info
But this produces a complete web page with tables, formatting, etc., and it is taxing for the Wikipedia servers. So I am asking whether there is a way to get only those few numbers and avoid scraping the whole page.

Probably not the answer you want but there isn't a way to get this information yet.
As a workaround, you can use prop=revisions to list the revisions made to the article and count the rev tags in the response:
http://en.wikipedia.org/w/api.php?format=xml&action=query&titles=Wikipedia&prop=revisions&rvprop=ids&rvlimit=max
Alternatively, you can ask YQL to count it for you with the following command:
SELECT * FROM xml
WHERE url="http://en.wikipedia.org/w/api.php?format=xml&action=query&titles=Wikipedia&prop=revisions&rvprop=ids&rvlimit=max"
AND itemPath="/api/query/pages/page/revisions/rev"
Example output:
{
"query": {
"count": 500, //This is the total amount of edits
"created": "2014-03-04T02:29:42Z",
"lang": "en-US",
"results": {
"rev": [{
"parentid": "597995345",
"revid": "598005528"
}, {
"parentid": "597994174",
"revid": "597995345"
}, {
"parentid": "597891867",
"revid": "597994174"
}]
}
}
}
Unfortunately, a single request returns at most 500 revisions for regular users and 5000 for bots.
To get the exact count, you will have to set up a parser on your server that extracts it from the info page whenever a user requests the data from your side.
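Another way to work around the per-request limit is to follow the API's continuation data and add up the revisions from every batch. The sketch below is only an illustration under stated assumptions: the function names and the User-Agent string are made up for this example, the JSON layout assumed is the default format=json output (pages keyed by page id, a "continue" object while more data remains), and a heavily edited article will still need many requests.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict


def count_revisions(title: str, fetch: Callable[[Dict[str, str]], Dict[str, Any]]) -> int:
    """Count the revisions of `title` by following continuation until exhausted.

    `fetch` is a hypothetical helper: it takes the query parameters and
    returns the parsed JSON response as a dict.
    """
    params = {
        "action": "query",
        "format": "json",
        "titles": title,
        "prop": "revisions",
        "rvprop": "ids",
        "rvlimit": "max",
    }
    total = 0
    cont: Dict[str, str] = {}
    while True:
        data = fetch({**params, **cont})
        # Assumed layout: query.pages is a dict keyed by page id.
        for page in data.get("query", {}).get("pages", {}).values():
            total += len(page.get("revisions", []))
        if "continue" not in data:
            return total
        # Send the continuation values back with the next request.
        cont = data["continue"]


def fetch_from_wikipedia(params: Dict[str, str]) -> Dict[str, Any]:
    """One possible `fetch`: a plain GET against the English Wikipedia API."""
    url = "https://en.wikipedia.org/w/api.php?" + urllib.parse.urlencode(params)
    # The User-Agent value is a placeholder for this example.
    request = urllib.request.Request(url, headers={"User-Agent": "revision-count-example/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage (performs network requests):
# print(count_revisions("Wikipedia", fetch_from_wikipedia))
```

Passing the fetch function in as a parameter keeps the counting loop testable without touching the network.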

How to implement "GET" request to work for STRAVA API from Postman?

Further update...
I got this working. Although Strava's documentation does not say that any of the arguments in the call are mandatory, it seems they all are. You need to supply valid before and after arguments in epoch time and (this is the part that confused me a bit) a page number and a number of items per page. The items per page default to 30, but the page number does not default. With page 1 and 30 items per page you get items 1 - 30; with page 2 and 30 items per page you get items 31 - 60, and so on. You have to write a loop that keeps requesting pages until it gets a blank page; you then know you have retrieved all the activities. (At least that is how I think it works.)
Adrian
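The loop described above can be sketched roughly as follows. This is a minimal illustration, not Strava's own client code: fetch_page is a hypothetical callable that performs the GET request for one page (with the before, after, page and per_page arguments) and returns the decoded JSON list.

```python
from typing import Any, Callable, Dict, List


def fetch_all_activities(
    fetch_page: Callable[[int, int], List[Dict[str, Any]]],
    per_page: int = 30,
) -> List[Dict[str, Any]]:
    """Request page 1, 2, 3, ... until a blank page comes back."""
    activities: List[Dict[str, Any]] = []
    page = 1  # the page number is passed explicitly on every request
    while True:
        batch = fetch_page(page, per_page)
        if not batch:
            # A blank page means every activity has been retrieved.
            return activities
        activities.extend(batch)
        page += 1
```

With 65 activities and 30 items per page, this makes four requests: pages 1 to 3 return data and page 4 comes back blank.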
Question update...
After some digging and experimenting I have managed to solve part of my problem (as described below) on my own. When you create an app on Strava (listed in your settings under "My API Application"), the token you are given has the scope "read", which seems to be very, very limited.
After following the steps in the Strava Authentication documentation, I was able to get a new token with the following scopes:
scope=read,activity:read,activity:read_all,profile:read_all,read_all
So... I thought I was "golden" as the saying goes.
Well now I am able to get individual activities using:
https://www.strava.com/api/v3/activities/2110745394?include_all_efforts="true"&access_token={{ADR_Strava_API_Key}}
But when I try to get a list of all activities, I get no error message; Strava simply returns [], and this for an athlete who I know has over 1800 activities.
What I really want is to get the list of activities. Any help would be appreciated.
Thank you
Adrian
I can get athlete information back from Strava in Postman using the following HTTPS request:
https://www.strava.com/api/v3/athletes/19133707?access_token={{ADR_Strava_API_Key}}
The following gets returned:
{
"id": 19133707,
"username": "adrian_geekie",
"resource_state": 2,
"firstname": "Adrian",
"lastname": "Geekie",
"city": "Gauteng, South Africa",
"state": "GP",
"country": "South Africa",
"sex": "M",
"premium": true,
"summit": true,
"created_at": "2017-01-03T16:07:37Z",
"updated_at": "2019-01-28T16:08:07Z",
"badge_type_id": 1,
"profile_medium": "https://dgalywyr863hv.cloudfront.net/pictures/athletes/19133707/5599004/2/medium.jpg",
"profile": "https://dgalywyr863hv.cloudfront.net/pictures/athletes/19133707/5599004/2/large.jpg",
"friend": null,
"follower": null
}
But when I try to get activities using this request:
https://www.strava.com/api/v3/19133707/activities?before=&after=1546293601&page=&per_page=&access_token={{ADR_Strava_API_Key}}
I get this returned:
{
"message": "Record Not Found",
"errors": [
{
"resource": "resource",
"field": "path",
"code": "invalid"
}
]
}
As I understand it, I am asking for all records after the 1st of January 2019, i.e. epoch timestamp 1546293601. I know there are many activities for that athlete after that date (more than 20).
I have also tried to get a single activity using:
https://www.strava.com/api/v3/activities/2110745394?include_all_efforts="true"&access_token={{ADR_Strava_API_Key}}
and I get the result:
{
"message": "Resource Not Found",
"errors": [
{
"resource": "Activity",
"field": "",
"code": "not found"
}
]
}
On the Strava developer's page the examples are given for HTTPie like this:
https://www.strava.com/api/v3/activities/{id}?include_all_efforts=" "Authorization: Bearer [[token]]
So I am replacing "Authorization: Bearer [[token]]" with &access_token=
Perhaps that is my error, but access_token works in the first example.
I am sorry if this is a total idiot question. I am a beginner and I would appreciate any help.
Thank you
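For comparison, the header form used in the HTTPie example can be reproduced outside Postman. The sketch below only builds the request object and does not send it; the function name is made up for this example, and it assumes include_all_efforts takes a plain true without quotes.

```python
import urllib.request


def build_activity_request(activity_id: int, token: str) -> urllib.request.Request:
    """Build a GET request that carries the token in the Authorization header."""
    url = (
        "https://www.strava.com/api/v3/activities/"
        f"{activity_id}?include_all_efforts=true"
    )
    # The token travels as a header instead of an access_token query parameter.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# Usage (sending the request needs a valid token):
# request = build_activity_request(2110745394, "<token>")
# with urllib.request.urlopen(request) as response:
#     print(response.read())
```

In Postman, the equivalent is to choose the Bearer Token type on the Authorization tab instead of appending access_token to the URL.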

Cumulocity Inventory API filter by Creation Date

I'm currently trying to implement a simple date filter for the Inventory API using the query language. The filter should return a list of managed objects that were created after a given date. For some reason I always receive an empty list as the result, even though my query looks the same as the example in the query language documentation:
GET {{url}}/inventory/managedObjects?query=creationTime+gt+'2018-12-01T09:00:53.351Z'
gives me
{
"managedObjects": [],
"next": "{{url}}/inventory/managedObjects?query=creationTime+gt+'2018-12-01T09:00:53.351Z'&pageSize=5&currentPage=2",
"statistics": {
"currentPage": 1,
"pageSize": 5
},
"self": "{{url}}/inventory/managedObjects?query=creationTime+gt+'2018-12-01T09:00:53.351Z'&pageSize=5&currentPage=1"
}
And if I try this structure for the timestamp, I even receive an error:
GET {{url}}/inventory/managedObjects?query=creationTime+gt+'2018-12-01T09:00:53.351%2B1:00'
{
"error": "inventory/Invalid Data",
"info": "https://www.cumulocity.com/guides/reference-guide/#error_reporting",
"message": "Find by filter query failed : Query 'creationTime gt '2018-12-01T09:00:00'' could not be understood. Please try again."
}
Try filtering by
creationTime.date
The background is that the timestamps are stored as MongoDB dates.
You can also look at the device list filter in device management, which filters on creationTime as well.
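A request URL for the suggested filter could be assembled like this. It is a sketch under assumptions: the helper name and the example host are made up, and whether creationTime.date behaves as described depends on the platform version. Encoding the query with urlencode also keeps a "+" in a timezone offset from being read as a space.

```python
from urllib.parse import urlencode


def created_after_url(base_url: str, timestamp: str) -> str:
    """Build an inventory query URL that filters on creationTime.date."""
    query = f"creationTime.date gt '{timestamp}'"
    # urlencode turns spaces into "+" and escapes quotes, colons and plus signs.
    return f"{base_url}/inventory/managedObjects?" + urlencode({"query": query})


# Usage with a placeholder host:
# print(created_after_url("https://example.cumulocity.com", "2018-12-01T09:00:53.351Z"))
```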

RESTful API to safeguard server and client from large datasets

I am designing a RESTful API and need a second opinion on the design. I will abstract away the problem statement to make it easier to understand.
Consider a URI /search?key1=value1&key2=value2, which can potentially return a huge result set for the given search criteria key1 and key2.
My mandate is to make sure that the server and client are bounded by limits to prevent performance degradation. If that limit is reached and the intended data is not in the result set, the user will be asked to refine the search query to narrow it down. (I am not thinking of pagination; that is for a different problem set.)
The approach is to let the client specify a limit that it can comfortably handle, and to let the server set a limit of its own so that it does not generate huge result sets that affect performance.
The client can call /search?key1=value1&key2=value2&maxresults=xxxx to specify its limit.
The server can set its own limit as a configuration parameter for the search URI. While serving a request, the server takes min(client's limit, server's limit) and generates a result set that satisfies this effective limit.
The generated JSON will have a metadata part stating whether the result was truncated and what the effective limit was. The client can inspect this part and ask the user to refine the search if "truncated" is "true". The problem domain actually allows the user to refine down to a single item.
{
"result": {
"truncated": "true",
"limit": "2000",
"data": [
{
"id": "1"
},
{
"id": "2"
}
...
{
"id": "2000"
}
]
}
}
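The limit negotiation described above can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the function name and the default server limit are assumptions, and the sketch uses a real boolean and integer for "truncated" and "limit" instead of the strings shown in the example.

```python
from typing import Any, Dict, List, Optional

SERVER_LIMIT = 2000  # hypothetical server-side configuration value


def build_search_response(
    matches: List[Dict[str, Any]],
    client_limit: Optional[int],
    server_limit: int = SERVER_LIMIT,
) -> Dict[str, Any]:
    """Apply min(client limit, server limit) and flag truncated results."""
    if client_limit is None:
        limit = server_limit  # the client sent no maxresults parameter
    else:
        limit = min(client_limit, server_limit)
    return {
        "result": {
            "truncated": len(matches) > limit,
            "limit": limit,
            "data": matches[:limit],
        }
    }
```

In a real service the query would likely fetch at most limit + 1 rows, which is enough to detect truncation without materializing the full result set.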
The questions I am trying to answer are:
Is this violating any REST principles?
Is there a standard convention to do the same that I might follow?
Are there good examples on public APIs that you can quote? (Jira RESTful API has a couple of examples)
Is there any gotcha in this design which may affect us in the future?
Any view on this will be appreciated ...
Thanks!
From my point of view this fits REST principles quite well. I would suggest sending the result-size metadata as HTTP headers rather than adding it to the response payload. So instead of
{
"result": {
"truncated": "true",
"limit": "2000",
"data": [
{
"id": "1"
},
{
"id": "2"
}
...
{
"id": "2000"
}
]
}
}
The service would send
{
"data": [
{
"id": "1"
},
{
"id": "2"
}
...
{
"id": "2000"
}
]
}
and add additional custom HTTP headers
x-result-truncated:1
x-result-limit:2000
This approach has the benefit that metadata values that are not part of the payload from a client's perspective are sent in the metadata section of your response, where, for example, the content-type is transmitted.
An additional benefit is that packing the metadata into HTTP headers is reusable for other services as well, and you do not have to change the schema of the returned payload, which means clients keep working as expected (except that some results may be truncated).
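As a rough sketch of this split, the function below moves the two metadata values into headers and leaves only the data in the body. The function name is hypothetical, and the header names are the custom ones suggested above, not any standard.

```python
from typing import Any, Dict, Tuple


def to_headers_and_body(result: Dict[str, Any]) -> Tuple[Dict[str, str], Dict[str, Any]]:
    """Split a result dict into custom HTTP headers and a payload-only body."""
    # Accept both a real boolean and the string "true" used in the example.
    truncated = str(result["truncated"]).lower() == "true"
    headers = {
        "x-result-truncated": "1" if truncated else "0",
        "x-result-limit": str(result["limit"]),
    }
    body = {"data": result["data"]}
    return headers, body
```

A client that does not know about the headers still receives a valid body, which is the compatibility point made above.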

How do I return filtering meta data in a REST API search query

I'm currently designing and implementing a RESTful API in PHP.
The API allows users to search for hotels.
A simplified example of the search request is:
GET hotels/searchresults?location=<location> #collection of hotels within location
The response also contains some meta information about the returned collection.
The basic structure of the response is:
"meta": {
"totalNrOfHotels": 100,
"totalNrAvailable": 80
},
"hotels": [
{
"id": 123,
"name": "Hotel A"
},
{
"id": 135,
"name": "Hotel B"
},
...
]
This resource also supports pagination:
GET hotels/searchresults?location=<location>&offset=0&limit=20
Now, there are a few filters that can be applied to the search results, e.g. stars, rating score.
For example, if I want just 2 star hotels, I can query:
GET hotels/searchresults?location=<location>&offset=0&limit=20&stars=2
Now, in the user interface for filtering, it is common to display the number of options available per filter setting.
In my opinion, these numbers can be seen as meta data about the search query. So, we could add an extra field to the meta in the response:
"meta": {
"totalNrOfHotels": 100,
"totalNrAvailable": 80,
"filterNrs": {
"stars": {
"1": 1,
"2": 9,
"3": 39,
"4": 12,
"5": 11,
"none": 9
}
}
},
"hotels": [
{"id": 123,
"name": "Hotel A"
},
{"id": 135,
"name": "Hotel B"
},
...
]
So, I have two questions:
Should this "filterNrs" property sit in the meta section, as proposed above? To me, it doesn't make sense for it to be a separate resource/request.
How can we deal with the fact that this can slow down the query? I'd prefer to make the "filterNrs" field optional. We are thinking of using a "metaFields" parameter to let the user specify which fields in the meta she would like to receive. We already support this for the hotels returned, with a "fields" parameter (similar to https://developers.google.com/youtube/2.0/developers_guide_protocol_partial). Alternatively, we could put this filterNrs field (or the full meta info) in a separate resource, something like hotels/searchresults/meta. From a developer's perspective, would you prefer to have this split into multiple resources, or a single resource with the option to show full or partial meta information?
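To make the optional-field idea concrete, here is a small sketch of building the meta section so that the per-star counts are computed only when requested. The API in question is written in PHP; Python is used here purely for illustration. The function name and the hotel fields "stars" and "available" are assumptions for this example, and meta_fields stands in for the parsed "metaFields" parameter.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Optional


def build_meta(
    hotels: List[Dict[str, Any]],
    meta_fields: Optional[Iterable[str]] = None,
) -> Dict[str, Any]:
    """Build the meta section; "filterNrs" is included only on request."""
    meta: Dict[str, Any] = {
        "totalNrOfHotels": len(hotels),
        "totalNrAvailable": sum(1 for hotel in hotels if hotel.get("available")),
    }
    if meta_fields is not None and "filterNrs" in meta_fields:
        # The potentially slow aggregation runs only when asked for.
        stars = Counter(
            "none" if hotel.get("stars") is None else str(hotel["stars"])
            for hotel in hotels
        )
        meta["filterNrs"] = {"stars": dict(stars)}
    return meta
```

The counts here are taken over whatever hotel list is passed in, so the caller decides whether that is the set before or after the star filter is applied.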
Do the counts per star rating vary with the query? For example, do I get different "filterNrs" for the queries below?
GET hotels/searchresults?location=1
GET hotels/searchresults?location=2
I would expect such filters to be contextual, so different locations would return different counts per star rating, which indicates that this is contextual information related to the query.
Otherwise, if the counts are global, that indicates a separate resource. In that case, you can use links to give access to the numbers and other details about them:
"meta": {
"totalNrOfHotels": 100,
"totalNrAvailable": 80,
"filterNrs": {
"stars": {
"options" : ["1", "2", "3", "4", "5", "none"],
"details" : "http://example.com/stars"
}
}
},

How to add content and moreDetailsUrl for Google Search suggest?

I'm using a GSA (version 6.14) and we would like to add an auto-suggest function to our website. It works fine for basic requests, but it seems the GSA offers more functionality when you use user-added results. However, I cannot find a reference anywhere on how to add user-added results.
This is what the documentation tells me today:
/suggest?q=<query>&max=<num>&site=<collection>&client=<frontend>&access=p&format=rich
should return a response like the one below:
{
"query": "<query>",
"results": [
{ "name": "<term 1>", "type": "suggest"},
{ "name": "<term 2>", "type": "suggest"},
{ "name": "<term 3>", "type": "uar", "content": "Title of UAR",
"moreDetailsUrl": "URL of UAR"}
]
}
I am able to get results like the first two entries, but I would also like to get results like the last one, with content and a moreDetailsUrl. So, maybe a very stupid question, but I cannot find the answer anywhere: how and where do I add this UAR?
I actually want to understand whether it's feasible to get metadata into the content part of the JSON, so that if, for instance, an icon meta is available, I can have it included in the JSON and enrich my search results.
User Added Results are a OneBox that can be added to multiple frontends. See: https://developers.google.com/search-appliance/documentation/614/admin_searchexp/ce_improving_search#uar
With Suggest, the data comes from 'keymatches' that a user enters directly. What is different about them is that they are a direct link rather than a suggested query. If you use the out-of-the-box experience, you click a link to the URL instead of running another query.
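On the client side, the two kinds of entries can be told apart by their "type" field. The sketch below parses a response in the rich format quoted in the question and separates suggested queries from direct links. The function name and the output keys are made up for this example, and it says nothing about how the UAR entries are configured on the appliance.

```python
import json
from typing import Dict, List, Optional, Tuple


def split_suggestions(response_text: str) -> Tuple[List[str], List[Dict[str, Optional[str]]]]:
    """Separate suggested queries from user-added results (direct links)."""
    payload = json.loads(response_text)
    queries: List[str] = []
    links: List[Dict[str, Optional[str]]] = []
    for item in payload.get("results", []):
        if item.get("type") == "uar":
            # A UAR entry carries a title and a target URL instead of a query.
            links.append({"title": item.get("content"), "url": item.get("moreDetailsUrl")})
        else:
            queries.append(item["name"])
    return queries, links
```

A front end could then render the first list as clickable queries and the second as plain hyperlinks.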