REST API pagination from multiple sources

I am trying to write a pass-through REST API that enumerates data from multiple data sources.
I need to provide seamless pagination across these data sources.
What are my options?
One option I have considered is to pass the client a continuationToken that records which data sources have already been enumerated, so the current call can resume from the point where the previous call stopped. However, this token can grow large if there are many data sources.
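One way to keep the token small regardless of the number of sources is to enumerate the sources in a fixed order and encode only the index of the source currently being drained plus that source's own cursor. A minimal sketch, assuming hypothetical source names and a fetch_from_source helper that stands in for the real upstream calls:

import base64
import json

# Hypothetical ordered list of upstream sources; the names are illustrative.
SOURCES = ["inventory_db", "legacy_api", "partner_feed"]

def fetch_from_source(name, cursor, limit):
    """Placeholder for the real upstream call; returns (items, next_cursor),
    where an empty next_cursor means the source is exhausted."""
    raise NotImplementedError

def encode_token(source_index, cursor):
    # Only the active source's index and its cursor go into the token,
    # so the token stays small no matter how many sources there are.
    payload = json.dumps({"s": source_index, "c": cursor})
    return base64.urlsafe_b64encode(payload.encode()).decode()

def decode_token(token):
    payload = json.loads(base64.urlsafe_b64decode(token.encode()))
    return payload["s"], payload["c"]

def get_page(token, page_size):
    """Enumerate sources in a fixed order, resuming where the token points."""
    index, cursor = decode_token(token) if token else (0, "")
    items = []
    while index < len(SOURCES) and len(items) < page_size:
        batch, cursor = fetch_from_source(SOURCES[index], cursor,
                                          page_size - len(items))
        items.extend(batch)
        if not cursor:                 # this source is done; move on
            index, cursor = index + 1, ""
    next_token = encode_token(index, cursor) if index < len(SOURCES) else None
    return items, next_token

Because the sources are walked in a deterministic order, the token never needs per-source state for sources that are already fully enumerated or not yet started.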

Related

Number of API calls used while extracting data from Marketo

Is there any way to find out how many web API calls are counted against our quota when making REST API calls (or SOAP API calls) to read leads from a list? Please note that this is purely read-only: we are getting data from a static list of leads, which were added using a smart campaign.
We are bringing in 40 attributes from the Marketo lead record, totaling 1,600 characters. Depending on need, we might stage anywhere from 200K to 1 million records into a static list. We are successfully extracting all of that data, but we would like to find out how many API calls are being utilized.
Each authenticated request to an endpoint counts as a call. You can also use the usage API to see your daily usage: http://developers.marketo.com/documentation/rest/get-daily-usage/
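A rough sketch of what that might look like, assuming Marketo's usual OAuth client-credentials flow and a /rest/v1/stats/usage.json path (check the exact endpoint against the documentation linked above); the Munchkin host and credentials are placeholders:

import requests

BASE = "https://<munchkin-id>.mktorest.com"  # placeholder instance host

def get_access_token(client_id, client_secret):
    # Marketo's standard client-credentials token exchange.
    resp = requests.get(f"{BASE}/identity/oauth/token", params={
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    })
    resp.raise_for_status()
    return resp.json()["access_token"]

def get_daily_usage(token):
    # "Get Daily Usage" endpoint; path assumed from the docs linked above.
    resp = requests.get(f"{BASE}/rest/v1/stats/usage.json",
                        params={"access_token": token})
    resp.raise_for_status()
    return resp.json().get("result", [])

token = get_access_token("<client-id>", "<client-secret>")
for day in get_daily_usage(token):
    print(day)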

How best to notify clients of changes in data via API

I have an internal API for my company that contains a large amount of factual data (80MM records as of right now). I have four clients that connect to me on a regular basis. The main API call adds a new item to the database, verifies its authenticity, and then returns structured, analyzed data based on the item submitted.
Over time, as we identify more data to be associated with an item, I need to be able to let my clients know that records have changed.
Right now I have a /recent endpoint, which returns all of the records that have changed since $timestamp. This is fine for small data sets, but given the large number of transactions, one could easily wind up with a /recent dataset of over a million items, especially if there's a large data import.
Another idea I had was to use webhooks to push data to the clients, but then the problem becomes pushing too much data. My clients don't necessarily need updates for every item that changed; they may only need the ones they've already submitted.
The question is less about code and more about design patterns or code strategies:
What are some optimal strategies for notifying my clients of updated records without flooding my clients with unnecessary requests or providing millions of records on a poll?
I've used 3rd-party APIs (such as Amazon's) that paginate large requests. If the data set exceeds the page limit, the client needs to make another request for the next page. This would be in combination with the /recent endpoint.
The actual implementation would be something like
{
    "requestId": "foobar",
    "page": 0,
    "pages": 10,
    "data": {
        ...
    }
}
The client makes the request and gets the first page of data, then sends the requestId and the page number to an endpoint to retrieve subsequent pages. Somehow you'd want to persist a reference to which data corresponds to a requestId.
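A rough client-side sketch of that flow; the host, the since parameter, and the assumption that data is a list of changed records are all illustrative rather than taken from a real API:

import requests

BASE = "https://api.example.com"  # illustrative host

def fetch_all_recent(since):
    """Walk every page of /recent using the requestId/page scheme above."""
    items, page, request_id = [], 0, None
    while True:
        params = {"since": since, "page": page}
        if request_id is not None:
            # The server uses requestId to keep paging against one snapshot.
            params["requestId"] = request_id
        resp = requests.get(f"{BASE}/recent", params=params)
        resp.raise_for_status()
        body = resp.json()
        items.extend(body["data"])     # assuming data is a list of records
        request_id = body["requestId"]
        page += 1
        if page >= body["pages"]:
            return items

Persisting the set of matching record IDs under the requestId on the server side keeps the pages consistent even while new changes continue to arrive.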

How do I store 3rd-party API data after user interaction?

The project I'm currently on consumes a large volume of 3rd-party information exposed via APIs. These datasets are constantly changing and are on the order of millions of entries each.
Users mark their favorites and recall that data when they need it. An example may be that a user wants to "bookmark" an inventory level to their "analyze later" list.
My current thinking is that during actions like searching users are presented with "live" data from the 3rd parties. If they flag something they're interested in I copy that data to a database I control. Subsequent views of that info are served from my database, not the 3rd party, since the 3rd party entry may change (or cease to exist entirely).
Is this good API practice? What object keys are sent to the client-facing application on search: the 3rd-party keys? Or do I preprocess the results of a search and determine which items I have locally, returning local keys in those instances? Or do I completely abstract the 3rd-party sources and generate unique local keys for every returned item, which are then used if someone saves one (that seems really heavy, though)? Or do I put that processing off and only look up whether something exists locally after someone bookmarks it?
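For the "copy on bookmark" approach, one minimal sketch, with an entirely made-up schema, is to keep a mapping from (source, external key) to a locally generated key and to mint local keys only at bookmark time:

import json
import sqlite3

# Made-up local schema: bookmarked items plus their 3rd-party identity.
db = sqlite3.connect("bookmarks.db")
db.execute("""CREATE TABLE IF NOT EXISTS items (
    local_id     INTEGER PRIMARY KEY,
    source       TEXT NOT NULL,
    external_key TEXT NOT NULL,
    snapshot     TEXT NOT NULL,
    UNIQUE (source, external_key))""")

def bookmark(source, external_key, payload):
    """Copy the live 3rd-party record at bookmark time; return a stable local key."""
    db.execute(
        "INSERT OR IGNORE INTO items (source, external_key, snapshot) VALUES (?, ?, ?)",
        (source, external_key, json.dumps(payload)))
    db.commit()
    (local_id,) = db.execute(
        "SELECT local_id FROM items WHERE source = ? AND external_key = ?",
        (source, external_key)).fetchone()
    return local_id

def resolve(source, external_key):
    """Search results keep 3rd-party keys; swap in the local copy only if one exists."""
    row = db.execute(
        "SELECT local_id, snapshot FROM items WHERE source = ? AND external_key = ?",
        (source, external_key)).fetchone()
    return (row[0], json.loads(row[1])) if row else None

This defers local-key generation to the moment of interest, so nothing heavy happens on search, and the uniqueness constraint makes repeated bookmarks of the same item idempotent.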

Possible to fetch full hierarchical requirements in a single portfolio item web service call?

I'm trying to aggregate some information about the kanban states of my user stories. If I query a PifTeam item, I get a summarized collection of UserStories associated with it.
Example query:
https://rally1.rallydev.com/slm/webservice/1.40/portfolioitem/pifteam/99999999999.js
However, I then have to loop over the UserStories collection, querying each one individually to get the information I need. This potentially results in a lot of web service calls.
Is there a way to return the full hierarchical requirement information in the original pifteam query, so that a single web service call returns all sub-objects? I read the web service API docs and tried playing with the fetch parameter, but had no success.
This functionality will be disabled in WSAPI 2.0 but will continue to be available in the 1.x versions. That said, you should be able to use fetch to pull the story fields you need, like this:
/pifteam/9999.js?fetch=UserStories,FormattedID,Name,PlanEstimate,KanbanState
Fetch will hydrate the specified fields on sub-objects even if the root object type doesn't have those fields. So by fetching UserStories, the returned collection will be populated with stories, each having the FormattedID, Name, PlanEstimate and KanbanState fields included.
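As a rough illustration of consuming that response, assuming an already-authenticated session and that WSAPI 1.x wraps the result in a key named after the object type (adjust to what your response actually contains):

import requests

# ObjectID is illustrative; authentication (API key or session cookie) is assumed.
URL = ("https://rally1.rallydev.com/slm/webservice/1.40/"
       "portfolioitem/pifteam/9999.js")
FIELDS = "UserStories,FormattedID,Name,PlanEstimate,KanbanState"

resp = requests.get(URL, params={"fetch": FIELDS})
resp.raise_for_status()
pif_team = resp.json()["PortfolioItem"]  # wrapper key name assumed

for story in pif_team["UserStories"]:
    # Each story arrives hydrated with the fetched fields: no per-story call.
    print(story["FormattedID"], story["Name"], story["KanbanState"])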
There is no way to do it from Rally's standard Web Services API (WSAPI) but you can from the new Lookback API (LBAPI). The query would look something like this:
https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/<ObjectID_for_Workspace>/artifact/snapshot/query.js?find={__At:"current",_TypeHierarchy:"HierarchicalRequirement",Children:null,_ItemHierarchy:<ObjectID_for_PortfolioItem>}&fields=["Name"]
Fill in the ObjectIDs for your Workspace and PortfolioItem. The _ItemHierarchy field crosses work item type boundaries and goes all the way from PortfolioItems down through the Story hierarchy to Defects and even Tasks, so I added _TypeHierarchy:"HierarchicalRequirement" to limit it to Stories. I have specified Children:null, which means you'll only get back leaf Stories. The __At:"current" clause gets the current tree and values; remember, it's the "Lookback" API, so you can also retrieve the state of objects at any moment in history.
Note that the LBAPI lags the current values in the system by anywhere from seconds to minutes; typically it's about 30 seconds behind. You can see how far behind it is by checking the ETLDate field in the response.
Details about the LBAPI can be found here. Note that the LBAPI is available in preview now for almost all Rally customers, though there are still a number of subscriptions where it is not yet turned on. The best way to tell whether it's working for yours is to try the query.
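A sketch of issuing that query programmatically; the ObjectIDs are placeholders, credentials are assumed to be handled elsewhere, and KanbanState is added to fields since that's what the question is after:

import json
import requests

WORKSPACE_OID = 12345   # placeholder ObjectIDs
PORTFOLIO_OID = 67890

url = (f"https://rally1.rallydev.com/analytics/v2.0/service/rally/"
       f"workspace/{WORKSPACE_OID}/artifact/snapshot/query.js")
find = {
    "__At": "current",                            # current tree and values
    "_TypeHierarchy": "HierarchicalRequirement",  # limit to Stories
    "Children": None,                             # leaf Stories only
    "_ItemHierarchy": PORTFOLIO_OID,              # everything under this item
}
resp = requests.get(url, params={
    "find": json.dumps(find),
    "fields": json.dumps(["Name", "KanbanState"]),
})
resp.raise_for_status()
for snapshot in resp.json().get("Results", []):
    print(snapshot)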

Lookback API: Is there a way to include more than one artifact type in a single query?

In traditional Rally Web Services REST, if I wanted to get all Defects and Stories modified since a certain date, I would need to issue two separate REST GET requests, one against each of these endpoints:
https://rally1.rallydev.com/slm/webservice/1.33/hierarchicalrequirement
and
https://rally1.rallydev.com/slm/webservice/1.33/defect/
Is there a way to leverage the Lookback API to combine these into one REST request?
Sure, just add _Type:{$in:["Defect","HierarchicalRequirement"]} to the query. All of the work item types are stored in the same collection. You can also get back descendant Tasks and TestCases.
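Putting that together with the snapshot query pattern from the previous answer, a hedged sketch; the _ValidFrom filter as a stand-in for "modified since" is my assumption, as are the placeholder IDs:

import json
import requests

WORKSPACE_OID = 12345   # placeholder ObjectID
url = (f"https://rally1.rallydev.com/analytics/v2.0/service/rally/"
       f"workspace/{WORKSPACE_OID}/artifact/snapshot/query.js")
find = {
    "_Type": {"$in": ["Defect", "HierarchicalRequirement"]},
    # Snapshots are keyed by validity window; filtering on _ValidFrom is one
    # way (assumed here) to express "modified since" a given date.
    "_ValidFrom": {"$gte": "2013-01-01T00:00:00Z"},
}
resp = requests.get(url, params={"find": json.dumps(find),
                                 "fields": json.dumps(["Name", "_Type"])})
resp.raise_for_status()
print(len(resp.json().get("Results", [])), "matching snapshots")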