In the new ASP.NET Web API, how do I design for "Batch" requests? - asp.net-mvc-4

I'm creating a web API based on the new ASP.NET Web API. I'm trying to understand the best way to handle people submitting multiple data-sets at the same time. If they have 100,000 requests it would be nice to let them submit 1,000 at a time.
Let's say I have a create new Contact method in my Contacts Controller:
public string Put(Contact _contact)
{
    // add new _contact to repository
    repository.Add(_contact);
    // return success
    return "success"; // placeholder return value
}
What's the proper way to allow users to "Batch" submit new contacts? I'm thinking:
public string BatchPut(IEnumerable<Contact> _contacts)
{
    foreach (var contact in _contacts)
    {
        repository.Add(contact);
    }
    return "success"; // placeholder return value
}
Is this a good practice? Will this parse a GET request with a JSON array of Contacts (assuming they are correctly formatted)?
Lastly, any tips on how best to respond to Batch requests? What if 4 out of 300 fail?
Thanks a million!

When you PUT a collection, you are either inserting the whole collection or replacing an existing collection as if it were a single resource. It is very similar to a GET, DELETE or POST of a collection. It is an atomic operation. Using it as a substitute for individual calls to PUT a contact may not be very RESTful (but that is really open for debate).
You may want to look at HTTP pipelining and send multiple PutContact requests over the same socket. With each request you can return a standard HTTP status for that single request.
I implemented batch updates in the past with SOAP and we encountered a number of unforeseen issues when the system was under load. I suspect you will run into the same issues if you don't pay attention.
For example, the database could time out in the middle of the batch update, and then all hell broke loose in terms of failures, reliability, transactions etc. And the poor client had to figure out what was actually updated and try again.
When there were too many records to update, the HTTP request would time out because we took too long. That opened another can of worms.
Another concern was how much data we would accept during the update. Was 10MB of contacts enough? Perhaps 1MB? Larger buffers have numerous implications in terms of memory usage and security.
Hence my suggestion to look at HTTP pipelining.
Update
My suggestion would be to handle batch creation of contacts as an async process. Just assume that a "job" is the same as a "batch create" process. So the service may look as follows:
public class JobService
{
    // Post
    public void Create(CreateJobRequest job)
    {
        // 1. Create job in the database with status "pending"
        // 2. Save job details to disk (or S3)
        // 3. Submit the job to MSMQ (or SQS)
        // 4. For 20 seconds, poll the database to see if the job completed
        // 5. If the job completed, return 201 with a URI to "Get" method below
        // 6. If not, return 202 (aka the request was accepted for processing, but has not completed)
    }

    // Get
    public Job Get(string id)
    {
        // 1. Fetch the job from the database
        // 2. Return the job if it exists or 404
    }
}
The background process that consumes stuff from the queue can update the database or alternatively perform a PUT to the service to update the status of Job to running and completed.
You'll need another service to navigate through the data that was just processed, address errors and so forth.
Your background process may need to be tolerant of validation errors. If it is not, or if your service does the validation itself (assuming it makes no database calls or anything else for which response times cannot be guaranteed), you can return a structure like CreateJobResponse that contains enough information for your client to fix the issue and resubmit the request. If you have to do some validation that is time consuming, do it in the background process, mark the job as failed and update the job with the information that will allow a client to fix the errors and resubmit the request. This assumes that the client can do something with the fact that the job failed.
If the Create method breaks the job request into many smaller "jobs", you'll have to deal with the fact that the operation may not be atomic, which poses numerous challenges in monitoring whether the jobs completed successfully.
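The job flow described above can be sketched roughly as follows. This is a minimal illustration in Python rather than C#, under loud assumptions: an in-memory dict and `queue.Queue` stand in for the database and MSMQ/SQS, the 20-second polling step is left out (so creation always answers 202), and the names `create_job`, `get_job` and `run_worker_once` as well as the "every contact needs a name" rule are hypothetical.

```python
import uuid
from queue import Queue

# In-memory stand-ins (assumptions) for the database and for MSMQ/SQS.
jobs = {}
work_queue = Queue()


def create_job(payload: list) -> tuple:
    """POST: record the job as pending, enqueue it, and answer 202 Accepted."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"id": job_id, "status": "pending", "errors": []}
    work_queue.put((job_id, payload))
    return 202, {"location": f"/jobs/{job_id}"}


def get_job(job_id: str) -> tuple:
    """GET: return the job if it exists, otherwise 404."""
    job = jobs.get(job_id)
    return (200, job) if job is not None else (404, None)


def run_worker_once() -> None:
    """Background process: take one job off the queue and update its status."""
    job_id, payload = work_queue.get()
    job = jobs[job_id]
    job["status"] = "running"
    # Hypothetical validation rule: every contact needs a non-empty name.
    job["errors"] = [i for i, c in enumerate(payload) if not c.get("name")]
    job["status"] = "failed" if job["errors"] else "completed"
```

A client would POST the batch, keep the returned location, and poll it until the status is "completed" or "failed"; in the failed case the `errors` list tells it which items to fix and resubmit.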

A PUT operation is supposed to replace a resource. Normally you do this against a single resource but when doing it against a collection that would mean you replace the original collection with the set of data passed. Not sure if you are meaning to do that but I am assuming you are just updating a subset of the collection in which case a PATCH method would be more appropriate.
Lastly, any tips on how best to respond to Batch requests? What if 4 out of 300 fail?
That is really up to you. There is only a single response so you can send a 200 OK or a 400 Bad Request and put the details in the body.
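One possible way to "put the details in the body" is to report an outcome per item, so the client can see exactly which 4 of 300 failed. The sketch below is in Python purely for illustration; `BatchResult`, `validate_contact`, `process_batch` and the choice of 200 for partial success versus 400 when nothing was stored are assumptions, not something ASP.NET Web API prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class BatchResult:
    """Per-item outcome of a batch submission (hypothetical shape)."""
    succeeded: list = field(default_factory=list)  # indexes that were stored
    failed: dict = field(default_factory=dict)     # index -> error message


def validate_contact(contact: dict) -> None:
    """Hypothetical validation rule: a contact must have a non-empty name."""
    if not contact.get("name"):
        raise ValueError("name is required")


def process_batch(contacts: list, repository: list) -> tuple:
    """Store every valid contact and report each failure by its index.

    Returns (status_code, BatchResult): 200 when at least part of the batch
    was stored (or the batch was empty), 400 when nothing could be stored.
    """
    result = BatchResult()
    for index, contact in enumerate(contacts):
        try:
            validate_contact(contact)
            repository.append(contact)
            result.succeeded.append(index)
        except ValueError as error:
            result.failed[index] = str(error)
    status = 400 if contacts and not result.succeeded else 200
    return status, result
```

Reporting failures by index lets the client resubmit only the rejected items instead of the whole batch.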

Related

How does calling a 3rd party API work in a loop?

I'm building a project that sends data to Zapier's webhook. The data can range from hundreds to thousands, and even hundreds of thousands, of records, and will be processed one by one in a loop.
Basically, what I did is used Laravel's Chunking Results, and inside it is a pool of Http requests. So it looks like this...
Note that I chunked the results by 25 to avoid resource exhaustion when sending the data to Zapier.
If you have a better suggestion besides chunking, that would be great.
Model::where(...someCondition)->chunk(25, function ($models) {
    Http::pool(function (Pool $pool) use ($models) {
        foreach ($models as $key => $model) {
            // Send the model as a JSON body; the content type must be a full MIME type.
            $pool->as($key)->withBody(json_encode($model), 'application/json')->post('https://the-zapier/update/hook/endpoint/...');
        }
        return $pool;
    });
});
Now, Zapier receives the body request, and processes it, and of course it takes time for it to finish.
Here's what I understand:
Inside the loop, the http request was sent, but in Zapier, it hasn't finished the specific task yet.
Now, I'm curious: does the loop proceed to the next item AFTER it sends the data to the endpoint, even though Zapier hasn't finished the task for this particular request?
OR
Does it wait for Zapier to finish the particular task before it proceeds to the next iteration?

How to handle custom actions in REST?

My API has a job resource which contains a state attribute. The state of a job can be "scheduled", "processing" or "done". The job's state is determined server-side. When a job's state is "processing", an ETL process is run server-side.
I want the client to be able to trigger an action like "start", which would set the job's state to "processing", or a "stop" action, which would reset the job's state to "scheduled".
My question here is how should I handle these "custom" actions in REST?
I see two possible approaches:
1. I could create two new endpoints:
POST /job/:id/start
POST /job/:id/stop
2. I could create one endpoint which takes an action query parameter:
POST /job/:id?action=start
POST /job/:id?action=stop
I'm not sure which is considered more RESTful, if either?
Additionally, I'm not sure what the response code should be. I'm guessing one of the following: 200 (OK), 202 (Accepted) or 204 (No Content), but I'm not sure which would be best. For example, if the API returned a 200 response, should the response body contain the job resource with the updated state? Or should the API just return 204 No Content?
REST does not care about business logic, only about resources - so the most RESTful approach would be a PUT to /job/:id where the document sent will contain the new status.
Although that is technically the "correct" solution (don't tell Roy Fielding), I prefer explicit "actions" like /job/:id/start because they allow my resource to return links to those actions, telling the client whether the corresponding action is possible (a GET on a "stopped" job would contain a link to start that job, for example).
As for HTTP status codes, I usually return a 204 when I am not interested in the result, although in your case a 200 with the updated resource would be correct. Only when you e.g. "start" an already started job would a 304 Not Modified be correct from the client's point of view, as the state wouldn't have changed.
I would choose 202 Accepted if starting a job has no direct effect on the resource because the job will be e.g. queued and only updated later, so that the client knows that an async action has been started.
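To make the explicit-action variant concrete, here is a small sketch of how the transitions, the links and the status codes suggested above could fit together. It is written in Python for illustration only; `TRANSITIONS`, `available_links` and `apply_action` are hypothetical names, and the 404 for an unknown action and 409 for an impossible transition are my own assumptions rather than part of the answer.

```python
# Which state an action starts from, and which state it leads to.
TRANSITIONS = {
    "start": ("scheduled", "processing"),
    "stop": ("processing", "scheduled"),
}


def available_links(job: dict) -> dict:
    """Links a GET on the job could include for the actions currently possible."""
    return {
        action: f"/job/{job['id']}/{action}"
        for action, (source, _target) in TRANSITIONS.items()
        if job["state"] == source
    }


def apply_action(job: dict, action: str) -> tuple:
    """Handle POST /job/:id/<action>; returns (status_code, body)."""
    if action not in TRANSITIONS:
        return 404, None  # unknown action (assumption)
    source, target = TRANSITIONS[action]
    if job["state"] == target:
        return 304, None  # already in the requested state, as suggested above
    if job["state"] != source:
        return 409, None  # e.g. starting a "done" job (assumption)
    job["state"] = target
    return 200, {**job, "links": available_links(job)}
```

If starting a job only queues it, the last line would instead return 202 and leave the state change to the background process.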

Capture start of long running POST VB.net MVC4

I have a subroutine in my Controller
<HttpPost>
Sub Index(Id, varLotsOfData)
'Point B.
'By the time it gets here - all the data has been accepted by server.
What I would like to do is capture the Id of the inbound POST and mark, for example, a database record to say "Id xx is receiving data"
The POST receive can take a long time as there is lots of data.
When execution gets to point B I can mark the record "All data received".
Where can I place this type of "pre-POST completed" code?
I should add - we are receiving the POST data from clients that we do not control - that is, it is most likely a client's server sending the data - not a webbrowser client that we have served up from our webserver.
UPDATE: This is looking more complex than I had imagined.
I'm thinking that a possible solution would be to inspect the worker processes in IIS programmatically. Via the IIS Manager you can do this for example - How to use IIS Manager to get Worker Processes (w3wp.exe) details information ?
From your description, you want to display on the client page that the method is executing and you can show also a loading gif, and when the execution completed, you will show a message to the user that the execution is completed.
The answer is simply: use SignalR
here you can find some references
Getting started with signalR 1.x and Mvc4
Creating your first SignalR hub MVC project
Hope this will help you
If I understand your goal correctly, it sounds like HttpRequest.GetBufferlessInputStream might be worth a look. It allows you to begin acting on incoming post data immediately and in "pieces" rather than waiting until the entire post has been received.
An excerpt from Microsoft's documentation:
...provides an alternative to using the InputStream property, which waits until the whole request has been received. In contrast, the GetBufferlessInputStream method returns the Stream object immediately. You can use the method to begin processing the entity body before the complete contents of the body have been received and asynchronously read the request entity in chunks. This method can be useful if the request is uploading a large file and you want to begin accessing the file contents before the upload is finished.
So you could grab the beginning of the post, and provided your client-facing page sends the ID towards the beginning of its transmission, you may be able to pull that out. Of course, this would be reading raw byte data which would need to be decoded so you could grab the inbound post's ID. There's also a buffered one that will allow the stream to be read in pieces but will also build a complete request object for processing once it has been completely received.
Create a custom action filter.
Action Filters for executing filtering logic either before or after an action method is called. Action Filters are custom attributes that provide declarative means to add pre-action and post-action behavior to the controller's action methods.
Specifically, you'll want to look at:
OnActionExecuted – This method is called after a controller action is executed.
Here are a couple of links:
http://www.infragistics.com/community/blogs/dhananjay_kumar/archive/2016/03/04/how-to-create-a-custom-action-filter-in-asp-net-mvc.aspx
http://www.asp.net/mvc/overview/older-versions-1/controllers-and-routing/understanding-action-filters-vb
Here is a lab, but I think it's C#
http://www.asp.net/mvc/overview/older-versions/hands-on-labs/aspnet-mvc-4-custom-action-filters

Lync 2013 UCMA WCF Web Service

What I want to do is create a WCF service just to get the availability of a user. I have gone through the following quick-start example:
Name: SubscribePresence
http://msdn.microsoft.com/en-us/library/office/dn454835(v=office.15).aspx
I have managed to do this, but I feel that it's not the most efficient way to just get a user's availability.
At the moment I create an endpoint, subscribe to a user's presence, wait for the response to come back, and get the user's availability from that. (I'm simplifying this down.)
What I would ideally like, though, is to quickly get a user's availability without subscribing to the user's presence, and to close the connection as soon as I have retrieved the availability.
I was wondering if anyone knows of an example that I can have a look at, or that they have implemented themselves.
Any advice would be appreciated.
You can also do a one-time presence query. From MSDN:
If a one-time presence query to a remote presentity is desired, creating a view and tearing it down is a suboptimal solution for an application. In addition, the application needs to wait and track whether all presence information has been received.
An alternative is to use the BeginPresenceQuery(IEnumerable<String>, [], EventHandler<RemotePresentitiesNotificationEventArgs>, AsyncCallback, Object) and EndPresenceQuery(IAsyncResult) methods on the endpoint’s PresenceServices property.
See http://msdn.microsoft.com/en-us/library/office/hh383136%28v=office.14%29.aspx
Example
You can call the presence query like this. The null argument in the 3rd position is the event handler which fires when presence is received; it's not required since we process the results of EndPresenceQuery instead. You could also pass an event handler and not care about the results of EndPresenceQuery; that's up to you.
endpoint.PresenceServices.BeginPresenceQuery(
    new[] { "sip:user@example.com" }, // Collection of SIP addresses to query
    new[] { "state" }, // Collection of presence categories to query
    null, // The event handler to call when presence is received
    (ar) => {
        var result = endpoint.PresenceServices.EndPresenceQuery(ar);
        // Process the received containers in 'result' here.
    },
    null); // The state object
However, when you run a WCF service for presence which will be queried multiple times, I would say it might be better to subscribe to presence than to do single queries every time. I built a similar system once with the following logic:
Get an incoming presence request on WCF.
If this SIP URI's presence is known to the WCF service (is subscribed), immediately return the cached presence.
If it is not known, subscribe to the presence.
When presence is received, return the result and add the presence to the cache.
Any time a subscribed user updates their presence, an event is fired to update the cache.
If no presence queries are received for a single user for a certain period of time, unsubscribe from the presence and remove it from the cache.
The main advantage here is that for multiple subsequent queries for the same user's presence, you do not query the Lync server each time. Your service responses will be a lot faster, and you get presence pushed rather than having to poll for it each time.
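The caching logic above, stripped of anything Lync-specific, might look like the sketch below. It is plain Python and not UCMA code: `PresenceCache` is a hypothetical class, and the `subscribe` / `unsubscribe` callables are assumed stand-ins for whatever the real endpoint exposes (here `subscribe` is assumed to return the current presence synchronously, which the real asynchronous API does not do).

```python
import time


class PresenceCache:
    """Caches presence per SIP URI; subscribes on first query, evicts when idle."""

    def __init__(self, subscribe, unsubscribe, idle_seconds=300.0, clock=time.monotonic):
        self._subscribe = subscribe      # callable(uri) -> current presence (assumed)
        self._unsubscribe = unsubscribe  # callable(uri) (assumed)
        self._idle_seconds = idle_seconds
        self._clock = clock
        self._presence = {}    # uri -> last known presence
        self._last_query = {}  # uri -> time of the most recent query

    def get(self, uri):
        """Return cached presence, subscribing first if the URI is unknown."""
        if uri not in self._presence:
            self._presence[uri] = self._subscribe(uri)
        self._last_query[uri] = self._clock()
        return self._presence[uri]

    def on_presence_changed(self, uri, presence):
        """Event handler: a subscribed user updated their presence."""
        if uri in self._presence:
            self._presence[uri] = presence

    def evict_idle(self):
        """Unsubscribe from users nobody has asked about recently."""
        now = self._clock()
        idle = [u for u, t in self._last_query.items() if now - t >= self._idle_seconds]
        for uri in idle:
            self._unsubscribe(uri)
            del self._presence[uri]
            del self._last_query[uri]
```

The WCF operation would call `get`, the subscription's notification event would call `on_presence_changed`, and a timer would call `evict_idle` periodically.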

Best practice for initial return in a REST-like image upload api endpoint?

When sending a file, e.g. an image, over HTTP to an API, how should the server respond?
Examples:
respond as soon as file is written to disk
respond only when file is written, processed, checksummed, thumbnailed, watermarked etc.
respond as fast as possible with a link to the resource (even if it's a 404 for a few moments afterwards)
add a 'task' endpoint and respond instantly with a task ID to track the progress before data transfer & processing (eventually including path to resource)
Edit: Added one idea from an answer to a similar question: rest api design and workflow to upload images.
The client doesn't know about disks, processing, checksumming, thumbnailing, etc.
The options then are pretty simple. You either want to return the HTTP request as quickly as possible, or you want the client to wait until you know the operation was successful.
If you want the client to wait, return 201 Created. If you want to return as quickly as possible, return 202 Accepted.
Both are acceptable designs. You should let your own requirements dictate which is better for your case. I would say it's a good idea to default to waiting until the operation is known to have succeeded, and to use 202 Accepted only if responding early is a specific requirement.
You could also let the client decide with a Prefer header:
Prefer: respond-async, wait=100
See https://www.rfc-editor.org/rfc/rfc7240#section-4.3
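As a rough illustration of honouring that header on the server, the sketch below parses `Prefer` and picks between 201 and 202. It is Python, deliberately simplified (preference parameters after `;` are dropped and quoted commas are not handled), and `parse_prefer` and `choose_status` are hypothetical helper names.

```python
def parse_prefer(header: str) -> dict:
    """Parse a Prefer header into {name: value}; valueless tokens map to True.

    Simplified: ignores ";"-separated parameters and commas inside quotes.
    When a preference is repeated, only the first occurrence is kept.
    """
    preferences = {}
    for part in header.split(","):
        token = part.split(";", 1)[0].strip()
        if not token:
            continue
        name, _, value = token.partition("=")
        preferences.setdefault(name.strip().lower(), value.strip().strip('"') or True)
    return preferences


def choose_status(header: str, finished_in_time: bool) -> int:
    """Return 201 if processing finished, else 202 when the client allows async.

    Without respond-async the server is assumed to keep the request open
    until processing is done, so the eventual answer is still 201.
    """
    preferences = parse_prefer(header)
    if finished_in_time or "respond-async" not in preferences:
        return 201
    return 202
```

With `Prefer: respond-async, wait=100`, the server would try to finish within the `wait` value (in seconds) and fall back to 202 Accepted only if it cannot.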