WCF Paged Results & Data Export

I've walked into a project that is using a WCF service for the data tier. Currently, when data is needed for a grid, all rows are returned and the results are bound to a grid and the dataset is stuffed into a session variable for paging/sorting/rebinding. We've already hit a max message size problem, so I'm thinking it's time to convert from fetch and cache to fetch only the current page.
At face value this seems easy enough, but there's a small catch. The user is allowed to export the entire result set at any point. This means that for grid-viewing purposes fetching the current page is fine, but when they want to do an export, I still need to make a call for all the data.
This puts me back into the max message size issue. What is the recommended approach for this type of setup?
We are currently using the wsHttpBinding...
Thanks for any assistance.

I think the recommended approach for large files is to use WCF streaming. I'm not sure of the exact details for your scenario, but you could take a look at this as a starting point:
http://msdn.microsoft.com/en-us/library/ms789010.aspx
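One thing to note, since the question mentions wsHttpBinding: streamed transfer isn't supported on wsHttpBinding, only on bindings like basicHttpBinding, netTcpBinding, or a custom binding, so a streamed export would likely need its own endpoint. A minimal sketch of what such a contract could look like (all the names here are invented):

using System.IO;
using System.ServiceModel;

[ServiceContract]
public interface IExportService
{
    // A streamed operation is limited to a single Stream parameter or return value
    // (or a message contract whose only body member is a Stream).
    [OperationContract]
    Stream ExportAllRows();
}

// The endpoint's binding then needs streaming enabled, e.g. in code:
// var binding = new BasicHttpBinding { TransferMode = TransferMode.Streamed };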

I would probably do something like this in your case:
create a service with a "paged" GetData() method, where you specify the page index and the page size as additional parameters (a sketch follows this list). This should give you a nice, clean interface for "regular" use, and it should not hit the maxReceivedMessageSize limits
create a second service (or method) that would send all data - ideally, you could bundle that up into a ZIP file or something on the server, before sending it. If that ZIP file is still too large, you might want to check out WCF streaming for handling large files, as Andy already pointed out
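A hedged sketch of what that paged contract might look like; all the type and member names (IGridDataService, CustomerRow, etc.) are invented for illustration:

using System.Collections.Generic;
using System.Runtime.Serialization;
using System.ServiceModel;

[DataContract]
public class CustomerRow
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string Name { get; set; }
}

[DataContract]
public class CustomerPage
{
    [DataMember] public List<CustomerRow> Items { get; set; }
    [DataMember] public int TotalCount { get; set; }  // lets the grid render its pager
}

[ServiceContract]
public interface IGridDataService
{
    // Returns a single page, so each response stays well within the message size limit.
    [OperationContract]
    CustomerPage GetData(int pageIndex, int pageSize);
}

The grid then passes its current page index on every rebind instead of pulling the whole dataset out of session.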
The maxReceivedMessageSize limit is in place for a good reason: to avoid denial-of-service attacks where a WCF service would just get flooded with large messages and thus brought to its knees. If you can, always try to keep that in mind and don't just jack the limit up to 2 GB - it might come back to bite you :-)

Related

GET vs POST API calls and cache issues

I know that GET is used to retrieve data from the server without modifying anything, whereas POST is used to add data. I won't get into PUT/PATCH, and will assume that POST is always used to update and replace data.
The theory is nice, but in practice I have encountered many situations where my GET calls need to be replaced with POST calls. This is because the response often gets incorrectly cached. Where I work there are proxy servers for security, caching, load balancing, etc., and oftentimes the response for GET calls is cached directly to speed up the call, whereas POST calls never get fully cached.
So, for my question: suppose I have an API call /api/get_orders/month. Theoretically this should be a GET call; however, the number of orders might update at any second. If I call this API at one moment it may return, for example, 1000, and calling it just two seconds later should return 1001. However, because of the cache, the updated value isn't guaranteed: adding a version flag such as ?v=<date_as_int> should ensure that the updated value is returned, but there seem to be some caches in the proxy servers that ignore it.
Basically, I don't feel safe enough using GET unless I know for certain that the data will not be modified or if I know for a fact that the response is always the updated data.
So, would you recommend using POST or GET in the case of retrieving the daily/monthly number of orders? And if GET, with all the different and complex layers and server set-ups, how can one be certain that the data is always up to date?
If you're doing multiple GET requests, something is caching the data in between, and you have no idea what it is or how to change its behavior, then POST is a valid workaround.
In any normal situation you would take the time to figure out what sits in between your browser and your server, and if there's something that's behaving in a way that doesn't make sense, I would try to investigate and fix that.
So you work at a place where some of that infrastructure exists. Maybe talk to the people who maintain it? But if that's not an option and you just want the 'ignore every convention and make my request work' workaround, then you can use POST.
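If you do control the API, the more standards-friendly fix is to declare the response uncacheable so that compliant caches won't store it. A hedged ASP.NET Core sketch, with the controller and route invented to mirror the question's /api/get_orders/month:

using Microsoft.AspNetCore.Mvc;

[ApiController]
public class OrdersController : ControllerBase
{
    // Hypothetical endpoint; the real one would query the orders table.
    [HttpGet("api/get_orders/month")]
    [ResponseCache(NoStore = true, Location = ResponseCacheLocation.None)]
    public ActionResult<int> GetMonthlyOrderCount()
    {
        // NoStore emits "Cache-Control: no-store", telling compliant
        // browser and proxy caches never to keep a copy of this response.
        return Ok(CountOrdersForCurrentMonth());
    }

    private int CountOrdersForCurrentMonth() => 1000; // stand-in for the real query
}

A misbehaving proxy can still ignore the header, which is exactly the situation described above; in that case POST remains the pragmatic escape hatch.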

Asp .Net Core Controller Task timing out

I have an ASP .NET Core 3.1 website that imports data. Prior to importing, I want to run some validation on the import. So the user selects the file to import and clicks 'Validate', then either sees some validation error messages so they can fix the import file, or is allowed to import.
The problem I am running into is the length of time these validation and import processes take. If the file is small, everything works as expected. If the file is larger (over 1,000 records), the validation and/or import may take several minutes. On my local machine, or a server on my network, this works fine. On my actual public-facing website, I am getting:
503 first byte timeout
So, I need some strategies for getting around this. Turning up the timeout seems like a rabbit hole. It looks like BackgroundService/IHostedService is probably the best way to go? But I can't seem to find an example of how to do this in the way I would like:
Call "Validate" via AJAX
Turn on a loader
Perform validation
Turn off loader
Display either success or list of errors to user
UPDATE:
Validation -
Ajax call to controller
Controller calls Business Logic code
a. Check file extension
b. Check file size
c. Read in .csv with CsvHelper
d. Check that all required columns are present
e. Check that required columns contain valid data - length, no whitespace, valid zip code, valid phone, etc.
f. Check for internal duplicates
g. If append (as opposed to overwrite), check for duplicates in the database - this is the slow step (see the sketch below)
So, would the better solution be to speed up the validation process? Is BackgroundService overkill?
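On speeding up the slow step: if the current code queries the database once per CSV row, the per-row round-trips will dominate. A hedged sketch of batching the duplicate check into a single query, assuming EF Core and invented names (ImportRow, AppDbContext, Customers, Email):

using System.Collections.Generic;
using System.Linq;

public static class DuplicateChecker
{
    // Finds which incoming rows already exist, using one SQL query instead of one per row.
    public static List<ImportRow> FindDatabaseDuplicates(AppDbContext db, List<ImportRow> rows)
    {
        // For very large files, chunk this list to stay under SQL parameter limits.
        var candidateKeys = rows.Select(r => r.Email).ToList();

        var existingKeys = db.Customers
            .Where(c => candidateKeys.Contains(c.Email)) // translates to SQL IN (...)
            .Select(c => c.Email)
            .ToHashSet();

        return rows.Where(r => existingKeys.Contains(r.Email)).ToList();
    }
}

If the import itself still takes minutes, a BackgroundService fed by a queue, with the AJAX call polling a status endpoint, is not overkill; it is the standard way to dodge a proxy's first-byte timeout.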

How do multiple versions of a REST API share the same data model?

There is a ton of documentation on academic theory and best practices on how to manage versioning for RESTful Web Services, however I have not seen much discussion on how multiple REST APIs interact with data.
I'd like to see various architectural strategies or documentation on how to handle hosting multiple versions of your app that rely on the same data pool.
For instance, suppose you make a database level destructive change to a database table that causes you to have to increment your major API version to v2.
Now at any given time, users could be interacting with the v1 web service and the v2 web service at the same time and creating data that is visible and editable by both services. How should this be handled?
Most changes introduced to an API affect the content of the response; as long as the changes are incremental, this is not a very big problem (note: you should never expose the exact DB model directly to the clients).
When you make a destructive/significant change to the DB model and a new version of the API is introduced, there are two options:
Turn the previous version off, and answer all requests to it with a 301 and the new location.
If 1. is impossible, you need to maintain both the previous and current versions of the API. Since this can be time- and money-consuming, it should be done only for some time, and the previous version should eventually be turned off.
What about the DB model? When two versions of the API are active at the same time, I'd try to keep the DB model as consistent as possible, keeping in mind that running two versions at once is only temporary. But as I wrote earlier, the DB model should never be exposed directly to the clients - this may help you avoid a lot of problems.
I have given this a little thought...
One solution may be this:
Just because the v1 API contract should not change, it doesn't mean the underlying implementation cannot. You can modify the v1 implementation code to set a default value, omit the saving of a field, throw an exception, or do some kind of computational logic that keeps the v1 API compatible with the shared data source. Then implement a better, cleaner, more idealistic implementation in v2.
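A hedged sketch of that idea: both versions read and write the same table, and the v1 mapping layer supplies a default for a column that only v2 exposes (all names here are invented):

// Shared persistence model used by both API versions.
public class OrderRecord
{
    public int Id { get; set; }
    public decimal Total { get; set; }
    public string Currency { get; set; } // column added for v2
}

// v1 never exposed Currency, so its DTO omits the field when reading
// and pins a default when writing, keeping rows usable by v2.
public class OrderV1Dto
{
    public int Id { get; set; }
    public decimal Total { get; set; }

    public static OrderV1Dto From(OrderRecord r) =>
        new OrderV1Dto { Id = r.Id, Total = r.Total };

    public OrderRecord ToRecord() =>
        new OrderRecord { Id = Id, Total = Total, Currency = "USD" }; // v1 default
}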
When you are going to change anything in your API structure that can change the response, you must increase your API version.
For example, say you have this request and response:
request POST: a, b, c, d
res: {a, b, c+d}
and you are going to add 'e', fetched from the database, to your response.
If nothing in the current client versions depends on 'e', you can add it in your current API version.
But if your new changes alter the existing responses, for example:
res: {a+e, b, c+d}
you must increase the API version number to prevent crashing clients.
Changes to the request inputs work the same way.

RESTful way of getting a resource, but creating it if it doesn't exist yet

For a RESTful API that I'm creating, I need some functionality that gets a resource, but if it doesn't exist, creates it and then returns it. I don't think this should be the default behaviour of a GET request. I could enable this functionality via a certain parameter on the GET request, but that seems a little bit dirty.
The main point is that I want to do only one request for this, as these requests are going to be made from mobile devices that potentially have a slow internet connection, so I want to limit the number of requests as much as possible.
I'm not sure if this fits in the RESTful world, but if it doesn't, it will disappoint me, because it will mean I have to make a little hack on the REST idea.
Does anyone know of a RESTful way of doing this, or otherwise a beautiful way that doesn't conflict with the REST idea?
Does the client need to provide any information as part of the creation? If so, then you really need to separate out GET and POST, as otherwise you would have to send that information with each GET, and that will be very ugly.
If instead you are sending a GET without any additional information then there's no reason why the backend can't create the resource if it doesn't already exist prior to returning it. Depending on the amount of time it takes to create the resource you might want to think about going asynchronous and using 202 as per other answers, but that then means that your client has to handle (yet) another response code so it might be better off just waiting for the resource to be finalised and returned.
Very simple:
Request: HEAD, and examine the response code: either 404 or 200. If you need the body, use GET.
If not available, perform a PUT or POST; the server should respond with 201 Created and a Location header containing the URL of the newly created resource.
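Since the question asks for a single round-trip, another option, assuming the client can name the resource itself, is an idempotent PUT that creates on first use and returns the resource either way. A minimal ASP.NET Core sketch with invented names (ProfilesController, Profile):

using System.Collections.Concurrent;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("profiles")]
public class ProfilesController : ControllerBase
{
    // In-memory stand-in; a real app would hit a database.
    private static readonly ConcurrentDictionary<string, Profile> Store =
        new ConcurrentDictionary<string, Profile>();

    // One request: fetches the resource, creating it first if it doesn't exist.
    [HttpPut("{id}")]
    public IActionResult GetOrCreate(string id)
    {
        var created = false;
        var profile = Store.GetOrAdd(id, key =>
        {
            created = true;
            return new Profile { Id = key };
        });
        // 201 when it was just created, 200 when it already existed.
        return created ? (IActionResult)Created($"/profiles/{id}", profile) : Ok(profile);
    }
}

public class Profile
{
    public string Id { get; set; }
}

Because PUT is idempotent, a retry over a flaky mobile connection is harmless, which fits the slow-network constraint in the question.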

WCF Catastrophic Failure

I've got a real lemon on my hands. I hope someone who has had the same problem, or knows how to fix it, can point me in the right direction.
The Setup
I'm trying to create a WCF data service that uses an ADO.NET Entity Framework model to retrieve data from the DB. I've added the WCF service reference and all seems fine. I have two sets of data service calls. The first one retrieves a list of all "users"; this list does not include any dependent data (e.g. address, contact, etc.). The second call, made when a "user" is selected, asks the service to include a few more pieces of dependent information, such as address, contact details, messages, etc., given a user id. This also seems to work fine.
The Lemon
After a few user selection changes, i.e. calls for more dependent data from the data service, the application stops responding.
Crash error:
The request channel timed out while waiting for a reply after 00:00:59.9989999. Increase the timeout value passed to the call to Request or increase the SendTimeout value on the Binding. The time allotted to this operation may have been a portion of a longer timeout.
I restart the debugging process, but the application will not make any data service calls; after about a minute or so, VS 08 displays a message box with the error:
Unable to process request from service. 'http://localhost:61768/ConsoleService.svc'. Catastrophic failure.
I've Googled the hell out of this error and related issues but found nothing of use.
Possible Solutions
I've found some leads as to the source of the problem. In the client's app.config:
maxReceivedMessageSize > set to a higher value, e.g. 5242880.
receiveTimeout > set to a higher value, e.g. 00:30:00
I've tried these but all in vain. I suspect there is an underlying problem that cannot be fixed by simply changing some numbers. Any leads would be much appreciated.
I've solved it =P.
Cause
The WCF service works fine. It was the data service calls that were the culprit. Every time I made a call, I instantiated a new reference to the data service but never closed/disposed of it. So after a couple of calls, the data service reached its maximum number of connections and halted.
Solution
Make sure to close/dispose of any data service reference properly. Best practice would be to enclose it in a using statement.
using (var dataService = new ServiceNS.ServiceClient())
{
    // Use the service here.
}
// When the block exits, the client is disposed and its connection freed.
Glad to see you fixed your problem.
However, you need to be careful about using the using statement. Have a look at this article:
http://msdn.microsoft.com/en-us/library/aa355056.aspx
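In short, that article's warning is that Dispose() on a WCF client calls Close(), which can itself throw if the channel is faulted, masking the original exception. A hedged sketch of the commonly recommended Close/Abort pattern instead:

var client = new ServiceNS.ServiceClient();
try
{
    // Use the client here.

    client.Close(); // graceful close; throws if the channel is faulted
}
catch (System.ServiceModel.CommunicationException)
{
    client.Abort(); // tear the channel down without throwing again
}
catch (System.TimeoutException)
{
    client.Abort();
}
catch
{
    client.Abort(); // unexpected error: abort, then let it propagate
    throw;
}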