Server API defensive design?

Consider a REST service API, for example http://service_host/stores/count=30, which returns 30 stores from the database.
If someone passes count=99999999, the service will spend quite a while returning all the stores. Should I put a limit on the count parameter on the service side, or should I leave it to the client to enforce whatever limit they need?
Also, is it better to implement validation of this count parameter (valid type, positive value, etc.) on the service side? I want to make the service robust and safe, but I hesitate to put too much checking into it.
Is there any design convention to follow here?

As a general rule, don't rely on clients to behave nicely. Always protect yourself on the server side: validate types, enforce limits on parameters, and check for invalid input. Otherwise clients can bring your whole system down unintentionally (not to mention malicious attacks).
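A minimal sketch of that kind of server-side validation in Python (the MAX_COUNT value of 100, the default of 30, and the function name are all illustrative choices, not from any particular framework):

```python
# Sketch of server-side validation for a "count" query parameter.
# MAX_COUNT and the error messages are illustrative.
MAX_COUNT = 100

def parse_count(raw, default=30):
    """Validate and clamp the client-supplied count parameter."""
    if raw is None:
        return default
    try:
        count = int(raw)              # reject non-numeric input
    except (TypeError, ValueError):
        raise ValueError("count must be an integer")
    if count <= 0:
        raise ValueError("count must be positive")
    return min(count, MAX_COUNT)      # silently clamp oversized requests
```

Whether to clamp silently or reject oversized values with a 400 error is a design choice; the key point is that the decision happens on the server.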

You should put a limit on the parameter on the service side so that your service stays reliable, but at the same time you should give the client a way to fetch more items with separate requests. This is usually done by accepting two parameters, offset and count: offset is the position of the first item to return, and count is the number of items, starting at offset, to return.
Generally: don't let the client abuse your service. Implement meaningful limits in the service so that it remains reliable, and let the client do the heavy lifting (create and send multiple requests). At the same time, support those multiple requests and document them, especially if the service is to be used by third-party developers.
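The offset/count scheme can be sketched like this (the in-memory list stands in for a database query, and the cap of 100 is an invented example):

```python
# Sketch of offset/count pagination with a server-side cap.
# The "stores" list stands in for a database; a real implementation
# would translate offset/count into SQL LIMIT/OFFSET.
MAX_COUNT = 100

def fetch_stores(stores, offset=0, count=30):
    """Return at most MAX_COUNT items starting at `offset`."""
    count = min(max(count, 0), MAX_COUNT)    # enforce the server-side cap
    offset = max(offset, 0)                  # reject negative offsets
    return stores[offset:offset + count]
```

A client that needs all items then issues several requests, advancing offset by count each time.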

Related

How to distinguish data sent over a TCP connection?

Hi guys, I'm making client-server software and this is my first question.
I'd like to ask: how do you distinguish data sent over a TCP connection?
Well, my situation is this: suppose we have 3 ListViews on our form.
- The first ListView is for the client's biodata.
- The second ListView is for values obtained from the clients.
- The third ListView is for pictures obtained from the clients.
So we have 3 kinds of data that must be processed, but only 1 connection in our system.
Here I'm confused: how do we determine whether the data we received is for the first, second, or third ListView?
Remember, the data for the third ListView is a picture received over the TCP connection.
How do we do that with 1 connection? Do I have to make 3 connections to handle the three ListViews?
With socket communication, both the client and the server must use the same agreed-upon protocol so that they can understand each other. There are many standard protocols that have already been created, so for most tasks, creating your own protocol is unnecessary. However, if you must, you can always define your own protocol. The nature of your protocol will obviously depend completely on your specific needs, so it would be impossible to tell you what the protocol should be. However, generally speaking, the first thing your protocol must define is how to know where each complete message begins and ends. This is often accomplished by separating each message with a delimiter (e.g. new line, EOF, null). As Francois recommended, you could alternatively specify the length of the message at the beginning of each message. Within each message, you then will need a header section which, among other things, would specify the type (the format) of the data stored in the body of the message.
A simple implementation might be to send each message as a JSON or XML document. Doing so makes it very easy to define and modify the format of the message.
However, unless you really need to, I would recommend using one of the built-in communication frameworks that are provided with .NET. For simple tasks, a simple asmx web service is often sufficient. For more complex tasks, WCF is often a good choice. An asmx web service uses SOAP over HTTP over TCP/IP. WCF also uses SOAP, but the lower-level transport is configurable, so it doesn't have to use TCP/IP, though it can easily do so.
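As an illustration of the framing idea above, here is a sketch of length-prefixed JSON messages with a `type` field telling the receiver which ListView the payload belongs to. The field names and type strings are invented for this sketch, and a binary picture would need to be base64-encoded before going into the JSON body:

```python
import json
import struct

# Each message: a 4-byte big-endian length prefix, then a JSON body whose
# "type" field (e.g. "biodata", "value", "picture") tells the receiver
# which ListView the payload is for. Names are illustrative.

def encode_message(msg_type, payload):
    """Frame one message for sending over the socket."""
    body = json.dumps({"type": msg_type, "payload": payload}).encode("utf-8")
    return struct.pack(">I", len(body)) + body

def decode_messages(stream):
    """Split a received byte stream back into (type, payload) tuples."""
    messages, pos = [], 0
    while pos + 4 <= len(stream):
        (length,) = struct.unpack_from(">I", stream, pos)
        body = stream[pos + 4 : pos + 4 + length]
        msg = json.loads(body.decode("utf-8"))
        messages.append((msg["type"], msg["payload"]))
        pos += 4 + length
    return messages
```

Because every message carries its own length and type, all three kinds of data can share one connection, and the receiver routes each message to the right ListView.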

Best way to store data between two requests

I need a bit of theoretical advice. Here is my situation: I have a search system that returns a list of found items, but the user is allowed to display only a certain number of items on one page. When the first request is sent to my WCF service, the service fetches the whole list and checks whether it is longer than the number of items the user is allowed to get. If it isn't, there is no problem and the service returns the whole list. When it is longer, I need to let the user choose which page to display, so I let the JavaScript know that a page should be chosen, the "page number dialog" is shown, and the user sends a second request with a page number. Based on this request, the web service selects the relevant items and sends them back. So what I need is to store the whole list on the server between the first and second requests, and I'd appreciate any idea how to store it. I was thinking about session state, but I don't know if it is possible to set a timeout on only a particular session entry (e.g. Session["list"]); the list is used only once and can have thousands of items, so I don't want to keep it on the server too long.
PS: I can't use standard pagination; the scenario has to be exactly as described above.
Thanks
This sounds like a classic use-case for memcached. It is a network based key-value store for storing temporary values. Unlike in-memory state, it can be used to share temporary cached values among servers (say you have multiple nodes), and it is a great way to save state across requests (avoiding the latency that would be caused by using cookies, which are transmitted to/from the server on each http request).
The basic approach is to create a unique ID for each request and associate it with a particular memcached key (or set of keys) for that user's requests. You then save this unique ID in a cookie (or a similar mechanism).
A warning, though: the memory is volatile, so values can be lost at any point. In practice this is infrequent, and memcached evicts entries with an LRU policy. More details: http://code.google.com/p/memcached/wiki/NewOverview
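The pattern looks roughly like this (a plain dict stands in for the memcached client here; a real client library would offer equivalent set/get calls that take a key and an expiry time):

```python
import uuid

cache = {}  # stands in for a memcached client

def store_results(results, expiry=300):
    """Save a search-result list under a fresh token and return the token.
    The token would go into a cookie so the second request can find the
    list again. `expiry` is where a real client would set the TTL."""
    token = uuid.uuid4().hex
    cache[token] = results        # real code: client.set(token, results, expiry)
    return token

def fetch_page(token, page, page_size):
    """Return one page of a previously stored result list, or None if
    the entry has expired or been evicted (re-run the search then)."""
    results = cache.get(token)    # real code: client.get(token)
    if results is None:
        return None
    start = page * page_size
    return results[start:start + page_size]
```

The per-entry expiry answers the concern about Session["list"]: each cached list disappears on its own schedule without holding server memory for the whole session.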
http://memcached.org/
Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.
I'm not a .NET programmer, but there appear to be implementations:
- http://code.google.com/p/memcached/wiki/Clients (list of clients, including .NET)
- https://sourceforge.net/projects/memcacheddotnet (.NET 2.0 memcached client)
- http://www.codeplex.com/EnyimMemcached (client developed in .NET 2.0 with performance and extensibility in mind; supports consistent hashing)
- http://www.codeplex.com/memcachedproviders
- http://code.google.com/p/beitmemcached (BeIT Memcached Client, optimized C# 2.0)
- http://jehiah.cz/projects/memcached-win32 (jehiah)

LDAP Server side sorting - really a good idea?

I'm toying with using server-side sorting in my OpenLDAP server. However, as I also get to write the client code, I can see that all it buys me in this case is one line of sorting code at the client. And as the client is one of presently 4, soon to be 16 Tomcats, maybe hundreds if usage balloons, sorting at the client actually makes more sense to me. I'm wondering whether SSS is really considered much of an idea. My search results in this case aren't large: dozens rather than hundreds. Just wondering whether it might be more of a weapon than a tool.
In OpenLDAP it is bundled with VLV (Virtual List View), which I will need some day, so it is already installed. So it's really a programming question, not just a configuration question; hence SO, not SF.
Server-side sorting is intended for use by clients that are unable or unwilling to sort results themselves; this might be useful in hand-held clients with limited memory and CPU mojo.
The advantages of server-side sorting include, but are not limited to:
the server can enforce a time limit on the processing of the sorting
clients can specify an ordering rule for the server to use
professional-quality servers can be configured to reject requests with sort controls attached if the client connection is not secure
the server can enforce resource limits, for example, the aforementioned time limit, or administration limits
the server can enforce access restrictions on the attributes and on the sort request control itself; this may not be that effective if the client can retrieve the attributes anyway
the server may indicate it is too busy to perform the sort or simply unwilling to perform the sort
professional-quality servers can be configured to reject search requests for all clients except for clients with the necessary mojo (privilege, bind DN, IP address, or whatever)
The disadvantages include, but are not limited to:
servers can be overwhelmed by sorting large result sets from multiple clients if the server software is unable to cap the number of sorts to process simultaneously
client-side APIs have to support the server-side sort request control and response
it might be easier to configure clients to sort by their own 'ordering rules'; although these can be added to professional-quality, extensible servers
To answer my own question, and not to detract from Terry's answer, use of the Virtual List View requires a Server Side Sort control.

Is a status method necessary for an API?

I am building an API, and I was wondering: is it worth having a method in the API that returns whether the API is alive or not?
Or is this pointless, and it's the API user's job to just call the method they need and, if it doesn't return anything due to network issues, handle that as needed?
I think it's quite useful to return a status. On the one hand, you can provide more statuses than just 'alive' or not, which makes your API more powerful; on the other hand, it's more useful for the user, since you can tell them exactly what's going on (e.g. 'maintenance').
But if your web service isn't available at all due to network issues, then of course it's up to the user to catch that exception. That's not the point here, I guess, and it's not something you could control from your API.
It's useless.
The information it returns is completely out of date the moment it is returned to you because the service may fail right after the status return call is dispatched.
Also, if you are load balancing the incoming requests and your status request gets routed to a failing node, the reply (or lack thereof) would look to the client like a problem with the whole API service. In the meantime, all the other nodes could be happily servicing requests. Now your client will think that the whole API service is down but subsequent requests would work just fine (assuming your load balancer would remove the failed node or restart it).
HTTP status codes returned from your application's requests are the correct way of indicating availability. Your clients of course have to be coded to tolerate and handle them.
What is wrong with standard HTTP response status codes? 503 Service Unavailable comes to mind. HTTP clients should already be able to handle that without writing any code special to your API.
Now, if the service is likely to be unavailable frequently and it is expensive for the client to discover that but cheap for the server, then it might be appropriate to have a separate 'health check' URL that can quickly let the client know that the service is available (at the time of the GET on the health check URL).
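A minimal sketch of what such a health-check handler could return (the dependency flags are placeholders for real probes of, say, the database or a downstream service):

```python
def health_check(db_ok, cache_ok):
    """Return (status_code, body) for a GET on the health-check URL.
    db_ok and cache_ok stand in for real dependency probes; 503 is the
    standard HTTP code for a temporarily unavailable service."""
    if db_ok and cache_ok:
        return 200, "OK"
    return 503, "Service Unavailable"
```

The value of the check lies in the probes being cheap for the server and the response using standard status codes that any HTTP client already understands.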
It is not necessary most of the time, at least when it returns a simple true or false. It just makes client code more complicated, because it has to call one more method. Even if your client received active=true from the service, the next useful call may still fail. Let your clients make the calls they need during normal execution and have them handle network, timeout, and HTTP errors correctly. A very useful pattern for such cases is the Circuit Breaker.
The reasons where status check may be useful:
If all the normal calls are considered to be expensive there may be an advantage in first calling lightweight status-check method (just to avoid expensive call).
Service can have different statuses and client can change its behavior depending on these statuses.
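The Circuit Breaker pattern mentioned above can be sketched like this (the failure threshold and reset timeout are illustrative values):

```python
import time

class CircuitBreaker:
    """Stop calling a failing service for `reset_timeout` seconds after
    `max_failures` consecutive failures; fail fast in the meantime.
    Thresholds are illustrative, not prescriptive."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, func):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None      # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0              # success closes the circuit
        return result
```

Instead of a separate status call, the client learns about outages from its real requests and stops hammering a service it already knows is down.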
It might also be worth looking into stateful protocols like XMPP.

Two WCF servers vs. one WCF server with callback

I have two applications that need to communicate via WCF, call them A and B:
A is supposed to push values to B for storage/update.
B is supposed to push the list of values stored in it to A.
The senior programmer on my team wants to open a WCF server in A and another WCF server in B.
I claim that one should be the server and the other the client, with the server using callbacks, in order to avoid splitting the interface in two, circular dependency, and duplicated configuration. He doesn't see it. Can anyone help me explain to him why his solution is bad design?
It depends on your criteria. Let's assume a client/server model where A is the client and B is the server. You state that B should "push" data to A.
If you truly need push, then you should make B a duplex server. This does put some strain on your bandwidth, so if you have a bandwidth restriction, this might not be the right choice.
If A can tolerate some delay, you might want to opt for a polling mechanism of your own (maybe based on timing, or some other logic).
If neither is an option, you can try to swap roles: make B the client and A the server. It's less intuitive, but it might fit your scenario. If storing data can tolerate a delay, have B poll A for changes in the data and save at an interval.
If there can be no delay in either direction and bandwidth is limited, you do end up with two WCF services. Although it may look silly at first glance, keep in mind they are services, not servers. It does make things a bit more complex, so I would keep it as a last resort.
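The polling alternative can be sketched like this (the function names, the 5-second interval, and the stop condition are invented for illustration; in the WCF scenario `fetch_remote` would be a client call to the other application):

```python
import time

def poll_for_changes(fetch_remote, apply_update, interval=5.0, max_polls=None):
    """Periodically ask the other side for changes instead of holding a
    duplex/callback channel open. `fetch_remote` returns new data or None;
    `apply_update` stores it locally. `max_polls` bounds the loop for
    demonstration; a real poller would run until shutdown."""
    polls = 0
    while max_polls is None or polls < max_polls:
        data = fetch_remote()        # e.g. an HTTP GET or one-way WCF call
        if data is not None:
            apply_update(data)       # save/update the local copy
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval)     # wait before the next poll
```

Polling trades latency (up to one interval) for a much simpler, one-directional client/server relationship.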
A service should encapsulate a set of functionality that other applications can consume. All it does is wait for and respond to requests from other components; it doesn't initiate actions by itself.
If Application B is storing data, then it can of course be provided to Application A as a service. It provides the "service" of storing data without application A having to worry about how or where, and returns successfully stored data. This is exactly the kind of thing that WCF Services are meant to handle.
I am assuming that application A is the one initiating the requests (unless you have an unmentioned 3rd application, one of them must be the initiator). If Application A is initiating actions (for example, it has a UI, or is triggered to do some batch processing etc.) then it should not be modeled as a "service".
I hope that helps :)