Twisted - succes (or failure) callback for LineReceiver sendLine - twisted

I'm still trying to master Twisted while in the midst of finishing an application that uses it.
My question is:
My application uses LineReceiver.sendLine to send messages from a Twisted TCP server.
I would like to know if the sendLine succeeded.
I gather that I need to somehow add a success (and error?) callback to sendLine but I don't know how to do this.
Thanks for any pointers / examples

You need to define "succeeded" in order to come up with an answer to this.
All sendLine does immediately (probably) is add some bytes to a send buffer. In some sense, as long as it doesn't raise an exception (eg, MemoryError because your line is too long or TypeError because your line was the number 3 instead of an actual line) it has succeeded.
That's not a very useful kind of success, though. Unfortunately, the useful kind of success is more like "the bytes were added to the send buffer, the send buffer was flushed to the socket, the peer received the bytes, and the receiving application acted on the data in a persistent way".
Nothing in LineReceiver can tell you that all those things happened. The standard solution is to add some kind of acknowledgement to your protocol: when the receiving application has acted on the data, it sends back some bytes that tell the original sender the message has been handled.
You won't get LineReceiver.sendLine to help you much here because all it really knows how to do is send some bytes in a particular format. You need a more complex protocol to handle acknowledgements.
Fortunately, Twisted comes with a few. twisted.protocols.amp is one: it offers remote method calls (complete with responses) as a basic feature. I find that AMP is suitable for a wide range of applications so it's often safe to recommend for new development. It largely supersedes the older twisted.spread (aka "PB") which also provides both remote method calls and remote object references (and is therefore more complex - in my experience, more complex than most applications need). There are also some options that are a bit more standard: for example, Twisted Web includes an HTTP implementation (HTTP, as you may know, is good at request/response style interaction).

Related

How to handle the application if connection breaks in between a web service call

In several interviews I have been asked about handling of connection, web service calls, server responses and all. Even now I am not clear about many things.Could you please help me to get a better idea about the following scenarios?
What is the advantage of using NSURLSessionDataTask instead of NSURLConnection-I have an idea like data loss will not happen even if the connection breaks for NSURLSessionDataTask but not for the latter.But how it works?
If the connection breaks after sending the request to a server or while connecting to server , How can we handle the code at our end in case of NSURLConnection and NSURLSessionDataTask?-My idea is to use Reachability classes and check when it becomes online.
The data we are sending got updated at the server side. But we don't get the response from server. What can we do at our side to handle this situation?- Incrementing timeOutInterval is the only thing that we can do?
Please help me with these scenarios. Thank you very much in advance!!
That's multiple questions, really, but I'll try to answer them all briefly.
Most failure handling is the same between NSURLConnection and NSURLSession. The main advantages of the latter are support for background downloads and cancelling groups of related requests.
That said, if you're doing a large download that you think might fail, NSURLSession does provide download tasks that let you resume the download if your network connection fails, similar to what NSURLDownload used to do on OS X (never available on iOS). This only helps for downloading large files, though, not for large uploads (which require significant server-side support to resume) or other requests.
Your intuition is correct. When a connection fails, create a reachability object monitoring that particular hostname to see when it would be a good time to try the request again. Then, try the request again.
You might also display some sort of advisory UI to say that you have no Internet connection. (By advisory, I mean something that the user doesn't have to click on and that does not impact offline use of the app any more than necessary; look at the Facebook app for a great example.)
Provide a unique identifier when you make the request, and store that on the server along with the server's response until the client acknowledges receipt of the response (or purge it anyway after some reasonable number of days). When the upload finishes, the server gives you back its response if it can.
If something goes wrong, the client asks the server to resend the response associated with that unique identifier. Once your client has the data, it acknowledges receipt and the server deletes the response. If you ask the server for the response and it doesn't have one, then the upload didn't really complete.
With some additional work, this approach can make it possible to support long-running uploads more reliably. If an upload fails, ask the server how much data it got for that identifier, then tell the server that you're going to upload new data starting at the next byte. On the server side, overwrite the old data starting at that byte (just in case some data was still being written when you asked for the length).
Hope that helps.

About losing HTTP Requests

I have a server to which my client sends a HTTP GET request with some values. The server on its end simply stores these values to a database.
Now, I am observing that sometimes I do not observe these values in the database. One of the following could have happened:
The client never sent it
The server never received it
The server failed in writing to the database
My strongest doubt is that the reason is 2 - but I am unable to explain it completely. Since this is an HTTP request (which means there is TCP underneath) reliable delivery of the GET request should be guaranteed, right? Is it possible that even though I send a GET request to the server - it was never received by the server? If yes, what is TCP doing there?
Or, can I confidently assert that if the server is up and running and everything sent to the server is written to the database, then the absence of the details of the GET request in the database means the client never sent it?
Not sure if the details will help - but I am running a tomcat server and I am just sending a name-value pair through the get request.
There are a few things you seem to be missing. First of all, yes, if TCP finishes successfully, you pretty much have a guarantee that your message (i.e. the TCP payload) has reached the other side: TCP assures that it will take care of lost packages and the order in which packages arrive. However, this is not universially failproof, as there are still things beyond the powers of TCP (think of a physical disconnect by cutting through an ethernet cable). There is also no assertion regarding the syntactical correctness of the protocol "above." Any checks beyond delivering a bit-perfect copy is simply not TCP's concern.
So, there is a chance that the requests issued by your client are faulty or that they are indeed correct but not parsed correctly by your server. Former is striking me as more likely as latter one as Tomcat is a very mature piece of software. I think it would help tremendously if you would record and analyse some of your generated traffic through e.g. Wireshark.
You do not really mention what database you have in use. But there are some sacrificing acid-compliance in favour of increased write speeds. The nature of these databases brings it that you can never be really sure wether something actually got written to disk or is still residing in some buffer in memory. Should you happen to use such a db, this were another line of investigation.
Programmatically, I advise you take the following steps when dealing with HTTP traffic:
Has writing to the socket finishes without error?
Could a response be read from the socket?
Does the response carry a code in the 2xx range (indicating a successful operation)?
If any of these fail, you should really log something.
On a realated note, what you are doing there does not call for the GET method but for POST as you are changing application state. Consider it as a nice-to-have ;)
Without knowing the specifics, you can break it down into two parts. The HTTP request and the DB write. The client will receive a 200 OK response from the server when its GET request has been acknowledged. I've written code under Tomcat to connect to a MySQL DB using DAO. In the case of a failure an exception would be thrown and logged. Which ever method you're using, you'll want to figure out how failures are logged.

Async WCF and Protocol Behaviors

FYI: This will be my first real foray into Async/Await; for too long I've been settling for the familiar territory of BackgroundWorker. It's time to move on.
I wish to build a WCF service, self-hosted in a Windows service running on a remote machine in the same LAN, that does this:
Accepts a request for a single .ZIP archive
Creates the archive and packages several files
Returns the archive as its response to the request
I have to support archives as large as 10GB. Needless to say, this scenario isn't covered by basic WCF designs; we must take additional steps to meet the requirement. We must eliminate timeouts while the archive is building and memory errors while it's being sent. Both of these occur under basic WCF designs, depending on the size of the file returned.
My plan is to proceed using task-based asynchronous WCF calls and streaming mode.
I have two concerns:
Is this the proper approach to the problem?
Microsoft has done a nice job at abstracting all of this, but what of the underlying protocols? What goes on 'under the hood?' Does the server keep the connection alive while the archive is building (could be several minutes) or instead does it close the connection and initiate a new one once the operation is complete, thereby requiring me to properly route the request through the client machine firewall?
For #2, clearly I'm hoping for the former (keep-alive). But after some searching I'm not easily finding an answer. Perhaps you know.
You need streaming for big payloads. That is the right approach. This has nothing at all to do with asynchronous IO. The two are independent. The client cannot even tell that the server is async internally.
I'll add my standard answers for whether to use async IO or not:
https://stackoverflow.com/a/25087273/122718 Why does the EF 6 tutorial use asychronous calls?
https://stackoverflow.com/a/12796711/122718 Should we switch to use async I/O by default?
Each request runs over a single connection that is kept alive. This goes for both streaming big amounts of data as well as big initial delays. Not sure why you are concerned about routing. Does your router kill such connections? That's a problem.
Regarding keep alive, there is nothing going over the wire to do that. TCP sessions can stay open indefinitely without any kind of wire traffic.

Persisting Data in a Twisted App

I'm trying to understand how to persist data in a Twisted application. Let's say I've decided to write a Twisted server that:
Accepts inbound SMTP requests
Sends the message to a 3rd party system for modification
Relays the modified message to its destination
A typical Twisted tutorial would have you build this app using Deferreds and callbacks, roughly:
A Factory handles inbound requests
Each time a full email is received a call is sent to the remote message processor, returning a deferred
Add an errback that substitutes the original message if anything goes wrong in the modify call.
Add a callback to send the message on to the recipient, which again returns a deferred.
A real server would add/include additional call/errbacks to retry or notify the sender or whatnot. Again for simplicity, assume we consider this an acceptable amount of effort and just log errors.
Of course, this persists NO data in the event of a crash/restart/something else. I get that a solution involves a 3rd party persistent datastore (RabbitMQ is often mentioned) and could probably come up with a dozen random ways to achieve the outcome.
However, I imagine there are a few approaches that work best in a Twisted app. What do they look like? How do they store (and restore in the event of a crash) the in-process messages?
If you found this question, you probably already know that Twisted is event-based. It sounds simple, but the "hardest" part of the answer is to get the persistence platform generating the events we need when we need them. Naturally, you can persist the data in a DB or a message queue, but some platforms don't naturally generate events. For example:
ZeroMQ has (or at least had) no callback for new data. It's also relatively poor at persistence.
In other cases, events are easy but reliability is a problem:
pgSQL can be configured to generate events using triggers, but they're one-time things so you can't resume incomplete events
The light at the end of the tunnel seems to be something like RabbitMQ.
RabbitMQ can persist the message in a database to survive a crash
We can use acknowledgements on both legs (incoming SMTP to RabbitMQ and RabbitMQ to outgoing SMTP) to ensure the application is reliable. Importantly, RabbitMQ supports acknowledgements.
Finally, several of the RabbitMQ clients provide full asynchronous support (see for example pika, txampq, and puka)
It's enough for our purposes that the RabbitMQ client provides us an event-based interface.
At a more theoretical level, however, this need not be the case. In fact, despite the "notification" issue above, ZeroMQ has an event-based client. Even if our software is elegantly event-based, we will run into systems that aren't. In these cases, we have no choice but to fall back on polling. In principle, if not in practice, we just query the message provider for messages. When we exhaust the current queue (and immediately if there are no messages), we use a callLater command to check again in the future. It may feel anti-pattern, but it's (to the best of my knowledge anyway) the right way to handle this particular case.

blocked requests in io_service

I have implemented client server program using boost::asio library.
In my implementation there are times when io_service.run() blocks indefinitely. In case I pass another request to io_service, the blocked call begins to execute normally.
Is there any way to see what are the pending requests inside the io_service queue ?
I have not used work object to block the run call!
There are no official ways to query into the io_service to find all pending request. However, there are a few techniques to debug the problem:
Boost 1.47 introduced handler tracking. Simply define BOOST_ASIO_ENABLE_HANDLER_TRACKING and Boost.Asio will write debug output, including timestamps, an identifier, and the operation type, to the standard error stream.
Attach a debugger dig through the layers to find and examine operation queues. This answer covers both understanding handler tracking and using a debugger to examine an operation queue for the epoll_reactor.
Finally, if you believe it is a bug, then it may be worth updating to the latest version or checking the revision history for relevant changes. Regardless, describing the problem in more detail may allow others to help identify the source of the problem and potential solutions.
Now i spent a few hours reading and experimenting (i need more boost::asio functionality for work as well) and it turns out: Kind of.
But it is not as straightforward or readable as one might hope.
Under the hood (well, under the outermost hood) io_service has a bunch of other services registered, which do the work async_ operations of their respective fields require.
These are the "Services" described in the reference.
Now sadly, the services stay registered, wether there is work to do or not. For example if your io_service has a udp socket, it will still have all the corresponding services, even if the socket itself is inactive.
But you can ask your io_service which services it has. Lets say you want to know wether your io_service called m_io_service has an udp datagram_socket_service. Then you can call something like:
if (boost::asio::has_service<boost::asio::datagram_socket_service<boost::asio::ip::udp> >(m_io_service))
{
//Whatever
}
That does not help a lot, because it will be true no matter wether the socket is active or not. But after you know, that you have that service, you can get a ref to it using use_service instead of has_service but with the same elegant amount of <>.
And now you can inspect the service to see what it is up to. Sadly, it will not tell you what the outstanding handlers names are (probably partly because it does not know them) but if it is a socket, you can get its implemention_type and with that check whether it currently is_open or find either the local_endpoint as well as the remote_endpoint.
In case of a deadline_timer_service you can, among other stuff, find out when it expires_at.
See the reference for more information what the service is and is not willing to tell you.
http://www.boost.org/doc/libs/1_54_0/doc/html/boost_asio/reference.html
This information should then hopefully allow you to determine which async_ operation did not return.
And if not, at the very least you can cancel any unexpectedly active services.