How to send objects over TCP efficiently - Objective-C

Okay, so my goal is to build an easy-to-use protocol for sending data over TCP. Basically, it would send a message and an object (of unknown type) over TCP. Sending would require only one method call, and so would receiving.
So this is how I was thinking to format the "message".
length_of_message - "A string that is a message" - length_of_Object - object
length_of_message would be a fixed number of bytes, as would length_of_Object.
The actual message string and the actual object would be of variable length.
If the actual class of the object isn't known, could I just declare it as a "generic object" somehow, get its class name from that generic object, and have the message tell the receiver what to do with the object?
It would be simple if it were a constant object type, but I want to be able to use one send function and one receive function for every object that needs to be sent/received.
Any suggestions?
Thanks,
Andrew

Make sure you aren't reinventing the wheel (unless doing so is your primary goal).
With that in mind, consider:
• Implement and use the NSCoding protocol. It allows for the efficient archival of complexly connected object graphs, including cycles.
• Instead of raw TCP, use HTTP. While it adds a bit of overhead in the headers, the body can be straight encoded data. More importantly, HTTP is ubiquitous. It routes through just about anything, whereas other protocols might be blocked (think proxy servers).
• Via HTTP, you can leverage compression. If one side of your communication pipe is an existing web server of some kind, it probably already supports gzip'd communication. Compressing an NSData (that would be the result of NSCoding) is trivial.
• Alternatively, stick with straight plists (see the sketch below).
Unless you truly have some requirement that makes the above unworkable, you are likely better off leveraging these technologies instead of rolling a new one.
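For the plist route, here's a minimal sketch. It uses Python's plistlib purely for illustration; on the Cocoa side the equivalent would be NSPropertyListSerialization:

import plistlib

# Serialize a dictionary to an XML plist and back again.
payload = {"message": "update settings", "values": [1, 2, 3]}
data = plistlib.dumps(payload)    # bytes, ready to write to the socket
restored = plistlib.loads(data)
assert restored == payload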
With that said, what you propose is fine. I would add, possibly, a structure like:
[HEADER][MSGID][LEN][TYPE][DATA of len][POST]
Where the POST is a known sequence of bytes that the receiver can verify to make sure that, maybe, all the data was received correctly. Or you could go whole hog and integrate a checksum. Or sub-pieces could be repeated as needed (e.g. [LEN][TYPE][DATA] over and over).
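To make the frame concrete, here's one possible Python rendering; the specific header/trailer bytes, the 4-byte big-endian lengths, and the fixed 16-byte type field are assumptions for illustration, not part of the proposal above:

import struct

HEADER = b'\xAA\x55'   # hypothetical start-of-frame marker
POST = b'\x55\xAA'     # hypothetical end-of-frame marker

def pack_frame(msg_id: int, type_name: bytes, payload: bytes) -> bytes:
    # [HEADER][MSGID][LEN][TYPE][DATA of len][POST]
    type_field = type_name.ljust(16, b'\x00')[:16]   # fixed 16-byte type field
    return (HEADER
            + struct.pack('>I', msg_id)
            + struct.pack('>I', len(payload))
            + type_field
            + payload
            + POST)

def unpack_frame(frame: bytes):
    if frame[:2] != HEADER or frame[-2:] != POST:
        raise ValueError('corrupt frame')
    msg_id, length = struct.unpack('>II', frame[2:10])
    type_name = frame[10:26].rstrip(b'\x00')
    payload = frame[26:26 + length]
    if len(payload) != length:
        raise ValueError('truncated frame')
    return msg_id, type_name, payload

frame = pack_frame(1, b'NSString', b'hello')
assert unpack_frame(frame) == (1, b'NSString', b'hello')

The TYPE field is where the class name of your "generic object" could travel, so the receiver knows how to unarchive the payload.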

Appropriate REST design for data export

What's the most appropriate way in REST to export something as PDF or other document type?
The following example explains my problem:
I have a resource called Banana.
I created all the canonical CRUD REST endpoints for that resource (i.e. GET /bananas; GET /bananas/{id}; POST /bananas/{id}; ...)
Now I need to create an endpoint which downloads a file (PDF, CSV, ..) which contains the representation of all the bananas.
The first thing that came to my mind is GET /bananas/export, but in pure REST, using verbs in URLs is frowned upon. Using a more appropriate HTTP method might be cool, something like EXPORT /bananas, but unfortunately that is not (yet?) possible.
Finally I thought about using the Accept header on the same GET /bananas endpoint, which based on the different media type (application/json, application/pdf, ..) returns the corresponding representation of the data (json, pdf, ..), but I'm not sure if I am misusing the Accept header in this way.
Any ideas?
in pure REST, using verbs in URLs is frowned upon.
REST doesn't care what spelling conventions you use in your resource identifiers.
Example: https://www.merriam-webster.com/dictionary/post
Even though "post" is a verb (and worse, an HTTP method token!) that URI works just like every other resource identifier on the web.
The more interesting question, from a REST perspective, is whether the identifier should be the same that is used in some other context, or different.
REST cares a lot about caching (that's important to making the web "web scale"). In HTTP, caching is primarily about re-using prior responses.
The basic (but incomplete) idea is that we may be able to re-use a response that shares the same target URI.
HTTP also has built into it a general purpose mechanism for invalidating stored responses that is also focused on the target URI.
So here's one part of the riddle you need to think about: when someone sends a POST request to /bananas, should caches throw away the prior responses with the PDF representations?
If the answer is "no", then you need a different target URI. That can be anything that makes sense to you. /pdfs/bananas for example. (How many common path segments are used in the identifiers depends on how much convenience you will realize from relative references and dot segments.)
If the answer is "yes", then you may want to lean into using content negotiation.
In some cases, the answer might be "both" -- which is to say, to have multiple resources (each with its own identifier) that return the same representations.
That's a normal thing to do; we even have a mechanism for describing which resource is "preferred" (see RFC 6596).
REST does not care about this, but the HTTP standard does. Using the Accept header for the expected MIME type is the standard way of doing this, so you did the right thing. There is no need to move it to a separate endpoint if the data is the same and only the format differs.
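As a sketch of what that negotiation looks like server-side (Flask is used purely for illustration, and render_pdf()/all_bananas() are hypothetical stand-ins):

from flask import Flask, request, jsonify

app = Flask(__name__)

def all_bananas():
    # hypothetical data source
    return [{'id': 1, 'color': 'yellow'}]

def render_pdf(items):
    # hypothetical stand-in for a real PDF renderer; returns bytes
    return b'%PDF-1.4 ...'

@app.route('/bananas')
def list_bananas():
    # Same URI, different representations: pick the best match
    # from the client's Accept header.
    best = request.accept_mimetypes.best_match(
        ['application/json', 'application/pdf'])
    if best == 'application/pdf':
        return render_pdf(all_bananas()), {'Content-Type': 'application/pdf'}
    return jsonify(all_bananas())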
Media types are the best way to represent this, but there is a practical aspect: people will browse a REST API using its root nouns. I'd put some record-count limits on it, maybe GET /bananas/export/100 to get the first 100, and GET /bananas/export/all if they really want all of them.

GET vs. POST when the request is some arbitrary size

I understand the semantics of GETting vs. POSTing: one endpoint should get data, the other should post it, the latter being a request that you may not wish the user to be able to replay easily.
That said, on the project I'm working on at the moment - the approach has been to POST to endpoints that are clearly responsible for responding with data, and these endpoints do not transform data in any way.
The reasoning behind this has been that the payloads are (potentially) of considerable size and seem more appropriate for a body as opposed to a query string.
Can anybody please shed light on which method would be right for a read request that takes a large payload? I'm not asking for opinion; I'm asking what would be compliant with RESTful design.
Further Context
The request is potentially large because it is a search DTO from the UI, where users may choose to pass any number of filters or search terms.
Can anybody please shed light on which method would be right for a read request that takes a large payload? I'm not asking for opinion; I'm asking what would be compliant with RESTful design.
Today's answer: It's OK to use POST.
For requests that are fundamentally read-only, we'd like to use standardized HTTP semantics to communicate that to general purpose components, so that they can themselves do intelligent things.
BUT: GET, while being both safe and ubiquitous, isn't an appropriate choice when you need to include a message-body in the request:
content received in a GET request has no generally defined semantics
So if you can't, for whatever reason, copy the information you need into a resource identifier, then GET is not an option.
Now, if your payloads are consistent with WebDAV, then you might be able to use one of the safe methods described in those specifications. But, as far as I can tell, they aren't really appropriate for general use.
Tomorrow's answer: the HTTP-WG accepted a proposal for a safe-method-with-body. So we should eventually expect to see a registered HTTP method that is safe and has defined semantics for the request content.
Then, depending on what those semantics are, we may be able to use it for requests like yours.
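In the meantime, to make "It's OK to use POST" concrete, here's a client-side sketch; the /search endpoint and the DTO fields are made up for illustration:

import requests

# A search DTO that would be too unwieldy as a query string.
search_dto = {
    'filters': {'status': 'active', 'region': ['EU', 'APAC']},
    'terms': ['urgent', 'backordered'],
}
resp = requests.post('https://api.example.com/search',
                     json=search_dto, timeout=10)
resp.raise_for_status()
results = resp.json()

The trade-off described above still applies: general-purpose components can no longer tell that the request was safe, so intermediaries won't cache the responses.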

How do I design a REST call that is just a data transformation?

I am designing my first REST API.
Suppose I have a (SOAP) web service that takes MyData1 and returns MyData2.
It is a pure function with no side effects, for example:
MyData2 myData2 = transform(myData1);
transform() does not change the state of the server. My question is: what REST call do I use? MyData1 can be large, so I will need to put it in the body of the request, which seems to require POST. However, POST seems to be used only to change server state, not to return anything, which is not what transform() does. So POST might not be correct? Is there a specific REST technique for pure functions that take and return something, or should I just use POST, read the result from the response body, and not worry about it?
I think POST is the way to go here, simply because you need to pass data in the body. The GET method is used when you need to retrieve information (in the form of an entity) identified by the Request-URI. In short, that means that when processing a GET request, a server is only required to examine the Request-URI and Host header field, and nothing else.
See the pertinent section of the HTTP specification for details.
It is okay to use POST
POST serves many useful purposes in HTTP, including the general purpose of “this action isn’t worth standardizing.”
It's not a great answer, but it's the right answer. The real issue here is that HTTP, which is a protocol for the transfer of documents over a network, isn't a great fit for document transformation.
If you imagine this idea on the web, how would it work? Well, you'd click through a bunch of links to get to some web form, and that web form would allow you to specify the source data (including perhaps attaching a file), and then submitting the form would send everything to the server, and you'd get the transformed representation back as the response.
But - because of the payload, you would end up using POST, which means that general purpose components wouldn't have the data available to tell them that the request was safe.
You could look into the WebDav specifications to see if SEARCH or REPORT is a satisfactory fit -- every time I've looked into them for myself I've decided against using them (no, I don't want an HTTP file server).
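If you do go with POST, a minimal sketch of the endpoint might look like this (Flask is used purely for illustration; the /transformations resource name is an assumption, and transform() is a stub standing in for the asker's pure function):

from flask import Flask, request, jsonify

app = Flask(__name__)

def transform(my_data1):
    # stand-in for the pure, side-effect-free transformation
    return {'transformed': my_data1}

@app.route('/transformations', methods=['POST'])
def run_transformation():
    # The source document travels in the request body; the transformed
    # representation comes back in the response body. Nothing is stored.
    my_data1 = request.get_json()
    my_data2 = transform(my_data1)
    return jsonify(my_data2)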

HTTPS proxy with support for chunked-encoded requests

I'm developing a simple HTTPS proxy (written in Python) which receives POST/GET requests/responses, applies some transformation and finally forwards the result to the recipient.
I need to handle chunked-encoded requests/responses in a "streaming" fashion, meaning that as soon as a chunk is received the proxy transforms it and forwards it to the recipient.
Before deciding to support chunked-encoded requests myself, I had been using mitmproxy (http://mitmproxy.org/) and it worked perfectly. Unfortunately, I noticed that it waits until the entire body is received before letting me handle the request/response.
How can I implement a proxy supporting chunked-encoded requests/responses? Has anyone of you ever done something like this?
Thanks
EDIT: MORE INFO ON MY USE CASE
I need to handle POST requests and GET responses.
In the POST request I receive a JSON object and I have to encrypt some of its values.
In the GET response I receive a JSON object and I have to decrypt some of its values.
Till now, the following code has worked perfectly:
def handle_request(self, r):
    if r.method == 'POST':
        pass  # encrypt some of the values in r.get_form_urlencoded()

def handle_response(self, r):
    if r.request.method == 'GET':
        pass  # decrypt some of the values in r.content
How can I do the same thing with single chunks?
EDIT: UPDATES
After evaluating different solutions, I decided to go for Squid (proxy) + ICAP (content adaptation).
I've successfully configured Squid and the performance is just great. Unfortunately, I can't find a suitable ICAP server (in Python, if possible) for doing content adaptation (modification). I thought this one https://github.com/netom/pyicap could do the job, but it looks like it doesn't read the body of my POST requests.
Do you guys know a Python ICAP server that I can use together with Squid?
Thanks
The answer below is outdated. You can now pass --stream to mitmproxy, whose behaviour is explained in the mitmproxy documentation.
mitmproxy developer here. This is definitely a feature we want for mitmproxy as well, but it's not that trivial and probably not coming very soon. If you really want to implement that yourself, I can recommend two things:
If you have a very specific use case, you can employ libmproxy.protocol.http.HTTPRequest.from_stream for parsing the header and do the body processing yourself.
If you do not want to modify the request/response body, you may find it sufficient to modify mitmproxy itself. In a nutshell, you would need to read the request/response without content (see 1.), modify it to your needs, pass it to the server, and then delegate control to libmproxy.protocol.tcp (see https://github.com/mitmproxy/mitmproxy/blob/master/libmproxy/proxy/server.py#L169).
If you have further questions, don't hesitate to ask here or on mitmproxy's IRC channel.
Re Comment #1:
You can't reuse too much of mitmproxy, but at least you get to delegate the header parsing & processing:
from libmproxy.protocol.http import HTTPRequest, HTTPResponse

# ...accept request, socket.makefile() etc...
req = HTTPRequest.from_stream(client_conn.rfile, include_content=False)
# manually forward the request head to the server (req._assemble_head())
# manually receive the request body chunk by chunk and forward it to the server, see
# https://github.com/mitmproxy/netlib/blob/master/netlib/http.py#L98
resp = HTTPResponse.from_stream(server_conn.rfile, include_content=False)
# manually forward the response headers to the client
# manually process the response body and forward it to the client
That being said, this is a fairly complex topic. Eventually, you're better off hacking that directly into libmproxy.protocol.http.HTTPHandler.
Another option, depending on your use case again: use mitmproxy, set the conntype to tcp, forward traffic as-is, and use regex replacements on the content in libmproxy.protocol.tcp. Probably the easiest way, but also the most hacky one.
If you can provide some context, I may guide you further in the right direction.
Re Comment #2:
Before we get to the main part: JSON is a really bad choice for streaming/chunking, unless you want to encrypt the complete JSON object and treat it as a single string. You should definitely consider something like tnetstrings if you only want to encrypt parts.
Apart from that, hooking into read_chunk works, but first you need to get to the point where you can actually receive chunks over the line. Then, it's as simple as reading the single chunks, encrypting them and forwarding them.
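To illustrate "reading the single chunks, encrypting them and forwarding them", here's a bare-bones sketch of the chunked transfer-encoding loop, independent of mitmproxy. Trailers and error handling are omitted; rfile/wfile are assumed to be file-like socket wrappers, and transform is e.g. your encrypt/decrypt function:

def stream_chunks(rfile, wfile, transform):
    # Relay an HTTP/1.1 chunked body, transforming each chunk as it arrives.
    while True:
        size_line = rfile.readline()              # e.g. b'1a2b\r\n'
        size = int(size_line.split(b';')[0], 16)  # strip chunk extensions
        if size == 0:                             # last chunk
            wfile.write(b'0\r\n\r\n')
            rfile.readline()                      # consume the final CRLF
            break
        chunk = rfile.read(size)
        rfile.read(2)                             # consume the chunk's CRLF
        out = transform(chunk)
        wfile.write(b'%x\r\n' % len(out) + out + b'\r\n')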

Is it better to create a library with several functions or to create classes?

I'm developing a piece of software to communicate with a device.
The software will send commands to the device. The device has to answer using the protocol below:
<STX><STX><COMMAND>[<DATA_1><DATA_2>...<DATA_N>]<CHKSUM><ETX>
where:
<STX> is the Start of TeXt (0x55);
<COMMAND> can be 0x01 for read, 0x02 for write, etc;
<DATA> is any value;
<CHKSUM> is the checksum;
<ETX> is the End of TeXt (0x04).
So, I have to validate the received data.
Then, the received data:
cannot be empty;
must have 3 or more characters;
must have a header in the first two characters of the string data;
must have a "footer" in the last character of the string data;
must have a valid checksum.
If the answer is valid, then I can handle the data. But first I have to extract this data from the received response.
OK, this is a relatively easy task. Previously I would have done it in a procedural way, using only one function and many if's.
Now that I'm studying more about good programming practices, things seem to be getting harder to do.
To validate the device's answer, is it better to create a class, "ValidateReceivedData" for example, and pass the received data to the constructor of this class? And then create a public method called "IsReceivedDataValid" that checks all the steps given above?
Or would it be better to create a library with several functions to validate the received data?
I'd like to use unit tests too.
As I said before, I'm studying to write better code. But I realize that I'm now spending more time coding than before, and too many questions are arising; they seem easy to solve, but I'm not getting there.
For what it's worth, I've done this sort of thing before using object-oriented design. Here's a high level possibility for your design:
ProtocolParser class:
Takes a SerialPort object, or equivalent, in the constructor and listens to it for incoming bytes
Passes received bytes to OnByteReceived, which implements the protocol-specific state machine (with states like Unknown, Stx1Received, Stx2Received, ..., CkSumReceived).
After an entire good message is received, creates an object of type Packet, which accepts a byte list in its constructor. It then raises an event PacketReceived, passing the Packet as an argument.
If a bad byte is received, it raises an event BadDataReceived and passes the bad data (for logging/debugging purposes, perhaps).
Packet class:
Takes a list/array of bytes and stores them as Command and Data properties.
Does not need to save the checksum, as this class is only meant to represent a valid packet.
The above classes are sufficient to implement the receive protocol. You should be able to test it by mocking a SerialPort class (i.e., the ProtocolParser could actually take an IDataSource instead of a SerialPort).
You could then add a higher-level class to implement your device-specific functions, which would listen to the PacketReceived event of the ProtocolParser.
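To make that concrete, here is a compact sketch of the state machine (Python used for illustration; the checksum rule shown is an assumption, and this naive version assumes 0x04 never appears inside DATA, so a real parser would need a length field or byte stuffing):

from enum import Enum, auto

STX, ETX = 0x55, 0x04

class State(Enum):
    WAIT_STX1 = auto()
    WAIT_STX2 = auto()
    IN_BODY = auto()

class Packet:
    # Represents a valid packet only: command byte plus data bytes.
    def __init__(self, body: bytes):
        self.command = body[0]
        self.data = body[1:]

class ProtocolParser:
    def __init__(self, on_packet, on_bad_data):
        self.on_packet = on_packet        # stands in for PacketReceived
        self.on_bad_data = on_bad_data    # stands in for BadDataReceived
        self.state = State.WAIT_STX1
        self.buf = bytearray()

    def on_byte_received(self, b: int):
        if self.state is State.WAIT_STX1:
            if b == STX:
                self.state = State.WAIT_STX2
            else:
                self.on_bad_data(bytes([b]))
        elif self.state is State.WAIT_STX2:
            if b == STX:
                self.buf.clear()
                self.state = State.IN_BODY
            else:
                self.on_bad_data(bytes([b]))
                self.state = State.WAIT_STX1
        elif self.state is State.IN_BODY:
            if b == ETX:
                ok = (len(self.buf) >= 2
                      and sum(self.buf[:-1]) & 0xFF == self.buf[-1])
                if ok:
                    self.on_packet(Packet(bytes(self.buf[:-1])))
                else:
                    self.on_bad_data(bytes(self.buf))
                self.state = State.WAIT_STX1
            else:
                self.buf.append(b)

Because the callbacks are injected, tests can drive on_byte_received with canned byte sequences and assert on what gets delivered, with no serial port in sight.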
Of course it would be better to use an OOP design.
From what you explained, I'd make at least 2 classes:
Message
Executer
The message will receive the command from the device, and the Executer will handle the message.
The Message object will be initialized with the device's answer. It will parse it and hold fields as you described:
STX
COMMAND
DATA
CHKSUM
ETX
Then an Executer object will receive the Message object and do the actual execution of the message, and hold the logical code.
I would go a step further than Yochai's answer, and create the following classes:
Command: Actually not a class, but an Enum value so you can check against Command.Read, etc., rather than just "knowing" what 0x01 and 0x02 mean.
Message: Just a plain object (POJO/POCO/whatever) that's intended to hold a data representation of the message. This would contain the following fields:
Command (the enum type mentioned earlier)
Data: List of the data. Depending on how the data is represented, you might create a class for this, or you could just represent each datum as a string.
MessageParser: this would have a function that would parse a string or text stream and create a Message object. If the text is invalid, I'd throw a customized exception (another class), which can be caught by the caller.
MessageExecutor: This would take a Message object and perform the action that it represents.
By making the intermediate representation object (Message), you make it possible to separate the various actions you're performing. For example, if the Powers That Be decide that the message text can be sent as XML or JSON, you can create different MessageParser classes without having to mess with the logic that decides what to do with the message.
This also makes unit testing far easier, because you can test the message parser independently of the executor. First test the message parser by calling the parse function and examining the resulting Message object. Then test the executor by creating a Message object and ensuring that the appropriate action is taken.
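A short sketch of that layering (again Python for illustration; the framing checks and the checksum rule are assumptions carried over from the question's protocol):

from dataclasses import dataclass
from enum import Enum

class Command(Enum):
    READ = 0x01
    WRITE = 0x02

class InvalidMessageError(Exception):
    # The customized exception the parser raises on bad input.
    pass

@dataclass
class Message:
    command: Command
    data: bytes

class MessageParser:
    def parse(self, raw: bytes) -> Message:
        # Expected layout: STX STX COMMAND [DATA...] CHKSUM ETX
        if len(raw) < 5 or raw[0] != 0x55 or raw[1] != 0x55 or raw[-1] != 0x04:
            raise InvalidMessageError('bad framing')
        body, chksum = raw[2:-2], raw[-2]
        if sum(body) & 0xFF != chksum:    # assumed checksum algorithm
            raise InvalidMessageError('bad checksum')
        try:
            return Message(Command(body[0]), bytes(body[1:]))
        except ValueError as exc:
            raise InvalidMessageError('unknown command') from exc

class MessageExecutor:
    def execute(self, message: Message) -> None:
        if message.command is Command.READ:
            ...  # perform the read

Testing the parser is then just a matter of feeding parse() byte strings and asserting on the returned Message or the raised InvalidMessageError, entirely independently of the executor.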