How to determine WCF message size at the encoder level

I am building a custom encoder that compresses WCF responses. It is based on the Gzip encoder in Microsoft's WCF samples and this blog post:
http://frenk.wordpress.com/2009/12/04/gzip-compression-wcfsilverlight/
I've got it all working, but now I would like to apply the compression only if the reply exceeds a certain size. However, I am not sure how to retrieve the total size of the actual message at the encoder level.
I would need to get the message size both in the WriteMessage(...) method of the custom encoder (so I know whether to compress the message) and in the BeforeSendReply(...) method of the DispatchMessageInspector (so that I can add the "gzip" Content-Encoding header to the response). Requests are always small and not compressed, so I don't need to worry about that.
Any help appreciated.
Jon.

I think you would do this in two stages. First, write a custom MessageEncoder that encodes the message to a byte[] as normal. Once you have the encoded byte array (and this can be any message encoding format: XML, JSON, binary, whatever), you can examine its size and decide whether you want to create a second, compressed byte array.
Several resources you may find useful:
MSDN WCF Sample Code for a custom compression message encoder
Nicholas Allen's "Build a Custom Message Encoder" blog series. In this series he creates a "counting encoder" that wraps another encoder of any type and lets you know what the encoded message size is (based on the byte[] length). You could probably adapt this into a "ThresholdCompressionEncoder", along the lines of the sketch below.
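To make the idea concrete, here is a minimal sketch of such a wrapper, assuming your binding already provides an inner encoder. The ThresholdCompressionEncoder name and the 4 KB threshold are illustrative, and the buffer-manager handling follows the pattern in the MSDN GZip sample:

    using System;
    using System.IO;
    using System.IO.Compression;
    using System.ServiceModel.Channels;

    class ThresholdCompressionEncoder : MessageEncoder
    {
        private readonly MessageEncoder inner;
        private const int CompressionThreshold = 4096; // bytes; tune to taste

        public ThresholdCompressionEncoder(MessageEncoder inner)
        {
            this.inner = inner;
        }

        public override string ContentType { get { return inner.ContentType; } }
        public override string MediaType { get { return inner.MediaType; } }
        public override MessageVersion MessageVersion { get { return inner.MessageVersion; } }

        public override ArraySegment<byte> WriteMessage(Message message,
            int maxMessageSize, BufferManager bufferManager, int messageOffset)
        {
            // Let the inner encoder produce the plain bytes first...
            ArraySegment<byte> plain = inner.WriteMessage(
                message, maxMessageSize, bufferManager, messageOffset);

            // ...then the byte count tells us whether compression is worth it.
            // (The Content-Encoding header is added separately, e.g. in the
            // message inspector, as planned in the question.)
            if (plain.Count < CompressionThreshold)
                return plain;

            using (MemoryStream compressed = new MemoryStream())
            {
                using (GZipStream gzip = new GZipStream(compressed, CompressionMode.Compress, true))
                {
                    gzip.Write(plain.Array, plain.Offset, plain.Count);
                }
                byte[] output = bufferManager.TakeBuffer((int)compressed.Length + messageOffset);
                Array.Copy(compressed.GetBuffer(), 0, output, messageOffset, (int)compressed.Length);
                bufferManager.ReturnBuffer(plain.Array);
                return new ArraySegment<byte>(output, messageOffset, (int)compressed.Length);
            }
        }

        public override Message ReadMessage(ArraySegment<byte> buffer,
            BufferManager bufferManager, string contentType)
        {
            return inner.ReadMessage(buffer, bufferManager, contentType);
        }

        public override Message ReadMessage(Stream stream, int maxSizeOfHeaders, string contentType)
        {
            return inner.ReadMessage(stream, maxSizeOfHeaders, contentType);
        }

        public override void WriteMessage(Message message, Stream stream)
        {
            inner.WriteMessage(message, stream);
        }
    }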

You can try approximating it from reply.ToString().Length and message.ToString().Length.

Related

How can I ensure that ASP.NET Core's IFormFile stream doesn't read more than what's specified in the file's Content-Length?

I have an API endpoint for uploading large files, streaming them directly to the DB. I use ASP.NET Core's IFormFeature to do this, calling IFormFile.OpenReadStream() to get a Stream that I pass to SqlClient for streaming.
I want to enforce a maximum file size to avoid abuse. I know IFormFile has a Length property, but I assume that is based on Content-Length or similar and cannot be trusted (please correct me if I'm wrong, but AFAIK the only way to be 100% sure about the file size is to actually read the data; the client could send an incorrect Content-Length).
I must therefore ensure that when the stream is read, it does not read more than what is specified in IFormFile.Length (ideally it should throw if it encounters additional bytes). I have not found a way to do this. Is this possible, or is there perhaps a better way to ensure the server doesn't read enormous amounts of data from clients sending incorrect Content-Length headers?
(It should go without saying that this must not entail reading the entire file into memory.)
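For what it's worth, one way to express that constraint is a counting Stream wrapper. This is only a sketch; MaxLengthStream is a made-up name, not a framework type:

    using System;
    using System.IO;

    // A read-only wrapper that counts the bytes read from the underlying
    // stream and throws as soon as the count exceeds the declared length.
    class MaxLengthStream : Stream
    {
        private readonly Stream inner;
        private readonly long maxLength;
        private long readSoFar;

        public MaxLengthStream(Stream inner, long maxLength)
        {
            this.inner = inner;
            this.maxLength = maxLength;
        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            int read = inner.Read(buffer, offset, count);
            readSoFar += read;
            if (readSoFar > maxLength)
                throw new InvalidDataException(
                    "Stream exceeded the declared length of " + maxLength + " bytes.");
            return read;
        }

        public override bool CanRead { get { return inner.CanRead; } }
        public override bool CanSeek { get { return false; } }
        public override bool CanWrite { get { return false; } }
        public override long Length { get { return inner.Length; } }
        public override long Position
        {
            get { return readSoFar; }
            set { throw new NotSupportedException(); }
        }
        public override void Flush() { inner.Flush(); }
        public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
        public override void SetLength(long value) { throw new NotSupportedException(); }
        public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    }

    // Usage: wrap the form file's stream before handing it to SqlClient:
    // var capped = new MaxLengthStream(formFile.OpenReadStream(), formFile.Length);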

Can I trust the .Length property on IFormFile in ASP.NET Core?

We have an API endpoint that allows users to upload images; one of its parameters is an IFormFileCollection.
We'd like to validate the file size to make sure that the endpoint isn't being abused so I'm checking the Length property of each IFormFile, but I don't know whether I can trust this property or not, i.e. does this come from the request? Is it considered 'input', much like Content-Length is?
If you have an IFormFileCollection parameter and you send data using a "form-data" content type in the request, that parameter will be bound by a whole lot of plumbing that's hard to dig through online. But if you just debug the action method that accepts the IFormFileCollection (or any collection of IFormFile, really) and inspect the collection, you'll see that the uploaded files will already have been saved on your server's disk.
That's because the entire multipart form request body has to be read to determine how many files there are (if any) and what the form parameters are, and to validate the request body's format while reading it.
So yes, by the time your code ends up there, you can trust IFormFile.Length, because it's pointing to a local file that exists and contains that many bytes.
You're too late there to reject the request, though, as it has already been read in its entirety. You'd better enforce rate and size limits lower in the stack, such as on the web server or firewall; some built-in options are shown below.
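For ASP.NET Core specifically, two of the built-in knobs look roughly like this. RequestSizeLimit and Kestrel's MaxRequestBodySize are real APIs; the 10 MB figure is just an example:

    using Microsoft.AspNetCore.Builder;
    using Microsoft.AspNetCore.Hosting;
    using Microsoft.AspNetCore.Http;
    using Microsoft.AspNetCore.Mvc;
    using Microsoft.Extensions.DependencyInjection;

    public class UploadController : ControllerBase
    {
        // Per-action cap: larger request bodies are rejected before model
        // binding reads (and buffers) the uploaded files.
        [HttpPost("/upload")]
        [RequestSizeLimit(10 * 1024 * 1024)]
        public IActionResult Upload(IFormFileCollection files)
        {
            return Ok(files.Count);
        }
    }

    public static class Program
    {
        public static void Main(string[] args)
        {
            var builder = WebApplication.CreateBuilder(args);
            builder.Services.AddControllers();
            // Server-wide cap, enforced by Kestrel itself.
            builder.WebHost.ConfigureKestrel(options =>
            {
                options.Limits.MaxRequestBodySize = 10 * 1024 * 1024;
            });
            var app = builder.Build();
            app.MapControllers();
            app.Run();
        }
    }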
Content-Length is the number of bytes of data in the body as sent on the wire (after any compression). It is not a reliable measure of the file size, since it may include extra data: for example, when you send a multipart request it also covers the part boundaries and headers. Just use IFormFile.Length for things like calculation or validation.

Implementing basic S3 compatible API with akka-http

I'm trying to implement a file storage service with a basic S3-compatible API using akka-http.
I use the S3 Java SDK to test my service's API and ran into a problem with the putObject(...) method: I can't consume the file properly on my akka-http backend. I wrote a simple route for test purposes:
import java.io.File
import akka.http.scaladsl.server.Directives._
import akka.stream.scaladsl.FileIO

def putFile(bucket: String, file: String) = put {
  extractRequestEntity { ent =>
    val finishedWriting = ent.dataBytes.runWith(FileIO.toPath(new File(s"/tmp/${file}").toPath))
    onComplete(finishedWriting) { ioResult =>
      complete("Finished writing data: " + ioResult)
    }
  }
}
It saves the file, but the file is always corrupted. Looking inside the file, I found lines like this:
"20000;chunk-signature=73c6b865ab5899b5b7596b8c11113a8df439489da42ddb5b8d0c861a0472f8a1".
When I PUT a file with any other REST client, it works as expected.
I know S3 uses the "Expect: 100-continue" header, and maybe that is what causes the problem.
I really can't figure out how to deal with that. Any help appreciated.
This isn't exactly corrupted. Your service is not accounting for one of the four¹ ways S3 supports uploads to be sent on the wire, using Content-Encoding: aws-chunked and x-amz-content-sha256: STREAMING-AWS4-HMAC-SHA256-PAYLOAD.
It's a non-standards-based mechanism for streaming an object, and includes chunks that look exactly like this:
string(IntHexBase(chunk-size)) + ";chunk-signature=" + signature + \r\n + chunk-data + \r\n
...where IntHexBase() is pseudocode for a function that formats an integer as a hexadecimal number as a string.
This chunk-based algorithm is similar to, but not compatible with, Transfer-Encoding: chunked, because it embeds checksums in the stream.
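To make the framing concrete, here is a minimal sketch of stripping the aws-chunked envelope to recover the raw payload. It is shown in C# only for illustration (the logic is language-neutral), and signature verification is deliberately omitted; a real server must validate each chunk-signature against the request's signing key:

    using System;
    using System.IO;
    using System.Text;

    static class AwsChunkedDecoder
    {
        public static void Decode(Stream framed, Stream payload)
        {
            while (true)
            {
                // Header line, e.g. "20000;chunk-signature=73c6b865ab58..."
                string header = ReadLine(framed);
                int semicolon = header.IndexOf(';');
                string sizeHex = semicolon >= 0 ? header.Substring(0, semicolon) : header;
                int chunkSize = Convert.ToInt32(sizeHex, 16);
                if (chunkSize == 0)
                    break; // final, zero-length chunk ends the body

                byte[] buffer = new byte[8192];
                int remaining = chunkSize;
                while (remaining > 0)
                {
                    int read = framed.Read(buffer, 0, Math.Min(buffer.Length, remaining));
                    if (read <= 0) throw new EndOfStreamException();
                    payload.Write(buffer, 0, read);
                    remaining -= read;
                }
                ReadLine(framed); // consume the CRLF that follows the chunk data
            }
        }

        // Reads up to the next CRLF, returning the line without the terminator.
        private static string ReadLine(Stream s)
        {
            StringBuilder sb = new StringBuilder();
            int b;
            while ((b = s.ReadByte()) != -1)
            {
                if (b == '\r') { s.ReadByte(); break; } // swallow the '\n'
                sb.Append((char)b);
            }
            return sb.ToString();
        }
    }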
Why did they make up a new HTTP transfer encoding? It's potentially useful on the client side because it eliminates the need to either "read your payload twice or buffer [the entire object payload] in memory [concurrently]" -- one or the other of which is otherwise necessary if you are going to calculate the x-amz-content-sha256 hash before the upload begins, as you otherwise must, since it's required for integrity checking.
I am not overly familiar with the internals of the Java SDK, but this type of upload might be triggered by using .withInputStream(), or it might be standard behavior for files too, or for files over a certain size.
Your minimum workaround would be to throw an HTTP error if you see x-amz-content-sha256: STREAMING-AWS4-HMAC-SHA256-PAYLOAD in the request headers since you appear not to have implemented this in your API, but this would most likely only serve to prevent storing objects uploaded by this method. The fact that this isn't already what happens automatically suggests that you haven't implemented x-amz-content-sha256 handling at all, so you are not doing the server-side payload integrity checks that you need to be doing.
For full compatibility, you'll need to implement the algorithm supported by S3 and assumed to be available by the SDKs, unless the SDKs specifically support a mechanism for disabling this algorithm -- which seems unlikely, since it serves a useful purpose, particularly (it appears) for streams whose length is known but that aren't seekable.
¹ one of four -- the other three are a standard PUT, a web-based html form POST, and the multipart API that is recommended for large files and mandatory for files larger than 5 GB.

Implementing gzip in BeforeSendReply (IDispatchMessageInspector)

I am trying to gzip-encode the contents of my WCF message. Most of the examples I see talk about having a BindingElement and a MessageEncoderFactory.
Are there any side effects to doing this in the BeforeSendReply of IDispatchMessageInspector? i.e. I take the message, zip it up and replace the original message.
public void BeforeSendReply(ref System.ServiceModel.Channels.Message reply, object correlationState)
{
    // Advertise the compressed body to the client...
    HttpResponseMessageProperty httpResponseProperty = new HttpResponseMessageProperty();
    httpResponseProperty.Headers.Add(HttpResponseHeader.ContentEncoding, "gzip");
    reply.Properties[HttpResponseMessageProperty.Name] = httpResponseProperty;
    // ...and replace the reply with the gzipped version.
    reply = gzip(reply);
}
gzip being a function that extracts the (XML) body and replaces it with a gzipped byte stream.
I'm looking for something along the lines of
Nooo!! That would kill your server.
Nope, that would break messages longer than x.
Not a good idea because the client would see this as a message with a random series of bytes as body, not as a gzipped message.
Yep, this works. And the impact on performance wouldn't be that huge.
Thanks folks!
This may or may not work, depending on which binding you're using. If you're using any of the SOAP-based bindings (BasicHttpBinding, WSHttpBinding, NetTcpBinding and so on), this would not work, since the encoder used by those bindings wouldn't know how to write the gzipped version of the message to the wire (it uses an XML writer, after all).
If you use a non-SOAP binding (such as WebHttpBinding), then it might work (you should try it to confirm). If you're dealing with very large messages, you will incur the penalty of buffering them in full a couple of times (before GZip and after GZip). You'll need to remember to set the WebBodyFormatMessageProperty to Raw to make sure the encoder doesn't try to re-encode the message (see this post for more information), and to format the message appropriately.
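As a sketch of that last point (RawBodyWriter is an illustrative name, not a built-in WCF type), replacing the reply with a raw-bodied message under WebHttpBinding might look like this:

    using System.ServiceModel.Channels;
    using System.Xml;

    // Wraps already-gzipped bytes as the raw body of a new reply message.
    class RawBodyWriter : BodyWriter
    {
        private readonly byte[] body;

        public RawBodyWriter(byte[] body)
            : base(true) // buffered
        {
            this.body = body;
        }

        protected override void OnWriteBodyContents(XmlDictionaryWriter writer)
        {
            // With WebContentFormat.Raw, the encoder expects a single
            // "Binary" element holding the raw bytes.
            writer.WriteStartElement("Binary");
            writer.WriteBase64(body, 0, body.Length);
            writer.WriteEndElement();
        }
    }

    // Inside BeforeSendReply, after producing gzippedBytes from the old reply:
    //   Message newReply = Message.CreateMessage(MessageVersion.None, null, new RawBodyWriter(gzippedBytes));
    //   newReply.Properties[WebBodyFormatMessageProperty.Name] =
    //       new WebBodyFormatMessageProperty(WebContentFormat.Raw);
    //   reply = newReply;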
Also, you need to make sure that the client understands it. With respect specifically to your third point - the client always sees the message as a series of bytes, and it's up to it to "understand" it (for example, by treating it as a HTTP response, separating its header and body, and so on).

SBJson Stream Parser

I'm working in Xcode 4.3.2 + building for an app in iOS 5.
I've decided to use SBJson to parse streams of data from our server. I've verified that I'm receiving a valid JSON response from the server. My question concerns the design behind the classes SBJsonStreamParser and the SBJsonParser.
It appears that in SBJsonParser the method "objectWithData" takes the data received from the JSON response and uses the SBJsonStreamParserAccumulator to append the stream of data into a single JSON document. Once the data stream is gathered into one object, it is then parsed by the "parse" method in SBJsonStreamParser.
I've run into several issues when requesting larger JSON documents. The sizes of the responses seem reasonable (specifically, a 9.4 KB response). It appears that SBJsonStreamParser breaks when given a data stream greater than a certain size: the parser succeeds when the response is small (~3 KB) but fails when it is larger (~10 KB).
I used NSLog to verify that in both cases, pulling both the small and the large stream, the methods successfully receive the full JSON document; it looks like [{"id": .... 123}]. I'm convinced that the issue is that the data stream is too long.
I'm wondering if I'm using SBJson incorrectly or is this simply a limitation of the parser? Is there anything that I can configure that allows SBJsonStreamParser to not throw an error for larger (but reasonable) data streams & continue to parse the full response?
Thanks in advance!
Actually you have the workings of objectWithData: backwards. SBJsonStreamParserAccumulator is used to accumulate the parsed output, not the unparsed data stream.