How does ChunkedInputStream work internally with Apache HttpClient 4.x? - amazon-s3

I am a bit new to the Apache HC API. I am trying to download a huge file (10 GB) from a server in a cloud environment and then upload it to Amazon S3.
As the file is so big, the server returns it with chunked transfer encoding and gzip content encoding. The cloud environment has neither enough disk space to store this file as a temp file, nor can such a file be accommodated in memory.
Mainly I have 2 interfaces:
interface ResourceDownloader {
    InputStream download(AbstractChannel channel);
}
interface ResourceUploader {
    void upload(AbstractChannel channel, InputStream inputStream);
}
Part 1:
While using the Apache HttpClient library, I see that the returned HTTP response has the following structure:
ResponseEntityProxy contains >> {
    - BasicHttpEntity [WrappedEntity]
    - content as ChunkedInputStream
}
So does this response mean the whole 10 GB would be available in an in-memory byte buffer on the client side after the client.execute(getMethod) call completes?
Or is it that as soon as I invoke read as below, it fetches chunks from the server?
[In the real case no disk would be available; the code below is just for demo.]
try {
    FileOutputStream fos = (FileOutputStream) outputStream;
    if (inputStream instanceof GZIPInputStream) {
        byte[] buffer = new byte[1024];
        int len;
        while ((len = inputStream.read(buffer)) != -1) {
            fos.write(buffer, 0, len);
        }
        // close resources
        fos.close();
        inputStream.close();
    }
} catch (IOException e) {
    logger.error("Exception occurred while processing file on disk", e);
}
Part 2:
I know how to do a multipart upload if I have the content length or the full file available, but in the case of a chunked input stream, how should we upload it to Amazon S3?
Thanks,
Dharam

HttpClient always streams request and response entities unless specifically instructed to do otherwise.
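In other words, client.execute(getMethod) only reads the status line and headers; the 10 GB body stays on the connection and is pulled from the socket, chunk-decoded (and, with the default client configuration, gzip-decoded), only as you read from the entity's InputStream. Your read loop is what drives the download, not the execute() call.
For Part 2, S3 does not need the total length up front if you use the low-level multipart upload API: read the response stream into a fixed-size part buffer and upload each part as it fills. Below is a minimal sketch of that idea, assuming Apache HttpClient 4.x and the AWS SDK for Java v1; the URL, bucket and key are placeholders, credentials come from the default provider chain, the class and method names are illustrative, and error handling is reduced to aborting the multipart upload:
// Sketch only: relays an HttpClient 4.x response stream to S3 with the
// low-level multipart API, buffering one part (~5 MB) at a time.
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.*;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class StreamingRelay {

    private static final int PART_SIZE = 5 * 1024 * 1024; // S3 minimum part size (except the last part)

    public static void relay(String url, String bucket, String key) throws IOException {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(new HttpGet(url))) {

            HttpEntity entity = response.getEntity();
            // getContent() returns the decoding stream; bytes are read from the
            // socket only as this stream is consumed.
            try (InputStream in = entity.getContent()) {
                InitiateMultipartUploadResult init =
                        s3.initiateMultipartUpload(new InitiateMultipartUploadRequest(bucket, key));
                List<PartETag> etags = new ArrayList<>();
                byte[] partBuffer = new byte[PART_SIZE];
                int partNumber = 1;
                int filled = 0;
                int n;
                try {
                    while ((n = in.read(partBuffer, filled, PART_SIZE - filled)) != -1) {
                        filled += n;
                        if (filled == PART_SIZE) {          // part buffer is full, ship it
                            etags.add(uploadPart(s3, init, partNumber++, partBuffer, filled));
                            filled = 0;
                        }
                    }
                    if (filled > 0) {                       // last (possibly short) part
                        etags.add(uploadPart(s3, init, partNumber, partBuffer, filled));
                    }
                    s3.completeMultipartUpload(new CompleteMultipartUploadRequest(
                            bucket, key, init.getUploadId(), etags));
                } catch (RuntimeException | IOException e) {
                    s3.abortMultipartUpload(
                            new AbortMultipartUploadRequest(bucket, key, init.getUploadId()));
                    throw e;
                }
            }
        }
    }

    private static PartETag uploadPart(AmazonS3 s3, InitiateMultipartUploadResult init,
                                       int partNumber, byte[] buffer, int length) {
        UploadPartRequest request = new UploadPartRequest()
                .withBucketName(init.getBucketName())
                .withKey(init.getKey())
                .withUploadId(init.getUploadId())
                .withPartNumber(partNumber)
                .withPartSize(length)
                .withInputStream(new ByteArrayInputStream(buffer, 0, length));
        return s3.uploadPart(request).getPartETag();
    }
}
Every part except the last must be at least 5 MB, so this keeps roughly one part buffer in memory and never touches disk. Handing the raw stream straight to putObject or TransferManager without a content length typically makes the SDK buffer the stream in memory (it warns about exactly that), which is what this approach avoids.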

Related

Design Minimal API and use HttpClient to post a file to it

I have a legacy system interfacing issue that my team has elected to solve by standing up a .NET 7 Minimal API which needs to accept a file upload. It should work for small and large files (let's say at least 500 MiB). The API will be called from a legacy system using HttpClient in a .NET Framework 4.7.1 app.
I can't quite seem to figure out how to design the signature of the Minimal API and how to call it with HttpClient in a way that totally works. It's something I've been hacking at on and off for several days, and haven't documented all of my approaches, but suffice it to say there have been varying results involving, among other things:
4XX and 500 errors returned by the HTTP call
An assortment of exceptions on either side
Calls that throw and never hit a breakpoint on the API side
Calls that get through but the Stream on the API end is not what I expect
Errors being different depending on whether the file being uploaded is small or large
Text files being persisted on the server that contain some of the HTTP headers in addition to their original contents
On the Minimal API side, I've tried all sorts of things in the signature (IFormFile, Stream, PipeReader, HttpRequest). On the calling side, I've tried several approaches (messing with headers, using the Flurl library, various content encodings and MIME types, multipart, etc).
This seems like it should be dead simple, so I'm trying to wipe the slate clean here, start with an example of something that partially works, and hope someone might be able to illuminate the path forward for me.
Example of Minimal API:
// IDocumentStorageManager is an injected dependency that takes an int and a Stream and returns a string of the newly uploaded file's URI
app.MapPost(
    "DocumentStorage/CreateDocument2/{documentId:int}",
    async (PipeReader pipeReader, int documentId, IDocumentStorageManager documentStorageManager) =>
    {
        using var ms = new MemoryStream();
        await pipeReader.CopyToAsync(ms);
        ms.Position = 0;
        return await documentStorageManager.CreateDocument(documentId, ms);
    });
Call the Minimal API using HttpClient:
// filePath is the path on local disk, uri is the Minimal API's URI
private static async Task<string> UploadWithHttpClient2(string filePath, string uri)
{
    var fileStream = File.Open(filePath, FileMode.Open);
    var content = new StreamContent(fileStream);
    var httpRequestMessage = new HttpRequestMessage(HttpMethod.Post, uri);
    var httpClient = new HttpClient();
    httpRequestMessage.Content = content;
    httpClient.Timeout = TimeSpan.FromMinutes(5);
    var result = await httpClient.SendAsync(httpRequestMessage);
    return await result.Content.ReadAsStringAsync();
}
In the particular example above, a small (6 bytes) .txt file is uploaded without issue. However, a large (619 MiB) .tif file runs into problems on the call to httpClient.SendAsync which results in the following set of nested Exceptions:
System.Net.Http.HttpRequestException - "Error while copying content to a stream."
System.IO.IOException - "Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host.."
System.Net.Sockets.SocketException - "An existing connection was forcibly closed by the remote host."
What's a decent way of writing a Minimal API and calling it with HttpClient that will work for small and large files?
Kestrel allows uploading 30 MB by default.
To upload larger files via Kestrel you may need to increase the maximum request size limit. This can be done by adding the "RequestSizeLimit" attribute. So for example, for 1 GB:
app.MapPost(
    "DocumentStorage/CreateDocument2/{documentId:int}",
    [RequestSizeLimit(1_000_000_000)] async (PipeReader pipeReader, int documentId) =>
    {
        using var ms = new MemoryStream();
        await pipeReader.CopyToAsync(ms);
        ms.Position = 0;
        return "";
    });
You can also remove the size limit globally by setting
builder.WebHost.UseKestrel(o => o.Limits.MaxRequestBodySize = null);
This answer is good, but the RequestSizeLimit filter doesn't work for minimal APIs; it's an MVC filter. You can use the IHttpMaxRequestBodySizeFeature to set the size limit (assuming you're not running on IIS). Also, I made a change to accept the body as a Stream. This avoids the memory stream copy before calling the CreateDocument API:
app.MapPost(
    "DocumentStorage/CreateDocument2/{documentId:int}",
    async (Stream stream, int documentId, IDocumentStorageManager documentStorageManager) =>
    {
        return await documentStorageManager.CreateDocument(documentId, stream);
    })
.AddEndpointFilter((context, next) =>
{
    const int MaxBytes = 1024 * 1024 * 1024;
    var maxRequestBodySizeFeature = context.HttpContext.Features.Get<IHttpMaxRequestBodySizeFeature>();
    // MaxRequestBodySize can only be changed before the request body starts being read
    if (maxRequestBodySizeFeature is not null and { IsReadOnly: false })
    {
        maxRequestBodySizeFeature.MaxRequestBodySize = MaxBytes;
    }
    return next(context);
});
If you're running on IIS, see https://learn.microsoft.com/en-us/iis/configuration/system.webserver/security/requestfiltering/requestlimits/#configuration for the request limits configuration.

Java AWS SDK v1 - S3 API - Not able to upload multiple files in parallel using multipart API

I have 5 files of 200 MB each. I am uploading these files in parallel using an ExecutorService and a TransferManager with a multipart threshold of 50 MB, and I wait for each upload to finish with the blocking call upload.waitForCompletion() (which suspends the current thread until the upload succeeds or throws an error).
Please find code excerpts below:
private static final ExecutorService executor = Executors.newFixedThreadPool(10);

executor.execute(() -> upload("bucketName", new File(fullFilePath)));

public static void upload(final String bucketName, final File filePath) {
    AmazonS3 amazonS3 = AmazonS3ClientBuilder.standard()
            .withCredentials(new AWSStaticCredentialsProvider(getCredentals()))
            .withRegion(Regions.DEFAULT_REGION).build();
    TransferManager tm = TransferManagerBuilder.standard().withS3Client(amazonS3)
            .withMultipartUploadThreshold((long) (50 * 1024 * 1024)).build();
    final String fileName = filePath.getName();
    try {
        Upload upload = tm.upload(bucketName, fileName, filePath);
        upload.waitForCompletion();
        log.info("Successfully uploaded file = " + fileName);
    } catch (Exception e) {
        log.info("Upload failed for file = " + fileName);
        log.error(e);
    }
}
The main thread does not exit until "Successfully uploaded" has been logged for all 5 files.
This program does not throw any error and prints success for all 5 files, but when I open the bucket in the AWS console, nothing is there.
Can anyone suggest what might be happening here or how to debug it further?
The code is working fine; I was looking into the wrong bucket.
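For future readers hitting the same symptom, one quick way to rule out a wrong-bucket / wrong-region mix-up is to list the bucket with the same client configuration right after the uploads complete. A small sketch (the bucket name is a placeholder; use whatever credentials provider your uploads use):
// Sketch: confirm where the uploads actually landed by listing the bucket
// with the same region/credentials used by the TransferManager.
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3ObjectSummary;

public class VerifyUpload {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withRegion(Regions.DEFAULT_REGION)
                .build();
        System.out.println("Bucket region: " + s3.getBucketLocation("my-bucket"));
        for (S3ObjectSummary summary : s3.listObjectsV2("my-bucket").getObjectSummaries()) {
            System.out.println(summary.getKey() + " (" + summary.getSize() + " bytes)");
        }
    }
}
If the listing is empty even though waitForCompletion() returned normally, the upload almost certainly went to a different bucket, region, or account than the one open in the console.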

Amazon S3 Response DotNetZip MVC

My application (MVC) needs to download, zip and return one or many files from Amazon S3. I am using the .NET SDK and GetObject to receive the files, and want to use DotNetZip to then zip them up and return the generated zip file as a file stream result for the user to download.
Can anyone suggest the most efficient way of doing this? I am seeing OutOfMemory exceptions when downloading large files from S3; they could be up to 1 GB in size, for example.
My code so far:
using (
    var client = AWSClientFactory.CreateAmazonS3Client(
        "apikey",
        "apisecret",
        new AmazonS3Config { RegionEndpoint = RegionEndpoint.EUWest1 })
)
{
    foreach (var file in files)
    {
        var request = new GetObjectRequest { BucketName = "bucketname", Key = file };
        using (var response = client.GetObject(request))
        {
            // each response (response.ResponseStream) would be added to the zip here
        }
    }
}
If I copy the response into a memory stream and add that to the zip, all works OK (on small files), but with large files I assume I cannot store the entire thing in memory?

Replacing files throws UnauthorizedAccessException in WinRT / Win8

Every once in a while, StorageFiles get locked and I get an UnauthorizedAccessException when trying to overwrite them. I cannot replicate this, it only happens randomly. This is the code for creating files:
using (var stream = new MemoryStream())
{
    // ...populate stream with serialized data...
    StorageFile file;
    Stream fileStream;
    try
    {
        file = await folder.CreateFileAsync(fileName, CreationCollisionOption.ReplaceExisting);
    }
    catch (UnauthorizedAccessException ex)
    {
        Debug.WriteLine("Access denied on file {0}", fileName);
        return;
    }
    fileStream = await file.OpenStreamForWriteAsync();
    using (fileStream)
    {
        stream.Seek(0, SeekOrigin.Begin);
        await stream.CopyToAsync(fileStream);
        await fileStream.FlushAsync();
    }
}
Once a file starts throwing UnauthorizedAccessException, it will always throw it, as if the system has the file locked and I cannot touch it. I have to uninstall the application and rebuild.
When I open the file, I can see the data there. Everything is fine; it was written successfully.
Can anyone see a problem with my code?
Are you saving the file token in the future access list? I ran into this problem when loading files and trying to save updates later. Once I started using the future access list, the problems went away.
http://msdn.microsoft.com/en-us/library/windows/apps/windows.storage.accesscache.storageitemaccesslist
It might be the case that the same file is being accessed from two different points in the code at the same time.

Interrupt reading incoming request in WCF REST

I have a WCF REST service running in streaming (unbuffered) mode that is receiving a file upload as raw bytes in the body of an HTTP request. Before reading the incoming stream (a MessageBodyStream), I check the request headers and make sure that the Content-Length is the right size for that particular operation.
If the Content-Length is larger than the allowed size, I'd like to immediately return an error response (by throwing a WebFaultException) without waiting for the rest of the request to be transferred.
However, it seems that WCF tries to read the stream to the end even after the exception is thrown -- if the client is sending a 50 MB file, all 50 MB will be transferred before a response is sent.
Is there any way to avoid this, and to interrupt receiving the HTTP request?
Related question: Why is WCF reading input stream to EOF on Close()?
EDIT: Added code excerpt
The OperationContract and the upload helper method:
[OperationContract]
[WebInvoke(UriTemplate = /* ... */, Method = "POST",
    ResponseFormat = WebMessageFormat.Json, BodyStyle = WebMessageBodyStyle.Bare)]
public void UploadMyFile(string guid, Stream fileStream)
{
    string targetPath = /* targetPath */;
    UploadFile(fileStream, targetPath, /* targetFileName */);
}

private bool UploadFile(Stream stream, string targetDirectory,
    string targetFileName = null, int maximumSize = 1024 * 500,
    Func<string, bool> tempFileValidator = null)
{
    int size = 0;
    int.TryParse(IncomingRequest.Headers[HttpRequestHeader.ContentLength], out size);
    if (size == 0)
    {
        ThrowJsonException(HttpStatusCode.LengthRequired, "Valid Content-Length required");
    }
    else if (size > maximumSize)
    {
        ThrowJsonException(HttpStatusCode.RequestEntityTooLarge, "File is too big");
    }
    if (!FileSystem.SaveFileFromStream(stream, targetDirectory, targetFileName, tempFileValidator))
    {
        ThrowJsonException(HttpStatusCode.InternalServerError, "Saving file failed");
    }
    return true;
}
You can probably write a message inspector component that can intercept the body and context of the request.
You can then throw an exception (if you like) after inspecting the message/context.
HTH
HttpWebRequest, and HTTP requests in general, do not support streaming the request; underneath, streaming is implemented by chunked encoding, which is a server-side concept.
I just answered this related question too:
IIS7 refuses chunked-encoded file upload
When you are writing to the request stream, it is buffered locally until it is sent, since the client has to populate the Content-Length header. You can verify this in Fiddler.