AmazonS3: Getting warning: S3AbortableInputStream: Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection

Here's the warning that I am getting:
S3AbortableInputStream:Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
I tried using try-with-resources, but the S3ObjectInputStream doesn't seem to be closed properly via this method:
try (S3Object s3object = s3Client.getObject(new GetObjectRequest(bucket, key));
     S3ObjectInputStream s3ObjectInputStream = s3object.getObjectContent();
     BufferedReader reader = new BufferedReader(
             new InputStreamReader(s3ObjectInputStream, StandardCharsets.UTF_8))) {
    // some code here blah blah blah
}
I also tried the code below, closing the streams explicitly, but that doesn't work either:
S3Object s3object = s3Client.getObject(new GetObjectRequest(bucket, key));
S3ObjectInputStream s3ObjectInputStream = s3object.getObjectContent();
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(s3ObjectInputStream, StandardCharsets.UTF_8))) {
    // some code here blah blah
    s3ObjectInputStream.close();
    s3object.close();
}
Any help would be appreciated.
PS: I am only reading two lines of the file from S3 and the file has more data.

Got the answer via another medium. Sharing it here:
The warning indicates that you called close() without reading the whole file. This is problematic because S3 is still trying to send the data and you're leaving the connection in a sad state.
There are two options here:
1. Read the rest of the data from the input stream so the connection can be reused.
2. Call s3ObjectInputStream.abort() to close the connection without reading the data. The connection won't be reused, so you take some performance hit with the next request to re-create the connection. This may be worth it if it's going to take a long time to read the rest of the file. (A sketch of this option follows below.)
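To illustrate option #2, here is a minimal sketch of aborting the stream after reading only the part you need; bucket and key are the same variables as in the question:

S3Object s3object = s3Client.getObject(new GetObjectRequest(bucket, key));
try (S3ObjectInputStream s3ObjectInputStream = s3object.getObjectContent()) {
    BufferedReader reader = new BufferedReader(
            new InputStreamReader(s3ObjectInputStream, StandardCharsets.UTF_8));
    String firstLine = reader.readLine();
    String secondLine = reader.readLine();
    // Only two lines were needed; give up the connection instead of draining the rest.
    s3ObjectInputStream.abort();
}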

Following option #1 of Chirag Sejpal's answer, I used the statement below to drain the S3AbortableInputStream and ensure the connection can be reused:
com.amazonaws.util.IOUtils.drainInputStream(s3ObjectInputStream);
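In context, that call typically sits in a finally block, together with closing the stream, so the remaining bytes are swallowed even if reading fails. A sketch; the read logic is a placeholder and exception handling is elided:

S3Object s3object = s3Client.getObject(new GetObjectRequest(bucket, key));
S3ObjectInputStream s3ObjectInputStream = s3object.getObjectContent();
try {
    // read the first couple of lines here
} finally {
    // drain the remaining bytes so the pooled connection stays reusable
    com.amazonaws.util.IOUtils.drainInputStream(s3ObjectInputStream);
    s3ObjectInputStream.close();
    s3object.close();
}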

I ran into the same problem, and the following class helped me:

@Data
@AllArgsConstructor
public class S3ObjectClosable implements Closeable {

    private final S3Object s3Object;

    @Override
    public void close() throws IOException {
        s3Object.getObjectContent().abort();
        s3Object.close();
    }
}
and now you can use it without the warning:
try (final var s3ObjectClosable = new S3ObjectClosable(s3Client.getObject(bucket, key))) {
    // same code
}

To add an example to Chirag Sejpal's answer (elaborating on option #1), the following can be used to read the rest of the data from the input stream before closing it:
S3Object s3object = s3Client.getObject(new GetObjectRequest(bucket, key));
try (S3ObjectInputStream s3ObjectInputStream = s3object.getObjectContent()) {
    try {
        // Read from stream as necessary
    } catch (Exception e) {
        // Handle exceptions as necessary
    } finally {
        while (s3ObjectInputStream != null && s3ObjectInputStream.read() != -1) {
            // Read the rest of the stream
        }
    }
    // The stream will be closed automatically by the try-with-resources statement
}

I ran into the same error.
As others have pointed out, the /tmp space in Lambda is limited to 512 MB.
And if the Lambda execution context is re-used for a new invocation, then the /tmp space is already partly full from the previous run.
So, when reading the S3 objects and writing all the files to the /tmp directory (as I was doing), I ran out of disk space somewhere in between.
Lambda exited with an error, but NOT all bytes from the S3ObjectInputStream had been read.
So, there are two things to keep in mind:
1) If the first execution causes the problem, be stingy with your /tmp space. We only have 512 MB (see the sketch after this list).
2) If the second execution causes the problem, then this can be resolved by attacking the root problem. It's not possible to delete the /tmp folder itself, so delete all the files in the /tmp folder after the execution is finished.
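For point 1, one way to be stingy with /tmp is to not stage the object on disk at all and instead stream it straight to whatever consumes it. A minimal sketch, assuming the object is line-oriented text; process(...) is a placeholder for your own logic:

// Stream the object directly instead of writing a copy to /tmp first
S3Object s3object = s3Client.getObject(new GetObjectRequest(bucket, key));
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(s3object.getObjectContent(), StandardCharsets.UTF_8))) {
    reader.lines().forEach(line -> process(line)); // reads to EOF, so the connection stays reusable
}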
For point 2, here is what I did in Java, which successfully resolved the problem:
public String handleRequest(Map<String, String> keyValuePairs, Context lambdaContext) {
    try {
        // All work here
        return "Success";
    } catch (Exception e) {
        logger.error("Error {}", e.toString());
        return "Error";
    } finally {
        deleteAllFilesInTmpDir();
    }
}
private void deleteAllFilesInTmpDir() {
    Path path = java.nio.file.Paths.get(File.separator, "tmp", File.separator);
    try {
        if (Files.exists(path)) {
            deleteDir(path.toFile());
            logger.info("Successfully cleaned up the tmp directory");
        }
    } catch (Exception ex) {
        logger.error("Unable to clean up the tmp directory");
    }
}

public void deleteDir(File dir) {
    File[] files = dir.listFiles();
    if (files != null) {
        for (final File file : files) {
            deleteDir(file);
        }
    }
    dir.delete();
}

This is my solution. I'm using Spring Boot 2.4.3.
Create an Amazon S3 client:
AmazonS3 amazonS3Client = AmazonS3ClientBuilder
        .standard()
        .withRegion("your-region")
        .withCredentials(new AWSStaticCredentialsProvider(
                new BasicAWSCredentials("your-access-key", "your-secret-access-key")))
        .build();
Create an Amazon transfer manager client:
TransferManager transferManagerClient = TransferManagerBuilder.standard()
        .withS3Client(amazonS3Client)
        .build();
Create a temporary file at /tmp/{your-s3-key} so that we can put the file we download into it. The parent directories have to exist before the file itself is created:

File file = new File(System.getProperty("java.io.tmpdir"), "your-s3-key");
try {
    file.getParentFile().mkdirs(); // Create the directory of the temporary file
    file.createNewFile();          // Create the temporary file
} catch (IOException e) {
    e.printStackTrace();
}
Then we download the file from S3 using the transfer manager client:

// Note that on this line the S3 object is transferred into the temporary file we created
Download download = transferManagerClient.download(
        new GetObjectRequest("your-s3-bucket-name", "your-s3-key"), file);
try {
    // This line blocks the thread until the download is finished
    download.waitForCompletion();
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}
Now that the S3 file has been successfully transferred into the temporary file that we created, we can get the InputStream of the temporary file.
InputStream input = new DataInputStream(new FileInputStream(file));
Because the temporary file is not needed anymore, we just delete it.
file.delete();
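One detail worth adding: when the transfer manager itself is no longer needed (for example on application shutdown), its thread pool should be released. A one-line sketch; the false argument keeps the underlying amazonS3Client alive for further use:

// Shut down the TransferManager's threads but keep the wrapped S3 client
transferManagerClient.shutdownNow(false);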


Very slow AWS presignedURL download. How to speed it up?

I am reading 20-30 different objects of varying sizes using the S3 IAM and a unique presigned URL for each file. The downloads of all files occur at once, and each phase occurs in sequence. Unfortunately the S3 client is not thread safe, so we cannot use async operations. Some files transfer rapidly while others lag. The total operation can take between 7 and more than 15 seconds. I expected greater performance from S3, since AWS advertises that it has high throughput.
I see several unanswered posts about download performance from S3. However, the problem seems to have increased once we introduced link ambiguation using the IAM and presigned URLs.
FYI, my internet connection is broadband; it is unlikely to be the cause of the performance issue.
The tests are performed only a few hundred miles from the S3 storage, which eliminates distance as a factor in the performance issues.
There is no server between the client and S3 for downloading objects, so that is not the cause of the performance issues either.
One caveat: we tried async forAllChunked using the Rice University Habanero API. While we did not have any errors due to threading problems, the download performance was still very slow. This would seemingly eliminate the idea that the download is slow due to its serialization in the for loop, although performance should be far better if we can download files simultaneously.
Code attached.
public void cloudGetMedia(ArrayList<MediaSyncObj> mediaObjs, ArrayList<String> signedUrls) {
    long getTime = System.currentTimeMillis();
    // Ensure media directory exists or create it
    String toDiskDir = DirectoryMgr.getMediaPath('M');
    File diskFile = new File(toDiskDir);
    FileOpsUtil.folderExists(diskFile);
    // Process signed URLs
    for (String signedurl : signedUrls) {
        LOGGER.debug("cloudGetMedia called. signedURL is null: {}", signedurl == null);
        URI fileToBeDownloaded = null;
        try {
            fileToBeDownloaded = new URI(signedurl);
        } catch (URISyntaxException e) {
            e.printStackTrace();
        }
        // Get the file name from the presigned URL
        AmazonS3URI s3URI = new AmazonS3URI(fileToBeDownloaded);
        String localURL = toDiskDir + "/" + s3URI.getKey();
        File file = new File(localURL);
        AmazonS3 client = AmazonS3ClientBuilder.standard()
                .withRegion(s3URI.getRegion())
                .build();
        try {
            URL url = new URL(signedurl);
            PresignedUrlDownloadRequest req = new PresignedUrlDownloadRequest(url);
            client.download(req, file);
        } catch (MalformedURLException e) {
            LOGGER.warn(e.getMessage());
            e.printStackTrace();
        }
    }
    getTime = (System.currentTimeMillis() - getTime);
    LOGGER.debug("Total get time in syncCloudMediaAction: {} milliseconds, numElement: {}", getTime, signedUrls.size());
}
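A variation worth testing: the code above builds a new AmazonS3 client inside the loop, once per URL. Client construction is comparatively expensive, and the v1 AmazonS3 client is documented to be thread safe, so it can be built once and reused for every download. A sketch, reusing the variables from the method above; the region string is a placeholder:

// Build the client once, outside the loop, and reuse it for every download
AmazonS3 client = AmazonS3ClientBuilder.standard()
        .withRegion("your-region") // placeholder; could be derived from the first AmazonS3URI
        .build();
for (String signedurl : signedUrls) {
    try {
        PresignedUrlDownloadRequest req = new PresignedUrlDownloadRequest(new URL(signedurl));
        AmazonS3URI s3URI = new AmazonS3URI(new URI(signedurl));
        client.download(req, new File(toDiskDir + "/" + s3URI.getKey()));
    } catch (MalformedURLException | URISyntaxException e) {
        LOGGER.warn(e.getMessage());
    }
}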

UWP App: `Image.SetSource` hangs computer on `StorageFiles` outside of `KnownPlaces`

This one is hard to explain, so I'll give you some actual code and pseudocode:
try
{
    // If source (a string) points towards a file that is available with
    // StorageFile.GetFileFromPathAsync(), just open the file that way.
    // If that is not possible, use the path to look up an Access Token
    // and use the file from the StorageFolder gotten via that token.
    StorageFile file = await GetFileFromAccessList(source);
    if (file != null)
    {
        bitmap = new BitmapImage();
        using (IRandomAccessStream fileStream = await file.OpenAsync(FileAccessMode.Read))
        {
            await bitmap.SetSourceAsync(fileStream);
        }
    }
}
catch (Exception e)
{
    string s = e.Message;
    bitmap = null;
}
with the following method:
public async Task<StorageFile> GetFileFromAccessList(string path)
{
    StorageFile result = null;
    if (String.IsNullOrEmpty(path) == false)
    {
        try
        {
            // Try to access the file directly...
            result = await StorageFile.GetFileFromPathAsync(path);
        }
        catch (Exception)
        {
            result = null;
            try
            {
                // See if the folder this thing is in is in the access list...
                StorageFolder folder = await GetFolderFromAccessList(Path.GetFullPath(path));
                // If there is a folder, try that.
                if (folder != null)
                    result = await folder.GetFileAsync(Path.GetFileName(path));
            }
            catch (Exception)
            {
                result = null;
            }
        }
    }
    return result;
}
The resulting bitmap is used in Image.SetSource() as an ImageSource.
Now what kills me: this call works perfectly, fast and rock solid, for files stored within the app's folder or KnownFolders. So it works like a charm when I don't need an Access Token, i.e. when I don't have to go through Windows.Storage.AccessCache.StorageApplicationPermissions.FutureAccessList.GetFolderAsync(token).
However, it breaks if I have to use an access token, just not all the time.
This code does not break immediately: it breaks when I try to open more than 5-7 source files at the same time.
Repeat that: this works if I display 5-7 images. If I try to open more, it freezes the PC. No such problem occurs when I open StorageFiles without tokens.
I can access such files using normal file operations. I can create bitmaps from them, process them, the works.
I just cannot make them a source of an XAML Image.
Any thoughts?
Ah clarity.
So it turns out that using the DataContextChanged event to refresh the bitmap through Image.SetSource() is the murder weapon.
The solution: declare a property of type BitmapSource. Bind the Image.Source to that property. Update the property with the loaded bitmap upon Image.Loaded and Image.DataContextChanged. It now works stably and fast in all conditions I was able to test.

Copy Table from Database from Dropbox to App

I have written the following code to copy a database file from a Dropbox account to the app's database. But I want to copy only a specific table from Dropbox to the app database table. Is that possible?
protected Boolean doInBackground(Void... params) {
    final File tempDir = context.getCacheDir();
    File tempFile;
    FileWriter fr;
    try {
        File data = Environment.getDataDirectory();
        String currentDBPath = "//data//" + "loginscreen.example.com.girviapp" + "//databases//" + DATABASE_NAME;
        File currentDB = new File(data, currentDBPath);
        FileInputStream fileInputStream = new FileInputStream(currentDB);
        Entry existingentry = dropbox.metadata(path, 1000, null, true, null);
        if (existingentry.contents.size() != 0) {
            for (Entry ent : existingentry.contents) {
                String name = ent.fileName();
                if (name.equals(DATABASE_NAME)) {
                    FileOutputStream outputStream = new FileOutputStream(currentDB);
                    DropboxFileInfo info = dropbox.getFile(path + DATABASE_NAME, null, outputStream, null);
                    return true;
                }
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    } catch (DropboxException e) {
        e.printStackTrace();
    }
    return false;
}
The answer here depends on exactly what you're looking to do.
If you want to be able to download only a specific table from the database file stored on Dropbox, in order to save bandwidth by not downloading all of the data, then no, this isn't possible. The unit of storage in the Dropbox API you're using is the file, and Dropbox doesn't know what the individual tables in your database file are, so it doesn't have a way for you to specify a particular table.
On the other hand, if you don't mind downloading the entire file from Dropbox, then yes, this is possible. Just download the file as you're doing, and then load it up as a database locally. Then query out just the data you want, that is, the desired table, and do what you need with it, as sketched below.
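For example, once the whole file has been downloaded to local storage, a sketch along the following lines could read a single table out of it with Android's SQLite API. Here downloadedDbFile and your_table are hypothetical names, not from the question's code:

// Open the downloaded copy as a read-only SQLite database
SQLiteDatabase db = SQLiteDatabase.openDatabase(
        downloadedDbFile.getAbsolutePath(), null, SQLiteDatabase.OPEN_READONLY);
// Query only the one table you care about
Cursor cursor = db.query("your_table", null, null, null, null, null, null);
try {
    while (cursor.moveToNext()) {
        // copy each row into the app's own database here
    }
} finally {
    cursor.close();
    db.close();
}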

context path for file upload without HttpRequest in REST application

I am building a REST application. I want to upload a file and save it, for example, in /WEB-INF/resource/uploads.
How can I get the path to this directory? My controller looks like this:
@RequestMapping(value = "/admin/house/update", method = RequestMethod.POST)
public String updateHouse(House house, @RequestParam("file") MultipartFile file, Model model) {
    try {
        String fileName = null;
        InputStream inputStream = null;
        OutputStream outputStream = null;
        if (file.getSize() > 0) {
            inputStream = file.getInputStream();
            fileName = "D:/" + file.getOriginalFilename();
            outputStream = new FileOutputStream(fileName);
            int readBytes = 0;
            byte[] buffer = new byte[10000];
            while ((readBytes = inputStream.read(buffer, 0, 10000)) != -1) {
                outputStream.write(buffer, 0, readBytes);
            }
            outputStream.close();
            inputStream.close();
        }
    } catch (Exception ex) {
        ex.printStackTrace();
    }
    model.addAttribute("step", 3);
    this.houseDao.update(house);
    return "houseAdmin";
}
Second question: what is the best place to store uploaded user files?
/WEB-INF is a bad place to try to store file uploads. There's no guarantee that this is an actual directory on the disk, and even if it is, the appserver may forbid write access to it.
Where you should store your files depends on what you want to do with them and what operating system you're running on. Just pick somewhere outside of the webapp itself, is my advice. Perhaps create a dedicated directory for them.
Also, the process of transferring the MultipartFile to another location is much simpler than you're making it out to be:
@RequestMapping(value = "/admin/house/update", method = RequestMethod.POST)
public String updateHouse(House house, @RequestParam("file") MultipartFile srcFile, Model model) throws IOException {
    File destFile = new File("/path/to/the/target/file");
    srcFile.transferTo(destFile); // easy!
    model.addAttribute("step", 3);
    this.houseDao.update(house);
    return "houseAdmin";
}
You shouldn't store files in /WEB-INF/resource/uploads. This directory is either inside your WAR (if packaged) or exploded somewhere inside the servlet container. The first destination is read-only and the latter should not be used for user files.
There are usually two places considered when storing uploaded files:
1. A dedicated folder (see the sketch after this list). Make sure users cannot access this directory directly (e.g. via an anonymous FTP folder). Note that once your application runs on more than one machine you won't have access to this folder, so consider some form of network synchronization or a shared network drive.
2. A database. This is controversial, since binary files tend to occupy a lot of space. But this approach is a bit simpler when distributing your application.
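To make the dedicated-folder option concrete, here is a small sketch that combines it with the transferTo() approach from the earlier answer. The upload.dir property name and the endpoint are assumptions for illustration, not from the original answers:

@Value("${upload.dir}") // e.g. upload.dir=/var/myapp/uploads, outside the webapp
private String uploadDir;

@RequestMapping(value = "/admin/house/upload", method = RequestMethod.POST)
public String uploadFile(@RequestParam("file") MultipartFile srcFile) throws IOException {
    File dir = new File(uploadDir);
    if (!dir.exists()) {
        dir.mkdirs(); // create the dedicated directory on first use
    }
    // store the upload outside the webapp, in the dedicated directory
    srcFile.transferTo(new File(dir, srcFile.getOriginalFilename()));
    return "houseAdmin";
}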

Can I have simultaneous streams on one physical file

I have a WCF service that allows clients to download files. Although there is a new instance of the service for every client request, if two clients try to download the same file at the same time, the first request to arrive locks the file until it is finished with it. So the other client is actually waiting for the first client to finish, as there are no multiple services. There must be a way to avoid this.
Does anyone know how I can avoid this without keeping multiple copies of the file on the server's hard disk? Or am I doing something totally wrong?
This is the server-side code:
public Stream DownloadFile(string path)
{
    System.IO.FileInfo fileInfo = new System.IO.FileInfo(path);
    // check if exists
    if (!fileInfo.Exists) throw new FileNotFoundException();
    // open stream
    System.IO.FileStream stream = new System.IO.FileStream(path, System.IO.FileMode.Open, System.IO.FileAccess.Read);
    // return result
    return stream;
}
This is the client-side code:
public void Download(string serverPath, string path)
{
    Stream stream;
    try
    {
        if (System.IO.File.Exists(path)) System.IO.File.Delete(path);
        serviceStreamed = new ServiceStreamedClient("NetTcpBinding_IServiceStreamed");
        SimpleResult<long> res = serviceStreamed.ReturnFileSize(serverPath);
        if (!res.Success)
        {
            throw new Exception("File not found: \n" + serverPath);
        }
        // get stream from server
        stream = serviceStreamed.DownloadFile(serverPath);
        // write server stream to disk
        using (System.IO.FileStream writeStream = new System.IO.FileStream(path, System.IO.FileMode.CreateNew, System.IO.FileAccess.Write))
        {
            int chunkSize = 1 * 48 * 1024;
            byte[] buffer = new byte[chunkSize];
            OnTransferStart(new TransferStartArgs());
            do
            {
                // read bytes from input stream
                int bytesRead = stream.Read(buffer, 0, chunkSize);
                if (bytesRead == 0) break;
                // write bytes to output stream
                writeStream.Write(buffer, 0, bytesRead);
                // report progress from time to time
                OnProgressChanged(new ProgressChangedArgs(writeStream.Position));
            } while (true);
            writeStream.Close();
            stream.Dispose();
        }
    }
    catch (Exception)
    {
        throw; // rethrow, preserving the original stack trace
    }
    finally
    {
        if (serviceStreamed.State == System.ServiceModel.CommunicationState.Opened)
        {
            serviceStreamed.Close();
        }
        OnTransferFinished(new TransferFinishedArgs());
    }
}
I agree with Mr. Kjörling, it's hard to help without seeing what you're doing. Since you're just downloading files from your server, why are you opening the file for read/write (causing the lock)? If you open it as read-only, then it won't lock. Please don't mod down if my suggestion is lacking, as it is only my interpretation of the problem without a lot of information.
Try this; it should enable two threads to read the file concurrently and independently:
System.IO.FileStream stream = new System.IO.FileStream(path, System.IO.FileMode.Open, System.IO.FileAccess.Read, System.IO.FileShare.Read);