deleteObjects using AWS SDK V2? - amazon-s3

I'm trying to migrate from AWS SDK V1.x to V2.2, but I can't figure out the deleteObjects method. I've found a bunch of examples - all the same one :-( - and none of them ever seem to use the list of objects to delete (i.e. the list is present, but never set on the DeleteObjectsRequest object, which I assume is where it should go, though I don't see where). How/where do I provide the object list? The example I keep finding is:
System.out.println("Deleting objects from S3 bucket: " + bucket_name);
for (String k : object_keys) {
System.out.println(" * " + k);
}
Region region = Region.US_WEST_2;
S3Client s3 = S3Client.builder().region(region).build();
try {
DeleteObjectsRequest dor = DeleteObjectsRequest.builder()
.bucket(bucket_name)
.build();
s3.deleteObjects(dor);
} catch (S3Exception e) {
System.err.println(e.awsErrorDetails().errorMessage());
System.exit(1);
}

The previously accepted answer was very helpful even though it wasn't 100% complete. I used it to write the following method. It basically works though I haven't tested its error handling particularly well.
An array of String keys is passed in and converted into the list of ObjectIdentifiers that deleteObjects requires.
s3Client and log are assumed to have been initialized elsewhere. Feel free to remove the logging if you don't need it.
The method currently returns the number of successful deletes.
public int deleteS3Objects(String bucketName, String[] keys) {
List<ObjectIdentifier> objIds = Arrays.stream(keys)
.map(key -> ObjectIdentifier.builder().key(key).build())
.collect(Collectors.toList());
try {
DeleteObjectsRequest dor = DeleteObjectsRequest.builder()
.bucket(bucketName)
.delete(Delete.builder().objects(objIds).build())
.build();
DeleteObjectsResponse delResp = s3Client.deleteObjects(dor);
if (delResp.errors().size() > 0) {
String err = String.format("%d errors returned while deleting %d objects",
delResp.errors().size(), objIds.size());
log.warn(err);
}
if (delResp.deleted().size() < objIds.size()) {
String err = String.format("%d of %d objects deleted",
delResp.deleted().size(), objIds.size());
log.warn(err);
}
return delResp.deleted().size();
}
catch(AwsServiceException e) {
// The call was transmitted successfully, but Amazon S3 couldn't process
// it, so it returned an error response.
log.error("Error received from S3 while attempting to delete objects", e);
}
catch(SdkClientException e) {
// Amazon S3 couldn't be contacted for a response, or the client
// couldn't parse the response from Amazon S3.
log.error("Exception occurred while attempting to delete objects from S3", e);
}
return 0;
}
(Gratuitous commentary: I find it odd that in AWS SDK v2.3.9, deleteObjects requires Delete.Builder and ObjectIdentifier keys but getObject and putObject accept String keys. Why doesn't DeleteObjectsRequest.Builder just have a keys() method? They haven't officially said the SDK is production-ready AFAIK so some of this may be subject to change.)
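For reference, here is a minimal sketch of how the s3Client assumed above might be built and how the method could be called; the region, bucket name and keys are purely illustrative:

import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;

// Build the client once and reuse it; SDK v2 clients are thread-safe.
S3Client s3Client = S3Client.builder()
        .region(Region.US_WEST_2)
        .build();

// Hypothetical bucket and keys, just to show the call.
int deleted = deleteS3Objects("my-bucket", new String[] { "photos/a.jpg", "photos/b.jpg" });
log.info("{} of 2 objects deleted", deleted);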

It looks like a couple more objects are needed to assign the key of the object in S3. This is untested; I put links to the relevant methods at the end.
System.out.println("Deleting objects from S3 bucket: " + bucket_name);
for (String k : object_keys) {
System.out.println(" * " + k);
}
Region region = Region.US_WEST_2;
S3Client s3 = S3Client.builder().region(region).build();
try {
ObjectIdentifier objectToDelete = ObjectIdentifier.builder()
.key(KEY_OBJECT_TO_DELETE)
.build();
Delete delete_me = Delete.builder()
.objects(objectToDelete)
.build();
DeleteObjectsRequest dor = DeleteObjectsRequest.builder()
.bucket(bucket_name)
.delete(delete_me)
.build();
s3.deleteObjects(dor);
} catch (S3Exception e) {
System.err.println(e.awsErrorDetails().errorMessage());
System.exit(1);
}
Key to delete
https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/model/ObjectIdentifier.html#key--
Delete object
https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/model/ObjectIdentifier.Builder.html

Related

Unable to connect to AWS S3 bucket through my DLL Project (written in C++)

I'm trying to integrate an AWS S3 bucket into my project (a DLL written in C++). I was successful in creating an independent console application, based on the AWS SDK for C++, that connects to the bucket and uploads a specific file as required.
When I used the same code and added all the dependencies (DLL libraries and header files), the DLL project compiled with no errors. However, it could not connect to S3 at runtime. There's no visible error, yet I could tell that the code was not retrieving any bucket list. I'm sure I missed something, but I'm not sure what.
FYI, for simplicity, I tried to connect to the AWS bucket just once, between:
Aws::InitAPI(options)
{
………..
………..
}
Aws::ShutdownAPI(options)
A simplified overview of my code:
bool FindTheBucket(const Aws::S3::S3Client& s3Client,
const Aws::String& bucketName)
{
Aws::S3::Model::ListBucketsOutcome outcome = s3Client.ListBuckets();
if (outcome.IsSuccess())
{
Aws::Vector<Aws::S3::Model::Bucket> bucket_list =
outcome.GetResult().GetBuckets();
for (Aws::S3::Model::Bucket const& bucket : bucket_list)
{
if (bucket.GetName() == bucketName)
return true;
}
return false;
}
else
return false;
}
And the snippet in my main function where the SDK is called....
Aws::SDKOptions options;
options.loggingOptions.logLevel = Aws::Utils::Logging::LogLevel::Info;
Aws::InitAPI(options);
{
Aws::String bucket_name = "testbucket"; // a bucket of the same name exists
Aws::String region = "us-west-2";
Aws::Client::ClientConfiguration config;
config.region = region;
Aws::S3::S3Client s3_client(config);
if (FindTheBucket(s3_client, bucket_name))
{
//do the needful
std::ofstream myFile(str.c_str());
myFile.close();
PutObjectBuffer(bucket_name, str.c_str(),
layerstr, region);
}
else
{
//some code…
}
}
Aws::ShutdownAPI(options);

Very slow AWS presignedURL download. How to speed it up?

I am reading 20-30 different objects of varying size using S3 IAM and a unique presigned URL for each file. All of the files are downloaded at once, and each phase occurs in sequence. Unfortunately the S3 client is not thread safe, so we cannot use async operations. Some files transfer rapidly while others lag. The total operation can take anywhere from 7 to more than 15 seconds. I expected greater performance from S3, since AWS advertises it as high throughput.
I see several unanswered posts about download performance from S3. However, the problem seems to have gotten worse once we introduced link ambiguation using IAM and presigned URLs.
FYI, my internet connection is broadband; it is unlikely to be the cause of the performance issue.
The tests are performed only a few hundred miles from the S3 storage, which eliminates distance as a factor in the performance issues.
There is no server between the client and S3 for downloading objects, so that is not the cause of the performance issues either.
One caveat: we tried async forAllChunked using the Rice.edu Habanero API. Even when we did not hit any errors due to threading problems, the download performance was still very slow. This would seem to eliminate the idea that downloads are slow because they are serialized in the for loop, although performance should be far better if we can download the files simultaneously.
Code attached.
public void cloudGetMedia(ArrayList<MediaSyncObj> mediaObjs, ArrayList<String> signedUrls) {
long getTime = System.currentTimeMillis();
// Ensure media directory exists or create it
String toDiskDir = DirectoryMgr.getMediaPath('M');
File diskFile = new File(toDiskDir);
FileOpsUtil.folderExists(diskFile);
// Process signedURLs
for(String signedurl : signedUrls) {
LOGGER.debug("cloudGetMedia called. signedURL is null: {}", signedurl == null);
URI fileToBeDownloaded = null;
try {
fileToBeDownloaded = new URI(signedurl);
} catch (URISyntaxException e) {
e.printStackTrace();
}
// get the file name from the presignedURL
AmazonS3URI s3URI = new AmazonS3URI(fileToBeDownloaded);
String localURL = toDiskDir + "/" + s3URI.getKey();
File file = new File(localURL);
AmazonS3 client = AmazonS3ClientBuilder.standard()
.withRegion(s3URI.getRegion())
.build();
try{
URL url = new URL(signedurl);
PresignedUrlDownloadRequest req = new PresignedUrlDownloadRequest(url);
client.download(req, file);
}
catch (MalformedURLException e) {
LOGGER.warn(e.getMessage());
e.printStackTrace();
}
}
getTime = (System.currentTimeMillis() - getTime);
LOGGER.debug("Total get time in syncCloudMediaAction: {} milliseconds, numElement: {}", getTime, signedUrls.size());
}
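Not a full answer, but since a presigned URL is just an HTTPS URL, the objects can in principle be fetched concurrently with a plain HTTP client instead of a per-file S3 client. A minimal sketch using java.net.http.HttpClient (Java 11+); the target directory and the way the local file name is derived are assumptions:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class PresignedDownloader {

    // Downloads every presigned URL into targetDir concurrently and waits for all of them.
    public static void downloadAll(List<String> signedUrls, Path targetDir) {
        HttpClient http = HttpClient.newHttpClient();
        CompletableFuture<?>[] futures = signedUrls.stream()
                .map(url -> {
                    // Derive a local file name from the URL path (illustrative only).
                    String name = Paths.get(URI.create(url).getPath()).getFileName().toString();
                    HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
                    // Stream the response body straight to disk.
                    return http.sendAsync(request, HttpResponse.BodyHandlers.ofFile(targetDir.resolve(name)));
                })
                .toArray(CompletableFuture[]::new);
        CompletableFuture.allOf(futures).join(); // block until every download finishes
    }
}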

AmazonS3: Getting warning: S3AbortableInputStream:Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection

Here's the warning that I am getting:
S3AbortableInputStream:Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
I tried using try-with-resources but the S3ObjectInputStream doesn't seem to close via this method.
try (S3Object s3object = s3Client.getObject(new GetObjectRequest(bucket, key));
S3ObjectInputStream s3ObjectInputStream = s3object.getObjectContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(s3ObjectInputStream, StandardCharsets.UTF_8));
){
//some code here blah blah blah
}
I also tried the below code, explicitly closing the streams, but that doesn't work either:
S3Object s3object = s3Client.getObject(new GetObjectRequest(bucket, key));
S3ObjectInputStream s3ObjectInputStream = s3object.getObjectContent();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(s3ObjectInputStream, StandardCharsets.UTF_8));
){
//some code here blah blah
s3ObjectInputStream.close();
s3object.close();
}
Any help would be appreciated.
PS: I am only reading two lines of the file from S3 and the file has more data.
Got the answer via another medium. Sharing it here:
The warning indicates that you called close() without reading the whole file. This is problematic because S3 is still trying to send the data and you're leaving the connection in a sad state.
There are two options here:
1. Read the rest of the data from the input stream so the connection can be reused.
2. Call s3ObjectInputStream.abort() to close the connection without reading the data. The connection won't be reused, so you take some performance hit with the next request to re-create the connection. This may be worth it if it's going to take a long time to read the rest of the file.
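A minimal sketch of option #2 (v1 SDK; bucket and key are assumed to be defined), for the case where only the start of the object is needed:

S3Object s3object = s3Client.getObject(new GetObjectRequest(bucket, key));
try (S3ObjectInputStream s3ObjectInputStream = s3object.getObjectContent()) {
    // Read only the part you need...
    // ...then abort so the connection is torn down instead of drained.
    s3ObjectInputStream.abort();
}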
Following option #1 of Chirag Sejpal's answer I used the below statement to drain the S3AbortableInputStream to ensure the connection can be reused:
com.amazonaws.util.IOUtils.drainInputStream(s3ObjectInputStream);
I ran into the same problem and the following class helped me
@Data
@AllArgsConstructor
public class S3ObjectClosable implements Closeable {
private final S3Object s3Object;
@Override
public void close() throws IOException {
s3Object.getObjectContent().abort();
s3Object.close();
}
}
and now you can use it without the warning:
try (final var s3ObjectClosable = new S3ObjectClosable(s3Client.getObject(bucket, key))) {
//same code
}
To add an example to Chirag Sejpal's answer (elaborating on option #1), the following can be used to read the rest of the data from the input stream before closing it:
S3Object s3object = s3Client.getObject(new GetObjectRequest(bucket, key));
try (S3ObjectInputStream s3ObjectInputStream = s3object.getObjectContent()) {
try {
// Read from stream as necessary
} catch (Exception e) {
// Handle exceptions as necessary
} finally {
while (s3ObjectInputStream != null && s3ObjectInputStream.read() != -1) {
// Read the rest of the stream
}
}
// The stream will be closed automatically by the try-with-resources statement
}
I ran into the same error.
As others have pointed out, the /tmp space in lambda is limited to 512 MB.
And if the lambda context is re-used for a new invocation, then the /tmp space is already half-full.
So, when reading the S3 objects and writing all the files to the /tmp directory (as I was doing),
I ran out of disk space somewhere in between.
Lambda exited with an error, but NOT all bytes from the S3ObjectInputStream had been read.
So, there are two things to keep in mind:
1) If the first execution causes the problem, be stingy with your /tmp space.
We only have 512 MB.
2) If the second execution causes the problem, then resolve it by attacking the root problem.
It's not possible to delete the /tmp folder itself.
So, delete all the files in the /tmp folder after the execution is finished.
In Java, here is what I did, which successfully resolved the problem.
public String handleRequest(Map<String, String> keyValuePairs, Context lambdaContext) {
try {
// All work here
return "Success"; // return whatever your handler normally returns
} catch (Exception e) {
logger.error("Error {}", e.toString());
return "Error";
} finally {
deleteAllFilesInTmpDir();
}
}
private void deleteAllFilesInTmpDir() {
Path path = java.nio.file.Paths.get(File.separator, "tmp", File.separator);
try {
if (Files.exists(path)) {
deleteDir(path.toFile());
logger.info("Successfully cleaned up the tmp directory");
}
} catch (Exception ex) {
logger.error("Unable to clean up the tmp directory");
}
}
public void deleteDir(File dir) {
File[] files = dir.listFiles();
if (files != null) {
for (final File file: files) {
deleteDir(file);
}
}
dir.delete();
}
This is my solution. I'm using Spring Boot 2.4.3.
Create an Amazon S3 client.
AmazonS3 amazonS3Client = AmazonS3ClientBuilder
.standard()
.withRegion("your-region")
.withCredentials(
new AWSStaticCredentialsProvider(
new BasicAWSCredentials("your-access-key", "your-secret-access-key")))
.build();
Create an Amazon TransferManager client.
TransferManager transferManagerClient = TransferManagerBuilder.standard()
.withS3Client(amazonS3Client)
.build();
Create a temporary file at /tmp/{your-s3-key} so that we can put the downloaded file into it.
File file = new File(System.getProperty("java.io.tmpdir"), "your-s3-key");
file.getParentFile().mkdirs(); // Create the parent directories of the temporary file first
try {
file.createNewFile(); // Create the temporary file itself
} catch (IOException e) {
e.printStackTrace();
}
Then we download the file from S3 using the TransferManager client.
// Note that in this line the s3 file downloaded has been transferred in to the temporary file that we created
Download download = transferManagerClient.download(
new GetObjectRequest("your-s3-bucket-name", "your-s3-key"), file);
// This line blocks the thread until the download is finished
download.waitForCompletion();
Now that the S3 file has been successfully transferred into the temporary file we created, we can get an InputStream for it.
InputStream input = new DataInputStream(new FileInputStream(file));
Because the temporary file is not needed anymore, we just delete it.
file.delete();
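Putting the steps together, here is a minimal sketch of the whole flow with cleanup in a finally block. It differs slightly from the step-by-step version above in that it reads the file fully into memory before deleting it, which assumes the object fits in memory; the bucket name and key are illustrative, and waitForCompletion() can throw InterruptedException, so it is declared:

import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.transfer.Download;
import com.amazonaws.services.s3.transfer.TransferManager;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;

public InputStream downloadToTempFile(TransferManager transferManagerClient,
                                      String bucket, String key)
        throws IOException, InterruptedException {
    File file = new File(System.getProperty("java.io.tmpdir"), key);
    file.getParentFile().mkdirs(); // make sure the parent directories exist
    try {
        Download download = transferManagerClient.download(
                new GetObjectRequest(bucket, key), file);
        download.waitForCompletion(); // blocks until the transfer finishes
        // Read the whole file into memory so the temp file can be deleted right away.
        byte[] bytes = Files.readAllBytes(file.toPath());
        return new ByteArrayInputStream(bytes);
    } finally {
        file.delete(); // clean up the temp file whether or not the download succeeded
    }
}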

Apache Tika Api consuming given stream

I use Apache Tika bundle dependency for a Project to find out MimeTypes for Files. due to some issues we have to find out through InputStream. it is actually guaranteed to mark / reset given InputStream. Tika-Bundle includes core and parser api and uses PoifscontainerDetector , ZipContainerDetector, OggDetector, MimeTypes and Magic for detection. I have been debugging for 3 hours and all of Detectors mark and reset after detection. I did it in following way.
TikaInputStream tis = null;
try {
TikaConfig config = new TikaConfig();
tikaDetector = config.getDetector();
tis = TikaInputStream.get(in);
MediaType mediaType = tikaDetector.detect(tis, new Metadata());
if (mediaType != null) {
String[] types = mediaType.toString().split(",");
for (int i = 0; i < types.length; i++) {
mimeTypes.add(new MimeType(types[i]));
}
}
} catch (Exception e) {
logger.error("Mime Type for given Stream could not be resolved: ", e);
}
But the stream is consumed. Does anyone know how to find out the MIME types without consuming the stream?
This problem bugged me for a while too before I finally solved it. The problem is that, while Detector.detect() methods are required to mark and reset the stream, this resetting will have no effect on your original stream (the in variable) if marking is not supported in that stream.
In order to get this to work, I had to first convert my stream to a BufferedInputStream before doing anything else. I would then pass that buffered stream to the detect algorithm, and I would use that same buffered stream later for parsing, reading, or whatever I needed to do.
BufferedInputStream buffStream = new BufferedInputStream(in);
TikaInputStream tis = null;
try {
TikaConfig config = new TikaConfig();
tikaDetector = config.getDetector();
tis = TikaInputStream.get(buffStream);
MediaType mediaType = tikaDetector.detect(tis, new Metadata());
if (mediaType != null) {
String[] types = mediaType.toString().split(",");
for (int i = 0; i < types.length; i++) {
mimeTypes.add(new MimeType(types[i]));
}
}
} catch (Exception e) {
logger.error("Mime Type for given Stream could not be resolved: ", e);
}
// further along in my code...
doSomething(buffStream); // rather than doSomething(in)

Returning binary content from a JPF action with Weblogic Portal 10.2

One of the actions of my JPF controller builds up a PDF file and I would like to return this file to the user so that he can download it.
Is it possible to do that or am I forced to write the file somewhere and have my action forward a link to this file? Note that I would like to avoid that as much as possible for security reasons and because I have no way to know when the user has downloaded the file so that I can delete it.
I've tried to access the HttpServletResponse but nothing happens:
getResponse().setContentLength(file.getSize());
getResponse().setContentType(file.getMimeType());
getResponse().setHeader("Content-Disposition", "attachment;filename=\"" + file.getTitle() + "\"");
getResponse().getOutputStream().write(file.getContent());
getResponse().flushBuffer();
We have something similar, except returning images instead of a PDF; should be a similar solution, though, I'm guessing.
On a JSP, we have an IMG tag, where the src is set to:
<c:url value="/path/getImage.do?imageId=${imageID}" />
(I'm not showing everything, because I'm trying to simplify.) In your case, maybe it would be a link, where the href is done in a similar way.
That getImage.do maps to our JPF controller, obviously. Here's the code from the JPF getImage() method, which is the part you're trying to work on:
@Jpf.Action(forwards = {
@Jpf.Forward(name = FWD_SUCCESS, navigateTo = Jpf.NavigateTo.currentPage),
@Jpf.Forward(name = FWD_FAILURE, navigateTo = Jpf.NavigateTo.currentPage) })
public Forward getImage(final FormType pForm) throws Exception {
final HttpServletRequest lRequest = getRequest();
final HttpServletResponse lResponse = getResponse();
final HttpSession lHttpSession = getSession();
final String imageIdParam = lRequest.getParameter("imageId");
final long header = lRequest.getDateHeader("If-Modified-Since");
final long current = System.currentTimeMillis();
if (header > 0 && current - header < MAX_AGE_IN_SECS * 1000) {
lResponse.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
return null;
}
try {
if (imageIdParam == null) {
throw new IllegalArgumentException("imageId is null.");
}
// Call to EJB, which is retrieving the image from
// a separate back-end system
final ImageType image = getImage(lHttpSession, Long
.parseLong(imageIdParam));
if (image == null) {
lResponse.sendError(404, IMAGE_DOES_NOT_EXIST);
return null;
}
lResponse.setContentType(image.getType());
lResponse.addDateHeader("Last-Modified", current);
// public: Allows authenticated responses to be cached.
lResponse.setHeader("Cache-Control", "max-age=" + MAX_AGE_IN_SECS
+ ", public");
lResponse.setHeader("Expires", null);
lResponse.setHeader("Pragma", null);
lResponse.getOutputStream().write(image.getContent());
} catch (final IllegalArgumentException e) {
LogHelper.error(this.getClass(), "Illegal argument.", e);
lResponse.sendError(404, IMAGE_DOES_NOT_EXIST);
} catch (final Exception e) {
LogHelper.error(this.getClass(), "General exception.", e);
lResponse.sendError(500);
}
return null;
}
I've actually removed very little from this method, because there's very little in there that I need to hide from prying eyes--the code is pretty generic, concerned with images, not with business logic. (I changed some of the data type names, but no big deal.)
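Adapting the same pattern back to the original PDF question, here is a minimal sketch (assuming the same getResponse() helper, the FormType/FWD_SUCCESS names from the method above, and a file object exposing the size, MIME type, title and content used in the question):

@Jpf.Action(forwards = {
@Jpf.Forward(name = FWD_SUCCESS, navigateTo = Jpf.NavigateTo.currentPage) })
public Forward downloadPdf(final FormType pForm) throws Exception {
    final HttpServletResponse lResponse = getResponse();
    // file is assumed to be the PDF built by the action, as in the question.
    lResponse.setContentType(file.getMimeType()); // e.g. "application/pdf"
    lResponse.setContentLength(file.getSize());
    lResponse.setHeader("Content-Disposition",
            "attachment; filename=\"" + file.getTitle() + "\"");
    lResponse.getOutputStream().write(file.getContent());
    lResponse.flushBuffer();
    // Returning null, as in getImage(), tells the page flow not to forward anywhere,
    // since the response has already been written.
    return null;
}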