Handling Streaming TarArchiveEntry to S3 Bucket from a .tar.gz file

I am using AWS Lambda to decompress and traverse tar.gz files, then upload the extracted files back to S3, retaining the original directory structure.
I am running into an issue streaming a TarArchiveEntry to an S3 bucket via a PutObjectRequest. While the first entry is successfully streamed, calling getNextTarEntry() on the TarArchiveInputStream throws a NullPointerException because the underlying gzip inflater is null; it had an appropriate value prior to the s3.putObject(new PutObjectRequest(...)) call.
I have not been able to find documentation on how or why the gzip input stream's inflater attribute is being set to null after the stream has been partially sent to S3.
EDIT: Further investigation has revealed that the AWS call appears to be closing the input stream after completing the upload of the specified content length... I have not been able to find how to prevent this behavior.
Below is essentially what my code looks like. Thanks in advance for your help, comments, and suggestions.
public String handleRequest(S3Event s3Event, Context context) {
    TarArchiveInputStream tarInput = null;
    try {
        S3Event.S3EventNotificationRecord s3EventRecord = s3Event.getRecords().get(0);
        String bucketName = s3EventRecord.getS3().getBucket().getName();
        // Object key may have spaces or unicode non-ASCII characters.
        String srcKey = s3EventRecord.getS3().getObject().getKey();
        System.out.println("Received valid request from bucket: " + bucketName + " with srcKey: " + srcKey);

        String bucketFolder = srcKey.substring(0, srcKey.lastIndexOf('/') + 1);
        System.out.println("File parent directory: " + bucketFolder);

        final AmazonS3 s3Client = AmazonS3ClientBuilder.defaultClient();
        tarInput = new TarArchiveInputStream(new GzipCompressorInputStream(getObjectContent(s3Client, bucketName, srcKey)));

        TarArchiveEntry currentEntry = tarInput.getNextTarEntry();
        while (currentEntry != null) {
            String fileName = currentEntry.getName();
            System.out.println("For path = " + fileName);
            // checking if looking at a file (vs a directory)
            if (currentEntry.isFile()) {
                System.out.println("Copying " + fileName + " to " + bucketFolder + fileName + " in bucket " + bucketName);
                ObjectMetadata metadata = new ObjectMetadata();
                metadata.setContentLength(currentEntry.getSize());
                // contents are properly and successfully sent to S3
                s3Client.putObject(new PutObjectRequest(bucketName, bucketFolder + fileName, tarInput, metadata));
                System.out.println("Done!");
            }
            currentEntry = tarInput.getNextTarEntry(); // NPE here: the underlying gzip inflater is null
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        IOUtils.closeQuietly(tarInput);
    }
    return "done";
}

That's true: AWS closes an InputStream provided to PutObjectRequest, and I don't know of a way to instruct AWS not to do so.
However, you can wrap the TarArchiveInputStream with a CloseShieldInputStream from Commons IO, like this:
InputStream shieldedInput = new CloseShieldInputStream(tarInput);
s3Client.putObject(new PutObjectRequest(bucketName, bucketFolder + fileName, shieldedInput, metadata));
When AWS closes the provided CloseShieldInputStream, the underlying TarArchiveInputStream will remain open.
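For context, dropping that into the question's loop would look roughly like this (an untested sketch of the same loop, just with the shield applied):
while (currentEntry != null) {
    if (currentEntry.isFile()) {
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentLength(currentEntry.getSize());
        // the shield swallows the close() the SDK issues after the upload
        InputStream shieldedInput = new CloseShieldInputStream(tarInput);
        s3Client.putObject(new PutObjectRequest(bucketName, bucketFolder + currentEntry.getName(), shieldedInput, metadata));
    }
    currentEntry = tarInput.getNextTarEntry(); // tarInput is still open, so this call works
}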
PS. I don't know what ByteArrayInputStream(tarInput.getCurrentEntry()) does but it looks very strange. I ignored it for the purpose of this answer.

Related

How to retain original Last Modified/Created date of the file on FTP when we try to download them using FTPClient Class and Selenium

I am using org.apache.commons.net.ftp.FTPClient to retrieve files from an FTP server. It is crucial that I preserve the last modified timestamp on the file when it is saved on my machine. Does anyone have a suggestion for how to solve this?
This is how I solved it:
public boolean retrieveFile(String path, String filename, long lastModified) throws IOException {
    File localFile = new File(path + "/" + filename);
    OutputStream outputStream = new FileOutputStream(localFile);
    boolean success = client.retrieveFile(filename, outputStream);
    outputStream.close();
    localFile.setLastModified(lastModified);
    return success;
}
I wish the Apache-team would implement this feature.
This is how you can use it:
List<FTPFile> ftpFiles = Arrays.asList(client.listFiles());
for (FTPFile file : ftpFiles) {
    retrieveFile("/tmp", file.getName(), file.getTimestamp().getTimeInMillis());
}
You can modify the timestamp after downloading the file.
The timestamp can be retrieved through the LIST command, or the (non standard) MDTM command.
You can see how to modify the timestamp here: http://www.mkyong.com/java/how-to-change-the-file-last-modified-date-in-java/
When downloading a list of files, like all files returned by FTPClient.mlistDir or FTPClient.listFiles, use the timestamp returned with the listing to update the timestamps of the local downloaded files:
String remotePath = "/remote/path";
String localPath = "C:\\local\\path";

FTPFile[] remoteFiles = ftpClient.mlistDir(remotePath);
for (FTPFile remoteFile : remoteFiles) {
    File localFile = new File(localPath + "\\" + remoteFile.getName());
    OutputStream outputStream = new BufferedOutputStream(new FileOutputStream(localFile));
    if (ftpClient.retrieveFile(remotePath + "/" + remoteFile.getName(), outputStream))
    {
        System.out.println("File " + remoteFile.getName() + " downloaded successfully.");
    }
    outputStream.close();
    localFile.setLastModified(remoteFile.getTimestamp().getTimeInMillis());
}
When downloading a single specific file only, use FTPClient.mdtmFile to retrieve the remote file timestamp and update the timestamp of the downloaded local file accordingly:
File localFile = new File("C:\\local\\path\\file.zip");

FTPFile remoteFile = ftpClient.mdtmFile("/remote/path/file.zip");
if (remoteFile != null)
{
    OutputStream outputStream = new BufferedOutputStream(new FileOutputStream(localFile));
    if (ftpClient.retrieveFile(remoteFile.getName(), outputStream))
    {
        System.out.println("File downloaded successfully.");
    }
    outputStream.close();
    localFile.setLastModified(remoteFile.getTimestamp().getTimeInMillis());
}

Download from Azure Storage adding single quotes around filename

I'm working on a site hosted in Azure that has download functionality. To reduce the load on our servers, the download is done using Shared Access Signatures. However, in Safari, when downloading the file, the filename is wrapped in single quotes, so myFile.txt downloads as 'myFile.txt'. This means downloaded zips have to be renamed by the client before the contents can be extracted.
Code for generating the Shared Access Signature is as follows:
CloudBlockBlob blob = container.GetBlockBlobReference(Helpers.StringHelper.TrimIfNotNull(blobName));
if (!blob.Exists())
{
    return string.Empty;
}

var sasConstraints = new SharedAccessBlobPolicy();
sasConstraints.SharedAccessStartTime = DateTime.UtcNow.AddSeconds(-5);
sasConstraints.SharedAccessExpiryTime = DateTime.UtcNow.Add(duration);
sasConstraints.Permissions = SharedAccessBlobPermissions.Read | SharedAccessBlobPermissions.Write;

var headers = new SharedAccessBlobHeaders();
string filename = blobName;
if (filename.Contains("/"))
{
    filename = blobName.Substring(blobName.LastIndexOf("/") + 1, blobName.Length - blobName.LastIndexOf("/") - 1);
}
headers.ContentDisposition = "attachment; filename='" + filename + "'";

//Generate the shared access signature on the blob, setting the constraints directly on the signature.
string sasBlobToken = blob.GetSharedAccessSignature(sasConstraints, headers);

//Return the URI string for the container, including the SAS token.
return blob.Uri + sasBlobToken;
This code has worked fine in Chrome, Firefox, and IE. Is there something I'm missing with the headers? The only one I'm modifying is content-disposition.
You should use double quotes for quoted strings in HTTP headers, as outlined in RFC2616.
So replace
headers.ContentDisposition = "attachment; filename='" + filename + "'";
with
headers.ContentDisposition = "attachment; filename=\"" + filename + "\"";

Detecting file size with MultipartFormDataStreamProvider before file is saved?

We are using the MultipartFormDataStreamProvider to save files uploaded by clients. I have a hard requirement that the file size must be greater than 1 KB. The easiest thing to do would of course be to save the file to disk and then look at it; unfortunately, I can't do it like this. After I save the file to disk I don't have the ability to access it, so I need to look at the file before it is saved to disk. I've been looking at the properties of the stream provider to try to figure out the size of the file, but unfortunately I've been unsuccessful.
The test file I'm using is 1025 bytes.
MultipartFormDataStreamProvider.BufferSize is 4096
Headers.ContentDisposition.Size is null
ContentLength is null
Is there a way to determine file size before it's saved to the file system?
Thanks to Guanxi I was able to formulate a solution. I used his code in the link as the basis and just added a little more async/await goodness :). I wanted to add the solution just in case it helps anyone else:
private async Task SaveMultipartStreamToDisk(Guid guid, string fullPath)
{
    var user = HttpContext.Current.User.Identity.Name;
    var multipartMemoryStreamProvider = await Request.Content.ReadAsMultipartAsync();
    foreach (var content in multipartMemoryStreamProvider.Contents)
    {
        using (content)
        {
            if (content.Headers.ContentDisposition.FileName != null)
            {
                var existingFileName = content.Headers.ContentDisposition.FileName.Replace("\"", string.Empty);
                Log.Information("Original File name was {OriginalFileName}: {guid} {user}", existingFileName, guid, user);
                using (var st = await content.ReadAsStreamAsync())
                {
                    var ext = Path.GetExtension(existingFileName);
                    List<string> validExtensions = new List<string>() { ".pdf", ".jpg", ".jpeg", ".png" };
                    //1024 = 1KB
                    if (st.Length > 1024 && validExtensions.Contains(ext, StringComparer.OrdinalIgnoreCase))
                    {
                        var newFileName = guid + ext;
                        using (var fs = new FileStream(Path.Combine(fullPath, newFileName), FileMode.Create))
                        {
                            await st.CopyToAsync(fs);
                            Log.Information("Completed writing {file}: {guid} {user}", Path.Combine(fullPath, newFileName), guid, user);
                        }
                    }
                    else
                    {
                        if (st.Length < 1025)
                        {
                            Log.Warning("File of length {FileLength} bytes was attempted to be uploaded: {guid} {user}", st.Length, guid, user);
                        }
                        else
                        {
                            Log.Warning("A file of type {FileType} was attempted to be uploaded: {guid} {user}", ext, guid, user);
                        }
                        var responseMessage = new HttpResponseMessage(HttpStatusCode.BadRequest)
                        {
                            Content = st.Length < 1025
                                ? new StringContent($"file of length {st.Length} does not meet our minimum file size requirements")
                                : new StringContent($"a file extension of {ext} is not an acceptable type")
                        };
                        throw new HttpResponseException(responseMessage);
                    }
                }
            }
        }
    }
}
You can also read the request contents without using MultipartFormDataStreamProvider. In that case all of the request contents (including files) would be in memory. I have given an example of how to do that at this link.
In this case you can read the header for the file size, or read the stream and check the file size. Only if it satisfies your criteria do you write it to the desired location.

How to take a log file backup automatically

How do I automatically take a backup of a log file (.txt) when its size reaches a threshold level, say 5 MB? The backup file name should be like (log_file_name)_(system_date), and the original log file should be cleared (0 KB).
Please help. Thanks in advance.
Check your log file size using length(). If it is bigger than 5 MB, call the extendLogFile() function.
This is C# code you can easily convert to Java (a rough Java sketch follows the C# code below).
Size check:
if (size > 400 * 100 * 100)
{
    extendLogFile(Path);
}
Copy old log file in archive directory and create new log file:
private static void extendLogFile(string lPath)
{
    string name = lPath.Substring(0, lPath.LastIndexOf("."));
    string UniquName = GenerateUniqueNameUsingDate(); // create a unique name for old log files, like '12-04-2013-12-43-00'
    string ArchivePath = System.IO.Path.GetDirectoryName(lPath) + "\\Archive";
    if (!string.IsNullOrEmpty(ArchivePath) && !System.IO.Directory.Exists(ArchivePath))
    {
        System.IO.Directory.CreateDirectory(ArchivePath);
    }
    string newName = ArchivePath + "\\" + UniquName;
    if (!File.Exists(newName))
    {
        File.Copy(lPath, newName + ".txt");
        using (FileStream stream = new FileStream(lPath, FileMode.Create))
        using (TextWriter writer = new StreamWriter(stream))
        {
            writer.WriteLine("");
        }
    }
}
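Since the question asks about Java, here is a rough Java sketch of the same idea using java.nio.file (the Archive directory, date-based name, and .txt extension mirror the C# code above; treat it as an untested outline):
import java.io.IOException;
import java.nio.file.*;
import java.text.SimpleDateFormat;
import java.util.Date;

public class LogRotator {

    private static final long MAX_SIZE = 5L * 1024 * 1024; // 5 MB threshold

    public static void extendLogFile(Path logFile) throws IOException {
        if (Files.size(logFile) <= MAX_SIZE) {
            return; // nothing to do yet
        }
        // unique name for the old log file, e.g. '12-04-2013-12-43-00'
        String uniqueName = new SimpleDateFormat("dd-MM-yyyy-HH-mm-ss").format(new Date());
        String fileName = logFile.getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        String baseName = dot > 0 ? fileName.substring(0, dot) : fileName;

        // copy the current log into an Archive directory next to it
        Path archiveDir = logFile.resolveSibling("Archive");
        Files.createDirectories(archiveDir);
        Path backup = archiveDir.resolve(baseName + "_" + uniqueName + ".txt");
        Files.copy(logFile, backup, StandardCopyOption.REPLACE_EXISTING);

        // truncate the original log back to 0 bytes
        Files.write(logFile, new byte[0]);
    }
}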

ASP MVC 2 Uploading file to database (blob)

I am trying to upload a file via a form and then save it in SQL as a blob.
I already have my form working fine, my database is fully able to take the blob, and I have a controller that takes the file and saves it in a local directory:
[AcceptVerbs(HttpVerbs.Post)]
public ActionResult FileUpload(int id, HttpPostedFileBase uploadFile)
{
    //allowed types
    string typesNonFormatted = "text/plain,application/msword,application/pdf,image/jpeg,image/png,image/gif";
    string[] types = typesNonFormatted.Split(',');

    //Starting security check
    //checking file size
    if (uploadFile.ContentLength == 0 || uploadFile.ContentLength > 10000000)
        ViewData["StatusMsg"] = "Could not upload: File too big (max size 10mb) or error while transferring the file.";
    //checking file type
    else if (types.Contains(uploadFile.ContentType) == false)
        ViewData["StatusMsg"] = "Could not upload: Illegal file type!<br/> Allowed types: images, Ms Word documents, PDF, plain text files.";
    //Passed all security checks
    else
    {
        string filePath = Path.Combine(HttpContext.Server.MapPath("../Uploads"),
            Path.GetFileName(uploadFile.FileName)); //generating path
        uploadFile.SaveAs(filePath); //saving file to final destination
        ViewData["StatusMsg"] = "Uploaded: " + uploadFile.FileName + " (" + Convert.ToDecimal(uploadFile.ContentLength) / 1000 + " kb)";

        //saving file to database
        //
        //MISSING
    }
    return View("FileUpload", null);
}
Now all I am missing is putting the file in the database. I could not find anything on the subject... I found some ways to do it in a regular website but nothing for MVC 2.
Any kind of help would be welcome!
Thank you.
This could help: http://byatool.com/mvc/asp-net-mvc-upload-image-to-database-and-show-image-dynamically-using-a-view/
Since you have HttpPostedFileBase in your controller's method, all you need to do is:
int length = uploadFile.ContentLength;
byte[] tempImage = new byte[length];
myDBObject.ContentType = uploadFile.ContentType;
uploadFile.InputStream.Read(tempImage, 0, length);
myDBObject.ActualImage = tempImage ;
HttpPostedFileBase has an InputStream property.
Hope this helps.
Alright, thanks to kheit, I finally got it working. Here's the final solution; it might help someone out there.
This method takes all the files from a directory and uploads them to the database:
//upload all files from a directory to the database as blobs
public void UploadFilesToDB(long UniqueId)
{
    //directory path
    string fileUnformatedPath = "../Uploads/" + UniqueId; //setting final path with unique id

    //getting all files in directory (if any)
    string[] FileList = System.IO.Directory.GetFiles(HttpContext.Server.MapPath(fileUnformatedPath));

    //for each file in directory
    foreach (var file in FileList)
    {
        //extracting file from directory
        System.IO.FileStream CurFile = System.IO.File.Open(file, System.IO.FileMode.Open);
        long fileLength = CurFile.Length;

        //converting file to a byte array (byte[])
        byte[] tempFile = new byte[fileLength];
        CurFile.Read(tempFile, 0, Convert.ToInt32(fileLength));

        //creating new attachment
        IW_Attachment CurAttachment = new IW_Attachment();
        CurAttachment.attachment_blob = tempFile; //setting actual file
        string[] filedirlist = CurFile.Name.Split('\\');
        CurAttachment.attachment_name = filedirlist.ElementAt(filedirlist.Count() - 1); //setting file name

        //uploading attachment to database
        SubmissionRepository.CreateAttachment(CurAttachment);

        //closing the stream before deleting the current file from the directory
        CurFile.Close();
        System.IO.File.Delete(file);
    }

    //deleting directory, it should be empty by now
    System.IO.Directory.Delete(HttpContext.Server.MapPath(fileUnformatedPath));
}
(By the way, IW_Attachment is the name of one of my database tables.)