Reading an XLSX file in Azure File Storage - azure-storage

I need to read an Excel file that is located in a folder in Azure File Storage. This is not in blob storage.
I cannot download the file to a local drive, since there is none.
I cannot seem to get started on how to access the file or read it in place.
Can someone help to get me started?
thanks

This is the basic code that works!
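using System.IO;
using Azure.Storage.Files.Shares;
using DocumentFormat.OpenXml.Packaging;

ShareClient share = new ShareClient(connectionString, shareName);
ShareDirectoryClient directory = share.GetDirectoryClient(dirName);
ShareFileClient file = directory.GetFileClient(filename);
OpenSettings openSettings = new OpenSettings();
// Open the file as a stream straight from the share; nothing is written to a local drive.
using (Stream stream = file.OpenRead())
{
    // false = open the workbook read-only.
    using (SpreadsheetDocument document = SpreadsheetDocument.Open(stream, false, openSettings))
    {
        readRow(); // the author's own row-reading routine (not shown)
    }
}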

Related

FileSystemApi and writableStream

I'm trying to use the FileSystem API to write a file uploaded in a SPA to a local sandboxed file system.
The file is uploaded with a drop action, and I can get the File object array in the callback.
From the File I can get a ReadableStream by calling the stream method (yes, it returns only a readable stream).
Since the uploaded file could be large, I would rather stream it than load it entirely into a blob and then write that to the FileSystem API.
So, following the docs the steps are:
get a FileSystem (DOMFileSystem) through the async webkitRequestFileSystem call.
get the prop root that is a FileSystemDirectoryEntry
create a file through getFile (with flag create:true) that returns (async) a FileSystemFileEntry
Now, from the FileEntry I can get a FileWriter using createWriter, but it is marked obsolete on MDN, and in any case it is a FileWriter, while I want a WritableStream so that I can use pipeTo from the uploaded file's ReadableStream.
I see in the console that the interface FileSystemFileHandle is defined, but I cannot understand how to get an instance from the FileSystemFileEntry. If I can obtain a FileSystemFileHandle, I can call createWritable to obtain a FileSystemWritableFileStream that I can "pipe" the ReadableStream into.
Can anyone clarify this mess?
references:
https://web.dev/file-system-access/
https://wicg.github.io/file-system-access/#filesystemhandle
https://developer.mozilla.org/en-US/docs/Web/API/FileSystemFileEntry
You have the solution in your "references" links at the bottom. Specifically, this is the section to read. You can create files or directories like so:
// In an existing directory, create a new directory named "My Documents".
const newDirectoryHandle = await existingDirectoryHandle.getDirectoryHandle('My Documents', {
  create: true,
});
// In this new directory, create a file named "My Notes.txt".
const newFileHandle = await newDirectoryHandle.getFileHandle('My Notes.txt', { create: true });
Once you have a file handle, you can then pipe to it or write to it:
async function writeFile(fileHandle, contents) {
  // Create a FileSystemWritableFileStream to write to.
  const writable = await fileHandle.createWritable();
  // Write the contents of the file to the stream.
  await writable.write(contents);
  // Close the file and write the contents to disk.
  await writable.close();
}
…or…
async function writeURLToFile(fileHandle, url) {
  // Create a FileSystemWritableFileStream to write to.
  const writable = await fileHandle.createWritable();
  // Make an HTTP request for the contents.
  const response = await fetch(url);
  // Stream the response into the file.
  await response.body.pipeTo(writable);
  // pipeTo() closes the destination pipe by default, no need to close it.
}
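For the sandboxed case in the question: as far as I know, a FileSystemFileEntry cannot be turned into a FileSystemFileHandle. The handle-based API has its own sandboxed root, the origin private file system, which is reached through navigator.storage.getDirectory(). The following is a minimal sketch, assuming a browser that supports createWritable() on that file system; saveUploadedFile is a hypothetical helper name, not part of any API.

```javascript
// Hypothetical helper: streams a dropped File into a directory handle
// without loading the whole file into memory first.
async function saveUploadedFile(directoryHandle, file) {
  const fileHandle = await directoryHandle.getFileHandle(file.name, { create: true });
  const writable = await fileHandle.createWritable();
  // pipeTo() closes the writable once the source stream ends.
  await file.stream().pipeTo(writable);
  return fileHandle;
}

// In a browser, the sandboxed root comes from the origin private file system:
//   const root = await navigator.storage.getDirectory();
//   await saveUploadedFile(root, droppedFiles[0]);
```

Taking the directory handle as a parameter keeps the helper usable with a handle from showDirectoryPicker() as well.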

How do I download an S3 file only if it has changed?

I have a 900 MB file that I'd like to download from S3 to disk only if an up-to-date copy isn't already there. Is there an easy way to do this? I know S3 supports querying the MD5 checksum of files, but I'm hoping not to have to build this logic myself.
You can use AWS CLI's s3 sync command.
Syncs directories and S3 prefixes. Recursively copies new and updated files from the source directory to the destination.
According to this forum thread, you can use sync to synchronize only one file:
aws s3 sync s3://bucket/path/ local/path/ --exclude "*" --include "File.txt"
It says: sync the given paths, exclude all files, but include "File.txt" - so it will sync only "File.txt" under those given paths.
Or with the Java SDK:
According to the javadoc, there is a getObjectMetadata method which returns information about an S3 object (file) without downloading its contents.
The method returns an ObjectMetadata object which can give you some useful information:
getLastModified method:
Gets the value of the Last-Modified header, indicating the date and time at which Amazon S3 last recorded a modification to the associated object.
getContentMD5 method:
Gets the base64 encoded 128-bit MD5 digest of the associated object (content - not including headers) according to RFC 1864.
getETag method:
Gets the hex encoded 128-bit MD5 digest of the associated object according to RFC 1864.
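The ETag comparison can be sketched as follows. This is a minimal Node.js sketch, assuming the object was uploaded in a single part and without SSE-KMS or SSE-C encryption. For multipart uploads (likely for a 900 MB file) the ETag is not a plain MD5 of the content, so this check does not apply. etagMatchesContent is a hypothetical helper, and the ETag string is assumed to come from a metadata request such as the one described above.

```javascript
const crypto = require('crypto');

// Hypothetical helper: true when the hex MD5 of the local bytes equals the S3 ETag.
function etagMatchesContent(localBytes, etag) {
  // The raw ETag header is wrapped in double quotes; strip them before comparing.
  const remote = etag.replace(/^"|"$/g, '').toLowerCase();
  // Multipart ETags carry a "-<part count>" suffix and are not a content MD5.
  if (remote.includes('-')) return false;
  const local = crypto.createHash('md5').update(localBytes).digest('hex');
  return local === remote;
}
```

For a file this large, the hash would be better computed by feeding a read stream into the same createHash('md5') object chunk by chunk rather than reading the file into one buffer.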
I have used the code below to download S3 files whose timestamp is later than the local folder's timestamp. It first checks whether any of the files in the S3 folder were modified after the local folder, and if so downloads only those files.
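// Uses the AWS SDK for Java v1: com.amazonaws.services.s3.*, com.amazonaws.services.s3.model.*,
// com.amazonaws.services.s3.transfer.*, plus java.io.*, java.nio.file.*, java.nio.file.attribute.FileTime, java.util.Date
String bucket = "bucket";
Path location = Paths.get("/data/test/");
TransferManager transferManager = TransferManagerBuilder.standard().build();
AmazonS3 amazonS3 = AmazonS3ClientBuilder.standard().build();
// Cutoff = local folder's modification time. If it cannot be read, fall back to the epoch
// so that every object counts as newer (the original code would throw a NullPointerException here).
Date lastUpdatedTime = new Date(0);
try {
    FileTime lastModifiedTime = Files.getLastModifiedTime(location, LinkOption.NOFOLLOW_LINKS);
    lastUpdatedTime = new Date(lastModifiedTime.toMillis());
} catch (IOException e) {
    LOG.error("Could not read the last-modified time of " + location, e);
}
// Note: listObjects returns a single page of results; larger prefixes need pagination.
ObjectListing listing = amazonS3.listObjects(bucket, "test-folder");
for (S3ObjectSummary os : listing.getObjectSummaries()) {
    if (os.getLastModified().after(lastUpdatedTime)) {
        try {
            File target = new File(location.toFile(), os.getKey());
            Download download = transferManager.download(bucket, os.getKey(), target);
            // Block until this download finishes instead of polling isDone().
            download.waitForCompletion();
        } catch (InterruptedException i) {
            Thread.currentThread().interrupt();
            LOG.error("Exception occurred while downloading the file " + os.getKey(), i);
        }
    }
}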

File upload checking REAL file mime type when uploading directly to S3

I'm using DropzoneJS to upload files directly to S3. When the user adds a file, I use my backend to check the mime type and create the S3 signature. By "adding a file" I just mean that it has been added to the Dropzone queue: the file isn't uploaded yet, and only metadata about the file is sent to the /upload/sign url.
this.on('addedfile', function (file) {
  $.get('/upload/sign', {
    name: file.name,
    size: file.size,
    type: file.type,
  }).done(function (response) {
    myDropzone.options.url = response.attributes.action;
    file.additionalData = response.additionalData;
    myDropzone.processFile(file);
  }).fail(function (response) {
    var data = JSON.parse(response.responseText);
    myDropzone.emit('error', file, data);
  });
});
This is all good! The problem is the file's mime type is only determined by the file extension, so I can happily rename a file from image.jpg to image.mp3 and file.type will be audio/mp3. This I guess is fine for browser warnings, but not if I want that mp3 to play or if I eventually want to process the audio!
Is there any way of telling the REAL mime type of the file without having to pass the upload through the server's file system? I need to upload directly to S3, so passing it through an EC2 instance is not an option.
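One possible approach, sketched under assumptions rather than taken from the thread: read the first few bytes of the file in the browser (for example with file.slice(0, 12).arrayBuffer()) and compare them against known file signatures before requesting the S3 signature. The sniffMimeType helper and its short signature table are hypothetical and cover only a handful of formats. A client-side check can also be bypassed, so it does not replace validating the object after upload.

```javascript
// Hypothetical, deliberately small table of leading-byte signatures.
const SIGNATURES = [
  { mime: 'image/jpeg', bytes: [0xff, 0xd8, 0xff] },
  { mime: 'image/png', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: 'application/pdf', bytes: [0x25, 0x50, 0x44, 0x46] }, // "%PDF"
  { mime: 'audio/mpeg', bytes: [0x49, 0x44, 0x33] },            // "ID3" tag header
];

// Returns the mime type whose signature matches the start of `header`, or null.
function sniffMimeType(header) {
  const match = SIGNATURES.find(sig => sig.bytes.every((b, i) => header[i] === b));
  return match ? match.mime : null;
}
```

In the addedfile handler, the sniffed value could be sent as the type parameter instead of file.type. Note that MP3 files without an ID3 tag start with a frame header instead, which this small table does not recognise.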

Files downloaded from Amazon S3 using Knox and Node.js are corrupt

I'm using knox to access my Amazon S3 bucket for file storage. I'm storing all kinds of files: mostly MS Office documents and PDFs, but they could be binary or any other kind. I'm also using express 4.13.3 and busboy with connect-busboy for streaming support; uploads are handled by busboy and streamed directly to S3 via knox, which avoids writing them to local disk first.
The files upload fine (I can browse and download them manually using Transmit) but I'm having problems downloading.
For clarity I don't want to write the file to local disk, instead keeping it in an in-memory buffer. Here's the code I'm using to handle the GET request:
// instantiate a knox object
var s3client = knox.createClient({
  key: config.AWS.knox.key,
  secret: config.AWS.knox.secret,
  bucket: config.AWS.knox.bucket,
  region: config.AWS.region
});
var buffer = undefined;
s3client.get(path+'/'+fileName)
  .on('response', function(s3res){
    s3res.setEncoding('binary');
    s3res.on('data', function(chunk){
      buffer += chunk;
    });
    s3res.on('end', function() {
      buffer = new Buffer(buffer, 'binary');
      var fileLength = buffer.length;
      res.attachment(fileName);
      res.append('Set-Cookie', 'fileDownload=true; path=/');
      res.append('Content-Length', fileLength);
      res.status(s3res.statusCode).send(buffer);
    });
  }).end();
The file downloads to the browser - I'm using John Culviner's jquery.fileDownload.js - but what is downloaded is corrupt and can't be opened. As you can see I'm using express' .attachment to set the headers for mime type and .append for the additional headers (using .set instead makes no difference).
When the file downloads in Chrome I see the message 'Resource interpreted as Document but transferred with MIME type application/vnd.openxmlformats-officedocument.spreadsheetml.sheet:' (for an Excel file), so express is setting the header correctly, and the size of the file downloaded matches that I see when examining the bucket.
Any ideas what's going wrong?
Looks like the contents might not be being sent to the browser as binary. Try something like the following:
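// Inside the 'response' handler, where s3res is the knox response:
if (s3res.headers['content-type']) {
  res.type( s3res.headers['content-type'] );
}
res.attachment(fileName);
s3res.setEncoding('binary');
s3res.on('data', function(data){
  // Forward each chunk to the browser as soon as it arrives.
  res.write(data, 'binary');
});
s3res.on('end', function() {
  // Finish the response; the body has already been written chunk by chunk.
  res.end();
});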
It will also send the data one chunk at a time as it comes in, so it should be a bit more memory efficient.
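If the whole file really must be held in memory, as in the question, note that the original code starts from var buffer = undefined and appends strings to it, so the result would begin with the text "undefined" and every chunk passes through a string encoding. A minimal sketch of an alternative, assuming the response is left without setEncoding so that chunks arrive as Buffers: collect the chunks and join them once with Buffer.concat. collectChunks is a hypothetical helper name.

```javascript
// Hypothetical helper: gathers binary chunks and joins them once at the end.
function collectChunks() {
  const chunks = [];
  return {
    add(chunk) { chunks.push(chunk); },
    toBuffer() { return Buffer.concat(chunks); },
  };
}

// The pitfall in the question: undefined + string yields a string starting with "undefined".
let broken;
broken += 'PK';
console.log(broken); // prints "undefinedPK"
```

In the handler this would look like s3res.on('data', function (chunk) { collector.add(chunk); }) followed by res.send(collector.toBuffer()) in the 'end' handler.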

Unable to load SWF from application storage directory

While publishing my AIR application(CurrentFile), I have also included chatFile.swf with the installation files.
In my AIR settings panel [AIR 3.7 for Desktop], under 'Include Files' I have the following:
CurrentFile.swf
CurrentFile-app.xml
chatFile.swf
Here is the AS3 code in my CurrentFile.swf:
import flash.net.URLRequest;
import flash.events.Event;
import flash.events.MouseEvent;
import flash.events.IOErrorEvent;
import flash.events.HTTPStatusEvent;
import flash.display.Loader;
import flash.filesystem.File;
var chatLoaderWindow:Loader;
function loadchat(m:MouseEvent):void
{
    chatLoaderWindow = new Loader();
    chatLoaderWindow.contentLoaderInfo.addEventListener(Event.COMPLETE, chatLoadComplete);
    chatLoaderWindow.contentLoaderInfo.addEventListener(Event.INIT, chatInitLoad);
    chatLoaderWindow.contentLoaderInfo.addEventListener(IOErrorEvent.IO_ERROR, chatErrorLoad);
    chatLoaderWindow.contentLoaderInfo.addEventListener(HTTPStatusEvent.HTTP_STATUS, chatHttpStatus);
    myclip.chatwindow.addChild(chatLoaderWindow);
    var f:File = File.applicationStorageDirectory.resolvePath("chatFile.swf");
    chatLoaderWindow.load(new URLRequest(f.url));
    tracebox.text = "Chat URL" + f.url;
}
function chatLoadComplete(e:Event):void
{
    tracebox.text = "chat loaded";
}
function chatErrorLoad(io:IOErrorEvent):void
{
    tracebox.text = "chat IO Error: "+io;
}
function chatInitLoad(i:Event):void
{
    tracebox.text = "chat INIT";
}
function chatHttpStatus(e:HTTPStatusEvent):void
{
    tracebox.text = "chat Http"+e;
}
myclip.chatbut.addEventListener(MouseEvent.CLICK,loadchat);
/*
Output:
chat IO Error: [IOErrorEvent type="ioError" bubbles=false cancelable=false eventPhase=2 text="Error #2035" errorID=2035]
*/
EDIT: I figured it out. It was really simple
This is not required:
var f:File = File.applicationStorageDirectory.resolvePath("chatFile.swf");
chatLoaderWindow.load(new URLRequest(f.url));
Insert this:
chatLoaderWindow.load(new URLRequest("app:/chatFile.swf"));
So now my question is:
What is the purpose of File.applicationStorageDirectory.resolvePath?
There are two directories here. One is the "application" directory, where your install files are placed. One is the "application-storage" directory, which is a convenient place to write files to at runtime. To access these directories you can either use the File.resolvePath() function or use the URI-scheme shortcuts, app: or app-storage:. In your initial attempt, you were just looking in the wrong directory for your file.
File.applicationStorageDirectory.resolvePath("somefile.swf").url will equal "app-storage:/somefile.swf"
File.applicationDirectory.resolvePath("somefile.swf").url will equal "app:/somefile.swf"
The application directory is where your app was installed. The app storage directory is a folder your app can save files to.
resolvePath() returns a file object. You can use it for purposes other than getting the cross-platform url for the file location, such as fileObj.exists and fileObj.parent.createDirectory(). fileObj.url is just the url you would use with URLLoader to access the file in a platform-independent manner.