When to check for file size/mimetype in a Node.js upload script?

I created an upload script in Node.js using Express/formidable. It basically works, but I am wondering where and when to check the uploaded file, e.g. for the maximum file size or whether the file's mimetype is actually allowed.
My program looks like this:
app.post('/', function(req, res, next) {
    req.form.on('progress', function(bytesReceived, bytesExpected) {
        // ... do stuff
    });
    req.form.complete(function(err, fields, files) {
        console.log('\nuploaded %s to %s', files.image.filename, files.image.path);
        // ... do stuff
    });
});
It seems to me that the only viable place for checking the mimetype/file size is the complete event, where I can reliably use the filesystem functions to get the size of the uploaded file in /tmp/ – but that seems like a bad idea because:
the possibly malicious/too large file is already uploaded to my server
the user experience is poor – you watch the upload progress only to be told afterwards that it didn't work
What's the best practice for implementing this? I found quite a few examples for file uploads in Node.js, but none seemed to do the security checks I would need.

With help from some people on the node IRC channel and the node mailing list, here is what I do:
I am using formidable to handle the file upload. Using the progress event, I can check the maximum file size like this:
form.on('progress', function(bytesReceived, bytesExpected) {
    if (bytesReceived > MAX_UPLOAD_SIZE) {
        console.log('### ERROR: FILE TOO LARGE');
    }
});
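The handler above only logs the violation, though; to actually stop the transfer you also have to tear down the request. A sketch of the same handler with that added (assuming the Express req object is in scope; req.connection is the underlying socket):

form.on('progress', function(bytesReceived, bytesExpected) {
    if (bytesReceived > MAX_UPLOAD_SIZE) {
        console.log('### ERROR: FILE TOO LARGE');
        // destroy the socket so no further bytes are received
        req.connection.destroy();
    }
});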
Reliably checking the mimetype is much more difficult. The basic idea is to use the progress event, and then, once enough of the file has been uploaded, run a file --mime-type call and check the output of that external command. Simplified, it looks like this:
var exec = require('child_process').exec;

// contains the path of the uploaded file,
// is grabbed in the fileBegin event below
var tmpPath;

form.on('progress', function validateMimetype(bytesReceived, bytesExpected) {
    var percent = (bytesReceived / bytesExpected * 100) | 0;
    // pretty basic check if enough bytes of the file are written to disk,
    // might be too naive if the file is small!
    if (tmpPath && percent > 25) {
        var child = exec('file --mime-type ' + tmpPath, function (err, stdout, stderr) {
            var mimetype = stdout.substring(stdout.lastIndexOf(':') + 2, stdout.lastIndexOf('\n'));
            console.log('### file CALL OUTPUT', err, stdout, stderr);
            if (err || stderr) {
                console.log('### ERROR: MIMETYPE COULD NOT BE DETECTED');
            } else if (!ALLOWED_MIME_TYPES[mimetype]) {
                console.log('### ERROR: INVALID MIMETYPE', mimetype);
            } else {
                console.log('### MIMETYPE VALIDATION COMPLETE');
            }
        });
        form.removeListener('progress', validateMimetype);
    }
});

form.on('fileBegin', function grabTmpPath(_, fileInfo) {
    if (fileInfo.path) {
        tmpPath = fileInfo.path;
        form.removeListener('fileBegin', grabTmpPath);
    }
});

The new version of Connect (2.x) has this baked into the bodyParser via the limit middleware: https://github.com/senchalabs/connect/blob/master/lib/middleware/multipart.js#L44-61
I think it's much better this way, as you just kill the request when it exceeds the maximum limit instead of merely stopping the formidable parser (and letting the request "go on").
More about the limit middleware: http://www.senchalabs.org/connect/limit.html
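For reference, a minimal sketch of wiring that up with the Connect 2.x API (the '5mb' cap and the trivial final handler are placeholders, not from the original answer):

var connect = require('connect');

var app = connect()
    // kill any request whose body exceeds the cap before it is parsed
    .use(connect.limit('5mb'))
    .use(connect.bodyParser())
    .use(function (req, res) {
        res.end('uploaded');
    });

app.listen(3000);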

Related

recursive lambda function never seems to run

I'm not familiar enough with Node.js or Lambda to see an obvious solution to a dilemma I have. I'm writing some utilities on Lambda to manipulate images in an S3 bucket and make them accessible via the Gateway API to REST calls.
BACKGROUND DETAILS:
One of the utilities I have retrieves the headObject information, such as the mtime, size, and metadata. The images themselves will likely be coming in by various means, and I won't always have control over adding metadata to them when they arrive or are created. But I don't really need it until it's necessary to view details about the image from a web interface. And when I do that, I use a thumbnail instead, so I created a Lambda create-event triggered script (and also a fallback variation of it via the Gateway API) that will create a thumbnail (either when the image is first uploaded to S3 or whenever I make the gateway CreateThumbnail call), at which time it adds metadata to the thumbnail with things like the original image's mimetype, pixel width, and height.
What I would like to be able to do is create a 'GetObjectInfo' that first pulls the headObject data, then checks whether the specified bucket is the one holding the associated thumbnail files (i.e. whether the object is or is not a thumbnail). If it is not a thumbnail, I want to then retrieve -- or at least attempt to retrieve -- the headObject for the associated thumbnail file and attach the thumbnail's metadata (if the thumbnail exists) onto the data from the original head request before returning the information.
The problem is that when I set up an async callback scheme, the first headObject request completes, but the second never seems to get out of the starting gate.
The method in my class is:
getHeadObject(bucket, object, callback) {
    console.log(bucket, "CLASS-head#1")
    this.s3.headObject({"Bucket": bucket, "Key": object}, function(err, data) {
        console.log(bucket, "CLASS-head#2")
        callback(err, data)
    })
}

getObjectInfo(bucket, object, callback) {
    let scope = this
    console.log(bucket, "CLASS-object#1")
    this.getHeadObject(bucket, object, function(err, data) {
        console.log(bucket, "CLASS-object#2")
        if (err)
            callback(err, data)
        else
            callback(null, data)
    })
}
The Lambda code that calls it (the second call nested inside the first call's callback) is:
var cInst = new myClass()

cInst.getObjectInfo(srcBucket, filePath, function(err, data) {
    if (data.status == 1) { // if parent request success
        // if parent is not thumbnail
        if (srcBucket != THUMB_BUCKET) { // see if a thumbnail exists
            let thumbPath = myClass.getThumbPath(srcBucket, userId, directory, targetObject)
            console.log('---- thumbPath', thumbPath)
            cInst.getObjectInfo(THUMB_BUCKET, thumbPath, function(err, thumbData) {
                console.log("thumbData #1", thumbData)
                if (thumbData.status == 1) { // thumbnail exists
                    console.log("thumbData")
                }
            })
        }
        context.succeed(myClass.createResponse(1, data, api))
    } else {
        context.fail(myClass.createResponse(data.status, data, api))
    }
})
On the first (parent) call I see:
{bucket} "CLASS-object#1"
{bucket} "CLASS-head#1"
{bucket} "CLASS-head#2"
{bucket} "CLASS-object#2"
On the second call I only see:
image-thumbnails "CLASS-object#1"
image-thumbnails "CLASS-head#1"
(getThumbPath is just a static utility function that builds the thumbnail path from parameters of the original file. It is already tested as working and produces something like {original-bucket-name}/{userid}/{subdirectory}/{file-basename_150x150.jpg} for any given image – I confirmed that in this instance the thumbnail exists, matches the path returned by getThumbPath, and the ACL appears to have permission to read the bucket and the object.)
UPDATE: More weirdness
I tried setting the permissions on the thumbnail to publicly readable and it worked. So I started messing with the ACL. For the time being, since I am still testing, I just gave the scripts' role full S3 permissions.
But now I notice that it works and fails intermittently: one time it completes, the next time it doesn't. WTF is going on here?
I would bet that this is the most common problem people see when using Node.js with Lambda.
When a Node.js Lambda signals completion (for example via context.succeed) or the handler returns, the runtime freezes the execution environment; any promises or async callbacks still in flight are simply suspended and may never resume. That also explains the intermittent behavior: sometimes the nested call happens to finish before the freeze, and sometimes it doesn't.
To make sure the Lambda does not prematurely abandon that work, wait until those operations are complete by using await.
In your case, the following will work: wrap the callback-style calls in promises and then await them.
// note: the surrounding handler must be declared async for these awaits to be legal
await new Promise((resolve, reject) => {
    cInst.getObjectInfo(srcBucket, filePath, async function(err, data) {
        if (data.status == 1) {
            if (srcBucket != THUMB_BUCKET) {
                ...
                ...
                await new Promise((resolve2, reject2) => {
                    cInst.getObjectInfo(THUMB_BUCKET, thumbPath, function(err, thumbData) {
                        ...
                        ...
                        resolve2();
                    })
                })
            }
            context.succeed(myClass.createResponse(1, data, api))
            resolve();
        } else {
            context.fail(myClass.createResponse(data.status, data, api))
            reject();
        }
    })
})
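For what it's worth, a flatter alternative is to promisify the method once with Node's built-in util.promisify and await the calls directly. A sketch, assuming an async handler whose return value takes the place of context.succeed (srcBucket, filePath, and the other names come from the question's surrounding code):

const { promisify } = require('util');

exports.handler = async (event) => {
    const cInst = new myClass();
    // getObjectInfo already follows the (err, data) callback convention, so promisify applies cleanly
    const getObjectInfoAsync = promisify(cInst.getObjectInfo.bind(cInst));

    const data = await getObjectInfoAsync(srcBucket, filePath);
    if (data.status == 1 && srcBucket != THUMB_BUCKET) {
        const thumbPath = myClass.getThumbPath(srcBucket, userId, directory, targetObject);
        try {
            const thumbData = await getObjectInfoAsync(THUMB_BUCKET, thumbPath);
            console.log('thumbData #1', thumbData);
        } catch (e) {
            // no thumbnail; carry on with the parent data only
        }
    }
    return myClass.createResponse(1, data, api);
};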

Windows Azure Storage Blobs to zip file with Express

I am trying to use this plugin (express-zip). On the Azure Storage side we have getBlobToStream, which gives us the file in a specific stream. What I do now is get the image from the blob, save it on the server, and then res.zip it. Is it somehow possible to create a writeStream which will write into a readStream?
Edit: The question has been edited to ask about doing this in Express from Node.js. I'm leaving the original answer below in case anyone is interested in a C# solution.
For Node, you could use a strategy similar to what express-zip uses, but instead of passing a file read stream in this line, pass in a blob read stream obtained using createReadStream.
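A minimal sketch of what that might look like with the azure-storage Node SDK (the container and blob names are placeholders, and zip.entry refers to the zip-stream instance express-zip manages internally):

var azure = require('azure-storage');
var blobService = azure.createBlobService();

// a readable stream over the blob's contents; no local file needed
var blobStream = blobService.createReadStream('mycontainer', 'photos/pic1.png');

// hand it to the zip exactly where a fs.createReadStream(...) would normally go
zip.entry(blobStream, { name: 'pic1.png' }, cb);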
Solution using C#:
If you don't mind caching everything locally while you build the zip, the way you are doing it is fine. You can use a tool such as AzCopy to rapidly download an entire container from storage.
To avoid caching locally, you could use the ZipArchive class, as in the following C# code:
internal static void ArchiveBlobs(CloudBlockBlob destinationBlob, IEnumerable<CloudBlob> sourceBlobs)
{
    using (Stream blobWriteStream = destinationBlob.OpenWrite())
    {
        using (ZipArchive archive = new ZipArchive(blobWriteStream, ZipArchiveMode.Create))
        {
            foreach (CloudBlob sourceBlob in sourceBlobs)
            {
                ZipArchiveEntry archiveEntry = archive.CreateEntry(sourceBlob.Name);
                using (Stream archiveWriteStream = archiveEntry.Open())
                {
                    sourceBlob.DownloadToStream(archiveWriteStream);
                }
            }
        }
    }
}
This creates a zip archive in Azure storage that contains multiple blobs without writing anything to disk locally.
I'm the author of express-zip. What you are trying to do should be possible. If you look under the covers, you'll see I am in fact adding streams into the zip:
https://github.com/thrackle/express-zip/blob/master/lib/express-zip.js#L55
So something like this should work for you (prior to me adding support for this in the interface of the package itself):
// assumed imports for this sketch: express-zip builds on the zip-stream and async modules
var zipstream = require('zip-stream');
var async = require('async');

var zip = zipstream(exports.options);
zip.pipe(res); // res is a writable stream

var addFile = function(file, cb) {
    // the first argument just needs to be a readable stream,
    // e.g. the blob read stream from createReadStream above
    zip.entry(getBlobToStream(), { name: file.name }, cb);
};

async.forEachSeries(files, addFile, function(err) {
    if (err) return cb(err);
    zip.finalize(function(bytesZipped) {
        cb(null, bytesZipped);
    });
});
Apologies if I've made horrible errors above; I haven't been working on this for a bit.

Multiple image attachments to couchdb (nano) with Node.js Express 4.0 & formidable

I am trying to insert multiple images into couchdb via nano, using Express 4 and formidable. I can access and insert individual files without difficulty using formidable and nano; however, when I try to insert one file after another, I get conflict errors. I am very new to JS and Node, and I know my understanding of callbacks and asynchronous functions is limited. This is what I have at the moment. Any help would be greatly appreciated.
function uploadImage(req, res) {
    var form = new formidable.IncomingForm(),
        files = [],
        fields = [],
        uploadcount = 1;

    form.on('field', function(field, value) {
        fields.push([field, value]);
    })

    form.on('file', function(field, file) {
        files.push([field, file]);
        var docid = fields[0][1];
        getRevision();

        function getRevision() {
            dbn.get(docid, { revs_info: true }, function(err, body) {
                if (!err) {
                    var exrev = body._rev;
                    insertImage(exrev);
                } else {
                    console.log(err);
                }
            });
        }

        function insertImage(exrevision) {
            var exrev = exrevision;
            fs.readFile(file.path, function(err, data) {
                if (err) {
                    console.log(err);
                } else {
                    var imagename = docid + "_" + uploadcount + ".png";
                    dbn.attachment.insert(docid, imagename, data, 'image/png',
                        { rev: exrev }, function(err, body) {
                            if (!err) {
                                uploadcount++;
                            } else {
                                console.log(err);
                            }
                        });
                }
            });
        };
    });

    form.on('end', function() {
        console.log('done');
        res.redirect('/public/customise.html');
    });

    form.parse(req);
};
I found a solution: dump the files into a temporary directory first, then insert them into couchdb via nano one after another within a single function. I couldn't find a way to pause the file stream to wait for the couchdb response, so this sequential method seems adequate.
This is a problem with handling asynchronous calls. Because each attachment insert requires the doc's current rev number, you can't do the inserts in parallel. You must insert a new attachment only after you get a response for the previous one.
You could use promises and deferreds to do this, but I personally have solved a similar problem using a package called "async". With async, you can use async.eachSeries() to make these async calls in series, as sketched below.
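A minimal sketch of that approach, reusing the dbn and docid names from the question and assuming files is an array of formidable file objects (not the [field, file] pairs the question collects):

var async = require('async');
var fs = require('fs');

async.eachSeries(files, function(file, done) {
    // fetch the current rev first; it changes after every attachment insert
    dbn.get(docid, function(err, body) {
        if (err) return done(err);
        fs.readFile(file.path, function(err, data) {
            if (err) return done(err);
            dbn.attachment.insert(docid, file.name, data, 'image/png',
                { rev: body._rev }, done); // the next iteration starts only after this insert completes
        });
    });
}, function(err) {
    if (err) console.log(err);
});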
Another point, about the revision number: you could just use the lighter-weight db.head() function instead of db.get(). The rev number is present in the "etag" header. You can get the rev like this:
// get rev number
db.head(bookId, function(err, _, headers) {
    if (!err) {
        var rev = eval(headers.etag); // the etag value is a quoted string; eval strips the quotes
        // do whatever you need to do with the rev number
        ......
    }
});
In addition, after each attachment insert, the response from couchdb will look something like this:
{"ok":true,"id":"b2aba1ed809a4395d850e65c3ff2130c","rev":"4-59d043853f084c18530c2a94b9a2caed"}
The rev property gives the new rev number, which you can use to insert the next attachment.

How can I encode a string in base64 using Meteor

I am trying to use a form to upload files to an S3 bucket using Meteor. I am following this Amazon article. At "Sign Your S3 POST Form", near the end, I need to encode a string to base64, but I've been unable to find a way to do this. Can anyone tell me how? Note that the string first needs to be encoded and then signed. This is how it's done in Python:
import base64
import hmac, hashlib
policy = base64.b64encode(policy_document)
signature = base64.b64encode(hmac.new(AWS_SECRET_ACCESS_KEY, policy, hashlib.sha1).digest())
You can do this without the NodeJS crypto module; creating a package looked a bit like breaking a butterfly on a wheel to me, so I came up with this:
if (Meteor.isServer) {
    Meteor.methods({
        'base64Encode': function(unencoded) {
            return new Buffer(unencoded || '').toString('base64');
        },
        'base64Decode': function(encoded) {
            return new Buffer(encoded || '', 'base64').toString('utf8');
        },
        'base64UrlEncode': function(unencoded) {
            var encoded = Meteor.call('base64Encode', unencoded);
            return encoded.replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
        },
        'base64UrlDecode': function(encoded) {
            encoded = encoded.replace(/-/g, '+').replace(/_/g, '/');
            while (encoded.length % 4)
                encoded += '=';
            return Meteor.call('base64Decode', encoded);
        }
    });

    console.log(Meteor.call('base64Encode', 'abc')); // "YWJj"
}
This is based on the base64.js by John Hurliman found at https://gist.github.com/jhurliman/1250118. Note that this will work like a charm on the server, but to port it to the client you have to call the methods with a callback function that stores the result, e.g. in a session variable, as sketched below.
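A minimal client-side sketch of that pattern (assuming the server-side methods above are loaded):

// on the client: Meteor.call is asynchronous, so the result arrives in a callback
Meteor.call('base64Encode', 'abc', function(err, result) {
    if (!err) {
        Session.set('base64Result', result); // "YWJj"
    }
});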
You need the NodeJS crypto module to perform these tasks.
First create a "packages" directory at the root of your Meteor project, then create a "my-package" directory.
Inside it, you need two files: a "package.js" and a "my-package.js".
package.js should look like:
Package.describe({
    summary: "MyPackage doing amazing stuff with AWS."
});

Package.on_use(function(api) {
    // add your package file to the server app
    api.add_files("my-package.js", "server");
    // what we export outside of the package
    // (this is important: packages have their own scope!)
    api.export("MyPackage", "server");
});
my-package.js should look like:
var crypto = Npm.require("crypto");

MyPackage = {
    myFunction: function(arguments) {
        // here you can use crypto functions!
    }
};
The function you will probably need is crypto.createHmac.
Here is example code showing how I encode a JSON security policy in base64, then use it to generate a security signature in my own app:
encodePolicy: function(jsonPolicy) {
    // stringify the policy, store it in a NodeJS Buffer object
    var buffer = new Buffer(JSON.stringify(jsonPolicy));
    // convert it to base64
    var policy = buffer.toString("base64");
    // replace "/" and "+" so that it is URL-safe
    return policy.replace(/\//g, "_").replace(/\+/g, "-");
},
encodeSignature: function(policy) {
    var hmac = crypto.createHmac("sha256", APP_SECRET);
    hmac.update(policy);
    return hmac.digest("hex");
}
This will allow you to call MyPackage.myFunction on the server side of your Meteor app.
Last but not least, don't forget to run "meteor add my-package" in order to use it!
You can use the meteor-crypto-base64 package.
CryptoJS.enc.Base64.stringify(CryptoJS.enc.Utf8.parse('Hello, World!'));
//"SGVsbG8sIFdvcmxkIQ=="

Write file to a specific folder inside localFolder

I'm trying to write an image file to the localFolder storage. I want to write it specifically to the images/heroes/ folder inside the localFolder. Is there some way to do this with my existing code?
return WinJS.xhr({ url: fileToDownloadURL, responseType: 'blob' }).then(
    function (response) {
        var input = response.response.msDetachStream();
        var filename = 'ms-appx:///local/images/heroes/image_name.png';
        Windows.Storage.ApplicationData.current.localFolder.createFileAsync(filename,
            Windows.Storage.CreationCollisionOption.replaceExisting).then(
            function (file) {
                return file.openAsync(Windows.Storage.FileAccessMode.readWrite).then(
                    function (output) {
                        return Windows.Storage.Streams.RandomAccessStream.copyAsync(input, output).then(
                            function () {
                                output.flushAsync().then(function () {
                                    input.close();
                                    output.close();
                                });
                            });
                    });
            });
    });
I get this error message:
0x8007007b - JavaScript runtime error: The filename, directory name, or volume label syntax is incorrect.
WinRT information: The specified name (ms-appx:///local/images/heroes/image_name.png) contains one or more invalid characters.
I figured it out. For anyone wondering: you can call getFolderAsync('SomeFolderName') on the Windows.Storage.ApplicationData.current.localFolder instance. getFolderAsync() is like any promise function and will return an error if the folder does not exist. From there you can run createFolderAsync('SomeFolderName') to create the folder, and then continue what you were doing. In my case it was better to create a function like writeFileAsync() and call it depending on whether or not the folder exists first.
Keep in mind getFolderAsync() can only get one folder at a time, so something like getFolderAsync('a/long/list/of/folders') won't work. You'll have to chain the getFolderAsync() calls together, I believe (correct me if I'm wrong!) – see the sketch below.
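For illustration, here is a minimal sketch of that chaining; it uses CreationCollisionOption.openIfExists so each step gets or creates the folder in a single call (folder and file names are placeholders):

var local = Windows.Storage.ApplicationData.current.localFolder;

local.createFolderAsync('images', Windows.Storage.CreationCollisionOption.openIfExists)
    .then(function (imagesFolder) {
        // each level has to be created (or opened) one at a time
        return imagesFolder.createFolderAsync('heroes', Windows.Storage.CreationCollisionOption.openIfExists);
    })
    .then(function (heroesFolder) {
        return heroesFolder.createFileAsync('image_name.png',
            Windows.Storage.CreationCollisionOption.replaceExisting);
    })
    .then(function (file) {
        // write to the file here, e.g. with the stream-copy code from the question
    });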