Downloading PDF from Dropbox and then uploading to S3 results in larger, corrupt file [duplicate]

I have been struggling to download an image without piping it to fs. Here's what I have accomplished:
var Promise = require('bluebird'),
    fs = Promise.promisifyAll(require('fs')),
    requestAsync = Promise.promisify(require('request'));

function downloadImage(uri, filename) {
    return requestAsync(uri)
        .spread(function (response, body) {
            if (response.statusCode != 200) return Promise.resolve();
            return fs.writeFileAsync(filename, body);
        })
        .then(function () { ... })
        // ...
}
A valid input might be:
downloadImage('http://goo.gl/5FiLfb', 'c:\\thanks.jpg');
I do believe the problem is with the handling of body.
I have tried casting it to a Buffer (new Buffer(body, 'binary') etc.) in several encodings, but all failed.
Thanks in advance for any help!

You have to tell request that the data is binary:
requestAsync(uri, { encoding : null })
Documented here:
encoding - Encoding to be used on setEncoding of response data. If null, the body is returned as a Buffer. Anything else (including the default value of undefined) will be passed as the encoding parameter to toString() (meaning this is effectively utf8 by default).
So without that option, the body data is interpreted as UTF-8 encoded, which it isn't (and yields an invalid JPEG file).
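For completeness, here is the question's function with that one option applied. This is a minimal sketch, assuming Bluebird 2.x semantics where a promisified multi-result callback resolves to an array (Bluebird 3.x would need { multiArgs: true } for .spread to work):
var Promise = require('bluebird'),
    fs = Promise.promisifyAll(require('fs')),
    requestAsync = Promise.promisify(require('request'));

function downloadImage(uri, filename) {
    // encoding: null tells request to hand back the raw Buffer
    return requestAsync(uri, { encoding: null })
        .spread(function (response, body) {
            if (response.statusCode != 200) return Promise.resolve();
            // body is a Buffer here, so the bytes reach disk unmodified
            return fs.writeFileAsync(filename, body);
        });
}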

Related

How does node-redis method setex() convert Buffer to string?

I'm using node-redis and I was hoping that someone could help me figure out how this library converts a Buffer to a string. I gzip my data with node-gzip before I store it in Redis, and that call returns Promise<Buffer>:
const data = JSON.stringify({ data: 'test' });
const compressed = await gzip(data, { level: 9 });
I tested the following two approaches to saving Buffer data in Redis:
Without .toString() - I pass the Buffer to the library and it will take care of the conversion
const result = await redisClient.setex('testKey', 3600, compressed);
and with .toString()
const result = await redisClient.setex('testKey', 3600, compressed.toString());
When I try these two approaches I don't get the same value saved in Redis. I tried different parameters for .toString() to match the output of approach 1), but it didn't work.
The reason I need the value saved in the format of 1) is that I'm matching the value format that one of our PHP pages generates.
My code works fine without .toString(), but I would like to know how node-redis handles it internally.
I've tried to find the answer in the source code and by debugging and stepping into the library calls, but I didn't find what I was looking for, and I hope that someone can help me with this.
It looks like it happens in the utils.js file:
utils.js
if (reply instanceof Buffer) {
    return reply.toString();
}
Also, use the proper option (i.e. return_buffers) if you want replies to come back as Buffers:
node-redis README
redis.createClient({ return_buffers: true });
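A rough sketch of the round trip with that option, assuming node-redis v2/v3 (where return_buffers is a client option) and node-gzip's ungzip counterpart; the key and data mirror the question:
const redis = require('redis');
const { gzip, ungzip } = require('node-gzip');

// with return_buffers, every reply comes back as a Buffer instead of
// being passed through reply.toString() as shown above
const redisClient = redis.createClient({ return_buffers: true });

async function roundTrip() {
    const data = JSON.stringify({ data: 'test' });
    const compressed = await gzip(data, { level: 9 }); // Promise<Buffer>

    // store the raw gzipped bytes
    await new Promise((resolve, reject) =>
        redisClient.setex('testKey', 3600, compressed, err =>
            err ? reject(err) : resolve()));

    // read them back as a Buffer (thanks to return_buffers) and decompress
    const stored = await new Promise((resolve, reject) =>
        redisClient.get('testKey', (err, reply) =>
            err ? reject(err) : resolve(reply)));

    console.log((await ungzip(stored)).toString()); // {"data":"test"}
}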

HapiJS reply with readable stream

For one call, I am replying with a huge JSON object, which sometimes blocks the Node event loop. As such, I'm using the Big Friendly JSON package to stream the JSON instead. My issue is that I cannot figure out how to actually reply with the stream.
My original code was simply
let searchResults = s3Access.getSavedSearch(guid).Body;
searchResults = JSON.parse(searchResults.toString());
return reply(searchResults);
This works great but bogs down on huge payloads.
I've tried things like the following, using the Big Friendly JSON package (https://gitlab.com/philbooth/bfj):
const stream = bfj.streamify(searchResults);
return reply(stream); // according to docs it's a readable stream
But then my browser complained about an empty response. I then tried to add the below to the reply, same result.
.header('content-encoding', 'json')
.header('Content-Length', stream.length);
I also tried return reply(null, stream); but that produced a ton of node errors
Is there some other way I need to organize this? My understanding was I could just reply a readable stream and Hapi would take care of it, but the response keeps showing up as empty.
Did you try using h.response? Here h is the response toolkit that replaced reply in Hapi 17+.
Example:
handler: async (request, h) => {
    const { limit, sortBy, order } = request.query;
    const queryString = {
        where: { status: 1 },
        limit,
        order: [[sortBy, order]],
    };
    let userList = {};
    try {
        userList = await _getList(User, queryString);
    } catch (e) {
        // throw new Boom(e);
        throw Boom.badRequest(i18n.__('controllers.user.fetchUser'), e);
    }
    return h.response(userList);
}
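Applied to the original question, a minimal sketch (assuming Hapi 17+ and that bfj.streamify returns a readable stream, per its docs; loadSearchResults is a hypothetical stand-in for the S3 call in the question). Note that the content-encoding header in the question is the wrong place for this; json is not a transfer encoding like gzip, and what the browser needs is the content type:
const bfj = require('bfj');

const handler = async (request, h) => {
    // hypothetical loader standing in for the S3 fetch + JSON.parse above
    const searchResults = await loadSearchResults();

    // bfj.streamify walks the object lazily and emits the JSON as a
    // readable stream, so no single giant JSON.stringify blocks the loop
    const stream = bfj.streamify(searchResults);

    // Hapi accepts a readable stream as a response source; set the type
    // explicitly, since it cannot be inferred from a stream
    return h.response(stream).type('application/json');
};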

How to determine a file's Unicode character encoding in iOS?

In our application I have to open a text file which will sometimes be in UTF-8 format and sometimes in UTF-16 format.
Is there any way to determine the encoding of the file? Or is it possible to check whether the NSString that was read is valid?
If you are forced to guess the encoding of your text file, you can use the following do-catch blocks, as suggested by the documentation (this works for Swift 4.0; note that encodingType has to be declared up front so it can be passed as an inout argument):
var encodingType = String.Encoding.utf8 // overwritten on success
do {
    let str = try String(contentsOf: url, usedEncoding: &encodingType)
    print("Used for encoding: \(encodingType)")
} catch {
    do {
        let str = try String(contentsOf: url, encoding: .utf8)
        print("Used for encoding: UTF-8")
    } catch {
        do {
            let str = try String(contentsOf: url, encoding: .isoLatin1)
            print("Used for encoding: ISO Latin 1")
        } catch {
            // Error handling
        }
    }
}
Apple's documentation has some guidance on how to proceed: String Programming Guide: Reading data with an unknown encoding:
If you are forced to guess the encoding (and note that in the absence of explicit information, it is a guess):
1. Try stringWithContentsOfFile:usedEncoding:error: or initWithContentsOfFile:usedEncoding:error: (or the URL-based equivalents). These methods try to determine the encoding of the resource, and if successful return by reference the encoding used.
2. If (1) fails, try to read the resource by specifying UTF-8 as the encoding.
3. If (2) fails, try an appropriate legacy encoding. "Appropriate" here depends a bit on circumstances; it might be the default C string encoding, it might be ISO or Windows Latin 1, or something else, depending on where your data is coming from.

when to check for file size/mimetype in node.js upload script?

I created an upload script in node.js using express/formidable. It basically works, but I am wondering where and when to check the uploaded file, e.g. for the maximum file size or whether the file's MIME type is actually allowed.
My program looks like this:
app.post('/', function(req, res, next) {
    req.form.on('progress', function(bytesReceived, bytesExpected) {
        // ... do stuff
    });
    req.form.complete(function(err, fields, files) {
        console.log('\nuploaded %s to %s', files.image.filename, files.image.path);
        // ... do stuff
    });
});
It seems to me that the only viable place for checking the MIME type/file size is the complete event, where I can reliably use the filesystem functions to get the size of the uploaded file in /tmp/ – but that seems like a bad idea because:
the possibly malicious/too large file has already been uploaded to my server
the user experience is poor – you watch the upload progress just to be told afterwards that it didn't work
What's the best practice for implementing this? I found quite a few examples for file uploads in node.js, but none seemed to do the security checks I would need.
With help from some guys at the node IRC and the node mailing list, here is what I do:
I am using formidable to handle the file upload. Using the progress event I can check the maximum filesize like this:
form.on('progress', function(bytesReceived, bytesExpected) {
    if (bytesReceived > MAX_UPLOAD_SIZE) {
        console.log('### ERROR: FILE TOO LARGE');
    }
});
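Logging alone doesn't stop the transfer. One way to actually abort, sketched here as an untested addition of mine (the 413 response and the socket teardown are not from the original answer):
form.on('progress', function (bytesReceived, bytesExpected) {
    if (bytesReceived > MAX_UPLOAD_SIZE) {
        // tell the client the body is too large, then drop the
        // connection so no further bytes are read
        res.writeHead(413, { 'Content-Type': 'text/plain' });
        res.end('Request entity too large');
        req.connection.destroy();
    }
});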
Reliably checking the MIME type is much more difficult. The basic idea is to use the progress event, then, once enough of the file has been uploaded, use a file --mime-type call and check the output of that external command. Simplified, it looks like this:
// contains the path of the uploaded file,
// is grabbed in the fileBegin event below
var tmpPath;

form.on('progress', function validateMimetype(bytesReceived, bytesExpected) {
    var percent = (bytesReceived / bytesExpected * 100) | 0;
    // pretty basic check if enough bytes of the file are written to disk,
    // might be too naive if the file is small!
    if (tmpPath && percent > 25) {
        var child = exec('file --mime-type ' + tmpPath, function (err, stdout, stderr) {
            var mimetype = stdout.substring(stdout.lastIndexOf(':') + 2, stdout.lastIndexOf('\n'));
            console.log('### file CALL OUTPUT', err, stdout, stderr);
            if (err || stderr) {
                console.log('### ERROR: MIMETYPE COULD NOT BE DETECTED');
            } else if (!ALLOWED_MIME_TYPES[mimetype]) {
                console.log('### ERROR: INVALID MIMETYPE', mimetype);
            } else {
                console.log('### MIMETYPE VALIDATION COMPLETE');
            }
        });
        form.removeListener('progress', validateMimetype);
    }
});

form.on('fileBegin', function grabTmpPath(_, fileInfo) {
    if (fileInfo.path) {
        tmpPath = fileInfo.path;
        form.removeListener('fileBegin', grabTmpPath);
    }
});
The new version of Connect (2.x) already has this baked into the bodyParser via the limit middleware: https://github.com/senchalabs/connect/blob/master/lib/middleware/multipart.js#L44-61
I think it's much better this way, as you just kill the request when it exceeds the maximum limit instead of merely stopping the formidable parser (and letting the request go on).
More about the limit middleware: http://www.senchalabs.org/connect/limit.html
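Usage looked roughly like this in Connect 2.x; treat this as a sketch of the then-current API rather than a drop-in snippet:
var connect = require('connect');

var app = connect()
    // reject request bodies over 5mb before any parsing happens
    .use(connect.limit('5mb'))
    .use(connect.multipart())
    .use(function (req, res) {
        res.end('uploaded');
    });

app.listen(3000);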

how to upload a file from node.js

I found many posts when I queried for this problem, but they all refer to how to upload a file from your browser to a node.js server. I want to upload a file from node.js code to another server. I tried to write it based on my limited knowledge of node.js, but it doesn't work.
function (data) {
    var reqdata = 'file=' + data;
    var request = http.request({
        host: HOST_NAME,
        port: HOST_PORT,
        path: PATH,
        method: 'POST',
        headers: {
            'Content-Type': 'multipart/form-data',
            'Content-Length': reqdata.length
        }
    }, function (response) {
        var data = '';
        response.on('data', function(chunk) {
            data += chunk.toString();
        });
        response.on('end', function() {
            console.log(data);
        });
    });
    request.write(reqdata + '\r\n\r\n');
    request.end();
})
The above function is called by other code that generates data.
I tried to upload the same data file using curl -F "file=@<filepath>" and the upload was successful. But my code fails. The server returns an application-specific error which hints that the uploaded file was invalid/corrupt.
I collected tcpdump data and analysed it in Wireshark. The packet sent from my node.js code lacks the boundary required for multipart data. I see this message in the Wireshark capture:
The multipart dissector could not find the required boundary parameter.
Any idea how to accomplish this in node.js code?
jhcc's answer is almost there.
Having to come up with support for this in our tests, I tweaked it slightly.
Here's the modified version that works for us:
var boundaryKey = Math.random().toString(16); // random string
request.setHeader('Content-Type', 'multipart/form-data; boundary="' + boundaryKey + '"');
// the header for the one and only part (need to use CRLF here)
request.write(
    '--' + boundaryKey + '\r\n'
    // use your file's mime type here, if known
    + 'Content-Type: application/octet-stream\r\n'
    // "name" is the name of the form field
    // "filename" is the name of the original file
    + 'Content-Disposition: form-data; name="my_file"; filename="my_file.bin"\r\n'
    + 'Content-Transfer-Encoding: binary\r\n\r\n'
);
fs.createReadStream('./my_file.bin', { bufferSize: 4 * 1024 })
    .on('end', function() {
        // mark the end of the one and only part
        request.end('\r\n--' + boundaryKey + '--');
    })
    // set "end" to false in the options so .end() isn't called on the request
    .pipe(request, { end: false }); // maybe write directly to the socket here?
Changes are:
ReadableStream.pipe returns the piped-to stream, so end never gets called on that. Instead, wait for end on the file read stream.
request.end puts the boundary on a new line.
Multipart is pretty complex. If you want to make it look like how a client usually handles multipart/form-data, you have to do a few things. First, select a boundary key; this is usually a random string marking the beginning and end of the parts (in this case there is only one part, since you want to send a single file). Each part (or the one part) needs a header, initialized by the boundary key, setting the content type, the name of the form field, and the transfer encoding. Once the part(s) are completed, you mark the end of each part with the boundary key.
I've never worked with multipart, but I think this is how it could be done. Someone please correct me if I'm wrong:
var boundaryKey = Math.random().toString(16); // random string
request.setHeader('Content-Type', 'multipart/form-data; boundary="' + boundaryKey + '"');
// the header for the one and only part (need to use CRLF here)
request.write(
    '--' + boundaryKey + '\r\n'
    // use your file's mime type here, if known
    + 'Content-Type: application/octet-stream\r\n'
    // "name" is the name of the form field
    // "filename" is the name of the original file
    + 'Content-Disposition: form-data; name="my_file"; filename="my_file.bin"\r\n'
    + 'Content-Transfer-Encoding: binary\r\n\r\n'
);
fs.createReadStream('./my_file.bin', { bufferSize: 4 * 1024 })
    // set "end" to false in the options so .end() isn't called on the request
    .pipe(request, { end: false }) // maybe write directly to the socket here?
    .on('end', function() {
        // mark the end of the one and only part
        request.end('--' + boundaryKey + '--');
    });
Again, I've never done this before, but I think that is how it could be accomplished. Maybe someone more knowledgeable could provide some more insight.
If you wanted to send it as base64 or an encoding other than raw binary, you would have to do all the piping yourself. It will end up being more complicated, because you're going to have to pause the read stream and wait for drain events on the request to make sure you don't use up all your memory (if it's not a big file you generally wouldn't have to worry about this, though). EDIT: Actually, never mind that; you could just set the encoding in the read stream options.
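For example (a small sketch; fs.createReadStream accepts an encoding option, so the stream emits encoded string chunks instead of raw Buffers):
// emits base64-encoded string chunks instead of raw binary Buffers
fs.createReadStream('./my_file.bin', { encoding: 'base64' })
    .pipe(request, { end: false });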
I'll be surprised if there isn't a Node module that does this already. Maybe someone more informed on the subject can help with the low-level details, but I think there should be a module around somewhere that does this.
As the error message states, you are missing the boundary parameter. You need to add a random string to separate each file from the rest of the files/form-data.
Here is how a request could look:
The content type:
Content-Type:multipart/form-data; boundary=----randomstring1337
The body:
------randomstring1337
Content-Disposition: form-data; name="file"; filename="thefile.txt"
Content-Type: application/octet-stream
[data goes here]
------randomstring1337--
Note that the -- at the beginning and end of the random string in the body is significant; it is part of the protocol.
More info here: http://www.w3.org/Protocols/rfc1341/7_2_Multipart.html
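One detail the snippets above leave out is Content-Length; without it Node falls back to chunked transfer encoding, which some upload endpoints do not accept. A hedged sketch of computing it up front (the file name, field name, and boundary here are assumptions mirroring the examples above):
var fs = require('fs');

var boundaryKey = Math.random().toString(16);
var partHeader =
    '--' + boundaryKey + '\r\n' +
    'Content-Disposition: form-data; name="file"; filename="thefile.txt"\r\n' +
    'Content-Type: application/octet-stream\r\n\r\n';
var footer = '\r\n--' + boundaryKey + '--';

// total body size = part header + raw file bytes + closing boundary
var fileSize = fs.statSync('./thefile.txt').size;
var contentLength = Buffer.byteLength(partHeader) + fileSize + Buffer.byteLength(footer);

request.setHeader('Content-Type', 'multipart/form-data; boundary="' + boundaryKey + '"');
request.setHeader('Content-Length', contentLength);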
The fastest way I was able to get this working was with the request package. The code is well documented and it just worked.
(For my testing I wanted a JSON result and non-strict SSL - there are many other options...)
var url = "http://"; //you get the idea
var filePath = "/Users/me/Documents/file.csv"; //absolute path created elsewhere

var r = request.post({
    url: url,
    json: true,
    strictSSL: false
}, function(err, res, data) {
    //console.log( "Finished uploading a file" );
    expect(err).to.not.be.ok();
    expect(data).to.be.ok();
    //callback(); //mine was an async test
});

var form = r.form();
form.append('csv', fs.createReadStream(filePath));