Easiest way to convert from TIFF to PDF on AWS Lambda

I'm working on an AWS Lambda application that needs to take a TIFF file and convert it to a PDF. I'm using ImageMagick extensively, so the easiest thing to do was: convert input.tif output.pdf. That works fine in my Mac environment, but fails to produce a true PDF in the Lambda environment.
The ImageMagick build on Lambda doesn't seem to support PDF: if I run convert -list format in the Lambda environment, there's no entry for PDF. Here's my test Lambda function:
const im = require('imagemagick');
const fs = require('fs');

exports.handler = (event, context, callback) => {
  var inputFileName = 'input.tif';
  var imagesPath = 'assets/images';
  var outputFile = '/tmp/output.pdf';
  var args = [
    imagesPath + '/' + inputFileName,
    '-format',
    'pdf',
    outputFile
  ];
  im.convert(args, function(err, stdout, stderr) {
    if (err) throw err;
    console.log('stdout:', stdout);
    var imageRef = fs.readFileSync(outputFile);
    callback(null, {
      statusCode: 200,
      headers: {
        'Content-Type': 'application/pdf',
        'Content-Disposition': 'attachment; filename=output.pdf'
      },
      body: imageRef.toString('base64'),
      isBase64Encoded: true
    });
  });
};
When I run identify output.pdf (i.e. the downloaded file), the file is reported as a TIFF file:
/Users/myuser/Downloads/output.pdf TIFF 517x243 517x243+0+0 8-bit CMYK 1.1314MiB 0.000u 0:00.009
So ImageMagick seems to just be passing it through as a TIFF file.
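As a sanity check (a minimal sketch, assuming the convert binary is on the Lambda image's PATH), the handler can fail fast when the build has no PDF coder, mirroring the manual convert -list format check:
const { execSync } = require('child_process');

// Throw early instead of letting convert silently write TIFF bytes
// into a file named output.pdf.
function assertPdfSupport() {
  const formats = execSync('convert -list format').toString();
  if (!/^\s*PDF\*?\s/m.test(formats)) {
    throw new Error('This ImageMagick build has no PDF coder (Ghostscript delegate missing?)');
  }
}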
I've also tried tiff2pdf (which is installed locally; I'm not sure about Lambda), but that doesn't even work on my Mac. I get this error:
tiff2pdf: No support for /path/to/input.tif with 5 samples per pixel.
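That error usually means the TIFF is CMYK with an extra alpha channel (4 color samples plus 1 alpha = 5 samples per pixel), which tiff2pdf does not support. One workaround worth trying (a sketch, not verified against this exact file; input-flat.tif is an illustrative name) is to strip the alpha channel with ImageMagick first:
convert input.tif -alpha off input-flat.tif
tiff2pdf -o output.pdf input-flat.tif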

Related

Axios prepending content to start of file making it unreadable

I am trying to upload a file to an S3 presigned URL using axios from an Expo managed mobile app front end. I have found that the following code works perfectly:
const file = await fetch(fileRef.uri);
const blob = await file.blob();
await fetch(uploadUrl, { method: 'PUT', body: blob });
Here fileRef is an object like:
Object {
  "height": 1920,
  "uri": "file:///....jpg",
  "width": 1080,
}
and uploadUrl is a presigned URL.
I want to port this over to axios to take advantage of the onUploadProgress event. I've written the following:
const body = new FormData()
body.append('file', fileRef)
await axios.put(uploadUrl, body);
This uploads the file; however, it prepends additional information to the start of the file that makes the uploaded image or video unreadable. The prepended information looks like:
--9V.XUQuQ1DIG8HFMzJO-veI4JbmI7j_WawYPxtMUG2NhK_7eGnlL.kVNSXyH_sAQ2897mg^M
content-disposition: form-data; name="file"^M
content-type: image/jpeg^M
^M
I found that if I delete these lines, the file can be opened (e.g. by QuickTime).
How can I avoid having this information added to the start of the file?
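Those extra lines are the multipart/form-data boundary and part headers that FormData generates, whereas a presigned S3 PUT expects the raw bytes as the request body. A minimal sketch of one common workaround, reusing the blob from the working fetch version above (the Content-Type header is an assumption based on the .jpg URI in fileRef):
const file = await fetch(fileRef.uri);
const blob = await file.blob();

// Send the blob itself as the PUT body so axios doesn't wrap it in a
// multipart envelope; onUploadProgress still works via the config object.
await axios.put(uploadUrl, blob, {
  headers: { 'Content-Type': 'image/jpeg' },
  onUploadProgress: (event) => {
    console.log(`uploaded ${event.loaded} of ${event.total} bytes`);
  },
});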

How to get a pre-signed URL that downloads file with http compression

Here is my code in node.js:
const downloadURL = await s3.getSignedUrlPromise('getObject', {
  Bucket: BUCKET_NAME,
  Key: 'key to a large json file',
});
Once I get the URL, I want to download a very large JSON file stored in S3 from the browser. Since it is large, I would like to use HTTP compression, which would compress a 20 MB JSON file to less than 1 MB. I could not find anywhere how to do this, or whether it is possible at all with the S3 APIs.
I also tried the following when using the signed URL to download the file, and it doesn't seem to work.
const dataRes = await fetch(downloadURL, {
  headers: {
    'Accept-Encoding': 'gzip, deflate',
  },
  method: 'GET',
});
Hope somebody could help me out. Thanks a lot!
After doing some study, I have resolved this. Posting it here in the hope it helps others.
You cannot ask S3 to compress the file on the fly with getObject, or when using a signed URL for getObject.
You have to store the gzipped file in S3 in the first place. On Linux, compress it with:
gzip -9 <file to compress>
Upload the zipped file to S3.
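A minimal upload sketch, assuming the same AWS SDK v2 s3 client as the other snippets here (the file and key names are illustrative):
const fs = require('fs');

// Upload the pre-gzipped bytes; S3 stores them as-is.
await s3.putObject({
  Bucket: BUCKET_NAME,
  Key: 'key to a large zipped json file',
  Body: fs.createReadStream('data.json.gz'),
}).promise();
Alternatively, setting ContentEncoding: 'gzip' on the upload itself would make the per-request ResponseContentEncoding override below unnecessary.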
Use the code below to generate the signed URL:
const downloadURL = await s3.getSignedUrlPromise('getObject', {
  Bucket: BUCKET_NAME,
  Key: 'key to a large zipped json file',
  ResponseContentEncoding: 'gzip',
  ResponseContentType: 'application/json',
});
Use the code below to download from the signed URL; fetch transparently decompresses the body because the response carries Content-Encoding: gzip:
const res = await fetch(downloadURL);
const jsonData = await res.json();
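One way to confirm the compression actually applies (a hypothetical check, not part of the original answer; $SIGNED_URL is a placeholder) is to dump the response headers and look for Content-Encoding: gzip:
curl -s -D - -o /dev/null "$SIGNED_URL"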

Google cloud speech syncrecognize "INVALID_ARGUMENT"

I have worked through the overview tutorial: https://cloud.google.com/speech/docs/getting-started
Then I tried to use my own audio file. I uploaded a .flac file with a sample rate of 16000 Hz.
I only changed the sync-request.json file below to point to my own audio file hosted on Google Cloud Storage (gs://my-bucket/test4.flac):
{
  "config": {
    "encoding": "flac",
    "sample_rate": 16000
  },
  "audio": {
    "uri": "gs://my-bucket/test4.flac"
  }
}
The file itself is found, but the request returns an "INVALID_ARGUMENT" error:
{
  "error": {
    "code": 400,
    "message": "Unable to recognize speech, code=-73541, possible error in recognition config. Please correct the config and retry the request.",
    "status": "INVALID_ARGUMENT"
  }
}
As per this answer, all encodings support only 1 channel (mono) audio.
I was creating the FLAC file with this command:
ffmpeg -i test.mp3 test.flac
which failed with:
Sample rate in request does not match FLAC header
Adding -ac 1 (setting the number of audio channels to 1) fixed the issue:
ffmpeg -i test.mp3 -ac 1 test.flac
Here is my full Node.js code:
const Speech = require('@google-cloud/speech');
const projectId = 'EnterProjectIdGeneratedByGoogle';

const speechClient = Speech({
  projectId: projectId
});

// The name of the audio file to transcribe
var fileName = '/home/user/Documents/test/test.flac';

// The audio file's encoding and sample rate
const options = {
  encoding: 'FLAC',
  sampleRate: 44100
};

// Detects speech in the audio file
speechClient.recognize(fileName, options)
  .then((results) => {
    const transcription = results[0];
    console.log(`Transcription: ${transcription}`);
  }, function(err) {
    console.log(err);
  });
The sample rate can be 16000, 44100, or other valid values, and the encoding can be FLAC or LINEAR16 (see the Cloud Speech docs).
My bad: per the docs (https://cloud.google.com/speech/docs/basics), the .flac file has to be 16-bit PCM.
Summary:
Encoding: FLAC
Channels: 1 (16-bit samples)
Sample rate: 16000 Hz
/!\ Be careful not to export a stereo (2-channel) file, which throws another error (only one channel is accepted): Google speech API internal server error -83104
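Putting those constraints together, a single ffmpeg invocation along these lines should produce a compliant file (a sketch; -ac 1 forces mono, -ar 16000 sets the sample rate, and -sample_fmt s16 forces 16-bit samples):
ffmpeg -i test.mp3 -ac 1 -ar 16000 -sample_fmt s16 test.flac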

Obj-C MongoDB Driver to Meteor CollectionFS

I have written an Objective-C app that can write files to my Meteor MongoDB database. Using RadMongoDB (https://github.com/timburks/RadMongoDB) I write an image over to my Mongo GridFS .files and .chunks collections.
//Defining RadMongoDB
RadMongoDB *rad = [[RadMongoDB alloc] init];

//Connection dictionary
NSDictionary *connection = @{
    @"host" : @"127.0.0.1",
    @"port" : [NSNumber numberWithInt:3002]
};
int num = [rad connectWithOptions:connection];
[rad writeFile:path2 withMIMEType:@"image/png" inCollection:@"contacts" inDatabase:@"meteor"];
The image (path2) is successfully written to GridFS. In my Meteor MongoDB shell I can see the files:
(screenshots of the .chunks and .files collections omitted)
These GridFS files are linked to a CollectionFS (https://github.com/CollectionFS/Meteor-CollectionFS) collection containing a variety of other pictures that have been inserted via a Meteor app. The problem is pulling out the image written by the driver using CollectionFS: the files written to GridFS by the driver never get picked up by the file handlers. I therefore tried re-forcing all of the files through a CollectionFS file handler reset, but that still didn't work. (JavaScript below; note that ContactsFS is the CollectionFS collection corresponding to the GridFS contacts collection.)
//Reset
ContactsFS.find({}).forEach(function(doc) {
  ContactsFS.update({ _id: doc._id }, { $set: { handledAt: null, fileHandler: {} } });
});

//Set complete to true
ContactsFS.update(fileRecord, { $set: { complete: true } });
I have come to the conclusion that the way the driver interacts with GridFS is very different from how Meteor and CollectionFS read and write to it. Is there any way to fix this? I am desperate for help, thanks!
EDIT:
After setting the uploaded file's complete = true, the file handler attempts to act on the driver's inserted file. However, now I receive a server-side error (screenshot omitted).
I believe this is because of how CollectionFS reads the GridFS file. The GridFS image's data is stored by the Obj-C driver as a Uint8Array (as seen in screenshot 1). I have tried setting each parameter on the driver-inserted image so that CollectionFS will be happy:
ContactsFS.update(imageid, {$set: {handledAt: null}});
ContactsFS.update(imageid, {$set: {uploadDate: date}});
ContactsFS.update(imageid, {$set: {countChunks: 1}});
ContactsFS.update(imageid, {$set: {numChunks: 1}});
ContactsFS.update(imageid, {$set: {length: len}});
ContactsFS.update(imageid, {$set: {uploading: false}});
ContactsFS.update(imageid, {$set: {encoding: encode}});
//Setting complete to True will send the file through the filehandlers
ContactsFS.update(imageid, {$set: {complete: true}});
Still nothing. How do I get around this problem?
Try this:
var len = "" + fileRecord.plength;
var chunkSize = 256 * 1024; // set this to whatever chunk size RadMongoDB is using
var chunkCount = Math.ceil(fileRecord.plength / chunkSize);
ContactsFS.update(imageid, {$set: {
  handledAt: null,
  uploadDate: Date.now(),
  countChunks: chunkCount,
  numChunks: chunkCount,
  length: len,
  uploading: false,
  encoding: 'binary'
}});
ContactsFS.update(imageid, {$set: {complete: true}});
Requires the fix discussed in this issue, too.

ringojs fileupload example

Does anyone have an example of uploading a file to the server using ringojs?
There's a simple upload example in the demo app, but it stores uploads in memory, which is not a good idea for most apps. To save uploads to a temporary file, you'll currently have to do something like this (a modified version of the upload demo action):
var fu = require("ringo/webapp/fileupload");

function upload(req) {
    if (fu.isFileUpload(req.contentType)) {
        var params = {};
        fu.parseFileUpload(req, params, req.charset, fu.TempFileFactory);
        return {
            status: 200,
            headers: {"Content-Type": "text/plain"},
            body: [params.file.name, " saved to ", params.file.tempfile]
        };
    }
    return Response.skin(module.resolve('skins/upload.txt'), {
        title: "File Upload"
    });
}
Unfortunately, there was a bug with saving uploads to temp files that I just fixed, so you'll have to use a current git snapshot or patch modules/ringo/webapp/fileupload.js manually:
http://github.com/ringo/ringojs/commit/1793a815a9ca3ffde4aa5a07c656456969b504f9
We also need some high level way of doing this for the next release (e.g. setting a req.uploadTempDir property). I'll open an issue for this.