mark processed JSON files in forkJoin - Angular 5

While some pieces of code are missing, the core logic is that a JSON file should be marked as 'processed' and never processed again. If a call to assets/b.json has already been made, I need to cache the fact that the file was visited so that no second HTTP call is made to the same file. Right now, a file is requested multiple times. Each JSON file is identified by its path:
let touchedJSONFiles = {}

private setJSONAsTouched(path) {
  this.touchedJSONFiles[path] = true;
}

processJSONFiles() {
  // do some logic to build the JSON files that needs to be checked
  let files = []; // contains paths only such as abc/abc.json, 34.json
  let urls = [];  // which urls to process, if any
  files.map(file => {
    if (this.touchedJSONFiles[file] == undefined) {
      urls.push(file);
    }
  });
  // do we have files to process?
  if (urls.length > 0) {
    forkJoin(urls).subscribe(results => {
      const len = urls.length;
      for (let k = 0; k < len; k++) {
        this.setJSONAsTouched(files[k]);
      }
    });
  }
}
Given that the method could be called from anywhere, I want to make it as reusable as possible. The current issue is that a JSON file is processed more than once even though it does get touched and setJSONAsTouched() is called correctly. Although I am not sure, it looks like I need a way for forkJoin to finish before the method resolves.
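For illustration, a minimal sketch of one possible approach (not tested against your setup): it assumes Angular's HttpClient injected as this.http, marks a file as touched as soon as its request is queued rather than after forkJoin completes, and returns an Observable so callers can wait for the work to finish.

// sketch only: this.http (HttpClient) and the RxJS imports are assumptions
// import { Observable, forkJoin, of } from 'rxjs';

private touchedJSONFiles: { [path: string]: boolean } = {};

private setJSONAsTouched(path: string) {
  this.touchedJSONFiles[path] = true;
}

processJSONFiles(files: string[]): Observable<any[]> {
  // keep only the files that have not been requested yet
  const pending = files.filter(file => !this.touchedJSONFiles[file]);
  // mark them as touched immediately, so a second call cannot queue them again
  pending.forEach(file => this.setJSONAsTouched(file));
  if (pending.length === 0) {
    return of([]);
  }
  // one HTTP request per file, joined into a single observable
  return forkJoin(pending.map(file => this.http.get(file)));
}

Callers can then subscribe to the returned observable, which completes only after all pending requests have finished.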

recursive lambda function never seems to run

I'm not familiar enough with Node.js or Lambda to see an obvious solution to a dilemma I have. I'm writing some utilities on Lambda to manipulate images in an S3 bucket and make them accessible via API Gateway to REST calls.
BACKGROUND DETAILS:
One of the utilities I have retrieves the headObject information such as the mtime, size and metadata. The images themselves will likely come in through various means and I won't always have control over adding metadata to them when they arrive or are created. I don't really need that metadata until it's time to view details about the image from a web interface, and when I do that I use a thumbnail instead. So I created a Lambda script triggered by the S3 create event (with a fallback variation exposed through API Gateway) that creates a thumbnail, either when the image is first uploaded to S3 or whenever I make the gateway CreateThumbnail call, and at that point adds metadata to the thumbnail such as the original image's mimetype, pixel width and height.
What I would like to do is create a 'GetObjectInfo' call that first pulls the headObject data, then checks whether the specified bucket is the one holding the associated thumbnail files (i.e. whether the object is itself a thumbnail). If it is not a thumbnail, I want to retrieve, or at least attempt to retrieve, the headObject for the associated thumbnail file and attach the thumbnail's metadata (if the thumbnail exists) to the data from the original head request before returning the information.
The problem is that when I set up an async callback scheme, the first headObject request completes but the second never seems to get out of the starting gate.
The method in my class is:
getHeadObject(bucket, object, callback) {
  console.log(bucket, "CLASS-head#1")
  this.s3.headObject({ "Bucket": bucket, "Key": object }, function(err, data) {
    console.log(bucket, "CLASS-head#2")
    callback(err, data)
  })
}

getObjectInfo(bucket, object, callback) {
  let scope = this
  console.log(bucket, "CLASS-object#1")
  this.getHeadObject(bucket, object, function(err, data) {
    console.log(bucket, "CLASS-object#2")
    if (err)
      callback(err, data)
    else
      callback(null, data)
  })
}
The lambda code that calls it recursively is:
var cInst = new myClass()
cInst.getObjectInfo(srcBucket, filePath, function(err, data) {
  if (data.status == 1) { // if parent request succeeded
    // if parent is not a thumbnail
    if (srcBucket != THUMB_BUCKET) { // see if a thumbnail exists
      let thumbPath = myClass.getThumbPath(srcBucket, userId, directory, targetObject)
      console.log('---- thumbPath', thumbPath)
      cInst.getObjectInfo(THUMB_BUCKET, thumbPath, function(err, thumbData) {
        console.log("thumbData #1", thumbData)
        if (thumbData.status == 1) { // thumbnail exists
          console.log("thumbData")
        }
      })
    }
    context.succeed(myClass.createResponse(1, data, api))
  } else {
    context.fail(myClass.createResponse(data.status, data, api))
  }
})
On the first call (the parent) I see:
{bucket} "CLASS-object#1"
{bucket} "CLASS-head#1"
{bucket} "CLASS-head#2"
{bucket} "CLASS-object#2"
On the second call I only see:
image-thumbnails "CLASS-object#1"
image-thumbnails "CLASS-head#1"
(getThumbPath is just a static utility function that builds the thumbnail path from the parameters related to the original file. It is already tested as working and produces something like {original-bucket-name}/{userid}/{subdirectory}/{file-basename_150x150.jpg} for any given image. I confirmed that in this instance the thumbnail exists and matches the path returned by getThumbPath, and the ACL appears to have permission to read the bucket and the object.)
UPDATE: More weirdness
I tried setting the thumbnail's permissions to publicly readable and it worked, so I started messing with the ACL. For the time being, since I am still testing, I just gave the scripts' role full S3 permissions.
But now I notice that it works only intermittently: one time it completes, the next time it doesn't. What is going on here?
I would bet that this is the most common problem that people see when using Node.js with Lambda.
When a Node.js Lambda handler returns, the runtime freezes the execution environment, so any promises or async callbacks that are still in flight are suspended and may never complete.
To make sure the Lambda does not cut that work short, wait for those promises to complete by using await.
In your case, the following will work: wrap the callback-based calls in promises and then await them.
await new Promise(async (resolve, reject) => {
  // the inner callback is declared async so that the nested await below is valid
  cInst.getObjectInfo(srcBucket, filePath, async function(err, data) {
    if (data.status == 1) {
      if (srcBucket != THUMB_BUCKET) {
        ...
        ...
        await new Promise((resolve2, reject2) => {
          cInst.getObjectInfo(THUMB_BUCKET, thumbPath, function(err, thumbData) {
            ...
            ...
            resolve2();
          })
        })
      }
      context.succeed(myClass.createResponse(1, data, api))
      resolve();
    } else {
      context.fail(myClass.createResponse(data.status, data, api))
      reject();
    }
  })
})
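As a variant (a sketch under the assumptions in the comments, not the code from the answer above), the same idea reads more linearly if the callback API is promisified once and the handler itself is declared async:

// sketch only: assumes a Node.js 8.10+ async handler in place of context.succeed/context.fail,
// and that getObjectInfo still resolves with the { status, ... } object used in the question;
// srcBucket, filePath, userId, directory, targetObject and api come from the original code
const util = require('util')

const cInst = new myClass()
const getObjectInfoAsync = util.promisify(cInst.getObjectInfo.bind(cInst))

exports.handler = async function(event) {
  const data = await getObjectInfoAsync(srcBucket, filePath)
  if (data.status != 1) {
    return myClass.createResponse(data.status, data, api)
  }
  if (srcBucket != THUMB_BUCKET) {
    const thumbPath = myClass.getThumbPath(srcBucket, userId, directory, targetObject)
    try {
      // awaiting here guarantees the thumbnail lookup finishes before the handler returns
      const thumbData = await getObjectInfoAsync(THUMB_BUCKET, thumbPath)
      console.log('thumbData', thumbData)
    } catch (err) {
      console.log('no thumbnail head data', err)
    }
  }
  return myClass.createResponse(1, data, api)
}

Because the handler only returns after both awaits settle, nothing is left running when the execution environment is frozen.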

How to convert a TStringDynArray to a TStringList

I'm using TDirectory::GetFiles() to get a list of files (obviously).
The result is stored in a TStringDynArray and I want to transfer it to a TStringList for the sole purpose of using the IndexOf() member to see whether a string is present in the list or not.
Any solution that lets me know whether a certain string is present in the list of files returned from TDirectory::GetFiles() will do fine, although it would be interesting to know how to convert the TStringDynArray.
TStringDynArray DynFiles = TDirectory::GetFiles("Foo path");
System::Classes::TStringList *Files = new System::Classes::TStringList;
Files->Assing(DynFiles); // I know this is wrong, but it illustrates what I want to do.
if (Files->IndexOf("Bar")) { // <---- This is my goal, to find "Bar" in the list of files.
}
TStringList and TStringDynArray do not know anything about each other, so you will have to copy the strings manually:
TStringDynArray DynFiles = TDirectory::GetFiles("Foo path");
System::Classes::TStringList *Files = new System::Classes::TStringList;
for (int I = DynFiles.Low; I <= DynFiles.High; ++I)
    Files->Add(DynFiles[I]);
if (Files->IndexOf("Bar") != -1) // IndexOf() returns -1 when the string is not found
{
    //...
}
delete Files;
Since you have to manually loop through the array anyway, you can get rid of the TStringList:
TStringDynArray DynFiles = TDirectory::GetFiles("Foo path");
for (int I = DynFiles.Low; I <= DynFiles.High; ++I)
{
    if (DynFiles[I] == "Bar")
    {
        //...
        break;
    }
}
But, if you are only interested in checking for the existence of a specific file, look at TFile::Exists() instead, or even Sysutils::FileExists().
if (TFile::Exists("Foo path\\Bar"))
{
    //...
}
if (FileExists("Foo path\\Bar"))
{
    //...
}
* personally, I hate that the IOUtils unit uses dynamic arrays for lists. They are slow, inefficient, and do not integrate well with the rest of the RTL. But that is just my opinion.
TStrings knows TStringDynArray well enough to provide an AddStrings() member:
Files->AddStrings(TDirectory::GetFiles("Foo path"));
will do the job.

uploading a file in a non-blocking manner without using gridFSBodyParser(gridFS)

The plugin play-reactivemongo offers an easy way to upload a file:
def upload = Action(gridFSBodyParser(gridFS)) { request =>
  val futureFile: Future[ReadFile[BSONValue]] = request.body.files.head.ref
  futureFile.map { file =>
    // do something
    Ok
  }.recover { case e: Throwable => InternalServerError(e.getMessage) }
}
Unfortunately this solution doesn't suit me because:
I would like only my DAO layer to depend on reactive-mongo.
I need to save the file only if a user is authenticated (with SecureSocial) and use some user's properties as checks and metadata.
If no user is authenticated the request body shouldn't be parsed at all (see also this question).
It would be something along the lines of:
def upload = SecuredAction { request =>
  val user = request.user
  val enumerator = ??? // an enumerator obtained from parsing the body
  myDAO.saveFile(user, enumerator)
}

object myDAO {
  def saveFile(user: User, enumerator: Enumerator[Array[Byte]]) = {
    ...
    val fileToSave = DefaultFileToSave(...)
    gridfs.save(enumerator, fileToSave)
    ...
  }
}
Unfortunately it seems there is no way to get an enumerator from the parsing of the request body. The only way seems to be to provide the Action with a parser and an Iteratee that will be fed with the body as it is parsed.
I couldn't figure out how to achieve it in a reactive way (without using a temporary file or storing the body in memory). Is it at all possible?
Actually, you might consider not using the gridFS built-in parser at all:
val gfs = new GridFS(db)

// the controller method; Authenticated here is a custom object extending ActionBuilder
def upload = Authenticated.async(parse.multipartFormData) { request =>
  ...
  request.body.file("photo") match {
    // handle error cases
    ...
    case Some(photo) =>
      val fileToSave = DefaultFileToSave(photo.filename, photo.contentType)
      // here some more operations; basically you need only photo.ref.file
      val enumerator = Enumerator(Image(photo.ref.file).fitToWidth(120).write)
      gfs.save(enumerator, fileToSave) map {
        // handle responses and stuff
        ...
      }
  }
}

log4javascript - obtain history of messages programmatically?

I'm looking into using a javascript logging framework in my app.
I quite like the look of log4javascript (http://log4javascript.org/) but I have one requirement which I'm not sure that it satisfies.
I need to be able to ask the framework for all messages which have been logged.
Perhaps I could use an invisible InPageAppender (http://log4javascript.org/docs/manual.html#appenders) to log to a DOM element, then scrape out the messages from that DOM element - but that seems pretty heavy.
Perhaps I need to write my own "InMemoryAppender"?
There's an ArrayAppender used in log4javascript's unit tests that stores all log messages it receives in an array accessible via its logMessages property. Hopefully it should show up in the main distribution in the next version. Here's a standalone implementation:
var ArrayAppender = function(layout) {
    if (layout) {
        this.setLayout(layout);
    }
    this.logMessages = [];
};

ArrayAppender.prototype = new log4javascript.Appender();
ArrayAppender.prototype.layout = new log4javascript.NullLayout();

ArrayAppender.prototype.append = function(loggingEvent) {
    var formattedMessage = this.getLayout().format(loggingEvent);
    if (this.getLayout().ignoresThrowable()) {
        formattedMessage += loggingEvent.getThrowableStrRep();
    }
    this.logMessages.push(formattedMessage);
};

ArrayAppender.prototype.toString = function() {
    return "[ArrayAppender]";
};
Example use:
var log = log4javascript.getLogger("main");
var appender = new ArrayAppender();
log.addAppender(appender);
log.debug("A message");
alert(appender.logMessages);
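If the page logs heavily, the logMessages array grows without bound. A possible variant (hypothetical, not part of log4javascript) caps the history at a fixed size:

// hypothetical variant: keep only the most recent maxMessages entries
var BoundedArrayAppender = function(maxMessages, layout) {
    if (layout) {
        this.setLayout(layout);
    }
    this.maxMessages = maxMessages || 1000;
    this.logMessages = [];
};
BoundedArrayAppender.prototype = new log4javascript.Appender();
BoundedArrayAppender.prototype.layout = new log4javascript.NullLayout();
BoundedArrayAppender.prototype.append = function(loggingEvent) {
    this.logMessages.push(this.getLayout().format(loggingEvent));
    if (this.logMessages.length > this.maxMessages) {
        this.logMessages.shift(); // drop the oldest entry
    }
};
BoundedArrayAppender.prototype.toString = function() {
    return "[BoundedArrayAppender]";
};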

when to check for file size/mimetype in node.js upload script?

I created an upload script in node.js using express/formidable. It basically works, but I am wondering where and when to check the uploaded file, e.g. for the maximum file size or whether the file's mimetype is actually allowed.
My program looks like this:
app.post('/', function(req, res, next) {
    req.form.on('progress', function(bytesReceived, bytesExpected) {
        // ... do stuff
    });
    req.form.complete(function(err, fields, files) {
        console.log('\nuploaded %s to %s', files.image.filename, files.image.path);
        // ... do stuff
    });
});
It seems to me that the only viable place for checking the mimetype/file size is the complete event, where I can reliably use the filesystem functions to get the size of the uploaded file in /tmp/ – but that does not seem like a good idea because:
the possibly malicious/too large file is already uploaded on my server
the user experience is poor – you watch the upload progress just to be told that it didn't work afterwards
What's the best practice for implementing this? I found quite a few examples for file uploads in node.js but none seemed to do the security checks I would need.
With help from some guys at the node IRC and the node mailing list, here is what I do:
I am using formidable to handle the file upload. Using the progress event I can check the maximum filesize like this:
form.on('progress', function(bytesReceived, bytesExpected) {
    if (bytesReceived > MAX_UPLOAD_SIZE) {
        console.log('### ERROR: FILE TOO LARGE');
    }
});
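Note that logging alone does not stop the transfer. One possible way to actually abort (a sketch, assuming the surrounding express handler's req and res are in scope and that cutting the connection is acceptable) is to answer with 413 and destroy the socket:

// sketch only: MAX_UPLOAD_SIZE, req and res come from the surrounding handler
var aborted = false;
form.on('progress', function(bytesReceived, bytesExpected) {
    if (!aborted && bytesReceived > MAX_UPLOAD_SIZE) {
        aborted = true;
        res.writeHead(413, { 'Connection': 'close' }); // 413 Request Entity Too Large
        res.end('File too large');
        req.socket.destroy(); // stop receiving further data
    }
});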
Reliably checking the mimetype is much more difficult. The basic idea is to use the progress event and, once enough of the file has been uploaded, run a file --mime-type call and check the output of that external command. Simplified, it looks like this:
// contains the path of the uploaded file,
// is grabbed in the fileBegin event below
var tmpPath;

form.on('progress', function validateMimetype(bytesReceived, bytesExpected) {
    var percent = (bytesReceived / bytesExpected * 100) | 0;
    // pretty basic check whether enough bytes of the file are written to disk,
    // might be too naive if the file is small!
    if (tmpPath && percent > 25) {
        var child = exec('file --mime-type ' + tmpPath, function(err, stdout, stderr) {
            var mimetype = stdout.substring(stdout.lastIndexOf(':') + 2, stdout.lastIndexOf('\n'));
            console.log('### file CALL OUTPUT', err, stdout, stderr);
            if (err || stderr) {
                console.log('### ERROR: MIMETYPE COULD NOT BE DETECTED');
            } else if (!ALLOWED_MIME_TYPES[mimetype]) {
                console.log('### ERROR: INVALID MIMETYPE', mimetype);
            } else {
                console.log('### MIMETYPE VALIDATION COMPLETE');
            }
        });
        form.removeListener('progress', validateMimetype);
    }
});

form.on('fileBegin', function grabTmpPath(_, fileInfo) {
    if (fileInfo.path) {
        tmpPath = fileInfo.path;
        form.removeListener('fileBegin', grabTmpPath);
    }
});
The new version of Connect (2.x) already has this baked into the bodyParser via the limit middleware: https://github.com/senchalabs/connect/blob/master/lib/middleware/multipart.js#L44-61
I think it's much better this way as you just kill the request when it exceeds the maximum limit instead of just stopping the formidable parser (and letting the request "go on").
More about the limit middleware: http://www.senchalabs.org/connect/limit.html
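For reference, a minimal standalone sketch of that limit idea (my own sketch, not the Connect source; the 413 response and middleware shape are assumptions) checks the declared Content-Length up front and also counts the bytes actually received:

// sketch only: an express/connect-style middleware enforcing an upload size limit
function limitUploadSize(maxBytes) {
    return function(req, res, next) {
        var declared = parseInt(req.headers['content-length'] || '0', 10);
        if (declared > maxBytes) {
            res.writeHead(413); // Request Entity Too Large
            return res.end();
        }
        var received = 0;
        var killed = false;
        req.on('data', function(chunk) {
            received += chunk.length;
            if (received > maxBytes && !killed) {
                killed = true;
                req.socket.destroy(); // kill the request instead of letting it go on
            }
        });
        next();
    };
}

It would be mounted before the upload handler, e.g. app.use(limitUploadSize(10 * 1024 * 1024));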