I would like to automatically invalidate the (cdn) cache for objects changed by invoking the command az storage blob sync. Therefore I would need the list of files touched by sync.
The best option I have found is to give it the verbose flag, parse the generated log file and apply a filter to get the files touched. This feels a bit hackish so I was wondering if there is a better option.
You can just filter files by last modified time, and based on your research, az sync also relies on modified times and meets your requirements here.
I'm planning to launch a simple Node.JS utility and push it to heroku. A fire and forget solution, will sleep for like 90% of the time probably. Unfortunately it seems that I require a persistent data storage for my purposes (heroku apps get rebooted daily and storing everything in RAM is unrealistic), and I don't know which way to look as:
Most SQL hostings are paid / limited time free / require constant refreshing ( like freemysqlhosting ).
Storing stuff in plain .txt format is seemingly hard to implement, besides git always overwrites the contents of a tracked .txt file, and leaving it untracked disposes of it on heroku and leads to ENOENT No such file error. Yeah, I tried.
So, the question is - how do I implement a simple and built in solution for storing data? Are there any relevant typical solutions? It's going to be equivalent to just 1 SQL table.
As you can see, you can answer this on many levels - maybe suggest a free deploy and forget SQL hosting (it obviously has to support external connections), maybe tell me how to keep a file tracked in git without actually replacing all of its content with every commit, maybe suggest some module to install. I hope this is not too broad..
At the time of saving the session, [aerospike express session store][1] stringifies the object and saves it in one bin (named session). I want to save some of the data in separate bin so that I can create a secondary index on them? Any way to do that?
It's unfortunately not possible at the moment. But maybe we can work out a solution via the Github issue you filed for the aerospike-session-store module.
I need to start an external process (which is around 300MB large on its own) several times using System.Diagnostics.Process.
The only problem is: once the first instance starts, it generates temporary data in its base folder (where the application is located), so I can't just start another instance - it would corrupt the data of the first one and mess up everything.
I thought about temporarily copying the whole application folder programmatically, so that each instance has its own, but that doesn't feel right.
Could anybody help me out? Thanks in advance!
Try starting each copy in a different directory.
If the third-party app ignores the current directory, you could make a symlink to it in a different folder. I'm not necessarily recommending that, though.
Pass an argument to your external process that specifies the temp folder to use.
It's a very common scenario: some process wants to drop a file on a server every 30 minutes or so. Simple, right? Well, I can think of a bunch of ways this could go wrong.
For instance, processing a file may take more or less than 30 minutes, so it's possible for a new file to arrive before I'm done with the previous one. I don't want the source system to overwrite a file that I'm still processing.
On the other hand, the files are large, so it takes a few minutes to finish uploading them. I don't want to start processing a partial file. The files are just tranferred with FTP or sftp (my preference), so OS-level locking isn't an option.
Finally, I do need to keep the files around for a while, in case I need to manually inspect one of them (for debugging) or reprocess one.
I've seen a lot of ad-hoc approaches to shuffling upload files around, swapping filenames, using datestamps, touching "indicator" files to assist in synchronization, and so on. What I haven't seen yet is a comprehensive "algorithm" for processing files that addresses concurrency, consistency, and completeness.
So, I'd like to tap into the wisdom of crowds here. Has anyone seen a really bulletproof way to juggle batch data files so they're never processed too early, never overwritten before done, and safely kept after processing?
The key is to do the initial juggling at the sending end. All the sender needs to do is:
Store the file with a unique filename.
As soon as the file has been sent, move it to a subdirectory called e.g. completed.
Assuming there is only a single receiver process, all the receiver needs to do is:
Periodically scan the completed directory for any files.
As soon as a file appears in completed, move it to a subdirectory called e.g. processed, and start working on it from there.
Optionally delete it when finished.
On any sane filesystem, file moves are atomic provided they occur within the same filesystem/volume. So there are no race conditions.
Multiple Receivers
If processing could take longer than the period between files being delivered, you'll build up a backlog unless you have multiple receiver processes. So, how to handle the multiple-receiver case?
Simple: Each receiver process operates exactly as before. The key is that we attempt to move a file to processed before working on it: that, and the fact the same-filesystem file moves are atomic, means that even if multiple receivers see the same file in completed and try to move it, only one will succeed. All you need to do is make sure you check the return value of rename(), or whatever OS call you use to perform the move, and only proceed with processing if it succeeded. If the move failed, some other receiver got there first, so just go back and scan the completed directory again.
If the OS supports it, use file system hooks to intercept open and close file operations. Something like Dazuko. Other operating systems may let you know about file operations in anoter way, for example Novell Open Enterprise Server lets you define epochs, and read list of files modified during an epoch.
Just realized that in Linux, you can use inotify subsystem, or the utilities from inotify-tools package
File transfers is one of the classics of system integration. I'd recommend you to get the Enterprise Integration Patterns book to build your own answer to these questions -- to some extent, the answer depends on the technologies and platforms you are using for endpoint implementation and for file transfer. It's a quite comprehensive collection of workable patterns, and fairly well written.