Is it possible to access file metadata using Flowgear resources?

I have a requirement to use a Flowgear workflow to process files (via a DropPoint, targeting a Windows file share [SMB]), but targeting only the files that have been modified after a certain time of day.
How can one tell the "Last Modified" date/time of a file using a Flowgear node?
I have been searching the Flowgear help center and experimenting with the file-related nodes (File, File Enumerator, File Watcher and File Manage), but I haven't seen any property that exposes this piece of metadata.

Here's an example of how you can do it.
https://flowgear.me/#s/cCp8kGQ
In this sample, you simply use the Script node to get a list of files; after that, you can once again use the normal File nodes to do the rest. It could be modified to return the file directly via the Script node, but that would require additional hand-coding and is needlessly complex.
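For a sense of what the script is doing, the core logic is just standard .NET directory enumeration plus a LastWriteTime filter. The sketch below is illustrative only (the share path and cut-off time are placeholders, and how the result is handed back to the workflow depends on how the Script node in the linked sample is wired up):

using System;
using System.IO;
using System.Linq;

class ListRecentlyModifiedFiles
{
    static void Main()
    {
        // Placeholder cut-off: only keep files modified after 14:00 today
        DateTime cutoff = DateTime.Today.AddHours(14);

        // Placeholder UNC path for the SMB share the DropPoint targets
        var share = new DirectoryInfo(@"\\server\share");

        foreach (FileInfo f in share.GetFiles().Where(f => f.LastWriteTime > cutoff))
        {
            // LastWriteTime is the "Last Modified" stamp the question asks about
            Console.WriteLine($"{f.FullName}\t{f.LastWriteTime:O}");
        }
    }
}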

Related

Is there a way to import backups in NiFi?

Using NiFi v0.6.1, is there a way to import backups/archives?
And by backups I mean the files that are generated when you call
POST /controller/archive using the REST API, or via "Controller Settings" (toolbar button) and then "Back-up flow" (link).
I tried unzipping the backup and importing it as a template, but that didn't work. Comparing it to an exported template file, the formats are quite different, but perhaps there is a way to transform it into a template?
At the moment my workaround is to select no components on the top-level flow and then select "create template", which adds a template containing all my components; then I just export that. My issue with this is that it's a bit trickier to automate via the REST API. I used Fiddler to determine what the UI is doing: it first generates a snippet that includes all the components (labels, processors, connections, etc.), then calls create template (POST /nifi-api/controller/templates) using the snippet ID. The template call is easy enough, but generating the definition for the snippet is going to take some work.
Note: Once the following feature request is implemented I'm assuming I would just use that instead:
https://cwiki.apache.org/confluence/display/NIFI/Configuration+Management+of+Flows
The entire flow for a NiFi instance is stored in a file called flow.xml.gz in the conf directory (flow.xml.tar in a cluster). The back-up functionality is essentially taking a snapshot of that file at the given point in time and saving it to the conf/archive directory. At a later point in time you could stop NiFi and replace conf/flow.xml.gz with one of those back-ups to restore the flow to that state.
Templates are a different format from the flow.xml.gz. Templates are more public-facing and shareable, and can be used to represent portions of a flow, or the entire flow if no components are selected. Some people have used templates as a model to deploy their flows, essentially organizing their flow into process groups and making a template for each group. This project provides some automation to work with templates: https://github.com/aperepel/nifi-api-deploy
You just need to stop NiFi, replace the NiFi flow configuration file (for example, this could be flow.xml.gz in the conf directory) and start NiFi back up.
If you have trouble finding it, check your nifi.properties file for the string nifi.flow.configuration.file= to find out what you've set this to.
If you are using clustered mode you need only do this on the NCM.

Executing Abaqus Model in Taverna

I'm pretty new to both Taverna and Abaqus, but I am trying to run an Abaqus model using a "Tool" in Taverna remotely on an HPC. This works fine if I already have my model file and inputs on the HPC, but I need a way of uploading the files dynamically in Taverna (trying to generically wrap Abaqus models).
I've tried adding an input port that takes a file list, but I don't know how I can copy it to the "location" that I've set for the tool. Could a Beanshell service be the answer, or can I iterate through the file list and copy the files up before executing the Abaqus model?
Thanks
When you say that you created an input port that takes a file list, I guess you mean an input to the tool service.
Assuming the input port is called my_file_list, when the tool service is run it will take a list of data values on port my_file_list. As an example, say "hello", "hi" and "hola" are the three values in the list.
The tool service executes in a temporary directory on the machine where it runs (a different directory for each execution of the service), normally something like /tmp/usecase-2029778474741087696.
Three files will be created in the temporary directory; they contain (in this example) the three values the tool service received on port my_file_list. The files could be called
/tmp/usecase-2029778474741087696/tempfile.0.tmp containing hello
/tmp/usecase-2029778474741087696/tempfile.1.tmp containing hi
/tmp/usecase-2029778474741087696/tempfile.2.tmp containing hola
There will also be a file called my_file_list. That file will contain
/tmp/usecase-2029778474741087696/tempfile.0.tmp
/tmp/usecase-2029778474741087696/tempfile.1.tmp
/tmp/usecase-2029778474741087696/tempfile.2.tmp
The script of your tool service would normally read the contents of my_file_list line by line and do something with the contents of the listed file(s).
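For illustration only, here is that read-the-list-then-process-each-file pattern in C#; a real Taverna tool script would usually be a shell script executing in the temporary directory, so treat this purely as a sketch of the idea:

using System;
using System.IO;

class ProcessFileList
{
    static void Main()
    {
        // my_file_list sits in the service's temporary working directory
        foreach (string path in File.ReadLines("my_file_list"))
        {
            // Each line is the full path of one temp file, e.g. /tmp/usecase-.../tempfile.0.tmp
            string contents = File.ReadAllText(path);   // "hello", "hi", "hola" in the example above
            Console.WriteLine($"{path}: {contents}");   // replace with whatever processing you need
        }
    }
}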
I have also seen some scripts that 'cheat' and iterate directly over tempfile*.tmp, but that would be "a bad thing". The problem with that trick is that if you add a second list of files to the tool service, then the file my_file_list could contain
/tmp/usecase7932018053449784034/tempfile.4.tmp
/tmp/usecase7932018053449784034/tempfile.5.tmp
/tmp/usecase7932018053449784034/tempfile.6.tmp
as other temporary files were used for the other file list port.
I hope that helps
The tool service allows you to upload files, but if you are using the HPC through a job submission node, you would have to modify your command line tool to use the job file staging command to push the files onward as part of the job. The files would be available in the current (temporary) directory of the specified tool script.
I would try to do it through the Tool service and not involve the Beanshell; then you can keep your workflow simpler.
A good thing to remember is that you can write multiple shell commands in the box.
Similarly, you would probably want to retrieve the results back so that you can process them further in the workflow (unless they are massive, in which case you should just output their remote filenames and send them in again to the next HPC job).
The exact commands to use for staging files and retrieving them depends on the HPC job submission system. Which one are you using?
Thanks for the input guys.
It was my misunderstanding of how Taverna uses the File list. All the files in the list are copied to the temp "sandbox" and are therefore available for use.
Another nice easy way is to zip the directory and pass the zipped files into an input port for the service. Then just unzip the files inside the command.
Thanks again

Any way to automate the process of opening a .mpp file and saving it as a .csv?

I need to find a way to automate a process when a user uploads a Microsoft Project file to a web application I have already created. The process basically needs to use Project's Save As to save the file as a .csv so I can import the data into a SQL database (this is needed for custom reporting we already have set up using SQL). I need to automate this process because I will be receiving tons of project files, and if the process is automated the users will be able to see results instantly.
Basically, is there any way to create or run an automated process that will save these project files as .csv files? Even if the csv files are not formatted correctly, I can find a way around that; I just need to get them into .csv files first.
Thank you.
Edit: the only way I could think of doing this is to follow the instructions linked below, but I would then need to automate a process to open the file and hit save for this to work... any other suggestions?
http://social.technet.microsoft.com/Forums/en-US/projectprofessional2010general/thread/eea4ca15-0a0b-4c07-9989-87536b961385/
Edit 2: I'm also looking into ways of using Microsoft.Office.Interop.MSProject, but I'm not having any luck.
Edit 3: I'm now using MPXJ. The only issue I am having is converting their example to VB, shown below.
Private Shared Function ToEnumerable(ByVal javaCollection As Collection) As EnumerableCollection
Return New EnumerableCollection(javaCollection)
End Function
The error is with EnumerableCollection: Visual Studio is not picking it up as a valid type. Is there anything I am doing wrong, or something I should substitute?
If you aren't wedded to using MS Project itself to extract data from the project files, you could consider using the MPXJ library. This would allow you to write a simple utility to open the MPP files you are given, extract the data items you are interested in, and write them directly to your database (or an intermediate CSV file, as required). MPXJ comes in Java and .Net flavours, so you can use your preferred language to do the work.
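As a rough sketch of what such a utility might look like (not working code from the MPXJ documentation): the method names below follow MPXJ's Java-style API as exposed by the .Net build, and the exact namespaces and names can differ between MPXJ versions, so treat them as assumptions to verify.

// Sketch: read an MPP file with MPXJ and dump basic task fields to CSV.
// Namespaces and method names follow MPXJ's Java-style API and may differ in your version.
using System.IO;
using net.sf.mpxj;
using net.sf.mpxj.mpp;

class MppToCsv
{
    static void Main(string[] args)
    {
        ProjectFile project = new MPPReader().read(args[0]);    // e.g. plan.mpp

        using (var csv = new StreamWriter(args[1]))              // e.g. plan.csv
        {
            csv.WriteLine("Id,Name,Start,Finish");

            // getAllTasks() returns a java.util collection in the .Net build, so index into it
            // rather than relying on IEnumerable support (the issue the question ran into).
            var tasks = project.getAllTasks();
            for (int i = 0; i < tasks.size(); i++)
            {
                var task = (Task)tasks.get(i);
                // A real utility would escape commas/quotes in task names
                csv.WriteLine($"{task.getID()},{task.getName()},{task.getStart()},{task.getFinish()}");
            }
        }
    }
}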
Jon
p.s. Disclaimer: I maintain MPXJ

Watch folder for files being Read

I am trying to watch files in a directory to determine when files are opened/accessed. I thought FileSystemWatcher would do the trick using the Changed event.
The problem is that some applications do not take a lock on the file they open/access, nor do they change either the Date Modified or Date Accessed (even after fsutil behavior set disablelastaccess 0). Notepad, for example: apparently it makes a copy of the file in memory and works on it there until you save, and it does not update the Date Accessed either.
How can I monitor a directory of files and be notified when a file is simply opened/accessed by any program (e.g. Notepad)? Files may be opened from another computer, not necessarily on the computer running the "watcher".
I found lots of similar questions but did not see one focusing on file "access".
This is quite normal. Updating an existing file in place is quite dangerous since it can cause irretrievable data loss; a disk error (like a full disk) while writing is very bad news. The common algorithm used is:
rename the original file
write a new file using the original name
no error: delete the renamed file
error: delete the new file, rename original file back
Clearly this doesn't cause a Changed event to be raised; no existing file was changed.
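As a minimal sketch of that algorithm (paths and file names are hypothetical):

using System.IO;

class SafeSave
{
    static void Save(string path, string newContents)
    {
        string backup = path + ".bak";                // hypothetical name for the renamed original

        File.Move(path, backup);                      // 1. rename the original file
        try
        {
            File.WriteAllText(path, newContents);     // 2. write a new file using the original name
            File.Delete(backup);                      // 3. no error: delete the renamed file
        }
        catch
        {
            if (File.Exists(path)) File.Delete(path); // 4. error: delete the new file...
            File.Move(backup, path);                  //    ...and rename the original file back
            throw;
        }
    }
}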
Sorry, I didn't read the question well enough. There is no notification whatsoever for an app just opening a file for reading; FSW can only detect changes to the file system. There is no ready alternative either: this requires a custom file system filter driver that snoops on driver requests, like the kind that SysInternals' ProcMon utility uses. I'm not aware of such a driver that's ready for use from a C# program, and you can't write one in C# either. This just isn't a common requirement.

Recommended document structure. File Wrappers? Roll my own?

I'm currently working out the best structure for a document I'm trying to create. The document is basically a Core Data document that uses SQLite as its store, but uses the Apple-provided NSPersistentDocument+FileWrapperSupport to enable file wrapper support.
The document makes heavy use of media, such as images, videos and audio files, with potentially thousands of files. So what I'm trying to do is create a structure similar to the following:
/myfile.ext/
/myfile.ext/store.sqlite
/myfile.ext/content/
/myfile.ext/content/images/*
/myfile.ext/content/videos/*
/myfile.ext/content/audio/*
First of all, I went down the route of creating a temporary directory and placing all of my media in there, basically creating the paths and file names (e.g. '/content/images/image1.jpg') as I wanted them to appear in the saved file wrapper; then, upon save, I attempted to copy them all into the file wrapper...
What I found was that the files were indeed copied into the wrapper with the file structure I wanted, but when the actual wrapper was saved, these files all magically disappeared.
Great.
So, I trashed my existing solution and tried to use file wrappers instead. This solution involved creating a content directory file wrapper when a new document was created, or loading in an existing content directory file wrapper upon opening a document.
When an image was added/modified, I created the necessary directory wrappers inside this root content wrapper (i.e. an images directory wrapper if it didn't already exist, or any other intermediary directory wrappers that needed to be created) and then created a regular file wrapper for the media, removing any existing wrapper for that file name if one was there.
Saving the document was just a case of making sure the content file wrapper was added to the document file wrapper, and the document would save.
Well... it did, the first time. However, any subsequent changes (e.g. add an image, save; then replace the image, save) did not behave as expected, and only the image from the first save was shown.
So, my question is: first of all, which of the above approaches is the correct one, if either, and what am I doing wrong for them to fail?
And secondly, as I expect to be managing thousands of images, is using file wrappers the correct way to go about this at all?
With that much media in play, you should probably give your users control over whether the media resides in the document, or whether the document only contains references and the media itself resides elsewhere, such as in a library/repository managed by your application. They could then save out a (potentially many times larger) copy with all references resolved.
You might want to zip/unzip any directory so that users don't get confused trying to attach the document to an email. I believe iWork has been doing this with its document bundles for a while now.
As far as what you are doing wrong, no-one can say, as you haven't provided any code demonstrating what you are doing.
Why don't you create a one-off application that lets you select files on disk and saves those files in a document using a file wrapper? This would let you tackle this functionality without any interference from other issues in your application. Once you understand how to use file wrappers, you can port the code back or just write new code that works.