How can I determine if files in a "drop folder" are completely transferred - file-io

Remote clients will upload images (and perhaps some instructional files in specially formatted text) to a "drop folder." Once the upload is complete we need to begin processing these images. An easy but flawed solution would be to have a script automatically begin processing any files in the folder every few seconds (the files can be moved out of the folder once processed). Problems would arise, however, when it attempts to process large images that are only partially transferred.
What are some tricks I can use to ensure the files are fully uploaded before processing them?
A few of my own thoughts:
The script can check the validity of the file; i.e., a partial JPEG would result in an error that the script could respond to. This would be fairly CPU intensive, though. Some files have special markers at the end, but I can't count on this because I'm not sure what formats I'll be dealing with.
I've heard of "file handles" but haven't really figured out the basics of what they are and how I can tell if there is a "file handle" on a particular file. Basically the FTP daemon (actually, I'm on Windows, so "service") would keep a "handle" on the file while it's being uploaded and you would know not to process that file. These are just a few of my thoughts but I'm not really sure if they will work or if there are better or more accepted ways of solving this problem.

If you have a server-side script upload system (PHP, ASP, JSP, whatever), you could instruct the script to call another script to process the files, or to create a flag file indicating that the upload is done.
If your server is Linux-based, you can use lsof to check if the file is open. As your ftp/script/cgi will close the file after upload completes, lsof will not show the file in the list.
If your server is Windows-based, you can use Process Explorer to list the open files.
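The flag-file idea above can be sketched as follows. This is only an illustration under stated assumptions: the `.done` suffix and the `ready_files` helper are hypothetical names, and it presumes the upload script writes `<name>.done` next to each file only after the transfer has finished.

```python
import os

# Hypothetical convention: the uploader creates "<name>.done" once
# "<name>" has been written completely.
FLAG_SUFFIX = ".done"


def ready_files(folder, flag_suffix=FLAG_SUFFIX):
    """Return the names of files in `folder` that have a companion flag file."""
    names = set(os.listdir(folder))
    ready = []
    for name in sorted(names):
        if name.endswith(flag_suffix):
            continue  # skip the flag files themselves
        if name + flag_suffix in names:
            ready.append(name)
    return ready
```

A processing script would then call `ready_files` on each pass and ignore everything else in the folder, so a large image that is still arriving is never picked up.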

By what method are your users uploading the images?

Related

How to make the uploaded file available for use after saving it with GetRandomFileName according to the FileHelpers example?

In the documentation sample code for how to deal with user uploaded files, they save it as a trusted filename for filestorage via GetRandomFileName, and a trusted filename for HTML display.
In the comments it says: "In most production scenarios, an anti-virus/anti-malware scanner API is used on the file before making the file available for download or for use by other systems."
Is that going to be before it is saved with a random filename or after? Because that is the point of saving it as a random filename, so that it doesn't get executed? And when the scanning is done, how is the file going to be made available? I guess the file just has to be renamed if it passes the scan or else deleted? If so, what is the proper way to get the original file extension? And do you know of any good scanners that are gratis and popular to use?
I try to learn web development. Thanks for your time and help.
The renaming of the file here has nothing to do with the anti-virus protection. Files don't tend to execute themselves, whatever their name is. The same goes for the virus scan: it's not for the server's protection, it's for the users' protection. If your server executes a binary it gets from the client, that is a security breach regardless of whether it's a virus or not.
The renaming here is probably done just to be able to store the duplicates. That being said, in the production scenarios you'll probably never store the incoming files as physical files on the FS. They usually go to the DB as blobs, so the name is not an issue.
This is just a sample app designed to teach how to work with binary streams and file controllers. Don't expect too much from it in terms of applicability to the real solutions.

Preventing other application from opening custom file vb.net

I have a text file. Now I have changed its file type from .txt to .abc. My VB.NET program loads the text into textboxes from that file. After changing the file type, however, other apps like NotePad and Word are able to open and read my .abc file.
Is there any way that only my application will be able to open/read from the file and no other app would be able to do so? What I mean is, suppose I have a Photoshop document (.psd file): no app other than Photoshop itself can open it. How do I make my file unreadable by other apps?
There is no way to prevent an app that you don't develop from opening any file. The extensions are just there for helping us humans, and maybe a bit for the computer to know the default app you select for an extension.
Like you said, a .txt file can be opened by many many apps. You can open a .txt file with Notepad, Firefox, VSCode, and many others.
Same way, a .psd file can be opened by many many apps. You can open that .psd file with Photoshop, but also Notepad, Firefox, and VSCode, and probably the same apps as above.
The difference is which apps can read and understand the file.
In order to make a file that other apps cannot understand, you need to put it in a format that they cannot recognize, because you designed it "in secret".
Like Visual Vincent said above, you could encrypt the file in some way, or you can use a binary file that basically only your app knows how to understand.
Since you don't own the app you want the file to be understood by, you either have to accept that it can be opened by any app that can open files, or you can try to encrypt the file outside the app, for example by zipping it with a password, and then decrypt or unzip it when you want to use it.
Firstly, any file can be read unless it is still open by a particular process or service. Even PhotoShop files can be 'read' by NotePad - try it!
So, an attempt at my first answer...
You can try a couple of methods to prevent opening the file, for instance, applying a file lock. As an example, SQL Server .mdf files are locked by the SQL Server service. This happens because the files are maintained in an open state; however, your application would have to remain running to keep these files open. Technically, though, the files can still be copied.
Another way is to set the hidden attribute for the file. This hides the file from less savvy users, but it will be displayed if the user shows hidden files.
And my second answer: You refer to the format of files by saying only PhotoShop can read or write its own files (not true, but I know what you're saying).
The format of the file must be decided by yourself. You must determine how you are going to store the data that you output from your application. It looks like you have been attempting to write your application data into a text file. Perhaps you should try writing to binary files instead. Binary files, while not encrypted, as suggested by Visual Vincent in the comments to your question, still provide a more tailored approach to storing your data.
Binary files write raw binary data instead of humanised text. For instance, if you write an integer to the file it will appear as a string of four bytes, not your usual 123456789 textual format.
So, you really need to clarify what data you want to write to the file, decide on a set structure to your file (as you also have to be able to read it back in to your application) and then be able to write the information.
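As a small illustration of the text-versus-binary point, here is a sketch in Python rather than VB.NET, using the standard `struct` module. The little-endian 32-bit layout (`"<i"`) is an assumption chosen for the example; your own file format would fix its own layout.

```python
import struct

value = 123456789

# Text form: one byte per decimal digit.
as_text = str(value).encode("ascii")

# Binary form: a signed 32-bit integer, little-endian (assumed layout).
as_binary = struct.pack("<i", value)

# Reading it back requires knowing the same layout.
restored = struct.unpack("<i", as_binary)[0]

print(len(as_text), len(as_binary), restored)
```

The text form takes nine bytes and is readable in Notepad. The binary form takes four bytes and looks like gibberish in a text editor, but any program that knows (or guesses) the layout can still decode it, which is why binary is not a substitute for encryption.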

FTP Server Download Returns blank files

We have a process in place built on Excel VBA that uploads a file to FTP Server. On the other side, our client downloads it. Very randomly, they complain that the file they received is blank (the file name is the same though). We then check at our end and see that the file that was uploaded was never blank. So here comes the problem: we're always arguing whether it was our error or theirs.
I figured that there might be a couple of reasons behind it but I have a few questions to ask before coming to conclusions:
If, say, the file was never uploaded (a possibility), what happens when the client runs a download process at their end? Can that download process generate a blank file with the same name as our output file? It sounds impossible to me but since the client is following up on this issue, I have to ask this silly question.
How does the mechanism work - what are the steps that happen on the FTP server the moment my process completes uploading the file? I sometimes see that as soon as I upload the file, a 0 KB file is created, and then a second later (or less) the file with the right size appears. Could it be that their process is running right before the actual file contents are written?
Thank you in advance for your help!

FTP client sees a file that isn't there... How can I successfully delete/overwrite this "ghost" file?

So we have a client that creates "training packages" and then uploads them via ftp to their website. They create the training packages in PowerPoint, and then use some program to convert them into html/swf files and package them within a folder. When they upload, they use Filezilla, and just transfer the entire folder over. The folder is uniquely named, uses no spaces or special characters.
These files have uploaded fine for about a year. Recently, they've run into a problem. Whenever they try to upload training package folder, they are immediately presented with the "This file already exists, do you want to overwrite?" message. Except... the folder they're moving is brand new, and the file it's asking to overwrite DOESN'T EXIST. When they choose "Overwrite" the file looks like it transfers, but the file size is wrong, and the training package doesn't work correctly.
This happens with every training package they try to upload. It's not just a badly outputted package. Also, it's always the same file that has the problem--it's the main "player" for the training package, and though it contains different content for every package, it is the same file name (cplayer.swf) every time.
Things they've tried without success:
-Re-uploading the file again by itself, and overwriting
-Deleting the "bad" file and re-uploading the single file - Get the overwrite message again, even though the file DOES NOT EXIST.
-Renaming the file on the server and re-uploading the single file - Get the overwrite message.
-Renaming the single file locally within the package and uploading/renaming it - Won't let us rename because the file already exists.
-Used another FTP client - Same results as above, so not a client specific problem.
-Used a different FTP login - Same results as above, so not a permissions problem.
Other things of note:
-The file is small--it's not a time out problem. Plus, all other files upload fine, and some are a lot larger.
-They've emailed this file to me, and I've uploaded it successfully.
I am completely at my wits' end. Does anyone have any ideas where I can at least troubleshoot a little further?
Thanks for the non-help, the downvote, and the general lack of response on what was a pretty serious issue for me.
In case anyone else has a similar problem, here's what was going on:
Antivirus software (specifically Malwarebytes) was blocking THIS ONE SINGLE FILE. All I had to do was exclude the folder that contained the file.

Monitoring a folder for a specific file

I have a program that uploads .txt or .rje files from a folder. Now when you put any other file format into the folder, like .jar, then the application crashes.
Now I cannot change the mechanics of the application, so I would like to know if there is a type of program/script that I can use that monitors the folder for any files that are non-txt/rje and then move them out of the folder once they are put there...
Is this possible using a script? (I do not want to use a .exe application to do this... I'm not allowed to install 3rd party software on the server where this folder exists...)
Thank you
Your solution won't work as you have a race condition between the program doing the upload and the one doing the deletion. If upload runs first it still crashes.
The correct solution is to modify the upload program to cope with this scenario.
If that is not possible, then the only safe workaround would be to use a new folder to drop the files in and have a script that constantly scans that folder and, when a new file appears, either moves it to the processing folder or deletes it as appropriate.
(For the actual detection that's not my area of expertise but the simplest would be to have a bat file that just runs periodically (or even just runs once and loops with a wait, check, move, wait, check, move, etc) and processes everything in the folder when it runs).
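The staging-folder workaround could look roughly like the sketch below. It assumes a Python interpreter is available on the server (the same loop could be written as a bat file), and the folder arguments and the `sort_staging` name are hypothetical. Instead of deleting unwanted files, this variant moves them to a separate "rejected" folder, which is an assumption on my part.

```python
import os
import shutil

# Extensions the upload program can handle, per the question.
ALLOWED = (".txt", ".rje")


def sort_staging(staging, processing, rejected, allowed=ALLOWED):
    """Move each file in `staging` to `processing` or `rejected` by extension.

    Returns (accepted, refused): two lists of the file names moved.
    """
    accepted, refused = [], []
    for name in sorted(os.listdir(staging)):
        src = os.path.join(staging, name)
        if not os.path.isfile(src):
            continue  # ignore subfolders
        ext = os.path.splitext(name)[1].lower()
        if ext in allowed:
            shutil.move(src, os.path.join(processing, name))
            accepted.append(name)
        else:
            shutil.move(src, os.path.join(rejected, name))
            refused.append(name)
    return accepted, refused
```

Run periodically (for example in a loop with `time.sleep` between passes, or from the Windows Task Scheduler), this keeps unsupported files from ever reaching the folder the upload program watches, which avoids the race condition described above.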