I wrote a windows service that compares large uncompressed images to their much smaller thumbnails to determine which images need new thumbnails or don't have thumbnails at all... I am using the Created Date of the files to determine which ones require updating (if an uncompressed image has a created date larger than the thumbnail then the thumbnail is out of date).
Everything is working great, my only issue is when I save new versions of the thumbnails over their existing ones... At first I was only doing a simple Bitmap.Save but when overwriting this would only change the Modified Date of the file. I added in a File.Delete() prior to saving the new version and it deletes the old version, saves a new one (as it should), but the Created Date of the new file is STILL THE OLD CREATED DATE...
I deleted every old thumbnail, waited a few minutes and then ran the creation code again, brand new create dates... Is there some timeframe Windows stores the file data in memory and perhaps its recognizing identical file names and giving the new files the old Created Date???
According to the documentation for the File.SetCreationTime(String, DateTime) Method,
NTFS-formatted drives may cache file meta-info, such as file creation time, for a short period of time. As a result, it may be necessary to explicitly set the creation time of a file if you are overwriting or replacing an existing file.
However, if you want to be cautious, rename the original file, say by putting ".old" at the end. That way, it would have to create a new directory entry for the new file. Then, if something goes horribly wrong, there is still the .old copy of it (until you delete that).
Related
I have two client programs which are using S3 to communicate some information. That information is a list of files.
Let's call the clients the "uploader" and "downloader":
The uploader does something like this:
upload file A
upload file B
upload file C
upload a SUCCESS marker file
The downloader does something lie this:
check for SUCCESS marker
if found, download A, B, C.
else, get data from somewhere else
and both of these programs are being run periodically. The uploader will populate a new directory when it is done, and the downloader will try to get the latest versions of A,B,C available.
Hopefully the intent is clear — I don't want the downloader to see a partial view, but rather get all of A,B,C or skip that directory.
However, I don't think that works, as written. Thanks to eventual consistency, the uploader's PUTs could be reordered into:
upload file B
upload a SUCCESS marker file
upload file A
...
And at this moment, the downloader might run, see the SUCCESS marker, and assume the directory is populated (which it is not).
So what's the right approach, here?
One idea is for the uploader to first upload A,B,C, then repeatedly check that the files are stored, and only after it sees all of them, then finally write the SUCCESS marker.
Would that work?
Stumbled upon similar issue in my project.
If the intention is to guarantee cross-file consistency (between files A,B,C) the only possible solution (purely within s3) is:
1) to put them as NEW objects
2) do not explicitly check for existence using HEAD or GET request prior to the put.
These two constraints above are required for fully consistent read-after-write behavior (https://aws.amazon.com/about-aws/whats-new/2015/08/amazon-s3-introduces-new-usability-enhancements/)
Each time you update the files, you need to generate a unique prefix (folder) name and put this name into your marker file (the manifest) which you are going to UPDATE.
The manifest will have a stable name but will be eventually consistent. Some clients may get the old version and some may get the new one.
The old manifest will point to the old “folder” and the new one will point the new “folder”. Thus each client will read only old files or only new files but never mixed, so cross file consistency will be achieved. Still different clients may end up having different versions. If the clients keep pulling the manifest and getting updated on change, they will eventually become consistent too.
Possible solution for client inconsistency is to move manifest meta data out of s3 into a consistent database (such as dynamo db)
A few obvious caveats with pure s3 approach:
1) requires full set of files to be uploaded each time (incremental updates are not possible)
2) needs eventual cleanup of old obsolete folders
3) clients need to keep pulling manifest to get updated
4) clients may be inconsistent between each other
It is possible to do this single copies in S3. Each file (A B C) will have prepended to it a unique hash or version code [e.g. md5sum generated from the concatenation of all three files.]
In addition the hash value will be uploaded to the bucket as well into a separate object.
When consuming the files, first read the hash file and compare to the last hash successfully consumed. If changed, then read the files and check the hash value within each. If they all match, the data is valid and may be used. If not, the downloaded files should be disgarded and downloaded again (after a suitable delay)..
This will catch the occassional race condition between write and read across multiple objects.
This works because the hash is repeated in all objects. The hash file is actually optional, serving as a low-cost and fast short cut for determining if the data is updated.
I am working on File Management System exactly like Dropbox in Cocoa.
My problem is when i edit any text file at that time NSFileSystemFileNumber is changed.
I want an unique NSFileSystemFileNumber even if that edited file is moved from the particular folder.
In short, I just want to know how to fetch that moved file's old or original path from the database.
Any alternate way to solve out this problem?
Thanks in Adv..!!
It depends on how the editor save functionality is implemented. Each editor will have different functionality and it sounds like the one you are using does the following:
Delete existing file.
Create new file.
Write file data.
Hence you get a new inode each time. Others might:
Truncate existing file.
Write file data.
which would result in the same inode each time.
There is nothing you can about this so you will need to track file changes using the name or something, not the inode.
Imagine there are 3 or more independent locations where a file can be modified. These locations communicate to each other through email or mail (direct flash drive restoration). Though there is a big room for flow - to make simultaneous editing to the file and screw up things, this client won't change too much. He rather call everyone that he is working on the last update or tell the other guys that he is waiting for third guy's last update. Anyway, at some point after several exchanges, due to one of participants unintentional error THE LAST VERSION of the file eventually gets mixed up. From this point everyone searches for the last version BY LOOKING THE CONTENT of the file.
This client wants to have a central location (he has actually, that is his PC's some location) and let everybody (including himself) copy any new or suspected new file to this location but prevent file's last version being copied. From this location he has to easily copy, send or open the file and work.
So, here is my concept (2 steps):
step 1: I made an ad to the main application where this file is created or edited. This ad prompts the user to give a version number to the file with every invoked save command from the editing application. In fact the file can be re-saved multiple times but not considered modified (file attributes creation, save etc. do not have great meaning here). This said the user can cancel my ad-in but have saved the file, not saving a new file version.
step 2: multiple solutions:
solution A: I'm thinking to have a folder/file watch and prevent the last version of the file being overwritten. As you know, FileSystemWatcher will fire the change/delete etc., events AFTER FACT so, I have to back copy overwritten file after the fact (w/ some tricks).
solution B: have a database to store all version of files and built-in some shell extension to extract/view files from the database. Move all copied/pasted files to the database (my program folder) and restore latest file in working folder after watcher fires change/delete event.
solution 3: find out built-in windows tools (API etc.) to greatly rely on it with some programming.
Any ideas?
Thanks in advance.
The question is open, but I could not find any answers to it on the web. hope it's not a copy:
I have to load some data in my app from a Json file. Every 3 seconds, all objects loaded previously are deleted and then reloaded (re allocated completely). I would like the app first to check if there has been a change, and only if there has been, reload the entire data, or even better, reload just the new data. how can I check if the file has been modified? Can I for example get the date from dropbox and check the date for updated versions of the file?
Ignoring the check for specific data changes, you can store hash / checksum of the file when you read it, and before reloading it compare the checksum / hash with the stored one. No change => no reload.
According to this discussion, you can get the metadata for a file on Dropbox and check the rev field in the metadata. (Stands for revision.) So you can keep track of the rev in your program and only try to reload the file when it changes.
http://forums.dropbox.com/topic.php?id=48787
Beware, there is an older metadata for revision, but it is deprecated. (Also covered in the discussion.)
On my site a user may upload a file (pic, zip, audio, video, whatever). He then may decide to replace it with a newer revision. This user may upload a file, make a post then decide to put up a new revision replacing the old (lets say its a large zip or tar.gz file). Theres a good chance people may be downloading it if he sent out an email or even im for the home user.
Problem. I need to replace the file and people may be downloading and it may be some minutes before it is deleted. I dont want my code to stall until i cant delete or check every second to see if its unused (especially bad if another user can start and he takes long creating a cycle).
How do i delete the file while users are downloading the file? i dont care if they stop i just care that the file can be replaced and new downloads are the new revision.
What about referencing the files indirectly?
A mapping script, maps a virtual file entry from your site to a real file . If the user wants to upload a new revision of his file you just update the mapping, not the real file.
You can install a daily task that scans all files and deletes all files without a mapping and without open connections.
lajuette's answer is right, the easiest solution is to work around the file locking altogether:
When a user uploads file foo.zip, internally store it as foo-v1.zip.
Create a mapping file somewhere (database, code, whatever) that maps foo.zip to foo-v1.zip.
Rather than exposing a direct link to the file, expose a link to a service that gets the file: mysite.com/Download?foo.zip or something. This service uses the mapping to determine which version of the file to send to the client.
When a new version is uploaded, create foo-v2.zip and update the mapping file.
It wouldn't be that hard to write a scheduled task that cleans up old, un-mapped files.
If your oppose to a database and If the filenames are in a fix format (such as user/id.ext) you could append the id with a revision number and enumerate the folder using a pattern (user/id-*) and use the latest revision.