How can I save all received resources to disk in PhantomJS?

I want to use the page.onResourceReceived event handler in PhantomJS to save every received resource to disk. How can I accomplish this? Is there any method for saving all received resources to disk in PhantomJS?
I don't think this question is about the cache: even if I used the cache, how would I parse the cached files and extract their data?


Can I trust aws-cli to re-upload my data without corrupting when the transfer fails?

I use S3 extensively to store encrypted and compressed backups of my workstations, and I use the AWS CLI to sync them to S3. Sometimes a transfer fails while in progress. I usually just retry it and let it finish.
My question is: does S3 have some kind of check to make sure that the previously failed transfer didn't leave corrupted files? Does anyone know whether syncing again is enough to fix the previously failed transfer?
Individual files uploaded to S3 are never partially stored. Either the entire upload completes and S3 stores the file as an S3 object, or the upload is aborted and no S3 object is stored.
Even in the multipart upload case, individual parts can be uploaded, but they never form a complete S3 object unless all of the parts are uploaded and the "Complete Multipart Upload" operation is performed. So there is no need to worry about corruption from partial uploads.
Syncing will certainly be enough to fix the previously failed transfer.
Yes, it looks like the AWS CLI does validate what it uploads and handles corruption scenarios by using MD5 checksums.
The AWS CLI will perform checksum validation for uploading and downloading files in specific scenarios.
The AWS CLI will calculate and auto-populate the Content-MD5 header for both standard and multipart uploads. If the checksum that S3 calculates does not match the Content-MD5 provided, S3 will not store the object and instead will return an error message back to the AWS CLI.
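To make the Content-MD5 mechanism concrete, here is a minimal sketch of how such a header value is derived: the raw 16-byte MD5 digest of the body, base64-encoded. The function name content_md5 is hypothetical and this is not the AWS CLI's own code; it only illustrates the kind of value the server would compare against.

```python
import base64
import hashlib


def content_md5(data: bytes) -> str:
    """Return the base64-encoded MD5 digest of data (hypothetical helper).

    The Content-MD5 header carries the raw 16-byte digest encoded as
    base64, not the more familiar 32-character hex string.
    """
    digest = hashlib.md5(data).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    print(content_md5(b"example backup payload"))
```

If the bytes that arrive differ from the bytes that were hashed, the digest computed on the receiving side will not match and the upload is rejected, which is why a retry after a failed sync is safe.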

osxfuse: how to clean cache?

How do I clear the cache in OSXFUSE? I'm using it to mount a custom remote filesystem, and OSXFUSE caches file content. That is great for fast repeated access to the same file, but sometimes I need to re-read file content from the remote server.
I am not sure that you can clear this cache. But you can disable the UBC (Unified Buffer Cache) by specifying 'noubc' in the mount options (see the OSXFUSE mount options).
This disables caching of your responses to 'read' calls, so osxfuse will call into your filesystem without using the cache.

Where does Jetty save files on upload temporarily?

I have a very basic question about file upload with embedded Jetty.
When I upload a large file, I know that Jetty server buffers it somewhere. Where does this buffer exist?
Also, is there a way to disable the buffer and stream the request data directly to a destination such as HDFS?
If we can't disable this buffering, I at least need control over deleting the temporary file after the upload is complete.
Thanks in advance for the help.
That's defined by the servlet spec's location value found in the MultipartConfigElement, typically declared as a @MultipartConfig annotation on the servlet that receives the uploaded files.

NFS server receives multiple inotify events on new file

I have 2 machines in our datacenter: a public server and an internal server.
The public server exposes part of the internal server's storage through FTP. When files are uploaded over FTP, they in fact end up on the internal storage. But when watching the inotify events on the internal server's storage, I notice that the file gets written in chunks, probably due to buffering on the client side. The software on the internal server watches the inotify events to determine whether new files have arrived. But because of the way NFS writes the files, there is no good way of telling when a file is complete. Is there a way of telling the NFS client to write files in only one operation, or is there a workaround for this behaviour?
The events I get on the internal server when uploading a file of around 900 MB are:
./ CREATE big_buck_bunny_1080p_surround.avi
# after the CREATE i get around 250K MODIFY and CLOSE_WRITE,CLOSE events:
./ MODIFY big_buck_bunny_1080p_surround.avi
./ CLOSE_WRITE,CLOSE big_buck_bunny_1080p_surround.avi
# when the upload finishes i get a CLOSE_NOWRITE,CLOSE
./ CLOSE_NOWRITE,CLOSE big_buck_bunny_1080p_surround.avi
Of course, I could listen for the CLOSE_NOWRITE event, but the inotify documentation describes it as:
A watched file or a file within a watched directory was closed, after being opened in read-only mode.
Which is not exactly the same as 'the file is complete'. The only workaround I see is to use .part or .filepart files, move them to the original filename once uploaded, and ignore the .part files in my storage watcher. The disadvantage is that I'll have to explain to customers how to upload with .part, and not many FTP clients support this by default.
Basically, if you want to check when the write operation has completed, monitor the IN_CLOSE_WRITE event.
IN_CLOSE_WRITE is fired when a file that was open for writing gets closed. Even if the file is transferred in chunks, the FTP server will close the file only after the whole file has been transferred.

How to detect that a file is being uploaded over FTP

My application watches a set of folders where users can upload files. When a file upload finishes I have to process the file, but I don't know how to detect that a file has not finished uploading.
Is there any way to detect that a file has not yet been released by the FTP server?
There's no generic solution to this problem.
Some FTP servers lock the file being uploaded, preventing you from accessing it, while the file is still being uploaded. For example IIS FTP server does that. Most other FTP servers do not. See my answer at Prevent file from being accessed as it's being uploaded.
There are some common workarounds to the problem (originally posted in SFTP file lock mechanism, but relevant for the FTP too):
You can have the client upload a "done" file once the upload finishes. Make your automated system wait for the "done" file to appear.
You can have a dedicated "upload" folder and have the client (atomically) move the uploaded file to a "done" folder. Make your automated system look to the "done" folder only.
Have a file naming convention for files being uploaded (".filepart") and have the client (atomically) rename the file after upload to its final name. Make your automated system ignore the ".filepart" files.
See (my) article Locking files while uploading / Upload to temporary file name for an example of implementing this approach.
Also, some FTP servers have this functionality built-in. For example ProFTPD with its HiddenStores directive.
A gross hack is to periodically check the file attributes (size and time) and consider the upload finished if the attributes have not changed for some time interval.
You can also make use of the fact that some file formats have a clear end-of-file marker (like XML or ZIP), so you can tell when the file is incomplete.
Some FTP servers allow you to configure a hook to be called, when an upload is finished. You can make use of that. For example ProFTPD has a mod_exec module (see the ExecOnCommand directive).
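As an illustration of the end-of-file marker idea from the list above, here is a rough sketch for ZIP uploads. It relies on the ZIP format keeping its end-of-central-directory record at the very end of the file, so a truncated upload normally fails the check. The function name zip_upload_looks_complete is hypothetical, and this is only a heuristic, not a guarantee that the transfer has finished.

```python
import zipfile


def zip_upload_looks_complete(path: str) -> bool:
    """Heuristic: True if path looks like a fully written ZIP archive.

    zipfile.is_zipfile() searches for the end-of-central-directory
    record near the end of the file; a partially uploaded archive
    usually does not have it yet.
    """
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() re-reads every member and returns the name of
            # the first one with a bad CRC, or None if all are fine.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False
```

Note that testzip() reads the whole archive, so for very large uploads it may be preferable to run only the cheap is_zipfile() check while polling and the full check once before processing.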
I use ftputil to implement this work-around:
connect to ftp server
list all files of the directory
call stat() on each file
wait N seconds
For each file: call stat() again. If result is different, then skip this file, since it was modified during the last seconds.
If stat() result is not different, then download the file.
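The steps above can be sketched roughly as follows. To keep the example self-contained it defaults to os.stat on local paths; the names snapshot and stable_files are hypothetical, and passing an ftputil host's stat method as stat_fn is an assumption about how you would adapt it, not code taken from the answer.

```python
import os
import time


def snapshot(paths, stat_fn=os.stat):
    """Map each path to its (size, mtime) pair as reported by stat_fn."""
    result = {}
    for path in paths:
        st = stat_fn(path)
        result[path] = (st.st_size, st.st_mtime)
    return result


def stable_files(paths, wait_seconds, stat_fn=os.stat, sleep=time.sleep):
    """Return the paths whose size and mtime did not change during the wait.

    Files that changed are skipped for now; they can be retried on the
    next polling round.
    """
    before = snapshot(paths, stat_fn)
    sleep(wait_seconds)
    after = snapshot(paths, stat_fn)
    return [p for p in paths if before[p] == after[p]]
```

A caller would list the directory, pass the resulting paths to stable_files with a wait of N seconds, and download only the returned files. The wait has to be longer than any pause the uploading client might make, which is why this remains a heuristic.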
This whole ftp-fetching is old and obsolete technology. I hope that the customer will use a modern http API the next time :-)
If you are reading files with particular extensions, then use WinSCP for the file transfer. It creates a temporary file with the extension .filepart and renames it to the actual file extension once the file has been fully transferred.
I hope this helps someone.
This is a classic problem with FTP transfers. The only mostly reliable method I've found is to send a file, then send a second short "marker" file just to tell the recipient the transfer of the first is complete. You can use a file naming convention and just check for existence of the second file.
You might get fancy and make the content of the second file a checksum of the first file. Then you could verify the first file. (You don't have the problem with the second file because you just wait until file size = checksum size).
And of course this only works if you can get the sender to send a second file.
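A minimal sketch of the receiving side of this marker-file scheme follows. It assumes the sender uploads the data file first and then a second file named like the data file plus ".sha256", containing exactly the hex SHA-256 digest with no trailing newline. The suffix, the choice of SHA-256, and the function names are all assumptions made for illustration.

```python
import hashlib
import os

# A hex-encoded SHA-256 digest is always 64 characters long.
DIGEST_HEX_LEN = hashlib.sha256().digest_size * 2


def file_sha256(path, chunk_size=1 << 20):
    """Hash the file in chunks so large uploads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_complete(data_path, marker_suffix=".sha256"):
    """True once the marker is fully present and matches the data file."""
    marker_path = data_path + marker_suffix
    if not (os.path.exists(data_path) and os.path.exists(marker_path)):
        return False
    # The marker itself may still be uploading: wait for its full size.
    if os.path.getsize(marker_path) != DIGEST_HEX_LEN:
        return False
    with open(marker_path, "r", encoding="ascii") as handle:
        expected = handle.read()
    return expected == file_sha256(data_path)
```

The watcher would call transfer_complete on each candidate file during polling and process it only once the function returns True.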