Nancy/OWIN Service Fabric Microservice Writes Requests To Temp File - file-upload

I have a microservice, hosted in Service Fabric, that handles uploading files to blob storage. The microservice is implemented with Nancy and OWIN. When a request is over a certain size (somewhere around a couple hundred KB), the request body gets written to disk in a temp directory. Occasionally these .tmp files fail to get cleaned up and eat up the limited disk space on the SF cluster VM.
I have not been able to find anything about requests automatically getting written to disk. And nothing in the code creates .tmp files. What could be generating these files: Service Fabric, Nancy, OWIN?

Nancy is doing this. It has something called "request stream switching", which, as you say, switches from a memory stream to a file-based stream over a certain size, to avoid someone being able to fill up all the memory by uploading a large (or never-ending) file.
They should get cleaned up after every request; I haven't seen any reports of them not being cleaned up for a long time (we've fixed bugs around this in the past), but if you want to disable it completely (and accept the potential issue above) you can set "StaticConfiguration.DisableRequestStreamSwitching" in your bootstrapper's app startup to turn it off.
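If you do go that route, a rough sketch of where that setting lives, assuming the standard DefaultNancyBootstrapper (untested; adapt it to your own startup code):

using Nancy;
using Nancy.Bootstrapper;
using Nancy.TinyIoc;

public class Bootstrapper : DefaultNancyBootstrapper
{
    protected override void ApplicationStartup(TinyIoCContainer container, IPipelines pipelines)
    {
        // Keep request bodies in memory instead of spilling them to .tmp files on disk.
        // Trade-off: a very large or never-ending upload can now exhaust memory.
        StaticConfiguration.DisableRequestStreamSwitching = true;

        base.ApplicationStartup(container, pipelines);
    }
}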

Related

Autodesk Forge - problems with very large .zip files

We allow our users to upload files to Forge, but to our bucket (they don't need to create their own), as we're only using the model viewer. This means they need to upload to our server first.
The upload method uses the stream from the HttpContent (we're using WebAPI2) and sends it right on into the Forge API methods.
Well, it would, but I get this exception: Error getting value from 'WriteTimeout' on 'System.Net.Http.StreamContent+ReadOnlyStream'.
This means that the Forge API is checking the Write Timeout without checking CanWrite or CanTimeout. Have I found an API bug?
Copying to another stream is feasible, but I can't use a debugger to test the file our client is reporting further problems with, because it's 1.1 GB and my dev box runs out of memory.
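For what it's worth, the "copy to another stream" step can target a temp file instead of a MemoryStream, so the 1.1 GB body never has to fit in RAM. BufferToTempFileAsync below is just a hypothetical helper sketch, not part of the Forge SDK, and whether the resulting FileStream satisfies the Forge client's property checks would still need testing:

using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class UploadBuffering
{
    // Copies the incoming HttpContent to a temp file and returns a readable,
    // seekable stream over it, so the request body never has to fit in memory.
    public static async Task<Stream> BufferToTempFileAsync(HttpContent content)
    {
        var tempPath = Path.GetTempFileName();

        using (var tempFile = File.Open(tempPath, FileMode.Create, FileAccess.Write))
        {
            await content.CopyToAsync(tempFile);
        }

        // DeleteOnClose removes the temp file once the upload code disposes the stream.
        return new FileStream(tempPath, FileMode.Open, FileAccess.Read, FileShare.Read,
                              81920, FileOptions.DeleteOnClose | FileOptions.Asynchronous);
    }
}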

NFS server receives multiple inotify events on new file

I have 2 machines in our datacenter: a public-facing server and an internal storage server.
The public server exposes part of the internal server's storage through FTP. When files are uploaded to the FTP, the files in fact end up on the internal storage. But when watching the inotify events on the internal server's storage, I notice the file gets written in chunks, probably due to buffering on the client side. The software on the internal server watches the inotify events to determine if new files have arrived. But due to the way NFS writes the files, there is no good way of telling when a file is complete. Is there a way of telling the NFS client to write files in only one operation, or is there a workaround for this behaviour?
EDIT:
The events I get on the internal server, when uploading a file of around 900 MB, are:
./ CREATE big_buck_bunny_1080p_surround.avi
# after the CREATE i get around 250K MODIFY and CLOSE_WRITE,CLOSE events:
./ MODIFY big_buck_bunny_1080p_surround.avi
./ CLOSE_WRITE,CLOSE big_buck_bunny_1080p_surround.avi
# when the upload finishes i get a CLOSE_NOWRITE,CLOSE
./ CLOSE_NOWRITE,CLOSE big_buck_bunny_1080p_surround.avi
Of course, I could listen for the CLOSE_NOWRITE event, but the inotify documentation says:
close_nowrite
A watched file or a file within a watched directory was closed, after being opened in read-only mode.
Which is not exactly the same as "the file is complete". The only workaround I see is to use .part or .filepart files and, once uploaded, move them to the original filename and ignore the .part files in my storage watcher. The disadvantage is that I'll have to explain to customers how to upload with .part, and not many FTP clients support this by default.
Basically, if you want to know when the write operation is complete, monitor the IN_CLOSE_WRITE event.
IN_CLOSE_WRITE is fired when a file that was open for writing gets closed. Even if the file gets transferred in chunks, the FTP server will close the file only after the whole file has been transferred.
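If it helps, here is a rough, Linux-only sketch of watching a directory for IN_CLOSE_WRITE via P/Invoke, with constants and struct layout taken from inotify(7); error handling is omitted and /srv/storage is just an example path:

using System;
using System.Runtime.InteropServices;
using System.Text;

class CloseWriteWatcher
{
    const uint IN_CLOSE_WRITE = 0x00000008; // from <sys/inotify.h>

    [DllImport("libc", SetLastError = true)]
    static extern int inotify_init();

    [DllImport("libc", SetLastError = true)]
    static extern int inotify_add_watch(int fd, string pathname, uint mask);

    [DllImport("libc", SetLastError = true)]
    static extern IntPtr read(int fd, byte[] buffer, IntPtr count);

    static void Main(string[] args)
    {
        string path = args.Length > 0 ? args[0] : "/srv/storage"; // example watch directory
        int fd = inotify_init();
        inotify_add_watch(fd, path, IN_CLOSE_WRITE);

        var buffer = new byte[4096];
        while (true)
        {
            int length = (int)read(fd, buffer, (IntPtr)buffer.Length);
            int offset = 0;
            while (offset < length)
            {
                // struct inotify_event: int wd; uint32 mask; uint32 cookie; uint32 len; char name[len];
                uint mask = BitConverter.ToUInt32(buffer, offset + 4);
                int len = (int)BitConverter.ToUInt32(buffer, offset + 12);
                if ((mask & IN_CLOSE_WRITE) != 0 && len > 0)
                {
                    string name = Encoding.UTF8.GetString(buffer, offset + 16, len).TrimEnd('\0');
                    Console.WriteLine("upload finished: " + name);
                }
                offset += 16 + len;
            }
        }
    }
}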

What's the performance impact of a large Apache access.log?

If a log file like access.log or error.log gets very large, will its size impact Apache's performance or users' access? From my understanding, Apache doesn't read the entire log into memory, but just uses a file handle to append to it. Right? If so, I don't have to remove the logs manually every time they get large, apart from filesystem concerns. Please help and correct me if I'm wrong. Or is there any Apache log I/O issue I'm supposed to take care of when running it?
Thanks very much.
Well, I totally agree with you. To my understanding, Apache accesses the log files using file handles and just appends each new message at the end of the file. That's why a huge log file won't make a difference when it comes to writing to the file. But if you want to open the file or read it with some kind of log-monitoring tool, then the huge size will slow down the process of reading it.
So I would suggest you use log rotation for a better overall result.
This suggestion is directly from the Apache website.
Log Rotation
On even a moderately busy server, the quantity of information stored in the log files is very large. The access log file typically grows 1 MB or more per 10,000 requests. It will consequently be necessary to periodically rotate the log files by moving or deleting the existing logs. This cannot be done while the server is running, because Apache will continue writing to the old log file as long as it holds the file open. Instead, the server must be restarted after the log files are moved or deleted so that it will open new log files.
From the Apache Software Foundation site

Serving dynamic zip files through Apache

One of the responsibilities of my Rails application is to create and serve signed xmls. Any signed xml, once created, never changes. So I store every xml in the public folder and redirect the client appropriately to avoid unnecessary processing from the controller.
Now I want a new feature: every xml is associated with a date, and I'd like to implement the ability to serve a compressed file containing every xml whose date lies in a period specified by the client. Nevertheless, the period cannot be limited to less than one month for the feature to be useful, and this implies some of the zip files being served will be as big as 50 MB.
My application is deployed as a Passenger module of Apache. Thus, it's totally unacceptable to serve the file with send_data, since the client will have to wait for the entire compressed file to be generated before the actual download begins. Although I have an idea on how to implement the feature in Rails so the compressed file is produced while being served, I feel my server will run short on resources once several lengthy Ruby/Passenger processes are tied up serving big zip files.
I've read about a better solution to serve static files through Apache, but not dynamic ones.
So, what's the solution to the problem? Do I need something like a custom Apache handler? How do I inform Apache, from my application, how to handle the request, compressing the files and streaming the result simultaneously?
Check out my mod_zip module for Nginx:
http://wiki.nginx.org/NgxZip
You can have a backend script tell Nginx which URL locations to include in the archive, and Nginx will dynamically stream a ZIP file to the client containing those files. The module leverages Nginx's single-threaded proxy code and is extremely lightweight.
The module was first released in 2008 and is fairly mature at this point. From your description I think it will suit your needs.
You simply need to use whatever API is available to you to create a zip file and write it to the response, flushing the output periodically. If this will be serving large zip files, or will be requested frequently, consider running it in a separate process with a high nice/ionice value (low priority).
Worst case, you could run a command-line zip in a low priority process and pass the output along periodically.
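As a rough illustration of the "create the archive and flush it to the response as you go" pattern, sketched here with .NET's ZipArchive since the exact streaming-zip API depends on your stack; the handler name and the xmls folder are made up:

using System.IO;
using System.IO.Compression;
using System.Web;

public class ZipStreamHandler : IHttpHandler
{
    public bool IsReusable { get { return false; } }

    public void ProcessRequest(HttpContext context)
    {
        context.Response.ContentType = "application/zip";
        context.Response.AddHeader("Content-Disposition", "attachment; filename=xmls.zip");
        context.Response.BufferOutput = false; // send bytes as they are produced

        string xmlDir = context.Server.MapPath("~/public/xmls"); // example folder of signed xmls
        using (var zip = new ZipArchive(context.Response.OutputStream, ZipArchiveMode.Create, true))
        {
            foreach (string file in Directory.EnumerateFiles(xmlDir, "*.xml"))
            {
                ZipArchiveEntry entry = zip.CreateEntry(Path.GetFileName(file), CompressionLevel.Fastest);
                using (Stream entryStream = entry.Open())
                using (FileStream source = File.OpenRead(file))
                {
                    source.CopyTo(entryStream);
                }
                context.Response.Flush(); // push the finished entry to the client
            }
        }
    }
}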
It's tricky to do, but I've made a gem called zipline ( http://github.com/fringd/zipline ) that gets things working for me. I want to update it so that it can support plain file handles or paths; right now it assumes you're using CarrierWave...
Also, you probably can't stream the response with Passenger... I had to use Unicorn to make streaming work properly, and certain Rack middleware can even break it (calling response.to_s breaks it).
If anybody still needs this, bother me on the GitHub page.

Can IIS serve Most Frequently Used static files from RAM?

Can I set up an IIS server so that it will cache the most frequently used static files (binary) from disk into RAM, and serve from RAM on request?
Update: mod_mem_cache in the Apache Caching Guide seems to be what I'm looking for. Is there an equivalent in IIS?
Thanks.
Even if IIS isn't actually set up to perform caching on its own, for true static files that are only loaded from disk and sent over the wire (i.e. images, .css, .js), you'll likely end up using the in-memory file cache built into Windows itself. In Task Manager, you'll notice a "System Cache" metric in the Physical Memory section; that shows you how much space the OS is using for the cache. So, as long as you're talking true static files, adding explicit caching is unnecessary.
Edit:
For more details, here are a couple of links about the Windows cache (you could probably find more with Google):
http://msdn.microsoft.com/en-us/library/aa364218(VS.85).aspx
http://support.microsoft.com/kb/895932
Here's a bit on IIS 6.0's file cache: http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/a0483502-c6da-486a-917a-586c463b7ed6.mspx?mfr=true. As David mentioned, IIS is likely doing this for you already.
IIS 7.0 Output Caching
IIS 6.0 file cache behavior is included in IIS 7.0 output caching. You can define your own rules if the default timeout seems too short. Kernel Caching takes advantage of OS caching.
IIS should be doing this already. In .Net this is what the output caching would do for you.
Set up a RAM Disk if you have lots of RAM
http://www.tweakxp.com/article37232.aspx links to a free one. Have your application copy the relevant files to that drive and set your wwwroot to point at that location.
This data is not safe between boots, though.
Also, I run a big IIS site and serve tons of static files. The Windows file cache is fine, and I see more problems from network latency, time to first byte, etc. My disks are never the bottleneck. But a RAM disk will help if you have a known problem.
What Nate Bross said is probably the most reliable way to keep them in ram, assuming the RAM disk is dynamically created from a real disk somewhere at boot.
Additionally, you could set up an asp.net handler (*.ashx) for the files to use the cache built into ASP.Net. It would try to serve from the cache first and only load them if needed. This has the advantage of allowing you to easily expire the cache from the time to time if the file might occasionally change and allow IIS to re-claim that memory if it decides it needs it more for something else at the moment.
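A rough sketch of that handler idea (the handler name, the ~/static folder, and the one-hour sliding expiration are just illustrative choices, not IIS defaults):

using System;
using System.IO;
using System.Web;
using System.Web.Caching;

public class StaticFileHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        // e.g. /files.ashx?name=logo.png -> ~/static/logo.png (hypothetical layout)
        string name = Path.GetFileName(context.Request.QueryString["name"] ?? "");
        string physicalPath = context.Server.MapPath("~/static/" + name);

        byte[] bytes = context.Cache[physicalPath] as byte[];
        if (bytes == null)
        {
            bytes = File.ReadAllBytes(physicalPath);

            // CacheDependency drops the entry if the file changes; the sliding expiration
            // lets ASP.NET reclaim memory for files that stop being requested.
            context.Cache.Insert(physicalPath, bytes, new CacheDependency(physicalPath),
                                 Cache.NoAbsoluteExpiration, TimeSpan.FromHours(1));
        }

        context.Response.ContentType = MimeMapping.GetMimeMapping(name); // .NET 4.5+
        context.Response.BinaryWrite(bytes);
    }
}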
In ASP.NET:
Response.Cache.SetExpires(DateTime.Now.AddDays(1));      // expire the cached response a day out
Response.Cache.SetCacheability(HttpCacheability.Public); // allow IIS and downstream proxies to cache it
Response.Cache.SetValidUntilExpires(true);               // ignore client headers that would invalidate the cache
This is not to say that an ASP.NET solution is the best, but rather that IIS obeys the caching directives and may opt to cache the response in RAM.
However, if ASP/ASP.NET is not an option, I believe you can signal IIS to cache a single file from the IIS management snap-in by setting the content expiration one or more days in the future.
You may be able to set up a RAM drive, move your files there, and point an IIS virtual directory at that location.