What are the options and consistency of storing file data in NTFS?

I know it has the $DATA attribute, but I've heard that it doesn't always include the whole file content under some circumstances.
I've also heard that delayed write operations can make this attribute an inaccurate reflection of the file content.
So, what are the variations/possible structures for holding file content in NTFS, and how consistent is each of them?

File data can be stored in the $DATA attribute (the unnamed stream, which is the default one) and/or in alternate data streams (ADS, i.e. named $DATA attributes). Another option for storing additional metadata about a file is extended attributes (in NTFS these live in the $EA attribute). Each attribute is a data stream that is managed in the kernel by the Cache Manager (the Cc* API in the Windows kernel realm) and the Memory Manager (the Mm* API). The whole $DATA attribute can either be resident inside the MFT file record or externalized to disk clusters. Nevertheless, if you only use the user-mode API, you can disregard all of this: the system always gives you accurate data.
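As a small illustration of unnamed vs. named $DATA streams from user mode (not part of the original answer; the path C:\tmp\test.txt and the stream name "meta" are made up), here is a minimal Java sketch using the file:streamname syntax that NTFS accepts:

import java.io.*;

public class AdsDemo {
    public static void main(String[] args) throws IOException {
        // Unnamed (default) $DATA stream -- what you normally see as "the file".
        try (FileWriter w = new FileWriter("C:\\tmp\\test.txt")) {
            w.write("main content");
        }
        // Named $DATA stream (alternate data stream) "meta" on the same file.
        try (FileWriter w = new FileWriter("C:\\tmp\\test.txt:meta")) {
            w.write("extra metadata");
        }
        // Reading the file normally returns only the unnamed stream; the ADS
        // must be opened explicitly by its file:streamname path.
        try (BufferedReader r = new BufferedReader(new FileReader("C:\\tmp\\test.txt:meta"))) {
            System.out.println(r.readLine());
        }
    }
}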

Related

How can I ensure that ASP.NET Core's IFormFile stream doesn't read more than what's specified in the file's Content-Length?

I have an API endpoint for uploading large files, streaming them directly to the DB. I use ASP.NET Core's IFormFeature to do this, calling IFormFile.OpenReadStream() to get a Stream that I pass to SqlClient for streaming.
I want to enforce a maximum file size to avoid abuse. I know IFormFile has a Length property, but I assume that is based on Content-Length or similar and cannot be trusted (please correct me if I'm wrong, but AFAIK the only way to be 100% sure about the file size is to actually read the data; the client could send an incorrect Content-Length).
I must therefore ensure that when the stream is read, it does not read more than what is specified in IFormFile.Length (ideally it should throw if it encounters additional bytes). I have not found a way to do this. Is this possible, or is there perhaps a better way to ensure the server doesn't read enormous amounts of data from clients sending incorrect Content-Length headers?
(It should go without saying that this must not entail reading the entire file into memory.)
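The mechanism being asked about here, a wrapper stream that counts bytes and fails once the declared length is exceeded, is not ASP.NET-specific; a rough Java sketch of the idea (all names hypothetical):

import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: pass reads through, but fail as soon as more bytes than the
// declared length have been consumed, so a wrong Content-Length cannot make the
// server read unbounded data or buffer the whole upload.
class LengthLimitingStream extends FilterInputStream {
    private final long maxLength;
    private long consumed;

    LengthLimitingStream(InputStream in, long maxLength) {
        super(in);
        this.maxLength = maxLength;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = super.read(b, off, len);
        if (n > 0 && (consumed += n) > maxLength) {
            throw new IOException("Upload exceeds declared length of " + maxLength + " bytes");
        }
        return n;
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c >= 0 && ++consumed > maxLength) {
            throw new IOException("Upload exceeds declared length of " + maxLength + " bytes");
        }
        return c;
    }
}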

CQRS and DDD: File uploads

I'm new to the concepts of DDD and CQRS and can't find a definitive solution for how to upload images, or files in general, in a clean way.
Imagine the following scenario:
In an online portal there is a support request form to which a file (an image, specifically) can be attached.
The posted data will raise a CreateSupportRequestCommand. Then the required aggregates will be loaded and changed.
I have three ideas to solve this, but I'm not very satisfied with them.
Way 1:
1. Post all data including the image (multipart) in a single request
2. Create a FileUploadCommand, which returns the FileUploadId.
3. After that create a CreateSupportRequestCommand and pass the FileUploadId with the root data in the constructor.
Drawback: A single request will trigger two commands. In terms of CQRS one user interaction should be only one command.
Way 2:
1. Post the image to a separate endpoint, create a temporary file and return the id or a file handle.
2. Post the form with the attached tempfile id.
3. Invoke the CreateSupportRequestCommand with all root data including a file handle which points to the physical file.
4. Inside the command, persist the tempfile into a FileUpload aggregate (via FileUploadRepository), then
5. Create the SupportRequest aggregate, assign the FileUploadId and persist.
Drawback: I handle 2 aggregates in the same command. Creating a support request is not responsible for uploading the file.
Way 3:
1. Post the image to a separate endpoint, create a temporary file and return the id or a file handle.
2. Post the form with the attached tempfile id.
3. Invoke the CreateSupportRequestCommand with all root data including a file handle which points to the physical file.
4. Only persist the root data to the SupportRequest aggregate. Raise a SupportRequestCreatedEvent and attach the file handle.
5. Inside the event handler, process and assign the file handle.
Drawback: The SupportRequestCreatedEvent should not really care about a file handle.
Is there a better way to solve this?
I do not think handling file upload is a domain concern. File metadata like FileContentId may be part of your domain, but not the actual file upload. I would perform the file operation before the CommandHandler is executed, probably in a middleware or perhaps before queuing up the Command onto the message bus.
CreateSupportRequestCommandHandler would then only invoke an operation like CreateSupportRequest on your aggregate (say, SupportRequest). Within that CreateSupportRequest method you would have all your business rules pertaining to the operation. The SupportRequest would then eventually be saved in your repository.
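A rough sketch of that ordering (all types here, FileStore, CommandBus and CreateSupportRequestCommand, are hypothetical, shown only to illustrate that the file bytes never enter the domain):

// Hypothetical infrastructure interfaces, for illustration only.
interface FileStore { String save(byte[] bytes); }
interface CommandBus { void send(Object command); }

// The command carries only a reference to the already-stored file.
record CreateSupportRequestCommand(String title, String description, String fileContentId) {}

class SupportRequestEndpoint {
    private final FileStore fileStore;
    private final CommandBus commandBus;

    SupportRequestEndpoint(FileStore fileStore, CommandBus commandBus) {
        this.fileStore = fileStore;
        this.commandBus = commandBus;
    }

    void handlePost(byte[] imageBytes, String title, String description) {
        // 1. The upload is handled before the command is dispatched (infrastructure concern).
        String fileContentId = fileStore.save(imageBytes);
        // 2. One user interaction -> one command; the aggregate only ever sees the id.
        commandBus.send(new CreateSupportRequestCommand(title, description, fileContentId));
    }
}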

Implementing basic S3 compatible API with akka-http

I'm trying to implement a file storage service with a basic S3-compatible API using akka-http.
I use the S3 Java SDK to test my service API and ran into a problem with the putObject(...) method: I can't consume the file properly on my akka-http backend. I wrote a simple route for test purposes:
def putFile(bucket: String, file: String) = put {
  extractRequestEntity { ent =>
    val finishedWriting = ent.dataBytes.runWith(FileIO.toPath(new File(s"/tmp/${file}").toPath))
    onComplete(finishedWriting) { ioResult =>
      complete("Finished writing data: " + ioResult)
    }
  }
}
It saves the file, but the file is always corrupted. Looking inside the file I found lines like this:
"20000;chunk-signature=73c6b865ab5899b5b7596b8c11113a8df439489da42ddb5b8d0c861a0472f8a1".
When I try to PUT the file with any other REST client it works as expected.
I know S3 uses the "Expect: 100-continue" header, and maybe that is what causes the problems.
I really can't figure out how to deal with that. Any help appreciated.
This isn't exactly corrupted. Your service is not accounting for one of the four¹ ways S3 supports uploads to be sent on the wire, using Content-Encoding: aws-chunked and x-amz-content-sha256: STREAMING-AWS4-HMAC-SHA256-PAYLOAD.
It's a non-standards-based mechanism for streaming an object, and includes chunks that look exactly like this:
string(IntHexBase(chunk-size)) + ";chunk-signature=" + signature + \r\n + chunk-data + \r\n
...where IntHexBase() is pseudocode for a function that formats an integer as a hexadecimal number as a string.
This chunk-based algorithm is similar to, but not compatible with, Transfer-Encoding: chunked, because it embeds checksums in the stream.
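To make that framing concrete, here is a minimal decoder sketch in plain Java (not tied to akka-http; it only strips the chunk metadata, buffers the result, and does not verify the per-chunk signatures, which a real server-side implementation should do):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Minimal aws-chunked reader: each chunk is
//   hex-size;chunk-signature=<signature>\r\n  <chunk bytes>  \r\n
// and the stream ends with a zero-size chunk. Signatures are NOT verified here.
class AwsChunkedDecoder {
    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        while (true) {
            String header = readLine(in);                    // e.g. "20000;chunk-signature=..."
            int size = Integer.parseInt(header.split(";")[0], 16);
            if (size == 0) break;                            // final (empty) chunk
            payload.write(in.readNBytes(size));              // raw chunk data
            readLine(in);                                    // trailing CRLF after chunk data
        }
        return payload.toByteArray();
    }

    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') sb.append((char) c);
        }
        return sb.toString();
    }
}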
Why did they make up a new HTTP transfer encoding? It's potentially useful on the client side because it eliminates the need to either "read your payload twice or buffer [the entire object payload] in memory [concurrently]" -- one or the other of which is otherwise necessary if you are going to calculate the x-amz-content-sha256 hash before the upload begins, as you otherwise must, since it's required for integrity checking.
I am not overly familiar with the internals of the Java SDK, but this type of upload might be triggered by using .withInputStream(), or it might be standard behavior for files too, or for files over a certain size.
Your minimum workaround would be to throw an HTTP error if you see x-amz-content-sha256: STREAMING-AWS4-HMAC-SHA256-PAYLOAD in the request headers since you appear not to have implemented this in your API, but this would most likely only serve to prevent storing objects uploaded by this method. The fact that this isn't already what happens automatically suggests that you haven't implemented x-amz-content-sha256 handling at all, so you are not doing the server-side payload integrity checks that you need to be doing.
For full compatibility, you'll need to implement the algorithm supported by S3 and assumed to be available by the SDKs, unless the SDKs specifically support a mechanism for disabling this algorithm -- which seems unlikely, since it serves a useful purpose, particularly (it appears) for streams whose length is known but that aren't seekable.
¹ one of four -- the other three are a standard PUT, a web-based html form POST, and the multipart API that is recommended for large files and mandatory for files larger than 5 GB.

How to register a Property Handler on folders?

I built a virtual filesystem (not a namespace extension) for Windows which acts as a frontend to our document management server, consisting of files and folders. In order to be able to display some metadata of the DMS objects in Windows Explorer as additional selectable columns, I successfully provided properties to the Windows Property System by implementing a COM Property Handler. Whereas normal property handlers focus on specific file types for which they feel responsible, my Property Handler adds properties to all files regardless of their type. Because Property Handlers can only be registered at the file type level, I registered my handler for about 30 types under
HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\PropertySystem\PropertyHandlers\<.Extension>
However, I did not manage to register the Property Handler for folder objects. Since all objects in our file system are virtual, I build the property store (IPropertyStore) by implementing IInitializeWithFile instead of IInitializeWithStream. The properties are requested from our DMS with the path from IInitializeWithFile acting as the key, and are not read from an object's content. This concept would work for folders as well.
For getting called on folders I tried to associate the handler by registering under different well known identifiers like Folder, Directory, AllFileSystemObjects and * instead of the file extension without success.
I also didn’t find anything in the MSDN documentation regarding this aspect.
Is there a way to register a Windows Property Handler on folders? Or is there some other way to add custom columns to folders in Windows Explorer?
I'm not sure if it is possible to do this.
Property handlers are clearly not the right approach: they are system-wide and there can only be one per file extension. They should only be implemented by the software that "owns" the file extension and can parse the file to extract properties.
The old column handlers would have been your best bet (IMHO) but they are officially dead and you already said you can't use them.
Have you considered creating a namespace extension? Either as a root item somewhere (Desktop or My Computer) the way My Documents used to work in 2000/XP or maybe something more along the lines of how OneDrive works?
I'm not sure if desktop.ini files work in the root of a drive but it might be worth looking into. You would then find yourself in the poorly documented land of [.ShellClassInfo] and its CLSID, CLSID2 and UICLSID members. The general idea would be to act as an IShellFolder proxy on top of the "real" IShellFolder so you could create a multiplexed property store. I think there are some (undocumented?) property keys you can override to change a folder's default columns and tooltips as well.
There is also something called a delegated folder that allows you to play with nested PIDLs but the documentation is once again pretty useless so I'm not sure if this is something worth looking into.
A 3rd option is to pretend to be a cloud storage provider. I don't know if this gets you any closer to your goal and you would still have to implement some NSE bits to get to the point where you can layer yourself on top of the underlying IShellFolder. This feature is rather new and only documented to work on Windows 10.
The inner workings of how Explorer/IShellBrowser is connected to the IShellFolder/IShellView is one of the least documented parts of Windows. There are hundreds of undocumented interfaces. Explorer gives DefView special treatment, leaving other third-party implementations out in the cold.
My feeling is that there is no clean way to implement this on top of a drive letter, but you might get lucky; if Raymond Chen drops by he might have some tips for you...

Uploading a file via the JAX-RS client interface, with a third-party server

I need to invoke a remote REST interface handler and submit a file to it in the request body. Please note that I don't control the server. I cannot change the request to be multipart; the client has to work in accordance with an external specification.
So far I managed to make it work like this (omitting headers etc. for brevity):
byte[] data = readFileCompletely ();
client.target (url).request ().post (Entity.entity (data, "file/mimetype"));
This works, but will fail with huge files that don't fit into memory. And since I have no restriction on filesize, this is a concern.
Question: is it somehow possible to use streams or something similar to avoid reading the whole file into memory?
If possible, I'd prefer to avoid implementation-specific extensions. If not, a solution that works with RESTEasy (on Wildfly) is also acceptable.
RESTEasy as well as Jersey support InputStream out of the box, so simply use Entity.entity(inputStream, "application/octet-stream"); or whatever Content-Type header you want to set.
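For example, a minimal sketch using the standard JAX-RS 2 client API (the URL and file path are placeholders; depending on the provider you may also need to enable chunked transfer encoding so the content length does not have to be known up front):

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.ws.rs.client.Client;
import javax.ws.rs.client.ClientBuilder;
import javax.ws.rs.client.Entity;
import javax.ws.rs.core.Response;

public class StreamingUpload {
    public static void main(String[] args) throws Exception {
        Client client = ClientBuilder.newClient();
        // The stream is read and sent on demand instead of being buffered as a byte[].
        try (InputStream in = Files.newInputStream(Paths.get("/path/to/huge.bin"))) {
            Response response = client.target("http://example.com/upload")
                    .request()
                    .post(Entity.entity(in, "application/octet-stream"));
            System.out.println(response.getStatus());
        } finally {
            client.close();
        }
    }
}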
You can go low-level and construct the HTTP request yourself using plain java.net.URLConnection.
I have not tried it myself but there is example code which reads a local file and writes it to the request stream without loading it into a byte array.
Upload files from Java client to a HTTP server
Of course this solution requires more manual coding, but it should work (unless java.net.URLConnection loads the whole file into memory).
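A sketch of that low-level approach (the URL and file path are placeholders): setFixedLengthStreamingMode tells HttpURLConnection the body length up front, so it streams the request body instead of buffering it internally.

import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RawUpload {
    public static void main(String[] args) throws Exception {
        Path file = Paths.get("/path/to/huge.bin");
        HttpURLConnection conn =
                (HttpURLConnection) new URL("http://example.com/upload").openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/octet-stream");
        // Declare the body length so the connection streams it rather than buffering it.
        conn.setFixedLengthStreamingMode(Files.size(file));
        try (InputStream in = Files.newInputStream(file);
             OutputStream out = conn.getOutputStream()) {
            in.transferTo(out);   // copies the file to the request body in small chunks
        }
        System.out.println(conn.getResponseCode());
    }
}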