Where is the FileSystem logic located? - file-io

This might be a silly question, but where is the filesystem logic code located? This question came up while I was studying the implementation of filesystems in Linux:
I understand that different filesystems (ext4, xfs, etc) will create different structures in a storage disk. But how does the system know how to use those structures (for example, when I read from a file, how does it know the logic of: first check the inode of the file, then check the blocks associated with it...). Since different filesystems will have different structures, and so different file access logic, I'm guessing that there is some code somewhere which tells the OS how to read a file in a particular FileSystem. Where is that code?
My first guess would be that the logic code for each filesystem comes directly in the Linux Kernel. But this would mean that the Kernel only knows about the filesystems that the kernel developers put there (I guess) like ext4, xfs, FAT. What if someone wants to create their own filesystem? Do they have to update the kernel to include it?

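Yes, the logic for each filesystem lives in the kernel, which ships with drivers for many filesystems. These drivers can also be built as loadable kernel modules, so adding a new filesystem does not require rebuilding the whole kernel. There is also FUSE (Filesystem in Userspace), which allows filesystems to be implemented in userland.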
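To illustrate the idea of one generic entry point dispatching to per-filesystem logic, here is a toy sketch. It is not the Linux VFS: the names `FilesystemDriver`, `register_filesystem`, and `read_file` are hypothetical, and each "driver" keeps its data in plain dictionaries rather than on a disk.

```python
from abc import ABC, abstractmethod


class FilesystemDriver(ABC):
    """Hypothetical stand-in for a filesystem driver (kernel module or FUSE daemon)."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        """Return file contents using this filesystem's own layout logic."""


class ToyInodeFS(FilesystemDriver):
    """Toy layout: path -> inode number -> list of block numbers -> block data."""

    def __init__(self) -> None:
        self.dentries = {"/hello.txt": 7}
        self.inodes = {7: [0, 1]}
        self.blocks = {0: b"hello, ", 1: b"world"}

    def read(self, path: str) -> bytes:
        inode = self.dentries[path]         # 1. look up the inode for the path
        block_numbers = self.inodes[inode]  # 2. find the blocks the inode points to
        return b"".join(self.blocks[n] for n in block_numbers)  # 3. read them in order


class ToyFlatFS(FilesystemDriver):
    """A different toy layout: contents stored directly under the file name."""

    def __init__(self) -> None:
        self.files = {"/hello.txt": b"hello from flatfs"}

    def read(self, path: str) -> bytes:
        return self.files[path]


# Registry of known drivers, keyed by filesystem type name.
DRIVERS = {}


def register_filesystem(name: str, driver: FilesystemDriver) -> None:
    """Registering a driver extends what the 'system' understands; callers do not change."""
    DRIVERS[name] = driver


def read_file(fs_type: str, path: str) -> bytes:
    """Generic entry point: dispatch to whichever driver handles this filesystem type."""
    return DRIVERS[fs_type].read(path)


register_filesystem("toyinode", ToyInodeFS())
register_filesystem("toyflat", ToyFlatFS())
```

The caller of `read_file` never needs to know about inodes or blocks; only the driver registered for that filesystem type does.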

Related

Is Dropbox considered a Distributed File System?

I was just reading this https://en.wikipedia.org/wiki/Clustered_file_system#Distributed_file_systems
The definition of a DFS seems to describe Dropbox exactly, but Dropbox isn't in the list of examples, which I assume it would be if it were one.
So what is different about Dropbox which makes it not fall into this category?
Usually, when talking about distributed file systems, you expect properties that Dropbox doesn't support. For example, if you and I share a folder, I can create a file called "work.txt" in it and you can create a file called "work.txt" in it, and if we do it fast enough (or while we're not syncing with Dropbox) we'll end up with conflicting copies of the same file.
A similar example would be if we both edit the same file concurrently - we'll have conflicting copies, which is something a distributed file system should prevent. In the link you refer to, this is called "Concurrency transparency; all clients have the same view of the state of the file system".
Another example of a property Dropbox doesn't support: if my computer fails (e.g., my hard drive is corrupted), I might lose data that wasn't yet uploaded to Dropbox. There is a small window in which I think my data is safe because it was written to the local disk, but if my computer fails during that window, I lose that data.
Lastly, I'm not sure how Dropbox operates with file locks. For example, MS Office takes locks on .doc files to ensure no one else is working on them at the same time. I don't think Dropbox supports this feature.
I've written a blog post about some of the complexities of implementing a distributed file system; you might find it helpful as well.

What kernel level operations are performed when editing a file?

Can anybody please explain to me what kernel-level operations are performed when a file is edited? The thing I'm confused about is whether a new inode is created every time a file is edited. Please explain the steps, if possible. I have searched the internet but found no satisfactory answers.
Thanks in advance.
There's no single general answer, because this depends on what the application does when it's editing the file, what system it's running on, and what the file is stored on. It might be creating new temporary files, or clobbering and rewriting the original file, or using memory mapping, or using versioned filesystem features, or doing network file system operations, and so on.
Instead of trying to answer this in the abstract, pick an open source editor you're interested in, and read through its source code and debug it to understand what it in particular is doing. Then if you have questions, you can read the API docs to figure out what kernel operations the functions it's calling map to or rely on.
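As a small experiment on the inode question, the sketch below compares two of the strategies mentioned above: rewriting the original file in place, and writing a new temporary file that is then renamed over the original. The function names are made up for this illustration, and the behaviour described assumes a typical POSIX filesystem.

```python
import os
import tempfile


def edit_in_place(path: str, data: bytes) -> None:
    """Truncate and rewrite the existing file; the directory entry is untouched."""
    with open(path, "wb") as f:
        f.write(data)


def edit_via_rename(path: str, data: bytes) -> None:
    """Write a new file next to the original, then rename it over the original."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, path)  # the path now refers to the new file
    except BaseException:
        os.unlink(tmp_path)         # clean up the temp file if anything failed
        raise


def demo_inodes() -> tuple:
    """Return (inode unchanged after in-place edit, inode unchanged after rename)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "note.txt")
        with open(path, "wb") as f:
            f.write(b"v1")
        original = os.stat(path).st_ino

        edit_in_place(path, b"v2")
        same_after_in_place = os.stat(path).st_ino == original

        edit_via_rename(path, b"v3")
        same_after_rename = os.stat(path).st_ino == original
        return same_after_in_place, same_after_rename
```

With the in-place strategy the inode stays the same; with the rename strategy the path ends up pointing at the temporary file's inode, so whether "a new inode is created" depends on which strategy the editor uses.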

Objective-C - Finding directory size without iterating contents

I need to find the size of a directory (and its sub-directories). I can do this by iterating through the directory tree and summing up the file sizes etc. There are many examples on the internet but it's a somewhat tedious and slow process, particularly when looking at exceptionally large directory structures.
I notice that Apple's Finder application can instantly display a directory size for any given directory. This implies that the operating system is maintaining this information in real time. However, I've been unable to determine how to access this information. Does anyone know where this information is stored and if it can be retrieved by an Objective-C application?
IIRC Finder iterates too. In the old days, it used to use FSGetCatalogInfo (an old File Manager call) to do this quickly. I think there's a newer POSIX call for that these days that's the fastest, lowest-level API for this, especially if you're not interested in all the other info besides the size and really need blazing speed over easily maintainable code.
That said, if it is cached somewhere in a publicly accessible place, it is probably Spotlight. Have you checked whether the spotlight info for a folder includes its size?
PS - One important thing to remember when determining the size of a file: Mac files can have two "forks", the data fork, and the resource fork (where e.g. Finder keeps the info if you override a particular file to open with another application than the default for its file type, and custom icons assigned to files). So make sure you add up both forks' sizes, or your measurements will be off.
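For reference, the iterate-and-sum approach mentioned in the question can be sketched as follows. This is in Python rather than Objective-C, it is not how Finder does it, and it only counts the logical size of each regular file (no resource forks, no on-disk allocation size); `directory_size` and `demo_directory_size` are names chosen for this example.

```python
import os
import tempfile


def directory_size(root: str) -> int:
    """Sum the sizes of all regular files under root, without following symlinks."""
    total = 0
    stack = [root]  # explicit stack avoids deep recursion on large trees
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    total += entry.stat(follow_symlinks=False).st_size
    return total


def demo_directory_size() -> int:
    """Build a tiny tree (3 bytes + 5 bytes in a subdirectory) and measure it."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "sub"))
        with open(os.path.join(root, "a.bin"), "wb") as f:
            f.write(b"abc")
        with open(os.path.join(root, "sub", "b.bin"), "wb") as f:
            f.write(b"defgh")
        return directory_size(root)
```

Using `os.scandir` keeps the number of extra stat calls down on most platforms, but the cost still grows with the number of entries in the tree.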

Software configuration management tool for hundreds of binary files, many are large

Note: I've tried searching, but Stack Overflow's search is near useless. I am not sure what kind of tool I need.
At my organization we need to keep track of the software configuration for many types of computers, including the binary installers and automation scripts. Changes are infrequent, but the latest version of the configuration is several gigabytes in size.
We are trying to use Mercurial to store changes but it is just too slow, even without many revisions at all. I did an hg status but killed it after it took 10 minutes without finishing.
We are looking for a way to store the current configuration while keeping the old configurations around just in case. I have never done anything like this before and do not know what tools are available or even suitable for such tasks. Can someone point me in the right direction or tell me how they are solving this problem? Thanks
Since hard disk space is cheap and being able to view binary differences isn't very helpful, perhaps the best option you have is to store each configuration in a new directory that is indexed somehow. Example below:
/software/configs/2009-03-15
/software/configs/2009-09-28
/software/configs/2009-09-30
Given the size of your files and how infrequently they change, this would allow you to pick a configuration by a given 'tag' without the overhead of revision control.
If you pack each configuration into a single tar file and record a SHA-512 hash of it, you can later recompute the hash and be reasonably sure that no one has tampered with the files since they were archived (provided the recorded hash itself is kept somewhere safe).
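A minimal sketch of the tar-plus-hash idea, using only the Python standard library. The directory and file names are placeholders, and `archive_config`, `sha512_of_file`, and `demo_archive` are names invented for this example.

```python
import hashlib
import os
import tarfile
import tempfile


def sha512_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte archives do not need to fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_config(config_dir: str, archive_path: str) -> str:
    """Pack config_dir into an uncompressed tar file and return the archive's SHA-512."""
    with tarfile.open(archive_path, "w") as tar:
        tar.add(config_dir, arcname=os.path.basename(config_dir))
    return sha512_of_file(archive_path)


def demo_archive() -> tuple:
    """Return (hash matches for untouched archive, change detected after modification)."""
    with tempfile.TemporaryDirectory() as work:
        config_dir = os.path.join(work, "2009-09-30")
        os.mkdir(config_dir)
        with open(os.path.join(config_dir, "installer.bin"), "wb") as f:
            f.write(b"\x00" * 1024)

        archive = os.path.join(work, "2009-09-30.tar")
        recorded = archive_config(config_dir, archive)

        unchanged = sha512_of_file(archive) == recorded
        with open(archive, "ab") as f:  # simulate tampering
            f.write(b"tampered")
        detected = sha512_of_file(archive) != recorded
        return unchanged, detected
```

The recorded digest would need to be stored separately from the archive (for example, in a small text file under version control) for the check to mean anything.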
While I don't know the specific details of how to implement this strategy in Mercurial, I have been working with git and git-fat. It sets up a general procedure that is likely to be feasible in Mercurial as well. Basically, the idea is that whenever you add a binary file to the repository, under the hood the repo creates a symlink to the file, which is actually stored in another location as a checksummed object.
This allows large files to be tracked by the repo without storing the actual data inside it. It requires the data to be stored in some other location (perhaps in a binary management system).
It might take some configuration to do this in Mercurial, but I think it's an elegantly simple solution.
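To make the "checksummed object stored elsewhere" idea concrete, here is a toy sketch. It is not git-fat's actual implementation or file format: it leaves a small placeholder file rather than a symlink, and `PLACEHOLDER_PREFIX`, `stash`, and `restore` are hypothetical names.

```python
import hashlib
import os
import shutil
import tempfile

PLACEHOLDER_PREFIX = b"large-file-placeholder "  # hypothetical marker format


def stash(path: str, store_dir: str) -> str:
    """Copy the file into store_dir under its SHA-256, then replace it with a placeholder."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    name = digest.hexdigest()
    os.makedirs(store_dir, exist_ok=True)
    stored = os.path.join(store_dir, name)
    if not os.path.exists(stored):  # identical content is stored only once
        shutil.copyfile(path, stored)
    with open(path, "wb") as f:     # this small file is what the repo would track
        f.write(PLACEHOLDER_PREFIX + name.encode("ascii") + b"\n")
    return name


def restore(path: str, store_dir: str) -> None:
    """Replace a placeholder with the real content looked up by its checksum."""
    with open(path, "rb") as f:
        line = f.readline().rstrip(b"\n")
    if not line.startswith(PLACEHOLDER_PREFIX):
        raise ValueError(f"{path} is not a placeholder")
    name = line[len(PLACEHOLDER_PREFIX):].decode("ascii")
    shutil.copyfile(os.path.join(store_dir, name), path)


def demo_stash() -> tuple:
    """Return (placeholder is small, restored content equals the original)."""
    with tempfile.TemporaryDirectory() as work:
        path = os.path.join(work, "installer.bin")
        original = os.urandom(4096)
        with open(path, "wb") as f:
            f.write(original)
        store = os.path.join(work, "store")
        stash(path, store)
        small = os.path.getsize(path) < 200
        restore(path, store)
        with open(path, "rb") as f:
            return small, f.read() == original
```

The repository then only ever sees the short placeholder, while the store directory can live on a file server or other storage suited to large binaries.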

How to run unmanaged executable from memory rather than disc

I want to embed a command-line utility in my C# application, so that I can grab its bytes as an array and run the executable without ever saving it to disk as a separate file (this avoids storing the executable as a separate file and avoids needing the ability to write temporary files anywhere).
I cannot find a method to run an executable from just its byte stream. Does Windows require it to be on a disk, or is there a way to run it from memory?
If Windows requires it to be on disk, is there an easy way in the .NET Framework to create a virtual drive/file of some kind and map the file to the executable's memory stream?
You are asking for a very low-level, platform-specific feature to be implemented in a high-level, managed environment. Anything's possible...but nobody said it would be easy...
(BTW, I don't know why you think temp file management is onerous. The BCL does it for you: http://msdn.microsoft.com/en-us/library/system.io.path.gettempfilename.aspx )
1. Allocate enough memory to hold the executable. It can't reside on the managed heap, of course, so like almost everything in this exercise you'll need to PInvoke. (I recommend C++/CLI, actually, so as not to drive yourself too crazy.) Pay special attention to the attribute bits you apply to the allocated memory pages: get them wrong and you'll either open a gaping security hole or have your process shut down by DEP (i.e., you'll crash). See http://msdn.microsoft.com/en-us/library/aa366553(VS.85).aspx
2. Locate the executable in your assembly's resource library and acquire a pinned handle to it.
3. Memcpy() the code from the pinned region of the managed heap to the native block.
4. Free the GCHandle.
5. Call VirtualProtect to prevent further writes to the executable memory block.
6. Calculate the address of the executable's Main function within your process' virtual address space, based on the base address you got from VirtualAlloc and the offset within the file as shown by DUMPBIN or similar tools.
7. Place the desired command-line arguments on the stack (Windows Stdcall convention). Any pointers must point to native or pinned regions, of course.
8. Jump to the calculated address. Probably easiest to use _call (inline assembly language).
9. Pray to God that the executable image doesn't have any absolute jumps in it that would've been fixed up by calling LoadLibrary the normal way. (Unless, of course, you feel like re-implementing the brains of LoadLibrary during step #3.)
10. Retrieve the return value from the EAX register.
11. Call VirtualFree.
Steps #5 and #11 should be done in a finally block and/or use the IDisposable pattern.
The other main option would be to create a RAM drive, write the executable there, run it, and clean up. That might be a little safer since you aren't trying to write self-modifying code (which is tough in any case, but especially so when the code isn't even yours). But I'm fairly certain it will require even more platform API calls than the dynamic code injection option -- all of them requiring C++ or PInvoke, naturally.
Take a look at the "In Memory" section of this paper. Realize that it's from a remote DLL injection perspective, but the concept should be the same.
Remote Library Injection
Creating a RAMdisk or dumping the code into memory and then executing it are both possible, but extremely complicated solutions (possibly more so in managed code).
Does it need to be an executable? If you package it as an assembly, you can use Assembly.Load() from a memory stream - a couple of trivial lines of code.
Or if it really has to be an executable, what's actually wrong with writing a temp file? It'll take a few lines of code to dump it to a temp file, execute it, wait for it to exit, and then delete the temp file - it may not even get out of the disk cache before you've deleted it! Sometimes the simple, obvious solution is the best solution.
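The temp-file route really is only a few lines. The sketch below shows the shape of it in Python rather than C#; as a stand-in for the embedded native utility, the payload here is a tiny Python script run through the current interpreter, and `run_from_bytes` and `EMBEDDED` are names invented for this example.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for the embedded utility's bytes (an assumption made for this sketch).
EMBEDDED = b"import sys\nprint('args:', ' '.join(sys.argv[1:]))\n"


def run_from_bytes(payload: bytes, args: list) -> subprocess.CompletedProcess:
    """Dump payload to a temp file, run it, wait for it to exit, then delete the file."""
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        # The file is closed before launching, which matters on Windows.
        return subprocess.run(
            [sys.executable, path, *args],
            capture_output=True,
            text=True,
            check=False,
        )
    finally:
        os.unlink(path)  # always remove the temp file, even if launching failed
```

For a real native executable, the same pattern would apply with the temp file launched directly instead of through an interpreter; the `finally` block is what guarantees the cleanup.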
This is explicitly not allowed in Vista+. You can use some undocumented Win32 API calls in XP to do this but it was broken in Vista+ because it was a massive security hole and the only people using it were malware writers.