I have a question about Inter-process-communication in operating systems.
Can two processes communicate with each other by both opening the same file (which, say, was created before either process started, so both have a file handle) and then writing into this file?
If yes, what does this method come under? I have heard that the two major ways of IPC are shared memory and message passing. Which of these does this method come under?
The reason I am not sure it comes under shared memory is that this file is not mapped into the address space of either process. And, from my understanding, in shared memory the shared region is part of the address space of both processes.
Assume that the processes write into the file in some pre-agreed protocol/format, so both have no problem knowing where and when the other process writes. This assumption is merely to aid understanding. In the real world, though, it may be too stringent to hold.
If no, what is wrong with this scenario? Is it that if two different processes open the same file, the changes made by the first process are not flushed to persistent storage for others to view until the process terminates? Or something else?
Any real world example from Windows and Linux should also be useful.
Thanks,
Using a file is a kind of shared memory. Instead of allocating a common memory buffer in RAM, a common file is used.
To successfully manage the communication some kind of locking mechanism for different ranges in the file is needed. This could either be locking of ranges provided by the file system (available at least on Windows) or global operating system mutexes.
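As a rough illustration of the file-plus-locking idea, here is a minimal Python sketch for POSIX systems. The length-prefixed message format and the function names write_message and read_message are assumptions made up for this example. fcntl.lockf is the standard-library wrapper for POSIX advisory locks; on Windows, msvcrt.locking plays a similar role.

```python
import fcntl
import os
import struct

# Hypothetical pre-agreed protocol: 4-byte little-endian length, then payload.
HEADER = struct.Struct("<I")


def write_message(path, payload: bytes) -> None:
    """Replace the file contents with one message while holding an exclusive lock."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+b") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)  # blocks until no other process holds a lock
        try:
            f.truncate(0)
            f.seek(0)
            f.write(HEADER.pack(len(payload)) + payload)
            f.flush()                  # push Python's buffer to the OS
            os.fsync(f.fileno())       # ask the OS to push it to the device
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)


def read_message(path) -> bytes:
    """Read the current message while holding a shared lock."""
    with open(path, "rb") as f:
        fcntl.lockf(f, fcntl.LOCK_SH)
        try:
            header = f.read(HEADER.size)
            if len(header) < HEADER.size:
                return b""             # nothing has been written yet
            (length,) = HEADER.unpack(header)
            return f.read(length)
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

One process would call write_message and the other read_message on the same path. The locks are advisory, so this only works if both sides cooperate and take them.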
One real-world scenario where disk storage is used for inter-process communication is the quorum disk used in clusters. It is a common disk resource, accessible over a SAN by all cluster nodes, that stores the cluster's configuration.
The POSIX system call mmap maps files into virtual memory. If the mapping is shared between two processes, writes to that area in one process will affect the other processes. Now, coming to your question: yes, a process reading from or writing to the underlying file will not always see the same data as the process that has mapped it, since the segment of the file is copied into RAM and periodically flushed to disk. I believe you can force synchronization with the msync system call, though. Do read up on mmap(); it has a host of other memory-sharing options.
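A small Python sketch of a shared file mapping, for illustration only. Two mappings of the same file inside one process stand in for the two processes here, and the function name demo_shared_mapping is made up. Python's mmap.flush() is the call that corresponds to msync on POSIX systems.

```python
import mmap


def demo_shared_mapping(path, size=4096):
    """Write through one mapping, read through another and through the file."""
    # Create a file of the right size to back the mappings.
    with open(path, "wb") as f:
        f.write(b"\x00" * size)

    with open(path, "r+b") as fa, open(path, "r+b") as fb:
        # ACCESS_WRITE gives a write-through (shared) mapping of the file.
        map_a = mmap.mmap(fa.fileno(), size, access=mmap.ACCESS_WRITE)
        map_b = mmap.mmap(fb.fileno(), size, access=mmap.ACCESS_WRITE)
        try:
            map_a[0:5] = b"hello"             # write via the first mapping
            seen_via_mapping = bytes(map_b[0:5])
            map_a.flush()                     # msync: force changes to the file
            with open(path, "rb") as f:
                seen_via_file = f.read(5)     # ordinary read() of the file
            return seen_via_mapping, seen_via_file
        finally:
            map_a.close()
            map_b.close()
```

Both returned values should be b"hello" once the flush has happened. How soon an unflushed write becomes visible to plain file reads depends on the operating system.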
I have a system that runs Windows via a USB stick (it's a proprietary machine). This type of machine is commonly powered off by 'pulling the plug'. There is no way around it; that is how it is operated.
We occasionally have drive corruption on the USB stick, or at least corruption in the directory that we write things into. Is there really any software solution to get around this problem other than 'write as little/infrequently as possible'?
It's a Windows machine, and the applications that write are typically written in Java/C#, if that is useful to anyone. The corruption typically shows up as a write directory, or the parent of a write directory, that can no longer be accessed. The only way to deal with it is to delete it via the command line and start over.
Is there any way to programmatically deal with such a scenario, to perhaps restore a previous state of the memory as opposed to deleting and starting anew?
I don't feel as though there is any way to prevent this type of thing from happening given our current design. If you do enough writes and keep pulling the plug, you are eventually going to get a corruption; that is simply a fact, especially in this design. Even if the backup batteries are charged, if the software doesn't shut down gracefully within the battery's discharge time, corruption could still occur. Not to mention that, as gravitymixes said above, it's going to damage hardware eventually, which we have seen before.
A system redesign needs to be considered for this project as a whole. Some type of networked solution comes to mind immediately: data is sent off the volatile machine to be logged on a machine with a more reliable power source over a reliable network connection, with writing to the disk on the actual volatile machine as a last-ditch effort if network comms are not reliable at a given point in time (backfill). I feel like this would increase hardware life as well. Of course, network reliability then becomes your problem.
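The network-first, disk-as-fallback idea could be sketched roughly as follows. This is only an illustration in Python (the applications in question are Java/C#). The names log_record and backfill, the JSON-lines spool format, and the send callable are all assumptions: send stands for whatever network call is used and is expected to raise OSError on failure.

```python
import json
import os


def log_record(record, send, spool_path):
    """Try the network first; fall back to an fsync'ed local append."""
    line = (json.dumps(record) + "\n").encode("utf-8")
    try:
        send(line)
        return "sent"
    except OSError:
        # Last-ditch effort: append locally and force it out of the caches.
        with open(spool_path, "ab") as f:
            f.write(line)
            f.flush()
            os.fsync(f.fileno())
        return "spooled"


def backfill(send, spool_path):
    """Resend spooled records once the network is back, then clear the spool."""
    if not os.path.exists(spool_path):
        return 0
    with open(spool_path, "rb") as f:
        lines = f.readlines()
    for line in lines:
        send(line)  # an OSError here leaves the spool file untouched
    os.remove(spool_path)
    return len(lines)
```

Local writes then only happen while the network is down, which keeps the number of writes to the USB stick low without losing records.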
This is a sentence in the PowerPoint of my systems lecture, but I don't understand why a context switch invalidates the MMU. I know it will invalidate the cache, since the cache contains information from another process. However, as for the MMU, it just maps virtual memory to physical memory. If a context switch invalidates it, does this mean the MMU uses a different mapping mechanism in different processes?
Does this mean the MMU uses a different mapping mechanism in different processes?
Your conclusion is essentially right.
Each process has its own mapping from virtual to physical addresses (called a context).
The address 0x401000, for example, can be translated to 0x01234567 for process A and to 0x89abcdef for process B.
Having different contexts allows for easy isolation of the processes, easy on-demand paging and simplified relocation.
So each context switch must invalidate the TLB or the CPU would continue using the old translations.
Some pages however are global, meaning that they have the same translation independently of the current process address space.
For example, the kernel code is mapped in the same way for every process and thus doesn't need to be remapped.
So in the end only a part of the TLB is invalidated.
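To make the stale-translation problem concrete, here is a toy Python model. It is not how any real MMU is implemented: the class ToyMMU, the 4 KiB page size, the page tables and the frame numbers are all invented for illustration.

```python
PAGE_SHIFT = 12                      # assume 4 KiB pages
OFFSET_MASK = (1 << PAGE_SHIFT) - 1

# Hypothetical page tables: virtual page number -> (physical frame, is_global)
TABLE_A = {0x401: (0x01234, False), 0xC0000: (0x00100, True)}
TABLE_B = {0x401: (0x89ABC, False), 0xC0000: (0x00100, True)}


class ToyMMU:
    def __init__(self):
        self.page_table = {}
        self.tlb = {}                # cached translations

    def switch_context(self, page_table, flush=True):
        """Install a new page table; optionally drop non-global TLB entries."""
        self.page_table = page_table
        if flush:
            self.tlb = {vpn: e for vpn, e in self.tlb.items() if e[1]}

    def translate(self, vaddr):
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & OFFSET_MASK
        if vpn not in self.tlb:      # TLB miss: walk the current page table
            self.tlb[vpn] = self.page_table[vpn]
        frame, _is_global = self.tlb[vpn]
        return (frame << PAGE_SHIFT) | offset
```

If the model switches from TABLE_A to TABLE_B with flush=False, translating 0x401000 still returns process A's frame because the stale TLB entry wins. With flush=True only the global kernel entry survives, and the lookup goes to B's table.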
You can read how Linux handles the process address space for a real example of applied theory.
What you are describing is entirely system specific.
First of all, what they are probably referring to is invalidating the MMU cache. That assumes the MMU has a cache (likely these days but not guaranteed).
When a context switch occurs, the processor has to put the MMU in a state where leftovers from the previous process cannot screw up the new process. If it did not, the cache would map the new process's logical pages to the old process's physical page frames.
For example, some processors use one page table for the system space and one or more other page tables for the user space. After a context switch, it would be ideal for the processor to invalidate any caching of the user space page tables but leave any caching of the system page table alone.
Note that in most processors all of this is done entirely behind the scenes. Even OS programmers do not need to deal with (or even be aware of) any flushing or invalidation of the MMU. There is a single switch process context instruction that handles everything. Other processors require the OS programmer to handle additional tasks as part of a context switch which, in some oddball processors, includes explicitly flushing the MMU cache.
My app saves a 1 MB file and then another app reads it back. After that I want to securely delete it. I thought about a RAM drive because I know that even with a secure-delete application something would remain on the HDD or SSD. I can accept losing the content of that file on shutdown. However, I read about some bugs related to file corruption in the bug lists of some RAM disk applications (e.g. imdisk). Those bugs are solved, but I'm wondering whether RAM disk apps are safe from a file integrity point of view. On the other hand, a normal disk is not 100% safe either. My temp file is absolutely important to me. I also protect my file with SHA-1 or similar, but let's suppose for a moment that there is no protection, just to understand what the best solution is.
Thanks
Pupillo
Which storage location is best certainly depends on the hardware involved: among other things, its age, its MTTF and any previous failures encountered.
I don't think it is possible to give a general answer.
Sounds to me like you are looking for an IPC mechanism, like shared memory.
This would also avoid using file systems and their -- imo very rare -- bugs.
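A minimal sketch of what that could look like with Python's multiprocessing.shared_memory module (Python 3.8+). The header layout, the function names publish and consume, and the choice of SHA-256 for the integrity check are assumptions for this example, not anything the question prescribes.

```python
import hashlib
import struct
from multiprocessing import shared_memory

# Hypothetical header: 8-byte payload length followed by a 32-byte SHA-256 digest.
HEADER = struct.Struct("<Q32s")


def publish(payload: bytes) -> shared_memory.SharedMemory:
    """Writer side: copy the payload into a new shared memory block."""
    shm = shared_memory.SharedMemory(create=True, size=HEADER.size + len(payload))
    HEADER.pack_into(shm.buf, 0, len(payload), hashlib.sha256(payload).digest())
    shm.buf[HEADER.size:HEADER.size + len(payload)] = payload
    return shm  # the caller passes shm.name to the reader and unlinks when done


def consume(name: str) -> bytes:
    """Reader side: attach by name, copy the payload out and verify it."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        length, digest = HEADER.unpack_from(shm.buf, 0)
        payload = bytes(shm.buf[HEADER.size:HEADER.size + length])
    finally:
        shm.close()
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError("payload failed integrity check")
    return payload
```

The writer would call publish, hand the block's name to the other app, and call close() and unlink() on the block after the reader is done. Nothing is written to a file system in the process, although the pages could still end up in swap.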
If you are thinking about file corruption, you should also think about what will happen when the involved applications crash.
So you might have multiple problems:
IPC
Persistency on crashes
Security concerns, e.g. others reading the involved sections of RAM/HD
Let's say I have a 10 MB file and go through these steps:
Open it in my favorite programming language for Read/Write
Erase everything in the stream
Write exactly 10 MB of random data back to the same stream
Save the changes to disk
Delete the file through normal means
Can I be certain that the new 10 MB successfully overwrote the old 10 MB on a sector level in the hard drive? Or is it possible that the "erase everything in the stream" step deletes the old file and potentially writes the new 10 MB in a new location?
The data may still be accessible by a professional who knows what they're doing and can access the raw data on the disk (i.e. without going through the filesystem).
Your program is basically equivalent to the Linux shred command, which contains the following warning:
CAUTION: Note that shred relies on a very important assumption:
that the file system overwrites data in place. This is the traditional
way to do things, but many modern file system designs do not satisfy this
assumption. The following are examples of file systems on which shred is
not effective, or is not guaranteed to be effective in all file system modes:
log-structured or journaled file systems, such as those supplied with
AIX and Solaris (and JFS, ReiserFS, XFS, Ext3, etc.)
file systems that write redundant data and carry on even if some writes
fail, such as RAID-based file systems
file systems that make snapshots, such as Network Appliance's NFS server
file systems that cache in temporary locations, such as NFS
version 3 clients
compressed file systems
There are other situations as well, such as SSDs with wear leveling.
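For reference, the overwrite-then-delete procedure from the question can be sketched in Python as below. The function name overwrite_in_place is made up. Opening with "r+b" avoids truncating the file, which at least asks the file system to reuse the existing blocks; for all the reasons listed above, that request may not be honoured.

```python
import os


def overwrite_in_place(path, chunk_size=1 << 20):
    """Overwrite the file's current contents with random bytes; return its size."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:      # "r+b" opens for update without truncating
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(os.urandom(n))
            remaining -= n
        f.flush()                     # Python buffer -> OS cache
        os.fsync(f.fileno())          # OS cache -> device (as far as the OS can tell)
    return size
```

Calling os.remove(path) afterwards corresponds to the final "delete through normal means" step. This is a single pass only and gives no guarantee on journaled, copy-on-write, compressed or flash-based storage.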
No. Since commits are atomic on any modern file system, you can be almost 100% certain the 10 MB did not overwrite the old 10 MB, and that's before we consider journaled file systems that actually guarantee this.
Short answer: No.
This might depend on your language and OS. I have a feeling that the stream calls are passed to the OS and the OS then decides what to do, so I'd lean towards your second question being correct just to err on the safe side. Furthermore, magnetic artifacts will be present after a deletion which can still be used to recover said data. Even overwriting the same sectors with all zeros could leave behind the data in a faded state. Generally it is recommended to make several deletion passes. See here for an explanation or here for an open source C# file shredder.
For Windows you could use the SDelete command line utility which implements the Department of Defense clearing and sanitizing standard:
Secure delete applications overwrite a deleted file's on-disk data
using techniques that are shown to make disk data unrecoverable, even
using recovery technology that can read patterns in magnetic media
that reveal weakly deleted files.
Of particular note:
Compressed, encrypted and sparse files are managed by NTFS in 16-cluster
blocks. If a program writes to an existing portion of such a file NTFS
allocates new space on the disk to store the new data and after the
new data has been written, deallocates the clusters previously
occupied by the file.
I realize this number will change based on many factors, but in general, when I write data to a hard-drive (e.g. copy a file), how long does it take for that data to actually be written to the platter after Windows says the copy is done?
Could anyone point me in the right direction to discover more on this topic?
If you are looking for a hard number, that is pretty much unknowable. Generally it is on the order of tens to a few hundred milliseconds for the data to start reaching the disk platters, but it can be as high as several seconds in a large server disk array with RAID and de-duplication.
The flow of events goes something like this.
1. The application calls a function like fwrite().
2. This call is handled by the filesystem layer in your Operating System, which has to figure out what specific disk sectors are to be manipulated.
3. The SATA/IDE driver in your OS will talk to the hard drive controller hardware. On a modern PC, it typically uses DMA to feed the data to the disk.
4. The data sits in a write cache inside the hard disk (RAM).
5. When the physical platters and heads have made it into position, it will begin to transfer the contents of cache onto the platters.
6. Steps 3-6 may repeat several times depending on how much data is to be written, where on the disk it is to be written. Additionally, there is usually filesystem metadata that must be updated (e.g. free space counters), which will trigger more writes to the disk.
The time it takes for steps 1-3 can be unpredictable in a general-purpose OS like Windows due to task scheduling, background threads, and the fact that your disk write is probably queued up behind writes from a few dozen other processes. I'd say it is usually on the order of 10-100 msec on a typical PC. If you go to the Windows Resource Monitor and click the Disk tab, you can get an idea of the average disk queue length. You can use the Performance Monitor to produce more finely controlled graphs.
Steps 3-4 are largely controlled by the disk controller and disk interface (SATA, SAS, etc). In the server world, you can be talking about a SAN with FC or iSCSI network switches, which impose their own latencies.
Step 5 will be controlled by the physical performance of the disk. Many consumer-grade HDD manufacturers do not post average seek times anymore, but 10-20 msec is common.
Interesting detail about Step 5: Some HDDs lie about flushing their write cache to get better benchmark scores.
Step 6 will depend on your filesystem and how much data you are writing.
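From the application's side, the early steps of that chain can be pushed along explicitly. The Python sketch below (the function name write_durably is made up) writes data, hands it to the OS, asks the OS to flush it to the device, and reports how long the flush took. It says nothing about the drive's own write cache, which may still hold the data after fsync returns.

```python
import os
import time


def write_durably(path, data: bytes) -> float:
    """Write data and return the seconds spent waiting for the OS-level flush."""
    with open(path, "wb") as f:
        f.write(data)                 # lands in the runtime's user-space buffer
        f.flush()                     # hand the buffer to the OS file cache
        start = time.perf_counter()
        os.fsync(f.fileno())          # ask the OS to push its cache to the device
        elapsed = time.perf_counter() - start
    return elapsed
```

Comparing the returned time across devices gives a rough feel for the latencies discussed above, though the numbers will vary with load and hardware.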
You are right that there can be a delay between Windows indicating that writing is finished and the last data actually being written. Things to consider are:
Device Manager, Disk Drive, Properties, Policies - Options for disabling Write Caching.
You might be better off using Direct I/O so that Windows does not save it temporarily in File Cache.
If your program writes the data, you can log what has been copied.
If you are sending the data over a network, you are likely to have no control of when the remote system has finished.
To see what is happening, you can set up Perfmon logging. One of my examples of monitoring:
http://www.roylongbottom.org.uk/monitor1.htm#anchor2