Process synchronization

Factors designating a piece of code as a critical section
As I understand it, process synchronization is implemented using kernel data structures such as semaphores to prevent concurrent access to a critical section of code. The definitions I generally see are along the lines of: "a critical section is a piece of code that may access shared data (or shared resources)". So the questions are:
Shared data is a user-space entity, so is it the responsibility of the user process to ensure consistent access to it?
I presume that concurrent access to resources by multiple processes is something the kernel should take care of. What sort of user-level synchronization is required there?
What are the factors by which a piece of code in a user-space program is deemed a critical section?

You are mixing "kernel space/user space" with "critical section".
Kernel/user space only defines what kind of privileges a process possesses. If a thread is executing in user space it cannot access physical memory directly; it has to go through the kernel's virtual memory management.
A critical section, on the other hand, is a part of the code that, if executed by two processes in parallel, could result in data corruption. This happens because the code accesses some shared resource.
These two concepts are independent: a critical section can be in either user space or kernel space. Some kind of synchronization is needed to avoid corrupting the shared resource. Even when two processes/threads are running in kernel mode and want to access a shared resource, they need to apply some sort of synchronization mechanism (spinlocks or mutexes).
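To make that concrete, here is a minimal user-space sketch in Kotlin; the counter, thread count, and iteration count are made up for illustration. The increment is the critical section, and the mutex prevents interleaved read-modify-write operations from corrupting the shared value.

```kotlin
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.thread
import kotlin.concurrent.withLock

// Hypothetical shared state: a plain counter touched by several threads.
var counter = 0
val lock = ReentrantLock()

fun main() {
    val workers = List(4) {
        thread {
            repeat(100_000) {
                lock.withLock {
                    counter++   // the critical section: a read-modify-write on shared data
                }
            }
        }
    }
    workers.forEach { it.join() }
    println(counter)   // reliably 400000; with the lock removed, usually less
}
```

With the lock removed, the final count typically comes up short, which is exactly the corruption described above.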
I hope this helps.

Inter-process synchronization can be implemented with named synchronization objects. The Windows synchronization functions offer, for example, named mutexes and named semaphores.
See this answer for Linux.
A resource shared by a number of processes may, for example, be shared memory.
Using the term critical section the way it is done in the question is a bit misleading, since there are Critical Section Objects (Windows) dealing with thread synchronization. I suspect you mean it more generally, since you explicitly mention processes too.
However, any shared resource, be it shared memory or any other object, must be protected against concurrent access while it is being worked on.
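The JVM has no direct equivalent of named mutexes, but an OS-level file lock provides comparable cross-process exclusion on both Windows and Linux. A minimal sketch in Kotlin, assuming both processes agree on a lock-file path (the path and helper name are made up):

```kotlin
import java.io.RandomAccessFile

// Sketch: cross-process mutual exclusion via an OS file lock.
// Both processes must agree on the lock-file path (made up here).
fun <T> withProcessLock(lockFile: String, action: () -> T): T =
    RandomAccessFile(lockFile, "rw").use { file ->
        val lock = file.channel.lock()   // blocks until no other process holds the lock
        try {
            action()                     // work on the shared resource happens here
        } finally {
            lock.release()
        }
    }

fun main() {
    withProcessLock("demo.lock") {
        println("only one process at a time runs this block")
    }
}
```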

Related

What's an efficient, simple way to lock access to a specific resource in Kotlin?

We received an assignment where we have to create a distributed file system. This file-system should have multiple servers, each performing a certain function.
This question relates to the lock-server, which is used to prevent two people from writing to the same file at once. Every attempt to access a file spawns a thread that, when finished, provides access to the requested file. If a file that is not currently free is accessed, the thread should be BLOCKED until the lock is released. With Java I would probably just use the wait() and notify() methods, but these are not present in Kotlin (I know you can force them in by casting, but that is frowned upon). Is there an elegant way to do this? We are not limited in what libraries we can use, so if you know one that would fit I will gladly check it out. Right now the one I think fits best is ReentrantLock, but I am looking for more possibilities.
I have also checked out this list: https://stackoverflow.com/a/35521983/7091281
But none of the ones listed seemed to fit - I specifically need to block the thread, while everything I find does the exact opposite.
BTW, the different parts of the system are supposed to communicate via RMI. Also, while we can go our own way, we are encouraged to use threads instead of coroutines. (We are supposed to work in Java, but we were allowed to use Kotlin and Scala.)
If you want to use pure Kotlin, you could leverage coroutines, and more specifically their Mutex, for locking.
More info can be found in the Kotlin docs, under Shared Mutable State and Concurrency.
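A minimal sketch of that approach, using a hypothetical LockServer class that keeps one Mutex per file path; withLock suspends the caller until the file is free, which serializes writers the way the assignment requires:

```kotlin
import java.util.concurrent.ConcurrentHashMap
import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

// Hypothetical lock server: one Mutex per file path.
class LockServer {
    private val locks = ConcurrentHashMap<String, Mutex>()

    // Suspends the caller until the file's lock is free, then runs the action.
    suspend fun <T> withFileLock(path: String, action: suspend () -> T): T {
        val mutex = locks.computeIfAbsent(path) { Mutex() }
        return mutex.withLock { action() }
    }
}

fun main() = runBlocking {
    val server = LockServer()
    // Two concurrent writers to the same file get serialized:
    val jobs = List(2) { id ->
        launch {
            server.withFileLock("report.txt") {
                println("writer $id holds the lock")
                delay(100)   // simulated write
            }
        }
    }
    jobs.joinAll()
}
```

If plain threads are required rather than coroutines, the same shape works with a ConcurrentHashMap of ReentrantLocks, whose lock() really does block the calling thread.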

What are the consequences for leaking resources?

Obviously, we should clean up after ourselves as a matter of principle. And those of us around before the Windows 2000 era know the pain that memory leaks inflict on users. But I'm curious as to what the consequences of leaking handles to other system resources might be.
This would be things like unclosed files or database connections; really, anything that would be IDisposable in .NET. We're a Windows shop, but I would be interested in other OSes as well.
What arguments can I use to get team members to take this more seriously, or are there bigger fish to fry on modern systems?
Instead of thinking of resources as "things" to be "released", it's better to think of the acquisition of an IDisposable object as a responsibility to be carried out. Many kinds of IDisposable objects ask outside entities to do things on their behalf until they notify those entities that their services are no longer needed; by doing so, they acquire a responsibility to ensure that those outside entities are in fact given such notice. When Dispose is called on an IDisposable, it can carry out its responsibility by notifying anything whose services it had been using that those services are no longer required.
Objects can request notification when the system notices that they've been abandoned. Objects that receive such notification can generally assume that their services are no longer needed, and can in turn notify anyone whose services they had been using. This mechanism works okay in some cases, but it should not be considered reliable, since a variety of factors may prevent the system from noticing that an object has been effectively abandoned.
As for the consequences of failing to call Dispose, it's very simple: the things that were supposed to happen once an object's services were no longer required won't happen. If an object was supposed to notify other objects or entities that their services are no longer required, and they in turn were supposed to notify other objects or entities, none of those notifications will happen.
Except in a few cases where code will be using a managed resource for the life of the program, and the OS can be relied upon to treat the program's termination as notice that its services are no longer needed, it is generally easier to call Dispose on anything that is no longer needed, whether or not it really "cares", than to try to identify the cases where the resulting failure to notify everything that cares would cause major problems.
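The same discipline translates directly to the JVM, where Closeable plays the role of IDisposable and Kotlin's use function plays the role of a using block. A sketch, with the class and file name invented for illustration:

```kotlin
import java.io.Closeable
import java.io.FileWriter

// Hypothetical wrapper that asks an outside entity (a file handle)
// to do work on its behalf, and so takes on a cleanup responsibility.
class ReportWriter(path: String) : Closeable {
    private val out = FileWriter(path)

    fun write(line: String) = out.write(line + "\n")

    // Carrying out the responsibility: tell the entity we used that
    // its services are no longer required.
    override fun close() = out.close()
}

fun main() {
    // `use` guarantees close() runs even if write() throws; it is the
    // Kotlin/JVM analogue of wrapping an IDisposable in a using block.
    ReportWriter("report.txt").use { writer ->
        writer.write("done")
    }
}
```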
It really depends on what the resource is.
Some resources almost only affect your own process. Open file handles are limited within your process, but you don't much affect the overall system by leaking them. Cleaning them up is important if you have a long-running process such as a server or a GUI application, but for a run-once job, it's not that important. When your process shuts down, these resources get cleaned up anyway.
Some resources affect other processes. Databases typically have connection limits that are quite low (sometimes due to licensing restrictions). If you don't properly close your connections when you're done with them, you will run out very quickly. In addition, open connections use up resources on the database server, potentially slowing it down for all users. Moreover, such resources may not be reclaimed on process shutdown either, because the OS is not aware of them; the connections might eventually time out on the server, but that may take considerably longer than your process was running.
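In JVM terms, the usual defense is to scope each connection to the query that needs it, so it is closed deterministically no matter what happens. A sketch in Kotlin with JDBC; the URL and table name are placeholders:

```kotlin
import java.sql.DriverManager

// Sketch: each JDBC resource is closed as soon as its block exits, so the
// connection never outlives the one query that needed it.
// The URL and table name are placeholders.
fun countUsers(): Int =
    DriverManager.getConnection("jdbc:h2:mem:demo").use { conn ->
        conn.createStatement().use { stmt ->
            stmt.executeQuery("SELECT COUNT(*) FROM users").use { rs ->
                rs.next()
                rs.getInt(1)
            }
        }
    }
```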

How to implement Go-style channels (CSP) in Objective-C?

I wonder how to create a CSP library for Objective-C that works like Go's channels/goroutines but with idiomatic Objective-C (and less boilerplate than the current approaches).
In other languages with native coroutines and/or generators it is possible to model this easily, but I don't grasp how to do the same with the several ways of doing concurrent programming in Objective-C (plus, the idea is to have "cheap" threads).
Any hint about what I need to do?
I would look at the State Threads library, as it implements roughly the same idea that underlies Go's goroutine switching algorithm: a goroutine surrenders control to the scheduler when it's about to sleep in a syscall. The ST library therefore wraps OS-level file descriptors to provide its own FD-like objects which can be read from (and/or written to), but instead of blocking the whole process these operations transfer control to other lightweight threads managed by the library.
You might then need a scheduler more advanced than that of the ST library to keep OS threads busy running your sequential processes. A no-brainer introduction to the Go 1.2 scheduler is here, and it contains a link to a more hard-core design document. The rest is in Go's source code.
See also this answer on SO.
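For intuition only: Kotlin coroutines apply the same principle as the ST library, in that a lightweight thread suspends at a would-be blocking point instead of blocking its OS thread, so a small OS pool can run thousands of them. A sketch (the counts are arbitrary):

```kotlin
import kotlinx.coroutines.*

fun main() = runBlocking {
    // Thousands of lightweight "threads" multiplexed onto a small OS thread pool.
    val jobs = List(10_000) { id ->
        launch(Dispatchers.Default) {
            delay(100)   // suspension point: yields the OS thread instead of blocking it
            if (id % 1_000 == 0) println("coroutine $id resumed")
        }
    }
    jobs.joinAll()
}
```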
Create operations; for example, consider this process:
process x takes a number from east, transforms it into a string, and gives it to west.
I could model that with an object that keeps the internal state of x (consisting of a number and a string) and the following operations:
east-output, an operation defined somewhere else by the east process logic
x-input, an operation that depends on east-output. It copies the number from east-output's data structure into x's data structure
x-output, an operation that depends on x-input. Its content is defined as a purely internal transformation - in our example, stringWithFormat...
west-input, an operation that depends on x-output, etc.
Then you dump the operations into an NSOperationQueue and see what happens (does it work, or are there conflicting dependencies...).
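The same east → x → west pipeline is perhaps easiest to see in channel form; here is a sketch in Kotlin, whose channels follow the same CSP model (the names mirror the example above):

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.Channel

fun main() = runBlocking {
    val east = Channel<Int>()      // numbers flow out of east
    val west = Channel<String>()   // strings flow into west

    // east: produces numbers.
    launch {
        for (n in 1..3) east.send(n)
        east.close()
    }
    // x: takes a number from east, transforms it to a string, gives it to west.
    launch {
        for (n in east) west.send("value=$n")   // the stringWithFormat step
        west.close()
    }
    // west: consumes the strings.
    for (s in west) println(s)   // value=1, value=2, value=3
}
```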

Distributed Objects + Grand Central Dispatch

Not a specific question as such; I'm more trying to test the waters. I like Distributed Objects, and I like Grand Central Dispatch; how about I try to combine the two?
Does that even make sense? Has anyone played around in these waters? Would I be able to use GCD to help synchronize object access across machines? Or would it be better to stick to synchronizing local objects only? What should I look out for? What design patterns are helpful and what should I avoid?
As an example, say I use GCD queues to synchronize access to a shared resource of some kind. What can I expect to happen if I make this resource public via Distributed Objects? Questions like: how nicely do blocks play with Distributed Objects? Can I expect everything to work as normal across machines? If not, can I wrangle it into doing so? What difficulties can I expect?
I very much doubt this will work well. GCD objects are not Cocoa objects, so you can't reference them remotely. GCD synchronization primitives don't work across process boundaries.
While blocks are objects, they do not support NSCoding, so they can't be transmitted across process boundaries. (If you think about it, they are not much more than function pointers. The pointed-to function must have been compiled into the executable. So, it doesn't make sense that two different programs would share a block.)
Also, Distributed Objects depends on the connection being scheduled in a given run loop. Since you don't manage the threads used by GCD, you are not entitled to add run loop sources except temporarily.
Frankly, I'm not sure how you envision this working, even in theory. What do you hope to do? How do you anticipate it working?
Running across machines -- as in a LAN, MAN, or WAN?
In a LAN, Distributed Objects will probably work okay as long as the server you are connecting to is operational. However, most programmers you meet will probably raise an eyebrow and ask, "Why didn't you just use a web server on the LAN and build your own wrapper class that makes it 'feel' like Distributed Objects?" I mean, for one thing, there are well-established tools for troubleshooting web servers, and it's easier and often cheaper to hire someone to build a web service for you than a distributed object server.
On a MAN or WAN, however, this would be slow and a very bad idea for most uses. For that type of communication, you're better off using what everyone else uses -- REST-like APIs with HTTPS/HTTP, sending either XML, JSON, or key/value data back and forth. So, you could make a class wrapper that makes this "feel" sort of like distributed objects. And my gut feeling tells me that you'll need to use tricks to speed this up, such as caching chunks of data locally on the client so that you don't have to keep fetching from the server, or even caching on the server so that it doesn't have to interact with a database as often.
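To make that suggestion concrete, here is a sketch of such a wrapper in Kotlin; the base URL, key scheme, and class name are all invented, and a real one would need error handling and cache invalidation. The getOrPut call is the "cache chunks of data locally on the client" part of the advice.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Sketch of the suggested wrapper: remote state fetched over HTTP, with a
// small client-side cache so repeated reads skip the network round trip.
class RemoteStore(private val base: String) {
    private val http = HttpClient.newHttpClient()
    private val cache = HashMap<String, String>()

    fun fetch(key: String): String = cache.getOrPut(key) {
        val request = HttpRequest.newBuilder(URI.create("$base/$key")).GET().build()
        http.send(request, HttpResponse.BodyHandlers.ofString()).body()
    }
}
```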
GCD, Distributed Objects, Mach ports, XPC, POSIX message queues, named pipes, shared memory, and many other IPC mechanisms really only make sense for local, application-to-application communication on the same computer. They also have the added advantage of privilege elevation, if you want to utilize that. (Note, I said POSIX message queues, which are workstation-specific. You can still use a "message queue service" on a LAN, MAN, or WAN; there are many products available for that.)

How to save a program's progress, and resume later?

You may know of programs, e.g. some password-cracking programs, that we can stop while they're running; when we run the program again (with or without entering the same input), it is able to continue from where it left off. I wonder what kind of technique those programs use?
[Edit] I am writing a program mainly based on recursive functions. As far as I know, it is incredibly difficult to save that kind of state in my program. Is there any technique that somehow saves the stack contents, function calls, and data involved in my program, so that when it is restarted it can run as if it had never been stopped? This is just a concept I have in mind, so please forgive me if it doesn't make sense...
It's going to be different for every program. For something as simple as, say, a brute-force password cracker, all that would really need to be saved is the last password tried. For other apps you may need to store several data points, but that's really all there is to it: saving and loading the minimum amount of information needed to reconstruct where you were.
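A sketch of that minimum-state idea in Kotlin: the only thing persisted is how far the search got. The checkpoint file, search space, and checkpoint frequency are all made up; a real cracker would tune how often it saves.

```kotlin
import java.io.File

// Sketch: the only persisted datum is how far the brute-force search got.
val checkpoint = File("progress.txt")

fun crack(target: String) {
    // Resume from the saved position, or start from scratch.
    val start = checkpoint.takeIf { it.exists() }?.readText()?.toInt() ?: 0
    for (i in start..999_999) {
        val candidate = "%06d".format(i)   // candidates 000000..999999
        if (candidate == target) { println("found: $candidate"); return }
        if (i % 10_000 == 0) checkpoint.writeText(i.toString())
    }
}
```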
Another common technique is to save an image of the entire program state. If you've ever played with a game console emulator that can save state, this is how they do it. A similar technique exists in Python with pickling. If the environment is stable enough (i.e. no varying pointers) you simply copy the entire app's memory state into a binary file. When you want to resume, you copy it back into memory and begin running again. This gives you near-perfect state recovery, but whether it's possible at all is highly environment/language dependent. (For example: most C++ apps couldn't do this without help from the OS, or unless they were built VERY carefully with this in mind.)
Use Persistence.
Persistence is a mechanism through which the life of an object extends beyond the program's execution lifetime.
Store the state of the objects involved in the process on the local hard drive using serialization.
Implement Persistent Objects with Java Serialization
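A minimal sketch of that serialization approach in Kotlin; the Progress fields are hypothetical, and everything reachable from the saved object must itself be Serializable:

```kotlin
import java.io.*

// Hypothetical program state; everything reachable from it must be Serializable.
data class Progress(val current: Long, val found: List<String>) : Serializable

fun save(state: Progress, file: File) =
    ObjectOutputStream(FileOutputStream(file)).use { it.writeObject(state) }

fun load(file: File): Progress =
    ObjectInputStream(FileInputStream(file)).use { it.readObject() as Progress }

fun main() {
    val file = File("state.bin")
    save(Progress(42L, listOf("abc")), file)
    println(load(file))   // Progress(current=42, found=[abc])
}
```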
To achieve this, you need to continually save state (i.e. where you are in your calculation). That way, if you interrupt the program, it will know when it restarts that it was in the middle of a calculation, and where it was in that calculation.
You also probably want to run your main calculation in a separate thread from your user interface - this way you can respond to "close / interrupt" requests from the user interface and handle them appropriately by stopping / pausing the thread.
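Putting both suggestions together, a sketch in Kotlin: the calculation runs on its own thread, checkpoints periodically, and honors a stop flag that the UI thread can set. The file name, work loop, and intervals are all made up.

```kotlin
import java.io.File
import java.util.concurrent.atomic.AtomicBoolean
import kotlin.concurrent.thread

// Sketch: the worker checkpoints periodically and honors a stop flag.
val stopRequested = AtomicBoolean(false)
val checkpoint = File("calc.ckpt")

fun main() {
    val worker = thread {
        var i = checkpoint.takeIf { it.exists() }?.readText()?.toLong() ?: 0L
        while (!stopRequested.get() && i < 10_000_000L) {
            i++                                              // one unit of work
            if (i % 1_000_000L == 0L) checkpoint.writeText(i.toString())
        }
        checkpoint.writeText(i.toString())                   // final save on stop/finish
        println("stopped at $i")
    }
    Thread.sleep(50)          // stand-in for the user clicking "close"
    stopRequested.set(true)
    worker.join()
}
```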
For Linux, there is a project named CRIU which supports process-level save and resume. It is quite like the hibernate/resume cycle of an OS, but with the granularity broken down to individual processes. It also supports container technologies, specifically Docker. Refer to http://criu.org/ for more information.