Process Synchronization using a flag - process

I am learning operating systems through an online course and I came across some software solutions for Process Synchronization. The teacher is explaining all software solutions starting from using a single turn variable upto Peterson's solution.
I have a doubt in the most basic approach. Please refer to the attached screenshot from the course video for clarity. The approach is to use a single turn variable in case of two processes and store 1 or 2 in turn depending on which process wants to access the critical section. This approach guarantees mutual exclusion but does not satisfy the progress requirement because if turn is 1 initially and P2 wants to enter the critical section first then it will be simply blocked waiting in the while loop even though P1 is not in the critical section. My idea is to initiate turn as -1 and now proceed. No process will be blocked depending on other then.
I have checked multiple online courses but no one is discussing this simple change and rather moving on to advanced solutions like Petersons algorithm or semaphores. Am I thinking right? Is my approach correct?

What you're describing here is called "strict alternation". Your proposal to modify the algorithm so that turn is initially -1 won't work. In your example, if turn is not equal to 2, then process 1 will not block. If turn is not equal to 1, then process 2 will not block. Initially, if turn is -1, then neither process 1 nor process 2 will block. As a result, both can execute their critical sections at the same time. You no longer have the mutual exclusion property.

Related

UML activity diagrams: the meaning of <<iterative >>

I want to check what is the definition of «iterative» in expansion regions in activity diagrams. For me personally this was never a question because I understand it as letting me do a For loop, e.g.,
For i=1 to 10
Do-Something // So it does it 10 times
End For
However, while I was presenting my UML diagram to an audience, an engineer team leader (not a UML maven) objected against the term ‘iterative’, because he understood ‘iterative’ to mean an 'iterative process' such that each step improves a result. I am also aware of this definition, but I assume the UML definition is not that, but rather means a simple For-Loop.
Please confirm that the UML definition of «iterative» and iteration is like a simple For-loop. Or otherwise, if so.
No, it has a different meaning. UML 2.5 states in p. 480:
The mode of an ExpansionRegion controls how its expansion executions proceed.
If the value is iterative, the expansion executions must occur in an iterative sequence, with one completing before another can begin. The first expansion execution begins immediately when the ExpansionRegion starts executing, with subsequent executions starting when the previous execution is completed. If the input collections are ordered, then the expansion executions are sequenced in the order induced by the input collection. Otherwise, the order of the expansion executions is not defined.
Other values for this keyword are parallel and stream. You can guess that behavior defined in a parallel region can be executed in parallel. stream is a bit more complicated and you might read on that page in the UML spec.
The for-loop itself comes from the input collection you pass to the region. This can be processed in either of the above ways.
tl;dr
So rather than a for loop the keyword «iterative» for the region tells that it's behavior may not be handeled in parallel.
Ahhh, semantics...
First a disclaimer - I am not a native English speaker. Yet my believe both my level of English and IT experience are sufficient to answer this question.
Let's have a look at the dictionary definition of iterative first:
iterative adjective
/ˈɪtərətɪv/
/ˈɪtəreɪtɪv/, /ˈɪtərətɪv/
​(of a process) that involves repeating a process or set of instructions again and again, each time applying it to the result of the previous stage
We used an iterative process of refinement and modification.
an iterative procedure/method/approach
The highlight with a script font is mine.
Of course this is a pure word definition, not in the context of software development.
In real life a process can quite easily be considered repetitive but in itself not really iterative. Imagine an assembly line in a mass production factory. On one of the positions a particular screw/set of screws is applied to join two or more elements. For every next run, identical set of elements the same type and number of screws is applied. There is a virtually endless stream of similar part sets, each set consisting of the same type of parts as previously and requiring the same kind of connection. From the position perspective joining the elements is a repetitive process but it is not iterative, as each join is applied to a different set of elements - it does not apply to those already joined.
If you think of a code, it's somewhat different though. When applying a loop, almost always you have some sort of a resulting set impacted by it and one can argue that with every loop step that resulting set is being further changed, meaning the next loop step is applied on the result of the previous step. From this perspective almost every loop is iterative.
On the other hand, you can have a loop like that:
loop
wait 10
while buffer is empty
read buffer
You can clearly say it is a loop and nothing is being changed. All the code does is waiting for a buffer to fill. So it is not iterative.
For UML specifically though the precise meaning is included in qwerty_so's answer so I will not repeat it here.

Extended Events - read vs read-write apps

As part of extended events I track rpc_completed and sql_batch_completed events. It is to catch all the queries hitting the db so I have a better understanding of the traffic. Based on that I would like to split application into two groups: read and read-write. The read ones I could move later to the read-only AG nodes for the performance reasons. My question is what is the best way to do that? I can see the Wrties field which at the first sight looks like what I am after. I have done some tests and it looks fine, i can see that for the insert/delete/update statements, the value in this column is greater than 0 when for selects it is 0.
Do you know of any pitfalls of fully depending on that field? Could you recommend another way of dealing with that task?
Update:
Do you know what is the definition of the writes field for sql_batch_completed or rpc_completed?
I couldn't find it. Is it the same as for profiler? Per this thread Making sense of the number of reads/writes in SQL Profiler it is:
Writes: Number of logical write I/Os issued by the user during the connection.
Kind regards,
Rafal

Erlang ETS tables versus message passing: Optimization concerns?

I'm coming into an existing (game) project whose server component is written entirely in erlang. At times, it can be excruciating to get a piece of data from this system (I'm interested in how many widgets player 56 has) from the process that owns it. Assuming I can find the process that owns the data, I can pass a message to that process and wait for it to pass a message back, but this does not scale well to multiple machines and it kills response time.
I have been considering replacing many of the tasks that exist in this game with a system where information that is frequently accessed by multiple processes would be stored in a protected ets table. The table's owner would do nothing but receive update messages (the player has just spent five widgets) and update the table accordingly. It would catch all exceptions and simply go on to the next update message. Any process that wanted to know if the player had sufficient widgets to buy a fooble would need only to peek at the table. (Yes, I understand that a message might be in the buffer that reduces the number of widgets, but I have that issue under control.)
I'm afraid that my question is less of a question and more of a request for comments. I'll upvote anything that is both helpful and sufficiently explained or referenced.
What are the likely drawbacks of such an implementation? I'm interested in the details of lock contention that I am likely to see in having one-writer-multiple-readers, what sort of problems I'll have distributing this across multiple machines, and especially: input from people who've done this before.
first of all, default ETS behaviour is consistent, as you can see by documentation: Erlang ETS.
It provides atomicity and isolation, also multiple updates/reads if done in the same function (remember that in Erlang a function call is roughly equivalent to a reduction, the unit of measure Erlang scheduler uses to share time between processes, so a multiple function ETS operation could possibly be split in more parts creating a possible race condition).
If you are interested in multiple nodes ETS architecture, maybe you should take a look to mnesia if you want an OOTB multiple nodes concurrency with ETS: Mnesia.
(hint: I'm talking specifically of ram_copies tables, add_table_copy and change_config methods).
That being said, I don't understand the problem with a process (possibly backed up by a not named ets table).
I explain better: the main problem with your project is the first, basic assumption.
It's simple: you don't have a single writing process!
Every time a player takes an object, hits a player and so on, it calls a non side effect free function updating game state, so even if you have a single process managing game state, he must also tells other player clients 'hey, you remember that object there? Just forget it!'; this is why the main problem with many multiplayer games is lag: lag, when networking is not a main issue, is many times due to blocking send/receive routines.
From this point of view, using directly an ETS table, using a persistent table, a process dictionary (BAD!!!) and so on is the same thing, because you have to consider synchronization issues, like in objects oriented programming languages using shared memory (Java, everyone?).
In the end, you should consider just ONE main concern developing your application: consistency.
After a consistent application has been developed, only then you should concern yourself with performance tuning.
Hope it helps!
Note: I've talked about something like a MMORPG server because I thought you were talking about something similar.
An ETS table would not solve your problems in that regard. Your code (that wants to get or set the player widget count) will always run in a process and the data must be copied there.
Whether that is from a process heap or an ETS table makes little difference (that said, reading from ETS is often faster because it's well optimized and doesn't perform any other work than getting and setting data). Especially when getting the data from a remote node. For multple readers ETS is most likely faster since a process would handle the requests sequentially.
What would make a difference however, is if the data is cached on the local node or not. That's where self replicating database systems, such as Mnesia, Riak or CouchDB, comes in. Mnesia is in fact implemented using ETS tables.
As for locking, the latest version of Erlang comes with enhancements to ETS which enable multiple readers to simultaneously read from a table plus one writer that writes. The only locked element is the row being written to (thus better concurrent performance than a normal process, if you expect many simultaneous reads for one data point).
Note however, that all interaction with ETS tables is non-transactional! That means that you cannot rely on writing a value based on a previous read because the value might have changed in the meantime. Mnesia handles that using transactions. You can still use the dirty_* functions in Mneisa to squeeze out near-ETS performance out of most operations, if you know what you're doing.
It sounds like you have a bunch of things that can happen at any time, and you need to aggregate the data in a safe, uniform way. Take a look at the Generic Event behavior. I'd recommend using this to create an event server, and have all these processes share this information via events to your server, at that point you can choose to log it or store it somewhere (like an ETS table). As an aside, ETS tables are not good for peristent data like how many "widgets" a player has - consider Mnesia, or an excellent crash only db like CouchDB. Both of these replicate very well across machines.
You bring up lock contention - you shouldn't have any locks. Messages are processed in a synchronous order as they are received by each process. In fact, the entire point of the message passing semantics built into the language is to avoid shared-state concurrency.
To summarize, normally you communicate with messages, from process to process. This is hairy for you, because you need information from processes scattered all over the place, so my recommendation for you is based of the idea of concentrating all information that is "interesting" outside of the originating processes into a single, real-time source.

Restarting agent program after it crashes

Consider a distributed bank application, wherein distributed agent machines modify the value of a global variable : say "balance"
So, the agent's requests are queued. A request is of the form wherein value is added to the global variable on behalf of the particular agent. So,the code for the agent is of the form :
agent
{
look_queue(); // take a look at the leftmost request on queue without dequeuing
lock_global_variable(balance,agent_machine_id);
///////////////////// **POINT A**
modify(balance,value);
unlock_global_variable(balance,agent_machine_id);
/////////////////// **POINT B**
dequeue(); // once transaction is complete, request can be dequeued
}
Now, if an agent's code crashes at POINT B, then obviously the request should not be processed again, otherwise the variable will be modified twice for the same request. To avoid this, we can make the code atomic, thus :
agent
{
look_queue(); // take a look at the leftmost request on queue without dequeuing
*atomic*
{
lock_global_variable(balance,agent_machine_id);
modify(balance,value);
unlock_global_variable(balance,agent_machine_id);
dequeue(); // once transaction is complete, request can be dequeued
}
}
I am looking for answers to these questions :
How to identify points in code which need to be executed atomically 'automatically' ?
IF the code crashes during executing, how much will "logging the transaction and variable values" help ? Are there other approaches for solving the problem of crashed agents ?
Again,logging is not scalable to big applications with large number of variables. What can we in those case - instead of restarting execution from scratch ?
In general,how can identify such atomic blocks in case of agents that work together. If one agent fails, others have to wait for it to restart ? How can software testing help us in identifying potential cases, wherein if an agent crashes, an inconsistent program state is observed.
How to make the atomic blocks more fine-grained, to reduce performance bottlenecks ?
Q> How to identify points in code which need to be executed atomically 'automatically' ?
A> Any time, when there's anything stateful shared across different contexts (not necessarily all parties need to be mutators, enough to have at least one). In your case, there's balance that is shared between different agents.
Q> IF the code crashes during executing, how much will "logging the transaction and variable values" help ? Are there other approaches for solving the problem of crashed agents ?
A> It can help, but it has high costs attached. You need to rollback X entries, replay the scenario, etc. Better approach is to either make it all-transactional or have effective automatic rollback scenario.
Q> Again, logging is not scalable to big applications with large number of variables. What can we in those case - instead of restarting execution from scratch ?
A> In some cases you can relax consistency. For example, CopyOnWriteArrayList does a concurrent write-behind and switches data on for new readers after when it becomes available. If write fails, it can safely discard that data. There's also compare and swap. Also see the link for the previous question.
Q> In general,how can identify such atomic blocks in case of agents that work together.
A> See your first question.
Q> If one agent fails, others have to wait for it to restart ?
A> Most of the policies/APIs define maximum timeouts for critical section execution, otherwise risking the system to end up in a perpetual deadlock.
Q> How can software testing help us in identifying potential cases, wherein if an agent crashes, an inconsistent program state is observed.
A> It can to a fair degree. However testing concurrent code requires as much skills as to write the code itself, if not more.
Q> How to make the atomic blocks more fine-grained, to reduce performance bottlenecks?
A> You have answered the question yourself :) If one atomic operation needs to modify 10 different shared state variables, there's nothing much you can do apart from trying to push the external contract down so it needs to modify more. This is pretty much the reason why databases are not as scalable as NoSQL stores - they might need to modify depending foreign keys, execute triggers, etc. Or try to promote immutability.
If you were Java programmer, I would definitely recommend reading this book. I'm sure there are good counterparts for other languages, too.

How to signal that a Use Case is over?

Let's say I am making a Use Case about filling a quiz. You have only 5 minutes to fill that quiz. When doing the Use Case for "Filling the Quiz", how should I signal there is a time limit and that after that the Use Case is finished? I simply write it by text or is there anything more formal to use?
Sketch of what my Use Case can be:
1. The Actor tells the System he's ready to start the quiz.
2. The System presents the Actor with the first question of the Quiz and its 4 possible answers and tells him how much time he has left.
3. The Actor tells the System what is his chosen answer (a number between 1 and 4).
Repeat steps 2-3 until there are no questions left.
4. The System registers the results of the quiz.
I could just put operations between all those shown above to check whenever the time left is over, but there is probably a better way to show this.
Thanks
You can use an alternative flow timeout case, like
Alternative Flow 1: Timeout
2. The System detects that ...
To be a bit more precise, but agree with the other answer in general, for this situations Alistair Cockburn (Writing effective Use Cases) recommends using extending use cases (alternate flows), which would have in their extension points the time limit. In textual form, you can easily use the range of numbers of lines of scenario lines, where the timeout can happen.