Go - How to know when an output channel is done

I tried to follow Rob Pike's example from the talk 'Concurrency is not parallelism' and did something like this:
I'm starting many goroutines as workers that read from an input channel, perform some processing, and then send the result on an output channel.
Then I start another goroutine that reads data from some source and sends it to the workers through their input channel.
Lastly, I want to iterate over all of the results in the output channel and do something with them.
The problem is that, because the work is split between the workers, I don't know when all of them have finished, so I don't know when to stop reading from the output channel and let my program end properly.
What is the best practice to know when workers have finished sending results to an output channel?

I personally like to use a sync.WaitGroup for that. A WaitGroup is a synchronized counter that has three methods - Wait(), Done() and Add(). What you do is increment the WaitGroup's counter, pass the WaitGroup to the workers, and have them call Done() when they're done. Then you just block on the WaitGroup at the other end and close the output channel once all the workers are done, causing the output processor to exit.
Basically:
// create the wait group
wg := sync.WaitGroup{}
// this is the output channel
outchan := make(chan whatever)
// start the workers
for i := 0; i < N; i++ {
    // increment the waitgroup's count by one for each worker
    wg.Add(1)
    // the worker pushes data onto the output channel and calls wg.Done() when done
    go work(&wg, outchan)
}
// this is our "waiter" - it blocks until all workers are done and closes the channel
go func() {
    wg.Wait()
    close(outchan)
}()
// this loop will exit automatically when outchan is closed
for item := range outchan {
    workWithIt(item)
}
// TADA!

First, may I clarify your terminology: a misunderstanding about the ends of channels could cause problems later. You ask about "output channels" and "input channels". There is no such thing; there are only channels.
Every channel has two ends: the output (writing) end and the input (reading) end. I will assume that is what you meant.
Now to answer your question.
Take the simplest case: you have only one sender goroutine writing to a channel, only one worker goroutine reading from the other end, and the channel has zero buffering. The sender goroutine will block as it writes each item until that item has been consumed. Typically this happens quickly the first time. Once the first item has passed to the worker, the worker will be busy and the sender will have to wait before the second item can be passed over. So a ping-pong effect follows: either the writer or the reader will be busy, but not both. The goroutines will be concurrent in the sense described by Rob Pike, but not always actually executing in parallel.
In the case where you have many worker goroutines reading from the channel (and its input end is shared by all of them), the sender can initially distribute one item to each worker, but then it has to wait whilst they work (similar to the ping-pong case described above). Finally, when all items have been sent by the sender, it has finished its work. However, the readers may not yet have finished theirs. Sometimes we care that the sender finishes early, and sometimes we don't. Knowing when this happens is most easily done with a WaitGroup (see Not_a_Golfer's answer and my answer to a related question).
There is a slightly more complex alternative: you can use a return channel for signalling completion instead of a WaitGroup. This isn't hard to do, but WaitGroup is preferred in this case, being simpler.
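For illustration, here is a minimal sketch of that completion-channel alternative (the worker bodies, the result type int and the worker count N are invented for the example): each worker sends an empty struct on a done channel when it finishes, and a small collector goroutine closes the output channel once it has counted N signals.
package main

import "fmt"

func main() {
    const N = 4
    outchan := make(chan int)   // stand-in for the real result type
    done := make(chan struct{}) // completion signals, one per worker

    for i := 0; i < N; i++ {
        go func(id int) {
            outchan <- id * id // stand-in for the real work
            done <- struct{}{} // signal completion instead of wg.Done()
        }(i)
    }

    // the "waiter": count N completion signals, then close the output channel
    go func() {
        for i := 0; i < N; i++ {
            <-done
        }
        close(outchan)
    }()

    // exits when outchan is closed, exactly as in the WaitGroup version
    for item := range outchan {
        fmt.Println(item)
    }
}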
If instead the channel has a buffer, the point at which the sender has sent its last item arrives sooner. In the limiting case, when the channel has one buffer space per worker, the sender can complete very quickly and then, potentially, get on with something else. (Any more buffering than this would be wasteful.)
This decoupling of the sender allows a fully asynchronous pattern of behaviour, beloved of people using other technology stacks (Node-JS and the JVM spring to mind). Unlike them, Go doesn't need you to do this, but you have the choice.
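As a minimal sketch of that decoupling (the job type, the number of jobs and the worker count are invented for the example): giving the jobs channel one buffer slot per worker lets the sender run ahead of the workers and finish sending early.
package main

import (
    "fmt"
    "sync"
)

func main() {
    const workers = 4
    jobs := make(chan int, workers) // one buffer slot per worker
    var wg sync.WaitGroup

    for w := 0; w < workers; w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for j := range jobs {
                fmt.Println("processed", j)
            }
        }()
    }

    for j := 0; j < 10; j++ {
        jobs <- j // blocks only while every buffer slot is full
    }
    close(jobs) // the sender is finished; the workers drain the rest
    wg.Wait()
}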
Back in the early '90s, as a side-effect of work on the Bulk Synchronous Parallelism (BSP) strategy, Leslie Valiant proved that sometimes very simple synchronisation strategies can be cheap. The crucial factor is that there needs to be enough parallel slackness (a.k.a. excess parallelism) to keep the processor cores busy. That means there must be enough other work to be done so that it really doesn't matter if any particular goroutine is blocked for a period of time.
Curiously, this can mean that working with smaller numbers of goroutines might require more care than working with larger numbers.
Understanding the impact of excess parallelism is useful: it is often not necessary to put extra effort into making everything asynchronous if the network as a whole has excess parallelism, because the CPU cores would be busy either way.
Therefore, although it is useful to know how to wait until your sender has completed, a larger application may not need you to be concerned in the same way.
As a final footnote, WaitGroup is a barrier in the sense used in BSP. By combining barriers and channels, you are making use of both BSP and CSP.

Here is a fuller example that nests two levels of WaitGroups; a sentinel value (Z) tells the collecting goroutine to stop, and a done channel tells Loop when the collected slice is ready:
var Z = "Z"

func Loop() {
    sc := make(chan *string)
    ss := make([]string, 0)
    done := make(chan struct{}, 1)
    go func() {
        //1 QUERY
        slice1 := []string{"a", "b", "c"}
        //2 WG INIT
        var wg1 sync.WaitGroup
        wg1.Add(len(slice1))
        //3 LOOP->
        loopSlice1(slice1, sc, &wg1)
        //7 WG WAIT<-
        wg1.Wait()
        sc <- &Z
        done <- struct{}{}
    }()
    go func() {
        var cc *string
        for {
            cc = <-sc
            log.Infof("<-sc %s", *cc)
            if *cc == Z {
                break
            }
            ss = append(ss, *cc)
        }
    }()
    <-done
    log.Infof("FUN: %#v", ss)
}

func loopSlice1(slice1 []string, sc chan *string, wg1 *sync.WaitGroup) {
    for i, x := range slice1 {
        //4 GO
        go func(n int, v string) {
            //5 WG DONE
            defer wg1.Done()
            //6 DOING
            //[1 QUERY
            slice2 := []string{"X", "Y", "Z"}
            //[2 WG INIT
            var wg2 sync.WaitGroup
            wg2.Add(len(slice2))
            //[3 LOOP ->
            loopSlice2(n, v, slice2, sc, &wg2)
            //[7 WG WAIT <-
            wg2.Wait()
        }(i, x)
    }
}

func loopSlice2(n1 int, v1 string, slice2 []string, sc chan *string, wg2 *sync.WaitGroup) {
    for j, y := range slice2 {
        //[4 GO
        go func(n2 int, v2 string) {
            //[5 WG DONE
            defer wg2.Done()
            //[6 DOING
            r := fmt.Sprintf("%v%v %v,%v", n1, n2, v1, v2)
            sc <- &r
        }(j, y)
    }
}

Related

Limit input size in chars from stdin

I want to write an application in Rust that deals with input from the terminal, and I want to prevent it from crashing or being killed by running out of memory. It displays a prompt, processes a command and displays the prompt again.
Basically, I am looking for a read_line_max(n) or read_until(delimiter, max_chars) API that reads at most n bytes or until the delimiter is reached.
Possibilities I considered:
io::stdin().lock().take(n).lines() takes at most n bytes in total, but I want unlimited input overall and only the line size limited.
io::stdin().lock().lines().take(n) limits the number of lines, not their length.
for line in io::stdin().lock().lines() {
    let line = line?.chars().take(n);
    println!("{}", respond_to(line));
}
reads each whole line first, so the take(n) comes too late: the process is already hogging 50GB of memory and a kill is imminent.
I found out that BufReader used to have a .chars() iterator that could be used in this way, but it was removed.
With io::stdin().lock().read(buff) there is an issue with bytes vs. chars, but it may be my best bet. I could then try to throw the bytes into a String to check UTF-8 validity, but that seems like something I would do in C and feels very unidiomatic.
Actually while writing this I quickly put together this thing:
use std::io::{self, Read};

let inp = io::stdin();
let mut bufinp = inp.lock();
let mut linebytes = [0_u8; 10];
loop {
    match bufinp.read(&mut linebytes) {
        Ok(0) => break, // EOF - without this the loop spins forever once input ends
        Ok(bytes_read) => {
            match String::from_utf8(linebytes[..bytes_read].to_vec()) {
                Ok(line) => println!("processed line: {}", &line),
                Err(err) => eprintln!("utf8 err: {:?}", err),
            }
        },
        Err(err) => {
            eprintln!("line read err: {:?}", err);
        }
    }
}
So... that kind of does what I want but I have some issues with it:
1) I need to trim the '\n' if the input is smaller than the buffer.
2) It doesn't clear the rest of stdin if the input is larger than the buffer. I'm guessing I need to put a skip_while() at the end so that it doesn't spill over into the next read. Is there a nicer way to clear it?
3) It may split graphemes when I could in fact handle those additional 3 bytes. I don't really care about reading up to a specific hard limit; I just want to prevent the input/memory usage from being "too much".
4) It's just too low-level and complicated, and not in line with "make good and safe choices easy to code and unsafe ones less available", which makes me think I'm not doing it right. But at least cat /dev/zero | ./target/debug/test doesn't result in a SIGQUIT anymore.
I find it strange that a language that prides itself on safety wouldn't provide a fool-proof way to deal with potentially large input. Am I missing something or thinking too much about it? Every article I found just closes its eyes and fires read_to_end() or read_line() without much thought.
How should I read user input safely and idiomatically?

Embedded System - Polling

I have about 6 sensors (GPS, IMU, etc.) that I need to constantly collect data from. For my purposes, I need a reading from each (within a small time frame) to have a complete data packet. Right now I am using interrupts, but this results in more data from certain sensors than others, and, as mentioned, I need to have the data matched up.
Would it be better to move to a polling-based system in which I could poll each sensor in a set order? This way I could have data from each sensor every 'cycle'.
I am, however, worried about the speed of polling because this system needs to operate close to real time.
Polling combined with a "master timer interrupt" could be your friend here. Let's say that your "slowest" sensor can provide data at 20ms intervals, and that the others can be read faster. That's 50 updates per second. If that's close enough to real time (it probably is for an IMU), perhaps you proceed like this:
Set up a 20ms timer.
When the timer goes off, set a flag inside an interrupt service routine:
volatile uint8_t timerFlag = 0;

ISR(TIMER_ISR_whatever)
{
    timerFlag = 1; // nothing but a semaphore for later...
}
Then, in your main loop act when timerFlag says it's time:
while(1)
{
    if(timerFlag == 1)
    {
        <read first device>
        <read second device>
        <you get the idea ;) >
        timerFlag = 0;
    }
}
In this way you can read each device and keep their readings synched up. This is a typical way to solve this problem in the embedded space. Now, if you need data faster than 20ms, then you shorten the timer, etc. The big question, as it always is in situations like this, is "how fast can you poll" vs. "how fast do you need to poll." Only experimentation and knowing the characteristics and timing of your various devices can tell you that. But what I propose is a general solution when all the timings "fit."
EDIT, A DIFFERENT APPROACH
A more interrupt-based example:
volatile uint8_t device1Read = 0;
volatile uint8_t device2Read = 0;
etc...

ISR(device 1)
{
    <read device>
    device1Read = 1;
}

ISR(device 2)
{
    <read device>
    device2Read = 1;
}
etc...

// main loop
while(1)
{
    if(device1Read == 1 && device2Read == 1 && etc...)
    {
        //< do something with your "packet" of data>
        device1Read = 0;
        device2Read = 0;
        etc...
    }
}
In this example, all your devices can be interrupt-driven, but the main-loop processing is still governed, and paced, by the cadence of the slowest interrupt. The latest complete reading from each device, regardless of its speed or latency, can be used. Is this pattern closer to what you had in mind?
Polling is a pretty good and easy-to-implement idea in case your sensors can provide data practically instantly (in comparison to your desired output frequency). It turns into a nightmare when you have data sources that need a significant (or even variable) time to provide a reading, or that require an asynchronous "initiate/collect" cycle. You'd have to sort your polling cycles to accommodate the "slowest" data source.
What might be a solution, in case you know the average conversion time of each of your sources, is to set up one timer per data source that triggers at (poll time - conversion time) and kicks off the measurement from its timer ISR. Then have one last timer, triggering at poll time + some safety margin, that collects all the conversion results.
On the other hand, your apparent problem of having "too many measurements" from the "fast" data sources wouldn't bother me too much, as long as you don't have anything better to do with the CPU/sensor load that is being wasted.
A last and easier approach, in case you have some cycles to waste: simply sort the data sources from "slowest" to "fastest", initiate a measurement in that order, then wait for the results and poll them in the same order.

In SPIN/Promela, how to receive a MSG from a channel in the correct way?

I read the Spin guide, yet there is no answer to the following question:
I have a line in my code like the following:
Ch?x
where Ch is a channel and x is a variable of the channel's message type (to receive the message).
What happens if Ch is empty? Will it wait for a message to arrive or not?
Do I need to check first whether Ch is non-empty?
Basically, all I want is that if Ch is empty, the process waits until a message arrives and, when it arrives, continues...
Bottom line: the semantics of Promela guarantee your desired behaviour, namely, that the receive-operation blocks until a message can be received.
From the receive man page
EXECUTABILITY
The first and the third form of the statement, written with a single
question mark, are executable if the first message in the channel
matches the pattern from the receive statement.
This tells you when a receive-operation is executable.
The semantics of Promela then tells you why executability matters:
As long as there are executable transitions (corresponding to the
basic statements of Promela), the semantics engine will select one of
them at random and execute it.
Granted, the quote doesn't make it very explicit, but it means that a statement that is currently not executable will block the executing process until it becomes executable.
Here is a small program that demonstrates the behaviour of the receive-operation.
chan ch = [1] of {byte};
/* Must be a buffered channel. A non-buffered, i.e., rendezvous channel,
 * won't work, because it won't be possible to execute the atomic block
 * around ch ! 0 atomically since sending over a rendezvous channel blocks
 * as well.
 */
short n = -1;

proctype sender() {
    atomic {
        ch ! 0;
        n = n + 1;
    }
}

proctype receiver() {
    atomic {
        ch ? 0;
        n = -n;
    }
}

init {
    atomic {
        run sender();
        run receiver();
    }
    _nr_pr == 1;
    assert n == 0;
    /* Only true if both processes are executed and if sending happened
     * before receiving.
     */
}
Yes, the current proctype will block until a message arrives on Ch. This behavior is described in the Promela Manual under the receive statement. [Because you are providing a variable x (as in Ch?x) any message in Ch will cause the statement to be executable. That is, the pattern matching aspect of receive does not apply.]

Process Synchronisation using semaphores

Here is the problem.
I want two processes to execute alternately; the complete problem is given below.
Q. In a system there are two processes named A and B. When the system starts, process A executes twice, then process B executes once. Process B cannot execute until process A has executed twice. Once process A has executed twice, it cannot execute again until process B has executed. The restriction mentioned above allows processes A and B to execute in the following manner:
AABAABAAB...
Write the pseudocode for processes A and B using counting semaphores to achieve the desired synchronisation.
Here is my attempt.
Solution:
Process A
var a = 1, b = 0, i;
begin
    repeat
        wait(a);
        for (i = 0; i < 2; i++)
            printf("A");   // considering this is what process A does
        signal(b);
    forever
end

Process B
begin
    repeat
        wait(b);
        printf("B");       // considering this is what process B does
        signal(a);
    forever
end
Is this correct?
An alternative solution would be:
Semaphore as = 1;
Semaphore bs = 0;

A() {
    int counter = 0;
    while(TRUE) {
        if(counter % 2 == 0)
            P(as);
        print("A"); // and whatever A does
        counter++;
        if(counter % 2 == 0)
            V(bs);
    }
}

B() {
    while(TRUE) {
        P(bs);
        print("B"); // and whatever B does
        V(as);
    }
}

The idea is that A waits for B on every 2nd turn (except the 0th), and B loops as well so the AAB pattern repeats.
I think that the general idea is correct, but the terminology is rather strange. The wait/signal pair is usually used for condition variables (although e.g. POSIX semaphores use post/wait).
I suggest you substitute semaphore_down for wait and semaphore_up for signal.
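If it helps to see the scheme run, here is a small Go sketch, not part of either answer above: buffered channels stand in for the counting semaphores, where a send is signal (V) and a receive is wait (P), and the loop counts are fixed at three rounds just for the demo.
package main

import "fmt"

func main() {
    semA := make(chan struct{}, 1) // semaphore a, initialised to 1
    semB := make(chan struct{}, 1) // semaphore b, initialised to 0
    semA <- struct{}{}

    done := make(chan struct{})

    go func() { // process A
        for round := 0; round < 3; round++ {
            <-semA // wait(a)
            fmt.Print("A")
            fmt.Print("A")     // A executes twice
            semB <- struct{}{} // signal(b)
        }
    }()

    go func() { // process B
        for round := 0; round < 3; round++ {
            <-semB             // wait(b)
            fmt.Print("B")     // B executes once
            semA <- struct{}{} // signal(a)
        }
        close(done)
    }()

    <-done
    fmt.Println() // prints AABAABAAB
}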

How to implement prioritized lock with only compare_and_swap?

Given only compare and swap, I know how to implement a lock.
However, how do I implement a spin lock such that
1) multiple threads can block on it while trying to lock,
2) and the threads are then unblocked (and acquire the lock) in the order in which they blocked on it?
Is it even possible? If not, what other primitives do I need?
If so, how do I do it?
Thanks!
You are going to need a list of the waiting threads. You need to add and remove items from the list in a thread-safe manner. You will need to be able to sleep threads that fail to acquire the lock, and to wake one thread when the lock becomes available. In Linux you can accomplish the sleeping and waking by having the thread wait on a signal.
Now, there is a lazy way to do this: you might not need to care about waking threads. Here is pseudocode for our skiplist; this is what we do to add an item.
cFails = 0
while (1) {
    NewState = OldState = State;
    if (cFails > 3 || OldState.Lock) {
        sleep(); // not too sophisticated, because they can't be awoken
        cFails = 0;
        continue;
    }

    Look for item in skiplist
    return item if we found it

    // to add the item to the list we need to lock it
    // ABA lock uses a version number
    NewState.Lock = 1;
    NewState.nVer++;
    if (!CAS(&State, OldState, NewState)) {
        ++cFails;
        continue;
    }

    // if the thread gets preempted right here, the lock is left on, and other
    // threads spinning would waste their entire time slice.

    // unlock
    OldState = NewState;
    NewState.Lock = 0;
    NewState.nVer++;
    CAS(&State, OldState, NewState);
}
We expect the skiplist to usually find the item and only rarely have to add it. We rarely have a race to add, even with a lot of threads. We tested this with a worst-case scenario consisting of lots of threads adding and searching for millions of items in a single list. The result is that we rarely saw threads fail to get the lock. So the simple approach that is high-performance for the expected case works for us. There is one bad thing that can happen: a thread gets preempted while holding the lock. That's what the cFails > 3 check catches; it sleeps the waiting threads so we don't waste their timeslices on a million useless spins. The cFails threshold is set high enough that it detects that the owner of the lock is not active.
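For the strict FIFO requirement in the question, a different and simpler technique than the skiplist scheme above is a ticket spinlock, which needs nothing but compare-and-swap. Here is a sketch in Go (goroutines stand in for threads, and the waiters spin rather than sleep), where both the ticket counter and the now-serving counter are advanced with CAS only:
package main

import (
    "fmt"
    "runtime"
    "sync"
    "sync/atomic"
)

// TicketLock: callers take a ticket and are served strictly in ticket order.
type TicketLock struct {
    nextTicket uint32
    nowServing uint32
}

func (l *TicketLock) Lock() {
    // take a ticket using only CAS (no fetch-and-add)
    var my uint32
    for {
        my = atomic.LoadUint32(&l.nextTicket)
        if atomic.CompareAndSwapUint32(&l.nextTicket, my, my+1) {
            break
        }
    }
    // spin until it is our turn; acquisition order equals ticket order
    for atomic.LoadUint32(&l.nowServing) != my {
        runtime.Gosched()
    }
}

func (l *TicketLock) Unlock() {
    // only the lock holder advances nowServing, so this CAS always succeeds;
    // it is written with CAS only to respect the constraint in the question
    cur := atomic.LoadUint32(&l.nowServing)
    atomic.CompareAndSwapUint32(&l.nowServing, cur, cur+1)
}

func main() {
    var lock TicketLock
    var wg sync.WaitGroup
    counter := 0
    for i := 0; i < 8; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for j := 0; j < 1000; j++ {
                lock.Lock()
                counter++
                lock.Unlock()
            }
        }()
    }
    wg.Wait()
    fmt.Println(counter) // 8000
}
The busy spin could be replaced by the sleep/wake scheme described in the answer once a caller sees that its ticket is not yet being served.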