Process with redirected standard input/output behaves differently depending on size of input

I'm seeing some odd behavior when running processes from F#. My basic problem was to run Graphviz's dot.exe with a generated graph and visualize the result.
When I restricted the graph to be small, everything worked fine, but with bigger graphs it hung on a specific line. I'm curious why this is happening, so that maybe I can fix my issue. So I created an MVCE for the purpose.
I have 2 console programs. One is a simulator of dot.exe, into which I shovel the input and from which I expect a .jpg back. This version just streams the input to the output ten times, in blocks:
// Learn more about F# at http://fsharp.org
// See the 'F# Tutorial' project for more help.
open System
open System.IO

[<EntryPoint>]
let main argv =
    let bufferSize = 4096
    let mutable buffer : char [] = Array.zeroCreate bufferSize
    // It reads in 4096-char blocks, but that doesn't matter here.
    // It simulates producing ten times as much output as input.
    while Console.In.ReadBlock(buffer, 0, bufferSize) <> 0 do
        for i in 1 .. 10 do
            Console.Out.WriteLine(buffer)
    0 // return an integer exit code
So I have an .exe at C:\Users\MyUserName\Documents\Visual Studio 2015\Projects\ProcessInputOutputMVCE\EchoExe\bin\Debug\EchoExe.exe. Then comes another console project, which uses it:
// Learn more about F# at http://fsharp.org
// See the 'F# Tutorial' project for more help.
open System.IO

[<EntryPoint>]
let main argv =
    let si =
        new System.Diagnostics.ProcessStartInfo(
            @"C:\Users\MyUserName\Documents\Visual Studio 2015\Projects\ProcessInputOutputMVCE\EchoExe\bin\Debug\EchoExe.exe", "",
            // from Fake's Process handling
#if FX_WINDOWSTLE
            WindowStyle = ProcessWindowStyle.Hidden,
#else
            CreateNoWindow = true,
#endif
            UseShellExecute = false,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            RedirectStandardInput = true)
    use p = new System.Diagnostics.Process()
    p.StartInfo <- si
    if p.Start() then
        let input =
            Seq.replicate 3000 "aaaa"
            |> String.concat "\n"
        p.StandardInput.Write input
        // hangs on Flush()
        p.StandardInput.Flush()
        p.StandardInput.Close()
        use outTxt = File.Create "out.txt"
        p.StandardOutput.BaseStream.CopyTo outTxt
        // Two WaitForExit calls because of
        // https://msdn.microsoft.com/en-us/library/system.diagnostics.process.standardoutput(v=vs.110).aspx
        // which says the first WaitForExit() waits for StandardOutput; the next one is needed for the whole process.
        p.WaitForExit()
        p.WaitForExit()
    0 // return an integer exit code
This hangs on p.StandardInput.Flush(), except if I change the volume of input to Seq.replicate 300 "aaaa". Why does it behave differently?
The reference for Process.StandardOutput states that the parent reading from StandardOutput while the child process writes to that stream at the same time might cause a deadlock. Who is the child process in that case? Is it my p.StandardInput.Write input?
The other possible deadlock mentioned would be reading all text from both the standard output and standard error streams.
But I'm not reading the error stream. Anyway, it suggests handling the input/output with async, so I have the following version:
// same as before
...
    if p.Start() then
        let rec writeIndefinitely (rows: string list) =
            async {
                if rows.Length = 0 then
                    ()
                else
                    do! Async.AwaitTask (p.StandardInput.WriteLineAsync rows.Head)
                    p.StandardInput.Flush()
                    do! writeIndefinitely rows.Tail
            }
        let inputTaskContinuation =
            Seq.replicate 3000 "aaaa"
            |> Seq.toList
            |> writeIndefinitely
        Async.Start (async {
            do! inputTaskContinuation
            p.StandardInput.Close()
        })
        let bufferSize = 4096
        let mutable buffer : char array = Array.zeroCreate bufferSize
        use outTxt = File.CreateText "out.txt"
        let rec readIndefinitely () =
            async {
                let! readBytes = Async.AwaitTask (p.StandardOutput.ReadAsync (buffer, 0, bufferSize))
                if readBytes <> 0 then
                    outTxt.Write (buffer, 0, readBytes)
                    outTxt.Flush()
                    do! readIndefinitely ()
            }
        Async.Start (async {
            do! readIndefinitely ()
            p.StandardOutput.Close()
        })
        // If the following line is uncommented, it throws
        // "Cannot mix synchronous and asynchronous operation on process stream."
        // on the let! readBytes = Async.AwaitTask (p.StandardOutput.ReadAsync (buffer, 0, bufferSize)) line above.
        //p.BeginOutputReadLine()
        p.WaitForExit()
        // using dot.exe, it hangs on the second WaitForExit()
        p.WaitForExit()
This doesn't hang and writes out.txt, except in the real code using dot.exe. It's as async as it gets. Why does it throw an exception when p.BeginOutputReadLine() is uncommented?
After some experimenting, the whole async output part can stay as p.StandardOutput.BaseStream.CopyTo outTxt (where outTxt is created with File.Create, not File.CreateText). Only the input handling needs to be asynchronous rather than synchronous, which is weird.
To sum it up: with asynchronous input handling it works fine (except with dot.exe, but if I figure this out, maybe I can fix that too); with synchronous input handling it depends on the size of the input (300 works, 3000 doesn't). Why is that?
Update
Since I didn't really need to redirect the standard error, I removed RedirectStandardError = true. That solved the mysterious dot.exe problem.

I think the deadlock here is the following:
The host process writes too much data to the input pipe's buffer.
The child process reads from that buffer and writes to its output.
The host process is not reading the output yet (that only happens after it has sent all the data). When the child's output buffer fills up, the child blocks on writing and stops reading from its input; the host in turn blocks on writing the input. The two processes are now deadlocked.
It is not necessary to use completely asynchronous code. What I think would also work is writing in chunks to dot's stdin and draining dot's stdout before writing again.
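As a minimal sketch of one way to break that cycle, assuming only standard output needs to be captured (the path is a placeholder and EchoExe.exe stands in for dot.exe): start copying the child's standard output on a background task before writing its standard input, so the child can never block on a full output pipe. If standard error also had to stay redirected, it could be drained in a similar fashion, for example with p.BeginErrorReadLine().

open System.Diagnostics
open System.IO

[<EntryPoint>]
let main argv =
    let si =
        ProcessStartInfo(
            @"EchoExe.exe",                // placeholder path, stands in for dot.exe
            UseShellExecute = false,
            CreateNoWindow = true,
            RedirectStandardInput = true,
            RedirectStandardOutput = true)
    use p = new Process(StartInfo = si)
    if p.Start() then
        use outFile = File.Create "out.txt"
        // Start copying stdout *before* writing stdin, so the child never
        // blocks on a full output pipe while the host is still pushing input.
        let copyOutput = p.StandardOutput.BaseStream.CopyToAsync outFile
        let input = Seq.replicate 3000 "aaaa" |> String.concat "\n"
        p.StandardInput.Write input
        p.StandardInput.Close()    // signal end-of-input to the child
        copyOutput.Wait()          // by now all of the child's output is in out.txt
        p.WaitForExit()
    0

The key point is only that the read side is already running while the write side is still pushing data, so neither pipe buffer can fill up and stall the other process.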

Related

F# Console app is completely bombing out ignoring error handling

Excuse all the Console.WriteLines! Trying to figure out what is happening here.
The following code works if I run it in Visual Studio. When I compile and run it as a command-line program, the line that tries to access an API using HttpClient completely stops the whole process and ends the program. No error handling, nothing. How can this happen? It bombs out if I remove the try/with block as well. I am baffled.
let getTransactionData (ewayCSVData: EwayCSVData, httpClient: HttpClient) = task {
    try
        Console.WriteLine("get transaction data 1")
        if ewayCSVData.transactionType.ToLower() = "refund" then
            Console.WriteLine("get transaction data 2")
            let url = "https://api.ewaypayments.com/Transaction/" + ewayCSVData.transactionNumber.ToString() + "/Refund"
            Console.WriteLine("get transaction data 3")
            let! postResult = httpClient.PostAsync(url, null)
            Console.WriteLine("get transaction data 4")
            let! result = postResult.Content.ReadAsStringAsync()
            Console.WriteLine("get transaction data 5")
            return result
        else
            Console.WriteLine("get transaction data 6")
            let url: string = "https://api.ewaypayments.com/Transaction/" + ewayCSVData.transactionNumber
            let! result = httpClient.GetStringAsync(url) // This line kills the whole process
            Console.WriteLine("get transaction data 7")
            return result
    with ex ->
        Console.Write(ex.Message)
        Console.WriteLine("get transaction data 8")
        return ""
}
I suspect your command-line program is simply exiting because it has reached the end of its code. You're launching an asynchronous task to access the API, but your program doesn't wait for this task to complete before exiting. That's why there are no exceptions or error messages.
To fix this, you can either explicitly wait on the result (e.g. via Task.Result), or put something at the end of your program to prevent the main thread from exiting. One common way to do this is by calling Console.ReadLine() on the last line of your top-level function.
For example, the following program (usually) exits without writing anything to the console:
open System
open System.Threading.Tasks
Task.Run<unit>(fun () ->
    Console.WriteLine("Hello world"))
|> ignore
You can fix this by calling Result:
let task = Task.Run<unit>(fun () ->
    Console.WriteLine("Hello world"))
task.Result
Or by calling Console.ReadLine:
Task.Run<unit>(fun () ->
    Console.WriteLine("Hello world"))
|> ignore
Console.ReadLine() |> ignore
Note that my examples use Task.Run to illustrate the point. I don't think the F# task builder behaves quite the same way, but it sounds similar enough from your description.
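For completeness, here is a minimal sketch of the same fix with the task builder, under the assumption that (like Task.Run) it starts its work immediately and hands back a Task<'T>; blocking on .Result at the end of main then keeps the program alive until the work finishes. The URL is just a placeholder for the real API call:

open System.Net.Http

[<EntryPoint>]
let main _ =
    use httpClient = new HttpClient()
    let work =
        task {
            // placeholder for the real API call from the question
            let! body = httpClient.GetStringAsync "https://example.com"
            return body.Length
        }
    // Blocking on Result keeps the console program from exiting before
    // the asynchronous work completes (and surfaces any exception).
    printfn "Read %d characters" work.Result
    0

If the diagnosis above is right, the Task that getTransactionData ultimately produces needs to be waited on in the same way somewhere before main returns.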

Fastest way to read huge text file (6 GB) line by line [duplicate]

I have a large txt file with 100,000 lines.
I need to start n threads and give every thread a unique line from this file.
What is the best way to do this? I think I need to read the file line by line, and the iterator must be global so I can lock it. Loading the text file into a list would be time-consuming, and I may get an OutOfMemory exception. Any ideas?
You can use the File.ReadLines Method to read the file line-by-line without loading the whole file into memory at once, and the Parallel.ForEach Method to process the lines in multiple threads in parallel:
Parallel.ForEach(File.ReadLines("file.txt"), (line, _, lineNumber) =>
{
// your code here
});
After performing my own benchmarks for loading 61,277,203 lines into memory and shoving values into a Dictionary / ConcurrentDictionary(), the results seem to support @dtb's answer above that the following approach is the fastest:
Parallel.ForEach(File.ReadLines(catalogPath), line =>
{
});
My tests also showed the following:
File.ReadAllLines() and File.ReadAllLines().AsParallel() appear to run at almost exactly the same speed on a file of this size. Looking at my CPU activity, they both seem to use only two of my 8 cores.
Reading all the data first using File.ReadAllLines() appears to be much slower than using File.ReadLines() in a Parallel.ForEach() loop.
I also tried a producer / consumer or MapReduce style pattern where one thread was used to read the data and a second thread was used to process it. This also did not seem to outperform the simple pattern above.
I have included an example of this pattern for reference, since it is not included on this page:
var inputLines = new BlockingCollection<string>();
ConcurrentDictionary<int, int> catalog = new ConcurrentDictionary<int, int>();
var readLines = Task.Factory.StartNew(() =>
{
foreach (var line in File.ReadLines(catalogPath))
inputLines.Add(line);
inputLines.CompleteAdding();
});
var processLines = Task.Factory.StartNew(() =>
{
Parallel.ForEach(inputLines.GetConsumingEnumerable(), line =>
{
string[] lineFields = line.Split('\t');
int genomicId = int.Parse(lineFields[3]);
int taxId = int.Parse(lineFields[0]);
catalog.TryAdd(genomicId, taxId);
});
});
Task.WaitAll(readLines, processLines);
I suspect that under certain processing conditions, the producer / consumer pattern might outperform the simple Parallel.ForEach(File.ReadLines()) pattern. However, it did not in this situation.
Read the file on one thread, adding its lines to a blocking queue. Start N tasks reading from that queue. Set max size of the queue to prevent out of memory errors.
Something like:
public class ParallelReadExample
{
    public static IEnumerable<string> LineGenerator(StreamReader sr)
    {
        string line;
        while ((line = sr.ReadLine()) != null)
        {
            yield return line;
        }
    }

    static void Main()
    {
        StreamReader sr = new StreamReader("yourfile.txt");

        Parallel.ForEach(LineGenerator(sr), currentLine =>
        {
            // Do your thing with currentLine here...
        }); // close lambda expression

        sr.Close();
    }
}
Think it would work. (No C# compiler/IDE here)
If you want to limit the number of threads to n, the easiest way is to use AsParallel() along with WithDegreeOfParallelism(n) to limit the thread count:
string filename = "C:\\TEST\\TEST.DATA";
int n = 5;
foreach (var line in File.ReadLines(filename).AsParallel().WithDegreeOfParallelism(n))
{
// Process line.
}
As @dtb mentioned above, the fastest way to read a file and then process its individual lines is to:
1) do a File.ReadAllLines() into an array
2) Use a Parallel.For loop to iterate over the array.
You can read more performance benchmarks here.
The basic gist of the code you would have to write is:
string[] AllLines = File.ReadAllLines(fileName);
Parallel.For(0, AllLines.Length, x =>
{
DoStuff(AllLines[x]);
//whatever you need to do
});
With the introduction of bigger array sizes in .NET 4, as long as you have plenty of memory, this shouldn't be an issue.

How do I read the output of a child process without blocking in Rust?

I'm making a small ncurses application in Rust that needs to communicate with a child process. I already have a prototype written in Common Lisp. I'm trying to rewrite it because CL uses a huge amount of memory for such a small tool.
I'm having some trouble figuring out how to interact with the sub-process.
What I'm currently doing is roughly this:
Create the process:
let mut program = match Command::new(command)
    .args(arguments)
    .stdin(Stdio::piped())
    .stdout(Stdio::piped())
    .stderr(Stdio::piped())
    .spawn()
{
    Ok(child) => child,
    Err(_) => {
        println!("Cannot run program '{}'.", command);
        return;
    }
};
Pass it to an infinite (until user exits) loop, which reads and handles input and listens for output like this (and writes it to the screen):
fn listen_for_output(program: &mut Child, output_viewer: &TextViewer) {
    match program.stdout {
        Some(ref mut out) => {
            let mut buf_string = String::new();
            match out.read_to_string(&mut buf_string) {
                Ok(_) => output_viewer.append_string(buf_string),
                Err(_) => return,
            };
        }
        None => return,
    };
}
The call to read_to_string, however, blocks the program until the process exits. From what I can see, read_to_end and read also seem to block. If I run something like ls, which exits right away, it works; but with something that doesn't exit, like python or sbcl, it only continues once I kill the subprocess manually.
Based on this answer, I changed the code to use BufReader:
fn listen_for_output(program: &mut Child, output_viewer: &TextViewer) {
    match program.stdout.as_mut() {
        Some(out) => {
            let buf_reader = BufReader::new(out);
            for line in buf_reader.lines() {
                match line {
                    Ok(l) => {
                        output_viewer.append_string(l);
                    }
                    Err(_) => return,
                };
            }
        }
        None => return,
    }
}
However, the problem remains the same: it will read all the lines that are available and then block. Since the tool is supposed to work with any program, there is no way to guess when the output will end before trying to read. There doesn't appear to be a way to set a timeout for BufReader either.
Streams are blocking by default. TCP/IP streams, filesystem streams, pipe streams: they are all blocking. When you tell a stream to give you a chunk of bytes, it will stop and wait until it has the given amount of bytes or until something else happens (an interrupt, an end of stream, an error).
The operating systems are eager to return the data to the reading process, so if all you want is to wait for the next line and handle it as soon as it comes in, then the method suggested by Shepmaster in Unable to pipe to or from spawned child process more than once (and also in his answer here) works.
In theory it doesn't have to work, because an operating system is allowed to make the BufReader wait for more data in read, but in practice operating systems prefer early "short reads" to waiting.
This simple BufReader-based approach becomes even more dangerous when you need to handle multiple streams (like the stdout and stderr of a child process) or multiple processes. For example, the BufReader-based approach might deadlock when a child process waits for you to drain its stderr pipe while your process is blocked waiting on its empty stdout.
Similarly, you can't use BufReader when you don't want your program to wait on the child process indefinitely. Maybe you want to display a progress bar or a timer while the child is still working and gives you no output.
You can't use the BufReader-based approach if your operating system happens not to be eager in returning the data to the process (prefers "full reads" to "short reads"), because in that case the last few lines printed by the child process might end up in a gray zone: the operating system got them, but they're not large enough to fill the BufReader's buffer.
BufReader is limited to what the Read interface allows it to do with the stream; it's no less blocking than the underlying stream is. In order to be efficient it will read the input in chunks, telling the operating system to fill as much of its buffer as it has available.
You might be wondering why reading data in chunks is so important here, why can't the BufReader just read the data byte by byte. The problem is that to read the data from a stream we need the operating system's help. On the other hand, we are not the operating system, we work isolated from it, so as not to mess with it if something goes wrong with our process. So in order to call to the operating system there needs to be a transition to "kernel mode" which might also incur a "context switch". That is why calling the operating system to read every single byte is expensive. We want as few OS calls as possible and so we get the stream data in batches.
To wait on a stream without blocking you'd need a non-blocking stream. MIO promises to have the required non-blocking stream support for pipes, most probably with PipeReader, but I haven't checked it out so far.
The non-blocking nature of a stream should make it possible to read data in chunks regardless of whether the operating system prefers the "short reads" or not. Because non-blocking stream never blocks. If there is no data in the stream it simply tells you so.
In the absence of a non-blocking stream you'll have to resort to spawning threads, so that the blocking reads are performed in a separate thread and thus won't block your primary thread. You might also want to read the stream byte by byte in order to react to the line separator immediately, in case the operating system does not prefer the "short reads". Here's a working example: https://gist.github.com/ArtemGr/db40ae04b431a95f2b78.
P.S. Here's an example of a function that allows you to monitor the standard output of a program via a shared vector of bytes:
use std::io::Read;
use std::process::{Command, Stdio};
use std::sync::{Arc, Mutex};
use std::thread;

/// Pipe streams are blocking, we need separate threads to monitor them without blocking the primary thread.
fn child_stream_to_vec<R>(mut stream: R) -> Arc<Mutex<Vec<u8>>>
where
    R: Read + Send + 'static,
{
    let out = Arc::new(Mutex::new(Vec::new()));
    let vec = out.clone();
    thread::Builder::new()
        .name("child_stream_to_vec".into())
        .spawn(move || loop {
            let mut buf = [0];
            match stream.read(&mut buf) {
                Err(err) => {
                    println!("{}] Error reading from stream: {}", line!(), err);
                    break;
                }
                Ok(got) => {
                    if got == 0 {
                        break;
                    } else if got == 1 {
                        vec.lock().expect("!lock").push(buf[0])
                    } else {
                        println!("{}] Unexpected number of bytes: {}", line!(), got);
                        break;
                    }
                }
            }
        })
        .expect("!thread");
    out
}

fn main() {
    let mut cat = Command::new("cat")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()
        .expect("!cat");

    let out = child_stream_to_vec(cat.stdout.take().expect("!stdout"));
    let err = child_stream_to_vec(cat.stderr.take().expect("!stderr"));
    let mut stdin = match cat.stdin.take() {
        Some(stdin) => stdin,
        None => panic!("!stdin"),
    };
}
With a couple of helpers I'm using it to control an SSH session:
try_s! (stdin.write_all (b"echo hello world\n"));
try_s! (wait_for (&out, 0.1, 9., |s| s == "hello world\n"));
P.S. Note that await on a read call in async-std is blocking as well. It's just that instead of blocking a system thread it only blocks a chain of futures (essentially a stack-less green thread). The poll_read is the non-blocking interface. In async-std#499 I've asked the developers whether there's a short read guarantee from these APIs.
P.S. There might be a similar concern in Nom: "we would want to tell the IO side to refill according to the parser's result (Incomplete or not)"
P.S. Might be interesting to see how stream reading is implemented in crossterm. For Windows, in poll.rs, they are using the native WaitForMultipleObjects. In unix.rs they are using mio poll.
Tokio's Command
Here is an example of using tokio 0.2:
use std::process::Stdio;

use futures::StreamExt; // 0.3.1
use tokio::{io::BufReader, prelude::*, process::Command}; // 0.2.4, features = ["full"]

#[tokio::main]
async fn main() {
    let mut cmd = Command::new("/tmp/slow.bash")
        .stdout(Stdio::piped()) // Can do the same for stderr
        .spawn()
        .expect("cannot spawn");

    let stdout = cmd.stdout().take().expect("no stdout");
    // Can do the same for stderr

    // To print out each line
    // BufReader::new(stdout)
    //     .lines()
    //     .for_each(|s| async move { println!("> {:?}", s) })
    //     .await;

    // To print out each line *and* collect it all into a Vec
    let result: Vec<_> = BufReader::new(stdout)
        .lines()
        .inspect(|s| println!("> {:?}", s))
        .collect()
        .await;

    println!("All the lines: {:?}", result);
}
Tokio-Threadpool
Here is an example of using tokio 0.1 and tokio-threadpool. We start the process in a thread using the blocking function, then convert that into a stream with stream::poll_fn.
use std::process::{Command, Stdio};

use tokio::{prelude::*, runtime::Runtime}; // 0.1.18
use tokio_threadpool; // 0.1.13

fn stream_command_output(
    mut command: Command,
) -> impl Stream<Item = Vec<u8>, Error = tokio_threadpool::BlockingError> {
    // Ensure that the output is available to read from and start the process
    let mut child = command
        .stdout(Stdio::piped())
        .spawn()
        .expect("cannot spawn");
    let mut stdout = child.stdout.take().expect("no stdout");

    // Create a stream of data
    stream::poll_fn(move || {
        // Perform blocking IO
        tokio_threadpool::blocking(|| {
            // Allocate some space to store anything read
            let mut data = vec![0; 128];
            // Read 1-128 bytes of data
            let n_bytes_read = stdout.read(&mut data).expect("cannot read");

            if n_bytes_read == 0 {
                // Stdout is done
                None
            } else {
                // Only return as many bytes as we read
                data.truncate(n_bytes_read);
                Some(data)
            }
        })
    })
}

fn main() {
    let output_stream = stream_command_output(Command::new("/tmp/slow.bash"));
    let mut runtime = Runtime::new().expect("Unable to start the runtime");

    let result = runtime.block_on({
        output_stream
            .map(|d| String::from_utf8(d).expect("Not UTF-8"))
            .fold(Vec::new(), |mut v, s| {
                print!("> {}", s);
                v.push(s);
                Ok(v)
            })
    });

    println!("All the lines: {:?}", result);
}
There's numerous possible tradeoffs that can be made here. For example, always allocating 128 bytes isn't ideal, but it's simple to implement.
Support
For reference, here's slow.bash:
#!/usr/bin/env bash

set -eu
val=0

while [[ $val -lt 10 ]]; do
    echo $val
    val=$(($val + 1))
    sleep 1
done
See also:
How do I synchronously return a value calculated in an asynchronous Future in stable Rust?
If Unix support is sufficient, you can also make the two output streams non-blocking and poll over them as you would do on a TcpStream with the set_nonblocking function.
The ChildStdout and ChildStderr returned by the Command spawn are Stdio handles (and contain a file descriptor); you can directly modify the read behavior of these handles to make them non-blocking.
Based on the work of jcreekmore/timeout-readwrite-rs and anowell/nonblock-rs, I use this wrapper to modify the stream handles:
extern crate libc;

use std::io::Read;
use std::os::unix::io::AsRawFd;

use libc::{F_GETFL, F_SETFL, fcntl, O_NONBLOCK};

fn set_nonblocking<H>(handle: &H, nonblocking: bool) -> std::io::Result<()>
where
    H: Read + AsRawFd,
{
    let fd = handle.as_raw_fd();
    let flags = unsafe { fcntl(fd, F_GETFL, 0) };
    if flags < 0 {
        return Err(std::io::Error::last_os_error());
    }
    let flags = if nonblocking {
        flags | O_NONBLOCK
    } else {
        flags & !O_NONBLOCK
    };
    let res = unsafe { fcntl(fd, F_SETFL, flags) };
    if res != 0 {
        return Err(std::io::Error::last_os_error());
    }
    Ok(())
}
You can manage the two streams as any other non-blocking stream. The following example is based on the polling crate, which makes it really easy to handle read events, and on BufReader for line reading:
use std::process::{Command, Stdio};
use std::path::PathBuf;
use std::io::{BufReader, BufRead};
use std::thread;

extern crate polling;
use polling::{Event, Poller};

fn main() -> Result<(), std::io::Error> {
    let path = PathBuf::from("./worker.sh").canonicalize()?;

    let mut child = Command::new(path)
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()
        .expect("Failed to start worker");

    let handle = thread::spawn({
        let stdout = child.stdout.take().unwrap();
        set_nonblocking(&stdout, true)?;
        let mut reader_out = BufReader::new(stdout);

        let stderr = child.stderr.take().unwrap();
        set_nonblocking(&stderr, true)?;
        let mut reader_err = BufReader::new(stderr);

        move || {
            let key_out = 1;
            let key_err = 2;
            let mut out_closed = false;
            let mut err_closed = false;

            let poller = Poller::new().unwrap();
            poller.add(reader_out.get_ref(), Event::readable(key_out)).unwrap();
            poller.add(reader_err.get_ref(), Event::readable(key_err)).unwrap();

            let mut line = String::new();
            let mut events = Vec::new();
            loop {
                // Wait for at least one I/O event.
                events.clear();
                poller.wait(&mut events, None).unwrap();

                for ev in &events {
                    // stdout is ready for reading
                    if ev.key == key_out {
                        let len = match reader_out.read_line(&mut line) {
                            Ok(len) => len,
                            Err(e) => {
                                println!("stdout read returned error: {}", e);
                                0
                            }
                        };
                        if len == 0 {
                            println!("stdout closed (len is null)");
                            out_closed = true;
                            poller.delete(reader_out.get_ref()).unwrap();
                        } else {
                            print!("[STDOUT] {}", line);
                            line.clear();
                            // reload the poller
                            poller.modify(reader_out.get_ref(), Event::readable(key_out)).unwrap();
                        }
                    }

                    // stderr is ready for reading
                    if ev.key == key_err {
                        let len = match reader_err.read_line(&mut line) {
                            Ok(len) => len,
                            Err(e) => {
                                println!("stderr read returned error: {}", e);
                                0
                            }
                        };
                        if len == 0 {
                            println!("stderr closed (len is null)");
                            err_closed = true;
                            poller.delete(reader_err.get_ref()).unwrap();
                        } else {
                            print!("[STDERR] {}", line);
                            line.clear();
                            // reload the poller
                            poller.modify(reader_err.get_ref(), Event::readable(key_err)).unwrap();
                        }
                    }
                }

                if out_closed && err_closed {
                    println!("Stream closed, exiting process thread");
                    break;
                }
            }
        }
    });

    handle.join().unwrap();
    Ok(())
}
Additionally, used with a wrapper over an EventFd, it becomes possible to easily stop the process from another thread without blocking or active polling, using only a single thread.
EDIT: From my tests, it seems the polling crate automatically sets the polled handles to non-blocking mode. The set_nonblocking function is still useful in case you want to use the nix::poll object directly.
I have encountered enough use-cases where it was useful to interact with a subprocess over line-delimited text that I wrote a crate for it, interactive_process.
I expect the original problem has long since been solved, but I thought it might be helpful to others.

OSX Serial read freeze / hang

I'm writing a serial communication wrapper class in Objective-C. To list all available serial modems and set up the connection, I'm using pretty much the same code as in this example project by Apple.
I can read and write the way Apple does it, but I want to implement a loop on a second thread that writes to the stream if an NSString *writeString is longer than 0 and reads after writing if bytes are available.
Writing works quite straightforwardly; I just used the write function declared in unistd.h.
Reading does not work: whenever I call read(), the function hangs and my loop does not proceed.
Here is the code used in my loop:
- (void)runInCOMLoop {
    do {
        // write
    } while (bytesWritten < strlen([_writeString UTF8String]));

    NSMutableString *readString = [NSMutableString string];
    ssize_t bytesRead = 0;
    ssize_t readB = 0;
    char buffer[256];
    do {
        readB = read(_fileDescriptor, &buffer, sizeof(buffer));
        // ^^^^ this call hangs
        bytesRead += readB;
        if (readB == -1) {
            // error
        }
        else if (readB > 0) {
            if (buffer[bytesRead - 1] == '\r' || buffer[bytesRead - 1] == '\n') {
                break;
            }
            [readString appendString:[NSString stringWithUTF8String:buffer]];
        }
    } while (readB > 0);
}
What am I doing wrong here?
read() will block if there is nothing to read. Apple probably has their own way of doing things, but you can use select() to see if there is anything to read on _fileDescriptor. Google around for examples of how to use select.
Here's one link on StackOverflow:
Can someone give me an example of how select() is alerted to an fd becoming "ready"
This excerpt from the select man page is pertinent:
To effect a poll, the timeout argument should be non-nil, pointing to a zero-valued timeval structure. Timeout is not changed by select(), and may be reused on subsequent calls; however, it is good style to re-initialize it before each invocation of select().
You can set the non-blocking flag (O_NONBLOCK) on the file descriptor using fcntl() to keep read() from waiting for data, but if you do that, you have to continuously poll for data, which is obviously bad from a CPU usage standpoint. As Charlie Burns' answer explains, the best solution is to use select(), which allows your program to efficiently wait until there is data to be read on the port's file descriptor. Here's some example code taken from my own Objective-C serial port class, ORSSerialPort (slightly modified):
fd_set localReadFDSet;
FD_ZERO(&localReadFDSet);
FD_SET(self.fileDescriptor, &localReadFDSet);
timeout.tv_sec = 0;
timeout.tv_usec = 100000; // Check to see if port closed every 100ms
result = select(localPortFD+1, &localReadFDSet, NULL, NULL, &timeout);
if (!self.isOpen) break; // Port closed while select call was waiting
if (result < 0) {
    // Handle error
}
if (result == 0 || !FD_ISSET(localPortFD, &localReadFDSet)) continue;
// Data is available
char buf[1024];
long lengthRead = read(localPortFD, buf, sizeof(buf));
NSData *readData = nil;
if (lengthRead>0) readData = [NSData dataWithBytes:buf length:lengthRead];
Note that select() indicates that data is available by returning. So, your program will sit suspended at the select() call while no data is available. The program is not hung; that's how it's supposed to work. If you need to do other things while select() is waiting, you should put the select() call on a different queue/thread from the other work you need to do. ORSSerialPort does this.

sprintf() and WriteFile() affecting string Buffer

I have a very weird problem which I cannot seem to figure out. Unfortunately, I'm not even sure how to describe it without describing my entire application. What I am trying to do is:
1) read a byte from the serial port
2) store each char into tagBuffer as they are read
3) run a query using tagBuffer to see what type of tag it is (book or shelf tag)
4) depending on the type of tag, output a series of bytes corresponding to the type of tag
Most of my code is implemented and I can get the right tag code sent back out the serial port. But there are two lines that I added as debug statements; when I try to remove them, my program stops working.
The lines are the two lines at the very bottom:
sprintf(buf,"%s!\n", tagBuffer);
WriteFile(hSerial,buf,strlen(buf), &dwBytesWritten,&ovWrite);
If I try to remove them, "tagBuffer" will only store the last character, as opposed to being a buffer. The same goes for the next line, the WriteFile().
I thought sprintf and WriteFile are I/O functions and would have no effect on variables.
I'm stuck and I need help to fix this.
//keep polling as long as stop character '-' is not read
while(szRxChar != '-')
{
    // Check if a read is outstanding
    if (HasOverlappedIoCompleted(&ovRead))
    {
        // Issue a serial port read
        if (!ReadFile(hSerial,&szRxChar,1,
                &dwBytesRead,&ovRead))
        {
            DWORD dwErr = GetLastError();
            if (dwErr!=ERROR_IO_PENDING)
                return dwErr;
        }
    }

    // resets tagBuffer in case tagBuffer is out of sync
    time_t t_time = time(0);
    char buf[50];

    if (HasOverlappedIoCompleted(&ovWrite))
    {
        i=0;
    }

    // Wait 5 seconds for serial input
    if (!(HasOverlappedIoCompleted(&ovRead)))
    {
        WaitForSingleObject(hReadEvent,RESET_TIME);
    }

    // Check if serial input has arrived
    if (GetOverlappedResult(hSerial,&ovRead,
            &dwBytesRead,FALSE))
    {
        // Wait for the write
        GetOverlappedResult(hSerial,&ovWrite,
            &dwBytesWritten,TRUE);

        if( strlen(tagBuffer) >= PACKET_LENGTH )
        {
            i = 0;
        }

        //load tagBuffer with byte stream
        tagBuffer[i] = szRxChar;
        i++;
        tagBuffer[i] = 0; //char arrays are \0 terminated

        //run query with tagBuffer
        sprintf(query,"select type from rfid where rfidnum=\"");
        strcat(query, tagBuffer);
        strcat(query, "\"");
        mysql_real_query(&mysql,query,(unsigned int)strlen(query));

        //process result and send back to handheld
        res = mysql_use_result(&mysql);
        while(row = mysql_fetch_row(res))
        {
            printf("result of query is %s\n",row[0]);
            string str = "";
            str = string(row[0]);

            if( str == "book" )
            {
                WriteFile(hSerial,BOOK_INDICATOR,strlen(BOOK_INDICATOR),
                    &dwBytesWritten,&ovWrite);
            }
            else if ( str == "shelf" )
            {
                WriteFile(hSerial,SHELF_INDICATOR,strlen(SHELF_INDICATOR),
                    &dwBytesWritten,&ovWrite);
            }
            else //this else doesn't work
            {
                WriteFile(hSerial,NOK,strlen(NOK),
                    &dwBytesWritten,&ovWrite);
            }
        }
        mysql_free_result(res);

        // Display a response to input
        //printf("query is %s!\n", query);
        //printf("strlen(tagBuffer) is %d!\n", strlen(tagBuffer));

        //without these, tagBuffer only holds the last character
        sprintf(buf,"%s!\n", tagBuffer);
        WriteFile(hSerial,buf,strlen(buf), &dwBytesWritten,&ovWrite);
    }
}
With those two lines, my output looks like this:
s sh she shel shelf shelf0 shelf00 BOOKCODE shelf0001
Without them, I figured out that tagBuffer and buf only store the most recent character at any one time.
Any help at all will be greatly appreciated. Thanks.
Where are you allocating tagBuffer, and how large is it?
It's possible that you are overwriting 'buf' because you are writing past the end of tagBuffer.
It seems unlikely that those two lines would have that effect on a correct program - maybe you haven't allocated sufficient space in buf for the whole length of the string in tagBuffer? This might cause a buffer overrun that is disguising the real problem?
The first thing I'd say is a piece of general advice: bugs aren't always where you think they are. If you've got something going on that doesn't seem to make sense, it often means that your assumptions somewhere else are wrong.
Here, it does seem very unlikely that a sprintf() and a WriteFile() will change the state of the "buf" array variable. However, those two lines of test code do write to "hSerial", while your main loop also reads from "hSerial". That sounds like a recipe for changing the behaviour of your program.
Suggestion: Change your lines of debugging output to store the output somewhere else: to a dialog box, or to a log file, or similar. Debugging output should generally not go to files used in the core logic, as that's too likely to change how the core logic behaves.
In my opinion, the real problem here is that you're trying to read and write the serial port from a single thread, and this is making the code more complex than it needs to be. I suggest that you read the following articles and reconsider your design:
Serial Port I/O from Joseph Newcomer's website.
Serial Communications in Win32 from MSDN.
In a multithreaded implementation, whenever the reader thread reads a message from the serial port you would then post it to your application's main thread. The main thread would then parse the message and query the database, and then queue an appropriate response to the writer thread.
This may sound more complex than your current design, but really it isn't, as Newcomer explains.
I hope this helps!