Rusoto S3 reading object from S3 - amazon-s3

Apologies if this is a dumb question. I am fairly new to Rust and I am having a hard time figuring out how to read an object from S3 using the Rusoto library.
https://rusoto.github.io/rusoto/rusoto_s3/
So far I've worked out this much:
let mut get_object_request = rusoto_s3::GetObjectRequest::default();
get_object_request.bucket = bucket.to_owned();
get_object_request.key = object_id.to_owned();
let resp = client.get_object(get_object_request)
.await
.map_err(|_| {
FailedToParseFile(format!(
"Failed to retrieve file from s3, bucket: {}, object id: {}",
bucket, object_id
))
})?;
let mut resp_body = match resp.body {
Some(body) => {body}
None => { return Err(ClientError::FailedToObtainConfiguredFile(
format!("Failed to retrieve the file: {} in {}", object_id, bucket)
))}
};
However, I've no idea how to turn the streaming body returned from this into a usable string. I've tried a few things to get this working, but none seem to work for me.
let mut byte_buffer = String::new();
resp_body.into_blocking_read()
.read_to_string(&mut byte_buffer)
.map_err(|_| {
FailedToParseFile(format!(
"Failed to parse the received file, bucket: {}, object id: {}",
bucket, object_id
))
})?;
let resp_body_bytes = byte_buffer.into_bytes();
As I am using Tokio (via Rocket), this blocking read doesn't seem to be an option for me: it panics when attempting to block on an async thread. I started looking at the method into_async_read() instead, but I'm unfamiliar with it too and seem to be struggling to use it as intended. This is starting to seem pretty convoluted; maybe I'm missing something. Any help with this would be appreciated.

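use std::pin::Pin;
use tokio::io::AsyncReadExt; // provides read_to_string(); assumes Rusoto's Tokio-based ByteStream
let mut byte_buffer = String::new();
Pin::new(&mut resp_body.into_async_read())
    .read_to_string(&mut byte_buffer)
    .await
    .map_err(|e| FailedToParseFile(format!("Failed to read {} from {}: {}", object_id, bucket, e)))?;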
This actually seems to do what I need. While messing about with this, I didn't realise that the result of .read_to_string(&mut byte_buffer) could be awaited in order to fill the supplied buffer.
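Another option, as far as I know, is to consume the body chunk by chunk, since Rusoto's ByteStream also implements futures::Stream with io::Result<Bytes> items. The detail that matters is to concatenate the raw bytes and decode UTF-8 only once at the end, because a multi-byte character can be split across two chunks. The sketch below shows the idea synchronously: a plain iterator of Vec<u8> chunks stands in for the stream (an assumption so it runs without Rusoto or Tokio), and collect_body is a hypothetical helper name. In async code the loop would become `while let Some(chunk) = body.next().await` with futures::StreamExt in scope.

```rust
use std::io;

/// Joins a body that arrives as byte chunks into one String.
/// `chunks` is a stand-in (an assumption of this sketch) for the items of Rusoto's ByteStream.
fn collect_body<I>(chunks: I) -> io::Result<String>
where
    I: IntoIterator<Item = io::Result<Vec<u8>>>,
{
    let mut bytes = Vec::new();
    for chunk in chunks {
        // Stop at the first I/O error; otherwise append the raw bytes.
        bytes.extend_from_slice(&chunk?);
    }
    // Decode once at the end, so a character split across two chunks survives.
    String::from_utf8(bytes).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

fn main() {
    // "é" is encoded as the two bytes 0xC3 0xA9; here they arrive in different chunks.
    let chunks: Vec<io::Result<Vec<u8>>> = vec![Ok(b"caf\xC3".to_vec()), Ok(b"\xA9!".to_vec())];
    match collect_body(chunks) {
        Ok(body) => println!("{}", body),
        Err(e) => eprintln!("could not read body: {}", e),
    }
}
```

Decoding each chunk separately with String::from_utf8 would reject the first chunk above, which is why the bytes are buffered first.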

Related

Upload AsyncRead to s3 rust

I am trying to upload an AsyncRead to S3, but I cannot figure out how to convert the AsyncRead object into a ByteStream, which the library (aws-sdk-s3) knows how to work with.
The signature of my function is like this:
async fn upload(&self, key: String, data: &mut impl AsyncRead) -> Result<(),Error>;
let byte_stream = codec::FramedRead::new(read, codec::BytesCodec::new()).map(|r| r.freeze());
let res = s3
.put_object()
.key(id)
.body(ByteStream::new(byte_stream))
.send()
.await;
The errors that I got are:
the trait bound &impl 'async_trait + AsyncRead: AsyncRead is not satisfied
FramedRead<&impl 'async_trait + AsyncRead, BytesCodec> is not an iterator
I found this and tried to implement it like the "good way" section but I got compile errors.
I also looked at the documentation and found nothing that helps with this conversion (I only found conversions in the other direction).
How can I upload an AsyncRead object to s3?

Why does Arc::try_unwrap() cause a panic?

I'm writing a simple chat server which broadcasts messages to all the clients connected.
The code might look terrible, since I'm a beginner. Peers are not used anywhere yet because I also want to pass them to the handle_client function, so that when data becomes available in a stream and is read successfully, I can broadcast it to all the connected clients. I understand this is not a good approach; I'm just trying to understand how to do things like this in general.
use std::io::BufRead;
use std::io::Write;
use std::net::{TcpListener, TcpStream};
use std::sync::Arc;
fn handle_client(arc: Arc<TcpStream>) -> std::io::Result<()> {
let mut stream = Arc::try_unwrap(arc).unwrap();
stream.write(b"Welcome to the server!\r\n")?;
println!("incoming connection: {:?}", stream);
std::thread::spawn(move || -> std::io::Result<()> {
let peer_addr = stream.peer_addr()?;
let mut reader = std::io::BufReader::new(stream);
let mut buf = String::new();
loop {
let bytes_read = reader.read_line(&mut buf)?;
if bytes_read == 0 {
println!("client disconnected {}", peer_addr);
return Ok(());
}
buf.remove(bytes_read - 1);
println!("{}: {}", peer_addr, buf);
buf.clear();
}
});
Ok(())
}
fn start() -> std::io::Result<()> {
let listener = TcpListener::bind("0.0.0.0:1111")?;
println!("listening on {}", listener.local_addr()?.port());
let mut peers: Vec<Arc<TcpStream>> = vec![];
for stream in listener.incoming() {
let mut stream = stream.unwrap();
let arc = Arc::new(stream);
peers.push(arc.clone());
handle_client(arc.clone()).unwrap();
}
Ok(())
}
fn main() -> std::io::Result<()> {
start()
}
It compiles fine, but let mut stream = Arc::try_unwrap(arc).unwrap(); in the handle_client function panics. What am I doing wrong? Why is it panicking?
Why is it panicking?
You are calling unwrap on a Result::Err. The Err comes from try_unwrap failing on the Arc.
What am I doing wrong?
Unwrapping an Arc will move its value and take ownership of it. This fails because there are three clones of the same Arc:
one in the main loop which is still in scope
one in the peers vector
the one that you are trying to unwrap inside handle_client.
The other two clones would become invalid if Rust allowed you to unwrap and move the value.
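The same failure can be reproduced without any networking. In this sketch a String stands in for the TcpStream, and take_if_unique is a hypothetical helper; it relies only on Arc::try_unwrap, which gives the Arc back inside Err when other strong references exist, and Arc::strong_count.

```rust
use std::sync::Arc;

/// Hypothetical helper: takes the value out of an Arc, or reports how many
/// strong references still exist when that is not possible.
fn take_if_unique(arc: Arc<String>) -> Result<String, usize> {
    // On failure, try_unwrap hands the Arc back inside Err.
    Arc::try_unwrap(arc).map_err(|still_shared| Arc::strong_count(&still_shared))
}

fn main() {
    let arc = Arc::new(String::from("stream"));
    let peers = vec![arc.clone()]; // like `peers.push(arc.clone())`
    // Mirrors `handle_client(arc.clone())`: three strong references exist here.
    println!("{:?}", take_if_unique(arc.clone()));
    drop(peers);
    // Now `arc` is the only strong reference, so the value can be moved out.
    println!("{:?}", take_if_unique(arc));
}
```

This prints Err(3) and then Ok("stream"), matching the three clones listed above.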
Instead of unwrapping the value you can use Arc's Deref implementation to borrow it:
let stream: &TcpStream = &arc;
Since you are now borrowing the value from the Arc, you need to move the arc variable itself into the new thread; otherwise the borrow checker won't be able to ensure that it lives as long as the thread:
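fn handle_client(arc: Arc<TcpStream>) -> std::io::Result<()> {
    std::thread::spawn(move || -> std::io::Result<()> {
        // Borrow the stream through Arc's Deref impl instead of unwrapping it.
        // `arc` was moved into this closure, so the borrow lives as long as the thread.
        let mut stream: &TcpStream = &arc;
        stream.write_all(b"Welcome to the server!\r\n")?;
        let peer_addr = stream.peer_addr()?;
        let mut reader = std::io::BufReader::new(stream);
        let mut buf = String::new();
        // ... the read_line loop from the question, unchanged ...
        Ok(())
    });
    Ok(())
}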
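Here is the same pattern in a form that runs without sockets, assuming an Arc<String> in place of Arc<TcpStream>; spawn_worker is a hypothetical name. The Arc is moved into the thread's closure and the value is borrowed there through Deref, while other owners keep their own clones.

```rust
use std::sync::Arc;
use std::thread;

/// Moves an Arc into a new thread and borrows the shared value there through Deref.
fn spawn_worker(shared: Arc<String>) -> thread::JoinHandle<usize> {
    thread::spawn(move || {
        // `shared` is owned by the closure, so this borrow lives as long as the thread.
        let text: &str = &shared;
        text.len()
    })
}

fn main() {
    let shared = Arc::new(String::from("hello"));
    let peers = vec![shared.clone()]; // other owners keep their own clones
    let len = spawn_worker(shared).join().expect("worker panicked");
    println!("{} bytes, still shared with {} peer(s)", len, peers.len());
}
```

No try_unwrap is needed, so the number of clones held elsewhere no longer matters.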
The documentation says:
Returns the contained value, if the Arc has exactly one strong reference. Otherwise, an Err is returned with the same Arc that was passed in. This will succeed even if there are outstanding weak references.
Your code will work fine with one strong and many weak references.
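use std::sync::Weak; // in addition to the imports from the question
let mut peers: Vec<Weak<TcpStream>> = vec![];
for stream in listener.incoming() {
    let stream = stream.unwrap();
    let arc = Arc::new(stream);
    // Only weak references are stored, so handle_client receives the sole strong one.
    peers.push(Arc::downgrade(&arc));
    handle_client(arc).unwrap();
}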
One thing to note about the weak references: once you unwrap your one strong reference, you will no longer be able to use the weak references, because upgrading them will return None.
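A small sketch of both halves of that behaviour, with a String standing in for the TcpStream and a hypothetical helper name: try_unwrap succeeds despite the weak references, and afterwards none of them can be upgraded.

```rust
use std::sync::{Arc, Weak};

/// Hypothetical helper: builds an Arc with `n_peers` weak references, unwraps it,
/// and reports whether any weak reference can still be upgraded afterwards.
fn unwrap_with_weak_peers(value: String, n_peers: usize) -> (String, bool) {
    let arc = Arc::new(value);
    let peers: Vec<Weak<String>> = (0..n_peers).map(|_| Arc::downgrade(&arc)).collect();
    // Weak references are not counted, so a single strong reference is enough.
    let value = Arc::try_unwrap(arc).expect("exactly one strong reference");
    // The value has been moved out, so every upgrade now returns None.
    let any_alive = peers.iter().any(|peer| peer.upgrade().is_some());
    (value, any_alive)
}

fn main() {
    let (value, any_alive) = unwrap_with_weak_peers(String::from("stream"), 3);
    println!("unwrapped {:?}; weak peers still usable: {}", value, any_alive);
}
```

For a broadcast list this means the peers should only upgrade while handle_client still holds the Arc, rather than after it has been unwrapped.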

Object oriented design patterns for error checking

I have written the following function that reads the contents of a text file and panic!s if an error is encountered.
fn get_file_contents(name: String) -> Result<String, io::Error> {
let mut f = try!(File::open(name));
let mut contents = String::new();
try!(f.read_to_string(&mut contents));
Ok(contents)
}
And the contents are extracted from the Result using:
let file_contents = match get_file_contents(file_name) {
Ok(contents) => contents,
Err(err) => panic!("{}", err)
};
I am now trying to reimplement this in an object oriented manner using structures and implementations. I created the following structure:
struct FileReader {
file_name: String,
file_contents: String,
}
and implemented the following methods:
impl FileReader {
fn new(fname: &str) -> FileReader {
FileReader {
file_name: fname.to_string(),
file_contents: String::new(),
}
}
fn get_file_contents(&mut self) {
let mut f = match File::open(&self.file_name) {
Ok(file) => file,
Err(err) => panic!("{}", err)
};
match f.read_to_string(&mut self.file_contents) {
Ok(size) => size,
Err(err) => panic!("{}", err)
};
}
}
In the OO approach, I haven't used the try! macro as I don't want the method to return any value. Is my OO implementation of get_file_contents a typical way of achieving this functionality? If not, can you please suggest an alternative way?
In the OO approach, I haven't used the try! macro as I don't want the method to return any value.
It's unclear why you think that "object oriented" means "doesn't return a value". If an error can occur, the code should indicate that.
Many languages have the equivalent of exceptions: out-of-band values that are thrown (in effect, "returned") from a function or method. This means that these languages allow two disjoint types to be returned from a given function: the "normal" type and the "exceptional" type. That is a close equivalent of Rust's Result: Result<NormalType, ExceptionalType>.
Exceptional isn't a great term for this, as you should expect that opening a file can fail. There are an infinite number of ways that it could fail, but only a narrow subset of ways that it can succeed.
Panicking is closer to "kill the entire program / thread right now". Unlike C, you are forced to either deal with a problem, pass it back to the caller, or kill the program (panic).
If you would have thrown an exception in a language that supports them, use a Result. If you would have killed the program, or don't want to handle an error, use a panic.
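Applied to the FileReader from the question, that advice might look like the sketch below (my own illustration, not from the answer): the method keeps its name but returns io::Result<()>, and uses the ? operator, which is the current replacement for try!. The file name in main is a placeholder.

```rust
use std::fs::File;
use std::io::{self, Read};

struct FileReader {
    file_name: String,
    file_contents: String,
}

impl FileReader {
    fn new(fname: &str) -> FileReader {
        FileReader {
            file_name: fname.to_string(),
            file_contents: String::new(),
        }
    }

    /// Fills `file_contents`, returning any I/O error to the caller instead of panicking.
    fn get_file_contents(&mut self) -> io::Result<()> {
        let mut f = File::open(&self.file_name)?; // `?` is the modern form of try!
        f.read_to_string(&mut self.file_contents)?;
        Ok(())
    }
}

fn main() {
    let mut reader = FileReader::new("missing.txt"); // placeholder file name
    // The caller chooses: handle the error, propagate it with `?`, or panic.
    if let Err(err) = reader.get_file_contents() {
        eprintln!("could not read {}: {}", reader.file_name, err);
    }
}
```

A caller that really wants to die can still write reader.get_file_contents().expect("Couldn't read file"), so returning a Result doesn't take the panic option away.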
If you want to panic in your particular case, use unwrap, or even better, expect:
fn get_file_contents(&mut self) {
let mut f = File::open(&self.file_name).expect("Couldn't open file");
f.read_to_string(&mut self.file_contents).expect("Couldn't read file");
}
seems kind of clunky to have to deal with the Result for each method.
Which is why the Error Handling section of The Rust Programming Language spends a good amount of time discussing the try! macro:
A cornerstone of error handling in Rust is the try! macro. The try! macro abstracts case analysis like combinators, but unlike combinators, it also abstracts control flow. Namely, it can abstract the early return pattern seen above.
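(This makes more sense in the context of the page. In current Rust, the try! macro has been superseded by the ? operator, which performs the same early return.)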
I don't want my code to try and recover from the error (most likely caused by the file not being found) - I want it to print a useful error message and then die
Then by all means, panic. There are more succinct and more detailed ways to do it (as shown above).

How do I read the output of a child process without blocking in Rust?

I'm making a small ncurses application in Rust that needs to communicate with a child process. I already have a prototype written in Common Lisp. I'm trying to rewrite it because CL uses a huge amount of memory for such a small tool.
I'm having some trouble figuring out how to interact with the sub-process.
What I'm currently doing is roughly this:
Create the process:
let mut program = match Command::new(command)
.args(arguments)
.stdin(Stdio::piped())
.stdout(Stdio::piped())
.stderr(Stdio::piped())
.spawn()
{
Ok(child) => child,
Err(_) => {
println!("Cannot run program '{}'.", command);
return;
}
};
Pass it to an infinite (until user exits) loop, which reads and handles input and listens for output like this (and writes it to the screen):
fn listen_for_output(program: &mut Child, output_viewer: &TextViewer) {
match program.stdout {
Some(ref mut out) => {
let mut buf_string = String::new();
match out.read_to_string(&mut buf_string) {
Ok(_) => output_viewer.append_string(buf_string),
Err(_) => return,
};
}
None => return,
};
}
However, the call to read_to_string blocks the program until the process exits. From what I can see, read_to_end and read also seem to block. If I try running something like ls, which exits right away, it works, but with something that doesn't exit, like python or sbcl, it only continues once I kill the subprocess manually.
Based on this answer, I changed the code to use BufReader:
fn listen_for_output(program: &mut Child, output_viewer: &TextViewer) {
match program.stdout.as_mut() {
Some(out) => {
let buf_reader = BufReader::new(out);
for line in buf_reader.lines() {
match line {
Ok(l) => {
output_viewer.append_string(l);
}
Err(_) => return,
};
}
}
None => return,
}
}
However, the problem remains the same: it reads all lines that are available, and then blocks. Since the tool is supposed to work with any program, there is no way to know when the output will end before trying to read. There doesn't appear to be a way to set a timeout for BufReader either.
Streams are blocking by default. TCP/IP streams, filesystem streams, pipe streams: they are all blocking. When you tell a stream to give you a chunk of bytes, it will stop and wait until it has the given amount of bytes or until something else happens (an interrupt, the end of the stream, an error).
Operating systems are eager to return data to the reading process, so if all you want is to wait for the next line and handle it as soon as it comes in, then the method suggested by Shepmaster in Unable to pipe to or from spawned child process more than once (and also in his answer here) works.
In theory it doesn't have to work, because an operating system is allowed to make the BufReader wait for more data in read; in practice, though, operating systems prefer early "short reads" to waiting.
This simple BufReader-based approach becomes even more dangerous when you need to handle multiple streams (like the stdout and stderr of a child process) or multiple processes. For example, the BufReader-based approach might deadlock when a child process waits for you to drain its stderr pipe while your process is blocked waiting on its empty stdout.
Similarly, you can't use BufReader when you don't want your program to wait on the child process indefinitely. Maybe you want to display a progress bar or a timer while the child is still working and giving you no output.
You can't use BufReader-based approach if your operating system happens not to be eager in returning the data to the process (prefers "full reads" to "short reads") because in that case a few last lines printed by the child process might end up in a gray zone: the operating system got them, but they're not large enough to fill the BufReader's buffer.
BufReader is limited to what the Read interface allows it to do with the stream, it's no less blocking than the underlying stream is. In order to be efficient it will read the input in chunks, telling the operating system to fill as much of its buffer as it has available.
You might be wondering why reading data in chunks is so important here, and why BufReader can't just read the data byte by byte. The problem is that reading data from a stream requires the operating system's help, and our process runs isolated from the operating system, so as not to mess with it if something goes wrong with our process. Calling into the operating system therefore requires a transition to "kernel mode", which might also incur a "context switch". That is why asking the operating system for every single byte is expensive: we want as few OS calls as possible, so we get the stream data in batches.
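To make the cost argument concrete, here is a small std-only sketch (my own, not part of the answer). CountingReader is a hypothetical in-memory reader that counts how often read is called, standing in for system calls on a real pipe. Draining 1000 bytes one byte at a time directly costs one call per byte plus one for EOF, while through BufReader only a couple of underlying calls happen, because BufReader asks for a whole buffer (8 KiB by default on common platforms) at once.

```rust
use std::io::{self, BufReader, Read};

/// Hypothetical in-memory reader that counts calls to `read`,
/// standing in for the system calls made on a real pipe or socket.
struct CountingReader {
    data: Vec<u8>,
    pos: usize,
    calls: usize,
}

impl Read for CountingReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        let n = buf.len().min(self.data.len() - self.pos);
        buf[..n].copy_from_slice(&self.data[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}

/// Reads one byte at a time until EOF and returns the number of bytes read.
fn drain_byte_by_byte<R: Read>(reader: &mut R) -> io::Result<usize> {
    let mut byte = [0u8; 1];
    let mut total = 0;
    while reader.read(&mut byte)? == 1 {
        total += 1;
    }
    Ok(total)
}

/// Returns how many times the underlying reader was called to drain `len` bytes.
fn underlying_calls(len: usize, buffered: bool) -> usize {
    let inner = CountingReader { data: vec![b'x'; len], pos: 0, calls: 0 };
    if buffered {
        let mut reader = BufReader::new(inner);
        drain_byte_by_byte(&mut reader).unwrap();
        reader.get_ref().calls
    } else {
        let mut reader = inner;
        drain_byte_by_byte(&mut reader).unwrap();
        reader.calls
    }
}

fn main() {
    // One call per byte plus a final call that reports EOF.
    println!("unbuffered: {}", underlying_calls(1000, false));
    // BufReader requests a whole buffer at once, so far fewer calls reach the inner reader.
    println!("buffered: {}", underlying_calls(1000, true));
}
```

The flip side is exactly the problem described above: each of those big requests can block until the OS decides to return.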
To wait on a stream without blocking you'd need a non-blocking stream. MIO promises to have the required non-blocking stream support for pipes, most probably with PipeReader, but I haven't checked it out so far.
The non-blocking nature of a stream should make it possible to read data in chunks regardless of whether the operating system prefers "short reads" or not, because a non-blocking stream never blocks: if there is no data in the stream, it simply tells you so.
In the absence of a non-blocking stream, you'll have to resort to spawning threads, so that the blocking reads are performed in a separate thread and thus don't block your primary thread. You might also want to read the stream byte by byte in order to react to the line separator immediately in case the operating system does not prefer "short reads". Here's a working example: https://gist.github.com/ArtemGr/db40ae04b431a95f2b78.
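A different, channel-based variant of the thread approach (a sketch of my own, not the linked gist) moves the blocking reader onto its own thread and sends complete lines over std::sync::mpsc. The main loop then waits with recv_timeout, so it can redraw or handle input when nothing arrives. A Cursor stands in for the child's stdout so the sketch runs anywhere, and spawn_line_reader is a hypothetical name.

```rust
use std::io::{BufRead, BufReader, Cursor, Read};
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Moves a blocking reader (such as a ChildStdout) onto its own thread and
/// forwards complete lines over a channel, so the caller never blocks on the pipe.
fn spawn_line_reader<R: Read + Send + 'static>(stream: R) -> Receiver<String> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Stop at the first read error, or when the receiving side has gone away.
        for line in BufReader::new(stream).lines().map_while(Result::ok) {
            if tx.send(line).is_err() {
                break;
            }
        }
        // `tx` is dropped here, which tells the receiver the stream has ended.
    });
    rx
}

fn main() {
    // A Cursor stands in for `child.stdout.take().unwrap()` so the sketch runs anywhere.
    let rx = spawn_line_reader(Cursor::new("first\nsecond\n"));
    loop {
        match rx.recv_timeout(Duration::from_millis(100)) {
            Ok(line) => println!("got: {}", line),
            // Nothing yet: an ncurses loop would handle input or redraw here.
            Err(RecvTimeoutError::Timeout) => continue,
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
}
```

Calling the function once for stdout and once for stderr gives two independent receivers, which avoids the deadlock described earlier; rx.try_recv() is the fully non-waiting alternative to recv_timeout.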
P.S. Here's an example of a function that lets you monitor the standard output of a program via a shared vector of bytes:
use std::io::Read;
use std::process::{Command, Stdio};
use std::sync::{Arc, Mutex};
use std::thread;
/// Pipe streams are blocking, we need separate threads to monitor them without blocking the primary thread.
fn child_stream_to_vec<R>(mut stream: R) -> Arc<Mutex<Vec<u8>>>
where
    R: Read + Send + 'static,
{
    let out = Arc::new(Mutex::new(Vec::new()));
    let vec = out.clone();
    thread::Builder::new()
        .name("child_stream_to_vec".into())
        .spawn(move || loop {
            let mut buf = [0];
            match stream.read(&mut buf) {
                Err(err) => {
                    println!("{}] Error reading from stream: {}", line!(), err);
                    break;
                }
                Ok(got) => {
                    if got == 0 {
                        break;
                    } else if got == 1 {
                        vec.lock().expect("!lock").push(buf[0])
                    } else {
                        println!("{}] Unexpected number of bytes: {}", line!(), got);
                        break;
                    }
                }
            }
        })
        .expect("!thread");
    out
}
fn main() {
    let mut cat = Command::new("cat")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()
        .expect("!cat");
    let out = child_stream_to_vec(cat.stdout.take().expect("!stdout"));
    let err = child_stream_to_vec(cat.stderr.take().expect("!stderr"));
    let mut stdin = match cat.stdin.take() {
        Some(stdin) => stdin,
        None => panic!("!stdin"),
    };
}
With a couple of helpers of my own (try_s! and wait_forˢ, not shown here), I'm using it to control an SSH session:
try_s! (stdin.write_all (b"echo hello world\n"));
try_s! (wait_forˢ (&out, 0.1, 9., |s| s == "hello world\n"));
P.S. Note that awaiting a read call in async-std is blocking as well. It's just that instead of blocking a system thread, it only blocks a chain of futures (essentially a stackless green thread). poll_read is the non-blocking interface. In async-std#499 I've asked the developers whether these APIs guarantee short reads.
P.S. There might be a similar concern in Nom: "we would want to tell the IO side to refill according to the parser's result (Incomplete or not)"
P.S. Might be interesting to see how stream reading is implemented in crossterm. For Windows, in poll.rs, they are using the native WaitForMultipleObjects. In unix.rs they are using mio poll.
Tokio's Command
Here is an example of using tokio 0.2:
use std::process::Stdio;
use futures::StreamExt; // 0.3.1
use tokio::{io::BufReader, prelude::*, process::Command}; // 0.2.4, features = ["full"]
#[tokio::main]
async fn main() {
let mut cmd = Command::new("/tmp/slow.bash")
.stdout(Stdio::piped()) // Can do the same for stderr
.spawn()
.expect("cannot spawn");
let stdout = cmd.stdout().take().expect("no stdout");
// Can do the same for stderr
// To print out each line
// BufReader::new(stdout)
// .lines()
// .for_each(|s| async move { println!("> {:?}", s) })
// .await;
// To print out each line *and* collect it all into a Vec
let result: Vec<_> = BufReader::new(stdout)
.lines()
.inspect(|s| println!("> {:?}", s))
.collect()
.await;
println!("All the lines: {:?}", result);
}
Tokio-Threadpool
Here is an example of using tokio 0.1 and tokio-threadpool. We spawn the process as usual, perform each blocking read through the blocking function, and convert that into a stream with stream::poll_fn.
use std::process::{Command, Stdio};
use tokio::{prelude::*, runtime::Runtime}; // 0.1.18
use tokio_threadpool; // 0.1.13
fn stream_command_output(
mut command: Command,
) -> impl Stream<Item = Vec<u8>, Error = tokio_threadpool::BlockingError> {
// Ensure that the output is available to read from and start the process
let mut child = command
.stdout(Stdio::piped())
.spawn()
.expect("cannot spawn");
let mut stdout = child.stdout.take().expect("no stdout");
// Create a stream of data
stream::poll_fn(move || {
// Perform blocking IO
tokio_threadpool::blocking(|| {
// Allocate some space to store anything read
let mut data = vec![0; 128];
// Read 1-128 bytes of data
let n_bytes_read = stdout.read(&mut data).expect("cannot read");
if n_bytes_read == 0 {
// Stdout is done
None
} else {
// Only return as many bytes as we read
data.truncate(n_bytes_read);
Some(data)
}
})
})
}
fn main() {
let output_stream = stream_command_output(Command::new("/tmp/slow.bash"));
let mut runtime = Runtime::new().expect("Unable to start the runtime");
let result = runtime.block_on({
output_stream
.map(|d| String::from_utf8(d).expect("Not UTF-8"))
.fold(Vec::new(), |mut v, s| {
print!("> {}", s);
v.push(s);
Ok(v)
})
});
println!("All the lines: {:?}", result);
}
There are numerous possible tradeoffs that can be made here. For example, always allocating 128 bytes isn't ideal, but it's simple to implement.
Support
For reference, here's slow.bash:
#!/usr/bin/env bash
set -eu
val=0
while [[ $val -lt 10 ]]; do
echo $val
val=$(($val + 1))
sleep 1
done
See also:
How do I synchronously return a value calculated in an asynchronous Future in stable Rust?
If Unix support is sufficient, you can also make the two output streams non-blocking and poll them, as you would with a TcpStream via its set_nonblocking function.
The ChildStdout and ChildStderr handles of a spawned Command each wrap a file descriptor, so you can directly modify the read behavior of these handles to make them non-blocking.
Based on the work of jcreekmore/timeout-readwrite-rs and anowell/nonblock-rs, I use this wrapper to modify the stream handles:
extern crate libc;
use std::io::Read;
use std::os::unix::io::AsRawFd;
use libc::{F_GETFL, F_SETFL, fcntl, O_NONBLOCK};
fn set_nonblocking<H>(handle: &H, nonblocking: bool) -> std::io::Result<()>
where
H: Read + AsRawFd,
{
let fd = handle.as_raw_fd();
let flags = unsafe { fcntl(fd, F_GETFL, 0) };
if flags < 0 {
return Err(std::io::Error::last_os_error());
}
let flags = if nonblocking{
flags | O_NONBLOCK
} else {
flags & !O_NONBLOCK
};
let res = unsafe { fcntl(fd, F_SETFL, flags) };
if res != 0 {
return Err(std::io::Error::last_os_error());
}
Ok(())
}
You can then manage the two streams like any other non-blocking stream. The following example is based on the polling crate, which makes it really easy to handle read events, and uses BufReader for line reading:
use std::process::{Command, Stdio};
use std::path::PathBuf;
use std::io::{BufReader, BufRead};
use std::thread;
extern crate polling;
use polling::{Event, Poller};
fn main() -> Result<(), std::io::Error> {
let path = PathBuf::from("./worker.sh").canonicalize()?;
let mut child = Command::new(path)
.stdin(Stdio::null())
.stdout(Stdio::piped())
.stderr(Stdio::piped())
.spawn()
.expect("Failed to start worker");
let handle = thread::spawn({
let stdout = child.stdout.take().unwrap();
set_nonblocking(&stdout, true)?;
let mut reader_out = BufReader::new(stdout);
let stderr = child.stderr.take().unwrap();
set_nonblocking(&stderr, true)?;
let mut reader_err = BufReader::new(stderr);
move || {
let key_out = 1;
let key_err = 2;
let mut out_closed = false;
let mut err_closed = false;
let poller = Poller::new().unwrap();
poller.add(reader_out.get_ref(), Event::readable(key_out)).unwrap();
poller.add(reader_err.get_ref(), Event::readable(key_err)).unwrap();
let mut line = String::new();
let mut events = Vec::new();
loop {
// Wait for at least one I/O event.
events.clear();
poller.wait(&mut events, None).unwrap();
for ev in &events {
// stdout is ready for reading
if ev.key == key_out {
let len = match reader_out.read_line(&mut line) {
Ok(len) => len,
Err(e) => {
println!("stdout read returned error: {}", e);
0
}
};
if len == 0 {
println!("stdout closed (len is null)");
out_closed = true;
poller.delete(reader_out.get_ref()).unwrap();
} else {
print!("[STDOUT] {}", line);
line.clear();
// reload the poller
poller.modify(reader_out.get_ref(), Event::readable(key_out)).unwrap();
}
}
// stderr is ready for reading
if ev.key == key_err {
let len = match reader_err.read_line(&mut line) {
Ok(len) => len,
Err(e) => {
println!("stderr read returned error: {}", e);
0
}
};
if len == 0 {
println!("stderr closed (len is null)");
err_closed = true;
poller.delete(reader_err.get_ref()).unwrap();
} else {
print!("[STDERR] {}", line);
line.clear();
// reload the poller
poller.modify(reader_err.get_ref(), Event::readable(key_err)).unwrap();
}
}
}
if out_closed && err_closed {
println!("Stream closed, exiting process thread");
break;
}
}
}
});
handle.join().unwrap();
Ok(())
}
Additionally, combined with a wrapper over an EventFd, this makes it possible to stop the process easily from another thread, without blocking or active polling, and using only a single thread.
EDIT: Following my tests, it seems the polling crate automatically puts the polled handles into non-blocking mode. The set_nonblocking function is still useful in case you want to use the nix::poll object directly.
I have encountered enough use-cases where it was useful to interact with a subprocess over line-delimited text that I wrote a crate for it, interactive_process.
I expect the original problem has long since been solved, but I thought it might be helpful to others.

Reading from a process's stdout without placing it all in memory at once

I am trying to capture the output from an external tool which is run in a separate process. I would like to do so in a non-blocking way as the output is larger than memory. I saw How would you stream output from a Process in Rust? but I am not sure how to proceed. I have copied an example from here but this seems to capture output into a variable before proceeding. So far I have:
let path = "/Documents/Ruststuff/DB30_igh_badPE.bam";
let output = Command::new("samtools")
.arg("view")
.arg("-H")
.arg(path)
.stdin(Stdio::piped())
.stdout(Stdio::piped())
.spawn()
.unwrap_or_else(|e| panic!("failed {}", e));
let mut s = String::new();
match output.stdout.unwrap().read_to_string(&mut s) {
Err(why) => panic!("{}", Error::description(&why)),
Ok(_) => print!("{}", s),
}
Is it possible to loop over stdout from the child process instead of using the match? Something like:
for line in &output.stdout {}
You don't need non-blocking IO for what you want. You can use a buffered reader to loop over the lines of input. This assumes that you always need a full line, and that a full line isn't too much data:
use std::{
io::{BufRead, BufReader},
process::{Command, Stdio},
};
fn main() {
let mut child = Command::new("yes")
.stdout(Stdio::piped())
.spawn()
.expect("Unable to spawn program");
if let Some(stdout) = &mut child.stdout {
let lines = BufReader::new(stdout).lines().enumerate().take(10);
for (counter, line) in lines {
println!("{}, {:?}", counter, line);
}
}
}
ChildStdout implements Read for itself, but not for an immutable reference (&ChildStdout). Although we own the standard out, we don't want to consume it, so we need a reference of some kind. Read is implemented for a mutable reference to any other type that is itself Read, so we switch to a mutable reference. Then child needs to be mutable as well.
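The lines() loop above already streams, but it allocates a fresh String per line. A small variant (my own sketch; for_each_line is a hypothetical name) reuses one buffer with read_line, so memory use is bounded by the longest line rather than the total output. It takes any Read, so passing child.stdout.as_mut().unwrap() works through the same &mut impl described above; a Cursor with header-like text stands in for samtools' output here.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Streams `reader` line by line, reusing one String buffer so memory use is
/// bounded by the longest line rather than by the total output size.
fn for_each_line<R: Read>(reader: R, mut handle: impl FnMut(&str)) -> io::Result<usize> {
    let mut reader = BufReader::new(reader);
    let mut line = String::new();
    let mut count = 0;
    loop {
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            return Ok(count); // EOF: the child closed its stdout
        }
        handle(line.trim_end_matches('\n'));
        count += 1;
    }
}

fn main() -> io::Result<()> {
    // With a real child: for_each_line(child.stdout.as_mut().unwrap(), |line| ...),
    // which works because Read is implemented for `&mut R` where R: Read.
    let fake_stdout = io::Cursor::new("@HD\tVN:1.6\n@SQ\tSN:chr1\n");
    let n = for_each_line(fake_stdout, |line| println!("{}", line))?;
    println!("{} lines", n);
    Ok(())
}
```

Note that read_line returns an error on invalid UTF-8; for binary output, read_until(b'\n', &mut Vec<u8>) is the byte-oriented equivalent.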