What's the de-facto way of reading and writing files in Rust 1.x? - file-io

With Rust being comparatively new, I've seen far too many ways of reading and writing files. Many are extremely messy snippets someone came up with for their blog, and 99% of the examples I've found (even on Stack Overflow) are from unstable builds that no longer work. Now that Rust is stable, what is a simple, readable, non-panicking snippet for reading or writing files?
This is the closest I've gotten to something that works in terms of reading a text file, but it's still not compiling even though I'm fairly certain I've included everything I should have. This is based off of a snippet I found on Google+ of all places, and the only thing I've changed is that the old BufferedReader is now just BufReader:
use std::fs::File;
use std::io::BufReader;
use std::path::Path;
fn main() {
let path = Path::new("./textfile");
let mut file = BufReader::new(File::open(&path));
for line in file.lines() {
println!("{}", line);
}
}
The compiler complains:
error: the trait bound `std::result::Result<std::fs::File, std::io::Error>: std::io::Read` is not satisfied [--explain E0277]
--> src/main.rs:7:20
|>
7 |> let mut file = BufReader::new(File::open(&path));
|> ^^^^^^^^^^^^^^
note: required by `std::io::BufReader::new`
error: no method named `lines` found for type `std::io::BufReader<std::result::Result<std::fs::File, std::io::Error>>` in the current scope
--> src/main.rs:8:22
|>
8 |> for line in file.lines() {
|> ^^^^^
To sum it up, what I'm looking for is:
brevity
readability
covers all possible errors
doesn't panic

None of the functions I show here panic on their own, but I am using expect because I don't know what kind of error handling will fit best into your application. Go read The Rust Programming Language's chapter on error handling to understand how to appropriately handle failure in your own program.
Rust 1.26 and onwards
If you don't want to care about the underlying details, there are one-line functions for reading and writing.
Read a file to a String
use std::fs;
fn main() {
let data = fs::read_to_string("/etc/hosts").expect("Unable to read file");
println!("{}", data);
}
Read a file as a Vec<u8>
use std::fs;
fn main() {
let data = fs::read("/etc/hosts").expect("Unable to read file");
println!("{}", data.len());
}
Write a file
use std::fs;
fn main() {
let data = "Some data!";
fs::write("/tmp/foo", data).expect("Unable to write file");
}
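If you'd rather propagate errors than call expect (as noted above, the right error handling depends on your application), these one-liners also work with the ? operator; since Rust 1.26, main itself may return a Result. A minimal sketch:
use std::{fs, io};
fn main() -> Result<(), io::Error> {
    let data = fs::read_to_string("/etc/hosts")?;
    fs::write("/tmp/foo", data.as_bytes())?;
    Ok(())
}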
Rust 1.0 and onwards
These forms are slightly more verbose than the one-line functions that allocate a String or Vec for you, but are more powerful in that you can reuse allocated data or append to an existing object.
Reading data
Reading a file requires two core pieces: File and Read.
Read a file to a String
use std::fs::File;
use std::io::Read;
fn main() {
let mut data = String::new();
let mut f = File::open("/etc/hosts").expect("Unable to open file");
f.read_to_string(&mut data).expect("Unable to read string");
println!("{}", data);
}
Read a file as a Vec<u8>
use std::fs::File;
use std::io::Read;
fn main() {
let mut data = Vec::new();
let mut f = File::open("/etc/hosts").expect("Unable to open file");
f.read_to_end(&mut data).expect("Unable to read data");
println!("{}", data.len());
}
Write a file
Writing a file is similar, except we use the Write trait and we always write out bytes. You can convert a String / &str to bytes with as_bytes:
use std::fs::File;
use std::io::Write;
fn main() {
let data = "Some data!";
let mut f = File::create("/tmp/foo").expect("Unable to create file");
f.write_all(data.as_bytes()).expect("Unable to write data");
}
Buffered I/O
I felt a bit of a push from the community to use BufReader and BufWriter instead of reading straight from a file.
A buffered reader (or writer) uses a buffer to reduce the number of I/O requests. For example, it's much more efficient to access the disk once to read 256 bytes instead of accessing the disk 256 times.
That being said, I don't believe a buffered reader/writer will be useful when reading the entire file. read_to_end seems to copy data in somewhat large chunks, so the transfer may already be naturally coalesced into fewer I/O requests.
Here's an example of using it for reading:
use std::fs::File;
use std::io::{BufReader, Read};
fn main() {
let mut data = String::new();
let f = File::open("/etc/hosts").expect("Unable to open file");
let mut br = BufReader::new(f);
br.read_to_string(&mut data).expect("Unable to read string");
println!("{}", data);
}
And for writing:
use std::fs::File;
use std::io::{BufWriter, Write};
fn main() {
let data = "Some data!";
let f = File::create("/tmp/foo").expect("Unable to create file");
let mut f = BufWriter::new(f);
f.write_all(data.as_bytes()).expect("Unable to write data");
}
A BufReader is more useful when you want to read line-by-line:
use std::fs::File;
use std::io::{BufRead, BufReader};
fn main() {
let f = File::open("/etc/hosts").expect("Unable to open file");
let f = BufReader::new(f);
for line in f.lines() {
let line = line.expect("Unable to read line");
println!("Line: {}", line);
}
}
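If allocating a fresh String for every line matters, a variation on the example above (my sketch, not from the original answer) reuses a single buffer with BufRead::read_line:
use std::fs::File;
use std::io::{BufRead, BufReader};
fn main() {
    let f = File::open("/etc/hosts").expect("Unable to open file");
    let mut f = BufReader::new(f);
    let mut line = String::new();
    loop {
        line.clear(); // reuse the same allocation for every line
        let bytes = f.read_line(&mut line).expect("Unable to read line");
        if bytes == 0 {
            break; // end of file
        }
        print!("Line: {}", line); // `line` still contains the trailing newline
    }
}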

For anybody who is writing to a file, the accepted answer is good, but if you need to append to the file, you have to use the OpenOptions struct instead:
use std::io::Write;
use std::fs::OpenOptions;
fn main() {
let data = "Some data!\n";
let mut f = OpenOptions::new()
.append(true)
.create(true) // Optionally create the file if it doesn't already exist
.open("/tmp/foo")
.expect("Unable to open file");
f.write_all(data.as_bytes()).expect("Unable to write data");
}
Buffered writing still works the same way:
use std::io::{BufWriter, Write};
use std::fs::OpenOptions;
fn main() {
let data = "Some data!\n";
let f = OpenOptions::new()
.append(true)
.open("/tmp/foo")
.expect("Unable to open file");
let mut f = BufWriter::new(f);
f.write_all(data.as_bytes()).expect("Unable to write data");
}
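One caveat not spelled out above: a BufWriter flushes its buffer when it is dropped, but any error that occurs at that point is silently discarded. If you want to observe write failures, flush explicitly before the writer goes out of scope, e.g. at the end of main above:
    f.flush().expect("Unable to flush data");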

By using buffered I/O you can copy a file even when its size is greater than the available memory.
use std::fs::{File, OpenOptions};
use std::io::{BufReader, BufWriter, Write, BufRead};
fn main() {
    let read = File::open(r#"E:\1.xls"#).expect("Unable to open source file");
    let write = OpenOptions::new()
        .write(true)
        .create(true)
        .open(r#"E:\2.xls"#)
        .expect("Unable to open destination file");
    let mut reader = BufReader::new(read);
    let mut writer = BufWriter::new(write);
    loop {
        // Borrow the reader's internal buffer, refilling it from the file when it is empty.
        let buffer = reader.fill_buf().expect("Unable to read data");
        if buffer.is_empty() {
            break; // end of file
        }
        // write_all writes the whole chunk; plain write may write only part of it and its result must be checked.
        writer.write_all(buffer).expect("Unable to write data");
        let length = buffer.len();
        reader.consume(length);
    }
}
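For this specific copy-a-file job, the standard library can also do the chunked loop for you; with the same reader and writer as above, the loop reduces to a single call:
    std::io::copy(&mut reader, &mut writer).expect("Unable to copy data");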

Related

Fastest method of streaming csv formatted data from ChildStdout to File

I'm looking for ways to speed up writing data to a file, and I've tried several methods without much success.
Essentially the process is:
Spawn a child process that sends a request to BigQuery through the BQ CLI API.
Stream data to a MutexGuard<Box<dyn Write>> using std::io::copy.
Here is a small reproduction of the code as I have it currently. This probably won't compile on its own, but I think it will be enough to explain what I'm doing in detail.
use std::process::{Command, ChildStdout, Child, Stdio};
use std::time::Duration;
use std::io::{BufReader, BufWriter, Read, Write};
use std::sync::MutexGuard;
use std::box::Box;
const CAP: usize = 1024 * 8;
fn main() {
let cmd = "bq --format csv query";
let cmd = cmd.split_whitespace();
let query = "path/to/query.sql"
let (mut child, stdout) = spawn(cmd, query).await?
let save_path = "path/to/save/file.csv"
let file = File::open(save_path).unwrap();
let mut rdr = BufReader::with_capacity(CAP, stdout);
let wtr = Mutex::new(Box::new(BufWriter::with_capacity(CAP, file)));
write(&mut child, &mut rdr, &wtr).unwrap();
}
fn create_cmd(args: Vec<&str>) -> Result<Command> {
let mut v_itr = args.into_iter();
let mut cmd = match v_itr.next() {
Some(s) => Command::new(which(s).unwrap()),
None => return Err("Must supply at least one command argument.".into())
};
while let Some(s) = v_itr.next() {
// This is essentially Vec's "push", but for adding arguments to a shell command.
cmd.arg(s);
};
Ok(cmd)
}
fn spawn(cmd: Vec<u8>, query: &str) -> Result<ChildStdout> {
let cmd = create_cmd(cmd)
let query = std::fs::read_to_string(query).unwrap();
let query = query.trim();
cmd.arg(query);
let mut child = match cmd
.stdout(Stdio::piped())
.spawn()
{
Ok(c) => c,
Err(e) => return Err(e.into()),
};
// Sleep while waiting for the child process to fully start,
// otherwise we may get an error when getting a handle to stdout.
let sleep_time = Duration::new(2, 0);
sleep(sleep_time);
Ok(child, stdout)
}
fn write<R: Read, W: Write>(child: &mut Child, rdr: &mut R, wtr: Mutex<Box<W>>) -> Result<()> {
let mut wtr_lock = wtr.lock().unwrap();
let mut wtr_ref = &mut *wtr_lock;
while let Ok(None) = child.try_wait() {
std::io::copy(rdr, wtr_ref).unwrap();
}
Ok(())
}
So far I've tried:
Using tokio equivalents of the std BufWriter and BufReader
Not using any buffered input
Using LineWriter for output
Using buffered writing with no buffered reading
Implementing an async equivalent of stack_buffer_copy from std but with Tokio (this was extremely slow - I basically copied almost word for word what was in std and then implemented BufMut for a struct that wrapped ReadBuf)
The data sizes I'm writing can really be anything up to a few GiB, but I'm not worried about speed for anything below 20 MiB. Currently, I get speeds of a little over 1 MiB/s, which is much slower than I want. I don't mind if the solution is complex, I just want to see what's possible (I'm mainly doing this as a learning opportunity).
As a side note, the response from BQ usually only takes about 5~10 seconds (the BQ command actually prints something like Waiting on bqjob_**** ... (10s) Current status: DONE to stderr) from start to finish, so I know that isn't a bottleneck in this case. The computer I'm using is a Mac M1 Air with 16GiB of RAM. I'm compiling using the following command: RUSTFLAGS="--emit=asm" cargo build --release.
If you don't do any processing of the data, I'd try to avoid piping and copying the data, and just let the child process write to that file directly.
So instead of this:
cmd.stdout(Stdio::piped())
you can pass your file:
cmd.stdout(file)
This should work, because Stdio implements From<File>. There's an example of this here (although with stdin, not stdout).
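A minimal sketch of that approach (the bq arguments, query, and paths here are only placeholders based on the question):
use std::fs::File;
use std::process::Command;
fn main() -> std::io::Result<()> {
    // Hypothetical output path; the child writes its stdout straight into this file.
    let file = File::create("path/to/save/file.csv")?;
    let status = Command::new("bq")
        .args(&["--format", "csv", "query", "SELECT 1"]) // placeholder query
        .stdout(file) // works because Stdio implements From<File>
        .status()?;
    println!("child exited with {}", status);
    Ok(())
}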
To measure your baseline speed, you can run the command from the terminal:
$ bq ... > path/to/save/file.csv
Then you'd know what to expect in terms of speed.
P.S. If you are curious why your program is slow, try using flamegraph.

Why does .flat_map() with .chars() not work with std::io::Lines, but does with a vector of Strings?

I am trying to iterate over characters in stdin. The Read.chars() method achieves this goal, but is unstable. The obvious alternative is to use Read.lines() with a flat_map to convert it to a character iterator.
This seems like it should work, but doesn't, resulting in borrowed value does not live long enough errors.
use std::io::BufRead;
fn main() {
let stdin = std::io::stdin();
let mut lines = stdin.lock().lines();
let mut chars = lines.flat_map(|x| x.unwrap().chars());
}
This is mentioned in Read file character-by-character in Rust, but it doesn't really explain why.
What I am particularly confused about is how this differs from the example in the documentation for flat_map, which uses flat_map to apply .chars() to a vector of strings. I don't really see how that should be any different. The main difference I see is that my code needs to call unwrap() as well, but changing the last line to the following does not work either:
let mut chars = lines.map(|x| x.unwrap());
let mut chars = chars.flat_map(|x| x.chars());
It fails on the second line, so the issue doesn't appear to be the unwrap.
Why does this last line not work, when the very similar line in the documentation doesn't? Is there any way to get this to work?
Start by figuring out what the type of the closure's variable is:
let mut chars = lines.flat_map(|x| {
let () = x; // deliberate type mismatch: the compiler error reveals the real type of x
x.unwrap().chars()
});
This shows it's a Result<String, io::Error>. After unwrapping it, it will be a String.
Next, look at str::chars:
fn chars(&self) -> Chars
And the definition of Chars:
pub struct Chars<'a> {
// some fields omitted
}
From that, we can tell that calling chars on a string returns an iterator that has a reference to the string.
Whenever we have a reference, we know that the reference cannot outlive the thing that it is borrowed from. In this case, x.unwrap() is the owner. The next thing to check is where that ownership ends. In this case, the closure owns the String, so at the end of the closure, the value is dropped and any references are invalidated.
Except the code tried to return a Chars that still referred to the string. Oops. Thanks to Rust, the code didn't segfault!
The difference with the example that works is all in the ownership. In that case, the strings are owned by a vector outside of the loop and they do not get dropped before the iterator is consumed. Thus there are no lifetime issues.
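For contrast, this is roughly the shape of the documentation example that works (my paraphrase): the Strings are owned by the vector, which outlives the borrowing Chars iterators, so the borrow checker is satisfied.
fn main() {
    let words = vec![String::from("alpha"), String::from("beta")];
    // Each chars() iterator borrows from a String owned by `words`,
    // and `words` is still alive when the combined iterator is consumed.
    let merged: String = words.iter().flat_map(|s| s.chars()).collect();
    assert_eq!(merged, "alphabeta");
}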
What this code really wants is an into_chars method on String. That iterator could take ownership of the value and return characters.
Not the maximum efficiency, but a good start:
struct IntoChars {
s: String,
offset: usize,
}
impl IntoChars {
fn new(s: String) -> Self {
IntoChars { s: s, offset: 0 }
}
}
impl Iterator for IntoChars {
type Item = char;
fn next(&mut self) -> Option<Self::Item> {
let remaining = &self.s[self.offset..];
match remaining.chars().next() {
Some(c) => {
self.offset += c.len_utf8();
Some(c)
}
None => None,
}
}
}
use std::io::BufRead;
fn main() {
let stdin = std::io::stdin();
let lines = stdin.lock().lines();
let chars = lines.flat_map(|x| IntoChars::new(x.unwrap()));
for c in chars {
println!("{}", c);
}
}
See also:
How can I store a Chars iterator in the same struct as the String it is iterating on?
Is there an owned version of String::chars?

Return lazy iterator that depends on data allocated within the function

I am new to Rust and reading The Rust Programming Language, and in the Error Handling section there is a "case study" describing a program to read data from a CSV file using the csv and rustc-serialize libraries (using getopts for argument parsing).
The author writes a function search that steps through the rows of the CSV file using a csv::Reader object, collects those entries whose 'city' field matches a specified value into a vector, and returns it. I've taken a slightly different approach than the author, but this should not affect my question. My (working) function looks like this:
extern crate csv;
extern crate rustc_serialize;
use std::path::Path;
use std::fs::File;
fn search<P>(data_path: P, city: &str) -> Vec<DataRow>
where P: AsRef<Path>
{
let file = File::open(data_path).expect("Opening file failed!");
let mut reader = csv::Reader::from_reader(file).has_headers(true);
reader.decode()
.map(|row| row.expect("Failed decoding row"))
.filter(|row: &DataRow| row.city == city)
.collect()
}
where the DataRow type is just a record,
#[derive(Debug, RustcDecodable)]
struct DataRow {
country: String,
city: String,
accent_city: String,
region: String,
population: Option<u64>,
latitude: Option<f64>,
longitude: Option<f64>
}
Now, the author poses, as the dreaded "exercise to the reader", the problem of modifying this function to return an iterator instead of a vector (eliminating the call to collect). My question is: How can this be done at all, and what are the most concise and idiomatic ways of doing it?
A simple attempt that I think gets the type signature right is:
fn search_iter<'a,P>(data_path: P, city: &'a str)
-> Box<Iterator<Item=DataRow> + 'a>
where P: AsRef<Path>
{
let file = File::open(data_path).expect("Opening file failed!");
let mut reader = csv::Reader::from_reader(file).has_headers(true);
Box::new(reader.decode()
.map(|row| row.expect("Failed decoding row"))
.filter(|row: &DataRow| row.city == city))
}
I return a trait object of type Box<Iterator<Item=DataRow> + 'a> so as not to have to expose the internal Filter type, and where the lifetime 'a is introduced just to avoid having to make a local clone of city. But this fails to compile because reader does not live long enough; it's allocated on the stack and so is deallocated when the function returns.
I guess this means that reader has to be allocated on the heap (i.e. boxed) from the beginning, or somehow moved off the stack before the function ends. If I were returning a closure, this is exactly the problem that would be solved by making it a move closure. But I don't know how to do something similar when I'm not returning a function. I've tried defining a custom iterator type containing the needed data, but I couldn't get it to work, and it kept getting uglier and more contrived (don't make too much of this code, I'm only including it to show the general direction of my attempts):
fn search_iter<'a,P>(data_path: P, city: &'a str)
-> Box<Iterator<Item=DataRow> + 'a>
where P: AsRef<Path>
{
struct ResultIter<'a> {
reader: csv::Reader<File>,
wrapped_iterator: Option<Box<Iterator<Item=DataRow> + 'a>>
}
impl<'a> Iterator for ResultIter<'a> {
type Item = DataRow;
fn next(&mut self) -> Option<DataRow>
{ self.wrapped_iterator.unwrap().next() }
}
let file = File::open(data_path).expect("Opening file failed!");
// Incrementally initialise
let mut result_iter = ResultIter {
reader: csv::Reader::from_reader(file).has_headers(true),
wrapped_iterator: None // Uninitialised
};
result_iter.wrapped_iterator =
Some(Box::new(result_iter.reader
.decode()
.map(|row| row.expect("Failed decoding row"))
.filter(|&row: &DataRow| row.city == city)));
Box::new(result_iter)
}
This question seems to concern the same problem, but the author of the answer solves it by making the concerned data static, which I don't think is an alternative for this question.
I am using Rust 1.10.0, the current stable version from the Arch Linux package rust.
CSV 1.0
As I alluded to in the answer for older versions of the crate, the best way of solving this is for the CSV crate to have an owning iterator, which it now does: DeserializeRecordsIntoIter
use csv::ReaderBuilder; // 1.1.1
use serde::Deserialize; // 1.0.104
use std::{fs::File, path::Path};
#[derive(Debug, Deserialize)]
struct DataRow {
country: String,
city: String,
accent_city: String,
region: String,
population: Option<u64>,
latitude: Option<f64>,
longitude: Option<f64>,
}
fn search_iter(data_path: impl AsRef<Path>, city: &str) -> impl Iterator<Item = DataRow> + '_ {
let file = File::open(data_path).expect("Opening file failed");
ReaderBuilder::new()
.has_headers(true)
.from_reader(file)
.into_deserialize::<DataRow>()
.map(|row| row.expect("Failed decoding row"))
.filter(move |row| row.city == city)
}
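A minimal caller might look like this (the file path and city are placeholders):
fn main() {
    for row in search_iter("data/worldcitiespop.csv", "Dublin") {
        println!("{:?}", row);
    }
}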
Before version 1.0
The most direct path to converting the original function would be to simply wrap the iterator. However, doing so will lead to problems because you cannot return an object that refers to itself, and the result of decode refers to the Reader. Even if you could surmount that, you cannot have an iterator return references to itself.
One solution is to simply re-create the DecodedRecords iterator for each call to your new iterator:
fn search_iter<'a, P>(data_path: P, city: &'a str) -> MyIter<'a>
where
P: AsRef<Path>,
{
let file = File::open(data_path).expect("Opening file failed!");
MyIter {
reader: csv::Reader::from_reader(file).has_headers(true),
city: city,
}
}
struct MyIter<'a> {
reader: csv::Reader<File>,
city: &'a str,
}
impl<'a> Iterator for MyIter<'a> {
type Item = DataRow;
fn next(&mut self) -> Option<Self::Item> {
let city = self.city;
self.reader
.decode()
.map(|row| row.expect("Failed decoding row"))
.filter(|row: &DataRow| row.city == city)
.next()
}
}
This could have overhead associated with it, depending on the implementation of decode. Additionally, this might "rewind" back to the beginning of the input — if you substituted a Vec instead of a csv::Reader, you would see this. However, it happens to work in this case.
Beyond that, I'd normally open the file and create the csv::Reader outside of the function and pass in the DecodedRecords iterator and transform it, returning a newtype / box / type alias around the underlying iterator. I prefer this because the structure of your code mirrors the lifetimes of the objects.
I'm a little surprised that there isn't an implementation of IntoIterator for csv::Reader, which would also solve the problem because there would not be any references.
See also:
How can I store a Chars iterator in the same struct as the String it is iterating on?
Is there an owned version of String::chars?
What is the correct way to return an Iterator (or any other trait)?

Reading an integer from input and assigning it to a variable

I've been trying to find an easy way to read variables in Rust, but haven't had any luck so far. All the examples in the Rust Book deal with strings AFAIK, I couldn't find anything concerning integers or floats that would work.
I don't have a Rust compiler on this machine, but based in part on this answer that comes close, you want something like...
let user_val = match input_string.parse::<i32>() {
Ok(x) => x,
Err(_) => -1,
};
Or, as pointed out in the comments,
let user_val = input_string.parse::<i32>().unwrap_or(-1);
...though your choice in integer size and default value might obviously be different, and you don't always need that type qualifier (::<i32>) for parse() where the type can be inferred from the assignment.
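For example, annotating the binding is enough for parse to infer the target type (using the same hypothetical input_string as above):
let user_val: i32 = input_string.parse().unwrap_or(-1);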
To read user input, you always read a set of bytes. Sometimes, you can interpret those bytes as a UTF-8 string. You can then further interpret the string as an integral or floating point number (or lots of other things, like an IP address).
Here's a complete example of reading a single line of input and parsing it as a 32-bit signed integer:
use std::io;
fn main() {
let mut input = String::new();
io::stdin().read_line(&mut input).expect("Not a valid string");
let input_num: i32 = input.trim().parse().expect("Not a valid number");
println!("Your number plus one is {}", input_num + 1);
}
Note that no user-friendly error handling is taking place. The program simply panics if reading input or parsing fails. Running the program produces:
$ ./input
41
Your number plus one is 42
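The question also asks about floats; the same pattern works, with only the annotated type changing (a sketch):
use std::io;
fn main() {
    let mut input = String::new();
    io::stdin().read_line(&mut input).expect("Not a valid string");
    let input_num: f64 = input.trim().parse().expect("Not a valid number");
    println!("Your number doubled is {}", input_num * 2.0);
}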
User input arrives as a set of bytes. In Rust, you accept the input as a UTF-8 String and then parse that string into an integer or floating point number. The simplest approach is to accept the string, parse it, and attach an expect call to both steps so that the user sees a message explaining what went wrong if the program panics at runtime.
fn main() {
let mut x = String::new();
std::io::stdin().read_line(&mut x)
.expect("Failed to read input.");
let x: u32 = x.trim().parse()
.expect("Enter a number not a string.");
println!("{:?}", x);
}
If the program fails to read or parse the input, it panics and displays an error message; note that it still panics, so we are not handling the errors gracefully yet. Also note that we can reuse the variable name x rather than inventing something like x_int, thanks to variable shadowing. To handle the errors better, we can use the match construct:
fn main() {
let mut x = String::new();
match std::io::stdin().read_line(&mut x) {
Ok(_) => println!("String has been taken in."),
Err(_) => {
println!("Failed to read input.");
return;
},
};
let x: u32 = match x.trim().parse() {
Ok(n) => {
println!("Converted string to int.");
n
},
Err(_) => {
println!("Failed to parse.");
return;
},
};
println!("{:?}", x);
}
This is a longer but nicer way to handle errors while reading input and parsing a number.
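A middle ground, not shown above, is to propagate both errors with the ? operator from a main that returns a Result:
use std::error::Error;
fn main() -> Result<(), Box<dyn Error>> {
    let mut x = String::new();
    std::io::stdin().read_line(&mut x)?;
    let x: u32 = x.trim().parse()?;
    println!("{:?}", x);
    Ok(())
}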

Unable to read file contents to string - Result does not implement any method in scope named `read_to_string`

I followed the code to open a file from Rust by Example:
use std::{env, fs::File, path::Path};
fn main() {
let args: Vec<_> = env::args().collect();
let pattern = &args[1];
if let Some(a) = env::args().nth(2) {
let path = Path::new(&a);
let mut file = File::open(&path);
let mut s = String::new();
file.read_to_string(&mut s);
println!("{:?}", s);
} else {
//do something
}
}
However, I got a message like this:
error[E0599]: no method named `read_to_string` found for type `std::result::Result<std::fs::File, std::io::Error>` in the current scope
--> src/main.rs:11:14
|
11 | file.read_to_string(&mut s);
| ^^^^^^^^^^^^^^ method not found in `std::result::Result<std::fs::File, std::io::Error>`
What am I doing wrong?
Let's look at your error message:
error[E0599]: no method named `read_to_string` found for type `std::result::Result<std::fs::File, std::io::Error>` in the current scope
--> src/main.rs:11:14
|
11 | file.read_to_string(&mut s);
| ^^^^^^^^^^^^^^ method not found in `std::result::Result<std::fs::File, std::io::Error>`
The error message is pretty much what it says on the tin - the type Result does not have the method read_to_string. That's actually a method on the trait Read.
You have a Result because File::open(&path) can fail. Failure is represented with the Result type. A Result may be either an Ok, which is the success case, or an Err, the failure case.
You need to handle the failure case somehow. The easiest is to just die on failure, using expect:
let mut file = File::open(&path).expect("Unable to open");
You'll also need to bring Read into scope to have access to read_to_string:
use std::io::Read;
I'd highly recommend reading through The Rust Programming Language and working the examples. The chapter Recoverable Errors with Result will be highly relevant. I think these docs are top-notch!
If your method returns Result<String, io::Error>, you can use ? on the functions that return a Result:
use std::fs::File;
use std::io::{self, Read};
fn read_username_from_file() -> Result<String, io::Error> {
    let mut f = File::open("hello.txt")?;
    let mut s = String::new();
    f.read_to_string(&mut s)?;
    Ok(s)
}
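Since Rust 1.26 the same function collapses into a single call to the one-liner shown in the first answer (assuming the same file name):
use std::{fs, io};
fn read_username_from_file() -> Result<String, io::Error> {
    fs::read_to_string("hello.txt")
}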
If you cannot return a Result<String, io::Error>, then you have to handle the error case, either using expect as mentioned in the accepted answer or by matching on the Result and panicking:
let file = File::open(&opt_raw.config);
let file = match file {
Ok(file) => file,
Err(error) => {
panic!("Problem opening the file: {:?}", error)
}
};
For more information, please refer to Recoverable Errors with Result.