How to extract values from &mut iterator?

I am trying to make an iterator that maps a string to an integer:
fn main() {
use std::collections::HashMap;
let mut word_map = HashMap::new();
word_map.insert("world!", 0u32);
let sentence: Vec<&str> = vec!["Hello", "world!"];
let int_sentence: Vec<u32> = sentence.into_iter()
.map(|x| word_map.entry(x).or_insert(word_map.len() as u32))
.collect();
}
This fails with
the trait core::iter::FromIterator<&mut u32> is not implemented for the type collections::vec::Vec<u32>
Adding a dereference operator around the word_map.entry().or_insert() expression does not work either: the compiler complains about borrowing, which surprises me, since I'm only trying to copy the value.

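The closure takes two conflicting borrows of word_map in a single expression: word_map.entry(x) borrows the map mutably, and the Entry it returns still holds that borrow when the argument word_map.len() is evaluated. The solution is to read the length in a separate let statement before calling entry, then dereference the &mut u32 returned by or_insert to copy the value out: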
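let int_sentence: Vec<u32> = sentence.into_iter()
    .map(|x| {
        // The shared borrow for len() ends here, before entry() borrows mutably.
        let len = word_map.len() as u32;
        // or_insert returns &mut u32; * copies the u32 out.
        *word_map.entry(x).or_insert(len)
    })
    .collect();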
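Non-lexical lifetimes, now part of stable Rust, removed many similar errors, but not this one: the mutable borrow held by the Entry genuinely overlaps the call to word_map.len(), so the length still has to be read first.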
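A self-contained version of the same fix, wrapped in a hypothetical helper to_ids so it can be checked in isolation (the helper name and the repeated "Hello" are illustrative, not from the question). Words already in the map keep their id; new words get the current map length as their id.

```rust
use std::collections::HashMap;

/// Maps each word to an integer id, assigning unseen words the next free id.
fn to_ids<'a>(sentence: &[&'a str], word_map: &mut HashMap<&'a str, u32>) -> Vec<u32> {
    sentence
        .iter()
        .map(|&word| {
            // Read the length first: this shared borrow ends before
            // `entry` takes its mutable borrow of `word_map`.
            let next_id = word_map.len() as u32;
            // `or_insert` returns `&mut u32`; `*` copies the `u32` out.
            *word_map.entry(word).or_insert(next_id)
        })
        .collect()
}

fn main() {
    let mut word_map = HashMap::new();
    word_map.insert("world!", 0u32);
    let ids = to_ids(&["Hello", "world!", "Hello"], &mut word_map);
    println!("{:?}", ids); // [1, 0, 1]
}
```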

Related

Why does indexing a HashMap not return a reference?

I am writing the following test code.
fn test() {
let mut m = HashMap::new();
m.insert("aaa".to_string(), "bbb".to_string());
let a = m["aaa"]; // error [E0507] cannot move out of index of `HashMap<String, String>`
let a = m.index("aaa"); // ok, the type of a is &String. I think the compiler adds & to m;
let a :&String = (&m).index("aaa"); // ok, the type of a is &String.
println!("{:?}", m["aaa"]); // ok
}
I don't understand why the type of m["aaa"] is String rather than &String. Since Index::index(&self, key: &Q) -> &V takes a &self parameter, I expected the compiler to add a & to m, so that m["aaa"] would have type &String and the String "bbb" would not be moved out of m.
If the compiler did not add & to m, it would not find the index() method at all, and the error would be something like: m cannot be indexed by "aaa".
From the docs for Index:
container[index] is actually syntactic sugar for *container.index(index)
So when you write m["aaa"], the compiler adds a * that dereferences the &String returned by Index::index, which yields a place of type String that cannot be moved out of the map. When you call m.index("aaa") instead, you get the &String reference directly.
As pointed out by @user4815162342, programmers are supposed to make their intent explicit by writing either &m["aaa"] or m["aaa"].clone().
Moreover, println!("{:?}", m["aaa"]); works because the println! macro takes a reference to every value it formats¹, to prevent accidental moves, and this cancels out the * added by the compiler.
(1) This is indirectly documented in the docs for the format_args! macro.
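A small sketch of the non-moving forms discussed above, gathered in a hypothetical helper lookups. It assumes the key "aaa" is present, since indexing a HashMap panics on a missing key.

```rust
use std::collections::HashMap;
use std::ops::Index;

/// Hypothetical helper: three ways to read a value without moving it out of the map.
fn lookups(m: &HashMap<String, String>) -> (&String, String, &String) {
    // `m["aaa"]` is sugar for `*m.index("aaa")`, a place of type `String`;
    // borrowing that place gives a `&String` again.
    let borrowed: &String = &m["aaa"];
    // Cloning makes an owned copy and leaves the map untouched.
    let owned: String = m["aaa"].clone();
    // Calling the trait method directly returns the reference itself.
    let via_index: &String = m.index("aaa");
    (borrowed, owned, via_index)
}

fn main() {
    let mut m = HashMap::new();
    m.insert("aaa".to_string(), "bbb".to_string());
    let (borrowed, owned, via_index) = lookups(&m);
    println!("{} {} {}", borrowed, owned, via_index); // bbb bbb bbb
    // println! formats its arguments by reference, so this does not move either.
    println!("{:?}", m["aaa"]);
}
```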

Why does .flat_map() with .chars() not work with std::io::Lines, but does with a vector of Strings?

I am trying to iterate over the characters in stdin. The Read::chars() method achieves this goal, but is unstable. The obvious alternative is to use BufRead::lines() with flat_map to convert it to a character iterator.
This seems like it should work, but it doesn't, resulting in "borrowed value does not live long enough" errors.
use std::io::BufRead;
fn main() {
let stdin = std::io::stdin();
let mut lines = stdin.lock().lines();
let mut chars = lines.flat_map(|x| x.unwrap().chars());
}
This is mentioned in Read file character-by-character in Rust, but it doesn't really explain why.
What I am particularly confused about is how this differs from the example in the documentation for flat_map, which uses flat_map to apply .chars() to a vector of strings. I don't really see how that should be any different. The main difference I see is that my code needs to call unwrap() as well, but changing the last line to the following does not work either:
let mut chars = lines.map(|x| x.unwrap());
let mut chars = chars.flat_map(|x| x.chars());
It fails on the second line, so the issue doesn't appear to be the unwrap.
Why does this last line not work, when the very similar line in the documentation does? Is there any way to get this to work?
Start by figuring out what the type of the closure's variable is:
let mut chars = lines.flat_map(|x| {
let () = x;
x.unwrap().chars()
});
This shows it's a Result<String, io::Error>. After unwrapping it, it will be a String.
Next, look at str::chars:
fn chars(&self) -> Chars<'_>
And the definition of Chars:
pub struct Chars<'a> {
// some fields omitted
}
From that, we can tell that calling chars on a string returns an iterator that has a reference to the string.
Whenever we have a reference, we know that the reference cannot outlive the thing it is borrowed from. Here, the owner is the String returned by x.unwrap(). That String is a local value of the closure, so it is dropped at the end of the closure and any references to it are invalidated.
Except the code tried to return a Chars that still referred to the string. Oops. Thanks to Rust, the code didn't segfault!
The difference with the example that works is all in the ownership. There, the strings are owned by a vector that lives outside the closure, and they are not dropped before the iterator is consumed, so there are no lifetime issues.
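To make the ownership difference concrete, here is a sketch mirroring the documentation's case (the function name chars_of_all is hypothetical). Iterating by reference lets each Chars borrow from an element the vector still owns; the commented-out by-value version is rejected for the same reason as the stdin code.

```rust
/// Mirrors the working example from the `flat_map` docs: the vector owns the
/// Strings and outlives the iterator, so each `Chars` may borrow an element.
fn chars_of_all(words: &[String]) -> Vec<char> {
    words.iter().flat_map(|s| s.chars()).collect()
}

// By contrast, this version is rejected: `into_iter` hands each String to the
// closure by value, the String is dropped when the closure returns, and the
// `Chars` borrowing it cannot escape.
//
// fn chars_of_owned(words: Vec<String>) -> Vec<char> {
//     words.into_iter().flat_map(|s| s.chars()).collect()
// }

fn main() {
    let words = vec!["ab".to_string(), "c".to_string()];
    println!("{:?}", chars_of_all(&words)); // ['a', 'b', 'c']
}
```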
What this code really wants is an into_chars method on String. That iterator could take ownership of the value and return characters.
Not the maximum efficiency, but a good start:
struct IntoChars {
    s: String,
    // Byte offset of the next character to return.
    offset: usize,
}

impl IntoChars {
    fn new(s: String) -> Self {
        IntoChars { s, offset: 0 }
    }
}

impl Iterator for IntoChars {
    type Item = char;

    fn next(&mut self) -> Option<Self::Item> {
        // Decode the next char from the remaining slice; None at the end.
        let c = self.s[self.offset..].chars().next()?;
        // Advance by the char's UTF-8 width so offset stays on a char boundary.
        self.offset += c.len_utf8();
        Some(c)
    }
}

use std::io::BufRead;

fn main() {
    let stdin = std::io::stdin();
    let lines = stdin.lock().lines();
    // Each IntoChars owns its String, so nothing borrows a dropped value.
    let chars = lines.flat_map(|x| IntoChars::new(x.unwrap()));
    for c in chars {
        println!("{}", c);
    }
}
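A simpler alternative, offered as a swapped-in technique rather than the approach above: copy each line's characters into an owned Vec<char>, which flat_map can keep iterating after the line's String is dropped. It costs one extra allocation per line. The helper name chars_of is hypothetical, and it accepts any BufRead so it can be exercised without stdin.

```rust
use std::io::BufRead;

/// Alternative to IntoChars: collect each line's chars into an owned Vec<char>.
/// The Vec owns its data, so no reference to the dropped String survives.
fn chars_of<R: BufRead>(reader: R) -> impl Iterator<Item = char> {
    reader
        .lines()
        .flat_map(|line| line.unwrap().chars().collect::<Vec<char>>())
}

fn main() {
    let stdin = std::io::stdin();
    for c in chars_of(stdin.lock()) {
        println!("{}", c);
    }
}
```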
See also:
How can I store a Chars iterator in the same struct as the String it is iterating on?
Is there an owned version of String::chars?

Return lazy iterator that depends on data allocated within the function

I am new to Rust and reading The Rust Programming Language, and in the Error Handling section there is a "case study" describing a program to read data from a CSV file using the csv and rustc-serialize libraries (using getopts for argument parsing).
The author writes a function search that steps through the rows of the CSV file using a csv::Reader object, collects those entries whose 'city' field matches a specified value into a vector, and returns it. I've taken a slightly different approach than the author, but this should not affect my question. My (working) function looks like this:
extern crate csv;
extern crate rustc_serialize;
use std::path::Path;
use std::fs::File;
fn search<P>(data_path: P, city: &str) -> Vec<DataRow>
where P: AsRef<Path>
{
let file = File::open(data_path).expect("Opening file failed!");
let mut reader = csv::Reader::from_reader(file).has_headers(true);
reader.decode()
.map(|row| row.expect("Failed decoding row"))
.filter(|row: &DataRow| row.city == city)
.collect()
}
where the DataRow type is just a record,
#[derive(Debug, RustcDecodable)]
struct DataRow {
country: String,
city: String,
accent_city: String,
region: String,
population: Option<u64>,
latitude: Option<f64>,
longitude: Option<f64>
}
Now, the author poses, as the dreaded "exercise to the reader", the problem of modifying this function to return an iterator instead of a vector (eliminating the call to collect). My question is: How can this be done at all, and what are the most concise and idiomatic ways of doing it?
A simple attempt that I think gets the type signature right is
fn search_iter<'a,P>(data_path: P, city: &'a str)
-> Box<Iterator<Item=DataRow> + 'a>
where P: AsRef<Path>
{
let file = File::open(data_path).expect("Opening file failed!");
let mut reader = csv::Reader::from_reader(file).has_headers(true);
Box::new(reader.decode()
.map(|row| row.expect("Failed decoding row"))
.filter(|row: &DataRow| row.city == city))
}
I return a trait object of type Box<Iterator<Item=DataRow> + 'a> so as not to have to expose the internal Filter type, and where the lifetime 'a is introduced just to avoid having to make a local clone of city. But this fails to compile because reader does not live long enough; it's allocated on the stack and so is deallocated when the function returns.
I guess this means that reader has to be allocated on the heap (i.e. boxed) from the beginning, or somehow moved off the stack before the function ends. If I were returning a closure, this is exactly the problem that would be solved by making it a move closure. But I don't know how to do something similar when I'm not returning a function. I've tried defining a custom iterator type containing the needed data, but I couldn't get it to work, and it kept getting uglier and more contrived (don't make too much of this code, I'm only including it to show the general direction of my attempts):
fn search_iter<'a,P>(data_path: P, city: &'a str)
-> Box<Iterator<Item=DataRow> + 'a>
where P: AsRef<Path>
{
struct ResultIter<'a> {
reader: csv::Reader<File>,
wrapped_iterator: Option<Box<Iterator<Item=DataRow> + 'a>>
}
impl<'a> Iterator for ResultIter<'a> {
type Item = DataRow;
fn next(&mut self) -> Option<DataRow>
{ self.wrapped_iterator.unwrap().next() }
}
let file = File::open(data_path).expect("Opening file failed!");
// Incrementally initialise
let mut result_iter = ResultIter {
reader: csv::Reader::from_reader(file).has_headers(true),
wrapped_iterator: None // Uninitialised
};
result_iter.wrapped_iterator =
Some(Box::new(result_iter.reader
.decode()
.map(|row| row.expect("Failed decoding row"))
.filter(|&row: &DataRow| row.city == city)));
Box::new(result_iter)
}
This question seems to concern the same problem, but the author of the answer solves it by making the concerned data static, which I don't think is an alternative for this question.
I am using Rust 1.10.0, the current stable version from the Arch Linux package rust.
CSV 1.0
As I alluded to in the answer for older versions of the crate, the best way of solving this is for the CSV crate to have an owning iterator, which it now does: DeserializeRecordsIntoIter
use csv::ReaderBuilder; // 1.1.1
use serde::Deserialize; // 1.0.104
use std::{fs::File, path::Path};
#[derive(Debug, Deserialize)]
struct DataRow {
country: String,
city: String,
accent_city: String,
region: String,
population: Option<u64>,
latitude: Option<f64>,
longitude: Option<f64>,
}
fn search_iter(data_path: impl AsRef<Path>, city: &str) -> impl Iterator<Item = DataRow> + '_ {
let file = File::open(data_path).expect("Opening file failed");
ReaderBuilder::new()
.has_headers(true)
.from_reader(file)
.into_deserialize::<DataRow>()
.map(|row| row.expect("Failed decoding row"))
.filter(move |row| row.city == city)
}
Before version 1.0
The straightest path to convert the original function would be to simply wrap the iterator. However, doing so directly leads to problems: you cannot return an object that refers to itself, and the result of decode refers to the Reader. Even if you could surmount that, you cannot have an iterator return references to itself.
One solution is to simply re-create the DecodedRecords iterator for each call to your new iterator:
fn search_iter<'a, P>(data_path: P, city: &'a str) -> MyIter<'a>
where
P: AsRef<Path>,
{
let file = File::open(data_path).expect("Opening file failed!");
MyIter {
reader: csv::Reader::from_reader(file).has_headers(true),
city: city,
}
}
struct MyIter<'a> {
reader: csv::Reader<File>,
city: &'a str,
}
impl<'a> Iterator for MyIter<'a> {
type Item = DataRow;
fn next(&mut self) -> Option<Self::Item> {
let city = self.city;
self.reader
.decode()
.map(|row| row.expect("Failed decoding row"))
.filter(|row: &DataRow| row.city == city)
.next()
}
}
This could have overhead, depending on the implementation of decode. Additionally, this might "rewind" back to the beginning of the input; if you substituted a Vec for the csv::Reader, you would see this. However, it happens to work in this case.
Beyond that, I'd normally open the file and create the csv::Reader outside of the function and pass in the DecodedRecords iterator and transform it, returning a newtype / box / type alias around the underlying iterator. I prefer this because the structure of your code mirrors the lifetimes of the objects.
I'm a little surprised that there isn't an implementation of IntoIterator for csv::Reader, which would also solve the problem because there would not be any references.
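The same principle can be sketched with only the standard library: BufRead::lines takes the reader by value, so the iterator it returns owns the reader, much like the owning into_deserialize iterator in the CSV 1.0 answer. The names matching_lines and this std-only search_iter are hypothetical, and the naive comma splitting (second field treated as the city, no quoting support) is an assumption standing in for real CSV parsing.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Hypothetical std-only analogue: yields lines whose second comma-separated
/// field equals `city` (naive splitting, no CSV quoting).
/// `lines` takes the reader by value, so the returned iterator owns it.
fn matching_lines<'a, R>(reader: R, city: &'a str) -> impl Iterator<Item = String> + 'a
where
    R: BufRead + 'a,
{
    reader
        .lines()
        .map(|line| line.expect("Failed reading line"))
        .filter(move |line| line.split(',').nth(1) == Some(city))
}

/// Opening the file inside the function is fine: the BufReader<File> is moved
/// into the returned iterator instead of being dropped when the function returns.
fn search_iter<'a>(data_path: impl AsRef<Path>, city: &'a str) -> impl Iterator<Item = String> + 'a {
    let file = File::open(data_path).expect("Opening file failed");
    matching_lines(BufReader::new(file), city)
}

fn main() {
    // Pass a file path to search it, or pipe the data in on stdin.
    if let Some(path) = std::env::args().nth(1) {
        for line in search_iter(path, "Berlin") {
            println!("{}", line);
        }
    } else {
        for line in matching_lines(io::stdin().lock(), "Berlin") {
            println!("{}", line);
        }
    }
}
```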
See also:
How can I store a Chars iterator in the same struct as the String it is iterating on?
Is there an owned version of String::chars?
What is the correct way to return an Iterator (or any other trait)?

Chaining iterators of different types

I get type errors when chaining different types of Iterator.
let s = Some(10);
let v = (1..5).chain(s.iter())
.collect::<Vec<_>>();
Output:
<anon>:23:20: 23:35 error: type mismatch resolving `<core::option::Iter<'_, _> as core::iter::IntoIterator>::Item == _`:
expected &-ptr,
found integral variable [E0271]
<anon>:23 let v = (1..5).chain(s.iter())
^~~~~~~~~~~~~~~
<anon>:23:20: 23:35 help: see the detailed explanation for E0271
<anon>:24:14: 24:33 error: no method named `collect` found for type `core::iter::Chain<core::ops::Range<_>, core::option::Iter<'_, _>>` in the current scope
<anon>:24 .collect::<Vec<_>>();
^~~~~~~~~~~~~~~~~~~
<anon>:24:14: 24:33 note: the method `collect` exists but the following trait bounds were not satisfied: `core::iter::Chain<core::ops::Range<_>, core::option::Iter<'_, _>> : core::iter::Iterator`
error: aborting due to 2 previous errors
But it works fine when zipping:
let s = Some(10);
let v = (1..5).zip(s.iter())
.collect::<Vec<_>>();
Output:
[(1, 10)]
Why is Rust able to infer the correct types for zip but not for chain and how can I fix it? n.b. I want to be able to do this for any iterator, so I don't want a solution that just works for Range and Option.
First, note that the iterators yield different types. I've added an explicit u8 to the numbers to make the types more obvious:
fn main() {
    let s = Some(10u8);
    let mut r = 1..5u8;
    // `let () = ...` deliberately fails to compile; the error reports the actual type.
    let () = s.iter().next(); // Option<&u8>
    let () = r.next(); // Option<u8>
}
When you chain two iterators, both iterators must yield the same type. This makes sense as the iterator cannot "switch" what type it outputs when it gets to the end of one and begins on the second:
fn chain<U>(self, other: U) -> Chain<Self, U::IntoIter>
where U: IntoIterator<Item=Self::Item>
// ^~~~~~~~~~~~~~~ This means the types must match
So why does zip work? Because it doesn't have that restriction:
fn zip<U>(self, other: U) -> Zip<Self, U::IntoIter>
where U: IntoIterator
// ^~~~ Nothing here!
This is because zip yields a tuple containing one value from each iterator, a new type distinct from either source iterator's item type. For all zip cares, one iterator could yield integers and the other your own custom type.
Why is Rust able to infer the correct types for zip but not for chain
There is no type inference happening here; that's a different thing. This is just plain-old type mismatching.
and how can I fix it?
In this case, the iterator you are chaining yields references to an integer, which is a Clone type, so you can use cloned() to make a new iterator that clones each value; then both iterators have the same item type:
fn main() {
let s = Some(10);
let v: Vec<_> = (1..5).chain(s.iter().cloned()).collect();
}
If you are done with the option, you can also use a consuming iterator with into_iter:
fn main() {
let s = Some(10);
let v: Vec<_> = (1..5).chain(s.into_iter()).collect();
}
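Since the question asks for something that works with any iterator, here is a sketch of a generic helper (chain_cloned is a hypothetical name) that chains an iterator of owned values with an iterator of references to the same type, applying the cloned() fix from above. Spelling out the return type as Chain<_, Cloned<_>> keeps it lazy without needing impl Trait.

```rust
use std::iter::{Chain, Cloned};

/// Hypothetical helper: chains an iterator of owned `T` with an iterator of
/// `&T`, cloning the borrowed items so both halves yield the same type.
fn chain_cloned<'a, T, A, B>(owned: A, borrowed: B) -> Chain<A::IntoIter, Cloned<B::IntoIter>>
where
    T: Clone + 'a,
    A: IntoIterator<Item = T>,
    B: IntoIterator<Item = &'a T>,
{
    owned.into_iter().chain(borrowed.into_iter().cloned())
}

fn main() {
    let s = Some(10);
    let v: Vec<i32> = chain_cloned(1..5, s.iter()).collect();
    println!("{:?}", v); // [1, 2, 3, 4, 10]
}
```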

Reading an integer from input and assigning it to a variable

I've been trying to find an easy way to read variables in Rust, but haven't had any luck so far. All the examples in the Rust Book deal with strings AFAIK; I couldn't find anything concerning integers or floats that would work.
I don't have a Rust compiler on this machine, but based in part on this answer that comes close, you want something like...
let user_val = match input_string.parse::<i32>() {
Ok(x) => x,
Err(_) => -1,
};
Or, as pointed out in the comments,
let user_val = input_string.parse::<i32>().unwrap_or(-1);
...though your choice of integer size and default value may obviously differ, and you don't always need the type qualifier (::<i32>) on parse() when the type can be inferred from the assignment.
To read user input, you always read a set of bytes. Sometimes, you can interpret those bytes as a UTF-8 string. You can then further interpret the string as an integral or floating point number (or lots of other things, like an IP address).
Here's a complete example of reading a single line of input and parsing it as a 32-bit signed integer:
use std::io;
fn main() {
let mut input = String::new();
io::stdin().read_line(&mut input).expect("Not a valid string");
let input_num: i32 = input.trim().parse().expect("Not a valid number");
println!("Your number plus one is {}", input_num + 1);
}
Note that no user-friendly error handling is taking place. The program simply panics if reading input or parsing fails. Running the program produces:
$ ./input
41
Your number plus one is 42
Input is a sequence of bytes. In Rust, you read it into a UTF-8 String and then parse that string into an integer or floating-point number. The simplest approach is to read the string, parse it, and call expect on both results, so that if the program panics at runtime the user sees a message explaining what went wrong.
fn main() {
let mut x = String::new();
std::io::stdin().read_line(&mut x)
.expect("Failed to read input.");
let x: u32 = x.trim().parse()
.expect("Enter a number not a string.");
println!("{:?}", x);
}
If the program fails to parse the input string, it panics and displays the error message, so errors are reported but not really handled. Notice also that we can reuse the name x instead of inventing something like x_int, thanks to variable shadowing. To handle errors more gracefully, we can use the match construct.
fn main() {
let mut x = String::new();
match std::io::stdin().read_line(&mut x) {
Ok(_) => println!("String has been taken in."),
Err(_) => {
println!("Failed to read input.");
return;
},
};
let x: u32 = match x.trim().parse() {
Ok(n) => {
println!("Converted string to int.");
n
},
Err(_) => {
println!("Failed to parse.");
return;
},
};
println!("{:?}", x);
}
This is longer, but it handles both read and parse errors gracefully instead of panicking.
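Neither answer shows floats or recovering from bad input. As a sketch combining the ideas above (the helper read_value is hypothetical, not from the answers), parsing into any FromStr type covers integers and floats alike, and looping re-prompts on invalid input instead of returning. Taking a generic BufRead rather than stdin directly is a design choice that makes it easy to feed test input.

```rust
use std::io::{self, BufRead, Write};
use std::str::FromStr;

/// Hypothetical helper: reads lines from `input` until one parses as `T`
/// (any FromStr type, e.g. i32 or f64), re-prompting on bad input.
/// Returns None at end of input or on a read error.
fn read_value<T: FromStr, R: BufRead>(input: &mut R, prompt: &str) -> Option<T> {
    let mut line = String::new();
    loop {
        print!("{}", prompt);
        // print! does not flush, so flush to show the prompt before reading.
        let _ = io::stdout().flush();
        line.clear();
        // read_line returns Ok(0) at end of input.
        if input.read_line(&mut line).ok()? == 0 {
            return None;
        }
        match line.trim().parse() {
            Ok(value) => return Some(value),
            Err(_) => println!("'{}' is not a valid number, try again.", line.trim()),
        }
    }
}

fn main() {
    let stdin = io::stdin();
    let mut input = stdin.lock();
    let n: Option<i32> = read_value(&mut input, "Enter an integer: ");
    let x: Option<f64> = read_value(&mut input, "Enter a number: ");
    println!("{:?} {:?}", n, x);
}
```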