Why does the Fuse iterator adapter not work as expected?

I'm experimenting with the Fuse iterator adapter and am getting unexpected results:
fn main() {
    let mut i1 = (1..3).scan(1, |_, x| {
        if x < 2 { None } else { Some(x) }
    });
    println!("{:?}", i1.next());
    println!("{:?}", i1.next());
    println!("{:?}", i1.next());
    println!("");
    let mut i2 = (1..3).scan(1, |_, x| {
        if x < 2 { None } else { Some(x) }
    }).fuse();
    println!("{:?}", i2.next());
    println!("{:?}", i2.next()); // This should print None
    println!("{:?}", i2.next());
    println!("");
}
Which prints:
None
Some(2)
None
None
Some(2)
None
Iterator i1 is returning what I expect. It returns None, then Some(2), then None. i2 is the same iterator adapted with fuse(). Fuse should make it return None after the first None, and since the first value it returns is None that should be the only value it returns. However, it behaves the same as i1. What am I doing wrong?

TL;DR Summary: This was a bug and is fixed in Rust 1.19 and newer.
I'm pretty sure you are doing nothing wrong. This appears to be either a bug (my guess) or a very confusing interaction. Check out this expanded example:
#![feature(fused)]
// Reproducing this needs a nightly compiler from before Rust 1.19, when
// FusedIterator was unstable and Scan implemented it as shown further down.
fn dump<I: Iterator<Item = i32>>(label: &str, mut iter: I) {
    println!("= Running: {}", label);
    for _ in 0..10 {
        println!("{:?}", iter.next());
    }
    println!("");
}
fn boxed_internal_fuse() -> Box<Iterator<Item = i32>> {
    Box::new((1..3)
        .scan(1, |_, x| if x < 2 { None } else { Some(x) })
        .fuse())
}
fn boxed_no_fuse() -> Box<Iterator<Item = i32>> {
    Box::new((1..3)
        .scan(1, |_, x| if x < 2 { None } else { Some(x) }))
}
use std::iter::FusedIterator;
fn boxed_no_fuse_but_fused() -> Box<FusedIterator<Item = i32>> {
    Box::new((1..3)
        .scan(1, |_, x| if x < 2 { None } else { Some(x) }))
}
fn main() {
    let i1 = (1..3)
        .scan(1, |_, x| if x < 2 { None } else { Some(x) });
    dump("Scan", i1);
    let i2 = (1..3)
        .scan(1, |_, x| if x < 2 { None } else { Some(x) })
        .fuse();
    dump("Fuse<Scan>", i2);
    dump("Box<Fuse<Scan>>", boxed_internal_fuse());
    dump("Fuse<Box<Iterator>>", boxed_no_fuse().fuse()); // All `None`s
    dump("Fuse<Box<FusedIterator>>", boxed_no_fuse_but_fused().fuse());
}
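On a compiler that includes the fix (Rust 1.19 or newer, per the TL;DR), the two iterators from the question can be compared directly. A minimal sketch; first_n is a hypothetical helper (not from the question) that records the first n results of next(), None included:

```rust
/// Calls `next()` on `iter` exactly `n` times and records every result,
/// including `None`s, so fused and unfused behaviour can be compared.
fn first_n<I: Iterator<Item = i32>>(mut iter: I, n: usize) -> Vec<Option<i32>> {
    (0..n).map(|_| iter.next()).collect()
}

fn main() {
    // The closure from the question: None for x = 1, Some(x) otherwise.
    let unfused = (1..3).scan(1, |_, x| if x < 2 { None } else { Some(x) });
    let fused = (1..3).scan(1, |_, x| if x < 2 { None } else { Some(x) }).fuse();
    println!("{:?}", first_n(unfused, 3)); // [None, Some(2), None]
    println!("{:?}", first_n(fused, 3)); // [None, None, None]
}
```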
The trick is that FusedIterator is an unstable trait aimed at improving efficiency: it lets Iterator::fuse know that fusing is a no-op for that iterator, so the extra check can be skipped.
However, in this case, the conditions are necessary but not sufficient:
impl<B, I, St, F> FusedIterator for Scan<I, St, F>
    where I: FusedIterator, F: FnMut(&mut St, I::Item) -> Option<B> {}
It's true that if the underlying iterator is a FusedIterator and starts returning None, Scan will keep returning None. However, that's not the only way to get None: the closure can also return None!
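To see what a sound FusedIterator promise requires, here is a sketch of a hypothetical Scan-like adapter (ScanFused and scan_fused are illustrative names, not std items). It records when either source of None fires, the inner iterator or the closure, so its FusedIterator impl actually holds:

```rust
use std::iter::FusedIterator;

/// Hypothetical Scan-like adapter that remembers the first `None`.
struct ScanFused<I, St, F> {
    iter: I,
    state: St,
    f: F,
    done: bool,
}

fn scan_fused<I, St, F, B>(iter: I, state: St, f: F) -> ScanFused<I, St, F>
where
    I: Iterator,
    F: FnMut(&mut St, I::Item) -> Option<B>,
{
    ScanFused { iter, state, f, done: false }
}

impl<B, I, St, F> Iterator for ScanFused<I, St, F>
where
    I: Iterator,
    F: FnMut(&mut St, I::Item) -> Option<B>,
{
    type Item = B;

    fn next(&mut self) -> Option<B> {
        if self.done {
            return None;
        }
        // Either the inner iterator or the closure can end the sequence.
        let out = match self.iter.next() {
            Some(x) => (self.f)(&mut self.state, x),
            None => None,
        };
        if out.is_none() {
            self.done = true;
        }
        out
    }
}

// Sound: once `done` is set, `next` always returns `None`,
// whichever of the two sources produced the first `None`.
impl<B, I, St, F> FusedIterator for ScanFused<I, St, F>
where
    I: Iterator,
    F: FnMut(&mut St, I::Item) -> Option<B>,
{
}

fn main() {
    let mut it = scan_fused(1..3, 1, |_, x| if x < 2 { None } else { Some(x) });
    for _ in 0..3 {
        println!("{:?}", it.next());
    }
}
```

The done flag is the extra state that makes the promise hold; the Scan impl quoted above had nothing that tracked the closure's None.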

Related

How to apply a function to multiple columns of a polars DataFrame in Rust

I'd like to apply a user-defined function that takes a few inputs (corresponding to some columns in a polars DataFrame) to those columns of the DataFrame in Rust. The pattern I'm using is shown below. Is this the best practice?
fn my_filter_func(col1: &Series, col2: &Series, col3: &Series) -> ReturnType {
    let it = (0..n).map(|i| {
        let col1 = match col1.get(i) {
            AnyValue::UInt64(val) => val,
            _ => panic!("Wrong type of col1!"),
        };
        // similar for col2 and col3
        // apply user-defined function to col1, col2 and col3
    });
    // convert it to a collection of the required type
}
You can downcast each Series to the concrete type you want to iterate over, and then use Rust iterators to apply your logic.
fn my_black_box_function(a: f32, b: f32) -> f32 {
    // do something
    a
}
fn apply_multiples(col_a: &Series, col_b: &Series) -> Float32Chunked {
    match (col_a.dtype(), col_b.dtype()) {
        (DataType::Float32, DataType::Float32) => {
            let a = col_a.f32().unwrap();
            let b = col_b.f32().unwrap();
            a.into_iter()
                .zip(b.into_iter())
                .map(|(opt_a, opt_b)| match (opt_a, opt_b) {
                    (Some(a), Some(b)) => Some(my_black_box_function(a, b)),
                    _ => None,
                })
                .collect()
        }
        _ => panic!("unexpected dtypes"),
    }
}
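The null rule in the match above (a result only when both inputs are present) is plain iterator logic. The std-only sketch below swaps the two Float32Chunked columns for Vec<Option<f32>>-style slices, an assumption made only so it runs without polars; apply_pairwise is a hypothetical name, and my_black_box_function gets a hypothetical body:

```rust
// Hypothetical body: the answer leaves my_black_box_function unspecified.
fn my_black_box_function(a: f32, b: f32) -> f32 {
    a * b
}

/// Std-only stand-in for the Float32Chunked version: a value is produced
/// only when both inputs are present, otherwise the result is None (null).
fn apply_pairwise(col_a: &[Option<f32>], col_b: &[Option<f32>]) -> Vec<Option<f32>> {
    col_a
        .iter()
        .zip(col_b)
        .map(|(opt_a, opt_b)| match (opt_a, opt_b) {
            (Some(a), Some(b)) => Some(my_black_box_function(*a, *b)),
            _ => None,
        })
        .collect()
}

fn main() {
    let a: [Option<f32>; 3] = [Some(2.0), None, Some(3.0)];
    let b: [Option<f32>; 3] = [Some(4.0), Some(1.0), None];
    println!("{:?}", apply_pairwise(&a, &b)); // [Some(8.0), None, None]
}
```

Option::zip could replace the inner match; the explicit match is kept here to mirror the polars version.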
Lazy API
You don't have to leave the lazy API to be able to access my_black_box_function.
We can pack the columns we want to pass to the function into a Struct data type and then apply a closure over that Series.
// Written against an older polars API; as_struct takes a slice here, but its
// argument type (and some other signatures) changed in later releases.
fn apply_multiples() -> Result<DataFrame> {
    df![
        "col_a" => [1.0f32, 2.0, 3.0],
        "col_b" => [3.0f32, 5.1, 0.3]
    ]?
    .lazy()
    // Pack both columns into one Struct column so a single closure sees them.
    .select([as_struct(&[col("col_a"), col("col_b")]).map(
        |s| {
            let ca = s.struct_()?;
            let a = ca.field_by_name("col_a")?;
            let b = ca.field_by_name("col_b")?;
            let a = a.f32()?;
            let b = b.f32()?;
            let out: Float32Chunked = a
                .into_iter()
                .zip(b.into_iter())
                .map(|(opt_a, opt_b)| match (opt_a, opt_b) {
                    (Some(a), Some(b)) => Some(my_black_box_function(a, b)),
                    _ => None,
                })
                .collect();
            Ok(out.into_series())
        },
        GetOutput::from_type(DataType::Float32),
    )])
    .collect()
}
The solution that works for me uses map_multiple (my understanding: use this when there is no groupby/agg) or apply_multiple (my understanding: use this whenever you have a groupby/agg). Alternatively, you could also use map_many or apply_many. See below.
use polars::prelude::*;
use polars::df;
fn main() {
    let df = df! [
        "names" => ["a", "b", "a"],
        "values" => [1, 2, 3],
        "values_nulls" => [Some(1), None, Some(3)],
        "new_vals" => [Some(1.0), None, Some(3.0)]
    ].unwrap();
    println!("{:?}", df);
    //df.try_apply("values_nulls", |s: &Series| s.cast(&DataType::Float64)).unwrap();
    let df = df.lazy()
        .groupby([col("names")])
        .agg([
            total_delta_sens().sum()
        ]);
    println!("{:?}", df.collect());
}
pub fn total_delta_sens() -> Expr {
    let s: &mut [Expr] = &mut [col("values"), col("values_nulls"), col("new_vals")];
    // Casts every input column to Float64, replaces nulls with 0 and adds them element-wise.
    fn sum_fa(s: &mut [Series]) -> Result<Series> {
        let mut ss = s[0].cast(&DataType::Float64).unwrap().fill_null(FillNullStrategy::Zero).unwrap().clone();
        for i in 1..s.len() {
            ss = ss.add_to(&s[i].cast(&DataType::Float64).unwrap().fill_null(FillNullStrategy::Zero).unwrap()).unwrap();
        }
        Ok(ss)
    }
    let o = GetOutput::from_type(DataType::Float64);
    map_multiple(sum_fa, s, o)
}
Here total_delta_sens is just a wrapper function for convenience; you don't have to use it. You can do this directly within your .agg([]) or .with_columns([]):
lit::<f64>(0.0).map_many(sum_fa, &[col("norm"), col("uniform")], o)
Inside sum_fa you can, as Richie already mentioned, downcast to a ChunkedArray and use .iter() or even .par_iter().
Hope that helps
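What sum_fa computes can be sketched without polars: every input becomes f64, nulls become 0.0, and the columns are added element-wise. In this std-only sketch, Vec<Option<f64>> stands in for a Float64 ChunkedArray (an assumption for illustration), and sum_columns is a hypothetical name:

```rust
/// Element-wise sum of several nullable columns, with nulls counted as 0.0.
fn sum_columns(columns: &[Vec<Option<f64>>]) -> Vec<f64> {
    // Assumes all columns have the same length, as DataFrame columns do.
    let len = columns.first().map_or(0, |c| c.len());
    let mut out = vec![0.0; len];
    for column in columns {
        for (acc, value) in out.iter_mut().zip(column) {
            // Same effect as fill_null(FillNullStrategy::Zero): None becomes 0.0.
            *acc += value.unwrap_or(0.0);
        }
    }
    out
}

fn main() {
    // The three value columns from the example above, already as f64.
    let columns = vec![
        vec![Some(1.0), Some(2.0), Some(3.0)],
        vec![Some(1.0), None, Some(3.0)],
        vec![Some(1.0), None, Some(3.0)],
    ];
    println!("{:?}", sum_columns(&columns)); // [3.0, 2.0, 9.0]
}
```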

Trying to iterate through a mesh structure results in "borrowed value does not live long enough"

I am trying to create a data structure that can store monitor location data.
The approach I'm taking is that each screen has references to 4 neighbors stored in a HashMap<Direction, Rc<RefCell<Screen>>>.
What I'm struggling with is how to "move"/iterate in a direction until I reach the edge of the mesh:
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;
#[derive(Debug, Copy, Clone, Default)]
pub struct ScreenCoord {
    left: i32,
    right: i32,
    bottom: i32,
    top: i32,
}
impl ScreenCoord {
    fn new_primary_from_dimensions(width: i32, height: i32) -> ScreenCoord {
        ScreenCoord {
            left: 0,
            right: width,
            top: 0,
            bottom: height,
        }
    }
}
pub struct ScreenNetwork {
    primary_screen: Rc<RefCell<Screen>>,
}
impl ScreenNetwork {
    pub fn new(width: i32, height: i32) -> ScreenNetwork {
        ScreenNetwork {
            primary_screen: Rc::new(RefCell::new(Screen::new(
                ScreenCoord::new_primary_from_dimensions(width, height),
            ))),
        }
    }
    pub fn add_screen(&mut self, new_width: i32, new_height: i32, direction: Direction) {
        let mut new_scrn = Screen::new(ScreenCoord::new_primary_from_dimensions(
            new_width, new_height,
        ));
        let mut original_screen = &self.primary_screen;
        while let Some(next) = original_screen.borrow().neighbours.get(&direction) {
            original_screen = next;
        }
        // Do stuff with original_screen
        // new_scrn.neighbours.insert(Direction::from_u8((direction.clone() as u8) ^ 0b11).unwrap(), original_screen.clone());
        // original_screen.borrow_mut().neighbours.insert(direction, Rc::new(RefCell::new(new_scrn)));
    }
}
/// Screen with information about adjacent screens
#[derive(Default, Debug)]
pub struct Screen {
    neighbours: HashMap<Direction, Rc<RefCell<Screen>>>,
    coordinates: ScreenCoord,
}
impl Screen {
    pub fn new(coord: ScreenCoord) -> Screen {
        Screen {
            coordinates: coord,
            neighbours: HashMap::new(),
        }
    }
}
#[derive(Debug, PartialEq, Hash, Eq, Clone)]
pub enum Direction {
    Left = 0,
    Top,
    Bottom,
    Right,
}
fn main() {}
error[E0597]: borrowed value does not live long enough
--> src/main.rs:43:32
|
43 | while let Some(next) = original_screen.borrow().neighbours.get(&direction) {
| ^^^^^^^^^^^^^^^^^^^^^^^^ - temporary value dropped here while still borrowed
| |
| temporary value does not live long enough
...
51 | }
| - temporary value needs to live until here
I understand why the error is happening, but I have no clue as to how I can fix it. I tried to clone original_screen and next, but then the compiler complains that I am trying to set original_screen while still borrowing it.
You are using Rc, so take advantage of it. Generally speaking, taking a reference to an Rc is asking for problems.
First try: clone the Rc when needed:
let mut original_screen = self.primary_screen.clone();
while let Some(next) = original_screen.borrow().neighbours.get(&direction) {
    original_screen = next.clone();
}
It fails with the message:
cannot assign to original_screen because it is borrowed
True enough: the Ref returned by borrow() in the while let condition lives until the end of the loop body, so you cannot reassign original_screen inside it. The loop can be rewritten easily, though:
loop {
    if let Some(next) = original_screen.borrow().neighbours.get(&direction) {
        original_screen = next.clone()
    } else {
        break;
    }
}
That code is equivalent, and indeed it fails with the same error message! But now we are closer to a solution:
loop {
    let next = if let Some(next) = original_screen.borrow().neighbours.get(&direction) {
        next.clone()
    } else {
        break;
    };
    original_screen = next;
}
Now it compiles because the assignment is done after the borrow is released: the temporary Ref from borrow() is dropped at the end of the let next = ... statement.
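Put together, the fixed loop can be checked with a trimmed-down, self-contained version of the types. This sketch assumes a simplified Screen with a hypothetical id field in place of ScreenCoord, and walk_to_edge and new_screen are illustrative names. It also uses Option::cloned to turn the borrowed neighbour into an owned Rc within a single let statement, a small variation on the if let form above:

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Direction {
    Left,
    Top,
    Bottom,
    Right,
}

// Trimmed-down Screen: `id` is a hypothetical field standing in for ScreenCoord.
#[derive(Debug)]
struct Screen {
    id: u32,
    neighbours: HashMap<Direction, Rc<RefCell<Screen>>>,
}

fn new_screen(id: u32) -> Rc<RefCell<Screen>> {
    Rc::new(RefCell::new(Screen { id, neighbours: HashMap::new() }))
}

/// Follows `dir` from `start` until a screen has no neighbour in that direction.
fn walk_to_edge(start: &Rc<RefCell<Screen>>, dir: Direction) -> Rc<RefCell<Screen>> {
    let mut current = Rc::clone(start);
    loop {
        // The Ref from borrow() is a temporary dropped at the end of this
        // statement, so `current` can be reassigned on the next line.
        let next = current.borrow().neighbours.get(&dir).cloned();
        match next {
            Some(n) => current = n,
            None => return current,
        }
    }
}

fn main() {
    // A row of three screens: 1 -> 2 -> 3 (to the right).
    let s1 = new_screen(1);
    let s2 = new_screen(2);
    let s3 = new_screen(3);
    s1.borrow_mut().neighbours.insert(Direction::Right, Rc::clone(&s2));
    s2.borrow_mut().neighbours.insert(Direction::Right, Rc::clone(&s3));

    let edge = walk_to_edge(&s1, Direction::Right);
    println!("rightmost screen: {}", edge.borrow().id); // prints "rightmost screen: 3"
}
```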

Remove unwrap from vec pop

This is a function (that works) that produces a vector of things, but there is some ugly unwrapping in it because it involves popping from another vector.
fn load_into_vec(file_buf: String) -> Vec<Data> {
    let mut data_vec: Vec<Data> = Vec::new();
    for line_iter in file_buf.lines() {
        let mut line_vec: Vec<&str> = line_iter.split(' ').collect();
        let (t1, t2, t3, t4): (u32, u32, u32, u32) =
            (
                // ANOTHER WAY TO WRITE THIS?
                line_vec.pop().unwrap().trim().parse::<u32>().ok().unwrap(),
                line_vec.pop().unwrap().trim().parse::<u32>().ok().unwrap(),
                line_vec.pop().unwrap().trim().parse::<u32>().ok().unwrap(),
                line_vec.pop().unwrap().trim().parse::<u32>().ok().unwrap()
            );
        let mut data_node = Data::new();
        data_node.load((t4, t3, t2, t1));
        data_vec.push(data_node);
    }
    data_vec
}
Is there another way to write the above block without using unwrap, or to use unwrap in a way that will not panic if it encounters None?
You can use iterators and match to write a more idiomatic version of your loop body. Instead of collecting into an intermediate Vec, you call iter.next() four times and match on the results to extract your four integers. If one of the iter.next() calls returns None, you can panic.
let mut iter = line_iter.split(' ')
    .map(str::trim)
    .map(str::parse::<u32>)
    .map(|s| s.expect("could not parse as u32"))
    .fuse();
let tup = match (iter.next(), iter.next(), iter.next(), iter.next()) {
    (Some(t1), Some(t2), Some(t3), Some(t4)) => (t1, t2, t3, t4),
    _ => panic!("line did not contain at least four numbers"),
};
let mut data_node = Data::new();
data_node.load(tup);
I would even go as far as to rewrite the entire function:
fn load_into_vec(file_buf: String) -> Vec<Data> {
    file_buf.lines()
        .map(|line_iter| {
            let mut iter = line_iter.split(' ')
                .map(str::trim)
                .map(str::parse::<u32>)
                .map(|s| s.expect("could not parse as u32"))
                .fuse();
            let tup = match (iter.next(), iter.next(), iter.next(), iter.next()) {
                (Some(t1), Some(t2), Some(t3), Some(t4)) => (t1, t2, t3, t4),
                _ => panic!("line did not contain at least four numbers"),
            };
            let mut data_node = Data::new();
            data_node.load(tup);
            data_node
        })
        .collect()
}
Even better would be to have the function return a Result that denotes when an error has occurred:
#[derive(Debug)]
enum MyError {
    NotAnInt,
    TooFewNumbers,
    TooManyNumbers,
}
fn load_into_vec2(file_buf: String) -> Result<Vec<Data>, MyError> {
    file_buf.lines()
        .map(|line_iter| {
            let mut iter = line_iter.split(' ')
                .map(str::trim)
                .map(str::parse::<u32>)
                .fuse();
            match (iter.next(), iter.next(), iter.next(), iter.next()) {
                (Some(Ok(t1)), Some(Ok(t2)), Some(Ok(t3)), Some(Ok(t4))) => if iter.next().is_some() {
                    Err(MyError::TooManyNumbers)
                } else {
                    let mut data_node = Data::new();
                    data_node.load((t1, t2, t3, t4));
                    Ok(data_node)
                },
                (None, _, _, _) |
                (_, None, _, _) |
                (_, _, None, _) |
                (_, _, _, None) => Err(MyError::TooFewNumbers),
                (Some(Err(_)), _, _, _) |
                (_, Some(Err(_)), _, _) |
                (_, _, Some(Err(_)), _) |
                (_, _, _, Some(Err(_))) => Err(MyError::NotAnInt),
            }
        })
        .collect()
}
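The large match can also be replaced by a small closure and the ? operator. This is a sketch, not the answer's code: Data is a hypothetical tuple struct standing in for the question's type (whose definition is not shown), and parse_line and load_into_vec3 are illustrative names:

```rust
#[derive(Debug, PartialEq)]
enum MyError {
    NotAnInt,
    TooFewNumbers,
    TooManyNumbers,
}

// Hypothetical stand-in for the question's Data type.
#[derive(Debug, PartialEq)]
struct Data(u32, u32, u32, u32);

fn parse_line(line: &str) -> Result<Data, MyError> {
    let mut iter = line.split(' ').map(str::trim).map(str::parse::<u32>);
    // Each call takes one field: a missing field is TooFewNumbers,
    // a field that does not parse is NotAnInt.
    let mut field = || -> Result<u32, MyError> {
        iter.next()
            .ok_or(MyError::TooFewNumbers)?
            .map_err(|_| MyError::NotAnInt)
    };
    // Arguments are evaluated left to right, so the fields keep their order.
    let data = Data(field()?, field()?, field()?, field()?);
    if iter.next().is_some() {
        return Err(MyError::TooManyNumbers);
    }
    Ok(data)
}

fn load_into_vec3(file_buf: &str) -> Result<Vec<Data>, MyError> {
    // collect() into a Result stops at the first Err and returns it.
    file_buf.lines().map(parse_line).collect()
}

fn main() {
    println!("{:?}", load_into_vec3("1 2 3 4\n5 6 7 8"));
}
```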

How to handle errors from the Read::read_to_end method?

I am reading a file and parsing the data:
use std::fs::File;
use std::io::Read;
use std::path::Path;
fn main() {
    parse_file("ubuntu-16.04.1-server-amd64.torrent");
}
fn parse_file(filename: &'static str) {
    let mut f = File::open(&Path::new(filename)).unwrap();
    let mut v: Vec<u8> = Vec::new();
    let file_content = f.read_to_end(&mut v);
    println!("{:?}", file_content);
}
If ubuntu-16.04.1-server-amd64.torrent is present, this works well, but if it isn't present, this error occurs:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { repr: Os { code: 2, message: "No such file or directory" } }'
How can I print "Not Exist File" when the file is missing, and "process file parsing" otherwise?
You should be matching on the Result instead of calling unwrap() or expect(). Once you've got the error, you can either print it out or look at the ErrorKind if you want to handle different errors differently.
use std::fs::File;
use std::path::Path;
use std::io::Read;
fn main() {
    parse_file("doesnt_exist.txt");
}
fn parse_file(filename: &'static str) {
    let mut f = match File::open(&Path::new(filename)) {
        Ok(f) => f,
        Err(e) => {
            use std::io::ErrorKind::*;
            println!("Got error: {}", e);
            match e.kind() {
                NotFound => {
                    println!("File not found");
                }
                k => {
                    println!("Error: {:?}", k);
                }
            }
            return;
        }
    };
    let mut v: Vec<u8> = Vec::new();
    let file_content = f.read_to_end(&mut v);
    println!("{:?}", file_content);
}
Probably the most idiomatic approach is to propagate the error out of parse_file and print it in main.
use std::fs::File;
use std::io::{Read, Error as IoError};
use std::path::Path;
fn main() {
    if let Err(e) = parse_file("ubuntu-16.04.1-server-amd64.torrent") {
        println!("{}", e);
    }
}
fn parse_file(filename: &'static str) -> Result<(), IoError> {
    let mut f = File::open(&Path::new(filename))?;
    let mut v: Vec<u8> = Vec::new();
    // `?` also propagates errors that happen while reading, not just opening.
    let bytes_read = f.read_to_end(&mut v)?;
    println!("{:?}", bytes_read);
    Ok(())
}
If you want to print the error on the spot and still propagate it, you could do something like this instead:
use std::fs::File;
use std::io::{Read, Error as IoError};
use std::path::Path;
fn main() {
    // The error has already been printed inside parse_file.
    let _ = parse_file("ubuntu-16.04.1-server-amd64.torrent");
}
fn parse_file(filename: &'static str) -> Result<(), IoError> {
    let mut f = File::open(&Path::new(filename)).map_err(|e| {
        println!("{}", e);
        e
    })?;
    let mut v: Vec<u8> = Vec::new();
    let file_content = f.read_to_end(&mut v);
    println!("{:?}", file_content);
    Ok(())
}
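A variant that produces the two messages from the question can be sketched with std::fs::read, which opens a file and reads it to the end in one call, instead of File::open plus read_to_end. The load helper and its String error type are illustrative choices, not part of the answers above:

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

/// Reads the whole file; a missing file becomes the question's
/// "Not Exist File" message, any other error is formatted as text.
fn load(path: &Path) -> Result<Vec<u8>, String> {
    match fs::read(path) {
        Ok(bytes) => Ok(bytes),
        Err(e) if e.kind() == ErrorKind::NotFound => Err("Not Exist File".to_string()),
        Err(e) => Err(format!("Error: {}", e)),
    }
}

fn main() {
    match load(Path::new("ubuntu-16.04.1-server-amd64.torrent")) {
        Ok(bytes) => println!("process file parsing ({} bytes)", bytes.len()),
        Err(msg) => println!("{}", msg),
    }
}
```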

Can Rust consume an iterator passed into a function?

I'm trying to implement a simple REPL calculator in Rust and I'm hitting brick walls all over the place.
I'm consuming chars while iterating over a hardcoded string. When I hit a numeric character, I want to pass control to a function that will consume the rest of the number (assuming the number has more than one digit) and return it, converted to an integer.
I'm having trouble passing the Chars iterator to a function. The error I'm getting is use of moved value: 'iter'.
I understand that I can't mutate something that I gave to someone else - something that had its ownership moved - but I don't know any other way of doing this, especially since the Chars iterator is non-copyable.
#[derive(Clone, Debug)]
enum Token {
    Addition,
    Substraction,
    Multiplication,
    Division,
    Integer(i32),
    Error,
}
fn consume_number(mut iter: std::str::Chars) -> Option<i32> {
    while let Some(item) = iter.next() {
        println!("{:?}", item);
    }
    return Some(1337);
}
fn tokenize(line: &str) -> Vec<Token> {
    let mut iter = line.chars();
    let mut tokens = Vec::new();
    let mut token;
    while let Some(c) = iter.next() {
        if c.is_whitespace() { continue };
        if c.is_digit(10) {
            token = match consume_number(iter) {
                Some(i32) => Token::Integer(i32),
                None => Token::Error,
            };
        } else {
            token = match c {
                '+' => Token::Addition,
                '-' => Token::Substraction,
                '*' => Token::Multiplication,
                '/' => Token::Division,
                _ => Token::Error,
            };
        };
        tokens.push(token);
    }
    return tokens;
}
fn main() {
    let line = "631 * 32 + 212 - 15 / 89";
    println!("{:?}", tokenize(&line));
}
The answer is yes: that is exactly what the FromIterator trait does, since FromIterator::from_iter takes its input by value.
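As a minimal illustration of the first point: a function that takes an iterator by value owns it and can consume it completely, which is also how from_iter receives its input. digits_to_number is a hypothetical helper, not part of the question's code:

```rust
/// Takes ownership of any iterable of chars and consumes all of it,
/// returning None on a non-digit or on overflow.
fn digits_to_number<I: IntoIterator<Item = char>>(chars: I) -> Option<u32> {
    let mut value: u32 = 0;
    for c in chars {
        value = value.checked_mul(10)?.checked_add(c.to_digit(10)?)?;
    }
    Some(value)
}

fn main() {
    let iter = "631".chars();
    // `iter` is moved into the function here and cannot be used afterwards.
    println!("{:?}", digits_to_number(iter)); // Some(631)
}
```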
What you experience here is much more basic:
fn consume_number(mut iter: std::str::Chars) -> Option<i32> { ... }
while let Some(c) = iter.next() {
    ...
    consume_number(iter)
    ...
}
When calling consume_number you are transferring ownership of the iterator to it. This means that on the next iteration of the loop, the iter variable is no longer available.
If the iterator is meant to still be usable afterward, you should pass a reference to it:
fn consume_number(iter: &mut std::str::Chars) -> Option<i32> { ... }
while let Some(c) = iter.next() {
    ...
    consume_number(&mut iter)
    ...
}
You were close!
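Passing &mut iter fixes the move error, but consume_number still cannot stop right after the last digit: a plain Chars iterator only offers next(), so looking at the character after a number also consumes it (in "12+3", the '+' would be lost). One way around that, sketched below, swaps Chars for Peekable<Chars> and uses Peekable::next_if. That swap, and passing the already-read first digit as a parameter, are assumptions of this sketch rather than part of the answer:

```rust
use std::iter::Peekable;
use std::str::Chars;

#[derive(Clone, Debug, PartialEq)]
enum Token {
    Addition,
    Substraction,
    Multiplication,
    Division,
    Integer(i32),
    Error,
}

/// Consumes the remaining digits of a number whose first digit `first` was
/// already taken from `iter`. next_if only advances while the next char is
/// a digit, so the char after the number stays in `iter` for the caller.
fn consume_number(first: char, iter: &mut Peekable<Chars<'_>>) -> Option<i32> {
    let mut value = first.to_digit(10).map(|d| d as i32);
    while let Some(c) = iter.next_if(|c| c.is_ascii_digit()) {
        // Keep consuming digits after an overflow so the whole number is
        // skipped; `value` then simply stays None.
        value = value
            .and_then(|v| v.checked_mul(10))
            .and_then(|v| v.checked_add(c.to_digit(10)? as i32));
    }
    value
}

fn tokenize(line: &str) -> Vec<Token> {
    let mut iter = line.chars().peekable();
    let mut tokens = Vec::new();
    while let Some(c) = iter.next() {
        if c.is_whitespace() {
            continue;
        }
        let token = if c.is_ascii_digit() {
            // A mutable borrow: `iter` is still usable on the next iteration.
            match consume_number(c, &mut iter) {
                Some(n) => Token::Integer(n),
                None => Token::Error,
            }
        } else {
            match c {
                '+' => Token::Addition,
                '-' => Token::Substraction,
                '*' => Token::Multiplication,
                '/' => Token::Division,
                _ => Token::Error,
            }
        };
        tokens.push(token);
    }
    tokens
}

fn main() {
    let line = "631 * 32 + 212 - 15 / 89";
    println!("{:?}", tokenize(line));
}
```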