Remove unwrap from vec pop - error-handling

This is a function (that works) that produces a vector of things but there is some ugly unwrapping in there as it involves popping another vector.
fn load_into_vec(file_buf: String) -> Vec<Data> {
    let mut data_vec: Vec<Data> = Vec::new();
    for line_iter in file_buf.lines() {
        let mut line_vec: Vec<&str> = line_iter.split(' ').collect();
        let (t1, t2, t3, t4): (u32, u32, u32, u32) =
        (
            /** ANOTHER WAY TO WRITE THIS? **/
            line_vec.pop().unwrap().trim().parse::<u32>().ok().unwrap(),
            line_vec.pop().unwrap().trim().parse::<u32>().ok().unwrap(),
            line_vec.pop().unwrap().trim().parse::<u32>().ok().unwrap(),
            line_vec.pop().unwrap().trim().parse::<u32>().ok().unwrap()
        );
        let mut data_node = Data::new();
        data_node.load((t4, t3, t2, t1));
        data_vec.push(data_node);
    }
    data_vec
}
Is there another way to rewrite the above block without using unwrap, or to use unwrap in a way that will not panic if it encounters None?

You can use iterators and match to write a more idiomatic version of your loop body. Instead of collecting into an intermediate Vec, you match on calling iter.next() four times, to extract your four integers. In case one of the iter.next() calls doesn't succeed, you can panic.
let mut iter = line_iter.split(' ')
    .map(str::trim)
    .map(str::parse::<u32>)
    .map(|s| s.expect("could not parse as u32"))
    .fuse();
let tup = match (iter.next(), iter.next(), iter.next(), iter.next()) {
    (Some(t1), Some(t2), Some(t3), Some(t4)) => (t1, t2, t3, t4),
    _ => panic!("line did not contain at least four numbers"),
};
let mut data_node = Data::new();
data_node.load(tup);
I would even go as far as to rewrite the entire function:
file_buf.lines()
    .map(|line_iter| {
        let mut iter = line_iter.split(' ')
            .map(str::trim)
            .map(str::parse::<u32>)
            .map(|s| s.expect("could not parse as u32"))
            .fuse();
        let tup = match (iter.next(), iter.next(), iter.next(), iter.next()) {
            (Some(t1), Some(t2), Some(t3), Some(t4)) => (t1, t2, t3, t4),
            _ => panic!("line did not contain at least four numbers"),
        };
        let mut data_node = Data::new();
        data_node.load(tup);
        data_node
    })
    .collect()
Even better would be to have the function return a Result that denotes when an error has occurred:
enum MyError {
    NotAnInt,
    TooFewNumbers,
    TooManyNumbers,
}
fn load_into_vec2(file_buf: String) -> Result<Vec<Data>, MyError> {
    file_buf.lines()
        .map(|line_iter| {
            let mut iter = line_iter.split(' ')
                .map(str::trim)
                .map(str::parse::<u32>)
                .fuse();
            match (iter.next(), iter.next(), iter.next(), iter.next()) {
                (Some(Ok(t1)), Some(Ok(t2)), Some(Ok(t3)), Some(Ok(t4))) => if iter.next().is_some() {
                    Err(MyError::TooManyNumbers)
                } else {
                    let mut data_node = Data::new();
                    data_node.load((t1, t2, t3, t4));
                    Ok(data_node)
                },
                (None, _, _, _) |
                (_, None, _, _) |
                (_, _, None, _) |
                (_, _, _, None) => Err(MyError::TooFewNumbers),
                (Some(Err(_)), _, _, _) |
                (_, Some(Err(_)), _, _) |
                (_, _, Some(Err(_)), _) |
                (_, _, _, Some(Err(_))) => Err(MyError::NotAnInt),
            }
        })
        .collect()
}
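The per-position match arms can also be avoided by letting collect short-circuit on the first parse failure and then matching on the collected slice. The following is only a minimal sketch, not the answer's code: parse_line, parse_all and LineError are hypothetical names, the result is a plain tuple rather than the question's Data type, and split_whitespace is swapped in for split(' ') so that repeated spaces are tolerated.

```rust
/// Hypothetical error type for this sketch, mirroring `MyError` above.
#[derive(Debug, PartialEq)]
enum LineError {
    NotAnInt,
    TooFewNumbers,
    TooManyNumbers,
}

/// Parses exactly four whitespace-separated `u32` values from one line.
fn parse_line(line: &str) -> Result<(u32, u32, u32, u32), LineError> {
    // Collecting into `Result<Vec<_>, _>` stops at the first parse failure.
    let nums = line
        .split_whitespace()
        .map(|s| s.parse::<u32>().map_err(|_| LineError::NotAnInt))
        .collect::<Result<Vec<u32>, LineError>>()?;

    // A slice pattern checks the count and destructures in one step.
    match nums.as_slice() {
        &[t1, t2, t3, t4] => Ok((t1, t2, t3, t4)),
        s if s.len() < 4 => Err(LineError::TooFewNumbers),
        _ => Err(LineError::TooManyNumbers),
    }
}

/// Parses every line of the buffer, failing on the first bad line.
fn parse_all(buf: &str) -> Result<Vec<(u32, u32, u32, u32)>, LineError> {
    buf.lines().map(parse_line).collect()
}
```

Note that the error precedence differs from load_into_vec2: this sketch parses every token on the line first, so a line such as "1 x" reports NotAnInt rather than TooFewNumbers.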

How to apply a function to multiple columns of a polars DataFrame in Rust

I'd like to apply a user-defined function, which takes a few inputs (corresponding to some columns of a polars DataFrame), to those columns in Rust. The pattern that I'm using is shown below. Is this the best practice?
fn my_filter_func(col1: &Series, col2: &Series, col3: &Series) -> ReturnType {
    // number of rows, assumed to be the same for all three columns
    let n = col1.len();
    let it = (0..n).map(|i| {
        let col1 = match col1.get(i) {
            AnyValue::UInt64(val) => val,
            _ => panic!("Wrong type of col1!"),
        };
        // similar for col2 and col3
        // apply user-defined function to col1, col2 and col3
    });
    // convert it to a collection of the required type
}
You can downcast the Series to the proper type you want to iterate over, and then use rust iterators to apply your logic.
fn my_black_box_function(a: f32, b: f32) -> f32 {
    // do something
    a
}
fn apply_multiples(col_a: &Series, col_b: &Series) -> Float32Chunked {
    match (col_a.dtype(), col_b.dtype()) {
        (DataType::Float32, DataType::Float32) => {
            let a = col_a.f32().unwrap();
            let b = col_b.f32().unwrap();
            a.into_iter()
                .zip(b.into_iter())
                .map(|(opt_a, opt_b)| match (opt_a, opt_b) {
                    (Some(a), Some(b)) => Some(my_black_box_function(a, b)),
                    _ => None,
                })
                .collect()
        }
        _ => panic!("unexpected dtypes"),
    }
}
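The zip-and-match body above does not depend on anything polars-specific, so the null-propagation logic can be tried in isolation. The following is a std-only sketch under stated assumptions: plain slices of Option<f32> stand in for the two Float32Chunked columns, apply_pairwise is a hypothetical name, and my_black_box_function is given an arbitrary body (a + b) purely for illustration.

```rust
/// Stand-in for the answer's `my_black_box_function`; the body is an assumption.
fn my_black_box_function(a: f32, b: f32) -> f32 {
    a + b
}

/// Applies the function row by row; a missing value on either side yields `None`.
fn apply_pairwise(col_a: &[Option<f32>], col_b: &[Option<f32>]) -> Vec<Option<f32>> {
    col_a
        .iter()
        .zip(col_b.iter()) // `zip` stops at the shorter of the two inputs
        .map(|(opt_a, opt_b)| match (opt_a, opt_b) {
            (Some(a), Some(b)) => Some(my_black_box_function(*a, *b)),
            _ => None,
        })
        .collect()
}
```

Calling apply_pairwise(&[Some(1.0), None], &[Some(0.5), Some(2.0)]) gives Some(1.5) for the first row and None for the second, which is the same shape of result the polars version collects into a Float32Chunked.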
Lazy API
You don't have to leave the lazy API to be able to access my_black_box_function.
We can collect the columns we want to apply in a Struct data type and then apply a closure over that Series.
fn apply_multiples(lf: LazyFrame) -> Result<DataFrame> {
df![
"a" => [1.0, 2.0, 3.0],
"b" => [3.0, 5.1, 0.3]
]?
.lazy()
.select([concat_lst(["col_a", "col_b"]).map(
|s| {
let ca = s.struct_()?;
let a = ca.field_by_name("col_a")?;
let b = ca.field_by_name("col_b")?;
let a = a.f32()?;
let b = b.f32()?;
let out: Float32Chunked = a
.into_iter()
.zip(b.into_iter())
.map(|(opt_a, opt_b)| match (opt_a, opt_b) {
(Some(a), Some(b)) => Some(my_black_box_function(a, b)),
_ => None,
})
.collect();
Ok(out.into_series())
},
GetOutput::from_type(DataType::Float32),
)])
.collect()
}
The solution that worked for me uses map_multiple (as I understand it, for use when there is no groupby/agg) or apply_multiple (as I understand it, for use whenever you have a groupby/agg). Alternatively, you could use map_many or apply_many. See below.
use polars::prelude::*;
use polars::df;
fn main() {
let df = df! [
"names" => ["a", "b", "a"],
"values" => [1, 2, 3],
"values_nulls" => [Some(1), None, Some(3)],
"new_vals" => [Some(1.0), None, Some(3.0)]
].unwrap();
println!("{:?}", df);
//df.try_apply("values_nulls", |s: &Series| s.cast(&DataType::Float64)).unwrap();
let df = df.lazy()
.groupby([col("names")])
.agg( [
total_delta_sens().sum()
]
);
println!("{:?}", df.collect());
}
pub fn total_delta_sens () -> Expr {
let s: &mut [Expr] = &mut [col("values"), col("values_nulls"), col("new_vals")];
fn sum_fa(s: &mut [Series])->Result<Series>{
let mut ss = s[0].cast(&DataType::Float64).unwrap().fill_null(FillNullStrategy::Zero).unwrap().clone();
for i in 1..s.len(){
ss = ss.add_to(&s[i].cast(&DataType::Float64).unwrap().fill_null(FillNullStrategy::Zero).unwrap()).unwrap();
}
Ok(ss)
}
let o = GetOutput::from_type(DataType::Float64);
map_multiple(sum_fa, s, o)
}
Here total_delta_sens is just a wrapper function for convenience; you don't have to use it. You can do this directly within your .agg([]) or .with_columns([]):
lit::<f64>(0.0).map_many(sum_fa, &[col("norm"), col("uniform")], o)
Inside sum_fa you can, as Richie already mentioned, downcast to a ChunkedArray and use .iter() or even .par_iter().
Hope that helps

Why does the Fuse iterator adapter not work as expected?

I'm experimenting with the Fuse iterator adapter and am getting unexpected results:
fn main() {
let mut i1 = (1..3).scan(1, |_, x| {
if x < 2 { None } else { Some(x) }
});
println!("{:?}", i1.next());
println!("{:?}", i1.next());
println!("{:?}", i1.next());
println!("");
let mut i2 = (1..3).scan(1, |_, x| {
if x < 2 { None } else { Some(x) }
}).fuse();
println!("{:?}", i2.next());
println!("{:?}", i2.next()); // This should print None
println!("{:?}", i2.next());
println!("");
}
Which prints:
None
Some(2)
None

None
Some(2)
None
Iterator i1 is returning what I expect. It returns None, then Some(2), then None. i2 is the same iterator adapted with fuse(). Fuse should make it return None after the first None, and since the first value it returns is None that should be the only value it returns. However, it behaves the same as i1. What am I doing wrong?
TL;DR Summary: This was a bug and is fixed in Rust 1.19 and newer.
I'm pretty sure you are doing nothing wrong. This appears to be either a bug (my guess) or a very confusing interaction. Check out this expanded example:
#![feature(fused)]
fn dump<I: Iterator<Item = i32>>(label: &str, mut iter: I) {
println!("= Running: {}", label);
for _ in 0..10 {
println!("{:?}", iter.next());
}
println!("");
}
fn boxed_internal_fuse() -> Box<Iterator<Item = i32>> {
Box::new((1..3)
.scan(1, |_, x| if x < 2 { None } else { Some(x) })
.fuse())
}
fn boxed_no_fuse() -> Box<Iterator<Item = i32>> {
Box::new((1..3)
.scan(1, |_, x| if x < 2 { None } else { Some(x) }))
}
use std::iter::FusedIterator;
fn boxed_no_fuse_but_fused() -> Box<FusedIterator<Item = i32>> {
Box::new((1..3)
.scan(1, |_, x| if x < 2 { None } else { Some(x) }))
}
fn main() {
let i1 = (1..3)
.scan(1, |_, x| if x < 2 { None } else { Some(x) });
dump("Scan", i1);
let i2 = (1..3)
.scan(1, |_, x| if x < 2 { None } else { Some(x) })
.fuse();
dump("Fuse<Scan>", i2);
dump("Box<Fuse<Scan>>", boxed_internal_fuse());
dump("Fuse<Box<Iterator>>", boxed_no_fuse().fuse()); // All `None`s
dump("Fuse<Box<FusedIterator>>", boxed_no_fuse_but_fused().fuse());
}
The trick is that FusedIterator is an unstable trait aimed at improving efficiency. Implementing it tells Iterator::fuse that fusing that iterator is a no-op.
However, in this case, the conditions are necessary but not sufficient:
impl<B, I, St, F> FusedIterator for Scan<I, St, F>
where I: FusedIterator, F: FnMut(&mut St, I::Item) -> Option<B> {}
It's true that if the underlying iterator is FusedIterator and starts returning None, scan will keep returning None. However, that's not the only way to get None — the closure can also return None!
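To see the difference on a compiler that includes the fix (Rust 1.19 or newer, per the summary above), the two iterators from the question can be polled side by side. This is a small sketch; poll and scan_skipping_one are hypothetical helper names introduced here.

```rust
/// Calls `next()` exactly `n` times and records every result,
/// including calls made after the iterator has already returned `None`.
fn poll<I: Iterator<Item = i32>>(mut iter: I, n: usize) -> Vec<Option<i32>> {
    (0..n).map(|_| iter.next()).collect()
}

/// The iterator from the question: the closure returns `None` for 1,
/// even though the underlying range still has an element left.
fn scan_skipping_one() -> impl Iterator<Item = i32> {
    (1..3).scan(1, |_, x| if x < 2 { None } else { Some(x) })
}
```

Without fuse, the closure's early None is followed by Some(2), so poll(scan_skipping_one(), 3) records None, Some(2), None. With fuse, poll(scan_skipping_one().fuse(), 3) records three Nones, because Fuse stops forwarding calls after the first None.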

How to handle errors from the Read::read_to_end method?

I am reading a file and parsing the data:
fn main() {
    parse_file("ubuntu-16.04.1-server-amd64.torrent");
}
fn parse_file(filename: &'static str) {
    let mut f = File::open(&Path::new(filename)).unwrap();
    let mut v: Vec<u8> = Vec::new();
    let file_content = f.read_to_end(&mut v);
    println!("{:?}", file_content);
}
If ubuntu-16.04.1-server-amd64.torrent is present, this works well, but if it isn't present, this error occurs:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { repr: Os { code: 2, message: "No such file or directory" } }'
How can I print "Not Exist File" when the file is missing, or "process file parsing" otherwise?
You should be matching on the Result instead of calling unwrap() or expect(). Once you've got the error, you can either print it out or look at the ErrorKind if you want to handle different errors differently.
use std::fs::File;
use std::path::Path;
use std::io::Read;
fn main() {
    parse_file("doesnt_exist.txt");
}
fn parse_file(filename: &'static str) {
    let mut f = match File::open(&Path::new(filename)) {
        Ok(f) => f,
        Err(e) => {
            use std::io::ErrorKind::*;
            println!("Got error: {}", e);
            match e.kind() {
                NotFound => {
                    println!("File not found");
                }
                k => {
                    println!("Error: {:?}", k);
                }
            }
            return;
        }
    };
    let mut v: Vec<u8> = Vec::new();
    let file_content = f.read_to_end(&mut v);
    println!("{:?}", file_content);
}
Probably the most idiomatic approach would be propagating the result outside parse_file, printing the error in main.
use std::fs::File;
use std::io::{Read, Error as IoError};
use std::path::Path;
fn main() {
    match parse_file("ubuntu-16.04.1-server-amd64.torrent") {
        Err(e) => println!("{}", e),
        _ => {}
    }
}
fn parse_file(filename: &'static str) -> Result<(), IoError> {
    let mut f = File::open(&Path::new(filename))?;
    let mut v: Vec<u8> = Vec::new();
    let file_content = f.read_to_end(&mut v);
    println!("{:?}", file_content);
    Ok(())
}
If you want to print the error in one line and then propagate it, you could do something like this instead:
fn main() {
    parse_file("ubuntu-16.04.1-server-amd64.torrent");
}
fn parse_file(filename: &'static str) -> Result<(), IoError> {
    let mut f = File::open(&Path::new(filename)).map_err(|e| {
        println!("{}", e);
        e
    })?;
    let mut v: Vec<u8> = Vec::new();
    let file_content = f.read_to_end(&mut v);
    println!("{:?}", file_content);
    Ok(())
}
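The examples above handle the error from File::open but only print the Result returned by read_to_end. Since read_to_end also returns an io::Result, the same ? operator can propagate its error. The following is a minimal sketch of that idea; read_all and describe are hypothetical names, and the messages are only illustrative.

```rust
use std::fs::File;
use std::io::{self, ErrorKind, Read};
use std::path::Path;

/// Reads the whole file into memory, propagating errors from both
/// `File::open` and `Read::read_to_end` to the caller.
fn read_all(path: &Path) -> io::Result<Vec<u8>> {
    let mut f = File::open(path)?;
    let mut v = Vec::new();
    // On success `read_to_end` returns the number of bytes read; it is not needed here.
    f.read_to_end(&mut v)?;
    Ok(v)
}

/// Turns the outcome into a user-facing message, treating a missing file specially.
fn describe(result: &io::Result<Vec<u8>>) -> String {
    match result {
        Ok(bytes) => format!("read {} bytes", bytes.len()),
        Err(e) if e.kind() == ErrorKind::NotFound => "Not Exist File".to_string(),
        Err(e) => format!("could not read file: {}", e),
    }
}
```

A caller would pass the result of read_all(Path::new(filename)) to describe and print the returned message, keeping the decision about what to show the user in one place.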

Can Rust consume an iterator passed into a function?

I'm trying to implement a simple REPL calculator in Rust and I'm hitting brick walls all over the place.
I'm consuming chars while iterating over a hardcoded string. When I hit a numeric character I want to pass control over to a function that will consume the rest of the number (assuming the number has more than one digit) and return the number, converted to an Integer.
I'm having trouble with passing Chars iterator to a function. The error I'm getting is use of moved value: 'iter'.
I understand that I can't mutate something that I gave to someone else - something that had its ownership moved - but I don't know any other way of doing this, especially since the Chars iterator is non-copyable.
#[derive(Clone, Debug)]
enum Token {
Addition,
Substraction,
Multiplication,
Division,
Integer(i32),
Error,
}
fn consume_number(mut iter: std::str::Chars) -> Option<i32> {
while let Some(item) = iter.next() {
println!("{:?}", item);
}
return Some(1337);
}
fn tokenize(line: &str) -> Vec<Token> {
let mut iter = line.chars();
let mut tokens = Vec::new();
let mut token;
while let Some(c) = iter.next() {
if c.is_whitespace() { continue };
if c.is_digit(10) {
token = match consume_number(iter) {
Some(i32) => Token::Integer(i32),
None => Token::Error,
};
} else {
token = match c {
'+' => Token::Addition,
'-' => Token::Substraction,
'*' => Token::Multiplication,
'/' => Token::Division,
_ => Token::Error,
};
};
tokens.push(token);
}
return tokens;
}
fn main() {
let line = "631 * 32 + 212 - 15 / 89";
println!("{:?}", tokenize(&line));
}
The answer is yes, it's done in the FromIterator trait.
What you experience here is much more basic:
fn consume_number(mut iter: std::str::Chars) -> Option<i32> { ... }
while let Some(c) = iter.next() {
    ...
    consume_number(iter)
    ...
}
When calling consume_number you are transferring ownership of the iterator to it. This means that on the next iteration of the loop, the iter variable is no longer available.
If the iterator is meant to still be usable afterward, you should pass a mutable reference to it:
fn consume_number(iter: &mut std::str::Chars) -> Option<i32> { ... }
while let Some(c) = iter.next() {
    ...
    consume_number(&mut iter)
    ...
}
You were close!
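Put together, a tokenizer that lends the iterator to consume_number could look like the sketch below. It goes beyond the answer in two labeled ways: the already-read first digit is passed in as an extra parameter, and the iterator is wrapped in Peekable so that the character following a number is not swallowed. Both are assumptions of this sketch, not part of the original code.

```rust
use std::iter::Peekable;
use std::str::Chars;

// Variant names are kept exactly as in the question.
#[derive(Clone, Debug, PartialEq)]
enum Token {
    Addition,
    Substraction,
    Multiplication,
    Division,
    Integer(i32),
    Error,
}

/// Consumes the remaining digits of a number whose first digit was already read.
/// The iterator is only borrowed, so the caller can keep using it afterwards.
fn consume_number(first: char, iter: &mut Peekable<Chars<'_>>) -> Option<i32> {
    let mut digits = String::new();
    digits.push(first);
    // `peek` inspects the next char without consuming it, so the first
    // non-digit character stays in the iterator for the caller.
    while let Some(&c) = iter.peek() {
        if !c.is_ascii_digit() {
            break;
        }
        digits.push(c);
        iter.next();
    }
    // Returns `None` if the digits do not fit into an `i32`.
    digits.parse().ok()
}

fn tokenize(line: &str) -> Vec<Token> {
    let mut iter = line.chars().peekable();
    let mut tokens = Vec::new();
    while let Some(c) = iter.next() {
        if c.is_whitespace() {
            continue;
        }
        let token = if c.is_ascii_digit() {
            consume_number(c, &mut iter).map_or(Token::Error, Token::Integer)
        } else {
            match c {
                '+' => Token::Addition,
                '-' => Token::Substraction,
                '*' => Token::Multiplication,
                '/' => Token::Division,
                _ => Token::Error,
            }
        };
        tokens.push(token);
    }
    tokens
}
```

With this version, tokenize("631 * 32") yields Integer(631), Multiplication, Integer(32), and input without spaces such as "15/89" works too because the '/' is left in the iterator.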

How can I test stdin and stdout?

I'd like to write a prompt function that sends a passed-in string to stdout and then returns the string that it reads from stdin. How could I test it?
Here is an example of the function:
fn prompt(question: String) -> String {
let mut stdin = BufferedReader::new(stdin());
print!("{}", question);
match stdin.read_line() {
Ok(line) => line,
Err(e) => panic!(e),
}
}
And here is my testing attempt
#[test]
fn try_to_test_stdout() {
let writer: Vec<u8> = vec![];
set_stdout(Box::new(writer));
print!("testing");
// `writer` is now gone, can't check to see if "testing" was sent
}
Use dependency injection. Coupled with generics and monomorphization, it costs you no performance:
use std::io::{self, BufRead, Write};
fn prompt<R, W>(mut reader: R, mut writer: W, question: &str) -> String
where
R: BufRead,
W: Write,
{
write!(&mut writer, "{}", question).expect("Unable to write");
let mut s = String::new();
reader.read_line(&mut s).expect("Unable to read");
s
}
#[test]
fn test_with_in_memory() {
let input = b"I'm George";
let mut output = Vec::new();
let answer = prompt(&input[..], &mut output, "Who goes there?");
let output = String::from_utf8(output).expect("Not UTF-8");
assert_eq!("Who goes there?", output);
assert_eq!("I'm George", answer);
}
fn main() {
let stdio = io::stdin();
let input = stdio.lock();
let output = io::stdout();
let answer = prompt(input, output, "Who goes there?");
println!("was: {}", answer);
}
In many cases, you'd want to actually propagate the error back up to the caller instead of using expect, as IO is a very common place for failures to occur.
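As a sketch of what that propagation might look like, the prompt function can return io::Result<String> and use ? in place of expect. The explicit flush is an addition of this sketch, on the assumption that the question should be visible before the program blocks on input.

```rust
use std::io::{self, BufRead, Write};

/// Writes the question, then reads one line of input.
/// Any I/O failure is returned to the caller instead of panicking.
fn prompt<R, W>(mut reader: R, mut writer: W, question: &str) -> io::Result<String>
where
    R: BufRead,
    W: Write,
{
    write!(writer, "{}", question)?;
    // Make sure the question is actually emitted before waiting for input.
    writer.flush()?;
    let mut s = String::new();
    // `read_line` keeps the trailing newline, if the input has one.
    reader.read_line(&mut s)?;
    Ok(s)
}
```

The in-memory test setup from above still applies: pass a byte slice as the reader and a Vec<u8> as the writer, then inspect the returned Result. Input that is not valid UTF-8 now surfaces as an Err value rather than a panic.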
This can be extended beyond functions into methods:
use std::io::{self, BufRead, Write};
struct Quizzer<R, W> {
reader: R,
writer: W,
}
impl<R, W> Quizzer<R, W>
where
R: BufRead,
W: Write,
{
fn prompt(&mut self, question: &str) -> String {
write!(&mut self.writer, "{}", question).expect("Unable to write");
let mut s = String::new();
self.reader.read_line(&mut s).expect("Unable to read");
s
}
}
#[test]
fn test_with_in_memory() {
let input = b"I'm George";
let mut output = Vec::new();
let answer = {
let mut quizzer = Quizzer {
reader: &input[..],
writer: &mut output,
};
quizzer.prompt("Who goes there?")
};
let output = String::from_utf8(output).expect("Not UTF-8");
assert_eq!("Who goes there?", output);
assert_eq!("I'm George", answer);
}
fn main() {
let stdio = io::stdin();
let input = stdio.lock();
let output = io::stdout();
let mut quizzer = Quizzer {
reader: input,
writer: output,
};
let answer = quizzer.prompt("Who goes there?");
println!("was: {}", answer);
}