How to apply a function to multiple columns of a polars DataFrame in Rust - dataframe

I'd like to apply a user-defined function that takes a few inputs (corresponding to some columns in a polars DataFrame) to some columns of a polars DataFrame in Rust. The pattern I'm using is below. Is this the best practice?
fn my_filter_func(col1: &Series, col2: &Series, col3: &Series) -> ReturnType {
    let n = col1.len();
    let it = (0..n).map(|i| {
        let col1 = match col1.get(i) {
            AnyValue::UInt64(val) => val,
            _ => panic!("Wrong type of col1!"),
        };
        // similar for col2 and col3
        // apply user-defined function to col1, col2 and col3
    });
    // convert it to a collection of the required type
}

You can downcast the Series to the proper type you want to iterate over, and then use Rust iterators to apply your logic.
fn my_black_box_function(a: f32, b: f32) -> f32 {
// do something
a
}
fn apply_multiples(col_a: &Series, col_b: &Series) -> Float32Chunked {
match (col_a.dtype(), col_b.dtype()) {
(DataType::Float32, DataType::Float32) => {
let a = col_a.f32().unwrap();
let b = col_b.f32().unwrap();
a.into_iter()
.zip(b.into_iter())
.map(|(opt_a, opt_b)| match (opt_a, opt_b) {
(Some(a), Some(b)) => Some(my_black_box_function(a, b)),
_ => None,
})
.collect()
}
_ => panic!("unexpected dtypes"),
}
}
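The null handling in the closure above can be tried in isolation, without polars. The following is only a sketch under stated assumptions: plain slices of Option<f32> stand in for Float32Chunked, and combine is a hypothetical placeholder for my_black_box_function.

```rust
/// Hypothetical stand-in for `my_black_box_function`.
fn combine(a: f32, b: f32) -> f32 {
    a + b
}

/// Applies `combine` element-wise. A missing value on either side yields a
/// missing value in the output, mirroring the match in the answer.
fn apply_pairwise(a: &[Option<f32>], b: &[Option<f32>]) -> Vec<Option<f32>> {
    a.iter()
        .zip(b.iter())
        .map(|(opt_a, opt_b)| match (opt_a, opt_b) {
            (Some(a), Some(b)) => Some(combine(*a, *b)),
            _ => None,
        })
        .collect()
}
```

Like the polars version, zip stops at the shorter input, so inputs of different lengths are silently truncated rather than reported.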
Lazy API
You don't have to leave the lazy API to be able to access my_black_box_function.
We can collect the columns we want to pass to the function into a Struct data type and then apply a closure over that Series.
fn apply_multiples(lf: LazyFrame) -> Result<DataFrame> {
    df![
        "a" => [1.0f32, 2.0, 3.0],
        "b" => [3.0f32, 5.1, 0.3]
    ]?
    .lazy()
    // as_struct packs both columns into a single Struct column; whether it
    // takes a slice or a Vec of expressions depends on the polars version.
    .select([as_struct(&[col("a"), col("b")]).map(
        |s| {
            let ca = s.struct_()?;
            let a = ca.field_by_name("a")?;
            let b = ca.field_by_name("b")?;
            let a = a.f32()?;
            let b = b.f32()?;
            let out: Float32Chunked = a
                .into_iter()
                .zip(b.into_iter())
                .map(|(opt_a, opt_b)| match (opt_a, opt_b) {
                    (Some(a), Some(b)) => Some(my_black_box_function(a, b)),
                    _ => None,
                })
                .collect();
            Ok(out.into_series())
        },
        GetOutput::from_type(DataType::Float32),
    )])
    .collect()
}

The solution that worked for me uses map_multiple (my understanding: use this when there is no groupby/agg) or apply_multiple (my understanding: use this whenever you have a groupby/agg). Alternatively, you could also use map_many or apply_many. See below.
use polars::prelude::*;
use polars::df;
fn main() {
let df = df! [
"names" => ["a", "b", "a"],
"values" => [1, 2, 3],
"values_nulls" => [Some(1), None, Some(3)],
"new_vals" => [Some(1.0), None, Some(3.0)]
].unwrap();
println!("{:?}", df);
//df.try_apply("values_nulls", |s: &Series| s.cast(&DataType::Float64)).unwrap();
let df = df.lazy()
.groupby([col("names")])
.agg( [
total_delta_sens().sum()
]
);
println!("{:?}", df.collect());
}
pub fn total_delta_sens () -> Expr {
let s: &mut [Expr] = &mut [col("values"), col("values_nulls"), col("new_vals")];
fn sum_fa(s: &mut [Series])->Result<Series>{
let mut ss = s[0].cast(&DataType::Float64).unwrap().fill_null(FillNullStrategy::Zero).unwrap().clone();
for i in 1..s.len(){
ss = ss.add_to(&s[i].cast(&DataType::Float64).unwrap().fill_null(FillNullStrategy::Zero).unwrap()).unwrap();
}
Ok(ss)
}
let o = GetOutput::from_type(DataType::Float64);
map_multiple(sum_fa, s, o)
}
Here total_delta_sens is just a wrapper function for convenience; you don't have to use it. You can do this directly within your .agg([]) or .with_columns([]):
lit::<f64>(0.0).map_many(sum_fa, &[col("norm"), col("uniform")], o)
Inside sum_fa you can, as Richie already mentioned, downcast to a ChunkedArray and use .iter() or even .par_iter().
Hope that helps

Related

Simplifying Rust matching with combinators

I have something like this:
match fnX(
fnY(x), // Returns Result<(), X>
) // Returns Result<(), Y>
.await
{
Ok(v) => {
if v.is_err() {
error!("error = {}", v);
}
}
Err(e) => error!("error = {}", e),
};
How can I write this with combinators so that I only have to error! once? I don't want to do anything with the Ok value, just print the error whether it comes from fnX or fnY.
I'm assuming that you meant to simplify something like this (removing the .await that is unrelated to the issue):
match fnX(x) { // Returns Result<X, EX>
Ok(y) => match fnY(y) { // Returns Result<Y, EY>
Ok(_) => println!("Success!"),
Err(e) => error!("error = {}", e),
},
Err(e) => error!("error = {}", e),
}
If the error types are the same, you can simplify the code with and_then:
match fnX(x).and_then(fnY) {
Ok(_) => println!("Success!"),
Err(e) => error!("error = {}", e),
}
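As a self-contained illustration of the and_then chain, here is a minimal sketch. The two steps (parse_step and check_step) and the String error type are hypothetical stand-ins for fnX and fnY; they are not from the question.

```rust
/// Hypothetical first step: parse a string into a number.
fn parse_step(input: &str) -> Result<i32, String> {
    input
        .trim()
        .parse::<i32>()
        .map_err(|e| format!("parse failed: {}", e))
}

/// Hypothetical second step: reject negative numbers.
fn check_step(n: i32) -> Result<i32, String> {
    if n >= 0 {
        Ok(n)
    } else {
        Err(format!("negative value: {}", n))
    }
}

/// Chains both steps. `check_step` only runs if `parse_step` succeeded, and an
/// error from either step ends up in the same `Err` value.
fn run(input: &str) -> Result<i32, String> {
    parse_step(input).and_then(check_step)
}
```

A caller that only cares about failures can then log once, for example with `if let Err(e) = run(x) { ... }`.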
If the error types are different, you can use map_err to convert them to a single type:
match fnX(x)
.map_err(MyError::from)
.and_then(|y| fnY(y).map_err(MyError::from))
{
Ok(_) => println!("Success!"),
Err(e) => error!("error = {}", e),
}
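For the case with different error types, the sketch below shows one way the MyError::from conversions could be set up. Everything in it is an assumption made for illustration: the variants of MyError, the NegativeError type, and the two step functions are hypothetical.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error type of the second step.
#[derive(Debug, PartialEq)]
struct NegativeError(i32);

/// Hypothetical unified error type, playing the role of `MyError`.
#[derive(Debug, PartialEq)]
enum MyError {
    Parse(String),
    Negative(i32),
}

impl From<ParseIntError> for MyError {
    fn from(e: ParseIntError) -> Self {
        MyError::Parse(e.to_string())
    }
}

impl From<NegativeError> for MyError {
    fn from(e: NegativeError) -> Self {
        MyError::Negative(e.0)
    }
}

impl fmt::Display for MyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MyError::Parse(msg) => write!(f, "parse error: {}", msg),
            MyError::Negative(n) => write!(f, "negative value: {}", n),
        }
    }
}

fn parse_number(input: &str) -> Result<i32, ParseIntError> {
    input.trim().parse::<i32>()
}

fn check_non_negative(n: i32) -> Result<i32, NegativeError> {
    if n >= 0 { Ok(n) } else { Err(NegativeError(n)) }
}

/// Same shape as the `map_err` chain: both error types become `MyError`.
fn run_mixed(input: &str) -> Result<i32, MyError> {
    parse_number(input)
        .map_err(MyError::from)
        .and_then(|n| check_non_negative(n).map_err(MyError::from))
}
```

With these From impls in place, the ? operator performs the same conversions automatically inside any function that returns Result<_, MyError>.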
The latter can be simplified using the latest development version of the map_for crate:
match map_for!(y <- fnX (x);
v <- fnY (y);
=> v)
{
Ok(_) => println!("Success"),
Err(e @ MyError { .. }) => error!("error = {}", e),
}
Note that the @ MyError { .. } annotation is only required if the compiler is unable to infer the error type automatically.
Full disclosure: I am the author of the map_for crate.
You don't need such a "combinator".
fnX accepts an argument of type Result<(), X> and returns a Result<(), Y>
When the code is convoluted it may help to separate the expressions, making it more readable.
let result = fnY(x);
match fnX(result).await {
Ok(v) => {
// here v is ok value, in this case ()
}
Err(e) => error!("error = {}", e),
};

Multiple props in Ramda lens

Is there a way to apply transforms to multiple keys of an object in Ramda? I am aware this is achievable by R.evolve, but I am interested in knowing if this can be achieved by some modification of lenses.
E.g.:
const data = {
a: "100",
b: "non_numeric_string",
c: "0.5"
}
const numerize = x => +x
const mapping = {
a: numerize,
c: numerize
}
magicFunction(mapping, data)
output:
{
a: 100,
b: "non_numeric_string",
c: 0.5
}
The whole point of a lens is to focus on one part of a data structure. While it is not hard to write something using lensProp to achieve this, I don't think it's either very satisfying or a particularly appropriate use of lenses. Here's one Ramda solution:
const magicFunction = (mapping, data) =>
reduce
( (o, [k, fn]) => over (lensProp(k), fn, o)
, data
, toPairs (mapping)
)
const numerize = x => Number (x)
const mapping = {
a: numerize,
c: numerize
}
const data = {a: "100", b: "non_numeric_string", c: "0.5"}
console .log (
magicFunction (mapping, data)
)
// Requires Ramda (0.26.1 in the original snippet), with
// const { lensProp, over, reduce, toPairs } = R
But note that a plain ES6 function does the job just as simply, without using lenses:
const magicFunction = (mapping, data) =>
Object.entries (mapping). reduce
( (o, [k, fn]) => ({...o, [k]: fn (o [k]) })
, data
)
Lenses simply don't gain you much here.

Remove unwrap from vec pop

This is a function (that works) that produces a vector of things but there is some ugly unwrapping in there as it involves popping another vector.
fn load_into_vec(file_buf: String) -> Vec<Data> {
let mut data_vec: Vec<Data> = Vec::new();
for line_iter in file_buf.lines() {
let mut line_vec: Vec<&str> = line_iter.split(' ').collect();
let (t1, t2, t3, t4): (u32, u32, u32, u32) =
(
/** ANOTHER WAY TO WRITE THIS? **/
line_vec.pop().unwrap().trim().parse::<u32>().ok().unwrap(),
line_vec.pop().unwrap().trim().parse::<u32>().ok().unwrap(),
line_vec.pop().unwrap().trim().parse::<u32>().ok().unwrap(),
line_vec.pop().unwrap().trim().parse::<u32>().ok().unwrap()
);
let mut data_node = Data::new();
data_node.load((t4, t3, t2, t1));
data_vec.push(data_node);
}
data_vec
}
Is there another way to rewrite the above block without using unwrap, or to use unwrap in a way that will not panic if it encounters None?
You can use iterators and match to write a more idiomatic version of your loop body. Instead of collecting into an intermediate Vec, you match on calling iter.next() four times, to extract your four integers. In case one of the iter.next() calls doesn't succeed, you can panic.
let mut iter = line_iter.split(' ')
.map(str::trim)
.map(str::parse::<u32>)
.map(|s| s.expect("could not parse as u32"))
.fuse();
let tup = match (iter.next(), iter.next(), iter.next(), iter.next()) {
(Some(t1), Some(t2), Some(t3), Some(t4)) => (t1, t2, t3, t4),
_ => panic!("line did not contain at least four numbers"),
};
let mut data_node = Data::new();
data_node.load(tup);
I would even go as far as to rewrite the entire function:
file_buf.lines()
.map(|line_iter| {
let mut iter = line_iter.split(' ')
.map(str::trim)
.map(str::parse::<u32>)
.map(|s| s.expect("could not parse as u32"))
.fuse();
let tup = match (iter.next(), iter.next(), iter.next(), iter.next()) {
(Some(t1), Some(t2), Some(t3), Some(t4)) => (t1, t2, t3, t4),
_ => panic!("line did not contain at least four numbers"),
};
let mut data_node = Data::new();
data_node.load(tup);
data_node
})
.collect()
Even better would be to have the function return a Result that denotes when an error has occurred:
enum MyError {
NotAnInt,
TooFewNumbers,
TooManyNumbers,
}
fn load_into_vec2(file_buf: String) -> Result<Vec<Data>, MyError> {
file_buf.lines()
.map(|line_iter| {
let mut iter = line_iter.split(' ')
.map(str::trim)
.map(str::parse::<u32>)
.fuse();
match (iter.next(), iter.next(), iter.next(), iter.next()) {
(Some(Ok(t1)), Some(Ok(t2)), Some(Ok(t3)), Some(Ok(t4))) => if iter.next().is_some() {
Err(MyError::TooManyNumbers)
} else {
let mut data_node = Data::new();
data_node.load((t1, t2, t3, t4));
Ok(data_node)
},
(None, _, _, _) |
(_, None, _, _) |
(_, _, None, _) |
(_, _, _, None) => Err(MyError::TooFewNumbers),
(Some(Err(_)), _, _, _) |
(_, Some(Err(_)), _, _) |
(_, _, Some(Err(_)), _) |
(_, _, _, Some(Err(_))) => Err(MyError::NotAnInt),
}
})
.collect()
}
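A more compact variant of the same idea uses the ? operator instead of the exhaustive match. This is only a sketch: Data here is a hypothetical stand-in for the question's type, LineError mirrors MyError, and the line is split with split_whitespace rather than split(' '), so repeated spaces are tolerated. It also reports whichever problem it meets first, which is a slightly different precedence from the match above.

```rust
/// Hypothetical stand-in for the question's `Data` type.
#[derive(Debug, PartialEq)]
struct Data {
    fields: (u32, u32, u32, u32),
}

#[derive(Debug, PartialEq)]
enum LineError {
    NotAnInt,
    TooFewNumbers,
    TooManyNumbers,
}

/// Parses one whitespace-separated line into a `Data`.
fn parse_line(line: &str) -> Result<Data, LineError> {
    let mut parts = line.split_whitespace();
    // Pulls the next field and parses it, mapping both failure modes
    // (missing field, non-numeric field) to a LineError.
    let mut next = || -> Result<u32, LineError> {
        parts
            .next()
            .ok_or(LineError::TooFewNumbers)?
            .parse::<u32>()
            .map_err(|_| LineError::NotAnInt)
    };
    let fields = (next()?, next()?, next()?, next()?);
    if parts.next().is_some() {
        return Err(LineError::TooManyNumbers);
    }
    Ok(Data { fields })
}

/// Parses every line, stopping at the first error.
fn load_into_vec3(file_buf: &str) -> Result<Vec<Data>, LineError> {
    file_buf.lines().map(parse_line).collect()
}
```

Collecting an iterator of Result values into Result<Vec<_>, _> short-circuits on the first Err, so no unwrap is needed anywhere.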

Can Rust consume an iterator passed into a function?

I'm trying to implement a simple REPL calculator in Rust and I'm hitting brick walls all over the place.
I'm consuming chars while iterating over a hardcoded string. When I hit a numeric character I want to pass control over to a function that will consume the rest of the number (assuming the number has more than one digit) and return the number, converted to an Integer.
I'm having trouble with passing Chars iterator to a function. The error I'm getting is use of moved value: 'iter'.
I understand that I can't mutate something that I gave to someone else - something that had its ownership moved - but I don't know any other way of doing this, especially since the Chars iterator is non-copyable.
#[derive(Clone, Debug)]
enum Token {
Addition,
Substraction,
Multiplication,
Division,
Integer(i32),
Error,
}
fn consume_number(mut iter: std::str::Chars) -> Option<i32> {
while let Some(item) = iter.next() {
println!("{:?}", item);
}
return Some(1337);
}
fn tokenize(line: &str) -> Vec<Token> {
let mut iter = line.chars();
let mut tokens = Vec::new();
let mut token;
while let Some(c) = iter.next() {
if c.is_whitespace() { continue };
if c.is_digit(10) {
token = match consume_number(iter) {
Some(i32) => Token::Integer(i32),
None => Token::Error,
};
} else {
token = match c {
'+' => Token::Addition,
'-' => Token::Substraction,
'*' => Token::Multiplication,
'/' => Token::Division,
_ => Token::Error,
};
};
tokens.push(token);
}
return tokens;
}
fn main() {
let line = "631 * 32 + 212 - 15 / 89";
println!("{:?}", tokenize(&line));
}
The answer is yes, it's done in the FromIterator trait.
What you experience here is much more basic:
fn consume_number(mut iter: std::str::Chars) -> Option<i32> { ... }
while let Some(c) = iter.next() {
    ...
    consume_number(iter)
    ...
}
When calling consume_number you are transferring ownership of the iterator to it. This means that on the next iteration of the loop, the iter variable is no longer available.
If the iterator is meant to still be usable afterward, you should pass a mutable reference to it:
fn consume_number(iter: &mut std::str::Chars) -> Option<i32> { ... }
while let Some(c) = iter.next() {
    ...
    consume_number(&mut iter)
    ...
}
You were close!
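One detail to watch once the reference is passed: by the time consume_number is called, the first digit has already been taken by iter.next(), and a plain while let inside consume_number would also swallow the character that follows the number. A possible way around this, sketched below as an assumption rather than as part of the answer, is to wrap the iterator in Peekable and only peek before deciding. The Token enum is restated here (with the spelling Subtraction) so the block is self-contained.

```rust
use std::iter::Peekable;
use std::str::Chars;

#[derive(Debug, PartialEq)]
enum Token {
    Addition,
    Subtraction,
    Multiplication,
    Division,
    Integer(i32),
    Error,
}

/// Consumes consecutive digits from the shared iterator and returns the number.
/// Returns None when no digit is present or the value does not fit in an i32.
fn consume_number(iter: &mut Peekable<Chars<'_>>) -> Option<i32> {
    let mut digits = String::new();
    // Peek first so the character after the number is left for the caller.
    while let Some(&c) = iter.peek() {
        if !c.is_ascii_digit() {
            break;
        }
        digits.push(c);
        iter.next();
    }
    digits.parse::<i32>().ok()
}

fn tokenize(line: &str) -> Vec<Token> {
    let mut iter = line.chars().peekable();
    let mut tokens = Vec::new();
    while let Some(&c) = iter.peek() {
        if c.is_whitespace() {
            iter.next();
            continue;
        }
        let token = if c.is_ascii_digit() {
            // The digit has only been peeked, so consume_number sees it too.
            consume_number(&mut iter).map_or(Token::Error, Token::Integer)
        } else {
            iter.next();
            match c {
                '+' => Token::Addition,
                '-' => Token::Subtraction,
                '*' => Token::Multiplication,
                '/' => Token::Division,
                _ => Token::Error,
            }
        };
        tokens.push(token);
    }
    tokens
}
```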

How do I pass an iterator I am iterating on to a function?

I'm iterating through an array, and depending on the current value, I'd like to pass the iterator to a sub function and have it deal with a number of values, and upon exiting the sub function, carry on iterating through the array. Below is the closest I've managed to get so far, but I keep getting error: use of moved value: 'iter'.
I've tried looking into lifetimes, but that hasn't worked for me either. I've spent most of a day on this now, and can't seem to get anywhere with it. Any help would be greatly appreciated. Thanks.
enum Thing {
Three(char, char, char),
Four(char, char, char, char),
}
fn take_three <'a>(iter: &mut std::slice::Iter<'a, char>) -> Thing {
let a = iter.next().unwrap();
let b = iter.next().unwrap();
let c = iter.next().unwrap();
Thing::Three(*a,*b,*c)
}
fn take_four <'a>(iter: &mut std::slice::Iter<'a, char>) -> Thing {
let a = iter.next().unwrap();
let b = iter.next().unwrap();
let c = iter.next().unwrap();
let d = iter.next().unwrap();
Thing::Four(*a,*b,*c,*d)
}
fn parse_tokens (tokens: &Vec<char>) {
let mut iter = tokens.iter();
let mut things: Vec<Thing> = vec![];
for token in iter {
match token {
&'a' => things.push(take_three(&mut iter)),
&'b' => things.push(take_four(&mut iter)),
_ => {},
}
}
}
fn main() {
let tokens = vec!['a', '1', '2', '3', 'b', '1', '2', '3', '4', 'a', '4', '5', '6'];
parse_tokens(&tokens);
}
The for construct consumes the iterator, and doing what you want using it will be quite tricky (if not impossible, I'm really not sure about that).
However, you can have it working pretty easily by switching to a while let construct, like this:
fn parse_tokens (tokens: &Vec<char>) {
let mut iter = tokens.iter();
let mut things: Vec<Thing> = vec![];
while let Some(token) = iter.next() {
match token {
&'a' => things.push(take_three(&mut iter)),
&'b' => things.push(take_four(&mut iter)),
_ => {},
}
}
}
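The same while let pattern can be written without the unwrap calls. The sketch below is one possible variant, not the answer's code: the helpers accept any iterator over &char and return Option<Thing>, so input that ends early is skipped instead of panicking. That skipping behaviour is an assumption made for this example.

```rust
#[derive(Debug, PartialEq)]
enum Thing {
    Three(char, char, char),
    Four(char, char, char, char),
}

/// Takes the next three items, or returns None if the input ends early.
fn take_three<'a>(iter: &mut impl Iterator<Item = &'a char>) -> Option<Thing> {
    Some(Thing::Three(*iter.next()?, *iter.next()?, *iter.next()?))
}

/// Takes the next four items, or returns None if the input ends early.
fn take_four<'a>(iter: &mut impl Iterator<Item = &'a char>) -> Option<Thing> {
    Some(Thing::Four(
        *iter.next()?,
        *iter.next()?,
        *iter.next()?,
        *iter.next()?,
    ))
}

fn parse_tokens(tokens: &[char]) -> Vec<Thing> {
    let mut iter = tokens.iter();
    let mut things = Vec::new();
    // `while let` borrows `iter` only for the duration of each `next()` call,
    // so the helpers can borrow it mutably in between. A `for` loop would
    // take ownership of `iter` for the whole loop.
    while let Some(token) = iter.next() {
        let thing = match *token {
            'a' => take_three(&mut iter),
            'b' => take_four(&mut iter),
            _ => None,
        };
        if let Some(thing) = thing {
            things.push(thing);
        }
    }
    things
}
```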