How to apply a function to elements of a Polars Series in Rust? - dataframe

I am trying to convert some code using an ndarray to a polars DataFrame.
I have a function that samples from multiple distributions. I create an ndarray with number_of_distributions x number_of_samples shape. Then I iterate over each column, and for each column I look up the corresponding mean and standard deviation from the arguments provided and sample n times from that distribution, updating the ndarray.
fn multi_rnorm(n: usize, means: Vec<f64>, sds: Vec<f64>) -> Array2<f64> {
let mut preds: Array2<f64> = Array2::zeros((means.len(), n));
preds.axis_iter_mut(ndarray::Axis(0)).into_par_iter().enumerate().for_each(|(i, mut row)| {
let mut rng = rand::thread_rng();
(0..n).into_iter().for_each(|j| {
let normal = Normal::new(means[i], sds[i]).unwrap();
row[j as usize] = normal.sample(&mut rng);
})
});
preds
}
I am trying to do it with a DataFrame, but cannot figure out how to change the values of the Series/columns. Series cannot be indexed, and I cannot find any setter() methods. The code below does not compile as series cannot be indexed.
fn multi_rnorm(n: usize, means: Vec<f64>, sds: Vec<f64>) -> Result<DataFrame, PolarsError> {
let mut df: DataFrame = DataFrame::new(c![Series::new(&f!("s{i}"), vec![0.0; 99_128]), for i in 0..n])?;
df.iter().enumerate().for_each(|(i, mut col)| {
let mut rng = rand::thread_rng();
col.iter().map(|j| {
let normal = Normal::new(means[i], sds[i]).unwrap();
col[j as usize] = normal.sample(&mut rng);
});
});
Ok(df)
}
Does anybody know how to mutate a specific value of a Polars Series in Rust?

Related

Implement a pairwise iterator

I have trouble writing code for a function that takes an iterator and returns an iterator that iterates in pairs (Option<T>, T) like so
a = [1,2,3]
assert pairwise(a) == `[(None, 1), (Some(1), 2), (Some(2), 3)]
fn pairwise<I, T>(&xs: &I) -> I
where
I: Iterator<Item = T>,
{
[None].iter().chain(xs.iter().map(Some)).zip(xs.iter())
}
fn main() {
let data: Vec<i32> = vec![1, 2, 3];
let newdata: Vec<Option<i32>, i32> = pairwise(&data).collect();
println!("{:?}", newdata);
}
error[E0599]: no method named `iter` found for type `I` in the current scope
--> src/main.rs:3:28
|
3 | [None].iter().chain(xs.iter().map(Some)).zip(xs.iter())
| ^^^^
|
Not sure why xs isn't iterable. I've stated it in the where clause haven't I?
fn pairwise<I, T>(&xs: &I) -> I
This doesn't make sense. See What is the correct way to return an Iterator (or any other trait)? and What is the difference between `e1` and `&e2` when used as the for-loop variable?.
I: Iterator<Item = T>,
There's no reason to specify that the Item is a T.
[None].iter()
It's better to use iter::once.
xs.iter()
There's no trait in the standard library that defines an iter method. Perhaps you meant IntoIterator?
let data: Vec<i32> = vec![1, 2, 3]
There's no reason to specify the type here; i32 is the default integral type.
Vec<Option<i32>, i32>
Vec<Option<i32>, i32>> // original version
This is not a valid type for Vec, and your original form doesn't even have balanced symbols.
After all that, you are faced with tough choices. Your example code passes in an iterator which has references to the slice but you've written your assertion such that you expect to get non-references back. You've also attempted to use an arbitrary iterator twice; there's no guarantee that such a thing is viable.
The most generic form I see is:
use std::iter;
fn pairwise<I>(right: I) -> impl Iterator<Item = (Option<I::Item>, I::Item)>
where
I: IntoIterator + Clone,
{
let left = iter::once(None).chain(right.clone().into_iter().map(Some));
left.zip(right)
}
fn main() {
let data = vec![1, 2, 3];
let newdata: Vec<_> = pairwise(&data).collect();
assert_eq!(newdata, [(None, &1), (Some(&1), &2), (Some(&2), &3)]);
let newdata: Vec<_> = pairwise(data.iter().copied()).collect();
assert_eq!(newdata, [(None, 1), (Some(1), 2), (Some(2), 3)]);
}
See also:
Iterating over a slice's values instead of references in Rust?
How to iterate over and filter an array?
How to create a non consuming iterator from a Vector
Why can I iterate over a slice twice, but not a vector?
The compiler suggests I add a 'static lifetime because the parameter type may not live long enough, but I don't think that's what I want
What is the correct way to return an Iterator (or any other trait)?
I know OP asked for "outer pairwise" ([(None, 1), (Some(1), 2), (Some(2), 3)]), but here is how I adapted it for "inner pairwise" ([(1, 2), (2, 3)]):
fn inner_pairwise<I>(right: I) -> impl Iterator<Item = (I::Item, I::Item)>
where
I: IntoIterator + Clone,
{
let left = right.clone().into_iter().skip(1);
left.zip(right)
}
For anyone here for "inner pairwise", you're looking for Itertools::tuple_windows.

Why does .flat_map() with .chars() not work with std::io::Lines, but does with a vector of Strings?

I am trying to iterate over characters in stdin. The Read.chars() method achieves this goal, but is unstable. The obvious alternative is to use Read.lines() with a flat_map to convert it to a character iterator.
This seems like it should work, but doesn't, resulting in borrowed value does not live long enough errors.
use std::io::BufRead;
fn main() {
let stdin = std::io::stdin();
let mut lines = stdin.lock().lines();
let mut chars = lines.flat_map(|x| x.unwrap().chars());
}
This is mentioned in Read file character-by-character in Rust, but it does't really explain why.
What I am particularly confused about is how this differs from the example in the documentation for flat_map, which uses flat_map to apply .chars() to a vector of strings. I don't really see how that should be any different. The main difference I see is that my code needs to call unwrap() as well, but changing the last line to the following does not work either:
let mut chars = lines.map(|x| x.unwrap());
let mut chars = chars.flat_map(|x| x.chars());
It fails on the second line, so the issue doesn't appear to be the unwrap.
Why does this last line not work, when the very similar line in the documentation doesn't? Is there any way to get this to work?
Start by figuring out what the type of the closure's variable is:
let mut chars = lines.flat_map(|x| {
let () = x;
x.unwrap().chars()
});
This shows it's a Result<String, io::Error>. After unwrapping it, it will be a String.
Next, look at str::chars:
fn chars(&self) -> Chars
And the definition of Chars:
pub struct Chars<'a> {
// some fields omitted
}
From that, we can tell that calling chars on a string returns an iterator that has a reference to the string.
Whenever we have a reference, we know that the reference cannot outlive the thing that it is borrowed from. In this case, x.unwrap() is the owner. The next thing to check is where that ownership ends. In this case, the closure owns the String, so at the end of the closure, the value is dropped and any references are invalidated.
Except the code tried to return a Chars that still referred to the string. Oops. Thanks to Rust, the code didn't segfault!
The difference with the example that works is all in the ownership. In that case, the strings are owned by a vector outside of the loop and they do not get dropped before the iterator is consumed. Thus there are no lifetime issues.
What this code really wants is an into_chars method on String. That iterator could take ownership of the value and return characters.
Not the maximum efficiency, but a good start:
struct IntoChars {
s: String,
offset: usize,
}
impl IntoChars {
fn new(s: String) -> Self {
IntoChars { s: s, offset: 0 }
}
}
impl Iterator for IntoChars {
type Item = char;
fn next(&mut self) -> Option<Self::Item> {
let remaining = &self.s[self.offset..];
match remaining.chars().next() {
Some(c) => {
self.offset += c.len_utf8();
Some(c)
}
None => None,
}
}
}
use std::io::BufRead;
fn main() {
let stdin = std::io::stdin();
let lines = stdin.lock().lines();
let chars = lines.flat_map(|x| IntoChars::new(x.unwrap()));
for c in chars {
println!("{}", c);
}
}
See also:
How can I store a Chars iterator in the same struct as the String it is iterating on?
Is there an owned version of String::chars?

How to extract values from &mut iterator?

I am trying to make an iterator that maps a string to an integer:
fn main() {
use std::collections::HashMap;
let mut word_map = HashMap::new();
word_map.insert("world!", 0u32);
let sentence: Vec<&str> = vec!["Hello", "world!"];
let int_sentence: Vec<u32> = sentence.into_iter()
.map(|x| word_map.entry(x).or_insert(word_map.len() as u32))
.collect();
}
(Rust playground)
This fails with
the trait core::iter::FromIterator<&mut u32> is not implemented for the type collections::vec::Vec<u32>
Adding a dereference operator around the word_map.entry().or_insert() expression does not work as it complains about borrowing which is surprising to me as I'm just trying to copy the value.
The borrow checker uses lexical lifetime rules, so you can't have conflicting borrows in a single expression. The solution is to extract getting the length into a separate let statement:
let int_sentence: Vec<u32> = sentence.into_iter()
.map(|x| *({let len = word_map.len() as u32;
word_map.entry(x).or_insert(len)}))
.collect();
Such issues will hopefully go away when Rust supports non-lexical lifetimes.

Chaining iterators of different types

I get type errors when chaining different types of Iterator.
let s = Some(10);
let v = (1..5).chain(s.iter())
.collect::<Vec<_>>();
Output:
<anon>:23:20: 23:35 error: type mismatch resolving `<core::option::Iter<'_, _> as core::iter::IntoIterator>::Item == _`:
expected &-ptr,
found integral variable [E0271]
<anon>:23 let v = (1..5).chain(s.iter())
^~~~~~~~~~~~~~~
<anon>:23:20: 23:35 help: see the detailed explanation for E0271
<anon>:24:14: 24:33 error: no method named `collect` found for type `core::iter::Chain<core::ops::Range<_>, core::option::Iter<'_, _>>` in the current scope
<anon>:24 .collect::<Vec<_>>();
^~~~~~~~~~~~~~~~~~~
<anon>:24:14: 24:33 note: the method `collect` exists but the following trait bounds were not satisfied: `core::iter::Chain<core::ops::Range<_>, core::option::Iter<'_, _>> : core::iter::Iterator`
error: aborting due to 2 previous errors
But it works fine when zipping:
let s = Some(10);
let v = (1..5).zip(s.iter())
.collect::<Vec<_>>();
Output:
[(1, 10)]
Why is Rust able to infer the correct types for zip but not for chain and how can I fix it? n.b. I want to be able to do this for any iterator, so I don't want a solution that just works for Range and Option.
First, note that the iterators yield different types. I've added an explicit u8 to the numbers to make the types more obvious:
fn main() {
let s = Some(10u8);
let r = (1..5u8);
let () = s.iter().next(); // Option<&u8>
let () = r.next(); // Option<u8>
}
When you chain two iterators, both iterators must yield the same type. This makes sense as the iterator cannot "switch" what type it outputs when it gets to the end of one and begins on the second:
fn chain<U>(self, other: U) -> Chain<Self, U::IntoIter>
where U: IntoIterator<Item=Self::Item>
// ^~~~~~~~~~~~~~~ This means the types must match
So why does zip work? Because it doesn't have that restriction:
fn zip<U>(self, other: U) -> Zip<Self, U::IntoIter>
where U: IntoIterator
// ^~~~ Nothing here!
This is because zip returns a tuple with one value from each iterator; a new type, distinct from either source iterator's type. One iterator could be an integral type and the other could return your own custom type for all zip cares.
Why is Rust able to infer the correct types for zip but not for chain
There is no type inference happening here; that's a different thing. This is just plain-old type mismatching.
and how can I fix it?
In this case, your inner iterator yields a reference to an integer, a Clone-able type, so you can use cloned to make a new iterator that clones each value and then both iterators would have the same type:
fn main() {
let s = Some(10);
let v: Vec<_> = (1..5).chain(s.iter().cloned()).collect();
}
If you are done with the option, you can also use a consuming iterator with into_iter:
fn main() {
let s = Some(10);
let v: Vec<_> = (1..5).chain(s.into_iter()).collect();
}

How to compose mutable Iterators?

Editor's note: This code example is from a version of Rust prior to 1.0 and is not syntactically valid Rust 1.0 code. Updated versions of this code produce different errors, but the answers still contain valuable information.
I would like to make an iterator that generates a stream of prime numbers. My general thought process was to wrap an iterator with successive filters so for example you start with
let mut n = (2..N)
Then for each prime number you mutate the iterator and add on a filter
let p1 = n.next()
n = n.filter(|&x| x%p1 !=0)
let p2 = n.next()
n = n.filter(|&x| x%p2 !=0)
I am trying to use the following code, but I can not seem to get it to work
struct Primes {
base: Iterator<Item = u64>,
}
impl<'a> Iterator for Primes<'a> {
type Item = u64;
fn next(&mut self) -> Option<u64> {
let p = self.base.next();
match p {
Some(n) => {
let prime = n.clone();
let step = self.base.filter(move |&: &x| {x%prime!=0});
self.base = &step as &Iterator<Item = u64>;
Some(n)
},
_ => None
}
}
}
I have toyed with variations of this, but I can't seem to get lifetimes and types to match up. Right now the compiler is telling me
I can't mutate self.base
the variable prime doesn't live long enough
Here is the error I am getting
solution.rs:16:17: 16:26 error: cannot borrow immutable borrowed content `*self.base` as mutable
solution.rs:16 let p = self.base.next();
^~~~~~~~~
solution.rs:20:28: 20:37 error: cannot borrow immutable borrowed content `*self.base` as mutable
solution.rs:20 let step = self.base.filter(move |&: &x| {x%prime!=0});
^~~~~~~~~
solution.rs:21:30: 21:34 error: `step` does not live long enough
solution.rs:21 self.base = &step as &Iterator<Item = u64>;
^~~~
solution.rs:15:39: 26:6 note: reference must be valid for the lifetime 'a as defined on the block at 15:38...
solution.rs:15 fn next(&mut self) -> Option<u64> {
solution.rs:16 let p = self.base.next();
solution.rs:17 match p {
solution.rs:18 Some(n) => {
solution.rs:19 let prime = n.clone();
solution.rs:20 let step = self.base.filter(move |&: &x| {x%prime!=0});
...
solution.rs:20:71: 23:14 note: ...but borrowed value is only valid for the block suffix following statement 1 at 20:70
solution.rs:20 let step = self.base.filter(move |&: &x| {x%prime!=0});
solution.rs:21 self.base = &step as &Iterator<Item = u64>;
solution.rs:22 Some(n)
solution.rs:23 },
error: aborting due to 3 previous errors
Why won't Rust let me do this?
Here is a working version:
struct Primes<'a> {
base: Option<Box<Iterator<Item = u64> + 'a>>,
}
impl<'a> Iterator for Primes<'a> {
type Item = u64;
fn next(&mut self) -> Option<u64> {
let p = self.base.as_mut().unwrap().next();
p.map(|n| {
let base = self.base.take();
let step = base.unwrap().filter(move |x| x % n != 0);
self.base = Some(Box::new(step));
n
})
}
}
impl<'a> Primes<'a> {
#[inline]
pub fn new<I: Iterator<Item = u64> + 'a>(r: I) -> Primes<'a> {
Primes {
base: Some(Box::new(r)),
}
}
}
fn main() {
for p in Primes::new(2..).take(32) {
print!("{} ", p);
}
println!("");
}
I'm using a Box<Iterator> trait object. Boxing is unavoidable because the internal iterator must be stored somewhere between next() calls, and there is nowhere you can store reference trait objects.
I made the internal iterator an Option. This is necessary because you need to replace it with a value which consumes it, so it is possible that the internal iterator may be "absent" from the structure for a short time. Rust models absence with Option. Option::take replaces the value it is called on with None and returns whatever was there. This is useful when shuffling non-copyable objects around.
Note, however, that this sieve implementation is going to be both memory and computationally inefficient - for each prime you're creating an additional layer of iterators which takes heap space. Also the depth of stack when calling next() grows linearly with the number of primes, so you will get a stack overflow on a sufficiently large number:
fn main() {
println!("{}", Primes::new(2..).nth(10000).unwrap());
}
Running it:
% ./test1
thread '<main>' has overflowed its stack
zsh: illegal hardware instruction (core dumped) ./test1