Return a moving window of elements resulting from an iterator of Vec<u8>

I'm trying to figure out how to return a window of elements from a vector that I've first filtered without copying it to a new vector.
This naive approach works fine, but I think it will end up allocating a new vector at the Vec::from_iter call, which I don't really want to do:
use std::io::Read; // brings read_to_end into scope

let mut buf = Vec::new();
file.read_to_end(&mut buf).expect("failed to read file");
// Drop LF (10) and CR (13) bytes; this creates a new vector for subsequent processing
let iter = buf.iter().filter(|&x| *x != 10 && *x != 13);
let clean_buf = Vec::from_iter(iter);
for window in clean_buf.windows(13) {
    print!("{}", window.len());
}
An alternative approach, where I could perhaps use chain() to achieve the same thing without copying into a new Vec:
for iter in buf.iter().filter(|&x| *x != 10 && *x != 13) {
    let window = ???
}

You can use Vec::retain instead of filter for this; it filters the Vec in place, so you still have a Vec you can call windows on:
fn main() {
    let mut buf = vec![
        8, 9, 10, 11, 12, 13, 14,
        8, 9, 10, 11, 12, 13, 14,
        8, 9, 10, 11, 12, 13, 14,
    ];
    println!("{:?}", buf);
    // Remove LF (10) and CR (13) bytes in place, without a second Vec
    buf.retain(|&x| x != 10 && x != 13);
    println!("{:?}", buf);
    for window in buf.windows(13) {
        print!("{}, ", window.len());
    }
    println!();
}

I don't see how this would be possible. You say:
elements from a vector that I've first filtered
But once you've filtered a vector, you don't have a vector anymore - you just have an Iterator. Iterators only have the concept of the next item.
To be most efficient, you'd have to create a small buffer the size of your window. Unfortunately, you cannot write an Iterator whose items borrow from the iterator itself, so you'd have to pass a buffer in to a hypothetical Iterator::windows method. In that case, you'd run into the problem of holding a mutable reference (so you could populate the buffer) and an immutable reference (so you could return a slice) at the same time, which won't fly.
The only close solution I can think of is to have multiple iterators over the same vector that you then zip together:
fn main() {
    let nums: Vec<u8> = (1..100).collect();
    fn is_even(x: &&u8) -> bool { **x % 2 == 0 }
    let a = nums.iter().filter(is_even);
    let b = nums.iter().filter(is_even).skip(1);
    let c = nums.iter().filter(is_even).skip(2);
    for z in a.zip(b).zip(c).map(|((a, b), c)| (a, b, c)) {
        println!("{:?}", z);
    }
}
This has the distinct downside of applying the filtering condition multiple times, plus the ugliness of the nested zips (you can fix the latter with itertools' izip! macro, though).
Personally, I'd probably just collect into a Vec, as you have already done.
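If the second allocation really has to go, one way around the borrowing problem described above is to invert control: instead of returning windows from an iterator, call a closure with each window while it is borrowed from a small internal buffer. This is a minimal sketch, not a std or itertools API; for_each_window is a hypothetical helper name, and the O(window) shift per item assumes small windows such as 13:

```rust
/// Hypothetical helper (not in std): calls `f` with every full window of
/// `size` consecutive items from `iter`, buffering at most `size` items
/// instead of collecting the whole (filtered) input first.
fn for_each_window<I, F>(iter: I, size: usize, mut f: F)
where
    I: IntoIterator,
    F: FnMut(&[I::Item]),
{
    assert!(size > 0, "window size must be non-zero");
    let mut buf: Vec<I::Item> = Vec::with_capacity(size);
    for item in iter {
        if buf.len() == size {
            // Drop the oldest item; an O(size) shift, acceptable for small windows
            buf.remove(0);
        }
        buf.push(item);
        if buf.len() == size {
            // The window is only borrowed for the duration of this call
            f(buf.as_slice());
        }
    }
}
```

With the question's data it could be called as for_each_window(buf.iter().copied().filter(|&x| x != 10 && x != 13), 13, |w| print!("{}", w.len())). The trade-off is that each window only lives for the duration of the closure call, so it can't be fed into further iterator adaptors.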

Related

How do I chain operators over lists in rust? Looking for equivalent to kotlin code

I have the following code in Kotlin and I'm trying to find a Rust equivalent, but I don't understand Rust's chaining mechanism well enough to convert it.
val windowSize = 2
val result = listOf(1, 2, 3, 4, 5, 6)
    .windowed(windowSize, 1) // [[1,2], [2,3], [3,4], [4,5], [5,6]]
    .map { it.sum() }        // [3, 5, 7, 9, 11]
    .windowed(2, 1)          // [[3,5], [5,7], [7,9], [9,11]]
    .count { it[0] < it[1] } // 4
// result = 4, as there are 4 windows whose first number is less than the second,
// when considering a sliding window over the original data of 2 items at a time.
It takes a list of integers, slides a window over them (pairs here, though windowSize will be a function parameter), sums each window, slides a window of 2 over the sums, and counts where each second element is bigger than the first, so it finds increasing values over moving windows.
I'm converting this to the Rust equivalent, but I'm struggling to understand how to chain operations together.
What I've got so far is:
let input = [1, 2, 3, 4, 5, 6];
input.windows(2)
    .map(|es| es.iter().sum())
    // what goes here to do the next windows(2) operation?
    .for_each(|x: u32| println!("{}", x));
I can for_each over the output of map to do things on each iteration, but I can't split it with another windows(), or I don't know the magic that makes that possible. IntelliJ shows the return type of map as impl Iterator<Item=?>.
Can anyone enlighten me, please? I am an absolute beginner in Rust, so this is undoubtedly down to my understanding of the language as a whole.
The Itertools crate provides a reasonably convenient way to do this with the tuple_windows method.
use itertools::Itertools;

fn main() {
    let input = [1i32, 2, 3, 4, 5, 6];
    let output: usize = input
        .windows(2)
        .map(|es| es.iter().sum::<i32>())
        .tuple_windows()
        .filter(|(a, b)| a < b)
        .count();
    println!("{}", output);
}
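The standard library has no equivalent of tuple_windows for arbitrary iterators (windows is a slice method), so with std alone you either collect the iterator first, which requires two passes through the data, or track the previous element yourself.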
It is a bit convoluted to chain everything. You need to collect into a Vec so you can call windows again (it is a slice method, not an Iterator method). Then you can flat_map the windows into fixed-size arrays (taken from this other answer) to complete what you want to do:
fn main() {
    let input = [1usize, 2, 3, 4, 5, 6];
    let res = input
        .windows(2)
        .map(|es| es.iter().sum::<usize>())
        .collect::<Vec<_>>()
        .windows(2)
        .flat_map(<[usize; 2]>::try_from)
        .filter(|[a, b]| a < b)
        .count();
    println!("{}", res);
}
Note: the nightly-only array_windows feature, which uses const generics, lets you remove the .flat_map(<[usize; 2]>::try_from) call.
As stated in @Aiden4's answer, the best solution is to use Itertools::tuple_windows. It is, however, possible with just the standard library, and without collecting into an intermediate vector, using Iterator::scan:
fn main() {
    let input = [1i32, 2, 3, 4, 5, 6];
    let output: usize = input
        .windows(2)
        .map(|es| es.iter().sum())
        .scan(0, |prev, cur| {
            let res = (*prev, cur);
            *prev = cur;
            Some(res)
        })
        .skip(1)
        .filter(|(a, b)| a < b)
        .count();
    println!("{}", output);
}
Using std and stable only:
fn main() {
    let input = [1i32, 2, 3, 4, 5, 6];
    let mut iter = input.windows(2).map(|es| es.iter().sum::<i32>());
    let n = if let Some(mut prev) = iter.next() {
        iter.map(|i| {
            let ret = (prev, i);
            prev = i;
            ret
        })
        .filter(|(a, b)| a < b)
        .count()
    } else {
        0
    };
    println!("{}", n);
}
This makes a single pass and allocates nothing, so it should be very fast.
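The question notes that windowSize will be a function parameter. Here is a sketch of the same previous-value idea wrapped in a function; count_increasing_window_sums is a hypothetical name, and i32 input is assumed. Like slice::windows itself, it panics if window_size is 0:

```rust
/// Hypothetical helper: counts how often the sum of a `window_size`-wide
/// window is larger than the sum of the window just before it.
/// Single pass over the window sums, no intermediate Vec.
fn count_increasing_window_sums(input: &[i32], window_size: usize) -> usize {
    let mut sums = input.windows(window_size).map(|w| w.iter().sum::<i32>());
    // No windows at all (input shorter than window_size): nothing to compare
    let mut prev = match sums.next() {
        Some(first) => first,
        None => return 0,
    };
    sums.filter(|&cur| {
        let increased = prev < cur;
        prev = cur;
        increased
    })
    .count()
}
```

Called as count_increasing_window_sums(&[1, 2, 3, 4, 5, 6], 2), this mirrors the Kotlin pipeline from the question.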

Implement a pairwise iterator

I'm having trouble writing a function that takes an iterator and returns an iterator that yields pairs (Option<T>, T), like so:
a = [1,2,3]
assert pairwise(a) == [(None, 1), (Some(1), 2), (Some(2), 3)]
fn pairwise<I, T>(&xs: &I) -> I
where
    I: Iterator<Item = T>,
{
    [None].iter().chain(xs.iter().map(Some)).zip(xs.iter())
}

fn main() {
    let data: Vec<i32> = vec![1, 2, 3];
    let newdata: Vec<Option<i32>, i32> = pairwise(&data).collect();
    println!("{:?}", newdata);
}
error[E0599]: no method named `iter` found for type `I` in the current scope
--> src/main.rs:3:28
|
3 | [None].iter().chain(xs.iter().map(Some)).zip(xs.iter())
| ^^^^
|
Not sure why xs isn't iterable. I've stated it in the where clause, haven't I?
fn pairwise<I, T>(&xs: &I) -> I
This doesn't make sense. See What is the correct way to return an Iterator (or any other trait)? and What is the difference between `e1` and `&e2` when used as the for-loop variable?.
I: Iterator<Item = T>,
There's no reason to specify that the Item is a T.
[None].iter()
It's better to use iter::once.
xs.iter()
There's no trait in the standard library that defines an iter method. Perhaps you meant IntoIterator?
let data: Vec<i32> = vec![1, 2, 3]
There's no reason to specify the type here; i32 is the default integral type.
Vec<Option<i32>, i32>
Vec<Option<i32>, i32>> // original version
This is not a valid type for Vec, and your original form doesn't even have balanced symbols.
After all that, you are faced with tough choices. Your example code passes in an iterator which has references to the slice but you've written your assertion such that you expect to get non-references back. You've also attempted to use an arbitrary iterator twice; there's no guarantee that such a thing is viable.
The most generic form I see is:
use std::iter;

fn pairwise<I>(right: I) -> impl Iterator<Item = (Option<I::Item>, I::Item)>
where
    I: IntoIterator + Clone,
{
    let left = iter::once(None).chain(right.clone().into_iter().map(Some));
    left.zip(right)
}

fn main() {
    let data = vec![1, 2, 3];
    let newdata: Vec<_> = pairwise(&data).collect();
    assert_eq!(newdata, [(None, &1), (Some(&1), &2), (Some(&2), &3)]);
    let newdata: Vec<_> = pairwise(data.iter().copied()).collect();
    assert_eq!(newdata, [(None, 1), (Some(1), 2), (Some(2), 3)]);
}
See also:
Iterating over a slice's values instead of references in Rust?
How to iterate over and filter an array?
How to create a non consuming iterator from a Vector
Why can I iterate over a slice twice, but not a vector?
The compiler suggests I add a 'static lifetime because the parameter type may not live long enough, but I don't think that's what I want
What is the correct way to return an Iterator (or any other trait)?
I know OP asked for "outer pairwise" ([(None, 1), (Some(1), 2), (Some(2), 3)]), but here is how I adapted it for "inner pairwise" ([(1, 2), (2, 3)]):
fn inner_pairwise<I>(right: I) -> impl Iterator<Item = (I::Item, I::Item)>
where
    I: IntoIterator + Clone,
{
    let left = right.clone().into_iter().skip(1);
    left.zip(right)
}
For anyone here for "inner pairwise", you're looking for Itertools::tuple_windows.
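The Clone-based versions above walk the input twice. If the iterator itself can't (or shouldn't) be cloned, a different trade-off is a small adaptor that consumes the input once and remembers the previous item, so only the items need to be Clone. A sketch; Pairwise and pairwise_stateful are hypothetical names, not std or itertools items:

```rust
/// Hypothetical adaptor (not in std or itertools): yields `(previous, current)`
/// pairs, starting with `(None, first)`, while consuming the source only once.
struct Pairwise<I: Iterator> {
    inner: I,
    prev: Option<I::Item>,
}

impl<I> Iterator for Pairwise<I>
where
    I: Iterator,
    I::Item: Clone,
{
    type Item = (Option<I::Item>, I::Item);

    fn next(&mut self) -> Option<Self::Item> {
        let cur = self.inner.next()?;
        // Store a clone of the current item and hand back the previous one
        let prev = self.prev.replace(cur.clone());
        Some((prev, cur))
    }
}

/// Builds the adaptor from anything iterable; the iterator itself need not be `Clone`.
fn pairwise_stateful<I: IntoIterator>(iter: I) -> Pairwise<I::IntoIter> {
    Pairwise { inner: iter.into_iter(), prev: None }
}
```

Each item is cloned once; for references or small Copy types such as i32 that clone is essentially free.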

How to debug Kotlin sequences / collections

Take the following one-liner, which can be expressed as a series of operations on a collection or a sequence:
val nums = (10 downTo 1)
    // .asSequence() if we want this to be a sequence
    .filter { it % 2 == 0 }
    .map { it * it }
    .sorted()
    // .toList() if declaring it a sequence
println(nums) // [4, 16, 36, 64, 100]
Let's say I want to see the elements at each step. They would be (by deduction):
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
[10, 8, 6, 4, 2]
[100, 64, 36, 16, 4]
[4, 16, 36, 64, 100]
Unfortunately, there's no good way to either debug this with a debugger or log these values for later inspection. With good functional programming constructs, entire methods can be rewritten as single statements like this, but there seems to be no good way to inspect intermediate states, or even counts (10, 5, 5, 5 here).
What's the best way to debug these?
You can log the intermediate values (lists) with
fun <T> T.log(): T { println(this); return this }

//USAGE:
val nums = (10 downTo 1)
    .filter { it % 2 == 0 }.log()
    .map { it * it }.log()
    .sorted().log()
This will work as desired, since your example works with collections, not sequences. For a lazy Sequence you need:
// coming in 1.1
public fun <T> Sequence<T>.onEach(action: (T) -> Unit): Sequence<T> {
    return map {
        action(it)
        it
    }
}

fun <T> Sequence<T>.log() = onEach { print(it) }

//USAGE:
val nums = (10 downTo 1).asSequence()
    .filter { it % 2 == 0 }
    .map { it * it }.log()
    .sorted()
    .toList()
In recent versions of IntelliJ IDEA, when adding a breakpoint you have the option to make it stop not on the whole expression but only inside a lambda body.
Then, while debugging, you can see what is happening inside your lambda.
But this is not the only way. You can also use Run to cursor (Alt + F9).
I think the current correct answer is that you want the Kotlin Sequence Debugger plugin, which lets you use IntelliJ's lovely Java stream debugger with Kotlin sequences.
Note that (unless I'm doing something wrong) it doesn't appear to work with collections, so you will have to convert the collection to a sequence in order to debug it. Easy enough using Iterable.asSequence, and a small price to pay -- you can always revert that change once you are done debugging.
You can use the inline also function to log or print at any stage, as explained by Andrey Breslav at Google I/O '18:
(1..10)
    .filter { it % 2 == 0 }
    .also { e -> println(e) /* do your debug or print here */ }
    .map { it * 2 }
    .toList()

Iterate over copy types

It is clear that iterators pass around references to avoid moving objects into the iterator or its closure argument, but what about Copy types? Let me show you a small snippet:
fn is_odd(x: &&i32) -> bool { *x & 1 == 1 }
// [1] fn is_odd(x: &i32) -> bool { x & 1 == 1 }
// [2] fn is_odd(x: i32) -> bool { x & 1 == 1 }

fn main() {
    let xs = &[10, 20, 13, 14];
    for x in xs.iter().filter(is_odd) {
        assert_eq!(13, *x);
    }
    // [1] ...is slightly better, but not ideal
    // for x in xs.iter().cloned().filter(is_odd) {
    //     assert_eq!(13, x);
    // }
}
Am I right that .cloned() is preferred when we iterate over something like &[i32] or &[u8], where extra indirection is involved instead of just copying the tiny data unit?
But it looks like I cannot avoid references being passed into the is_odd function.
Is there a way to make [2] function from above snippet work for higher-level functions like filter?
Assume that I understand that moving a non-Copy type into a predicate function is silly. But not all types use move semantics by default, right?
It is clear that iterators pass around a references
This blanket statement is not true; iterators are more than capable of yielding a non-reference. filter provides a reference to the closure because it doesn't want to give ownership of the item to the closure. In your example, your iterated value is a &i32, and then filter provides a &&i32.
Is there a way to make [2] function from above snippet work for higher-level functions like filter?
Certainly, just provide a closure that does the dereferencing:
fn is_odd(x: i32) -> bool { x & 1 == 1 }

fn main() {
    let xs = &[10, 20, 13, 14];
    // |&&x| strips both references: the one added by filter and the iterator's own
    for x in xs.iter().filter(|&&x| is_odd(x)) {
        assert_eq!(13, *x);
    }
}
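Another option, close to variant [1] in the question, is to turn the items into values with copied() (stable since Rust 1.36; cloned() does the same for Copy types) before filtering. filter then passes a &i32, which the closure pattern |&x| destructures so the by-value is_odd from [2] can be called. A sketch; odd_values is a hypothetical helper name:

```rust
/// By-value predicate, as in variant [2] of the question.
fn is_odd(x: i32) -> bool {
    x & 1 == 1
}

/// Hypothetical helper: `copied()` turns each `&i32` into an `i32`,
/// `filter` hands the closure a `&i32`, and `|&x|` copies it back out.
fn odd_values(xs: &[i32]) -> Vec<i32> {
    xs.iter().copied().filter(|&x| is_odd(x)).collect()
}
```

The items yielded after copied() are plain i32 values, so there is no reference left to dereference in the loop body.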

Using an iterator, how do I skip a number of values and then display the rest?

Random access to the elements is not allowed.
let vec = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 0];
let n = 3;
for v in vec.iter().rev().take(n) {
    println!("{}", v);
}
// this printed: 0, 9, 8
// need: 8, 9, 0
for v in vec.iter().rev().skip(n).rev() does not work.
I think the code you wrote does what you're asking it to.
You are reversing the vec with rev() and then taking the first 3 elements of the reversed vector (therefore 0, 9, 8).
To obtain the last 3 in non-reversed order you can skip to the end of the vector minus 3 elements, without reversing it:
let vec = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 0];
let n = vec.len() - 3;
for v in vec.iter().skip(n) {
    println!("{}", v);
}
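Neither skip nor take yields a DoubleEndedIterator for an arbitrary iterator (they only do when the underlying iterator is also an ExactSizeIterator), so in general you have to either: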
skip, which is O(N) in the number of skipped items
collect the result of .rev().take(), and then rev it, which is O(N) in the number of items to be printed, and requires allocating memory for them
The skip is obvious, so let me illustrate the collect:
let vec = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 0];
let vec: Vec<_> = vec.iter().rev().take(3).collect();
for v in vec.iter().rev() {
    println!("{}", v);
}
Of course, the inefficiency is due to you shooting yourself in the foot by avoiding random access in the first place...
Based on the comments, I guess you want to iterate specifically through the elements of a Vec or slice. If that is the case, you could use range slicing, as shown below:
let vec = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 0];
let n = vec.len() - 3;
for v in &vec[n..] {
    println!("{}", v);
}
The big advantage of this approach is that it doesn't need to step through the elements you are not interested in (which may have a big cost if not optimized away). It just makes a new slice, which is O(1), and then iterates through it. In other words, you have the guarantee that it will be fast.
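Two refinements to the approaches above, as a sketch with hypothetical helper names last_n_slice and last_n_iter. First, vec.len() - 3 underflows when the vector has fewer than 3 elements (a panic in debug builds); saturating_sub clamps the start index to 0 instead. Second, on Rust 1.38 or later, take over a double-ended, exact-size iterator such as a reversed slice iterator is itself double-ended, so reversing, taking n, and reversing back yields the last n items in order without collecting:

```rust
/// Hypothetical helper: the last `n` items of `xs`, in their original order.
/// `saturating_sub` clamps the start index at 0 when `n > xs.len()`.
fn last_n_slice(xs: &[i32], n: usize) -> &[i32] {
    &xs[xs.len().saturating_sub(n)..]
}

/// Hypothetical helper: same result using only iterator adaptors.
/// `Take` is double-ended here because `Rev<slice::Iter>` is both
/// double-ended and exact-size (requires Rust 1.38+).
fn last_n_iter(xs: &[i32], n: usize) -> Vec<i32> {
    xs.iter().rev().take(n).rev().copied().collect()
}
```

The slice version stays the cheapest, since it never touches the skipped elements; the iterator version is useful when only an iterator, not a slice, is at hand.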