I am trying to use a pattern with iterators in Rust and falling down somewhere, apparently simple.
I would like to iterate through a container and find an element with a predicate [A] (simple), but then look forward using another predicate and get that value [B] and use [B] to mutate [A] in some way. In this case [A] is mutable and [B] can be immutable; this makes no difference to me, only to the borrow checker (rightly).
It would help to understand this with a simple scenario, so I have added a small snippet to let folk see the issue/attempted goal. I have played with itertools and breaking into for/while loops, although I want to remain as idiomatic as possible.
Silly Example scenario
Lookup an even number, find next number that is divisible by 3 and add to the initial number.
#[allow(unused)]
fn is_div_3(num: &u8) -> bool {
num % 3 == 0
}
fn main() {
let mut data: Vec<u8> = (0..100).collect();
let count = data.iter_mut()
.map(|x| {
if *x % 2 == 0 {
// loop through numbers forward to next is_div_3,
// then x = x + that number
}
true
})
.count();
println!("data {:?}, count was {} ", data, count);
}
playground
Sadly I'm a bit late, but here goes.
It's not totally pretty, but it's not as bad as the other suggestion:
let mut data: Vec<u8> = (1..100).collect();
{
let mut mut_items = data.iter_mut();
while let Some(x) = mut_items.next() {
if *x % 2 == 0 {
let slice = mut_items.into_slice();
*x += *slice.iter().find(|&x| x % 3 == 0).unwrap();
mut_items = slice.iter_mut();
}
}
}
println!("{:?}", data);
gives
[1, 5, 3, 10, 5, 15, 7, 17, 9, 22, ...]
as with Matthieu M.'s solution.
The key is to use mut_items.into_slice() to "reborrow" the iterator, effectively producing a local (and thus safe) clone of the iterator.
Warning: The iterator presented right below is unsafe because it allows one to obtain multiple aliases to a single mutable element; skip to the second part for the corrected version. (It would be alright if the return type contained immutable references).
If you are willing to write your own window iterator, then it becomes quite easy.
First, the iterator in all its gory details:
use std::marker::PhantomData;
struct WindowIterMut<'a, T>
where T: 'a
{
begin: *mut T,
len: usize,
index: usize,
_marker: PhantomData<&'a mut [T]>,
}
impl<'a, T> WindowIterMut<'a, T>
where T: 'a
{
pub fn new(slice: &'a mut [T]) -> WindowIterMut<'a, T> {
WindowIterMut {
begin: slice.as_mut_ptr(),
len: slice.len(),
index: 0,
_marker: PhantomData,
}
}
}
impl<'a, T> Iterator for WindowIterMut<'a, T>
where T: 'a
{
type Item = (&'a mut [T], &'a mut [T]);
fn next(&mut self) -> Option<Self::Item> {
if self.index > self.len { return None; }
let slice: &'a mut [T] = unsafe {
std::slice::from_raw_parts_mut(self.begin, self.len)
};
let result = slice.split_at_mut(self.index);
self.index += 1;
Some(result)
}
}
Invoked on [1, 2, 3] it will return (&[], &[1, 2, 3]) then (&[1], &[2, 3]), ... until (&[1, 2, 3], &[]). In short, it iterates over all the potential partitions of the slice (without shuffling).
Which is safe to use as:
fn main() {
let mut data: Vec<u8> = (1..100).collect();
for (head, tail) in WindowIterMut::new(&mut data) {
if let Some(element) = head.last_mut() {
if *element % 2 == 0 {
if let Some(n3) = tail.iter().filter(|i| *i % 3 == 0).next() {
*element += *n3;
}
}
}
}
println!("{:?}", data);
}
Unfortunately it can also be used as:
fn main() {
let mut data: Vec<u8> = (1..100).collect();
let mut it = WindowIterMut::new(&mut data);
let first_0 = { it.next(); &mut it.next().unwrap().0[0] };
let second_0 = &mut it.next().unwrap().0[0];
println!("{:?} {:?}", first_0 as *const _, second_0 as *const _);
}
which when run print: 0x7f73a8435000 0x7f73a8435000, show-casing that both mutable references alias the same element.
Since we cannot get rid of aliasing, we need to get rid of mutability; or at least defer to interior mutability (Cell here since u8 is Copy).
Fortunately, Cell has no runtime cost, but it does cost a bit in ergonomics (all those .get() and .set()).
I take the opportunity to make the iterator slightly more generic too, and rename it since Window is already a used name for a different concept.
struct FingerIter<'a, T>
where T: 'a
{
begin: *const T,
len: usize,
index: usize,
_marker: PhantomData<&'a [T]>,
}
impl<'a, T> FingerIter<'a, T>
where T: 'a
{
pub fn new(slice: &'a [T]) -> FingerIter<'a, T> {
FingerIter {
begin: slice.as_ptr(),
len: slice.len(),
index: 0,
_marker: PhantomData,
}
}
}
impl<'a, T> Iterator for FingerIter<'a, T>
where T: 'a
{
type Item = (&'a [T], &'a T, &'a [T]);
fn next(&mut self) -> Option<Self::Item> {
if self.index >= self.len { return None; }
let slice: &'a [T] = unsafe {
std::slice::from_raw_parts(self.begin, self.len)
};
self.index += 1;
let result = slice.split_at(self.index);
Some((&result.0[0..self.index-1], result.0.last().unwrap(), result.1))
}
}
We use it as a building brick:
fn main() {
let data: Vec<Cell<u8>> = (1..100).map(|i| Cell::new(i)).collect();
for (_, element, tail) in FingerIter::new(&data) {
if element.get() % 2 == 0 {
if let Some(n3) = tail.iter().filter(|i| i.get() % 3 == 0).next() {
element.set(element.get() + n3.get());
}
}
}
let data: Vec<u8> = data.iter().map(|cell| cell.get()).collect();
println!("{:?}", data);
}
On the playpen this prints: [1, 5, 3, 10, 5, 15, 7, 17, 9, 22, ...], which seems correct.
Related
I am building up a library for generating the minimum perfect hash from a set of keys. The idea is to index the keys online without storing the full dataset in memory. Based on a user requirement, it is possible that skip_next() is not available and I want to fall back to using next(). Although it might be slower based on the speed of the iterator, it simplifies things for a general user.
My idea is to selectively iterate over all the elements generated by an iterator. This code works fine, but it requires a user to implement the trait FastIteration:
#[derive(Debug)]
struct Pixel {
r: Vec<i8>,
g: Vec<i8>,
b: Vec<i8>,
}
#[derive(Debug)]
struct Node {
r: i8,
g: i8,
b: i8,
}
struct PixelIterator<'a> {
pixel: &'a Pixel,
index: usize,
}
impl<'a> IntoIterator for &'a Pixel {
type Item = Node;
type IntoIter = PixelIterator<'a>;
fn into_iter(self) -> Self::IntoIter {
println!("Into &");
PixelIterator {
pixel: self,
index: 0,
}
}
}
impl<'a> Iterator for PixelIterator<'a> {
type Item = Node;
fn next(&mut self) -> Option<Node> {
println!("next &");
let result = match self.index {
0 | 1 | 2 | 3 => Node {
r: self.pixel.r[self.index],
g: self.pixel.g[self.index],
b: self.pixel.b[self.index],
},
_ => return None,
};
self.index += 1;
Some(result)
}
}
trait FastIteration {
fn skip_next(&mut self);
}
impl<'a> FastIteration for PixelIterator<'a> {
fn skip_next(&mut self) {
self.index += 1;
}
}
fn main() {
let p1 = Pixel {
r: vec![11, 21, 31, 41],
g: vec![12, 22, 32, 42],
b: vec![13, 23, 33, 43],
};
let mut index = 0;
let mut it = p1.into_iter();
loop {
if index == p1.r.len() {
break;
}
if index == 1 {
it.skip_next()
} else {
let val = it.next();
println!("{:?}", val);
}
index += 1;
}
}
How can one make the above program fall back to using the normal next() instead of skip_next() based on if the trait FastIteration is implemented or not?
fn fast_iterate<I>(objects: I)
where I: IntoIter + FastIteration { // should use skip_next() };
fn slow_iterate<I>(objects: I)
where I: IntoIter { // should NOT use skip_next(), use next() };
As above, one can always write two separate impl but is it possible to do this in one?
This question builds on:
Conditionally implement a Rust trait only if a type constraint is satisfied
Implement rayon `as_parallel_slice` using iterators.
You are looking for the unstable feature specialization:
#![feature(specialization)]
#[derive(Debug)]
struct Example(u8);
impl Iterator for Example {
type Item = u8;
fn next(&mut self) -> Option<u8> {
let v = self.0;
if v > 10 {
None
} else {
self.0 += 1;
Some(v)
}
}
}
trait FastIterator: Iterator {
fn skip_next(&mut self);
}
impl<I: Iterator> FastIterator for I {
default fn skip_next(&mut self) {
println!("step");
self.next();
}
}
impl FastIterator for Example {
fn skip_next(&mut self) {
println!("skip");
self.0 += 1;
}
}
fn main() {
let mut ex = Example(0);
ex.skip_next();
let mut r = 0..10;
r.skip_next();
}
I'm looking for a way to eliminate the temporary vector allocation in this example:
fn doit<T: Iterator<Item = Result<i32, &'static str>>>(name: &str, iter: T) {
println!(
"{}: {:?}",
name,
iter.collect::<Result<Vec<_>, _>>()
.map(|v| v.into_iter().min())
);
}
fn main() {
let without_errors = vec![Ok(1), Ok(2), Ok(3)];
let with_errors = vec![Ok(1), Err("error"), Ok(2)];
doit("without errors", without_errors.into_iter());
doit("with errors", with_errors.into_iter());
}
This is a variation of the iterator with error handling theme, except that I don't want to create a collection (so collect() doesn't quite do the job), but I want to perform further operations on the elements being iterated over.
Note that this gives the wrong result because Ok is less than Err:
fn doit<T: Iterator<Item = Result<i32, &'static str>>>(name: &str, iter: T) {
println!("{}: {:?}", name, iter.min());
}
It would give the right result for max() by accident, but it would not stop iterating on the first error.
Iterator::try_fold provides the framework for what you need, and it's available since Rust 1.27 (Playground):
fn fold_ok<I, T, E, F>(mut iter: I, f: F) -> Result<Option<T>, E>
where
I: Iterator<Item = Result<T, E>>,
T: Ord,
F: Fn(T, T) -> T,
{
iter.try_fold(None, |r, i| {
let i = i?;
Ok(Some(if let Some(r) = r { f(r, i) } else { i }))
})
}
fn main() {
let without_errors = vec![Ok(1), Ok(2), Ok(3)];
let with_errors = vec![Ok(1), Err("error"), Ok(2)];
fn doit<'r, T>(name: &str, iter: T)
where
T: Iterator<Item = &'r Result<i32, &'static str>> + Clone,
{
println!("{}: {:?}", name, fold_ok(iter.cloned(), ::std::cmp::min));
}
doit("without errors", without_errors.iter());
doit("with errors", with_errors.iter());
}
Before that, I think your only option is manually iterating (Playground)
fn fold_ok<I, T, E, F>(mut iter: I, f: F) -> Result<Option<T>, E>
where
I: Iterator<Item = Result<T, E>>,
T: Ord,
F: Fn(T, T) -> T,
{
let mut result = match iter.next() {
None => return Ok(None),
Some(r) => r?,
};
for item in iter {
result = f(result, item?);
}
Ok(Some(result))
}
fn main() {
let without_errors = vec![Ok(1), Ok(2), Ok(3)];
let with_errors = vec![Ok(1), Err("error"), Ok(2)];
fn doit<'r, T>(name: &str, iter: T)
where
T: Iterator<Item = &'r Result<i32, &'static str>> + Clone,
{
println!(
"{}: {:?}",
name,
fold_ok(iter.clone().cloned(), ::std::cmp::min)
);
}
doit("without errors", without_errors.iter());
doit("with errors", with_errors.iter());
}
"Lifting" a function to handle an iterator of results is a fairly common pattern and, as usual, itertools has a solution — process_results:
use itertools; // 0.8.0
fn doit(name: &str, iter: impl Iterator<Item = Result<i32, &'static str>>) {
let min = itertools::process_results(iter, |i| i.min());
println!("{}: {:?}", name, min);
}
This code began life as ResultShunt in the standard library before being extracted to itertools. It's what underlies the implementation of sum and product for iterators of Result.
It's possible to abuse collect() for this:
pub struct Min<T> {
value: Option<T>,
}
impl<T> Min<T> {
pub fn value(self) -> Option<T> {
self.value
}
}
impl<T> std::iter::FromIterator<T> for Min<T>
where
T: Ord,
{
fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Self {
let mut iter = iter.into_iter();
match iter.next() {
None => Min { value: None },
Some(mut value) => {
for i in iter {
value = std::cmp::min(value, i);
}
Min { value: Some(value) }
}
}
}
}
This can be used via iter.collect::<Min<_>>().value(). This is a lot of machinery, and I don't see a way to abstract over it (so that you only need to supply std::cmp::min or some other semigroup operation).
I didn't look in the direction of Iterator::try_fold, which provides most of the machinery.
I am trying to build a binary tree and write an iterator to traverse values in the tree.
When implementing the IntoIterator trait for my tree nodes I ran into a problem with lifetimes
src\main.rs:43:6: 43:8 error: the lifetime parameter `'a` is not constrained by the impl trait, self type, or predicates [E0207]
src\main.rs:43 impl<'a, T: 'a> IntoIterator for Node<T> {
I understand that I need to specify that NodeIterator will live as long as Node but I am unsure of how to express that
use std::cmp::PartialOrd;
use std::boxed::Box;
struct Node<T: PartialOrd> {
value: T,
left: Option<Box<Node<T>>>,
right: Option<Box<Node<T>>>,
}
struct NodeIterator<'a, T: 'a + PartialOrd> {
current: &'a Node<T>,
parent: Option<&'a Node<T>>,
}
impl<T: PartialOrd> Node<T> {
pub fn insert(&mut self, value: T) {
...
}
}
impl<'a, T: 'a> IntoIterator for Node<T> { // line 43
type Item = T;
type IntoIter = NodeIterator<'a, T>;
fn into_iter(&self) -> Self::IntoIter {
NodeIterator::<'a> {
current: Some(&self),
parent: None
}
}
}
The particular error that you are getting is that 'a should appear on the right of for. Otherwise, how could the compiler know what a is?
When implementing IntoIterator you have to decide whether the iterator will consume the container, or whether it'll just produce references into it. At the moment, your setup is inconsistent, and the error message points it out.
In the case of a binary tree, you also have to think about which order you want to produce the values in: traditional orders are depth first (yielding a sorted sequence) and breadth first (exposing the "layers" of the tree). I'll assume depth first as it's the most common one.
Let's tackle the case of a consuming iterator first. It's simpler in the sense that we don't have to worry about lifetimes.
#![feature(box_patterns)]
struct Node<T: PartialOrd> {
value: T,
left: Option<Box<Node<T>>>,
right: Option<Box<Node<T>>>,
}
struct NodeIterator<T: PartialOrd> {
stack: Vec<Node<T>>,
next: Option<T>,
}
impl<T: PartialOrd> IntoIterator for Node<T> {
type Item = T;
type IntoIter = NodeIterator<T>;
fn into_iter(self) -> Self::IntoIter {
let mut stack = Vec::new();
let smallest = pop_smallest(self, &mut stack);
NodeIterator { stack: stack, next: Some(smallest) }
}
}
impl<T: PartialOrd> Iterator for NodeIterator<T> {
type Item = T;
fn next(&mut self) -> Option<T> {
if let Some(next) = self.next.take() {
return Some(next);
}
if let Some(Node { value, right, .. }) = self.stack.pop() {
if let Some(right) = right {
let box right = right;
self.stack.push(right);
}
return Some(value);
}
None
}
}
fn pop_smallest<T: PartialOrd>(node: Node<T>, stack: &mut Vec<Node<T>>) -> T {
let Node { value, left, right } = node;
if let Some(left) = left {
stack.push(Node { value: value, left: None, right: right });
let box left = left;
return pop_smallest(left, stack);
}
if let Some(right) = right {
let box right = right;
stack.push(right);
}
value
}
fn main() {
let root = Node {
value: 3,
left: Some(Box::new(Node { value: 2, left: None, right: None })),
right: Some(Box::new(Node { value: 4, left: None, right: None }))
};
for t in root {
println!("{}", t);
}
}
Now, we can "easily" adapt it to the non-consuming case by sprinkling in the appropriate references:
struct RefNodeIterator<'a, T: PartialOrd + 'a> {
stack: Vec<&'a Node<T>>,
next: Option<&'a T>,
}
impl<'a, T: PartialOrd + 'a> IntoIterator for &'a Node<T> {
type Item = &'a T;
type IntoIter = RefNodeIterator<'a, T>;
fn into_iter(self) -> Self::IntoIter {
let mut stack = Vec::new();
let smallest = pop_smallest_ref(self, &mut stack);
RefNodeIterator { stack: stack, next: Some(smallest) }
}
}
impl<'a, T: PartialOrd + 'a> Iterator for RefNodeIterator<'a, T> {
type Item = &'a T;
fn next(&mut self) -> Option<&'a T> {
if let Some(next) = self.next.take() {
return Some(next);
}
if let Some(node) = self.stack.pop() {
if let Some(ref right) = node.right {
self.stack.push(right);
}
return Some(&node.value);
}
None
}
}
fn pop_smallest_ref<'a, T>(node: &'a Node<T>, stack: &mut Vec<&'a Node<T>>) -> &'a T
where
T: PartialOrd + 'a
{
if let Some(ref left) = node.left {
stack.push(node);
return pop_smallest_ref(left, stack);
}
if let Some(ref right) = node.right {
stack.push(right);
}
&node.value
}
There's a lot to unpack in there; so take your time to digest it. Specifically:
the use of ref in Some(ref right) = node.right is because I don't want to consume node.right, only to obtain a reference inside the Option; the compiler will complain that I cannot move out of a borrowed object without it (so I just follow the complaints),
in stack.push(right), right: &'a Box<Node<T>> and yet stack: Vec<&'a Node<T>>; this is the magic of Deref: Box<T> implements Deref<T> so the compiler automatically transforms the reference as appropriate.
Note: I didn't write this code as-is; instead I just put the first few references where I expect them to be (such as the return type of Iterator) and then let the compiler guide me.
I am trying to create an mutable iterator for a vector of type: Vec<Vec<(K, V)>>
The iterator code:
pub struct IterMut<'a, K: 'a, V: 'a> {
iter: &'a mut Vec<Vec<(K, V)>>,
ix: usize,
inner_ix: usize,
}
impl<'a, K, V> Iterator for IterMut<'a, K, V> {
type Item = (&'a K, &'a mut V);
#[inline]
fn next(&mut self) -> Option<(&'a K, &'a mut V)> {
while self.iter.len() < self.ix {
while self.iter[self.ix].len() < self.inner_ix {
self.inner_ix += 1;
let (ref k, ref mut v) = self.iter[self.ix][self.inner_ix];
return Some((&k, &mut v));
}
self.ix += 1;
}
return None;
}
}
The error I get is:
error[E0495]: cannot infer an appropriate lifetime for lifetime parameter in function call due to conflicting requirements
--> src/main.rs:16:42
|
16 | let (ref k, ref mut v) = self.iter[self.ix][self.inner_ix];
| ^^^^^^^^^^^^^^^^^^
|
help: consider using an explicit lifetime parameter as shown: fn next(&'a mut self) -> Option<(&'a K, &'a mut V)>
--> src/main.rs:11:5
|
11 | fn next(&mut self) -> Option<(&'a K, &'a mut V)> {
| ^
Apparently I have lifetime problems, but I don't know how to tell the compiler that this should work.
Is this how you should implement the mutable iterator or is there a better way?
When debugging cryptic error messages, I've found it easier to try and isolate the issue as much as possible.
The first step is to break the expression into its essential constituents, let's start by splitting the indexing steps:
fn next(&mut self) -> Option<(&'a K, &'a mut V)> {
while self.iter.len() < self.ix {
while self.iter[self.ix].len() < self.inner_ix {
self.inner_ix += 1;
let outer: &'a mut Vec<_> = self.iter;
let inner: &'a mut Vec<_> = &mut outer[self.ix];
let (ref k, ref mut v) = inner[self.inner_ix];
return Some((&k, &mut v));
}
self.ix += 1;
}
return None;
}
The Index trait assumes that the lifetime of its output is linked to that of its receiver, so to get a 'a lifetime we need the receiver to have a &'a lifetime, and it propagates upward, leading to the above code.
However there's an issue here: let outer: &'a mut Vec<_> = self.iter; will not compile because mutable references are not Copy.
So, how does one get a mutable reference from a mutable reference (which must be possible since IndexMut gets a mutable reference)?
One uses re-borrowing: let outer: &'a mut Vec<_> = &mut *self.iter;.
And, oh:
error[E0495]: cannot infer an appropriate lifetime for borrow expression due to conflicting requirements
--> <anon>:16:45
|
16 | let outer: &'a mut Vec<_> = &mut *self.iter;
| ^^^^^^^^^^^^^^^
|
The reborrowed reference is not valid for 'a, it's valid only for the (unnamed) lifetime of self!
Why Rust? Why?
Because doing otherwise would be unsafe.
&mut T is guaranteed NOT to be aliasing, however your method could create aliasing references (if you forgot to advance the index):
#[inline]
fn next(&mut self) -> Option<(&'a K, &'a mut V)> {
let (ref k, ref mut v) = self.iter[self.ix][self.inner_ix];
return Some((&k, &mut v));
}
And even if you don't, there's not guarantee that you don't have a rewind method that would allow "stepping back".
TL;DR: You were about to step on a landmine, you were steered toward Stack Overflow instead ;)
Alright, but how do you implement the iterator!.
Well, using iterators, of course. As Shepmaster (briefly) answers, there is the equivalent in the standard library already in the guise of FlatMap. The trick is to use existing iterators for the nitty-gritty details!
Something like:
use std::slice::IterMut;
pub struct MyIterMut<'a, K: 'a, V: 'a> {
outer: IterMut<'a, Vec<(K, V)>>,
inner: IterMut<'a, (K, V)>,
}
Then you consume from inner as long as it provides items, and when empty you refill it from outer.
impl<'a, K, V> MyIterMut<'a, K, V> {
fn new(v: &'a mut Vec<Vec<(K, V)>>) -> MyIterMut<'a, K, V> {
let mut outer = v.iter_mut();
let inner = outer.next()
.map(|v| v.iter_mut())
.unwrap_or_else(|| (&mut []).iter_mut());
MyIterMut { outer: outer, inner: inner }
}
}
impl<'a, K, V> Iterator for MyIterMut<'a, K, V> {
type Item = (&'a K, &'a mut V);
#[inline]
fn next(&mut self) -> Option<(&'a K, &'a mut V)> {
loop {
match self.inner.next() {
Some(r) => return Some((&r.0, &mut r.1)),
None => (),
}
match self.outer.next() {
Some(v) => self.inner = v.iter_mut(),
None => return None,
}
}
}
}
A quick test case:
fn main() {
let mut v = vec![
vec![(1, "1"), (2, "2")],
vec![],
vec![(3, "3")]
];
let iter = MyIterMut::new(&mut v);
let c: Vec<_> = iter.collect();
println!("{:?}", c);
}
Prints:
[(1, "1"), (2, "2"), (3, "3")]
as expected, so it's not completely broken, but I wish I did not have to rely on the &[] is 'static trick (ie, that std::slice::IterMut implemented Default).
You've provided no reason that you are reimplementing the standard Iterator::flat_map, so I'd just use that and another map to remove the mutability you don't need:
fn main() {
let mut a: Vec<Vec<(u8, u8)>> = Default::default();
let c = a.iter_mut()
.flat_map(|x| x.iter_mut())
.map(|&mut (ref a, ref mut b)| (a, b))
.count();
println!("{}", c);
}
Once you have that, you can just return the iterator in one of the many ways.
#[derive(Debug, Default)]
struct Thing<K, V>(Vec<Vec<(K, V)>>);
impl<K, V> Thing<K, V> {
fn iter_mut<'a>(&'a mut self) -> Box<Iterator<Item = (&'a K, &'a mut V)> + 'a> {
Box::new(self.0
.iter_mut()
.flat_map(|x| x.iter_mut())
.map(|&mut (ref a, ref mut b)| (a, b)))
}
}
fn main() {
let mut a = Thing::<u8, u8>::default();
let c = a.iter_mut().count();
println!("{}", c);
}
I'm trying to extend bluss's rust-itertools with SQL-like join iterators. I encountered a particular problem with RIGHT OUTER JOIN using a hash join strategy (the strategy itself is actually very simple).
The iterator adaptor struct takes 2 input iterators of which the second (the right) is loaded into the HashMap. The iteration works as follows:
The item from the left iterator is matched against the map - in case of a match return both items
When the left iterator is exhausted, return the non-matched values from the map
The problem is with the second part where I tried to store the map's Values iterator along with the map to store its iteration state. But as I learned in this answer, it's not possible in rust.
Unfortunately I have no idea how it could be done otherwise.
Here is the complete code for the INNER JOIN adaptor, which does the first part:
use std::collections::HashMap;
use std::hash::Hash;
pub struct HashJoinInner<I, K, V0, V1> where
I: Iterator<Item=(K, V0)>,
K: Hash + Eq,
V1: Clone,
{
left: I,
right: HashMap<K, V1>,
}
impl<I, K, V0, V1> HashJoinInner<I, K, V0, V1> where
I: Iterator<Item=(K, V0)>,
K: Hash + Eq,
V1: Clone,
{
/// Create a `HashJoinInner` iterator.
pub fn new<J>(l: I, r: J) -> Self
where J: Iterator<Item=(K, V1)>
{
let mut hm: HashMap<K, V1> = HashMap::new();
for (k, v) in r {
hm.insert(k, v);
}
HashJoinInner {
left: l,
right: hm,
}
}
}
impl<I, K, V0, V1> Iterator for HashJoinInner<I, K, V0, V1> where
I: Iterator<Item=(K, V0)>,
K: Hash + Eq,
V1: Clone,
{
type Item = (V0, V1);
fn next(&mut self) -> Option<Self::Item> {
loop {
match self.left.next() {
Some((k0, v0)) => match self.right.get(&k0) {
Some(v1) => return Some((v0, Clone::clone(v1))),
None => continue,
},
None => return None,
}
}
}
}
I'll be grateful for any idea.
You cannot store the Values iterator because it contains references to the HashMap. These references could become invalid if you move the map. However, you can consume the HashMap using the into_iter method. That owns all the values of the HashMap and can be moved into a new struct.
Here's a tweaking of your code from the earlier question. This isn't yet a left or right join. There's complexity about the switch from being done with one iterator to finishing off the other iterator.
use std::collections::hash_map::{HashMap, IntoIter};
use std::hash::Hash;
struct Foo<K, V>
where K: Hash + Eq,
V: Clone,
{
iter: IntoIter<K, (V, bool)>,
}
impl<K, V> Foo<K, V>
where K: Hash + Eq,
V: Clone,
{
fn new<I>(it: I) -> Self
where I: Iterator<Item=(K, V)>
{
let mut map = HashMap::new();
for (k, v) in it {
map.insert(k, (v, false));
}
Foo { iter: map.into_iter() }
}
}
impl<K, V> Iterator for Foo<K, V>
where K: Hash + Eq,
V: Clone,
{
type Item = V;
fn next(&mut self) -> Option<Self::Item> {
loop {
match self.iter.next() {
Some((_, (v, false))) => return Some(v.clone()),
Some(_) => continue,
None => return None,
}
}
}
}
fn main() {
let it = (0..).zip("AB".chars());
let foo = Foo::new(it);
for v in foo {
println!("{}", v);
}
}
However you don't need to do any of that to get what you want. You can simply create a hashmap and check it as you iterate over the other item. I accidentally created a left outer join, but just flip the arguments to get a right outer join. ^_^
use std::collections::hash_map::HashMap;
use std::hash::Hash;
struct LeftOuterJoin<L, K, RV> {
left: L,
right: HashMap<K, RV>,
}
impl<L, K, RV> LeftOuterJoin<L, K, RV>
where K: Hash + Eq
{
fn new<LI, RI>(left: LI, right: RI) -> Self
where L: Iterator<Item=LI::Item>,
LI: IntoIterator<IntoIter=L>,
RI: IntoIterator<Item=(K, RV)>
{
LeftOuterJoin {
left: left.into_iter(),
right: right.into_iter().collect()
}
}
}
impl<L, K, LV, RV> Iterator for LeftOuterJoin<L, K, RV>
where L: Iterator<Item=(K, LV)>,
K: Hash + Eq,
RV: Clone
{
type Item = (K, LV, Option<RV>);
fn next(&mut self) -> Option<Self::Item> {
match self.left.next() {
Some((k, lv)) => {
let rv = self.right.get(&k);
Some((k, lv, rv.cloned()))
},
None => None,
}
}
}
fn main() {
let mut left = HashMap::new();
left.insert(1, "Alice");
left.insert(2, "Bob");
let mut right = HashMap::new();
right.insert(1, "Programmer");
for (id, name, job) in LeftOuterJoin::new(left.into_iter(), right) {
println!("{} ({}) is a {:?}", name, id, job);
}
}
Thanks to Shepmaster's idea of using std::collections::hash_map::IntoIter I've managed to solve the problem.
Here is the complete solution for RIGHT OUTER JOIN using the hash join strategy:
use std::collections::hash_map::{HashMap, IntoIter,};
use std::mem;
use std::hash::Hash;
#[must_use = "iterator adaptors are lazy and do nothing unless consumed"]
pub struct HashJoinRightOuter<L, K, RV> {
left: L,
map: HashMap<K, (RV, bool)>,
/// exclusion iterator - yields the unmatched values from the map. It is created once the left
/// iterator is exhausted
excl_iter: Option<IntoIter<K, (RV, bool)>>,
}
impl<L, K, RV> HashJoinRightOuter<L, K, RV>
where K: Hash + Eq,
{
/// Create a `HashJoinRightOuter` iterator.
pub fn new<LI, RI>(left: LI, right: RI) -> Self
where L: Iterator<Item=LI::Item>,
LI: IntoIterator<IntoIter=L>,
RI: IntoIterator<Item=(K, RV)>
{
let mut map: HashMap<K, (RV, bool)> = HashMap::new();
for (k, v) in right.into_iter() {
map.insert(k, (v, false));
}
HashJoinRightOuter {
left: left.into_iter(),
map: map,
excl_iter: None,
}
}
/// Moves the map to `self.excl_iter`
///
/// Once the left iterator is exhausted, the info about which keys were matched is complete.
/// To be able to iterate over map's values we need to move it into its `IntoIter`.
fn set_excl_iter(&mut self) {
let map = mem::replace(&mut self.map, HashMap::<K, (RV, bool)>::new());
self.excl_iter = Some(map.into_iter());
}
}
impl<L, K, LV, RV> Iterator for HashJoinRightOuter<L, K, RV>
where L: Iterator<Item=(K, LV)>,
K: Hash + Eq,
RV: Clone,
{
type Item = (Option<LV>, RV);
fn next(&mut self) -> Option<Self::Item> {
loop {
match self.excl_iter {
// the left iterator is not yet exhausted
None => match self.left.next() {
Some((lk, lv)) => match self.map.get_mut(&lk) {
Some(rt) => {
rt.1 = true; // flag as matched
return Some((Some(lv), Clone::clone(&rt.0)))
},
None => continue, // not interested in unmatched left value
},
// the left iterator is exhausted so move the map into `self.excl_iter`.
None => self.set_excl_iter(),
},
// iterate over unmatched values
Some(ref mut r) => match r.next() {
Some((_, (rv, matched))) => {
if !matched {
return Some((None, rv));
} else {
continue;
}
},
None => return None,
}
}
}
}
}
fn main() {
let a = (0..).zip("AB".chars());
let b = (1..).zip("XY".chars());
let mut it = HashJoinRightOuter::new(a, b);
assert_eq!(it.next(), Some((Some('B'), 'X')));
assert_eq!(it.next(), Some((None, 'Y')));
assert_eq!(it.next(), None);
}
At the beginning I failed because I tried to store both the data and it's reference in the same struct, which has no meaning anyway. What I really wanted was to store the data first, do some magic with it and once done, move it into another field to work with its transformation.
This can be used to solve other self-referencing struct problems as well.