An iterator adaptor implementing an SQL-like RIGHT OUTER JOIN using a HashMap - iterator

I'm trying to extend bluss's rust-itertools with SQL-like join iterators. I encountered a particular problem with RIGHT OUTER JOIN using a hash join strategy (the strategy itself is actually very simple).
The iterator adaptor struct takes 2 input iterators of which the second (the right) is loaded into the HashMap. The iteration works as follows:
1. Each item from the left iterator is matched against the map; on a match, both items are returned.
2. When the left iterator is exhausted, the unmatched values from the map are returned.
The problem is with the second part: I tried to store the map's Values iterator alongside the map to keep its iteration state. But as I learned in this answer, that is not possible in Rust.
Unfortunately I have no idea how it could be done otherwise.
Here is the complete code for the INNER JOIN adaptor, which does the first part:
use std::collections::HashMap;
use std::hash::Hash;
pub struct HashJoinInner<I, K, V0, V1> where
I: Iterator<Item=(K, V0)>,
K: Hash + Eq,
V1: Clone,
{
left: I,
right: HashMap<K, V1>,
}
impl<I, K, V0, V1> HashJoinInner<I, K, V0, V1> where
I: Iterator<Item=(K, V0)>,
K: Hash + Eq,
V1: Clone,
{
/// Create a `HashJoinInner` iterator.
pub fn new<J>(l: I, r: J) -> Self
where J: Iterator<Item=(K, V1)>
{
let mut hm: HashMap<K, V1> = HashMap::new();
for (k, v) in r {
hm.insert(k, v);
}
HashJoinInner {
left: l,
right: hm,
}
}
}
impl<I, K, V0, V1> Iterator for HashJoinInner<I, K, V0, V1> where
I: Iterator<Item=(K, V0)>,
K: Hash + Eq,
V1: Clone,
{
type Item = (V0, V1);
fn next(&mut self) -> Option<Self::Item> {
loop {
match self.left.next() {
Some((k0, v0)) => match self.right.get(&k0) {
Some(v1) => return Some((v0, Clone::clone(v1))),
None => continue,
},
None => return None,
}
}
}
}
I'll be grateful for any idea.

You cannot store the Values iterator because it contains references to the HashMap. These references could become invalid if you move the map. However, you can consume the HashMap using the into_iter method. That owns all the values of the HashMap and can be moved into a new struct.
Here's a tweaked version of your code from the earlier question. This isn't yet a left or right join: switching from the exhausted first iterator to finishing off the second one adds some complexity.
use std::collections::hash_map::{HashMap, IntoIter};
use std::hash::Hash;
struct Foo<K, V>
where K: Hash + Eq,
V: Clone,
{
iter: IntoIter<K, (V, bool)>,
}
impl<K, V> Foo<K, V>
where K: Hash + Eq,
V: Clone,
{
fn new<I>(it: I) -> Self
where I: Iterator<Item=(K, V)>
{
let mut map = HashMap::new();
for (k, v) in it {
map.insert(k, (v, false));
}
Foo { iter: map.into_iter() }
}
}
impl<K, V> Iterator for Foo<K, V>
where K: Hash + Eq,
V: Clone,
{
type Item = V;
fn next(&mut self) -> Option<Self::Item> {
loop {
match self.iter.next() {
Some((_, (v, false))) => return Some(v.clone()),
Some(_) => continue,
None => return None,
}
}
}
}
fn main() {
let it = (0..).zip("AB".chars());
let foo = Foo::new(it);
for v in foo {
println!("{}", v);
}
}
However, you don't need to do any of that to get what you want. You can simply create a HashMap and check it as you iterate over the other iterator. I accidentally created a left outer join, but just flip the arguments to get a right outer join. ^_^
use std::collections::hash_map::HashMap;
use std::hash::Hash;
struct LeftOuterJoin<L, K, RV> {
left: L,
right: HashMap<K, RV>,
}
impl<L, K, RV> LeftOuterJoin<L, K, RV>
where K: Hash + Eq
{
fn new<LI, RI>(left: LI, right: RI) -> Self
where L: Iterator<Item=LI::Item>,
LI: IntoIterator<IntoIter=L>,
RI: IntoIterator<Item=(K, RV)>
{
LeftOuterJoin {
left: left.into_iter(),
right: right.into_iter().collect()
}
}
}
impl<L, K, LV, RV> Iterator for LeftOuterJoin<L, K, RV>
where L: Iterator<Item=(K, LV)>,
K: Hash + Eq,
RV: Clone
{
type Item = (K, LV, Option<RV>);
fn next(&mut self) -> Option<Self::Item> {
match self.left.next() {
Some((k, lv)) => {
let rv = self.right.get(&k);
Some((k, lv, rv.cloned()))
},
None => None,
}
}
}
fn main() {
let mut left = HashMap::new();
left.insert(1, "Alice");
left.insert(2, "Bob");
let mut right = HashMap::new();
right.insert(1, "Programmer");
for (id, name, job) in LeftOuterJoin::new(left.into_iter(), right) {
println!("{} ({}) is a {:?}", name, id, job);
}
}

Thanks to Shepmaster's idea of using std::collections::hash_map::IntoIter I've managed to solve the problem.
Here is the complete solution for RIGHT OUTER JOIN using the hash join strategy:
use std::collections::hash_map::{HashMap, IntoIter,};
use std::mem;
use std::hash::Hash;
#[must_use = "iterator adaptors are lazy and do nothing unless consumed"]
pub struct HashJoinRightOuter<L, K, RV> {
left: L,
map: HashMap<K, (RV, bool)>,
/// exclusion iterator - yields the unmatched values from the map. It is created once the left
/// iterator is exhausted
excl_iter: Option<IntoIter<K, (RV, bool)>>,
}
impl<L, K, RV> HashJoinRightOuter<L, K, RV>
where K: Hash + Eq,
{
/// Create a `HashJoinRightOuter` iterator.
pub fn new<LI, RI>(left: LI, right: RI) -> Self
where L: Iterator<Item=LI::Item>,
LI: IntoIterator<IntoIter=L>,
RI: IntoIterator<Item=(K, RV)>
{
let mut map: HashMap<K, (RV, bool)> = HashMap::new();
for (k, v) in right.into_iter() {
map.insert(k, (v, false));
}
HashJoinRightOuter {
left: left.into_iter(),
map: map,
excl_iter: None,
}
}
/// Moves the map to `self.excl_iter`
///
/// Once the left iterator is exhausted, the info about which keys were matched is complete.
/// To be able to iterate over map's values we need to move it into its `IntoIter`.
fn set_excl_iter(&mut self) {
let map = mem::replace(&mut self.map, HashMap::<K, (RV, bool)>::new());
self.excl_iter = Some(map.into_iter());
}
}
impl<L, K, LV, RV> Iterator for HashJoinRightOuter<L, K, RV>
where L: Iterator<Item=(K, LV)>,
K: Hash + Eq,
RV: Clone,
{
type Item = (Option<LV>, RV);
fn next(&mut self) -> Option<Self::Item> {
loop {
match self.excl_iter {
// the left iterator is not yet exhausted
None => match self.left.next() {
Some((lk, lv)) => match self.map.get_mut(&lk) {
Some(rt) => {
rt.1 = true; // flag as matched
return Some((Some(lv), Clone::clone(&rt.0)))
},
None => continue, // not interested in unmatched left value
},
// the left iterator is exhausted so move the map into `self.excl_iter`.
None => self.set_excl_iter(),
},
// iterate over unmatched values
Some(ref mut r) => match r.next() {
Some((_, (rv, matched))) => {
if !matched {
return Some((None, rv));
} else {
continue;
}
},
None => return None,
}
}
}
}
}
fn main() {
let a = (0..).zip("AB".chars());
let b = (1..).zip("XY".chars());
let mut it = HashJoinRightOuter::new(a, b);
assert_eq!(it.next(), Some((Some('B'), 'X')));
assert_eq!(it.next(), Some((None, 'Y')));
assert_eq!(it.next(), None);
}
At the beginning I failed because I tried to store both the data and a reference to it in the same struct, which makes no sense anyway. What I really wanted was to store the data first, do some work with it and, once done, move it into another field to work with its transformed form.
This can be used to solve other self-referencing struct problems as well.
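To illustrate that "store first, then move into another field" idea outside of joins, here is a minimal sketch. The `WordCounts` type and its methods are made up for this example; the only std pieces relied on are `std::mem::take` and `HashMap::into_iter`.

```rust
use std::collections::hash_map::{HashMap, IntoIter};
use std::mem;

/// Hypothetical example type: counts words first, then hands the counts out one by one.
struct WordCounts {
    counts: HashMap<String, usize>,
    // Created lazily, once counting is finished.
    drain: Option<IntoIter<String, usize>>,
}

impl WordCounts {
    fn new() -> Self {
        WordCounts { counts: HashMap::new(), drain: None }
    }

    /// Phase 1: mutate the map in place.
    fn record(&mut self, word: &str) {
        *self.counts.entry(word.to_string()).or_insert(0) += 1;
    }

    /// Phase 2: on the first call, move the map out of `counts` and into its
    /// owning iterator; afterwards just advance that iterator.
    fn next_count(&mut self) -> Option<(String, usize)> {
        if self.drain.is_none() {
            let map = mem::take(&mut self.counts); // leaves an empty map behind
            self.drain = Some(map.into_iter());
        }
        self.drain.as_mut().and_then(|it| it.next())
    }
}

/// Small driver showing both phases; the result is sorted because
/// `HashMap` iteration order is unspecified.
fn demo() -> Vec<(String, usize)> {
    let mut wc = WordCounts::new();
    for w in ["a", "b", "a"] {
        wc.record(w);
    }
    let mut out = Vec::new();
    while let Some(pair) = wc.next_count() {
        out.push(pair);
    }
    out.sort();
    out
}
```

No field ever holds a reference into another field, so the struct can be moved freely at any point.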

Related

what technique should I use to create a Rust HashMap composite key that is hard to construct because it requires cloning multiple elements?

I am doing some dynamic programming, and I'd like to store already-computed values in a HashMap. Unfortunately, the key is a composite value, and somewhat expensive to construct:
#[derive(Eq, PartialEq, Hash)]
struct CostKey {
roots: Vec<usize>,
plans: Vec<Option<RegionPlanCandidate>>,
}
//used like
pub(super) fn cost_for(
&self,
roots: &[usize],
plans: &[Option<RegionPlanCandidate>],
) -> PlanCostLog {
let key = CostKey {
roots:Vec::from(roots),
plans:Vec::from(plans),
};
if let Some(cost) = self.cost_cache.borrow().get(&key) {
return (*cost).clone();
}
...
let rval = PlanCostLog::CrackSum(cost_log);
self.cost_cache.borrow_mut().insert(key, rval.clone());
rval
}
Even with this expensive implementation, I was able to cut the compute time for one of my examples by half. But cargo flamegraph shows that I'm still spending a non-trivial amount of time on the Vec::from calls.
If the key was not composite, and was just a reference, then the .raw_entry_mut().from_key(&key) would apply, but the nature of my key is problematic.
Theoretically, a map should be able to check Eq and Hash using just the borrowed roots and plans but I am not sure how to accomplish it with the existing APIs. How can I speed up the gets and only clone the slices when I need to insert?
Given your types, there isn't a huge amount you can do to improve the efficiency of HashMap::get here.
If your types were simpler, the way you could try to do this is to create another type which doesn't own its data, but hashes the same and can be compared for equality with CostKey, something like this:
#[derive(Eq, PartialEq, Hash)]
struct CostKeyRef<'a> {
roots: &'a [usize],
plans: &'a [Option<RegionPlanCandidate>],
}
impl<'a> PartialEq<CostKey> for CostKeyRef<'a> {
fn eq(&self, other: &CostKey) -> bool {
self.roots == &other.roots && self.plans == &other.plans
}
}
However, a problem arises when you try to implement Borrow<CostKeyRef<'a>> for CostKey. This is required for various HashMap methods, but can't be implemented because the types contain two fields. There isn't a way to coerce a &CostKey into a &CostKeyRef because their layouts are just incompatible.
You may be able to alter your types so that this is possible, but this is not advisable for a Rust beginner as it would require a good understanding of how data and, in particular, references and fat pointers are laid out in memory.
So what can you do?
Well, if your hash map is relatively small, you can use a linear search over its entries instead. Exactly how small "relatively small" is will need to be discovered through measurement, but it will certainly be larger than 100 items, and quite possibly in the thousands or more, depending on how much overhead all of that allocation actually has.
Using the same type as above (and simplifying your code in general, for the sake of illustration), you can do something like this:
use std::collections::HashMap;
#[derive(Eq, PartialEq, Hash)]
struct CostKey {
roots: Vec<usize>,
plans: Vec<Option<RegionPlanCandidate>>,
}
#[derive(Eq, PartialEq, Hash, Clone)]
struct RegionPlanCandidate;
struct Thing {
cost_cache: HashMap<CostKey, i64>,
}
impl Thing {
fn cost_for(&mut self, roots: &[usize], plans: &[Option<RegionPlanCandidate>]) -> i64 {
let key = CostKeyRef { roots, plans };
if let Some(cost) =
self.cost_cache
.iter()
.find_map(|(cost_key, cost)| (&key == cost_key).then(|| *cost))
{
return cost;
}
let rval = 12345;
self.cost_cache.insert(
CostKey {
roots: roots.to_vec(),
plans: plans.to_vec(),
},
12345,
);
rval
}
}
As a stopgap measure, I have created a struct DualKeyHashMap that provides the features I need. I cloned many fragments from the regular HashMap implementation, and it has only the two methods I need.
use hashbrown::raw::RawTable;
use hashbrown::hash_map::DefaultHashBuilder;
use std::borrow::Borrow;
use std::hash::{BuildHasher, Hash};
use std::mem;
/// copy of hashbrown::hash_map::make_hash()
#[cfg_attr(feature = "inline-more", inline)]
pub(crate) fn make_hash<K, Q, S>(hash_builder: &S, val: &Q) -> u64
where
K: Borrow<Q>,
Q: Hash + ?Sized,
S: BuildHasher,
{
use core::hash::Hasher;
let mut state = hash_builder.build_hasher();
val.hash(&mut state);
state.finish()
}
/// copy of hashbrown::hash_map::make_hasher()
#[cfg_attr(feature = "inline-more", inline)]
pub(crate) fn make_hasher<K, Q, V, S>(hash_builder: &S) -> impl Fn(&(Q, V)) -> u64 + '_
where
K: Borrow<Q>,
Q: Hash,
S: BuildHasher,
{
move |val| make_hash::<K, Q, S>(hash_builder, &val.0)
}
/// Ensures that a single closure type is used across all call sites, which in turn prevents
/// multiple instances of functions like RawTable::reserve from being generated
#[cfg_attr(feature = "inline-more", inline)]
fn equivalent_key<Q, K, V>(k: &Q) -> impl Fn(&(K, V)) -> bool + '_
where
K: Borrow<Q>,
Q: ?Sized + Eq,
{
move |x| k.eq(x.0.borrow())
}
//
pub trait AlternateKey<O>: Hash {
fn eq(&self, arg: &O) -> bool;
}
//
pub struct DualKeyHashMap<K, V, S = DefaultHashBuilder> {
hash_builder: S,
base: RawTable<(K, V)>,
}
impl<K: Hash + Eq, V, S: BuildHasher+Default> DualKeyHashMap<K, V, S> {
pub fn new() -> DualKeyHashMap<K, V, S>
{
Self::default()
}
}
impl<K: Hash + Eq, V, S: BuildHasher+Default> Default for DualKeyHashMap<K, V, S> {
fn default() -> DualKeyHashMap<K, V, S>
{
DualKeyHashMap {
hash_builder: Default::default(),
base: Default::default(),
}
}
}
impl<K: Hash + Eq, V, S: BuildHasher> DualKeyHashMap<K, V, S> {
pub fn insert(&mut self, key1: K, val1: V) -> Option<V> {
let hash = make_hash::<K, _, S>(&self.hash_builder, &key1);
//println!("{}", hash);
if let Some((_, item)) = self.base.get_mut(hash, equivalent_key(&key1)) {
Some(mem::replace(item, val1))
} else {
self.base.insert(
hash,
(key1, val1),
make_hasher::<K, _, V, S>(&self.hash_builder),
);
None
}
}
#[inline]
pub fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V>
where
K: Borrow<Q>,
Q: Hash + Eq,
{
// Avoid `Option::map` because it bloats LLVM IR.
match self.get_inner(k) {
Some(&(_, ref v)) => Some(v),
None => None,
}
}
fn get_inner<Q: ?Sized>(&self, k: &Q) -> Option<&(K, V)>
where
K: Borrow<Q>,
Q: Hash + Eq,
{
let hash = make_hash::<K, Q, S>(&self.hash_builder, k);
self.base.get(hash, equivalent_key(k))
}
pub fn get2<M>(&self, k: &M) -> Option<&V>
where
M: AlternateKey<K>,
{
let hash = make_hash::<M, M, S>(&self.hash_builder, k);
match self
.base
.get(hash, |(k2, _)| <M as AlternateKey<K>>::eq(k, &k2))
{
Some(&(_, ref v)) => Some(v),
None => None,
}
}
pub fn len(&self)->usize
{
self.base.len()
}
}
I do not consider this a proper answer since it requires an entirely new hash map implementation which is extremely incomplete.
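For comparison, a different technique that is sometimes used for composite keys on stable Rust, and that is not what either answer above does, is to borrow the owned key as a trait object. The sketch below assumes a unit-struct stand-in for `RegionPlanCandidate` and introduces a hypothetical `KeyView` trait and `lookup` function; it has not been measured against the question's workload.

```rust
use std::borrow::Borrow;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Stand-in for the question's `RegionPlanCandidate`, whose definition is not shown.
#[derive(Clone, PartialEq, Eq, Hash)]
struct RegionPlanCandidate;

/// Hypothetical helper trait: a view of the key data shared by owned and borrowed keys.
trait KeyView {
    fn roots(&self) -> &[usize];
    fn plans(&self) -> &[Option<RegionPlanCandidate>];
}

// The derived `Hash` hashes `roots` then `plans`, and a `Vec` hashes like its
// slice, so this agrees with the `dyn KeyView` impl below.
#[derive(PartialEq, Eq, Hash)]
struct CostKey {
    roots: Vec<usize>,
    plans: Vec<Option<RegionPlanCandidate>>,
}

struct CostKeyRef<'a> {
    roots: &'a [usize],
    plans: &'a [Option<RegionPlanCandidate>],
}

impl KeyView for CostKey {
    fn roots(&self) -> &[usize] { &self.roots }
    fn plans(&self) -> &[Option<RegionPlanCandidate>] { &self.plans }
}

impl<'a> KeyView for CostKeyRef<'a> {
    fn roots(&self) -> &[usize] { self.roots }
    fn plans(&self) -> &[Option<RegionPlanCandidate>] { self.plans }
}

impl<'a> Hash for dyn KeyView + 'a {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.roots().hash(state);
        self.plans().hash(state);
    }
}

impl<'a> PartialEq for dyn KeyView + 'a {
    fn eq(&self, other: &Self) -> bool {
        self.roots() == other.roots() && self.plans() == other.plans()
    }
}

impl<'a> Eq for dyn KeyView + 'a {}

// The owned key can be borrowed as the shared view, which is what `HashMap::get` needs.
impl<'a> Borrow<dyn KeyView + 'a> for CostKey {
    fn borrow(&self) -> &(dyn KeyView + 'a) { self }
}

/// Looks up a cached cost without cloning either slice.
fn lookup(
    cache: &HashMap<CostKey, i64>,
    roots: &[usize],
    plans: &[Option<RegionPlanCandidate>],
) -> Option<i64> {
    let key = CostKeyRef { roots, plans };
    cache.get(&key as &dyn KeyView).copied()
}
```

The slices are only copied into a `CostKey` when a new entry is inserted. The price is a dynamic dispatch per hash and comparison, so whether it beats the `Vec::from` calls would need profiling.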

How can a function conditionally fall back to a trait if another trait is implemented or not?

I am building up a library for generating the minimum perfect hash from a set of keys. The idea is to index the keys online without storing the full dataset in memory. Based on a user requirement, it is possible that skip_next() is not available and I want to fall back to using next(). Although it might be slower based on the speed of the iterator, it simplifies things for a general user.
My idea is to selectively iterate over all the elements generated by an iterator. This code works fine, but it requires a user to implement the trait FastIteration:
#[derive(Debug)]
struct Pixel {
r: Vec<i8>,
g: Vec<i8>,
b: Vec<i8>,
}
#[derive(Debug)]
struct Node {
r: i8,
g: i8,
b: i8,
}
struct PixelIterator<'a> {
pixel: &'a Pixel,
index: usize,
}
impl<'a> IntoIterator for &'a Pixel {
type Item = Node;
type IntoIter = PixelIterator<'a>;
fn into_iter(self) -> Self::IntoIter {
println!("Into &");
PixelIterator {
pixel: self,
index: 0,
}
}
}
impl<'a> Iterator for PixelIterator<'a> {
type Item = Node;
fn next(&mut self) -> Option<Node> {
println!("next &");
let result = match self.index {
0 | 1 | 2 | 3 => Node {
r: self.pixel.r[self.index],
g: self.pixel.g[self.index],
b: self.pixel.b[self.index],
},
_ => return None,
};
self.index += 1;
Some(result)
}
}
trait FastIteration {
fn skip_next(&mut self);
}
impl<'a> FastIteration for PixelIterator<'a> {
fn skip_next(&mut self) {
self.index += 1;
}
}
fn main() {
let p1 = Pixel {
r: vec![11, 21, 31, 41],
g: vec![12, 22, 32, 42],
b: vec![13, 23, 33, 43],
};
let mut index = 0;
let mut it = p1.into_iter();
loop {
if index == p1.r.len() {
break;
}
if index == 1 {
it.skip_next()
} else {
let val = it.next();
println!("{:?}", val);
}
index += 1;
}
}
How can one make the above program fall back to using the normal next() instead of skip_next() based on if the trait FastIteration is implemented or not?
fn fast_iterate<I>(objects: I)
where I: IntoIterator + FastIteration { /* should use skip_next() */ }
fn slow_iterate<I>(objects: I)
where I: IntoIterator { /* should NOT use skip_next(), use next() */ }
As above, one can always write two separate impl but is it possible to do this in one?
This question builds on:
Conditionally implement a Rust trait only if a type constraint is satisfied
Implement rayon `as_parallel_slice` using iterators.
You are looking for the unstable feature specialization:
#![feature(specialization)]
#[derive(Debug)]
struct Example(u8);
impl Iterator for Example {
type Item = u8;
fn next(&mut self) -> Option<u8> {
let v = self.0;
if v > 10 {
None
} else {
self.0 += 1;
Some(v)
}
}
}
trait FastIterator: Iterator {
fn skip_next(&mut self);
}
impl<I: Iterator> FastIterator for I {
default fn skip_next(&mut self) {
println!("step");
self.next();
}
}
impl FastIterator for Example {
fn skip_next(&mut self) {
println!("skip");
self.0 += 1;
}
}
fn main() {
let mut ex = Example(0);
ex.skip_next();
let mut r = 0..10;
r.skip_next();
}
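Since specialization is unstable, a rough stable approximation is a trait with a default method body. This is only a sketch of that swapped-in approach: unlike the blanket impl above, every iterator type has to opt in with its own `impl` line (possibly empty), so it is not a true automatic fallback. The `every_other` helper is made up for illustration.

```rust
/// Same shape as the answer's `FastIterator`, but with a default method body
/// instead of a specialized blanket impl.
trait FastIterator: Iterator {
    /// Default: fall back to `next()` and discard the item.
    fn skip_next(&mut self) {
        self.next();
    }
}

struct Example(u8);

impl Iterator for Example {
    type Item = u8;
    fn next(&mut self) -> Option<u8> {
        let v = self.0;
        if v > 10 {
            None
        } else {
            self.0 += 1;
            Some(v)
        }
    }
}

// Opts in and overrides the default with a cheaper skip.
impl FastIterator for Example {
    fn skip_next(&mut self) {
        self.0 += 1;
    }
}

// Opts in with an empty impl, so the `next()`-based default is used.
impl FastIterator for std::ops::Range<u32> {}

/// Hypothetical helper: keeps every other item and skips the ones in between.
fn every_other<I: FastIterator>(mut it: I) -> Vec<I::Item> {
    let mut out = Vec::new();
    while let Some(item) = it.next() {
        out.push(item);
        it.skip_next();
    }
    out
}
```

Generic code only needs the single `I: FastIterator` bound; whether the fast or the slow path runs is decided by each type's impl.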

Implement IntoIterator for binary tree

I am trying to build a binary tree and write an iterator to traverse values in the tree.
When implementing the IntoIterator trait for my tree nodes I ran into a problem with lifetimes
src\main.rs:43:6: 43:8 error: the lifetime parameter `'a` is not constrained by the impl trait, self type, or predicates [E0207]
src\main.rs:43 impl<'a, T: 'a> IntoIterator for Node<T> {
I understand that I need to specify that NodeIterator will live as long as Node but I am unsure of how to express that
use std::cmp::PartialOrd;
use std::boxed::Box;
struct Node<T: PartialOrd> {
value: T,
left: Option<Box<Node<T>>>,
right: Option<Box<Node<T>>>,
}
struct NodeIterator<'a, T: 'a + PartialOrd> {
current: &'a Node<T>,
parent: Option<&'a Node<T>>,
}
impl<T: PartialOrd> Node<T> {
pub fn insert(&mut self, value: T) {
...
}
}
impl<'a, T: 'a> IntoIterator for Node<T> { // line 43
type Item = T;
type IntoIter = NodeIterator<'a, T>;
fn into_iter(&self) -> Self::IntoIter {
NodeIterator::<'a> {
current: Some(&self),
parent: None
}
}
}
The particular error that you are getting is that 'a should appear on the right of for. Otherwise, how could the compiler know what 'a is?
When implementing IntoIterator you have to decide whether the iterator will consume the container, or whether it'll just produce references into it. At the moment, your setup is inconsistent, and the error message points it out.
In the case of a binary tree, you also have to think about which order you want to produce the values in: traditional orders are depth first (yielding a sorted sequence) and breadth first (exposing the "layers" of the tree). I'll assume depth first as it's the most common one.
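To make the two orders concrete, here is a small recursive sketch, separate from the iterator built below. It assumes the tree is a binary search tree (otherwise in-order output is not sorted); `leaf` and `sample_tree` are helper names invented for the example.

```rust
use std::collections::VecDeque;

struct Node<T> {
    value: T,
    left: Option<Box<Node<T>>>,
    right: Option<Box<Node<T>>>,
}

/// Depth-first, in-order: left subtree, then the node, then the right subtree.
fn in_order<T: Clone>(node: &Node<T>, out: &mut Vec<T>) {
    if let Some(left) = node.left.as_deref() {
        in_order(left, out);
    }
    out.push(node.value.clone());
    if let Some(right) = node.right.as_deref() {
        in_order(right, out);
    }
}

/// Breadth-first: one layer at a time, using a FIFO queue.
fn breadth_first<T: Clone>(root: &Node<T>) -> Vec<T> {
    let mut out = Vec::new();
    let mut queue: VecDeque<&Node<T>> = VecDeque::new();
    queue.push_back(root);
    while let Some(node) = queue.pop_front() {
        out.push(node.value.clone());
        if let Some(left) = node.left.as_deref() {
            queue.push_back(left);
        }
        if let Some(right) = node.right.as_deref() {
            queue.push_back(right);
        }
    }
    out
}

fn leaf(value: i32) -> Node<i32> {
    Node { value, left: None, right: None }
}

/// Builds the tree 3 -> (2 -> (1, _), 5 -> (4, _)).
fn sample_tree() -> Node<i32> {
    Node {
        value: 3,
        left: Some(Box::new(Node { value: 2, left: Some(Box::new(leaf(1))), right: None })),
        right: Some(Box::new(Node { value: 5, left: Some(Box::new(leaf(4))), right: None })),
    }
}
```

On `sample_tree()`, `in_order` visits 1, 2, 3, 4, 5 while `breadth_first` visits 3, 2, 5, 1, 4.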
Let's tackle the case of a consuming iterator first. It's simpler in the sense that we don't have to worry about lifetimes.
#![feature(box_patterns)]
struct Node<T: PartialOrd> {
value: T,
left: Option<Box<Node<T>>>,
right: Option<Box<Node<T>>>,
}
struct NodeIterator<T: PartialOrd> {
stack: Vec<Node<T>>,
next: Option<T>,
}
impl<T: PartialOrd> IntoIterator for Node<T> {
type Item = T;
type IntoIter = NodeIterator<T>;
fn into_iter(self) -> Self::IntoIter {
let mut stack = Vec::new();
let smallest = pop_smallest(self, &mut stack);
NodeIterator { stack: stack, next: Some(smallest) }
}
}
impl<T: PartialOrd> Iterator for NodeIterator<T> {
type Item = T;
fn next(&mut self) -> Option<T> {
if let Some(next) = self.next.take() {
return Some(next);
}
// Re-run `pop_smallest` on the popped node: if it is a right child that still
// has a left subtree, that subtree must be visited before the node's own value.
if let Some(node) = self.stack.pop() {
return Some(pop_smallest(node, &mut self.stack));
}
None
}
}
fn pop_smallest<T: PartialOrd>(node: Node<T>, stack: &mut Vec<Node<T>>) -> T {
let Node { value, left, right } = node;
if let Some(left) = left {
stack.push(Node { value: value, left: None, right: right });
let box left = left;
return pop_smallest(left, stack);
}
if let Some(right) = right {
let box right = right;
stack.push(right);
}
value
}
fn main() {
let root = Node {
value: 3,
left: Some(Box::new(Node { value: 2, left: None, right: None })),
right: Some(Box::new(Node { value: 4, left: None, right: None }))
};
for t in root {
println!("{}", t);
}
}
Now, we can "easily" adapt it to the non-consuming case by sprinkling in the appropriate references:
struct RefNodeIterator<'a, T: PartialOrd + 'a> {
stack: Vec<&'a Node<T>>,
next: Option<&'a T>,
}
impl<'a, T: PartialOrd + 'a> IntoIterator for &'a Node<T> {
type Item = &'a T;
type IntoIter = RefNodeIterator<'a, T>;
fn into_iter(self) -> Self::IntoIter {
let mut stack = Vec::new();
let smallest = pop_smallest_ref(self, &mut stack);
RefNodeIterator { stack: stack, next: Some(smallest) }
}
}
impl<'a, T: PartialOrd + 'a> Iterator for RefNodeIterator<'a, T> {
type Item = &'a T;
fn next(&mut self) -> Option<&'a T> {
if let Some(next) = self.next.take() {
return Some(next);
}
if let Some(node) = self.stack.pop() {
if let Some(ref right) = node.right {
self.stack.push(right);
}
return Some(&node.value);
}
None
}
}
fn pop_smallest_ref<'a, T>(node: &'a Node<T>, stack: &mut Vec<&'a Node<T>>) -> &'a T
where
T: PartialOrd + 'a
{
if let Some(ref left) = node.left {
stack.push(node);
return pop_smallest_ref(left, stack);
}
if let Some(ref right) = node.right {
stack.push(right);
}
&node.value
}
There's a lot to unpack in there; so take your time to digest it. Specifically:
the use of ref in Some(ref right) = node.right is because I don't want to consume node.right, only to obtain a reference inside the Option; the compiler will complain that I cannot move out of a borrowed object without it (so I just follow the complaints),
in stack.push(right), right: &'a Box<Node<T>> and yet stack: Vec<&'a Node<T>>; this is the magic of Deref: Box<T> implements Deref<Target = T>, so the compiler automatically converts the reference as appropriate.
Note: I didn't write this code as-is; instead I just put the first few references where I expect them to be (such as the return type of Iterator) and then let the compiler guide me.
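The two points about `ref` and `Deref` can be seen in isolation in a stripped-down sketch. The simplified `Node` (right children only) and the `right_spine` function are made up for this example.

```rust
struct Node {
    value: i32,
    right: Option<Box<Node>>,
}

/// Collects the values along the chain of right children, borrowing only.
fn right_spine(root: &Node) -> Vec<i32> {
    let mut stack: Vec<&Node> = vec![root];
    let mut out = Vec::new();
    while let Some(node) = stack.pop() {
        out.push(node.value);
        // `ref right` borrows out of the `Option` instead of moving the `Box`.
        if let Some(ref right) = node.right {
            // `right` is a `&Box<Node>`; pushing it into a `Vec<&Node>`
            // relies on deref coercion to `&Node`.
            stack.push(right);
        }
    }
    out
}
```

Without the `ref`, the pattern would try to move the `Box` out of a borrowed node, which is the complaint mentioned above.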

Iterating generically over either a map or a vector of 2-tuples

Because of reasons, I want to define a generic function that can iterate over key-value pairs expressed either as a mapping, or as a vector of 2-tuples (or anything else that satisfies IntoIterator<Item=(K, V)>, where K and V are stringy). Concretely, I want this to work:
use std::collections::HashMap;
fn main() {
let vc = vec![
("a", "foo"),
("b", "bar"),
("c", "baz")
];
operate(&vc);
let mut map = HashMap::new();
map.insert("d", "blurf");
map.insert("e", "quux");
map.insert("f", "xyzzy");
operate(&map);
}
I've got a definition of operate that works for the HashMap, but not for the vector:
fn operate<I, K, V>(x: I)
where I: IntoIterator<Item=(K, V)>,
K: AsRef<str>, V: AsRef<str>
{
for (ref k, ref v) in x {
println!("{}: {}", k.as_ref(), v.as_ref());
}
}
The error message I get is
error[E0271]: type mismatch resolving `<&std::vec::Vec<(&str, &str)> as std::iter::IntoIterator>::Item == (_, _)`
--> test.rs:18:5
|
18 | operate(&vc);
| ^^^^^^^ expected reference, found tuple
|
= note: expected type `&(&str, &str)`
= note: found type `(_, _)`
= note: required by `operate`
and I don't understand it at all. For one thing, it seems like it's backwards, and for another, why am I only getting an error for the Vec and not the HashMap?
The function provided by IntoIterator consumes self.
fn into_iter(self) -> Self::IntoIter
In order to allow the use of IntoIterator without consuming the collection, both Vec and HashMap have implementations of IntoIterator for &'a Vec<T> and &'a HashMap<K,V,S>, respectively. However, they are not quite the same.
For the hash map, each Item is a (&K, &V), which does not pose a problem because the code only assumes that the items are 2-tuples of keys and values that can be viewed as &str through AsRef<str>. And &&str does indeed implement AsRef<str>.
For the vector, each Item is a &T (thus &(K, V) in this case), but because the function is expecting (K, V) as the iterating item, it is currently unable to deal with items of &(K, V).
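The difference in item shape can be checked directly with explicit type annotations. This is just a small sketch; the `demo` function is made up for illustration.

```rust
use std::collections::HashMap;

/// Takes the first item from each borrowed collection and formats it.
fn demo() -> (String, String) {
    let vc = vec![("a", "foo")];
    let mut map = HashMap::new();
    map.insert("d", "blurf");

    // `&Vec<T>` iterates over `&T`: here a reference to the whole tuple.
    let from_vec: &(&str, &str) = IntoIterator::into_iter(&vc).next().unwrap();
    // `&HashMap<K, V>` iterates over `(&K, &V)`: a tuple of references.
    let from_map: (&&str, &&str) = IntoIterator::into_iter(&map).next().unwrap();

    (
        format!("{}: {}", from_vec.0, from_vec.1),
        format!("{}: {}", from_map.0, from_map.1),
    )
}
```

Swapping the two annotations makes the compiler reject the code with the same kind of "expected reference, found tuple" mismatch as in the question.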
As it is, the function works if you move the vector, which yields an IntoIterator where Item = (K, V):
let vc = vec![
("a", "foo"),
("b", "bar"),
("c", "baz")
];
operate(vc);
But what if we want it to work for both collections without consuming any of them? Well, I just devised two solutions.
#1
This one involves hiding the tuple behind a new trait:
/// for stuff that can be turned into a pair of references
trait AsRefPair<K, V> {
fn as_ref_pair(&self) -> (&K, &V);
}
Implementing it for &(K,V) and (&K,&V):
impl<'a, K, V> AsRefPair<K, V> for (&'a K, &'a V) {
fn as_ref_pair(&self) -> (&K, &V) {
(self.0, self.1)
}
}
impl<'a, K, V> AsRefPair<K, V> for &'a (K, V) {
fn as_ref_pair(&self) -> (&K, &V) {
(&self.0, &self.1)
}
}
And now this function works:
fn operate<I, T, K, V>(x: I)
where I: IntoIterator<Item=T>,
T: AsRefPair<K, V>,
K: AsRef<str>, V: AsRef<str>
{
for p in x {
let (ref k, ref v) = p.as_ref_pair();
println!("{}: {}", k.as_ref(), v.as_ref());
}
}
Playground. It might sound a bit crazy at first, but...!
#2
In this one, just stop working with tuples... and start working with key-values!
trait KeyValue<K, V> {
fn key_value(&self) -> (&K, &V) {
(self.key(), self.value())
}
fn key(&self) -> &K;
fn value(&self) -> &V;
}
impl<K, V> KeyValue<K, V> for (K, V) {
fn key(&self) -> &K {
&self.0
}
fn value(&self) -> &V {
&self.1
}
}
impl<'a, K, V> KeyValue<K, V> for &'a (K, V) {
fn key(&self) -> &K {
&self.0
}
fn value(&self) -> &V {
&self.1
}
}
fn operate<I, T, K, V>(x: I)
where I: IntoIterator<Item=T>,
T: KeyValue<K, V>,
K: AsRef<str>, V: AsRef<str>
{
for p in x {
let (ref k, ref v) = p.key_value();
println!("{}: {}", k.as_ref(), v.as_ref());
}
}
Playground. I find this one a bit more idiomatic.
If you pass to the function operate() an iterator instead of a reference to vector, you can use Iterator adaptors to convert Iterator::Item to what you need:
operate(vc.iter().map(|&(ref a, ref b)| (a, b)));

Rust iterators and looking forward (peek/multipeek)

I am trying to use a pattern with iterators in Rust and falling down somewhere, apparently simple.
I would like to iterate through a container and find an element with a predicate [A] (simple), but then look forward using another predicate and get that value [B] and use [B] to mutate [A] in some way. In this case [A] is mutable and [B] can be immutable; this makes no difference to me, only to the borrow checker (rightly).
It would help to understand this with a simple scenario, so I have added a small snippet to let folk see the issue/attempted goal. I have played with itertools and breaking into for/while loops, although I want to remain as idiomatic as possible.
Silly Example scenario
Lookup an even number, find next number that is divisible by 3 and add to the initial number.
#[allow(unused)]
fn is_div_3(num: &u8) -> bool {
num % 3 == 0
}
fn main() {
let mut data: Vec<u8> = (0..100).collect();
let count = data.iter_mut()
.map(|x| {
if *x % 2 == 0 {
// loop through numbers forward to next is_div_3,
// then x = x + that number
}
true
})
.count();
println!("data {:?}, count was {} ", data, count);
}
Sadly I'm a bit late, but here goes.
It's not totally pretty, but it's not as bad as the other suggestion:
let mut data: Vec<u8> = (1..100).collect();
{
let mut mut_items = data.iter_mut();
while let Some(x) = mut_items.next() {
if *x % 2 == 0 {
let slice = mut_items.into_slice();
*x += *slice.iter().find(|&x| x % 3 == 0).unwrap();
mut_items = slice.iter_mut();
}
}
}
println!("{:?}", data);
gives
[1, 5, 3, 10, 5, 15, 7, 17, 9, 22, ...]
as with Matthieu M.'s solution.
The key is to use mut_items.into_slice() to "reborrow" the iterator, effectively producing a local (and thus safe) clone of the iterator.
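If index-based code is acceptable, a similar effect can be had with `split_at_mut`, which hands out the current element and everything after it as two non-overlapping borrows. This is a sketch of an alternative, not the approach above; the function name is made up, and it uses `if let` where the snippet above uses `unwrap()`.

```rust
/// For every even element, adds the next later element divisible by 3 (if any).
fn add_next_multiple_of_3(data: &mut [u8]) {
    for i in 0..data.len() {
        // `head` ends with element `i`; `tail` holds everything after it.
        let (head, tail) = data.split_at_mut(i + 1);
        let x = &mut head[i];
        if *x % 2 == 0 {
            if let Some(n3) = tail.iter().find(|n| **n % 3 == 0) {
                *x += *n3;
            }
        }
    }
}
```

Called on `(1..100).collect::<Vec<u8>>()`, it produces a vector starting with 1, 5, 3, 10, 5, 15, 7, 17, 9, 22.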
Warning: The iterator presented right below is unsafe because it allows one to obtain multiple aliases to a single mutable element; skip to the second part for the corrected version. (It would be alright if the return type contained immutable references).
If you are willing to write your own window iterator, then it becomes quite easy.
First, the iterator in all its gory details:
use std::marker::PhantomData;
struct WindowIterMut<'a, T>
where T: 'a
{
begin: *mut T,
len: usize,
index: usize,
_marker: PhantomData<&'a mut [T]>,
}
impl<'a, T> WindowIterMut<'a, T>
where T: 'a
{
pub fn new(slice: &'a mut [T]) -> WindowIterMut<'a, T> {
WindowIterMut {
begin: slice.as_mut_ptr(),
len: slice.len(),
index: 0,
_marker: PhantomData,
}
}
}
impl<'a, T> Iterator for WindowIterMut<'a, T>
where T: 'a
{
type Item = (&'a mut [T], &'a mut [T]);
fn next(&mut self) -> Option<Self::Item> {
if self.index > self.len { return None; }
let slice: &'a mut [T] = unsafe {
std::slice::from_raw_parts_mut(self.begin, self.len)
};
let result = slice.split_at_mut(self.index);
self.index += 1;
Some(result)
}
}
Invoked on [1, 2, 3] it will return (&[], &[1, 2, 3]) then (&[1], &[2, 3]), ... until (&[1, 2, 3], &[]). In short, it iterates over all the potential partitions of the slice (without shuffling).
Which is safe to use as:
fn main() {
let mut data: Vec<u8> = (1..100).collect();
for (head, tail) in WindowIterMut::new(&mut data) {
if let Some(element) = head.last_mut() {
if *element % 2 == 0 {
if let Some(n3) = tail.iter().filter(|i| *i % 3 == 0).next() {
*element += *n3;
}
}
}
}
println!("{:?}", data);
}
Unfortunately it can also be used as:
fn main() {
let mut data: Vec<u8> = (1..100).collect();
let mut it = WindowIterMut::new(&mut data);
let first_0 = { it.next(); &mut it.next().unwrap().0[0] };
let second_0 = &mut it.next().unwrap().0[0];
println!("{:?} {:?}", first_0 as *const _, second_0 as *const _);
}
which, when run, prints: 0x7f73a8435000 0x7f73a8435000, showing that both mutable references alias the same element.
Since we cannot get rid of aliasing, we need to get rid of mutability; or at least defer to interior mutability (Cell here since u8 is Copy).
Fortunately, Cell has no runtime cost, but it does cost a bit in ergonomics (all those .get() and .set()).
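As a small aside on why `Cell` tolerates aliasing, here is a sketch using `Cell::from_mut` and `as_slice_of_cells` from the standard library to view an existing `&mut [u8]` as shared cells. The function name is made up for illustration.

```rust
use std::cell::Cell;

/// Adds the current value of the first element to every element, including itself.
fn add_first_to_all(data: &mut [u8]) {
    // View the exclusive slice as a shared slice of cells.
    let cells: &[Cell<u8>] = Cell::from_mut(data).as_slice_of_cells();
    if let Some(first) = cells.first() {
        for cell in cells {
            // `first` and `cell` refer to the same element on the first pass;
            // with `Cell` that aliasing is allowed.
            cell.set(cell.get() + first.get());
        }
    }
}
```

On `[1, 2, 3]` the first element doubles to 2 before the others read it, so the result is `[2, 4, 5]`.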
I take the opportunity to make the iterator slightly more generic too, and rename it, since Window is already the name of a different concept.
struct FingerIter<'a, T>
where T: 'a
{
begin: *const T,
len: usize,
index: usize,
_marker: PhantomData<&'a [T]>,
}
impl<'a, T> FingerIter<'a, T>
where T: 'a
{
pub fn new(slice: &'a [T]) -> FingerIter<'a, T> {
FingerIter {
begin: slice.as_ptr(),
len: slice.len(),
index: 0,
_marker: PhantomData,
}
}
}
impl<'a, T> Iterator for FingerIter<'a, T>
where T: 'a
{
type Item = (&'a [T], &'a T, &'a [T]);
fn next(&mut self) -> Option<Self::Item> {
if self.index >= self.len { return None; }
let slice: &'a [T] = unsafe {
std::slice::from_raw_parts(self.begin, self.len)
};
self.index += 1;
let result = slice.split_at(self.index);
Some((&result.0[0..self.index-1], result.0.last().unwrap(), result.1))
}
}
We use it as a building block:
use std::cell::Cell;
fn main() {
let data: Vec<Cell<u8>> = (1..100).map(|i| Cell::new(i)).collect();
for (_, element, tail) in FingerIter::new(&data) {
if element.get() % 2 == 0 {
if let Some(n3) = tail.iter().filter(|i| i.get() % 3 == 0).next() {
element.set(element.get() + n3.get());
}
}
}
let data: Vec<u8> = data.iter().map(|cell| cell.get()).collect();
println!("{:?}", data);
}
On the playpen this prints: [1, 5, 3, 10, 5, 15, 7, 17, 9, 22, ...], which seems correct.