Iterating generically over either a map or a vector of 2-tuples - iterator

Because of reasons, I want to define a generic function that can iterate over key-value pairs expressed either as a mapping, or as a vector of 2-tuples (or anything else that satisfies IntoIterator<Item=(K, V)>, where K and V are stringy). Concretely, I want this to work:
use std::collections::HashMap;
fn main() {
let vc = vec![
("a", "foo"),
("b", "bar"),
("c", "baz")
];
operate(&vc);
let mut map = HashMap::new();
map.insert("d", "blurf");
map.insert("e", "quux");
map.insert("f", "xyzzy");
operate(&map);
}
I've got a definition of operate that works for the HashMap, but not for the vector:
fn operate<I, K, V>(x: I)
where I: IntoIterator<Item=(K, V)>,
K: AsRef<str>, V: AsRef<str>
{
for (ref k, ref v) in x {
println!("{}: {}", k.as_ref(), v.as_ref());
}
}
The error message I get is
error[E0271]: type mismatch resolving `<&std::vec::Vec<(&str, &str)> as std::iter::IntoIterator>::Item == (_, _)`
--> test.rs:18:5
|
18 | operate(&vc);
| ^^^^^^^ expected reference, found tuple
|
= note: expected type `&(&str, &str)`
= note: found type `(_, _)`
= note: required by `operate`
and I don't understand it at all. For one thing, it seems like it's backwards, and for another, why am I only getting an error for the Vec and not the HashMap?

The function provided by IntoIterator consumes self.
fn into_iter(self) -> Self::IntoIter
In order to allow the use of IntoIterator without consuming the collection, both Vec and HashMap have implementations of IntoIterator for &'a Vec<T> and &'a HashMap<K,V,S>, respectively. However, they are not quite the same.
For the hash map, each Item is a (&K, &V), which does not pose a problem because the code only assumes that the items are 2-tuples whose keys and values implement AsRef<str>. And &&str does indeed implement AsRef<str> (a reference to any AsRef<str> type does).
For the vector, each Item is a &T (thus &(K, V) in this case), but because the function is expecting (K, V) as the iterating item, it is currently unable to deal with items of &(K, V).
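To make the difference concrete, here is a small sketch whose return types spell out what each borrowed collection yields. The helper names first_from_vec and first_from_map are made up for illustration.

```rust
use std::collections::HashMap;

// Borrowing a Vec yields references to whole tuples: `&(K, V)`.
fn first_from_vec<'a, K, V>(v: &'a Vec<(K, V)>) -> Option<&'a (K, V)> {
    IntoIterator::into_iter(v).next()
}

// Borrowing a HashMap yields tuples of references: `(&K, &V)`.
fn first_from_map<'a, K, V>(m: &'a HashMap<K, V>) -> Option<(&'a K, &'a V)> {
    IntoIterator::into_iter(m).next()
}
```

Only the second shape matches a bound of the form IntoIterator<Item=(K, V)>, which is why the call with &map compiles and the call with &vc does not.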
As it is, the function works if you move the vector, which yields an IntoIterator where Item = (K, V):
let vc = vec![
("a", "foo"),
("b", "bar"),
("c", "baz")
];
operate(vc);
But what if we want it to work for both collections without consuming either of them? Well, I just devised two solutions.
#1
This one involves hiding the tuple behind a new trait:
/// for stuff that can be turned into a pair of references
trait AsRefPair<K, V> {
fn as_ref_pair(&self) -> (&K, &V);
}
Implementing it for &(K,V) and (&K,&V):
impl<'a, K, V> AsRefPair<K, V> for (&'a K, &'a V) {
fn as_ref_pair(&self) -> (&K, &V) {
(self.0, self.1)
}
}
impl<'a, K, V> AsRefPair<K, V> for &'a (K, V) {
fn as_ref_pair(&self) -> (&K, &V) {
(&self.0, &self.1)
}
}
And now this function works:
fn operate<I, T, K, V>(x: I)
where I: IntoIterator<Item=T>,
T: AsRefPair<K, V>,
K: AsRef<str>, V: AsRef<str>
{
for p in x {
let (ref k, ref v) = p.as_ref_pair();
println!("{}: {}", k.as_ref(), v.as_ref());
}
}
Playground. It might sound a bit crazy at first, but...!
#2
In this one, just stop working with tuples... and start working with key-values!
trait KeyValue<K, V> {
fn key_value(&self) -> (&K, &V) {
(self.key(), self.value())
}
fn key(&self) -> &K;
fn value(&self) -> &V;
}
impl<K, V> KeyValue<K, V> for (K, V) {
fn key(&self) -> &K {
&self.0
}
fn value(&self) -> &V {
&self.1
}
}
impl<'a, K, V> KeyValue<K, V> for &'a (K, V) {
fn key(&self) -> &K {
&self.0
}
fn value(&self) -> &V {
&self.1
}
}
fn operate<I, T, K, V>(x: I)
where I: IntoIterator<Item=T>,
T: KeyValue<K, V>,
K: AsRef<str>, V: AsRef<str>
{
for p in x {
let (ref k, ref v) = p.key_value();
println!("{}: {}", k.as_ref(), v.as_ref());
}
}
Playground. I find this one a bit more idiomatic.

If you pass to the function operate() an iterator instead of a reference to vector, you can use Iterator adaptors to convert Iterator::Item to what you need:
operate(vc.iter().map(|&(ref a, ref b)| (a, b)));
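A minimal end-to-end sketch of this adaptor approach, assuming a variant of operate (called render here, a made-up name) that returns the formatted lines instead of printing them, so the result is easy to check:

```rust
use std::collections::HashMap;

// Same bounds as the original `operate`, but collects the lines.
fn render<I, K, V>(pairs: I) -> Vec<String>
where
    I: IntoIterator<Item = (K, V)>,
    K: AsRef<str>,
    V: AsRef<str>,
{
    pairs
        .into_iter()
        .map(|(k, v)| format!("{}: {}", k.as_ref(), v.as_ref()))
        .collect()
}

fn render_vec(vc: &[(&str, &str)]) -> Vec<String> {
    // `iter()` yields `&(&str, &str)`; the closure reshapes each item
    // into a tuple of references, `(&&str, &&str)`.
    render(vc.iter().map(|(k, v)| (k, v)))
}

fn render_map(map: &HashMap<&str, &str>) -> Vec<String> {
    // `&HashMap` already yields `(&K, &V)` tuples, so no adaptor is needed.
    let mut lines = render(map);
    lines.sort(); // HashMap iteration order is unspecified
    lines
}
```

Neither helper consumes its collection, and the generic function itself stays exactly as strict as in the question.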


What technique should I use to create a Rust HashMap composite key that is hard to construct because it requires cloning multiple elements?

I am doing some dynamic programming, and I'd like to store already-computed values in a HashMap. Unfortunately, the key is a composite value, and somewhat expensive to construct:
#[derive(Eq, PartialEq, Hash)]
struct CostKey {
roots: Vec<usize>,
plans: Vec<Option<RegionPlanCandidate>>,
}
//used like
pub(super) fn cost_for(
&self,
roots: &[usize],
plans: &[Option<RegionPlanCandidate>],
) -> PlanCostLog {
let key = CostKey {
roots:Vec::from(roots),
plans:Vec::from(plans),
};
if let Some(cost) = self.cost_cache.borrow().get(&key) {
return (*cost).clone();
}
...
let rval = PlanCostLog::CrackSum(cost_log);
self.cost_cache.borrow_mut().insert(key, rval.clone());
rval
}
Even with this expensive implementation, I was able to cut the compute time for one of my examples by half. But cargo flamegraph shows that I'm still spending a non-trivial amount of time on the Vec::from calls.
If the key was not composite, and was just a reference, then the .raw_entry_mut().from_key(&key) would apply, but the nature of my key is problematic.
Theoretically, a map should be able to check Eq and Hash using just the borrowed roots and plans but I am not sure how to accomplish it with the existing APIs. How can I speed up the gets and only clone the slices when I need to insert?
Given your types, there isn't a huge amount you can do to improve the efficiency of HashMap::get here.
If your types were simpler, the way you could try to do this is to create another type which doesn't own its data, but hashes the same and can be compared for equality with CostKey, something like this:
#[derive(Eq, PartialEq, Hash)]
struct CostKeyRef<'a> {
roots: &'a [usize],
plans: &'a [Option<RegionPlanCandidate>],
}
impl<'a> PartialEq<CostKey> for CostKeyRef<'a> {
fn eq(&self, other: &CostKey) -> bool {
self.roots == &other.roots && self.plans == &other.plans
}
}
However, a problem arises when you try to implement Borrow<CostKeyRef<'a>> for CostKey. This is required for various HashMap methods, but can't be implemented because the types contain two fields. There isn't a way to coerce a &CostKey into a &CostKeyRef because their layouts are just incompatible.
You may be able to alter your types so that this is possible, but this is not advisable for a Rust beginner as it would require a good understanding of how data and, in particular, references and fat pointers are laid out in memory.
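For completeness, one commonly cited workaround that avoids any layout tricks is to borrow the key as a trait object instead of as CostKeyRef. This is not what the answer above proposes; it is a sketch under assumptions: Plan is a stand-in for RegionPlanCandidate, and KeyView and cached_cost are hypothetical names. The hand-written Hash has to stay in sync with the derived Hash on CostKey, and every lookup pays for dynamic dispatch.

```rust
use std::borrow::Borrow;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical stand-in for the question's `RegionPlanCandidate`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Plan(pub u32);

#[derive(PartialEq, Eq, Hash)]
pub struct CostKey {
    roots: Vec<usize>,
    plans: Vec<Option<Plan>>,
}

pub struct CostKeyRef<'a> {
    roots: &'a [usize],
    plans: &'a [Option<Plan>],
}

// A view that both the owned and the borrowed key can provide.
pub trait KeyView {
    fn roots(&self) -> &[usize];
    fn plans(&self) -> &[Option<Plan>];
}

impl KeyView for CostKey {
    fn roots(&self) -> &[usize] { &self.roots }
    fn plans(&self) -> &[Option<Plan>] { &self.plans }
}

impl<'a> KeyView for CostKeyRef<'a> {
    fn roots(&self) -> &[usize] { self.roots }
    fn plans(&self) -> &[Option<Plan>] { self.plans }
}

// Must hash like the derived `Hash` on `CostKey`: fields in declaration
// order, relying on `Vec<T>` hashing the same way as `[T]`.
impl<'a> Hash for dyn KeyView + 'a {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.roots().hash(state);
        self.plans().hash(state);
    }
}

impl<'a> PartialEq for dyn KeyView + 'a {
    fn eq(&self, other: &Self) -> bool {
        self.roots() == other.roots() && self.plans() == other.plans()
    }
}

impl<'a> Eq for dyn KeyView + 'a {}

// The owned key can be borrowed as the trait object.
impl<'a> Borrow<dyn KeyView + 'a> for CostKey {
    fn borrow(&self) -> &(dyn KeyView + 'a) {
        self
    }
}

// Looks up with borrowed slices and clones them only on a cache miss.
pub fn cached_cost(
    cache: &mut HashMap<CostKey, i64>,
    roots: &[usize],
    plans: &[Option<Plan>],
    compute: impl FnOnce() -> i64,
) -> i64 {
    let probe = CostKeyRef { roots, plans };
    if let Some(cost) = cache.get(&probe as &dyn KeyView) {
        return *cost;
    }
    let cost = compute();
    cache.insert(CostKey { roots: roots.to_vec(), plans: plans.to_vec() }, cost);
    cost
}
```

Whether this beats cloning the slices on every lookup would need to be measured on the real workload.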
So what can you do?
Well, if your hash map is relatively small, you can use a linear search over its entries instead of a hashed lookup. Exactly how small "relatively small" is will need to be discovered through measurement, but it will certainly be larger than 100 items, and quite possibly in the thousands or more, depending on how much overhead all of that allocation actually has.
Using the CostKeyRef type from above (and simplifying your code in general, for the sake of illustration), you can do something like this:
use std::collections::HashMap;
#[derive(Eq, PartialEq, Hash)]
struct CostKey {
roots: Vec<usize>,
plans: Vec<Option<RegionPlanCandidate>>,
}
#[derive(Eq, PartialEq, Hash, Clone)]
struct RegionPlanCandidate;
struct Thing {
cost_cache: HashMap<CostKey, i64>,
}
impl Thing {
fn cost_for(&mut self, roots: &[usize], plans: &[Option<RegionPlanCandidate>]) -> i64 {
let key = CostKeyRef { roots, plans };
if let Some(cost) =
self.cost_cache
.iter()
.find_map(|(cost_key, cost)| (&key == cost_key).then(|| *cost))
{
return cost;
}
let rval = 12345;
self.cost_cache.insert(
CostKey {
roots: roots.to_vec(),
plans: plans.to_vec(),
},
12345,
);
rval
}
}
As a stopgap measure, I have created a struct DualKeyHashMap that provides the features I need. I cloned many fragments from the regular HashMap implementation, and it has only the two methods I need.
use hashbrown::raw::RawTable;
use hashbrown::hash_map::DefaultHashBuilder;
use std::borrow::Borrow;
use std::hash::{BuildHasher, Hash};
use std::mem;
/// copy of hashbrown::hash_map::make_hash()
#[cfg_attr(feature = "inline-more", inline)]
pub(crate) fn make_hash<K, Q, S>(hash_builder: &S, val: &Q) -> u64
where
K: Borrow<Q>,
Q: Hash + ?Sized,
S: BuildHasher,
{
use core::hash::Hasher;
let mut state = hash_builder.build_hasher();
val.hash(&mut state);
state.finish()
}
/// copy of hashbrown::hash_map::make_hasher()
#[cfg_attr(feature = "inline-more", inline)]
pub(crate) fn make_hasher<K, Q, V, S>(hash_builder: &S) -> impl Fn(&(Q, V)) -> u64 + '_
where
K: Borrow<Q>,
Q: Hash,
S: BuildHasher,
{
move |val| make_hash::<K, Q, S>(hash_builder, &val.0)
}
/// Ensures that a single closure type across uses of this which, in turn prevents multiple
/// instances of any functions like RawTable::reserve from being generated
#[cfg_attr(feature = "inline-more", inline)]
fn equivalent_key<Q, K, V>(k: &Q) -> impl Fn(&(K, V)) -> bool + '_
where
K: Borrow<Q>,
Q: ?Sized + Eq,
{
move |x| k.eq(x.0.borrow())
}
//
pub trait AlternateKey<O>: Hash {
fn eq(&self, arg: &O) -> bool;
}
//
pub struct DualKeyHashMap<K, V, S = DefaultHashBuilder> {
hash_builder: S,
base: RawTable<(K, V)>,
}
impl<K: Hash + Eq, V, S: BuildHasher+Default> DualKeyHashMap<K, V, S> {
pub fn new() -> DualKeyHashMap<K, V, S>
{
Self::default()
}
}
impl<K: Hash + Eq, V, S: BuildHasher+Default> Default for DualKeyHashMap<K, V, S> {
fn default() -> DualKeyHashMap<K, V, S>
{
DualKeyHashMap {
hash_builder: Default::default(),
base: Default::default(),
}
}
}
impl<K: Hash + Eq, V, S: BuildHasher> DualKeyHashMap<K, V, S> {
pub fn insert(&mut self, key1: K, val1: V) -> Option<V> {
let hash = make_hash::<K, _, S>(&self.hash_builder, &key1);
//println!("{}", hash);
if let Some((_, item)) = self.base.get_mut(hash, equivalent_key(&key1)) {
Some(mem::replace(item, val1))
} else {
self.base.insert(
hash,
(key1, val1),
make_hasher::<K, _, V, S>(&self.hash_builder),
);
None
}
}
#[inline]
pub fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V>
where
K: Borrow<Q>,
Q: Hash + Eq,
{
// Avoid `Option::map` because it bloats LLVM IR.
match self.get_inner(k) {
Some(&(_, ref v)) => Some(v),
None => None,
}
}
fn get_inner<Q: ?Sized>(&self, k: &Q) -> Option<&(K, V)>
where
K: Borrow<Q>,
Q: Hash + Eq,
{
let hash = make_hash::<K, Q, S>(&self.hash_builder, k);
self.base.get(hash, equivalent_key(k))
}
pub fn get2<M>(&self, k: &M) -> Option<&V>
where
M: AlternateKey<K>,
{
let hash = make_hash::<M, M, S>(&self.hash_builder, k);
match self
.base
.get(hash, |(k2, _)| <M as AlternateKey<K>>::eq(k, &k2))
{
Some(&(_, ref v)) => Some(v),
None => None,
}
}
pub fn len(&self)->usize
{
self.base.len()
}
}
I do not consider this a proper answer since it requires an entirely new hash map implementation which is extremely incomplete.

How do I perform iterator computations over iterators of Results without collecting to a temporary vector?

I'm looking for a way to eliminate the temporary vector allocation in this example:
fn doit<T: Iterator<Item = Result<i32, &'static str>>>(name: &str, iter: T) {
println!(
"{}: {:?}",
name,
iter.collect::<Result<Vec<_>, _>>()
.map(|v| v.into_iter().min())
);
}
fn main() {
let without_errors = vec![Ok(1), Ok(2), Ok(3)];
let with_errors = vec![Ok(1), Err("error"), Ok(2)];
doit("without errors", without_errors.into_iter());
doit("with errors", with_errors.into_iter());
}
This is a variation of the iterator with error handling theme, except that I don't want to create a collection (so collect() doesn't quite do the job), but I want to perform further operations on the elements being iterated over.
Note that this gives the wrong result because Ok is less than Err:
fn doit<T: Iterator<Item = Result<i32, &'static str>>>(name: &str, iter: T) {
println!("{}: {:?}", name, iter.min());
}
It would give the right result for max() by accident, but it would not stop iterating on the first error.
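The pitfall is easy to reproduce in isolation. The sketch below uses made-up helper names (naive_min, naive_max) and relies only on the ordering that Result derives, in which every Ok compares less than every Err:

```rust
type Outcome = Result<i32, &'static str>;

// Compares the `Result` values themselves, not the numbers inside them.
fn naive_min(items: Vec<Outcome>) -> Option<Outcome> {
    items.into_iter().min()
}

// Returns the error, but only after walking the whole input.
fn naive_max(items: Vec<Outcome>) -> Option<Outcome> {
    items.into_iter().max()
}
```

For the input [Ok(1), Err("error"), Ok(2)], naive_min silently returns Some(Ok(1)) and the error is lost, while naive_max returns Some(Err("error")).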
Iterator::try_fold provides the framework for what you need, and it's available since Rust 1.27 (Playground):
fn fold_ok<I, T, E, F>(mut iter: I, f: F) -> Result<Option<T>, E>
where
I: Iterator<Item = Result<T, E>>,
T: Ord,
F: Fn(T, T) -> T,
{
iter.try_fold(None, |r, i| {
let i = i?;
Ok(Some(if let Some(r) = r { f(r, i) } else { i }))
})
}
fn main() {
let without_errors = vec![Ok(1), Ok(2), Ok(3)];
let with_errors = vec![Ok(1), Err("error"), Ok(2)];
fn doit<'r, T>(name: &str, iter: T)
where
T: Iterator<Item = &'r Result<i32, &'static str>> + Clone,
{
println!("{}: {:?}", name, fold_ok(iter.cloned(), ::std::cmp::min));
}
doit("without errors", without_errors.iter());
doit("with errors", with_errors.iter());
}
Before that, I think your only option is manually iterating (Playground)
fn fold_ok<I, T, E, F>(mut iter: I, f: F) -> Result<Option<T>, E>
where
I: Iterator<Item = Result<T, E>>,
T: Ord,
F: Fn(T, T) -> T,
{
let mut result = match iter.next() {
None => return Ok(None),
Some(r) => r?,
};
for item in iter {
result = f(result, item?);
}
Ok(Some(result))
}
fn main() {
let without_errors = vec![Ok(1), Ok(2), Ok(3)];
let with_errors = vec![Ok(1), Err("error"), Ok(2)];
fn doit<'r, T>(name: &str, iter: T)
where
T: Iterator<Item = &'r Result<i32, &'static str>> + Clone,
{
println!(
"{}: {:?}",
name,
fold_ok(iter.clone().cloned(), ::std::cmp::min)
);
}
doit("without errors", without_errors.iter());
doit("with errors", with_errors.iter());
}
"Lifting" a function to handle an iterator of results is a fairly common pattern and, as usual, itertools has a solution — process_results:
use itertools; // 0.8.0
fn doit(name: &str, iter: impl Iterator<Item = Result<i32, &'static str>>) {
let min = itertools::process_results(iter, |i| i.min());
println!("{}: {:?}", name, min);
}
This code began life as ResultShunt in the standard library before being extracted to itertools. It's what underlies the implementation of sum and product for iterators of Result.
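If pulling in itertools is not an option, the idea can be approximated with the standard library alone. This is a rough sketch of the concept, not the itertools implementation; process_ok is a hypothetical name. It hands the consumer an iterator over the Ok values and remembers the first Err it meets:

```rust
use std::iter;

pub fn process_ok<I, T, E, R>(
    mut results: I,
    consume: impl FnOnce(&mut dyn Iterator<Item = T>) -> R,
) -> Result<R, E>
where
    I: Iterator<Item = Result<T, E>>,
{
    let mut first_error = None;
    // Yields the unwrapped `Ok` values and ends at the first `Err`.
    let mut ok_values = iter::from_fn(|| {
        if first_error.is_some() {
            return None;
        }
        match results.next() {
            Some(Ok(value)) => Some(value),
            Some(Err(error)) => {
                first_error = Some(error);
                None
            }
            None => None,
        }
    });
    let output = consume(&mut ok_values);
    drop(ok_values); // release the borrows of `results` and `first_error`
    match first_error {
        Some(error) => Err(error),
        None => Ok(output),
    }
}
```

With this, process_ok(iter, |it| it.min()) plays the same role as the process_results call above, and any other consuming adaptor can be substituted for min.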
It's possible to abuse collect() for this:
pub struct Min<T> {
value: Option<T>,
}
impl<T> Min<T> {
pub fn value(self) -> Option<T> {
self.value
}
}
impl<T> std::iter::FromIterator<T> for Min<T>
where
T: Ord,
{
fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Self {
let mut iter = iter.into_iter();
match iter.next() {
None => Min { value: None },
Some(mut value) => {
for i in iter {
value = std::cmp::min(value, i);
}
Min { value: Some(value) }
}
}
}
}
This can be used via iter.collect::<Min<_>>().value(). This is a lot of machinery, and I don't see a way to abstract over it (so that you only need to supply std::cmp::min or some other semigroup operation).
I didn't look in the direction of Iterator::try_fold, which provides most of the machinery.
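As a small illustration of that machinery, the sketch below (min_ok is a made-up name) computes the minimum with try_fold. Because try_fold returns as soon as the closure produces an Err, nothing after the first error is pulled from the underlying iterator:

```rust
fn min_ok<I, T, E>(iter: I) -> Result<Option<T>, E>
where
    I: IntoIterator<Item = Result<T, E>>,
    T: Ord,
{
    iter.into_iter().try_fold(None, |acc: Option<T>, item| {
        // `?` turns the first `Err` into the result of the whole fold.
        let item = item?;
        Ok(Some(match acc {
            Some(best) => best.min(item),
            None => item,
        }))
    })
}
```

Wrapping the input in inspect with a counter is an easy way to confirm that the elements after the first Err are never visited.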

Mutable iterator for Vec<Vec<(K, V)>>

I am trying to create a mutable iterator for a vector of type: Vec<Vec<(K, V)>>
The iterator code:
pub struct IterMut<'a, K: 'a, V: 'a> {
iter: &'a mut Vec<Vec<(K, V)>>,
ix: usize,
inner_ix: usize,
}
impl<'a, K, V> Iterator for IterMut<'a, K, V> {
type Item = (&'a K, &'a mut V);
#[inline]
fn next(&mut self) -> Option<(&'a K, &'a mut V)> {
while self.iter.len() < self.ix {
while self.iter[self.ix].len() < self.inner_ix {
self.inner_ix += 1;
let (ref k, ref mut v) = self.iter[self.ix][self.inner_ix];
return Some((&k, &mut v));
}
self.ix += 1;
}
return None;
}
}
The error I get is:
error[E0495]: cannot infer an appropriate lifetime for lifetime parameter in function call due to conflicting requirements
--> src/main.rs:16:42
|
16 | let (ref k, ref mut v) = self.iter[self.ix][self.inner_ix];
| ^^^^^^^^^^^^^^^^^^
|
help: consider using an explicit lifetime parameter as shown: fn next(&'a mut self) -> Option<(&'a K, &'a mut V)>
--> src/main.rs:11:5
|
11 | fn next(&mut self) -> Option<(&'a K, &'a mut V)> {
| ^
Apparently I have lifetime problems, but I don't know how to tell the compiler that this should work.
Is this how you should implement the mutable iterator or is there a better way?
When debugging cryptic error messages, I've found it easier to try and isolate the issue as much as possible.
The first step is to break the expression into its essential constituents, let's start by splitting the indexing steps:
fn next(&mut self) -> Option<(&'a K, &'a mut V)> {
while self.iter.len() < self.ix {
while self.iter[self.ix].len() < self.inner_ix {
self.inner_ix += 1;
let outer: &'a mut Vec<_> = self.iter;
let inner: &'a mut Vec<_> = &mut outer[self.ix];
let (ref k, ref mut v) = inner[self.inner_ix];
return Some((&k, &mut v));
}
self.ix += 1;
}
return None;
}
The Index trait assumes that the lifetime of its output is linked to that of its receiver, so to get a 'a lifetime we need the receiver to have a &'a lifetime, and it propagates upward, leading to the above code.
However there's an issue here: let outer: &'a mut Vec<_> = self.iter; will not compile because mutable references are not Copy.
So, how does one get a mutable reference from a mutable reference (which must be possible since IndexMut gets a mutable reference)?
One uses re-borrowing: let outer: &'a mut Vec<_> = &mut *self.iter;.
And, oh:
error[E0495]: cannot infer an appropriate lifetime for borrow expression due to conflicting requirements
--> <anon>:16:45
|
16 | let outer: &'a mut Vec<_> = &mut *self.iter;
| ^^^^^^^^^^^^^^^
|
The reborrowed reference is not valid for 'a, it's valid only for the (unnamed) lifetime of self!
Why Rust? Why?
Because doing otherwise would be unsafe.
&mut T is guaranteed NOT to be aliasing, however your method could create aliasing references (if you forgot to advance the index):
#[inline]
fn next(&mut self) -> Option<(&'a K, &'a mut V)> {
let (ref k, ref mut v) = self.iter[self.ix][self.inner_ix];
return Some((&k, &mut v));
}
And even if you don't, there's no guarantee that you don't have a rewind method that would allow "stepping back".
TL;DR: You were about to step on a landmine, you were steered toward Stack Overflow instead ;)
Alright, but how do you implement the iterator?
Well, using iterators, of course. As Shepmaster (briefly) answers, there is the equivalent in the standard library already in the guise of FlatMap. The trick is to use existing iterators for the nitty-gritty details!
Something like:
use std::slice::IterMut;
pub struct MyIterMut<'a, K: 'a, V: 'a> {
outer: IterMut<'a, Vec<(K, V)>>,
inner: IterMut<'a, (K, V)>,
}
Then you consume from inner as long as it provides items, and when empty you refill it from outer.
impl<'a, K, V> MyIterMut<'a, K, V> {
fn new(v: &'a mut Vec<Vec<(K, V)>>) -> MyIterMut<'a, K, V> {
let mut outer = v.iter_mut();
let inner = outer.next()
.map(|v| v.iter_mut())
.unwrap_or_else(|| (&mut []).iter_mut());
MyIterMut { outer: outer, inner: inner }
}
}
impl<'a, K, V> Iterator for MyIterMut<'a, K, V> {
type Item = (&'a K, &'a mut V);
#[inline]
fn next(&mut self) -> Option<(&'a K, &'a mut V)> {
loop {
match self.inner.next() {
Some(r) => return Some((&r.0, &mut r.1)),
None => (),
}
match self.outer.next() {
Some(v) => self.inner = v.iter_mut(),
None => return None,
}
}
}
}
A quick test case:
fn main() {
let mut v = vec![
vec![(1, "1"), (2, "2")],
vec![],
vec![(3, "3")]
];
let iter = MyIterMut::new(&mut v);
let c: Vec<_> = iter.collect();
println!("{:?}", c);
}
Prints:
[(1, "1"), (2, "2"), (3, "3")]
as expected, so it's not completely broken, but I wish I did not have to rely on the "&mut [] is 'static" trick (i.e., I wish that std::slice::IterMut implemented Default).
You've provided no reason that you are reimplementing the standard Iterator::flat_map, so I'd just use that and another map to remove the mutability you don't need:
fn main() {
let mut a: Vec<Vec<(u8, u8)>> = Default::default();
let c = a.iter_mut()
.flat_map(|x| x.iter_mut())
.map(|&mut (ref a, ref mut b)| (a, b))
.count();
println!("{}", c);
}
Once you have that, you can just return the iterator in one of the many ways.
#[derive(Debug, Default)]
struct Thing<K, V>(Vec<Vec<(K, V)>>);
impl<K, V> Thing<K, V> {
fn iter_mut<'a>(&'a mut self) -> Box<dyn Iterator<Item = (&'a K, &'a mut V)> + 'a> {
Box::new(self.0
.iter_mut()
.flat_map(|x| x.iter_mut())
.map(|&mut (ref a, ref mut b)| (a, b)))
}
}
fn main() {
let mut a = Thing::<u8, u8>::default();
let c = a.iter_mut().count();
println!("{}", c);
}
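Another of those ways, on compilers that support impl Trait in return position, is to return the adaptor chain without boxing it. A minimal sketch, using a hypothetical wrapper named Nested in place of Thing:

```rust
pub struct Nested<K, V>(pub Vec<Vec<(K, V)>>);

impl<K, V> Nested<K, V> {
    // The concrete iterator type stays hidden, with no heap allocation.
    pub fn iter_mut<'a>(&'a mut self) -> impl Iterator<Item = (&'a K, &'a mut V)> + 'a {
        self.0
            .iter_mut()
            .flat_map(|inner| inner.iter_mut())
            // Downgrade the key to a shared reference, keep the value mutable.
            .map(|(k, v)| (&*k, v))
    }
}
```

Callers can then write for (k, v) in nested.iter_mut() { ... } and modify the values in place, while the keys stay read-only.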

An iterator adaptor implementing an SQL-like RIGHT OUTER JOIN using a HashMap

I'm trying to extend bluss's rust-itertools with SQL-like join iterators. I encountered a particular problem with RIGHT OUTER JOIN using a hash join strategy (the strategy itself is actually very simple).
The iterator adaptor struct takes 2 input iterators of which the second (the right) is loaded into the HashMap. The iteration works as follows:
The item from the left iterator is matched against the map - in case of a match return both items
When the left iterator is exhausted, return the non-matched values from the map
The problem is with the second part, where I tried to store the map's Values iterator along with the map in order to keep its iteration state. But as I learned in this answer, that's not possible in Rust.
Unfortunately I have no idea how it could be done otherwise.
Here is the complete code for the INNER JOIN adaptor, which does the first part:
use std::collections::HashMap;
use std::hash::Hash;
pub struct HashJoinInner<I, K, V0, V1> where
I: Iterator<Item=(K, V0)>,
K: Hash + Eq,
V1: Clone,
{
left: I,
right: HashMap<K, V1>,
}
impl<I, K, V0, V1> HashJoinInner<I, K, V0, V1> where
I: Iterator<Item=(K, V0)>,
K: Hash + Eq,
V1: Clone,
{
/// Create a `HashJoinInner` iterator.
pub fn new<J>(l: I, r: J) -> Self
where J: Iterator<Item=(K, V1)>
{
let mut hm: HashMap<K, V1> = HashMap::new();
for (k, v) in r {
hm.insert(k, v);
}
HashJoinInner {
left: l,
right: hm,
}
}
}
impl<I, K, V0, V1> Iterator for HashJoinInner<I, K, V0, V1> where
I: Iterator<Item=(K, V0)>,
K: Hash + Eq,
V1: Clone,
{
type Item = (V0, V1);
fn next(&mut self) -> Option<Self::Item> {
loop {
match self.left.next() {
Some((k0, v0)) => match self.right.get(&k0) {
Some(v1) => return Some((v0, Clone::clone(v1))),
None => continue,
},
None => return None,
}
}
}
}
I'll be grateful for any idea.
You cannot store the Values iterator because it contains references to the HashMap. These references could become invalid if you move the map. However, you can consume the HashMap using the into_iter method. That owns all the values of the HashMap and can be moved into a new struct.
Here's a tweaking of your code from the earlier question. This isn't yet a left or right join. There's complexity about the switch from being done with one iterator to finishing off the other iterator.
use std::collections::hash_map::{HashMap, IntoIter};
use std::hash::Hash;
struct Foo<K, V>
where K: Hash + Eq,
V: Clone,
{
iter: IntoIter<K, (V, bool)>,
}
impl<K, V> Foo<K, V>
where K: Hash + Eq,
V: Clone,
{
fn new<I>(it: I) -> Self
where I: Iterator<Item=(K, V)>
{
let mut map = HashMap::new();
for (k, v) in it {
map.insert(k, (v, false));
}
Foo { iter: map.into_iter() }
}
}
impl<K, V> Iterator for Foo<K, V>
where K: Hash + Eq,
V: Clone,
{
type Item = V;
fn next(&mut self) -> Option<Self::Item> {
loop {
match self.iter.next() {
Some((_, (v, false))) => return Some(v.clone()),
Some(_) => continue,
None => return None,
}
}
}
}
fn main() {
let it = (0..).zip("AB".chars());
let foo = Foo::new(it);
for v in foo {
println!("{}", v);
}
}
However you don't need to do any of that to get what you want. You can simply create a hashmap and check it as you iterate over the other item. I accidentally created a left outer join, but just flip the arguments to get a right outer join. ^_^
use std::collections::hash_map::HashMap;
use std::hash::Hash;
struct LeftOuterJoin<L, K, RV> {
left: L,
right: HashMap<K, RV>,
}
impl<L, K, RV> LeftOuterJoin<L, K, RV>
where K: Hash + Eq
{
fn new<LI, RI>(left: LI, right: RI) -> Self
where L: Iterator<Item=LI::Item>,
LI: IntoIterator<IntoIter=L>,
RI: IntoIterator<Item=(K, RV)>
{
LeftOuterJoin {
left: left.into_iter(),
right: right.into_iter().collect()
}
}
}
impl<L, K, LV, RV> Iterator for LeftOuterJoin<L, K, RV>
where L: Iterator<Item=(K, LV)>,
K: Hash + Eq,
RV: Clone
{
type Item = (K, LV, Option<RV>);
fn next(&mut self) -> Option<Self::Item> {
match self.left.next() {
Some((k, lv)) => {
let rv = self.right.get(&k);
Some((k, lv, rv.cloned()))
},
None => None,
}
}
}
fn main() {
let mut left = HashMap::new();
left.insert(1, "Alice");
left.insert(2, "Bob");
let mut right = HashMap::new();
right.insert(1, "Programmer");
for (id, name, job) in LeftOuterJoin::new(left.into_iter(), right) {
println!("{} ({}) is a {:?}", name, id, job);
}
}
Thanks to Shepmaster's idea of using std::collections::hash_map::IntoIter I've managed to solve the problem.
Here is the complete solution for RIGHT OUTER JOIN using the hash join strategy:
use std::collections::hash_map::{HashMap, IntoIter,};
use std::mem;
use std::hash::Hash;
#[must_use = "iterator adaptors are lazy and do nothing unless consumed"]
pub struct HashJoinRightOuter<L, K, RV> {
left: L,
map: HashMap<K, (RV, bool)>,
/// exclusion iterator - yields the unmatched values from the map. It is created once the left
/// iterator is exhausted
excl_iter: Option<IntoIter<K, (RV, bool)>>,
}
impl<L, K, RV> HashJoinRightOuter<L, K, RV>
where K: Hash + Eq,
{
/// Create a `HashJoinRightOuter` iterator.
pub fn new<LI, RI>(left: LI, right: RI) -> Self
where L: Iterator<Item=LI::Item>,
LI: IntoIterator<IntoIter=L>,
RI: IntoIterator<Item=(K, RV)>
{
let mut map: HashMap<K, (RV, bool)> = HashMap::new();
for (k, v) in right.into_iter() {
map.insert(k, (v, false));
}
HashJoinRightOuter {
left: left.into_iter(),
map: map,
excl_iter: None,
}
}
/// Moves the map to `self.excl_iter`
///
/// Once the left iterator is exhausted, the info about which keys were matched is complete.
/// To be able to iterate over map's values we need to move it into its `IntoIter`.
fn set_excl_iter(&mut self) {
let map = mem::replace(&mut self.map, HashMap::<K, (RV, bool)>::new());
self.excl_iter = Some(map.into_iter());
}
}
impl<L, K, LV, RV> Iterator for HashJoinRightOuter<L, K, RV>
where L: Iterator<Item=(K, LV)>,
K: Hash + Eq,
RV: Clone,
{
type Item = (Option<LV>, RV);
fn next(&mut self) -> Option<Self::Item> {
loop {
match self.excl_iter {
// the left iterator is not yet exhausted
None => match self.left.next() {
Some((lk, lv)) => match self.map.get_mut(&lk) {
Some(rt) => {
rt.1 = true; // flag as matched
return Some((Some(lv), Clone::clone(&rt.0)))
},
None => continue, // not interested in unmatched left value
},
// the left iterator is exhausted so move the map into `self.excl_iter`.
None => self.set_excl_iter(),
},
// iterate over unmatched values
Some(ref mut r) => match r.next() {
Some((_, (rv, matched))) => {
if !matched {
return Some((None, rv));
} else {
continue;
}
},
None => return None,
}
}
}
}
}
fn main() {
let a = (0..).zip("AB".chars());
let b = (1..).zip("XY".chars());
let mut it = HashJoinRightOuter::new(a, b);
assert_eq!(it.next(), Some((Some('B'), 'X')));
assert_eq!(it.next(), Some((None, 'Y')));
assert_eq!(it.next(), None);
}
At the beginning I failed because I tried to store both the data and a reference to it in the same struct, which makes no sense anyway. What I really wanted was to store the data first, do some magic with it and, once done, move it into another field to work with its transformation.
This can be used to solve other self-referencing struct problems as well.
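Stripped of the join logic, the pattern might look like the sketch below. The names WordCounts and State are invented for illustration, and an enum is used here instead of the Option field from the solution above. The struct owns the map while it is being filled and moves it into its IntoIter the first time draining is requested, so no reference into the struct itself is ever stored:

```rust
use std::collections::hash_map::{HashMap, IntoIter};
use std::mem;

enum State {
    Counting(HashMap<String, usize>),
    Draining(IntoIter<String, usize>),
}

pub struct WordCounts {
    state: State,
}

impl WordCounts {
    pub fn new() -> Self {
        WordCounts { state: State::Counting(HashMap::new()) }
    }

    /// Phase 1: mutate the owned map. Ignored once draining has started.
    pub fn record(&mut self, word: &str) {
        if let State::Counting(map) = &mut self.state {
            *map.entry(word.to_string()).or_insert(0) += 1;
        }
    }

    /// Phase 2: on the first call, move the map into its `IntoIter`.
    pub fn next_entry(&mut self) -> Option<(String, usize)> {
        if let State::Counting(map) = &mut self.state {
            let owned = mem::take(map);
            self.state = State::Draining(owned.into_iter());
        }
        match &mut self.state {
            State::Draining(entries) => entries.next(),
            State::Counting(_) => None,
        }
    }
}
```

The enum makes the two phases mutually exclusive, which is the same guarantee the Option field provides in HashJoinRightOuter.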

How do I pass a function pointer recursively?

I want to write an Iterator adaptor which applies a function recursively to its underlying Iterator. Recursively because the variant IR::Loop includes a Vec<IR>, of which an iterator should also be passed to the function.
The function should take an &mut Iterator<Item = IR> and use it to compute the next value of the iterator, (like itertools::batching).
use std::iter::Peekable;
#[derive(Clone)]
enum IR {
OperationA,
OperationB,
Loop(Vec<IR>),
}
pub trait MyItertools: Iterator {
fn apply_recursive<F: Fn(&mut Peekable<Self>) -> Option<Self::Item>>(
self,
f: F,
) -> ApplyRecursive<Self, F>
where
Self: Sized,
Self::Item: Clone,
{
ApplyRecursive {
iter: self.peekable(),
f: f,
}
}
}
impl<T: ?Sized> MyItertools for T
where
T: Iterator,
{
}
//applies a function recursively to some Iterator with Item=IR
#[derive(Clone)]
struct ApplyRecursive<I, F>
where
I: Iterator,
I::Item: Clone,
{
iter: Peekable<I>,
f: F,
}
impl<I: Iterator<Item = IR>, F> Iterator for ApplyRecursive<I, F>
where
F: Fn(&mut Peekable<I>)
-> Option<I::Item>,
{
type Item = I::Item;
fn next(&mut self) -> Option<I::Item> {
match self.iter.peek() {
Some(&IR::Loop(code)) => {
self.iter.next(); //advance the iterator
let code: Vec<IR> = code.into_iter().apply_recursive(self.f).collect();
Some(IR::Loop(code))
}
Some(x) => (self.f)(&mut self.iter),
None => None,
}
}
fn size_hint(&self) -> (usize, Option<usize>) {
self.iter.size_hint()
}
}
fn main() {}
What am I doing wrong? I don't even understand the error message:
error[E0277]: the trait bound `for<'r> F: std::ops::Fn<(&'r mut std::iter::Peekable<std::vec::IntoIter<IR>>,)>` is not satisfied
--> src/main.rs:54:54
|
54 | let code: Vec<IR> = code.into_iter().apply_recursive(self.f).collect();
| ^^^^^^^^^^^^^^^ the trait `for<'r> std::ops::Fn<(&'r mut std::iter::Peekable<std::vec::IntoIter<IR>>,)>` is not implemented for `F`
|
= help: consider adding a `where for<'r> F: std::ops::Fn<(&'r mut std::iter::Peekable<std::vec::IntoIter<IR>>,)>` bound
error[E0277]: the trait bound `for<'r> F: std::ops::FnOnce<(&'r mut std::iter::Peekable<std::vec::IntoIter<IR>>,)>` is not satisfied
--> src/main.rs:54:54
|
54 | let code: Vec<IR> = code.into_iter().apply_recursive(self.f).collect();
| ^^^^^^^^^^^^^^^ the trait `for<'r> std::ops::FnOnce<(&'r mut std::iter::Peekable<std::vec::IntoIter<IR>>,)>` is not implemented for `F`
|
= help: consider adding a `where for<'r> F: std::ops::FnOnce<(&'r mut std::iter::Peekable<std::vec::IntoIter<IR>>,)>` bound
error: no method named `collect` found for type `ApplyRecursive<std::vec::IntoIter<IR>, F>` in the current scope
--> src/main.rs:54:78
|
54 | let code: Vec<IR> = code.into_iter().apply_recursive(self.f).collect();
| ^^^^^^^
|
= note: the method `collect` exists but the following trait bounds were not satisfied: `F : std::ops::Fn<(&mut std::iter::Peekable<std::vec::IntoIter<IR>>,)>`, `ApplyRecursive<std::vec::IntoIter<IR>, F> : std::iter::Iterator`
= help: items from traits can only be used if the trait is implemented and in scope; the following trait defines an item `collect`, perhaps you need to implement it:
= help: candidate #1: `std::iter::Iterator`
The last error indicates that you don't have an Iterator. Iterator is only implemented for your struct under certain conditions, and you aren't meeting them. The first error explains why:
the trait for<'r> Fn<(&'r mut IntoIter<IR>,)> is not implemented for the type F
So, why does the compiler think this won't work? Let's look at your constraints:
impl<I, F> Iterator for ApplyRecursive<I, F>
where
I: Iterator<Item = IR>,
F: Fn(&mut Peekable<I>) -> Option<I::Item>,
This structure refers to a concrete type I that implements Iterator. Then F is a concrete type that accepts a mutable reference to the same concrete type as I. However, you try to use your function (specialized for whatever type it happens to be) on the concrete type IntoIter - but this might be a different concrete type!
The easiest fix is to remove the generics here:
impl<F> Iterator for ApplyRecursive<vec::IntoIter<IR>, F>
where
F: Fn(&mut vec::IntoIter<IR>) -> Option<IR>,
{
type Item = IR;
fn next(&mut self) -> Option<IR> {
This unlocks a whole other slew of errors about mutability, accessing private fields, and exporting private types, but I think it gets over this hump.
Alternatively, we can change F to accept a trait object, and not worry about specializing it:
pub trait CustomIter: Iterator {
fn apply_recursive<F>(self, f: F) -> ApplyRecursive<Self, F>
where
F: Fn(&mut dyn Iterator<Item = Self::Item>) -> Option<Self::Item>,
Self: Sized,
Self::Item: Clone,
{
ApplyRecursive { iter: self.peekable(), f: f }
}
}
impl<I, F> Iterator for ApplyRecursive<I, F>
where
I: Iterator<Item = IR>,
F: Fn(&mut dyn Iterator<Item = IR>) -> Option<IR>,
{
type Item = I::Item;
fn next(&mut self) -> Option<IR> {