I have the following code which generates a vector of bytes from the passed vector of enum values:
#[derive(Debug, PartialEq)]
pub enum BertType {
SmallInteger(u8),
Integer(i32),
Float(f64),
String(String),
Boolean(bool),
Tuple(BertTuple),
}
#[derive(Debug, PartialEq)]
pub struct BertTuple {
pub values: Vec<BertType>
}
pub struct Serializer;
pub trait Serialize<T> {
fn to_bert(&self, data: T) -> Vec<u8>;
}
impl Serializer {
fn enum_value_to_binary(&self, enum_value: BertType) -> Vec<u8> {
match enum_value {
BertType::SmallInteger(value_u8) => self.to_bert(value_u8),
BertType::Integer(value_i32) => self.to_bert(value_i32),
BertType::Float(value_f64) => self.to_bert(value_f64),
BertType::String(string) => self.to_bert(string),
BertType::Boolean(boolean) => self.to_bert(boolean),
BertType::Tuple(tuple) => self.to_bert(tuple),
}
}
}
// some functions for serialize bool/integer/etc. into Vec<u8>
// ...
impl Serialize<BertTuple> for Serializer {
fn to_bert(&self, data: BertTuple) -> Vec<u8> {
let mut binary: Vec<u8> = data.values
.iter()
.map(|&item| self.enum_value_to_binary(item)) // <-- what the issue there?
.collect();
let arity = data.values.len();
match arity {
0...255 => self.get_small_tuple(arity as u8, binary),
_ => self.get_large_tuple(arity as i32, binary),
}
}
}
But when compiling, I receive an error with iterating around map:
error: the trait bound `std::vec::Vec<u8>: std::iter::FromIterator<std::vec::Vec<u8>>` is not satisfied [E0277]
.collect();
^~~~~~~
help: run `rustc --explain E0277` to see a detailed explanation
note: a collection of type `std::vec::Vec<u8>` cannot be built from an iterator over elements of type `std::vec::Vec<u8>`
error: aborting due to previous error
error: Could not compile `bert-rs`.
How can I fix this issue with std::iter::FromIterator?
The problem is that enum_value_to_binary returns a Vec<u8> for each element in values. So you end up with an Iterator<Item=Vec<u8>> and you call collect::<Vec<u8>>() on that, but it doesn't know how to flatten the nested vectors. If you want all the values to be flattened into one Vec<u8>, then you should use flat_map instead of map:
let mut binary: Vec<u8> = data.values
.iter()
.flat_map(|item| self.enum_value_to_binary(item).into_iter())
.collect();
Or, slightly more idiomatic and performant, you can just have enum_value_to_binary return an iterator directly.
Also, the iter method returns an Iterator<Item=&'a T>, which means you are just borrowing the elements, but self.enum_value_to_binary wants to take ownership over the value. There's a couple of ways to fix that. One option would be to use into_iter instead of iter, which will give you the elements by value. If you do that, you'll move the arity variable up to before the binary variable, since creating the binary variable will take ownership (move) data.values.
The other option would be to change self.enum_value_to_binary to take it's argument by reference.
Also possible that you meant for the type of binary to actually be Vec<Vec<u8>>.
Related
Serde derive macros come with the ability to control how a field is serialized/deserialized through the #[serde(with = "module")] field attribute. The "module" should have serialize and deserialize functions with the right arguments and return types.
An example that unfortunately got a bit too contrived:
use serde::{Deserialize, Serialize};
#[derive(Debug, Default, PartialEq, Eq)]
pub struct StringPair(String, String);
mod stringpair_serde {
pub fn serialize<S>(sp: &super::StringPair, ser: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
ser.serialize_str(format!("{}:{}", sp.0, sp.1).as_str())
}
pub fn deserialize<'de, D>(d: D) -> Result<super::StringPair, D::Error>
where
D: serde::Deserializer<'de>,
{
d.deserialize_str(Visitor)
}
struct Visitor;
impl<'de> serde::de::Visitor<'de> for Visitor {
type Value = super::StringPair;
fn expecting(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
write!(f, "a pair of strings separated by colon (:)")
}
fn visit_str<E>(self, s: &str) -> Result<Self::Value, E>
where
E: serde::de::Error,
{
Ok(s.split_once(":")
.map(|tup| super::StringPair(tup.0.to_string(), tup.1.to_string()))
.unwrap_or(Default::default()))
}
}
}
#[derive(Serialize, Deserialize)]
struct UsesStringPair {
// Other fields ...
#[serde(with = "stringpair_serde")]
pub stringpair: StringPair,
}
fn main() {
let usp = UsesStringPair {
stringpair: StringPair("foo".to_string(), "bar".to_string()),
};
assert_eq!(
serde_json::json!(&usp).to_string(),
r#"{"stringpair":"foo:bar"}"#
);
let usp: UsesStringPair = serde_json::from_str(r#"{"stringpair":"baz:qux"}"#).unwrap();
assert_eq!(
usp.stringpair,
StringPair("baz".to_string(), "qux".to_string())
)
}
Testing derived serialization for UsesStringPair is trivial with simple assertions. But I have looked at serde_test example as that makes sense to me too.
However, I want to be able to independently test the stringpair_serde::{serialize, deserialize} functions (e.g. if my crate provides just mycrate::StringPair and mycrate::stringpair_serde, and UsesStringPair is for the crate users to implement).
One way I've looked into is creating a serde_json::Serializer (using new, requires a io::Write implementation, which I couldn't figure out how to create and use trivially, but that's a separate question) and calling serialize with the created Serializer, then making assertions on the result as before. However, that does not test any/all implementations of serde::Serializer, just the one provided in serde_json.
I'm wondering if there's a method like in the serde_test example that works for ser/deser functions provided by a module.
I am new to Rust and reading The Rust Programming Language, and in the Error Handling section there is a "case study" describing a program to read data from a CSV file using the csv and rustc-serialize libraries (using getopts for argument parsing).
The author writes a function search that steps through the rows of the csv file using a csv::Reader object and collect those entries whose 'city' field match a specified value into a vector and returns it. I've taken a slightly different approach than the author, but this should not affect my question. My (working) function looks like this:
extern crate csv;
extern crate rustc_serialize;
use std::path::Path;
use std::fs::File;
fn search<P>(data_path: P, city: &str) -> Vec<DataRow>
where P: AsRef<Path>
{
let file = File::open(data_path).expect("Opening file failed!");
let mut reader = csv::Reader::from_reader(file).has_headers(true);
reader.decode()
.map(|row| row.expect("Failed decoding row"))
.filter(|row: &DataRow| row.city == city)
.collect()
}
where the DataRow type is just a record,
#[derive(Debug, RustcDecodable)]
struct DataRow {
country: String,
city: String,
accent_city: String,
region: String,
population: Option<u64>,
latitude: Option<f64>,
longitude: Option<f64>
}
Now, the author poses, as the dreaded "exercise to the reader", the problem of modifying this function to return an iterator instead of a vector (eliminating the call to collect). My question is: How can this be done at all, and what are the most concise and idiomatic ways of doing it?
A simple attempt that i think gets the type signature right is
fn search_iter<'a,P>(data_path: P, city: &'a str)
-> Box<Iterator<Item=DataRow> + 'a>
where P: AsRef<Path>
{
let file = File::open(data_path).expect("Opening file failed!");
let mut reader = csv::Reader::from_reader(file).has_headers(true);
Box::new(reader.decode()
.map(|row| row.expect("Failed decoding row"))
.filter(|row: &DataRow| row.city == city))
}
I return a trait object of type Box<Iterator<Item=DataRow> + 'a> so as not to have to expose the internal Filter type, and where the lifetime 'a is introduced just to avoid having to make a local clone of city. But this fails to compile because reader does not live long enough; it's allocated on the stack and so is deallocated when the function returns.
I guess this means that reader has to be allocated on the heap (i.e. boxed) from the beginning, or somehow moved off the stack before the function ends. If I were returning a closure, this is exactly the problem that would be solved by making it a move closure. But I don't know how to do something similar when I'm not returning a function. I've tried defining a custom iterator type containing the needed data, but I couldn't get it to work, and it kept getting uglier and more contrived (don't make too much of this code, I'm only including it to show the general direction of my attempts):
fn search_iter<'a,P>(data_path: P, city: &'a str)
-> Box<Iterator<Item=DataRow> + 'a>
where P: AsRef<Path>
{
struct ResultIter<'a> {
reader: csv::Reader<File>,
wrapped_iterator: Option<Box<Iterator<Item=DataRow> + 'a>>
}
impl<'a> Iterator for ResultIter<'a> {
type Item = DataRow;
fn next(&mut self) -> Option<DataRow>
{ self.wrapped_iterator.unwrap().next() }
}
let file = File::open(data_path).expect("Opening file failed!");
// Incrementally initialise
let mut result_iter = ResultIter {
reader: csv::Reader::from_reader(file).has_headers(true),
wrapped_iterator: None // Uninitialised
};
result_iter.wrapped_iterator =
Some(Box::new(result_iter.reader
.decode()
.map(|row| row.expect("Failed decoding row"))
.filter(|&row: &DataRow| row.city == city)));
Box::new(result_iter)
}
This question seems to concern the same problem, but the author of the answer solves it by making the concerned data static, which I don't think is an alternative for this question.
I am using Rust 1.10.0, the current stable version from the Arch Linux package rust.
CSV 1.0
As I alluded to in the answer for older versions of the crate, the best way of solving this is for the CSV crate to have an owning iterator, which it now does: DeserializeRecordsIntoIter
use csv::ReaderBuilder; // 1.1.1
use serde::Deserialize; // 1.0.104
use std::{fs::File, path::Path};
#[derive(Debug, Deserialize)]
struct DataRow {
country: String,
city: String,
accent_city: String,
region: String,
population: Option<u64>,
latitude: Option<f64>,
longitude: Option<f64>,
}
fn search_iter(data_path: impl AsRef<Path>, city: &str) -> impl Iterator<Item = DataRow> + '_ {
let file = File::open(data_path).expect("Opening file failed");
ReaderBuilder::new()
.has_headers(true)
.from_reader(file)
.into_deserialize::<DataRow>()
.map(|row| row.expect("Failed decoding row"))
.filter(move |row| row.city == city)
}
Before version 1.0
The straightest path to convert the original function would be to simply wrap the iterator. However, doing so directly will lead to problems because you cannot return an object that refers to itself and the result of decode refers to the Reader. If you could surmount that, you cannot have an iterator return references to itself.
One solution is to simply re-create the DecodedRecords iterator for each call to your new iterator:
fn search_iter<'a, P>(data_path: P, city: &'a str) -> MyIter<'a>
where
P: AsRef<Path>,
{
let file = File::open(data_path).expect("Opening file failed!");
MyIter {
reader: csv::Reader::from_reader(file).has_headers(true),
city: city,
}
}
struct MyIter<'a> {
reader: csv::Reader<File>,
city: &'a str,
}
impl<'a> Iterator for MyIter<'a> {
type Item = DataRow;
fn next(&mut self) -> Option<Self::Item> {
let city = self.city;
self.reader
.decode()
.map(|row| row.expect("Failed decoding row"))
.filter(|row: &DataRow| row.city == city)
.next()
}
}
This could have overhead associated with it, depending on the implementation of decode. Additionally, this might "rewind" back to the beginning of the input — if you substituted a Vec instead of a csv::Reader, you would see this. However, it happens to work in this case.
Beyond that, I'd normally open the file and create the csv::Reader outside of the function and pass in the DecodedRecords iterator and transform it, returning a newtype / box / type alias around the underlying iterator. I prefer this because the structure of your code mirrors the lifetimes of the objects.
I'm a little surprised that there isn't an implementation of IntoIterator for csv::Reader, which would also solve the problem because there would not be any references.
See also:
How can I store a Chars iterator in the same struct as the String it is iterating on?
Is there an owned version of String::chars?
What is the correct way to return an Iterator (or any other trait)?
Editor's note: This code example is from a version of Rust prior to 1.0 and is not syntactically valid Rust 1.0 code. Updated versions of this code produce different errors, but the answers still contain valuable information.
I would like to make an iterator that generates a stream of prime numbers. My general thought process was to wrap an iterator with successive filters so for example you start with
let mut n = (2..N)
Then for each prime number you mutate the iterator and add on a filter
let p1 = n.next()
n = n.filter(|&x| x%p1 !=0)
let p2 = n.next()
n = n.filter(|&x| x%p2 !=0)
I am trying to use the following code, but I can not seem to get it to work
struct Primes {
base: Iterator<Item = u64>,
}
impl<'a> Iterator for Primes<'a> {
type Item = u64;
fn next(&mut self) -> Option<u64> {
let p = self.base.next();
match p {
Some(n) => {
let prime = n.clone();
let step = self.base.filter(move |&: &x| {x%prime!=0});
self.base = &step as &Iterator<Item = u64>;
Some(n)
},
_ => None
}
}
}
I have toyed with variations of this, but I can't seem to get lifetimes and types to match up. Right now the compiler is telling me
I can't mutate self.base
the variable prime doesn't live long enough
Here is the error I am getting
solution.rs:16:17: 16:26 error: cannot borrow immutable borrowed content `*self.base` as mutable
solution.rs:16 let p = self.base.next();
^~~~~~~~~
solution.rs:20:28: 20:37 error: cannot borrow immutable borrowed content `*self.base` as mutable
solution.rs:20 let step = self.base.filter(move |&: &x| {x%prime!=0});
^~~~~~~~~
solution.rs:21:30: 21:34 error: `step` does not live long enough
solution.rs:21 self.base = &step as &Iterator<Item = u64>;
^~~~
solution.rs:15:39: 26:6 note: reference must be valid for the lifetime 'a as defined on the block at 15:38...
solution.rs:15 fn next(&mut self) -> Option<u64> {
solution.rs:16 let p = self.base.next();
solution.rs:17 match p {
solution.rs:18 Some(n) => {
solution.rs:19 let prime = n.clone();
solution.rs:20 let step = self.base.filter(move |&: &x| {x%prime!=0});
...
solution.rs:20:71: 23:14 note: ...but borrowed value is only valid for the block suffix following statement 1 at 20:70
solution.rs:20 let step = self.base.filter(move |&: &x| {x%prime!=0});
solution.rs:21 self.base = &step as &Iterator<Item = u64>;
solution.rs:22 Some(n)
solution.rs:23 },
error: aborting due to 3 previous errors
Why won't Rust let me do this?
Here is a working version:
struct Primes<'a> {
base: Option<Box<Iterator<Item = u64> + 'a>>,
}
impl<'a> Iterator for Primes<'a> {
type Item = u64;
fn next(&mut self) -> Option<u64> {
let p = self.base.as_mut().unwrap().next();
p.map(|n| {
let base = self.base.take();
let step = base.unwrap().filter(move |x| x % n != 0);
self.base = Some(Box::new(step));
n
})
}
}
impl<'a> Primes<'a> {
#[inline]
pub fn new<I: Iterator<Item = u64> + 'a>(r: I) -> Primes<'a> {
Primes {
base: Some(Box::new(r)),
}
}
}
fn main() {
for p in Primes::new(2..).take(32) {
print!("{} ", p);
}
println!("");
}
I'm using a Box<Iterator> trait object. Boxing is unavoidable because the internal iterator must be stored somewhere between next() calls, and there is nowhere you can store reference trait objects.
I made the internal iterator an Option. This is necessary because you need to replace it with a value which consumes it, so it is possible that the internal iterator may be "absent" from the structure for a short time. Rust models absence with Option. Option::take replaces the value it is called on with None and returns whatever was there. This is useful when shuffling non-copyable objects around.
Note, however, that this sieve implementation is going to be both memory and computationally inefficient - for each prime you're creating an additional layer of iterators which takes heap space. Also the depth of stack when calling next() grows linearly with the number of primes, so you will get a stack overflow on a sufficiently large number:
fn main() {
println!("{}", Primes::new(2..).nth(10000).unwrap());
}
Running it:
% ./test1
thread '<main>' has overflowed its stack
zsh: illegal hardware instruction (core dumped) ./test1
I'd like to use Peekable as the basis for a new cautious_take_while operation that acts like take_while from IteratorExt but without consuming the first failed item. (There's a side question of whether this is a good idea, and whether there are better ways to accomplish this goal in Rust -- I'd be happy for hints in that direction, but mostly I'm trying to understand where my code is breaking).
The API I'm trying to enable is basically:
let mut chars = "abcdefg.".chars().peekable();
let abc : String = chars.by_ref().cautious_take_while(|&x| x != 'd');
let defg : String = chars.by_ref().cautious_take_while(|&x| x != '.');
// yielding (abc = "abc", defg = "defg")
I've taken a crack at creating a MCVE here, but I'm getting:
:10:5: 10:19 error: cannot move out of borrowed content
:10 chars.by_ref().cautious_take_while(|&x| x != '.');
As far as I can tell, I'm following the same pattern as Rust's own TakeWhile in terms of my function signatures, but I'm seeing different different behavior from the borrow checker. Can someone point out what I'm doing wrong?
The funny thing with by_ref() is that it returns a mutable reference to itself:
pub trait IteratorExt: Iterator + Sized {
fn by_ref(&mut self) -> &mut Self { self }
}
It works because the Iterator trait is implemented for the mutable pointer to Iterator type. Smart!
impl<'a, I> Iterator for &'a mut I where I: Iterator, I: ?Sized { ... }
The standard take_while function works because it uses the trait Iterator, that is automatically resolved to &mut Peekable<T>.
But your code does not work because Peekable is a struct, not a trait, so your CautiousTakeWhileable must specify the type, and you are trying to take ownership of it, but you cannot, because you have a mutable pointer.
Solution, do not take a Peekable<T> but &mut Peekable<T>. You will need to specify the lifetime too:
impl <'a, T: Iterator, P> Iterator for CautiousTakeWhile<&'a mut Peekable<T>, P>
where P: FnMut(&T::Item) -> bool {
//...
}
impl <'a, T: Iterator> CautiousTakeWhileable for &'a mut Peekable<T> {
fn cautious_take_while<P>(self, f: P) -> CautiousTakeWhile<&'a mut Peekable<T>, P>
where P: FnMut(&T::Item) -> bool {
CautiousTakeWhile{inner: self, condition: f,}
}
}
A curious side effect of this solution is that now by_ref is not needed, because cautious_take_while() takes a mutable reference, so it does not steal ownership. The by_ref() call is needed for take_while() because it can take either Peekable<T> or &mut Peekable<T>, and it defaults to the first one. With the by_ref() call it will resolve to the second one.
And now that I finally understand it, I think it might be a good idea to change the definition of struct CautiousTakeWhile to include the peekable bit into the struct itself. The difficulty is that the lifetime has to be specified manually, if I'm right. Something like:
struct CautiousTakeWhile<'a, T: Iterator + 'a, P>
where T::Item : 'a {
inner: &'a mut Peekable<T>,
condition: P,
}
trait CautiousTakeWhileable<'a, T>: Iterator {
fn cautious_take_while<P>(self, P) -> CautiousTakeWhile<'a, T, P> where
P: FnMut(&Self::Item) -> bool;
}
and the rest is more or less straightforward.
This was a tricky one! I'll lead with the meat of the code, then attempt to explain it (if I understand it...). It's also the ugly, unsugared version, as I wanted to reduce incidental complexity.
use std::iter::Peekable;
fn main() {
let mut chars = "abcdefg.".chars().peekable();
let abc: String = CautiousTakeWhile{inner: chars.by_ref(), condition: |&x| x != 'd'}.collect();
let defg: String = CautiousTakeWhile{inner: chars.by_ref(), condition: |&x| x != '.'}.collect();
println!("{}, {}", abc, defg);
}
struct CautiousTakeWhile<'a, I, P> //'
where I::Item: 'a, //'
I: Iterator + 'a, //'
P: FnMut(&I::Item) -> bool,
{
inner: &'a mut Peekable<I>, //'
condition: P,
}
impl<'a, I, P> Iterator for CautiousTakeWhile<'a, I, P>
where I::Item: 'a, //'
I: Iterator + 'a, //'
P: FnMut(&I::Item) -> bool
{
type Item = I::Item;
fn next(&mut self) -> Option<I::Item> {
let return_next =
match self.inner.peek() {
Some(ref v) => (self.condition)(v),
_ => false,
};
if return_next { self.inner.next() } else { None }
}
}
Actually, Rodrigo seems to have a good explanation, so I'll defer to that, unless you'd like me to explain something specific.
Background: the serialize crate is undocumented, deriving Decodable doesn't work. I've also looked at existing implementations for other types and find the code difficult to follow.
How does the decoding process work, and how do I implement Decodable for this struct?
pub struct Grid<A> {
data: [[A,..GRIDW],..GRIDH]
}
The reason why #[deriving(Decodable)] doesn't work is that [A,..GRIDW] doesn't implement Decodable, and it's impossible to implement a trait for a type when both are defined outside of this crate, which is the case here. So the only solution I can see is to manually implement Decodable for Grid.
And this is as far as I've gotten
impl <A: Decodable<D, E>, D: Decoder<E>, E> Decodable<D, E> for Grid<A> {
fn decode(decoder: &mut D) -> Result<Grid<A>, E> {
decoder.read_struct("Grid", 1u, ref |d| Ok(Grid {
data: match d.read_struct_field("data", 0u, ref |d| Decodable::decode(d)) {
Ok(e) => e,
Err(e) => return Err(e)
},
}))
}
}
Which gives an error at Decodable::decode(d)
error: failed to find an implementation of trait
serialize::serialize::Decodable for [[A, .. 20], .. 20]
It's not really possible to do this nicely at the moment for a variety of reasons:
We can't be generic over the length of a fixed length array (the fundamental issue)
The current trait coherence restrictions means we can't write a custom trait MyDecodable<D, E> { ... } with impl MyDecodable<D, E> for [A, .. GRIDW] (and one for GRIDH) and a blanket implementation impl<A: Decodable<D, E>> MyDecodable<D, E> for A. This forces a trait-based solution into using an intermediary type, which then makes the compiler's type inference rather unhappy and AFAICT impossible to satisfy.
We don't have associated types (aka "output types"), which I think would allow the type inference to be slightly sane.
Thus, for now, we're left with a manual implementation. :(
extern crate serialize;
use std::default::Default;
use serialize::{Decoder, Decodable};
static GRIDW: uint = 10;
static GRIDH: uint = 5;
fn decode_grid<E, D: Decoder<E>,
A: Copy + Default + Decodable<D, E>>(d: &mut D)
-> Result<Grid<A>, E> {
// mirror the Vec implementation: try to read a sequence
d.read_seq(|d, len| {
// check it's the required length
if len != GRIDH {
return Err(
d.error(format!("expecting length {} but found {}",
GRIDH, len).as_slice()));
}
// create the array with empty values ...
let mut array: [[A, .. GRIDW], .. GRIDH]
= [[Default::default(), .. GRIDW], .. GRIDH];
// ... and fill it in progressively ...
for (i, outer) in array.mut_iter().enumerate() {
// ... by reading each outer element ...
try!(d.read_seq_elt(i, |d| {
// ... as a sequence ...
d.read_seq(|d, len| {
// ... of the right length,
if len != GRIDW { return Err(d.error("...")) }
// and then read each element of that sequence as the
// elements of the grid.
for (j, inner) in outer.mut_iter().enumerate() {
*inner = try!(d.read_seq_elt(j, Decodable::decode));
}
Ok(())
})
}));
}
// all done successfully!
Ok(Grid { data: array })
})
}
pub struct Grid<A> {
data: [[A,..GRIDW],..GRIDH]
}
impl<E, D: Decoder<E>, A: Copy + Default + Decodable<D, E>>
Decodable<D, E> for Grid<A> {
fn decode(d: &mut D) -> Result<Grid<A>, E> {
d.read_struct("Grid", 1, |d| {
d.read_struct_field("data", 0, decode_grid)
})
}
}
fn main() {}
playpen.
It's also possible to write a more "generic" [T, .. n] decoder by using macros to instantiate each version, with special control over how the recursive decoding is handled to allow nested fixed-length arrays to be handled (as required for Grid); this requires somewhat less code (especially with more layers, or a variety of different lengths), but the macro solution:
may be harder to understand, and
the one I give there may be less efficient (there's a new array variable created for every fixed length array, including new Defaults, while the non-macro solution above just uses a single array and thus only calls Default::default once for each element in the grid). It may be possible to expand to a similar set of recursive loops, but I'm not sure.