I'm looking for a serializer, in any language, that I can hand something (an object, a struct containing other structs, a hashmap, etc.) and that then lets me pull the serialized result from it byte by byte. Crucially, the serializer can't allocate an unbounded vector to store the bytes in the meantime.
The API I'm thinking of, in Rust-like notation:
struct Serializer<T: serde::Serialize>(_);
impl<T: serde::Serialize> Serializer<T> {
fn new() -> Serializer<T>;
fn push(&mut self) -> Option<impl FnOnce(T)>;
fn pull(&mut self) -> Option<impl FnOnce()->u8>;
}
i.e. I can push a T where T implements serde::Serialize; and I can pull u8s; either operation may "block" (by returning Option::None) if awaiting the other operation; and the memory allocations of Serializer are bounded.
Are there any examples of this? And what would it be called? I'm not really sure what to google..?
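To make the intended behaviour concrete, here is a minimal hand-written sketch for a single fixed-size type. It does not use serde, and the name PullSerializer, the u32 payload and the big-endian encoding are all assumptions chosen for illustration. Memory is bounded because the only storage is a 4-byte array; push refuses a new value until the previous one has been drained.

```rust
/// Hypothetical bounded serializer for `u32` values (big-endian).
/// It never holds more than one encoded value at a time.
struct PullSerializer {
    buf: [u8; 4],
    pos: usize, // index of the next byte to hand out; 4 means "empty"
}

impl PullSerializer {
    fn new() -> Self {
        PullSerializer { buf: [0; 4], pos: 4 }
    }

    /// Accepts a value only when the previous one is fully drained.
    /// On refusal the value is handed back, so the caller can retry later.
    fn push(&mut self, value: u32) -> Result<(), u32> {
        if self.pos < self.buf.len() {
            return Err(value);
        }
        self.buf = value.to_be_bytes();
        self.pos = 0;
        Ok(())
    }

    /// Returns the next serialized byte, or `None` when waiting for a push.
    fn pull(&mut self) -> Option<u8> {
        let byte = *self.buf.get(self.pos)?;
        self.pos += 1;
        Some(byte)
    }
}
```

For nested structures the fixed array would have to be replaced by some bounded traversal state, which is the hard part the question is really asking about.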
Related
I have a data format that uses a custom enum of values. From the database I receive a Vec<MyVal>. I want to convert this to a struct (and fail if the conversion doesn't work). I want to use serde because after processing I want to return the API response as JSON, and serde makes this very easy.
Playground link for the example
enum MyVal {
Bool(bool),
Text(String)
}
#[derive(Serialize, Deserialize)]
struct User {
name: String,
registered: bool
}
The challenge is converting the data format into the serde data model. For this I can implement a Deserializer whose deserialize_seq method calls the visitor's visit_seq, i.e. visits the Vec<MyVal> as if it were a sequence and returns the values one by one. The visitor for User can consume the visited values to build the struct User.
However, I'm not able to figure out how to convert the Vec into something visit_seq can consume. Here's some sample code.
struct MyWrapper(Vec<MyVal>);
impl<'de> Deserializer<'de> for MyWrapper {
type Error = de::value::Error;
// other deserialize_* methods omitted
fn deserialize_seq<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where
V: serde::de::Visitor<'de>,
{
let convert_myval = move |myval: &MyVal| match myval {
MyVal::Bool(b) => visitor.visit_bool(*b),
MyVal::Text(s) => visitor.visit_string(s.clone())
};
// now I have a vec of serde values
let rec_vec: Vec<Result<V::Value, Self::Error>> =
self.0.iter().map(|myval| convert_myval(myval)).collect();
// giving error!!
visitor.visit_seq(rec_vec.into_iter())
}
}
The error is
92 | visitor.visit_seq(rec_vec.into_iter())
| --------- ^^^^^^^^^^^^^^^^^^^ the trait `SeqAccess<'de>` is not implemented for `std::vec::IntoIter<Result<<V as Visitor<'de>>::Value, _::_serde::de::value::Error>>`
| |
| required by a bound introduced by this call
So I looked into SeqAccess and it has an implementor that requires that whatever is passed to it implement the Iterator trait. But I thought I had that covered, because vec.into_iter returns an IntoIter, a consuming iterator which does implement the Iterator trait.
So I'm completely lost as to what's going wrong here. Surely there must be a way to visit a Vec<Result<Value, Error>> as a seq?
Preface: The question wants to treat a Rust data structure Vec<MyVal> like a serialized piece of data (e.g. like a JSON string) and allow deserializing that into any other Rust data structure that implements Deserialize. This is quite unusual, but not without precedent. And since the MyVals are actually pieces of data with various types which get returned from a database access crate, this approach does make sense.
The main problem with the code in the question is that it tries to deserialize two different data structures (MyWrapper(Vec<MyVal>) and MyVal) with a single Deserializer. The obvious way out is to define a second struct MyValWrapper(MyVal) and implement Deserializer for it:
struct MyValWrapper(MyVal);
impl<'de> Deserializer<'de> for MyValWrapper {
type Error = de::value::Error;
fn deserialize_any<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where
V: de::Visitor<'de>,
{
match self.0 {
MyVal::Bool(b) => visitor.visit_bool(b),
MyVal::Text(s) => visitor.visit_string(s),
}
}
// Maybe you should call deserialize_any from all other deserialize_* functions to get a passable error message, e.g.:
forward_to_deserialize_any! {
bool i8 i16 i32 i64 i128 u8 u16 u32 u64 u128 f32 f64 char str string
bytes byte_buf option unit unit_struct newtype_struct seq tuple
tuple_struct map struct enum identifier ignored_any
}
// Though maybe you want to handle deserialize_enum differently...
}
MyValWrapper can now deserialize a MyVal. To use MyValWrapper to deserialize a Vec of MyVals, the serde::de::value::SeqDeserializer adapter is convenient: it can turn an iterator of (Into)Deserializer items into a sequence Deserializer:
let data: Vec<MyVal> = …;
// Vec deserialization snippet
let mut sd = SeqDeserializer::new(data.into_iter().map(|myval| MyValWrapper(myval)));
let res = visitor.visit_seq(&mut sd)?;
sd.end()?;
Ok(res)
For some reason, SeqDeserializer requires the iterator items to be IntoDeserializer, but no blanket impl<'de, T: Deserializer<'de>> IntoDeserializer<'de> for T exists, so we need to make sure that MyValWrapper is not only a Deserializer but trivially also an IntoDeserializer:
impl<'de> IntoDeserializer<'de> for MyValWrapper {
type Deserializer = MyValWrapper;
fn into_deserializer(self) -> Self::Deserializer {
self
}
}
Finally, you need to impl Deserializer for MyWrapper (you can use the "Vec deserialization snippet" for that), if you actually need Deserializer to be implemented, which I suspect you don't: SeqDeserializer already implements Deserializer, and it is a wrapper struct (just as MyWrapper is a wrapper struct). In particular, if your final goal is a function like
fn turn_bad_myvals_into_good_T<T: DeserializeOwned>(v: Vec<MyVal>) -> Result<T, de::value::Error> {
T::deserialize(SeqDeserializer::new(
v.into_iter().map(|myval| MyValWrapper(myval)),
))
}
then you don't need MyWrapper at all.
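If the target structs are few and fixed, a hand-written conversion may be enough and avoids the Deserializer machinery entirely. The following is only a sketch using the standard library, not serde. The String error type and the positional field order (name first, then registered) are assumptions made for illustration.

```rust
use std::convert::TryFrom;

#[derive(Debug, PartialEq)]
enum MyVal {
    Bool(bool),
    Text(String),
}

#[derive(Debug, PartialEq)]
struct User {
    name: String,
    registered: bool,
}

impl TryFrom<Vec<MyVal>> for User {
    // Assumption: a plain String is good enough as an error type here.
    type Error = String;

    fn try_from(row: Vec<MyVal>) -> Result<Self, Self::Error> {
        let mut values = row.into_iter();
        // Fields are consumed positionally, like a sequence visitor would do.
        let name = match values.next() {
            Some(MyVal::Text(s)) => s,
            other => return Err(format!("expected Text for name, got {:?}", other)),
        };
        let registered = match values.next() {
            Some(MyVal::Bool(b)) => b,
            other => return Err(format!("expected Bool for registered, got {:?}", other)),
        };
        // Reject trailing values instead of silently ignoring them.
        if values.next().is_some() {
            return Err("too many values for User".to_string());
        }
        Ok(User { name, registered })
    }
}
```

Unlike the SeqDeserializer route, this has to be repeated for every target struct, so it only pays off when there are a handful of them.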
Disclaimer: I'm a beginner with rust!
Let's assume I have the following two structs and I want the default JSON serialization from the derived Serialize traits
#[derive(Serialize)]
struct Address {
zip:usize,
street: String,
}
#[derive(Serialize)]
struct Person{
name: String,
age: usize,
address: Address
}
In addition, I have implemented a custom serde::Serializer for a legacy format which takes any Serialize and internally forwards to the serialize methods when it encounters nested structs, enums, etc.
struct MySerializer {...}
impl Serializer for MySerializer {...}
Further, I defined a custom trait MyFormat that extends the Serialize trait and is supposed to work as a tag or something, i.e. if MySerializer encounters something that implements MyFormat, it should not use Serialize, but MyFormat instead.
trait MyFormat: Serialize {
/// by default just use the normal Serialize trait method.
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
Serialize::serialize(&self, serializer)
}
}
impl MyFormat for Address {
fn serialize<S>(&self, serializer: S) -> Result<<S as Serializer>::Ok, <S as Serializer>::Error>
where
S: Serializer,
{
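// this can actually look much more complicated, e.g. some additional mapping will be done here.
// Each serialize_* call consumes the serializer, so the three values go through
// one tuple serializer (needs `use serde::ser::SerializeTuple;`).
let mut tup = serializer.serialize_tuple(3)?;
tup.serialize_element(&self.zip)?;
tup.serialize_element("some extra field only present in this format")?;
tup.serialize_element(&self.street)?;
tup.end()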
}
}
impl MyFormat for Person {}
Now the problem is this: when I use the custom serializer, I can control at the outermost level whether to use the derived Serialize. However, the moment I descend into the structure, the Serializer trait methods expect Serialize types, and I cannot narrow that bound down to MyFormat, as that would violate the trait.
In the above example name and age are OK, as the format for primitive types is identical. However, the nested Address might need to "inject" some additional fields when serialized to MyFormat.
I assume I'm just missing some paradigm that lets me do that with traits in Rust. In C++ I could use template specialization or SFINAE. Is there anything in Rust that can solve this kind of problem?
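Stable Rust has no equivalent of C++ template specialization, so one possible direction is to give the legacy format its own trait instead of routing it through Serialize. The sketch below is an assumption about how that could look, not a serde-based solution: the trait MyFormat is redefined here as a standalone trait, write_my_format and to_my_format are hypothetical names, and the "|" separator is invented. Because every container calls write_my_format on its fields, the nested Address is guaranteed to use its own impl.

```rust
/// Hypothetical stand-in for the legacy format: each type writes itself
/// into a `String`. This trait is independent of serde's `Serialize`.
trait MyFormat {
    fn write_my_format(&self, out: &mut String);
}

// Primitive-like types need an impl too, since nothing is inherited.
impl MyFormat for usize {
    fn write_my_format(&self, out: &mut String) {
        out.push_str(&self.to_string());
    }
}

impl MyFormat for String {
    fn write_my_format(&self, out: &mut String) {
        out.push_str(self);
    }
}

struct Address {
    zip: usize,
    street: String,
}

struct Person {
    name: String,
    age: usize,
    address: Address,
}

impl MyFormat for Address {
    fn write_my_format(&self, out: &mut String) {
        self.zip.write_my_format(out);
        // The extra field that only exists in this format.
        out.push_str("|extra|");
        self.street.write_my_format(out);
    }
}

impl MyFormat for Person {
    fn write_my_format(&self, out: &mut String) {
        self.name.write_my_format(out);
        out.push('|');
        self.age.write_my_format(out);
        out.push('|');
        // Dispatches to Address's MyFormat impl, not to a generic fallback.
        self.address.write_my_format(out);
    }
}

fn to_my_format<T: MyFormat>(value: &T) -> String {
    let mut out = String::new();
    value.write_my_format(&mut out);
    out
}
```

The cost is that every field type needs an impl (a derive macro could generate the boring ones), and the derived Serialize impls stay untouched for JSON.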
Kind regards,
Marti
The Flux.zip overload that accepts an Iterable turns my objects into an Object[], unlike merge. After the zip I cannot apply other transformations because I have lost the element type. Is this the same concept as the combiner in a stream's reduce? I'm just wondering how to use it properly. Thanks.
final List<Object[]> list = Flux
.zip(List.of(Mono.just("hello"), Mono.just("world")), objects -> objects)
.collectList().block();
final List<String> strings = Flux
.merge(List.of(Mono.just("hello"), Mono.just("world")))
.collectList().block();
It's an API limitation at present since the generic type of the Iterable's Publisher isn't captured, so that type information isn't available to you in the method. This means you'll unfortunately have to do something unsafe if you want to keep the type information here.
The most trivial change to your current code to get a List<String[]> would be the following:
final List<String[]> list = Flux
.zip(List.of(Mono.just("hello"), Mono.just("world")), objects -> Arrays.stream(objects).toArray(String[]::new))
.collectList().block();
...but of course, you do lose your type safety.
Depending on your use case (generally speaking, if your combinator can combine elements one at a time rather than all in one go), you may also be able to use Flux.zip() in a reducer:
List<Flux<String>> l = new ArrayList<>();
l.add(Flux.just("hello", "me"));
l.add(Flux.just("world", "hungry"));
final List<String> strings = Flux.fromIterable(l)
.reduce((a, b) -> Flux.zip(a, b, (x, y) -> x + ", " + y))
.flatMap(x -> x.collectList())
.block();
It's not equivalent, but may be a type-safe alternative depending on what you need.
It looks like the first argument to the zip function is an Iterable<? extends Publisher<?>>; the question marks mean it can take publishers of any type.
Its second argument, Function<? super Object[], ? extends O>, is a function whose input is an Object[] (or a supertype of it) and whose result is some type that extends the concrete type O.
So sadly you will be getting an Object[]; that's how it is written. You can cast your objects to the correct type.
I have never used it before, but I played around with it a bit.
final Flux<String> helloWorldString = Flux.zip(List.of(Mono.just("hello"), Mono.just(" "), Mono.just("world")), objects -> {
StringBuilder value = new StringBuilder();
for (var object : objects) {
value.append((String) object);
}
return value.toString();
});
As it is a combinator, I think its purpose is to take any Object[] and build a concrete type out of it.
This would make it possible to safely iterate over the same element twice, or to hold some state for the global thing being iterated over in the item type.
Something like:
trait IterShort<Iter>
where Self: Borrow<Iter>,
{
type Item;
fn next(self) -> Option<Self::Item>;
}
then an implementation could look like:
impl<'a, MyIter> IterShort<MyIter> for &'a mut MyIter {
type Item = &'a mut MyItem;
fn next(self) -> Option<Self::Item> {
// ...
}
}
I realize I could write my own (I just did), but I'd like one that works with the for-loop notation. Is that possible?
The std::iter::Iterator trait cannot do this, but you can write a different trait:
trait StreamingIterator {
type Item;
fn next<'a>(&'a mut self) -> Option<&'a mut Self::Item>;
}
Note that the return value of next borrows the iterator itself, whereas in Vec::iter for example it only borrows the vector.
The downside is that &mut is hard-coded. Making it generic would require higher-kinded types (so that StreamingIterator::Item could itself be generic over a lifetime parameter).
Alexis Beingessner gave a talk about this and more titled Who Owns This Stream of Data? at RustCamp.
As to for loops, they’re really tied to std::iter::IntoIterator which is tied to std::iter::Iterator. You’d just have to implement both.
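As a minimal sketch of how such a trait is used, the example below repeats the StreamingIterator definition from above and implements it for a hypothetical RunningSum type (that type and the collect_totals helper are invented for illustration). The item lives inside the iterator, so each returned reference must be dropped before next is called again, and the loop is written with while let because for only understands std::iter::Iterator.

```rust
trait StreamingIterator {
    type Item;
    fn next<'a>(&'a mut self) -> Option<&'a mut Self::Item>;
}

/// Hypothetical example type: the yielded item is state owned by the
/// iterator itself (a running total), not an element of a collection.
struct RunningSum {
    inputs: Vec<u32>,
    pos: usize,
    total: u32,
}

impl StreamingIterator for RunningSum {
    type Item = u32;

    fn next<'a>(&'a mut self) -> Option<&'a mut u32> {
        let value = *self.inputs.get(self.pos)?;
        self.pos += 1;
        self.total += value;
        // The reference borrows `self`, so the caller cannot hold two at once.
        Some(&mut self.total)
    }
}

fn collect_totals(inputs: Vec<u32>) -> Vec<u32> {
    let mut it = RunningSum { inputs, pos: 0, total: 0 };
    let mut out = Vec::new();
    // `for` does not work here; `while let` ends each borrow per iteration.
    while let Some(total) = it.next() {
        out.push(*total);
    }
    out
}
```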
The standard iterators can't do this as far as I can see. The very definition of an iterator is that the outside has control over the elements while the inside has control over what produces the elements.
From what I understand of what you are trying to do, I'd flip the concept around and instead of returning elements from an iterator to a surrounding environment, pass the environment to the iterator. That is, you create a struct with a constructor function that accepts a closure and implements the iterator trait. On each call to next, the passed-in closure is called with the next element and the return value of that closure or modifications thereof are returned as the current element. That way, next can handle the lifetime of whatever would otherwise be returned to the surrounding environment.
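A minimal sketch of that inversion, under assumed names (Lending, lending and shout_all are hypothetical): the iterator owns a scratch buffer, lends it mutably to the closure on every step, and yields only what the closure returns. Since the closure's return type cannot borrow from the buffer, this is an ordinary Iterator and works in a for loop.

```rust
/// Hypothetical adapter: owns a scratch buffer, lends it to `f` on each
/// step, and yields whatever `f` returns.
struct Lending<F> {
    lines: Vec<&'static str>,
    pos: usize,
    scratch: String,
    f: F,
}

/// Constructor function that accepts the closure.
fn lending<F, R>(lines: Vec<&'static str>, f: F) -> Lending<F>
where
    F: FnMut(&mut String) -> R,
{
    Lending { lines, pos: 0, scratch: String::new(), f }
}

impl<F, R> Iterator for Lending<F>
where
    F: FnMut(&mut String) -> R,
{
    type Item = R;

    fn next(&mut self) -> Option<R> {
        let line: &'static str = *self.lines.get(self.pos)?;
        self.pos += 1;
        // Reuse the same buffer for every element.
        self.scratch.clear();
        self.scratch.push_str(line);
        // The mutable borrow of `scratch` ends when the closure returns.
        Some((self.f)(&mut self.scratch))
    }
}

fn shout_all(lines: Vec<&'static str>) -> Vec<String> {
    let mut out = Vec::new();
    for shouted in lending(lines, |buf: &mut String| {
        buf.make_ascii_uppercase();
        buf.clone()
    }) {
        out.push(shouted);
    }
    out
}
```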
fn main() {
let vec: Vec<_> = (0..5).map(|n| n.to_string()).collect();
for item in get_iterator(&vec) {
println!("{}", item);
}
}
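// Does not compile: `s.iter()` yields `&String`, but the signature promises `String`.
fn get_iterator(s: &[String]) -> Box<dyn Iterator<Item=String>> {
Box::new(s.iter())
}
// Compiles: the item type and the trait object both carry the slice's lifetime.
fn get_iterator<'a>(s: &'a [String]) -> Box<dyn Iterator<Item=&'a String> + 'a> {
Box::new(s.iter())
}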
The trick here is that we start with a slice of items and that slice has the lifetime 'a. slice::iter returns a slice::Iter with the same lifetime as the slice. The implementation of Iterator likewise returns references with that lifetime. We need to connect all of the lifetimes together.
That explains the 'a in the arguments and in the Item=&'a part. So what does the + 'a mean? There's a complete answer about that, and another with more detail. The short version is that an object with references inside of it may implement a trait, so we need to account for those lifetimes when talking about a trait object. By default, a boxed trait object's lifetime bound is 'static, as it was determined that this was the usual case.
The Box is not strictly required, but is a normal thing you'll see when you don't want to deal with the complicated types that might underlie the implementation (or just don't want to expose the implementation). In this case, the function could be
fn get_iterator<'a>(s: &'a [String]) -> std::slice::Iter<'a, String> {
s.iter()
}
But if you add .skip(1), the type would be:
std::iter::Skip<std::slice::Iter<'a, String>>
And if you involve a closure, then it's impossible to write the type out by name, as closures are unique, anonymous, auto-generated types! Those cases need a Box (or, on newer compilers, an impl Iterator return type).
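To illustrate the closure case, here is a small sketch with two hypothetical functions, boxed and unboxed, that both skip the first element and filter with a closure. The first returns a boxed trait object as described above. The second uses impl Trait in return position, which is available on Rust 1.26 and later and avoids the allocation while still hiding the concrete type.

```rust
// Boxed trait object: works for any iterator chain, including closures.
// The `+ 'a` says the boxed iterator may hold references that live for 'a.
fn boxed<'a>(s: &'a [String]) -> Box<dyn Iterator<Item = &'a String> + 'a> {
    Box::new(s.iter().skip(1).filter(|x| !x.is_empty()))
}

// `impl Trait` in return position: the compiler knows the concrete type
// (Filter<Skip<Iter<..>>, closure>), the caller only sees the trait.
fn unboxed<'a>(s: &'a [String]) -> impl Iterator<Item = &'a String> + 'a {
    s.iter().skip(1).filter(|x| !x.is_empty())
}
```

The boxed form is still the one to reach for when different branches return different iterator types, since impl Trait requires a single concrete type per function.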