How to properly handle empty, null and valid JSON? - serialization

I need to deserialize a JSON file into either None or Some(T) in Rust. The source we are using provides either null or an empty object ({}) for a JSON field when no value is present. I want to handle both as the None case and only deserialize when the JSON field is neither null nor empty.
input: {"test": null} -> output: {"test": None}
input: {"test": {}} -> output: {"test": None}
input: {"test": {"valid_json": 42}} -> output: {"test": {"valid_json": 42}}
All of the answers I could find address one case or another but not both.

use serde::{Deserialize, Deserializer};

#[derive(Deserialize, Debug, PartialEq)]
struct Foo {
    #[serde(deserialize_with = "object_empty_as_none")]
    bar: Option<Bar>,
}

#[derive(Deserialize, Debug, PartialEq)]
struct Bar {
    inner: u32,
}

pub fn object_empty_as_none<'de, D, T>(deserializer: D) -> Result<Option<T>, D::Error>
where
    D: Deserializer<'de>,
    for<'a> T: Deserialize<'a>,
{
    #[derive(Deserialize, Debug)]
    #[serde(deny_unknown_fields)]
    struct Empty {}

    #[derive(Deserialize, Debug)]
    #[serde(untagged)]
    enum Aux<T> {
        T(T),
        Empty(Empty),
        Null,
    }

    match Deserialize::deserialize(deserializer)? {
        Aux::T(t) => Ok(Some(t)),
        Aux::Empty(_) | Aux::Null => Ok(None),
    }
}

fn main() {
    let data = r#"{"bar": null}"#;
    let v: Foo = serde_json::from_str(data).unwrap();
    assert_eq!(v, Foo { bar: None });

    let data = r#"{"bar": {}}"#;
    let v: Foo = serde_json::from_str(data).unwrap();
    assert_eq!(v, Foo { bar: None });

    let data = r#"{"bar": {"inner": 42}}"#;
    let v: Foo = serde_json::from_str(data).unwrap();
    assert_eq!(
        v,
        Foo {
            bar: Some(Bar { inner: 42 })
        }
    );

    let data = r#"{"bar": {"not_inner": 42}}"#;
    let v: Result<Foo, _> = serde_json::from_str(data);
    assert!(v.is_err());
}
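This should be enough for most cases. Remove #[serde(deny_unknown_fields)] on Empty if you would rather have an object that does not match T (such as the last test case) become None instead of an error.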

The Serde documentation on implementing Deserialize for a custom map type shows how to write a custom map deserializer, which requires customizing how visit_map produces key-value pairs from the input data. I've basically copied that page and produced a minimal example that implements what you're looking for.
use std::fmt;
use std::marker::PhantomData;

use serde::de::{Deserialize, Deserializer, MapAccess, Visitor};
use serde_json::Value as JsonValue;
use std::collections::HashMap;

#[derive(Debug)]
struct MyMap(HashMap<String, JsonValue>);

impl MyMap {
    fn with_capacity(capacity: usize) -> Self {
        Self(HashMap::with_capacity(capacity))
    }
}

struct MyMapVisitor {
    marker: PhantomData<fn() -> MyMap>,
}

impl MyMapVisitor {
    fn new() -> Self {
        MyMapVisitor {
            marker: PhantomData,
        }
    }
}

impl<'de> Visitor<'de> for MyMapVisitor {
    // The type that our Visitor is going to produce.
    type Value = MyMap;

    // Format a message stating what data this Visitor expects to receive.
    fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        formatter.write_str("a very special map")
    }

    // Deserialize MyMap from an abstract "map" provided by the
    // Deserializer. The MapAccess input is a callback provided by
    // the Deserializer to let us see each entry in the map.
    fn visit_map<M>(self, mut access: M) -> Result<Self::Value, M::Error>
    where
        M: MapAccess<'de>,
    {
        let mut map = MyMap::with_capacity(access.size_hint().unwrap_or(0));

        // While there are entries remaining in the input, add them
        // into our map. Empty Objects get turned into Null.
        while let Some((key, value)) = access.next_entry()? {
            let value = match value {
                JsonValue::Object(o) if o.is_empty() => JsonValue::Null,
                _ => value,
            };
            map.0.insert(key, value);
        }

        Ok(map)
    }
}

// This is the trait that informs Serde how to deserialize MyMap.
impl<'de> Deserialize<'de> for MyMap {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        // Instantiate our Visitor and ask the Deserializer to drive
        // it over the input data, resulting in an instance of MyMap.
        deserializer.deserialize_map(MyMapVisitor::new())
    }
}

fn main() -> serde_json::Result<()> {
    let json_str = r#"{"a": null, "b": {}, "c": {"valid_json": 42}}"#;
    let v: MyMap = serde_json::from_str(json_str)?;
    println!("{:?}", v);
    Ok(())
}
This prints MyMap({"b": Null, "c": Object({"valid_json": Number(42)}), "a": Null}), which I believe is what you're after. Since MyMap wraps a HashMap, the keys may be printed in a different order.

Related

Is there a way to tell Serde to use a struct field as a map's key?

I have a map of items that I would like to deserialize into a list of structs, each having a field for the corresponding key.
Imagine having a YAML file like this:
name_a:
    some_field: 0
name_b:
    some_field: 0
name_c:
    some_field: 0
And a corresponding structure like this:
struct Item {
    name: String,
    some_field: usize,
}
I would like to deserialize the named items into a Vec<Item> instead of a Map<String, Item>. The item names (name_a, ...) are put into the name field of the Item objects.
I've attempted the following:
extern crate serde_yaml;
use std::fs::read_to_string;
let contents = read_to_string("file.yml").unwrap();
let items: Vec<Item> = serde_yaml::from_str(&contents).unwrap();
This, however, doesn't work and fails with the error "invalid type: map, expected a sequence".
I'd prefer to avoid creating a transient Map<String, PartialItem> that is converted to a Vec, and I would also prefer not to implement an additional PartialItem struct. Using an Option<String> as name would be possible, although I don't think this is optimal.
One way is to deserialize the map yourself:
use std::fmt;

use serde::de::{Deserialize, Deserializer, MapAccess, Visitor};
use serde_derive::Deserialize;

struct ItemMapVisitor {}

impl ItemMapVisitor {
    fn new() -> Self {
        Self {}
    }
}

#[derive(Debug, Deserialize)]
struct SomeField {
    some_field: u32,
}

#[derive(Debug)]
struct Item {
    name: String,
    some_field: u32,
}

#[derive(Debug)]
struct VecItem(Vec<Item>);

impl Item {
    fn new(name: String, some_field: u32) -> Self {
        Self { name, some_field }
    }
}

impl<'de> Visitor<'de> for ItemMapVisitor {
    type Value = VecItem;

    fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        formatter.write_str("name: somefield:")
    }

    fn visit_map<M>(self, mut access: M) -> Result<Self::Value, M::Error>
    where
        M: MapAccess<'de>,
    {
        let mut items = Vec::with_capacity(access.size_hint().unwrap_or(0));
        while let Some((key, value)) = access.next_entry::<String, SomeField>()? {
            items.push(Item::new(key, value.some_field));
        }
        Ok(VecItem(items))
    }
}

impl<'de> Deserialize<'de> for VecItem {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        deserializer.deserialize_map(ItemMapVisitor::new())
    }
}

fn main() {
    let contents = r#"
name_a:
    some_field: 0
name_b:
    some_field: 1
name_c:
    some_field: 2
"#;
    let items: VecItem = serde_yaml::from_str(&contents).unwrap();
    println!("{:#?}", items);
}
Output:
VecItem(
    [
        Item {
            name: "name_a",
            some_field: 0
        },
        Item {
            name: "name_b",
            some_field: 1
        },
        Item {
            name: "name_c",
            some_field: 2
        }
    ]
)
If you don't want the SomeField struct, you could also use this:
#[derive(Debug, Deserialize)]
struct Item {
    #[serde(skip)]
    name: String,
    some_field: u32,
}

// Inside visit_map:
while let Some((key, value)) = access.next_entry::<String, Item>()? {
    items.push(Item::new(key, value.some_field));
}
But this builds a temporary Item for every entry, which may add a needless copy.
Another option is to define a default value for the Item::name field:
#[derive(Debug, Serialize, Deserialize)]
struct Item {
    #[serde(default)]
    name: String,
    some_field: usize,
}
With this trick, Item can be used both for deserializing and for transforming into a Vec of Items:
let contents = read_to_string("file.yml").unwrap();
let items: HashMap<String, Item> = serde_yaml::from_str(&contents).unwrap();
let slist: Vec<Item> = items
    .into_iter()
    .map(|(k, v)| Item {
        name: k,
        some_field: v.some_field,
    })
    .collect();
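One thing to keep in mind with the HashMap route is that HashMap iteration order is unspecified, so the resulting Vec will generally not follow the order of the file. The following is a minimal std-only sketch of the conversion step, assuming a deterministic order (sorted by name) is acceptable. The function name items_from_map is hypothetical, and the map is built by hand here to stand in for whatever the deserializer returns.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
struct Item {
    name: String,
    some_field: usize,
}

/// Hypothetical helper: moves each map key into the `name` field of its
/// item and returns the items sorted by name, so the output order does
/// not depend on HashMap's unspecified iteration order.
fn items_from_map(map: HashMap<String, Item>) -> Vec<Item> {
    let mut items: Vec<Item> = map
        .into_iter()
        // Struct update syntax keeps every field except `name`.
        .map(|(name, item)| Item { name, ..item })
        .collect();
    items.sort_by(|a, b| a.name.cmp(&b.name));
    items
}
```

If the original file order matters, an order-preserving map type or the visitor-based approach shown earlier would be a better fit than sorting.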

How can I return an iterator over a locked struct member in Rust?

Here is as far as I could get, using rental, partly based on How can I store a Chars iterator in the same struct as the String it is iterating on?. The difference here is that the get_iter method of the locked member has to take a mutable self reference.
I'm not tied to using rental: I'd be just as happy with a solution using reffers or owning_ref.
The PhantomData is present here just so that MyIter bears the normal lifetime relationship to MyIterable, the thing being iterated over.
I also tried changing #[rental] to #[rental(deref_mut_suffix)] and changing the return type of MyIterable.get_iter to Box<dyn Iterator<Item = i32> + 'a>, but that gave me other lifetime errors originating in the macro that I was unable to decipher.
#[macro_use]
extern crate rental;

use std::marker::PhantomData;

pub struct MyIterable {}

impl MyIterable {
    // In the real use-case I can't remove the 'mut'.
    pub fn get_iter<'a>(&'a mut self) -> MyIter<'a> {
        MyIter {
            marker: PhantomData,
        }
    }
}

pub struct MyIter<'a> {
    marker: PhantomData<&'a MyIterable>,
}

impl<'a> Iterator for MyIter<'a> {
    type Item = i32;

    fn next(&mut self) -> Option<i32> {
        Some(42)
    }
}

use std::sync::Mutex;

rental! {
    mod locking_iter {
        pub use super::{MyIterable, MyIter};
        use std::sync::MutexGuard;

        #[rental]
        pub struct LockingIter<'a> {
            guard: MutexGuard<'a, MyIterable>,
            iter: MyIter<'guard>,
        }
    }
}

use locking_iter::LockingIter;

impl<'a> Iterator for LockingIter<'a> {
    type Item = i32;

    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        self.rent_mut(|iter| iter.next())
    }
}

struct Access {
    shared: Mutex<MyIterable>,
}

impl Access {
    pub fn get_iter<'a>(&'a self) -> Box<dyn Iterator<Item = i32> + 'a> {
        Box::new(LockingIter::new(self.shared.lock().unwrap(), |mi| {
            mi.get_iter()
        }))
    }
}

fn main() {
    let access = Access {
        shared: Mutex::new(MyIterable {}),
    };
    let iter = access.get_iter();
    let contents: Vec<i32> = iter.take(2).collect();
    println!("contents: {:?}", contents);
}
As user rodrigo has pointed out in a comment, the solution is simply to change #[rental] to #[rental_mut].
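If depending on rental is not an option (as far as I am aware, the crate is no longer maintained), a different technique is to avoid the self-referential struct entirely and hand the iterator to a closure while the lock is held. This is a sketch under the assumption that the caller can consume the iterator inside a closure; the method name with_iter and the Access::new constructor are hypothetical and not part of the original code.

```rust
use std::marker::PhantomData;
use std::sync::Mutex;

pub struct MyIterable {}

impl MyIterable {
    // As in the question, the iterator needs a mutable borrow.
    pub fn get_iter<'a>(&'a mut self) -> MyIter<'a> {
        MyIter {
            marker: PhantomData,
        }
    }
}

pub struct MyIter<'a> {
    marker: PhantomData<&'a MyIterable>,
}

impl<'a> Iterator for MyIter<'a> {
    type Item = i32;

    fn next(&mut self) -> Option<i32> {
        Some(42)
    }
}

pub struct Access {
    shared: Mutex<MyIterable>,
}

impl Access {
    pub fn new() -> Self {
        Access {
            shared: Mutex::new(MyIterable {}),
        }
    }

    /// Hypothetical helper: locks the mutex, builds the iterator, and runs
    /// `f` on it. The guard and the iterator both live on this stack frame,
    /// so no struct has to borrow from itself.
    pub fn with_iter<R>(&self, f: impl FnOnce(&mut dyn Iterator<Item = i32>) -> R) -> R {
        let mut guard = self.shared.lock().unwrap();
        let mut iter = guard.get_iter();
        let result = f(&mut iter);
        result
    }
}
```

A call such as access.with_iter(|it| it.take(2).collect::<Vec<i32>>()) then replaces access.get_iter().take(2).collect(). The trade-off is that the iterator cannot escape the closure, so the lock is held for exactly the duration of the call.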

How can a function conditionally fall back to a trait if another trait is implemented or not?

I am building a library for generating a minimal perfect hash from a set of keys. The idea is to index the keys online without storing the full dataset in memory. Depending on the user's iterator, skip_next() may not be available, in which case I want to fall back to using next(). Although that might be slower, depending on the speed of the iterator, it simplifies things for a general user.
My idea is to selectively iterate over all the elements generated by an iterator. This code works fine, but it requires a user to implement the trait FastIteration:
#[derive(Debug)]
struct Pixel {
    r: Vec<i8>,
    g: Vec<i8>,
    b: Vec<i8>,
}

#[derive(Debug)]
struct Node {
    r: i8,
    g: i8,
    b: i8,
}

struct PixelIterator<'a> {
    pixel: &'a Pixel,
    index: usize,
}

impl<'a> IntoIterator for &'a Pixel {
    type Item = Node;
    type IntoIter = PixelIterator<'a>;

    fn into_iter(self) -> Self::IntoIter {
        println!("Into &");
        PixelIterator {
            pixel: self,
            index: 0,
        }
    }
}

impl<'a> Iterator for PixelIterator<'a> {
    type Item = Node;

    fn next(&mut self) -> Option<Node> {
        println!("next &");
        let result = match self.index {
            0 | 1 | 2 | 3 => Node {
                r: self.pixel.r[self.index],
                g: self.pixel.g[self.index],
                b: self.pixel.b[self.index],
            },
            _ => return None,
        };
        self.index += 1;
        Some(result)
    }
}

trait FastIteration {
    fn skip_next(&mut self);
}

impl<'a> FastIteration for PixelIterator<'a> {
    fn skip_next(&mut self) {
        self.index += 1;
    }
}

fn main() {
    let p1 = Pixel {
        r: vec![11, 21, 31, 41],
        g: vec![12, 22, 32, 42],
        b: vec![13, 23, 33, 43],
    };
    let mut index = 0;
    let mut it = p1.into_iter();
    loop {
        if index == p1.r.len() {
            break;
        }
        if index == 1 {
            it.skip_next()
        } else {
            let val = it.next();
            println!("{:?}", val);
        }
        index += 1;
    }
}
How can one make the above program fall back to the normal next() instead of skip_next(), depending on whether the trait FastIteration is implemented?
fn fast_iterate<I>(objects: I)
where
    I: Iterator + FastIteration,
{
    // should use skip_next()
}

fn slow_iterate<I>(objects: I)
where
    I: Iterator,
{
    // should NOT use skip_next(), use next()
}
As above, one can always write two separate implementations, but is it possible to do this in one?
This question builds on:
Conditionally implement a Rust trait only if a type constraint is satisfied
Implement rayon `as_parallel_slice` using iterators.
You are looking for the unstable feature specialization:
#![feature(specialization)]

#[derive(Debug)]
struct Example(u8);

impl Iterator for Example {
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        let v = self.0;
        if v > 10 {
            None
        } else {
            self.0 += 1;
            Some(v)
        }
    }
}

trait FastIterator: Iterator {
    fn skip_next(&mut self);
}

impl<I: Iterator> FastIterator for I {
    default fn skip_next(&mut self) {
        println!("step");
        self.next();
    }
}

impl FastIterator for Example {
    fn skip_next(&mut self) {
        println!("skip");
        self.0 += 1;
    }
}

fn main() {
    let mut ex = Example(0);
    ex.skip_next();

    let mut r = 0..10;
    r.skip_next();
}
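Because specialization requires a nightly compiler, here is a sketch of a stable approximation: give skip_next a default body in the trait that falls back to next(), and let each iterator type opt in with an explicit impl. This is not equivalent to the blanket impl above, since every type must be listed once. The helper collect_skipping is a hypothetical name added only to show a single generic function using both paths.

```rust
/// Iterator extension whose default `skip_next` falls back to `next()`.
trait FastIterator: Iterator {
    fn skip_next(&mut self) {
        // Slow path: produce the item and throw it away.
        self.next();
    }
}

struct Example(u8);

impl Iterator for Example {
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        let v = self.0;
        if v > 10 {
            None
        } else {
            self.0 += 1;
            Some(v)
        }
    }
}

// Fast path: Example can skip without producing the item.
impl FastIterator for Example {
    fn skip_next(&mut self) {
        self.0 = self.0.saturating_add(1);
    }
}

// Opt-in with an empty impl: Range uses the default `next()` fallback.
impl FastIterator for std::ops::Range<i32> {}

/// Hypothetical helper: collects every item except the one at `skip_at`.
fn collect_skipping<I: FastIterator>(mut it: I, skip_at: usize) -> Vec<I::Item> {
    let mut out = Vec::new();
    let mut index = 0;
    loop {
        if index == skip_at {
            it.skip_next();
        } else {
            match it.next() {
                Some(v) => out.push(v),
                None => break,
            }
        }
        index += 1;
    }
    out
}
```

The generic function never needs to know which path is taken; the choice is made per type by whether its impl overrides skip_next.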

Lifetime issue when implementing Iterator

I was implementing the Iterator trait for several structs and encountered some problems. Why does implementing Iterator for Rows produce an error?
Basically, why doesn't this work?
struct Stripe<'a> {
    cells: &'a [u32],
}

struct Rows<'a> {
    foo: &'a Foo,
    vec: Vec<u32>,
    first: bool,
}

impl<'a> std::iter::Iterator for Rows<'a> {
    type Item = Stripe<'a>;

    fn next(&mut self) -> Option<Stripe<'a>> {
        if self.first {
            self.first = false;
            Some(
                Stripe {
                    cells: &self.vec[0..1],
                }
            )
        } else {
            None
        }
    }
}
The lifetime 'a in the Rows type refers only to one field of the type (foo). The references you are returning borrow from self.vec and have nothing to do with that lifetime. The Iterator trait does not allow next to return references into the iterator object itself; that would require adding a new lifetime to the next function, tied to the &mut self borrow.
I suggest you create a RowsIterator type with a reference to your Rows object and handle the iterator-specific stuff in there:
struct Stripe<'a> {
    cells: &'a [u32],
}

struct Rows {
    vec: Vec<u32>,
}

struct RowsIter<'a> {
    rows: &'a Rows,
    first: bool,
}

impl<'a> std::iter::Iterator for RowsIter<'a> {
    type Item = Stripe<'a>;

    fn next(&mut self) -> Option<Stripe<'a>> {
        if self.first {
            self.first = false;
            Some(
                Stripe {
                    cells: &self.rows.vec[0..1],
                }
            )
        } else {
            None
        }
    }
}
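To show how the separate iterator type is typically wired up, here is a small sketch that extends the idea with an iter method and an IntoIterator impl for &Rows. It deviates from the answer in one labeled way: instead of the first flag it keeps a position (the pos field is an assumption of this sketch) and yields one single-cell Stripe per element.

```rust
struct Stripe<'a> {
    cells: &'a [u32],
}

struct Rows {
    vec: Vec<u32>,
}

struct RowsIter<'a> {
    rows: &'a Rows,
    pos: usize, // assumed field: index of the next stripe to yield
}

impl Rows {
    fn iter(&self) -> RowsIter<'_> {
        RowsIter { rows: self, pos: 0 }
    }
}

impl<'a> Iterator for RowsIter<'a> {
    type Item = Stripe<'a>;

    fn next(&mut self) -> Option<Stripe<'a>> {
        // `self.rows` is a `&'a Rows`, so the slice borrows from the Rows
        // value for 'a, not from the iterator; `get` returns None at the end.
        let cells = self.rows.vec.get(self.pos..self.pos + 1)?;
        self.pos += 1;
        Some(Stripe { cells })
    }
}

// Lets callers write `for stripe in &rows { ... }`.
impl<'a> IntoIterator for &'a Rows {
    type Item = Stripe<'a>;
    type IntoIter = RowsIter<'a>;

    fn into_iter(self) -> RowsIter<'a> {
        self.iter()
    }
}
```

Because every Stripe borrows from Rows rather than from RowsIter, the stripes can be collected into a Vec and outlive the iterator, which is exactly what the original Rows-as-iterator design could not express.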

How can I add new methods to Iterator?

I want to define a .unique() method on iterators that enables me to iterate without duplicates.
use std::collections::HashSet;

struct UniqueState<'a> {
    seen: HashSet<String>,
    underlying: &'a mut Iterator<Item = String>,
}

trait Unique {
    fn unique(&mut self) -> UniqueState;
}

impl Unique for Iterator<Item = String> {
    fn unique(&mut self) -> UniqueState {
        UniqueState {
            seen: HashSet::new(),
            underlying: self,
        }
    }
}

impl<'a> Iterator for UniqueState<'a> {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        while let Some(x) = self.underlying.next() {
            if !self.seen.contains(&x) {
                self.seen.insert(x.clone());
                return Some(x);
            }
        }
        None
    }
}
This compiles. However, when I try to use it in the same file:
fn main() {
    let foo = vec!["a", "b", "a", "cc", "cc", "d"];

    for s in foo.iter().unique() {
        println!("{}", s);
    }
}
I get the following error:
error[E0599]: no method named `unique` found for type `std::slice::Iter<'_, &str>` in the current scope
  --> src/main.rs:37:25
   |
37 |     for s in foo.iter().unique() {
   |                         ^^^^^^
   |
   = help: items from traits can only be used if the trait is implemented and in scope
   = note: the following trait defines an item `unique`, perhaps you need to implement it:
           candidate #1: `Unique`
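What am I doing wrong? How would I extend this to arbitrary hashable types?
In your particular case, it's because you have implemented your trait for an iterator of String, but your vector is providing an iterator of &str. In addition, impl Unique for Iterator<Item = String> implements the trait only for the trait object type, not for every type that implements Iterator. Here's a more generic version: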
use std::collections::HashSet;
use std::hash::Hash;

struct Unique<I>
where
    I: Iterator,
{
    seen: HashSet<I::Item>,
    underlying: I,
}

impl<I> Iterator for Unique<I>
where
    I: Iterator,
    I::Item: Hash + Eq + Clone,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        while let Some(x) = self.underlying.next() {
            if !self.seen.contains(&x) {
                self.seen.insert(x.clone());
                return Some(x);
            }
        }
        None
    }
}

trait UniqueExt: Iterator {
    fn unique(self) -> Unique<Self>
    where
        Self::Item: Hash + Eq + Clone,
        Self: Sized,
    {
        Unique {
            seen: HashSet::new(),
            underlying: self,
        }
    }
}

impl<I: Iterator> UniqueExt for I {}

fn main() {
    let foo = vec!["a", "b", "a", "cc", "cc", "d"];

    for s in foo.iter().unique() {
        println!("{}", s);
    }
}
Broadly, we create a new extension trait called UniqueExt which has Iterator as a supertrait. When Iterator is a supertrait, we will have access to the associated type Iterator::Item.
This trait defines the unique method, which is only valid to call when the iterated item can be:
Hashed
Compared for total equality
Cloned
Additionally, it requires that the type implementing Iterator have a known size at compile time (Self: Sized). This is done so that the iterator can be consumed by value by the Unique iterator adapter.
The other important part is the blanket implementation of the trait for any type that also implements Iterator:
impl<I: Iterator> UniqueExt for I {}
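As a usage-oriented sketch of the same extension-trait pattern, the variant below relies on the fact that HashSet::insert returns true only when the value was not already present, which folds the contains check and the insert into one call. Apart from that change it mirrors the code above, and it is only meant to illustrate that one blanket impl makes unique available on iterators over different item types.

```rust
use std::collections::HashSet;
use std::hash::Hash;

struct Unique<I: Iterator> {
    seen: HashSet<I::Item>,
    underlying: I,
}

impl<I> Iterator for Unique<I>
where
    I: Iterator,
    I::Item: Hash + Eq + Clone,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        while let Some(x) = self.underlying.next() {
            // `insert` returns true only for values not seen before.
            if self.seen.insert(x.clone()) {
                return Some(x);
            }
        }
        None
    }
}

trait UniqueExt: Iterator {
    fn unique(self) -> Unique<Self>
    where
        Self: Sized,
        Self::Item: Hash + Eq + Clone,
    {
        Unique {
            seen: HashSet::new(),
            underlying: self,
        }
    }
}

// Blanket impl: every iterator gets `unique` once UniqueExt is in scope.
impl<I: Iterator> UniqueExt for I {}
```

With this in scope, vec!["a", "b", "a"].into_iter().unique() and vec![3, 1, 3].into_iter().unique() both work, each yielding the first occurrence of every value in its original order.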