Deserializing an enum using a combination of #[serde(untagged)] and #[serde(with)]

I'm trying to use an actix-web server as a gateway to a small stack to guarantee a strict data format inside of the stack while allowing some freedoms for the user.
To do that, I want to deserialize a JSON string to the struct, then validate it, serialize it again and publish it on a message broker. The main part of the data is an array of arrays that contain integers, floats and datetimes. I'm using serde for deserialization and chrono to deal with datetimes.
I tried using a struct combined with an enum to allow the different types:
#[derive(Serialize, Deserialize)]
pub struct Data {
pub column_names: Option<Vec<String>>,
pub values: Vec<Vec<ValueType>>,
}
#[derive(Serialize, Deserialize)]
#[serde(untagged)]
pub enum ValueType {
I32(i32),
F64(f64),
#[serde(with = "datetime_handler")]
Dt(DateTime<Utc>),
}
Since chrono::DateTime<T> does not implement Serialize, I added a custom module for that similar to how it is described in the serde docs.
mod datetime_handler {
use chrono::{DateTime, TimeZone, Utc};
use serde::{self, Deserialize, Deserializer, Serializer};
pub fn serialize<S>(dt: &DateTime<Utc>, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
let s = dt.to_rfc3339();
serializer.serialize_str(&s)
}
pub fn deserialize<'de, D>(deserializer: D) -> Result<DateTime<Utc>, D::Error>
where
D: Deserializer<'de>,
{
println!("Checkpoint 1");
let s = String::deserialize(deserializer)?;
println!("{}", s);
println!("Checkpoint 2");
let err1 = match DateTime::parse_from_rfc3339(&s) {
Ok(dt) => return Ok(dt.with_timezone(&Utc)),
Err(e) => Err(e),
};
println!("Checkpoint 3");
const FORMAT1: &'static str = "%Y-%m-%d %H:%M:%S";
match Utc.datetime_from_str(&s, FORMAT1) {
Ok(dt) => return Ok(dt.with_timezone(&Utc)),
Err(e) => println!("{}", e), // return first error not second if both fail
};
println!("Checkpoint 4");
return err1.map_err(serde::de::Error::custom);
}
}
This tries 2 different time formats one after the other and works for DateTime strings.
The Problem
It seems like the combination of `#[derive(Serialize, Deserialize)]`, `#[serde(untagged)]` and `#[serde(with)]` does something unexpected. `serde_json::from_str(...)` tries to deserialize every entry in the array with my custom `deserialize` function.
I would expect it to either try to deserialize into `ValueType::I32` first, succeed and continue with the next entry, as [the docs](https://serde.rs/enum-representations.html) say:
Serde will try to match the data against each variant in order and the first one that deserializes successfully is the one returned.
What happens is that the custom deserialize is applied to e.g. "0", fails, and the deserialization stops.
What's going on? How do I solve it?
My ideas are that I either fail to deserialize in the wrong way or that I somehow "overwrite" the derived deserialize with my own.

@jonasbb helped me realize that the code works when using [0,16.9,"2020-12-23 00:23:14"] but not when trying to deserialize ["0","16.9","2020-12-23 00:23:14"]. Serde does not deserialize numbers from strings by default, so the attempts for I32 and F64 just fail silently. This is discussed in this serde-issue and can be solved using the unofficial serde-aux crate.
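To make the "first variant that succeeds" order concrete for string inputs, here is a minimal std-only sketch that tries `i32`, then `f64`, and otherwise keeps the text. It does not use serde at all, and the names `Lenient` and `classify` are hypothetical; a real fix would put similar logic into a custom `Deserialize` implementation or rely on a helper crate.

```rust
/// Hypothetical stand-in for the question's `ValueType`, without serde or chrono.
#[derive(Debug, PartialEq)]
enum Lenient {
    I32(i32),
    F64(f64),
    /// The datetime parsing from the question would slot in here.
    Text(String),
}

/// Try each interpretation in order and return the first one that parses.
fn classify(raw: &str) -> Lenient {
    if let Ok(i) = raw.parse::<i32>() {
        return Lenient::I32(i);
    }
    if let Ok(f) = raw.parse::<f64>() {
        return Lenient::F64(f);
    }
    Lenient::Text(raw.to_string())
}
```

The order matters: testing `f64` first would classify "0" as a float, just as variant order decides the outcome for an untagged enum.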

Many crates will implement serde and other common utility crates, but will leave them as optional features. This can help save time when compiling. You can check a crate by viewing the Cargo.toml file to see if there is a feature for it or the dependency is included but marked as optional.
In your case, I can go to chrono on crates.io and select the Repository link to view the source code for the crate. In the Cargo.toml file, I can see that serde is used, but is not enabled by default.
[features]
default = ["clock", "std", "oldtime"]
alloc = []
std = []
clock = ["libc", "std", "winapi"]
oldtime = ["time"]
wasmbind = ["wasm-bindgen", "js-sys"]
unstable-locales = ["pure-rust-locales", "alloc"]
__internal_bench = []
__doctest = []
[dependencies]
...
serde = { version = "1.0.99", default-features = false, optional = true }
To enable it you can go into the Cargo.toml for your project and add it as a feature to chrono.
[dependencies]
chrono = { version = "0.4.19", features = ["serde"] }
Alternatively, chrono lists some (but not all?) of their optional features in their documentation. However, not all crates do this and docs can sometimes be out of date so I usually prefer the manual method.
As for the issue between the interaction of deserialize_with and untagged on enums, I don't see any issue with your code. It may be a bug in serde so I suggest you create an issue on the serde Repository so they can further look into why this error occurs.

Related

Rocket REST API return global object

I'm starting to learn Rust and the rocket framework https://crates.io/crates/rocket.
I have a dumb question that I can't figure out.
How do I return my_universe that I created on the first line of main() when calling GET /universe/ports/21?
fn main() {
let my_universe = universe::model::Universe::new();
rocket::ignite().mount("/universe", routes![ports]).launch();
}
#[get("/ports/<id>")]
fn ports(id: u16) -> universe::model::Universe {
// need to return my_universe here
}
The issue I'm having is that if I define my_universe within the route handler ports(), it will recreate the my_universe object on each request. Instead, I need the route to return the same my_universe object on each request.
Sharing non-mutable state in rocket is fairly easy. You can add the state with manage during the build of rocket.
rocket::build()
.manage(my_universe) // put the shared state here
.mount("/universe", routes![ports])
If you want to return this state in a route you will have to add both serde as a dependency and the json feature of rocket.
[dependencies]
rocket = { version = "0.5.0-rc.2", features = ["json"]}
serde = "1.0.147"
You can now annotate your struct with Serialize so we can send it as a JSON response later.
#[derive(Serialize)]
struct Universe {
/* ... */
}
And access this state in your route with a &State parameter.
#[get("/ports/<id>")]
fn ports(id: u16, universe: &State<Universe>) -> Json<&Universe> {
Json(universe.inner())
}
Here we can access the inner value of the state and return it as Json.
So far, the state is immutable and cannot be changed in the route, which might not be what you want. Consider wrapping your state in a Mutex to allow for interior mutability.
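As a rough illustration of the Mutex idea, the std-only sketch below shares one value between several threads that stand in for concurrent request handlers. In Rocket this would presumably mean passing a `Mutex<Universe>` to `manage` and taking `&State<Mutex<Universe>>` in the route; the `Universe` struct, its `visits` field, and `simulate_requests` are hypothetical names used only for this example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the `Universe` type in the question.
#[derive(Debug, Default)]
struct Universe {
    visits: u32,
}

/// Simulates `n` request handlers mutating the same shared state.
fn simulate_requests(n: u32) -> u32 {
    let shared = Arc::new(Mutex::new(Universe::default()));
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                // Lock, mutate, and release the guard at the end of the scope.
                let mut universe = shared.lock().unwrap();
                universe.visits += 1;
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let visits = shared.lock().unwrap().visits;
    visits
}
```

Every handler sees the same object, and the lock makes sure that updates do not race.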

Can you serialize a vector of structs to TOML in Rust?

Summary
I'm writing a program in Rust and would prefer to use a TOML file to store a vector of structs. However, I can't figure out how to store a vector of structs in a TOML file. I am able to do this using JSON but was hoping for TOML, so I'm just confirming that I'm not overlooking something (I'm not even sure whether the TOML format can support what I want). Therefore, I'm trying to find out if anyone knows of a way to use Rust to serialize a vector of structs to TOML and, more importantly, to deserialize it back into a vector.
Error message (on attempt to deserialize)
thread 'main' panicked at 'called Result::unwrap() on an Err value: Error { inner: ErrorInner { kind: Wanted { expected: "a table key", found: "a right bracket" }, line: Some(0), col: 2, at: Some(2), message: "", key: [] } }', src/main.rs:22:55
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Excerpt from Cargo.toml
[dependencies]
serde = { version = "1", features = ["derive"] }
serde_json = "1.0.86"
toml = "0.5.9"
Example code
Link to code on Playground
use serde::{Deserialize, Serialize};
#[derive(PartialEq, Debug, Serialize, Deserialize)]
struct Point {
x: i32,
}
/// An ordered list of points (This is what I want to store in the TOML file)
type Points = Vec<Point>;
fn main(){
// Create sample data to test on
let original: Points = vec![Point { x: 1 }];
// Working code that converts it to json and back
let json = serde_json::to_string(&original).unwrap();
let reconstructed: Points = serde_json::from_str(&json).unwrap();
assert_eq!(original, reconstructed); // No panic
// "Desired" code that converts it to toml but unable to deserialize
let toml = toml::to_string(&original).unwrap();
let reconstructed: Points = toml::from_str(&toml).unwrap(); // Panics!
assert_eq!(original, reconstructed);
}
Output of toml::to_string(&original).unwrap()
[[]]
x = 1
Explanation of example code
In the example code I create some sample data, then convert it to JSON and back with no issue. I then try to convert it to TOML, which doesn't give an error, but the output doesn't make sense to me. Then I try to convert it back into a Rust vector, and that triggers the error. My biggest problem is that I'm not even sure how I would expect the TOML file to look for a valid representation of a vector with multiple structs.
Related Questions / Research
I wasn't able to find any information on creating a vector with multiple structs. The closest I could find is this question; while it looks like it should solve my problem, the actual problem there was serializing enums, so the solution refers to that and doesn't solve my problem.
It seems that to represent an array of tables in Toml the syntax is
[[points]]
x = 1
[[points]]
x = 2
Working backwards from the TOML syntax and the original panic (kind: Wanted { expected: "a table key", found: "a right bracket" }), the serialized output [[]] is missing a table key. Introducing a wrapper struct whose field name supplies that key fixes the issue.
use serde::{Deserialize, Serialize};
#[derive(PartialEq, Debug, Serialize, Deserialize)]
struct Point {
x: i32,
}
#[derive(PartialEq, Debug,Serialize, Deserialize)]
struct Points {
points: Vec<Point>
}
impl From<Vec<Point>> for Points {
fn from(points: Vec<Point>) -> Self {
Points {
points
}
}
}
fn main(){
let original: Points = vec![Point { x: 1 }, Point {x : 2}].into();
let json = serde_json::to_string(&original).unwrap();
let reconstructed: Points = serde_json::from_str(&json).unwrap();
assert_eq!(original, reconstructed);
let toml = toml::to_string(&original).unwrap();
let reconstructed: Points = toml::from_str(&toml).unwrap();
assert_eq!(original, reconstructed);
}

Rust deserialize JSON into custom HashMap<String, google_firestore1::Value>

I just started with Rust and I have some trouble with deserialization.
I'm actually trying to use the function ProjectDatabaseDocumentCreateDocumentCall from the following crate google_firestore1. I want to populate the field fields of the struct Document. The documentation of the struct is clear, it's expecting a HashMap<String, google_firestore1::Value> as a value.
The question is, how can I deserialize a JSON string to a HashMap<String, google_firestore1::Value> ?
Here is the code I wrote for the moment:
extern crate google_firestore1 as firestore1;
use google_firestore1::Document;
use std::collections::HashMap;
use serde_json;
pub fn go() {
let _my_doc = Document::default();
let test = "{\"test\":\"test\", \"myarray\": [1]}";
// Working perfectly fine
let _working: HashMap<String, serde_json::Value> = serde_json::from_str(test).unwrap();
// Not working
let _not_working: HashMap<String, firestore1::Value> = serde_json::from_str(test).unwrap();
// Later I want to do the following
// _my_doc.fields = _not_working
}
Obviously this is not working, and it crashes with the following error.
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error("invalid type: string \"test\", expected struct Value", line: 1, column: 14)', src/firestore.rs:17:85
stack backtrace:
Of course, I noticed that serde_json::Value and firestore1::Value are not the same Struct.
But I gave a look at the source code and it seems that firestore1::Value is implementing the Deserialize trait.
So why is it not working ? In this case, do I need to iterate over the first HashMap and deserialize serde_json::Value to firestore1::Value again ? Is there a cleaner way to do what I want ?
Thanks for your answer !
The definition of the firestore1::Value is:
/// A message that can hold any of the supported value types.
///
/// This type is not used in any activity, and only used as *part* of another schema.
///
#[derive(Default, Clone, Debug, Serialize, Deserialize)]
pub struct Value {
/// A bytes value.
///
/// Must not exceed 1 MiB - 89 bytes.
/// Only the first 1,500 bytes are considered by queries.
#[serde(rename="bytesValue")]
pub bytes_value: Option<String>,
/// A timestamp value.
///
/// Precise only to microseconds. When stored, any additional precision is
/// rounded down.
#[serde(rename="timestampValue")]
pub timestamp_value: Option<String>,
...
}
This means each entry for a firestore1::Value must be an object.
I suspect that only one of the fields would actually be set, corresponding
to the actual type of the value (as they're all optional).
So your json would need to be something like:
let test = r#"{
"test":{"stringValue":"test"},
"myarray": {
"arrayValue":{"values":[{"integerValue":1}]}
}
}"#;
This is pretty ugly, so if you're doing a lot of your own JSON to Firestore conversions, I'd probably write some helpers to convert from serde_json::Value to firestore1::Value.
It would probably look something like this:
fn my_firestore_from_json(v: serde_json::Value) -> firestore1::Value {
    match v {
        serde_json::Value::Null => firestore1::Value {
            // I don't know why this is an Option<String>
            null_value: Some("".to_string()),
            ..Default::default()
        },
        serde_json::Value::Bool(b) => firestore1::Value {
            boolean_value: Some(b),
            ..Default::default()
        },
        // Implement this
        serde_json::Value::Number(n) => my_firestore_number(n),
        serde_json::Value::String(s) => firestore1::Value {
            string_value: Some(s),
            ..Default::default()
        },
        serde_json::Value::Array(v) => firestore1::Value {
            array_value: Some(firestore1::ArrayValue {
                // Assumes `values` is an Option<Vec<Value>>, like the crate's other optional fields
                values: Some(v.into_iter().map(my_firestore_from_json).collect()),
            }),
            ..Default::default()
        },
        // Implement this
        serde_json::Value::Object(d) => my_firestore_object(/* something */),
    }
}
This would be a bit neater if there were various implementations of From<T> for the firestore1::Value, but using the implementation of
Default makes this not too ugly.
It is also worth noting that not all Firestore value types can be produced here,
since the types expressed in serde_json are different from those supported by Firestore.
Anyway this allows you to use your JSON as written by doing something like:
let test = "{\"test\":\"test\", \"myarray\": [1]}";
let working: HashMap<String, serde_json::Value> = serde_json::from_str(test).unwrap();
let value_map: HashMap<String, firestore1::Value> = working.into_iter().map(|(k, v)| (k, my_firestore_from_json(v))).collect();
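The number helper left as "Implement this" above could follow the same struct-update pattern. The sketch below uses only the standard library, so `FsValue` and `JsonNumber` are hypothetical stand-ins for `firestore1::Value` and `serde_json::Number`, and the choice of a string-encoded `integer_value` is an assumption rather than something taken from the crate's documentation.

```rust
/// Hypothetical stand-in for `firestore1::Value`; only three fields are modelled.
#[derive(Debug, Default, Clone, PartialEq)]
struct FsValue {
    /// Assumed to be string-encoded, like the optional fields shown earlier.
    integer_value: Option<String>,
    double_value: Option<f64>,
    string_value: Option<String>,
}

/// Hypothetical stand-in for a JSON number: either an integer or a float.
enum JsonNumber {
    Int(i64),
    Float(f64),
}

/// Set exactly one field and leave the rest at their defaults.
fn my_firestore_number(n: JsonNumber) -> FsValue {
    match n {
        JsonNumber::Int(i) => FsValue {
            integer_value: Some(i.to_string()),
            ..Default::default()
        },
        JsonNumber::Float(f) => FsValue {
            double_value: Some(f),
            ..Default::default()
        },
    }
}
```

With the real `serde_json::Number`, the two branches would be chosen with `as_i64()` and `as_f64()` instead of matching on a local enum.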

Rust: Read and map lines from stdin and handling different error types

I'm learning Rust and trying to solve some basic algorithm problems with it. In many cases, I want to read lines from stdin, perform some transformation on each line and return a vector of resulting items. One way I did this was like this:
// Fully working Rust code
let my_values: Vec<u32> = stdin
.lock()
.lines()
.filter_map(Result::ok)
.map(|line| line.parse::<u32>())
.filter_map(Result::ok)
.map(|x|x*2) // For example
.collect();
This works but of course silently ignores any errors that may occur. Now what I would like to do is something along the lines of:
// Pseudo-ish code
let my_values: Result<Vec<u32>, X> = stdin
.lock()
.lines() // Can cause std::io::Error
.map(|line| line.parse::<u32>()) // Can cause std::num::ParseIntError
.map(|x| x*2)
.collect();
Where X is some kind of error type that I can match on afterwards. Preferably I want to perform the whole operation on one line at a time and immediately discard the string data after it has been parsed to an int.
I think I need to create some kind of Enum type to hold the various possible errors, possibly like this:
#[derive(Debug)]
enum InputError {
Io(std::io::Error),
Parse(std::num::ParseIntError),
}
However, I don't quite understand how to put everything together to make it clean and avoid having to explicitly match and cast everywhere. Also, is there some way to automatically create these enum error types, or do I have to explicitly enumerate them every time I do this?
You're on the right track.
The way I'd approach this is by using the enum you've defined,
then add implementations of From for the error types you're interested in.
That will allow you to use the ? operator on your maps to get the kind of behaviour you want.
#[derive(Debug)]
enum MyError {
IOError(std::io::Error),
ParseIntError(std::num::ParseIntError),
}
impl From<std::io::Error> for MyError {
fn from(e:std::io::Error) -> MyError {
return MyError::IOError(e)
}
}
impl From<std::num::ParseIntError> for MyError {
fn from(e:std::num::ParseIntError) -> MyError {
return MyError::ParseIntError(e)
}
}
Then you can implement the actual transform as either
let my_values: Vec<_> = stdin
.lock()
.lines()
.map(|line| -> Result<u32,MyError> { Ok(line?.parse::<u32>()?*2) } )
.collect();
which will give you one entry for each input line, like: [Ok(x), Err(MyError::ParseIntError(e)), Ok(y)],
or you can do:
let my_values: Result<Vec<_>,MyError> = stdin
.lock()
.lines()
.map(|line| -> Result<u32,MyError> { Ok(line?.parse::<u32>()?*2) } )
.collect();
which will give you either the first error, such as Err(MyError::ParseIntError(e)), or all values, such as Ok(vec![2, 4, 6]).
Note that you can further reduce some of the error boilerplate by using an error handling crate like snafu, but in this case it's not too much.
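For reference, here is a self-contained variant of the approach above that can be exercised without stdin. It is a sketch under a few assumptions: it reuses the `InputError` name from the question, reads from any `BufRead` (the hypothetical `doubled` function), and adds `Display` and `std::error::Error` implementations, which the answer does not require but which callers often expect from an error type.

```rust
use std::fmt;
use std::io::{self, BufRead};
use std::num::ParseIntError;

#[derive(Debug)]
enum InputError {
    Io(io::Error),
    Parse(ParseIntError),
}

impl fmt::Display for InputError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            InputError::Io(e) => write!(f, "I/O error: {}", e),
            InputError::Parse(e) => write!(f, "parse error: {}", e),
        }
    }
}

impl std::error::Error for InputError {}

impl From<io::Error> for InputError {
    fn from(e: io::Error) -> Self {
        InputError::Io(e)
    }
}

impl From<ParseIntError> for InputError {
    fn from(e: ParseIntError) -> Self {
        InputError::Parse(e)
    }
}

/// Reads one integer per line and doubles it, stopping at the first error.
fn doubled<R: BufRead>(reader: R) -> Result<Vec<u32>, InputError> {
    reader
        .lines()
        .map(|line| -> Result<u32, InputError> { Ok(line?.trim().parse::<u32>()? * 2) })
        .collect()
}
```

Calling `doubled(std::io::stdin().lock())` gives the stdin behaviour from the question, while a byte slice such as `"1\n2\n".as_bytes()` works for tests.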

How do I handle errors from libc functions in an idiomatic Rust manner?

libc's error handling is usually to return something < 0 in case of an error. I find myself doing this over and over:
let pid = fork();
if pid < 0 {
// Please disregard the fact that `Err(pid)`
// should be a `&str` or an enum
return Err(pid);
}
I find it ugly that this needs 3 lines of error handling, especially considering that these tests are quite frequent in this kind of code.
Is there a way to return an Err in case fork() returns < 0?
I found two things which are close:
assert_eq!. This needs another line and it panics so the caller cannot handle the error.
Using traits like these:
pub trait LibcResult<T> {
fn to_option(&self) -> Option<T>;
}
impl LibcResult<i64> for i32 {
fn to_option(&self) -> Option<i64> {
if *self < 0 { None } else { Some(i64::from(*self)) }
}
}
I could write fork().to_option().expect("could not fork"). This is now only one line, but it panics instead of returning an Err. I guess this could be solved using ok_or.
Some functions of libc have < 0 as sentinel (e.g. fork), while others use > 0 (e.g. pthread_attr_init), so this would need another argument.
Is there something out there which solves this?
As indicated in the other answer, use pre-made wrappers whenever possible. Where such wrappers do not exist, the following guidelines might help.
Return Result to indicate errors
The idiomatic Rust return type that includes error information is Result (std::result::Result). For most functions from POSIX libc, the specialized type std::io::Result is a perfect fit because it uses std::io::Error to encode errors, and it includes all standard system errors represented by errno values. A good way to avoid repetition is using a utility function such as:
use std::io::{Result, Error};
fn check_err<T: Ord + Default>(num: T) -> Result<T> {
if num < T::default() {
return Err(Error::last_os_error());
}
Ok(num)
}
Wrapping fork() would look like this:
pub fn fork() -> Result<u32> {
check_err(unsafe { libc::fork() }).map(|pid| pid as u32)
}
The use of Result allows idiomatic usage such as:
let pid = fork()?; // ? means return if Err, unwrap if Ok
if pid == 0 {
// child
...
}
Restrict the return type
The function will be easier to use if the return type is modified so that only "possible" values are included. For example, if a function logically has no return value, but returns an int only to communicate the presence of error, the Rust wrapper should return nothing:
pub fn dup2(oldfd: i32, newfd: i32) -> Result<()> {
check_err(unsafe { libc::dup2(oldfd, newfd) })?;
Ok(())
}
Another example are functions that logically return an unsigned integer, such as a PID or a file descriptor, but still declare their result as signed to include the -1 error return value. In that case, consider returning an unsigned value in Rust, as in the fork() example above. nix takes this one step further by having fork() return Result<ForkResult>, where ForkResult is a real enum with methods such as is_child(), and from which the PID is extracted using pattern matching.
Use options and other enums
Rust has a rich type system that allows expressing things that have to be encoded as magic values in C. To return to the fork() example, that function returns 0 to indicate the child return. This would be naturally expressed with an Option and can be combined with the Result shown above:
pub fn fork() -> Result<Option<u32>> {
let pid = check_err(unsafe { libc::fork() })? as u32;
if pid != 0 {
Ok(Some(pid))
} else {
Ok(None)
}
}
The user of this API would no longer need to compare with the magic value, but would use pattern matching, for example:
if let Some(child_pid) = fork()? {
// execute parent code
} else {
// execute child code
}
Return values instead of using output parameters
C often returns values using output parameters, pointer parameters into which the results are stored. This is either because the actual return value is reserved for the error indicator, or because more than one value needs to be returned, and returning structs was badly supported by historical C compilers.
In contrast, Rust's Result supports return value independent of error information, and has no problem whatsoever with returning multiple values. Multiple values returned as a tuple are much more ergonomic than output parameters because they can be used in expressions or captured using pattern matching.
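A small std-only sketch of the difference: where a C function would typically take two pointer parameters and return an error code, the Rust version returns both values as a tuple inside a `Result`. The `parse_size` helper and its "COLSxROWS" input format are hypothetical and chosen only for illustration.

```rust
use std::num::ParseIntError;

/// Hypothetical helper: parses "COLSxROWS" and returns both numbers as a tuple.
/// A C API would more likely fill two output parameters and return a status code.
fn parse_size(s: &str) -> Result<(u32, u32), ParseIntError> {
    // A missing separator leaves the second half empty, which fails to parse.
    let (cols, rows) = s.split_once('x').unwrap_or((s, ""));
    Ok((cols.parse()?, rows.parse()?))
}
```

A caller can destructure the result in one step, for example `let (cols, rows) = parse_size("80x24")?;`.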
Wrap system resources in owned objects
When returning handles to system resources, such as file descriptors or Windows handles, it is good practice to return them wrapped in an object that implements Drop to release them. This makes it less likely that a user of the wrapper will make a mistake, and it makes the use of return values more idiomatic, removing the need for awkward invocations of close() and the resource leaks that come from forgetting them.
Taking pipe() as an example:
use std::fs::File;
use std::os::unix::io::FromRawFd;
pub fn pipe() -> Result<(File, File)> {
let mut fds = [0 as libc::c_int; 2];
check_err(unsafe { libc::pipe(fds.as_mut_ptr()) })?;
Ok(unsafe { (File::from_raw_fd(fds[0]), File::from_raw_fd(fds[1])) })
}
// Usage:
// let (r, w) = pipe()?;
// ... use r and w as normal File objects
This pipe() wrapper returns multiple values and uses a wrapper object to refer to a system resource. Also, it returns the File objects defined in the Rust standard library and accepted by Rust's IO layer.
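When no standard library type fits the resource, the same idea can be written by hand. The sketch below is a simplified model rather than a real descriptor wrapper: the hypothetical `Handle` type only updates a shared counter, where a real implementation would store the raw handle and release it (for example with `libc::close`) inside `drop`.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical owned handle; the counter stands in for a real system resource.
struct Handle {
    open_count: Rc<Cell<u32>>,
}

impl Handle {
    /// "Acquires" the resource by incrementing the shared counter.
    fn open(open_count: &Rc<Cell<u32>>) -> Handle {
        open_count.set(open_count.get() + 1);
        Handle {
            open_count: Rc::clone(open_count),
        }
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        // Runs automatically when the handle goes out of scope.
        self.open_count.set(self.open_count.get() - 1);
    }
}

/// Opens two handles in an inner scope; returns the count inside and after it.
fn demo() -> (u32, u32) {
    let open_count = Rc::new(Cell::new(0));
    let during = {
        let _a = Handle::open(&open_count);
        let _b = Handle::open(&open_count);
        open_count.get()
    };
    (during, open_count.get())
}
```

Both handles are released at the end of the inner block without any explicit cleanup call from the user.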
The best option is to not reimplement the universe. Instead, use nix, which wraps everything for you and has done the hard work of converting all the error types and handling the sentinel values:
pub fn fork() -> Result<ForkResult>
Then just use normal error handling like try! or ?.
Of course, you could rewrite all of nix by converting your trait to returning Results and including the specific error codes and then use try! or ?, but why would you?
There's nothing magical in Rust that converts negative or positive numbers into a domain specific error type for you. The code you already have is the correct approach, once you've enhanced it to use a Result either by creating it directly or via something like ok_or.
An intermediate solution would be to reuse nix's Errno struct, perhaps with your own trait sugar on top.
Regarding "so this would need another argument": I'd say it would be better to have different methods, one for negative sentinel values and one for positive sentinel values.
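A minimal sketch of that split, using only the standard library. The function names `check_negative` and `check_errno_return` are hypothetical; the second one assumes the pthread-style convention in which the return value is the error number itself, so it builds the error with `Error::from_raw_os_error` instead of reading `errno`.

```rust
use std::io::{Error, Result};

/// For calls like `fork` that return a negative value on failure and set `errno`.
fn check_negative(ret: i32) -> Result<i32> {
    if ret < 0 {
        Err(Error::last_os_error())
    } else {
        Ok(ret)
    }
}

/// For calls like `pthread_attr_init` that return 0 on success and
/// the error number itself on failure.
fn check_errno_return(ret: i32) -> Result<()> {
    if ret == 0 {
        Ok(())
    } else {
        Err(Error::from_raw_os_error(ret))
    }
}
```

Each wrapper then picks the checker that matches its C function, for example `check_negative(unsafe { libc::fork() })?`, so no extra argument describing the sentinel is needed.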