How do I format a signed integer to a sign-aware hexadecimal representation?

My initial intent was to convert a signed primitive number to its hexadecimal representation in a way that preserves the number's sign. It turns out that the current implementations of LowerHex, UpperHex and relatives for signed primitive integers simply treat them as unsigned. Regardless of what extra formatting flags I add, these implementations appear to simply reinterpret the number as its unsigned counterpart for formatting purposes. (Playground)
println!("{:X}", 15i32); // F
println!("{:X}", -15i32); // FFFFFFF1 (expected "-F")
println!("{:X}", -0x80000000i32); // 80000000 (expected "-80000000")
println!("{:+X}", -0x80000000i32); // +80000000
println!("{:+o}", -0x8000i16); // +100000
println!("{:+b}", -0x8000i16); // +1000000000000000
The documentation in std::fmt is not clear on whether this is supposed to happen or is even valid, and the UpperHex documentation (like that of the other formatting traits) does not mention that the implementations for signed integers interpret the numbers as unsigned. There seem to be no related issues on Rust's GitHub repository either. (Post-addendum notice: starting from Rust 1.24.0, the documentation has been improved to address these concerns; see issue #42860.)
Ultimately, one could implement specific functions for the task (as below), with the unfortunate downside of not being very compatible with the formatter API.
fn to_signed_hex(n: i32) -> String {
    if n < 0 {
        // unsigned_abs avoids the overflow that `-n` would cause for i32::MIN
        format!("-{:X}", n.unsigned_abs())
    } else {
        format!("{:X}", n)
    }
}
assert_eq!(to_signed_hex(-15i32), "-F".to_string());
Is this behaviour for signed integer types intentional? Is there a way to do this formatting procedure while still adhering to a standard Formatter?

Is there a way to do this formatting procedure while still adhering to a standard Formatter?
Yes, but you need to make a newtype in order to provide a distinct implementation of UpperHex. Here's an implementation that respects the +, # and 0 flags (and possibly more, I haven't tested):
use std::fmt::{self, Formatter, UpperHex};

struct ReallySigned(i32);

impl UpperHex for ReallySigned {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        let prefix = if f.alternate() { "0x" } else { "" };
        // unsigned_abs avoids the overflow that abs() would hit on i32::MIN
        let bare_hex = format!("{:X}", self.0.unsigned_abs());
        // pad_integral applies the sign, prefix, fill and width for us
        f.pad_integral(self.0 >= 0, prefix, &bare_hex)
    }
}

fn main() {
    for &v in &[15, -15] {
        for &v in &[&v as &dyn UpperHex, &ReallySigned(v) as &dyn UpperHex] {
            println!("Value: {:X}", v);
            println!("Value: {:08X}", v);
            println!("Value: {:+08X}", v);
            println!("Value: {:#08X}", v);
            println!("Value: {:+#08X}", v);
            println!();
        }
    }
}

This is like Francis Gagné's answer, but made generic to handle i8 through i128.
use std::fmt::{self, Formatter, UpperHex};
use num_traits::Signed;

struct ReallySigned<T: PartialOrd + Signed + UpperHex>(T);

impl<T: PartialOrd + Signed + UpperHex> UpperHex for ReallySigned<T> {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        let prefix = if f.alternate() { "0x" } else { "" };
        // Caveat: abs() still overflows for the minimum value (e.g. i32::MIN).
        let bare_hex = format!("{:X}", self.0.abs());
        f.pad_integral(self.0 >= T::zero(), prefix, &bare_hex)
    }
}

fn main() {
    println!("{:#X}", -0x12345678);
    println!("{:#X}", ReallySigned(-0x12345678));
}
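The same approach extends to the other formatting traits. Here is a minimal sketch of a LowerHex counterpart, under the same assumptions as the concrete i32 version above (not part of either answer's original code):

use std::fmt::{self, Formatter, LowerHex};

struct ReallySigned(i32);

impl LowerHex for ReallySigned {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        let prefix = if f.alternate() { "0x" } else { "" };
        // unsigned_abs avoids overflowing on i32::MIN
        let bare_hex = format!("{:x}", self.0.unsigned_abs());
        f.pad_integral(self.0 >= 0, prefix, &bare_hex)
    }
}

fn main() {
    assert_eq!(format!("{:x}", ReallySigned(-15)), "-f");
    assert_eq!(format!("{:#010x}", ReallySigned(-15)), "-0x000000f");
}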


Testing serialize/deserialize functions for serde "with" attribute

Serde derive macros come with the ability to control how a field is serialized/deserialized through the #[serde(with = "module")] field attribute. The "module" should have serialize and deserialize functions with the right arguments and return types.
An example that unfortunately got a bit too contrived:
use serde::{Deserialize, Serialize};

#[derive(Debug, Default, PartialEq, Eq)]
pub struct StringPair(String, String);

mod stringpair_serde {
    pub fn serialize<S>(sp: &super::StringPair, ser: S) -> Result<S::Ok, S::Error>
    where
        S: serde::Serializer,
    {
        ser.serialize_str(format!("{}:{}", sp.0, sp.1).as_str())
    }

    pub fn deserialize<'de, D>(d: D) -> Result<super::StringPair, D::Error>
    where
        D: serde::Deserializer<'de>,
    {
        d.deserialize_str(Visitor)
    }

    struct Visitor;

    impl<'de> serde::de::Visitor<'de> for Visitor {
        type Value = super::StringPair;

        fn expecting(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
            write!(f, "a pair of strings separated by colon (:)")
        }

        fn visit_str<E>(self, s: &str) -> Result<Self::Value, E>
        where
            E: serde::de::Error,
        {
            Ok(s.split_once(':')
                .map(|(a, b)| super::StringPair(a.to_string(), b.to_string()))
                .unwrap_or_default())
        }
    }
}

#[derive(Serialize, Deserialize)]
struct UsesStringPair {
    // Other fields ...
    #[serde(with = "stringpair_serde")]
    pub stringpair: StringPair,
}

fn main() {
    let usp = UsesStringPair {
        stringpair: StringPair("foo".to_string(), "bar".to_string()),
    };
    assert_eq!(
        serde_json::json!(&usp).to_string(),
        r#"{"stringpair":"foo:bar"}"#
    );

    let usp: UsesStringPair = serde_json::from_str(r#"{"stringpair":"baz:qux"}"#).unwrap();
    assert_eq!(
        usp.stringpair,
        StringPair("baz".to_string(), "qux".to_string())
    )
}
Testing the derived serialization for UsesStringPair is trivial with simple assertions, as above. I have also looked at the serde_test example, which makes sense to me too.
However, I want to be able to independently test the stringpair_serde::{serialize, deserialize} functions (e.g. if my crate provides just mycrate::StringPair and mycrate::stringpair_serde, and UsesStringPair is for the crate users to implement).
One way I've looked into is creating a serde_json::Serializer (using new, which requires an io::Write implementation that I couldn't figure out how to create and use trivially, but that's a separate question) and calling serialize with the created Serializer, then making assertions on the result as before. However, that does not test any/all implementations of serde::Serializer, just the one provided in serde_json.
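For what it's worth, a plain Vec<u8> can serve as the io::Write sink here. A minimal sketch of this serde_json-specific approach, reusing the StringPair and stringpair_serde definitions above:

#[test]
fn serialize_via_serde_json() {
    let sp = StringPair("foo".to_string(), "bar".to_string());
    let mut buf = Vec::new(); // Vec<u8> implements io::Write
    let mut ser = serde_json::Serializer::new(&mut buf);
    // &mut serde_json::Serializer itself implements serde::Serializer
    stringpair_serde::serialize(&sp, &mut ser).unwrap();
    assert_eq!(String::from_utf8(buf).unwrap(), r#""foo:bar""#);
}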
I'm wondering if there's a method like in the serde_test example that works for ser/deser functions provided by a module.
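If serde_test is acceptable as a dev-dependency, one way to get there is a test-only newtype that applies the with attribute, then drive it through serde_test::assert_tokens. This is a sketch under that assumption; the Wrapper name and exact token stream are mine, not from the serde_test docs:

use serde_test::{assert_tokens, Token};

#[derive(Debug, PartialEq, Serialize, Deserialize)]
struct Wrapper(#[serde(with = "stringpair_serde")] StringPair);

#[test]
fn stringpair_serde_round_trip() {
    let w = Wrapper(StringPair("foo".to_string(), "bar".to_string()));
    // A newtype struct serializes as a newtype token wrapping whatever
    // the `with` module's serialize function emits (a string here).
    assert_tokens(&w, &[
        Token::NewtypeStruct { name: "Wrapper" },
        Token::Str("foo:bar"),
    ]);
}

This exercises both module functions through serde_test's checking Serializer and Deserializer rather than through one specific format crate.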

How to create a vector of all decorated functions from a specific module?

I have a file main.rs and a file rule.rs. I want the functions defined in rule.rs to be included in the Rules::rule vector without having to push them one by one; I'd prefer to add them in a loop.
main.rs:
struct Rules {
    rule: Vec<fn(arg: &Arg) -> bool>,
}

impl Rules {
    fn validate_incomplete(&self, arg: &Arg) -> bool {
        // iterate through all constraints and evaluate; if false, return and stop
        for constraint in self.rule.iter() {
            if !constraint(arg) {
                return false;
            }
        }
        true
    }
}
rule.rs:
pub fn test_constraint1(arg: &Arg) -> bool {
    arg.last_element().total() < 29500
}

pub fn test_constraint2(arg: &Arg) -> bool {
    arg.last_element().total() < 35000
}
Rules::rule should be populated with test_constraint1 and test_constraint2.
In Python, I could add a decorator @rule_decorator above the constraints that should be included in the Vec, but I don't see an equivalent in Rust.
In Python, I could use dir(module) to see all available methods/attributes.
Python variant:
class Rules:
    def __init__(self, name: str):
        self.name = name
        self.rule = []
        for member in dir(self):
            method = getattr(self, member)
            if "rule_decorator" in dir(method):
                self.rule.append(method)

    def validate_incomplete(self, arg: Arg):
        for constraint in self.rule:
            if not constraint(arg):
                return False
        return True
With the rule.py file:
@rule_decorator
def test_constraint1(arg: Arg):
    return arg.last_element().total() < 29500

@rule_decorator
def test_constraint2(arg: Arg):
    return arg.last_element().total() < 35000
All functions with the rule_decorator are added to the self.rule list and checked by the validate_incomplete function.
Rust does not have the same reflection features as Python. In particular, you cannot iterate through all functions of a module at runtime; at least you can't do that with built-in tools. It is possible to write so-called procedural macros which let you add custom attributes to your functions, e.g. #[rule_decorator] fn foo() { ... }. With proc macros, you can do almost anything.
However, using proc macros for this is way too over-engineered (in my opinion). In your case, I would simply list all functions to be included in your vector:
fn test_constraint1(arg: u32) -> bool {
    arg < 29_500
}

fn test_constraint2(arg: u32) -> bool {
    arg < 35_000
}

fn main() {
    let rules = vec![test_constraint1 as fn(_) -> _, test_constraint2];

    // Or, if you already have a vector and need to add to it:
    let mut rules = Vec::new();
    rules.extend_from_slice(&[test_constraint1 as fn(_) -> _, test_constraint2]);
}
A few notes about this code:
I replaced &Arg with u32, because it doesn't have anything to do with the problem. Please omit unnecessary details from questions on StackOverflow.
I used _ in the number literals to increase readability.
The strange as fn(_) -> _ cast is sadly necessary. You can read more about it in this question.
You can, with some tweaks and restrictions, achieve your goals. You'll need to use the inventory crate. This is limited to Linux, macOS and Windows at the moment.
You can then use inventory::submit to add values to a global registry, inventory::collect to build the registry, and inventory::iter to iterate over the registry.
Due to language restrictions, you cannot create a registry for values of a type that you do not own, which includes the raw function pointer. We will need to create a newtype called Predicate to use the crate:
use inventory; // 0.1.3

struct Predicate(fn(&u32) -> bool);

inventory::collect!(Predicate);

struct Rules;

impl Rules {
    fn validate_incomplete(&self, arg: &u32) -> bool {
        inventory::iter::<Predicate>
            .into_iter()
            .all(|Predicate(constraint)| constraint(arg))
    }
}

mod rules {
    use super::Predicate;

    pub fn test_constraint1(arg: &u32) -> bool {
        *arg < 29500
    }
    inventory::submit!(Predicate(test_constraint1));

    pub fn test_constraint2(arg: &u32) -> bool {
        *arg < 35000
    }
    inventory::submit!(Predicate(test_constraint2));
}

fn main() {
    if Rules.validate_incomplete(&42) {
        println!("valid");
    } else {
        println!("invalid");
    }
}
There are a few more steps you'd need to take to reach your originally-stated goal:
"a vector"
You can collect from the provided iterator to build a Vec (see the sketch after this list).
"decorated functions"
You can write your own procedural macro that will call inventory::submit!(Predicate(my_function_name)); for you.
"from a specific module"
You can add the module name into the Predicate struct and filter on that later.
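Combining the first and third points, here is a minimal sketch; the named-field version of Predicate and its module field are my own additions, not part of the inventory API:

use inventory; // 0.1.3

// Variation on the earlier newtype: carry the registering module's name
// so the registry can be filtered by module later.
struct Predicate {
    module: &'static str,
    check: fn(&u32) -> bool,
}

inventory::collect!(Predicate);

fn test_constraint1(arg: &u32) -> bool {
    *arg < 29_500
}
inventory::submit!(Predicate { module: "rules", check: test_constraint1 });

fn main() {
    // Build a vector from the registry, keeping only one module's rules.
    let rules: Vec<&Predicate> = inventory::iter::<Predicate>
        .into_iter()
        .filter(|p| p.module == "rules")
        .collect();
    println!("collected {} rule(s)", rules.len());
}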
See also:
How can I statically register structures at compile time?

Why does .flat_map() with .chars() not work with std::io::Lines, but does with a vector of Strings?

I am trying to iterate over characters in stdin. The Read.chars() method achieves this goal, but is unstable. The obvious alternative is to use Read.lines() with a flat_map to convert it to a character iterator.
This seems like it should work, but doesn't, resulting in borrowed value does not live long enough errors.
use std::io::BufRead;

fn main() {
    let stdin = std::io::stdin();
    let mut lines = stdin.lock().lines();
    let mut chars = lines.flat_map(|x| x.unwrap().chars());
}
This is mentioned in Read file character-by-character in Rust, but it doesn't really explain why.
What I am particularly confused about is how this differs from the example in the documentation for flat_map, which uses flat_map to apply .chars() to a vector of strings. I don't really see how that should be any different. The main difference I see is that my code needs to call unwrap() as well, but changing the last line to the following does not work either:
let mut chars = lines.map(|x| x.unwrap());
let mut chars = chars.flat_map(|x| x.chars());
It fails on the second line, so the issue doesn't appear to be the unwrap.
Why does this last line not work, when the very similar line in the documentation doesn't? Is there any way to get this to work?
Start by figuring out what the type of the closure's variable is:
let mut chars = lines.flat_map(|x| {
    let () = x;
    x.unwrap().chars()
});
The deliberate type mismatch makes the compiler print the closure variable's type: it's a Result<String, io::Error>. After unwrapping it, it will be a String.
Next, look at str::chars:
fn chars(&self) -> Chars
And the definition of Chars:
pub struct Chars<'a> {
// some fields omitted
}
From that, we can tell that calling chars on a string returns an iterator that has a reference to the string.
Whenever we have a reference, we know that the reference cannot outlive the thing it borrows from. In this case, the String returned by x.unwrap() is the owner. The next thing to check is where that ownership ends. Here, the closure owns the String, so at the end of the closure the value is dropped and any references to it are invalidated.
Except the code tried to return a Chars that still referred to the string. Oops. Thanks to Rust, the code didn't segfault!
The difference with the example that works is all in the ownership. In that case, the strings are owned by a vector outside of the loop and they do not get dropped before the iterator is consumed. Thus there are no lifetime issues.
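For contrast, here is a minimal sketch of that working case (my own condensed version of the documentation's example): the vector owns the Strings for the whole scope, so the Chars iterators borrow from data that outlives them.

fn main() {
    let lines = vec![String::from("alpha"), String::from("beta")];
    // `lines` lives until the end of main; each `s.chars()` borrows from it.
    let chars: Vec<char> = lines.iter().flat_map(|s| s.chars()).collect();
    println!("{:?}", chars);
}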
What this code really wants is an into_chars method on String. That iterator could take ownership of the value and return characters.
Not maximally efficient, but a good start:
struct IntoChars {
    s: String,
    offset: usize,
}

impl IntoChars {
    fn new(s: String) -> Self {
        IntoChars { s, offset: 0 }
    }
}

impl Iterator for IntoChars {
    type Item = char;

    fn next(&mut self) -> Option<Self::Item> {
        let remaining = &self.s[self.offset..];
        match remaining.chars().next() {
            Some(c) => {
                self.offset += c.len_utf8();
                Some(c)
            }
            None => None,
        }
    }
}
use std::io::BufRead;

fn main() {
    let stdin = std::io::stdin();
    let lines = stdin.lock().lines();
    let chars = lines.flat_map(|x| IntoChars::new(x.unwrap()));
    for c in chars {
        println!("{}", c);
    }
}
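Alternatively, if one extra allocation per line is acceptable, collecting each line's characters into an owned Vec<char> also satisfies the borrow checker, because flat_map only needs a value that implements IntoIterator:

use std::io::BufRead;

fn main() {
    let stdin = std::io::stdin();
    let chars = stdin
        .lock()
        .lines()
        // The Vec<char> is owned, so no reference to the temporary String escapes.
        .flat_map(|line| line.unwrap().chars().collect::<Vec<_>>());
    for c in chars {
        println!("{}", c);
    }
}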
See also:
How can I store a Chars iterator in the same struct as the String it is iterating on?
Is there an owned version of String::chars?

Can I use the "null pointer optimization" for my own non-pointer types?

When you have an Option<&T>, the compiler knows that NULL is never a possible value for &T, and encodes the None variant as NULL instead. This allows for space-saving:
use std::mem;

fn main() {
    assert_eq!(mem::size_of::<&u8>(), mem::size_of::<Option<&u8>>());
}
However, if you do the same with a non-pointer type, there are no spare bit patterns to store that value in, and extra space is required:
use std::mem;

fn main() {
    // fails because the left side is 1 and the right side is 2
    assert_eq!(mem::size_of::<u8>(), mem::size_of::<Option<u8>>());
}
In general, this is correct. However, I'd like to opt in to the optimization because I know that my type has certain impossible values. As a made-up example, I might have a player character with an age. The age may be unknown, but will never be as high as 255:
struct Age(u8);

struct Player {
    age: Option<Age>,
}
I'd like to be able to inform the optimizer of this constraint - Age can never be 255, so it's safe to use that bit pattern as None. Is this possible?
As of Rust 1.28, you can use std::num::NonZeroU8 (and friends). This acts as a wrapper that tells the compiler the contained number can never be zero. The same niche optimization is why Option<Box<T>> is pointer-sized.
Here's an example showing how to create an Age and read its payload.
use std::num::NonZeroU8;

struct Age(NonZeroU8);

impl Age {
    pub fn new(age: u8) -> Age {
        let age = NonZeroU8::new(age).expect("Age cannot be zero!");
        Age(age)
    }

    pub fn age(&self) -> u8 {
        self.0.get()
    }
}

struct Player {
    age: Option<Age>,
}

fn main() {
    println!("size: {}", std::mem::size_of::<Player>());
    // Output: size: 1
}
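Note that NonZeroU8's niche is zero, not 255, so the example above changes the constraint from "never 255" to "never 0". A sketch of one way to keep the original "never 255" rule is to store the age offset by one; the offset convention here is my own, not anything from the standard library:

use std::num::NonZeroU8;

// Stores age + 1: an age of 255 would wrap to the forbidden 0,
// while ages 0..=254 map to the valid 1..=255 range.
struct Age(NonZeroU8);

impl Age {
    pub fn new(age: u8) -> Age {
        let stored = NonZeroU8::new(age.wrapping_add(1)).expect("Age cannot be 255!");
        Age(stored)
    }

    pub fn age(&self) -> u8 {
        self.0.get() - 1
    }
}

fn main() {
    // The niche still applies through the wrapper.
    assert_eq!(std::mem::size_of::<Option<Age>>(), 1);
    assert_eq!(Age::new(30).age(), 30);
}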

Implementing Decodable for a wrapper around a fixed size vector

Background: the serialize crate is undocumented and deriving Decodable doesn't work here. I've also looked at existing implementations for other types and found the code difficult to follow.
How does the decoding process work, and how do I implement Decodable for this struct?
pub struct Grid<A> {
    data: [[A, ..GRIDW], ..GRIDH]
}
The reason why #[deriving(Decodable)] doesn't work is that [A,..GRIDW] doesn't implement Decodable, and it's impossible to implement a trait for a type when both are defined outside of this crate, which is the case here. So the only solution I can see is to manually implement Decodable for Grid.
This is as far as I've gotten:
impl<A: Decodable<D, E>, D: Decoder<E>, E> Decodable<D, E> for Grid<A> {
    fn decode(decoder: &mut D) -> Result<Grid<A>, E> {
        decoder.read_struct("Grid", 1u, ref |d| Ok(Grid {
            data: match d.read_struct_field("data", 0u, ref |d| Decodable::decode(d)) {
                Ok(e) => e,
                Err(e) => return Err(e)
            },
        }))
    }
}
This gives an error at Decodable::decode(d):
error: failed to find an implementation of trait
serialize::serialize::Decodable for [[A, .. 20], .. 20]
It's not really possible to do this nicely at the moment for a variety of reasons:
We can't be generic over the length of a fixed length array (the fundamental issue)
The current trait coherence restrictions mean we can't write a custom trait MyDecodable<D, E> { ... } with impl MyDecodable<D, E> for [A, .. GRIDW] (and one for GRIDH) plus a blanket implementation impl<A: Decodable<D, E>> MyDecodable<D, E> for A. This forces a trait-based solution into using an intermediary type, which then makes the compiler's type inference rather unhappy and, AFAICT, impossible to satisfy.
We don't have associated types (aka "output types"), which I think would allow the type inference to be slightly sane.
Thus, for now, we're left with a manual implementation. :(
extern crate serialize;

use std::default::Default;
use serialize::{Decoder, Decodable};

static GRIDW: uint = 10;
static GRIDH: uint = 5;

fn decode_grid<E, D: Decoder<E>,
               A: Copy + Default + Decodable<D, E>>(d: &mut D)
               -> Result<Grid<A>, E> {
    // mirror the Vec implementation: try to read a sequence
    d.read_seq(|d, len| {
        // check it's the required length
        if len != GRIDH {
            return Err(
                d.error(format!("expecting length {} but found {}",
                                GRIDH, len).as_slice()));
        }

        // create the array with empty values ...
        let mut array: [[A, .. GRIDW], .. GRIDH]
            = [[Default::default(), .. GRIDW], .. GRIDH];

        // ... and fill it in progressively ...
        for (i, outer) in array.mut_iter().enumerate() {
            // ... by reading each outer element ...
            try!(d.read_seq_elt(i, |d| {
                // ... as a sequence ...
                d.read_seq(|d, len| {
                    // ... of the right length,
                    if len != GRIDW { return Err(d.error("...")) }

                    // and then read each element of that sequence as the
                    // elements of the grid.
                    for (j, inner) in outer.mut_iter().enumerate() {
                        *inner = try!(d.read_seq_elt(j, Decodable::decode));
                    }
                    Ok(())
                })
            }));
        }

        // all done successfully!
        Ok(Grid { data: array })
    })
}

pub struct Grid<A> {
    data: [[A, ..GRIDW], ..GRIDH]
}

impl<E, D: Decoder<E>, A: Copy + Default + Decodable<D, E>>
        Decodable<D, E> for Grid<A> {
    fn decode(d: &mut D) -> Result<Grid<A>, E> {
        d.read_struct("Grid", 1, |d| {
            d.read_struct_field("data", 0, decode_grid)
        })
    }
}

fn main() {}
playpen.
It's also possible to write a more "generic" [T, .. n] decoder by using macros to instantiate each version, with special control over how the recursive decoding is performed so that nested fixed-length arrays (as required for Grid) can be handled; this requires somewhat less code (especially with more layers, or a variety of different lengths), but the macro solution:
may be harder to understand, and
may be less efficient (the one I give there creates a new array variable for every fixed-length array, including new Defaults, while the non-macro solution above uses a single array and thus only calls Default::default once for each element in the grid). It may be possible to expand to a similar set of recursive loops, but I'm not sure.