Getting raw bytes from packed struct - serialization

I've got a struct which looks like this:
#[repr(packed)]
struct Header {
some: u8,
thing: u8,
}
How can I get raw bytes from it, that I could use for either a C library or socket interaction?
I was hoping to use transmute to solve the problem, but unfortunately this doesn't work:
let header = Header {....};
let header_bytes: &[u8] = unsafe { mem::transmute(&header) };
let result = self.socket.write(header_bytes);
This fails with
error: transmute called on types with different sizes: &Header (64 bits) to &[u8] (128 bits)

Edit: updated for Rust 1.x.
You can't transmute an arbitrary type to another arbitrary type (like Header to &[u8]) because transmute() is akin to reinterpret_cast in C++: it literally reinterprets the bytes that its argument consists of, and for this to work the source and target types must have the same size. But even if Header and &[u8] had the same size, it wouldn't make sense to convert between them: Header is an actual value, while &[u8] is a fat pointer, that is, a pointer together with the length of the data behind it.
In order to interpret a piece of data as a slice of bytes you need to perform three steps: obtain a raw pointer to the data, convert it to a pointer to u8 and then convert that to a slice of u8. This means enhancing the raw pointer value with the length, which in this case is equal to the structure size. The last step can easily be done with std::slice::from_raw_parts():
use std::slice;
use std::mem;
let p: *const Header = &header; // the same operator is used as with references
let p: *const u8 = p as *const u8; // convert between pointer types
let s: &[u8] = unsafe {
slice::from_raw_parts(p, mem::size_of::<Header>())
};
However, doing this in most cases is not a good idea. Even though the struct is marked as packed, it still leaves the problem with the byte order, so at the very least the code which does this won't be portable (or rather, the data it serializes in such form may not be recoverable by the same program compiled for another architecture).
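One way around the byte-order problem is to serialize each field explicitly instead of reinterpreting memory. The following is only a sketch: WireHeader, its fields, and the three-byte big-endian layout are hypothetical and chosen for illustration, since the Header in the question contains only single-byte fields.

```rust
/// Hypothetical header with a multi-byte field, used only for illustration.
struct WireHeader {
    kind: u8,
    length: u16,
}

impl WireHeader {
    /// Serializes the fields one by one in big-endian (network) order,
    /// so the result does not depend on the host architecture.
    fn to_bytes(&self) -> [u8; 3] {
        let len = self.length.to_be_bytes();
        [self.kind, len[0], len[1]]
    }

    /// Rebuilds the header from the same three-byte layout.
    fn from_bytes(bytes: [u8; 3]) -> Self {
        WireHeader {
            kind: bytes[0],
            length: u16::from_be_bytes([bytes[1], bytes[2]]),
        }
    }
}
```

The resulting array can be passed to a socket write as a slice (&header.to_bytes()), and from_bytes recovers the same values on any architecture.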

You can make any object into a byte slice with a function like this:
/// Safe to use with any wholly initialized memory `ptr`
unsafe fn raw_byte_repr<'a, T>(ptr: &'a T) -> &'a [u8]
{
    // std::raw::Slice is not available on stable Rust;
    // slice::from_raw_parts builds the same (pointer, length) pair.
    std::slice::from_raw_parts(
        ptr as *const T as *const u8,
        std::mem::size_of::<T>(),
    )
}
I don't know the full safety analysis, but as long as you don't access any uninit values it should be fine.
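As a sketch of the "no uninitialized values" condition applied to the struct from the question: the version below assumes the struct is declared #[repr(C, packed)] rather than plain #[repr(packed)], so that the field order is fixed and there are no padding bytes. The function name header_bytes is made up for this example.

```rust
use std::mem;
use std::slice;

/// `repr(C, packed)` fixes the field order and removes padding, so every
/// byte of the struct is initialized.
#[repr(C, packed)]
struct Header {
    some: u8,
    thing: u8,
}

/// Views a `Header` as its raw bytes; the slice borrows from `header`.
fn header_bytes(header: &Header) -> &[u8] {
    let ptr = header as *const Header as *const u8;
    // SAFETY: `ptr` points to `size_of::<Header>()` initialized bytes that
    // live as long as the borrow of `header`, and `u8` has alignment 1.
    unsafe { slice::from_raw_parts(ptr, mem::size_of::<Header>()) }
}
```

Because the returned slice borrows from the header, the compiler rejects any use of the bytes after the header has gone out of scope.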


Is my understanding of the following Rust "reqwest" code correct?

I've been toying around with Rust and have come across the following code:
fn request(&url) -> Result<(), Box<dyn std::error::Error>> {
let mut res = reqwest::get(&url)?;
let mut body = String::new();
res.read_to_string(&mut body)?;
println!("Status: {}", res.status());
println!("Headers:\n{:#?}", res.headers());
println!("Body:\n{}", body);
Ok(())
}
It is my understanding that:
fn request(&url) -> Result<(), Box<dyn std::error::Error>> {
Defines a function that has a single (borrowed) parameter and uses Result to handle errors.
let mut res = reqwest::get(&url)?;
Defines a mutable variable to store the response object from the reqwest crate's get method.
let mut body = String::new();
Defines a mutable variable to store the responseText string.
res.read_to_string(&mut body)?;
This method stores the responseText in the body variable.
println!("Status: {}", res.status());
println!("Headers:\n{:#?}", res.headers());
println!("Body:\n{}", body);
Prints three formatted strings (with trailing new lines) containing the response status, headers and body.
Ok(())
Handles errors via Result..?
Questions:
What do the empty parentheses in Result<() and Ok(()) mean/do?
What is Box<dyn std::error::Error>?
You're absolutely correct in your understanding.
A Result is an Enum which can either be "Ok" or "Err" - if Ok, then there can be some value of okayness (a result, response, data, output, whatever); similarly, if Err, then there's some concrete error you may want to communicate. With that let's break down the result.
The type should be read like this: Result<TypeOfValueIfOkay, TypeOfErrorWhenNotOkay>. These two type parameters can be anything, but they have to be something; you can't just leave one out.
So TypeOfValueIfOkay has to be something, but if you don't want to return anything, you can use the empty tuple (), also called the unit type. That's the () in Result<(), ...>. It's effectively saying "I return nothing at all when everything goes well".
So then the second part TypeOfErrorWhenNotOkay can also just be any type - a string, an int, whatever. It helps for the type to implement the std::error::Error trait helping callers standardize a bit.
A dyn std::error::Error is "some value whose concrete type is only known at run time, but which implements the std::error::Error trait". Such a value has no size known at compile time, and Rust needs to know the exact size of a value in order to return it on the caller's stack (the caller's stack frame has to be sized to accept it).
This is where the Box type comes in: it puts the actual value on the heap and holds a pointer to it, and that pointer has a predictable, fixed size no matter what the value on the heap is. The <dyn std::error::Error> is an assurance that whatever the boxed value is, it implements the Error trait.
So now the final Ok(()) makes sense. If you read Ok(value): it says the Result enum is variant Ok with the value of "empty tuple" (), i.e. nothing.
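To make this concrete, here is a minimal sketch using only the standard library. The function check_small and the error type TooLarge are invented for illustration; the point is that two unrelated error types both end up inside the same Box<dyn Error>, and that success carries no value.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error type, used only for illustration.
#[derive(Debug)]
struct TooLarge(u32);

impl fmt::Display for TooLarge {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} is too large", self.0)
    }
}

impl Error for TooLarge {}

/// Returns `Ok(())` on success. On failure, `?` and `.into()` box two
/// different concrete error types into a `Box<dyn Error>`.
fn check_small(input: &str) -> Result<(), Box<dyn Error>> {
    let n: u32 = input.trim().parse()?; // ParseIntError -> Box<dyn Error>
    if n > 100 {
        return Err(TooLarge(n).into()); // TooLarge -> Box<dyn Error>
    }
    Ok(()) // success, with no value to report
}
```

A caller only needs to handle Box<dyn Error>; it can print the error through the Display implementation without knowing which concrete type is inside the box.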

"SafeArray cannot be marshaled to this array type" error

I have a C++ COM local server and C# client. The server code:
// MyStruct as define in the _i.h file
typedef /* [uuid] */ DECLSPEC_UUID("...") struct MyStruct
{
SAFEARRAY * FormatData;
LONG aLong;
BOOL aBool;
} MyStruct;
// Server method being invoked
STDMETHODIMP CMyClass::Foo(MyStruct* StreamInfo, int* result)
{
long Length;
BYTE* Data;
GetData(Length, Data);
PackBytes(Length, Data, &(StreamInfo->FormatData));
}
PackBytes converts the BYTE array to SAFEARRAY. It is taken from this stackoverflow question. It sets the boundary & dimension of the SAFEARRAY.
The client code:
MyStruct myStruct;
int rc = obj.Foo(out myStruct);
Where MyStruct is imported from the COM assembly. It appears as:
public struct MyStruct
{
public Array FormatData;
int aLong;
int aBool;
}
After Foo runs, the error "SafeArray cannot be marshaled to this array type because it has either nonzero lower bounds or more than one dimension" appears, with the additional remark "Make sure your array has the required number of dimensions".
When debugging the server code, Data appears to be properly copied into FormatData: cElements equals Length, and the 18 data elements are equal to the ones in Data.
Hard-coding Length = 1 did not help. Removing the PackBytes call made the error disappear (the other fields were passed correctly). How can this be fixed?
The PackBytes method that you have referenced constructs a SAFEARRAY with lower bound of 1. Constructing it with a lower bound of zero may fix the problem:
SAFEARRAYBOUND bound{ count, 0 };

How do I handle errors from libc functions in an idiomatic Rust manner?

libc's error handling is usually to return something < 0 in case of an error. I find myself doing this over and over:
let pid = fork();
if pid < 0 {
// Please disregard the fact that `Err(pid)`
// should be a `&str` or an enum
return Err(pid);
}
I find it ugly that this needs 3 lines of error handling, especially considering that these tests are quite frequent in this kind of code.
Is there a way to return an Err in case fork() returns < 0?
I found two things which are close:
assert_eq!. This needs another line and it panics so the caller cannot handle the error.
Using traits like these:
pub trait LibcResult<T> {
fn to_option(&self) -> Option<T>;
}
impl LibcResult<i64> for i32 {
    fn to_option(&self) -> Option<i64> {
        // widen explicitly: Some(*self) would be an Option<i32>
        if *self < 0 { None } else { Some(i64::from(*self)) }
    }
}
I could write fork().to_option().expect("could not fork"). This is now only one line, but it panics instead of returning an Err. I guess this could be solved using ok_or.
Some functions of libc have < 0 as sentinel (e.g. fork), while others use > 0 (e.g. pthread_attr_init), so this would need another argument.
Is there something out there which solves this?
As indicated in the other answer, use pre-made wrappers whenever possible. Where such wrappers do not exist, the following guidelines might help.
Return Result to indicate errors
The idiomatic Rust return type that includes error information is Result (std::result::Result). For most functions from POSIX libc, the specialized type std::io::Result is a perfect fit because it uses std::io::Error to encode errors, and it includes all standard system errors represented by errno values. A good way to avoid repetition is using a utility function such as:
use std::io::{Result, Error};
fn check_err<T: Ord + Default>(num: T) -> Result<T> {
if num < T::default() {
return Err(Error::last_os_error());
}
Ok(num)
}
Wrapping fork() would look like this:
pub fn fork() -> Result<u32> {
check_err(unsafe { libc::fork() }).map(|pid| pid as u32)
}
The use of Result allows idiomatic usage such as:
let pid = fork()?; // ? means return if Err, unwrap if Ok
if pid == 0 {
// child
...
}
Restrict the return type
The function will be easier to use if the return type is modified so that only "possible" values are included. For example, if a function logically has no return value, but returns an int only to communicate the presence of error, the Rust wrapper should return nothing:
pub fn dup2(oldfd: i32, newfd: i32) -> Result<()> {
check_err(unsafe { libc::dup2(oldfd, newfd) })?;
Ok(())
}
Other examples are functions that logically return an unsigned integer, such as a PID or a file descriptor, but still declare their result as signed to include the -1 error return value. In that case, consider returning an unsigned value in Rust, as in the fork() example above. nix takes this one step further by having fork() return Result<ForkResult>, where ForkResult is a real enum with methods such as is_child(), and from which the PID is extracted using pattern matching.
Use options and other enums
Rust has a rich type system that allows expressing things that have to be encoded as magic values in C. To return to the fork() example, that function returns 0 to indicate the child return. This would be naturally expressed with an Option and can be combined with the Result shown above:
pub fn fork() -> Result<Option<u32>> {
    let pid = check_err(unsafe { libc::fork() })? as u32;
    // 0 means "this is the child"; any other value is the child's PID
    if pid != 0 {
        Ok(Some(pid))
    } else {
        Ok(None)
    }
}
The user of this API would no longer need to compare with the magic value, but would use pattern matching, for example:
if let Some(child_pid) = fork()? {
// execute parent code
} else {
// execute child code
}
Return values instead of using output parameters
C often returns values using output parameters, pointer parameters into which the results are stored. This is either because the actual return value is reserved for the error indicator, or because more than one value needs to be returned, and returning structs was badly supported by historical C compilers.
In contrast, Rust's Result supports return value independent of error information, and has no problem whatsoever with returning multiple values. Multiple values returned as a tuple are much more ergonomic than output parameters because they can be used in expressions or captured using pattern matching.
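As a sketch of that idea, the example below wraps a C-style function that reports errors through its return value and delivers its two results through output parameters. The function c_style_divmod is a hypothetical stand-in written in Rust so the example is self-contained; a real wrapper would call an extern "C" function instead.

```rust
use std::io::{Error, ErrorKind, Result};

/// Hypothetical C-style function: returns -1 on error and writes its two
/// results through output parameters.
unsafe fn c_style_divmod(a: i32, b: i32, quot: *mut i32, rem: *mut i32) -> i32 {
    if b == 0 {
        return -1;
    }
    unsafe {
        *quot = a.wrapping_div(b);
        *rem = a.wrapping_rem(b);
    }
    0
}

/// Rust wrapper: the status code becomes a `Result`, and the output
/// parameters become a tuple.
fn divmod(a: i32, b: i32) -> Result<(i32, i32)> {
    let (mut quot, mut rem) = (0, 0);
    // SAFETY: both pointers refer to live, writable local variables.
    let status = unsafe { c_style_divmod(a, b, &mut quot, &mut rem) };
    if status < 0 {
        return Err(Error::new(ErrorKind::InvalidInput, "division by zero"));
    }
    Ok((quot, rem))
}
```

A caller can then write let (q, r) = divmod(7, 2)?; and never sees the temporary variables or the status code.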
Wrap system resources in owned objects
When returning handles to system resources, such as file descriptors or Windows handles, it is good practice to return them wrapped in an object that implements Drop to release them. This will make it less likely that a user of the wrapper will make a mistake, and it makes the use of return values more idiomatic, removing the need for awkward invocations of close() and resource leaks coming from failing to do so.
Taking pipe() as an example:
use std::fs::File;
use std::os::unix::io::FromRawFd;
pub fn pipe() -> Result<(File, File)> {
let mut fds = [0 as libc::c_int; 2];
check_err(unsafe { libc::pipe(fds.as_mut_ptr()) })?;
Ok(unsafe { (File::from_raw_fd(fds[0]), File::from_raw_fd(fds[1])) })
}
// Usage:
// let (r, w) = pipe()?;
// ... use r and w as normal File objects
This pipe() wrapper returns multiple values and uses a wrapper object to refer to a system resource. Also, it returns the File objects defined in the Rust standard library and accepted by Rust's IO layer.
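When no standard library type fits the resource, the same pattern can be written by hand. The sketch below is illustrative only: OwnedHandle and release_raw_handle are hypothetical names, and the release function merely counts calls so that the behaviour can be observed without a real OS resource.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts releases so the sketch can be observed without a real OS resource.
static RELEASED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a C release function such as `close()`.
fn release_raw_handle(_raw: i32) {
    RELEASED.fetch_add(1, Ordering::SeqCst);
}

/// Owns a raw handle and releases it exactly once, when dropped.
struct OwnedHandle {
    raw: i32,
}

impl OwnedHandle {
    /// Takes ownership of a raw handle obtained from the C side.
    fn new(raw: i32) -> Self {
        OwnedHandle { raw }
    }

    /// Gives temporary access to the raw value for further C calls.
    fn raw(&self) -> i32 {
        self.raw
    }
}

impl Drop for OwnedHandle {
    fn drop(&mut self) {
        release_raw_handle(self.raw);
    }
}
```

The handle is released when the OwnedHandle goes out of scope, including on early returns caused by ?, so callers have no explicit cleanup call to forget.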
The best option is to not reimplement the universe. Instead, use nix, which wraps everything for you and has done the hard work of converting all the error types and handling the sentinel values:
pub fn fork() -> Result<ForkResult>
Then just use normal error handling with ? (or the older try! macro, which ? has since replaced).
Of course, you could rewrite all of nix by converting your trait to return Results that include the specific error codes and then use ?, but why would you?
There's nothing magical in Rust that converts negative or positive numbers into a domain specific error type for you. The code you already have is the correct approach, once you've enhanced it to use a Result either by creating it directly or via something like ok_or.
An intermediate solution would be to reuse nix's Errno struct, perhaps with your own trait sugar on top.
"so this would need another argument"
I'd say it would be better to have different methods: one for negative sentinel values and one for positive sentinel values.
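A minimal sketch of that suggestion, using only the standard library. The trait LibcStatus and its method names are made up for this example. It assumes the positive-sentinel case follows the pthread convention, where the return value is the error number itself rather than a signal to inspect errno.

```rust
use std::io::{Error, Result};

/// Hypothetical extension trait for raw libc-style return codes.
trait LibcStatus: Sized {
    /// For calls like `fork` that return a negative value and set `errno`.
    fn err_if_negative(self) -> Result<Self>;
    /// For calls like `pthread_attr_init` that return the error number itself.
    fn err_if_nonzero(self) -> Result<()>;
}

impl LibcStatus for i32 {
    fn err_if_negative(self) -> Result<i32> {
        if self < 0 {
            // The error code lives in errno, not in the return value.
            Err(Error::last_os_error())
        } else {
            Ok(self)
        }
    }

    fn err_if_nonzero(self) -> Result<()> {
        if self != 0 {
            // The return value is the error code.
            Err(Error::from_raw_os_error(self))
        } else {
            Ok(())
        }
    }
}
```

With this in scope, a call site would read along the lines of unsafe { libc::fork() }.err_if_negative()?, with the choice of method documenting which convention the wrapped function follows.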

Reading dynamically growing file using NSInputStream

I need to use Objective-C to read a slowly growing file (on Mac OS X).
"Slowly" means that I reach EOF before the file grows further.
With POSIX calls in plain synchronous C I can do it as follows:
while(1)
{
res = select(fd+1,&fdset,NULL,&fdset,some_timeout);
if(res > 0)
{
len = read(fd,buf,sizeof(buf));
if (len>0)
{
printf("Could read %u bytes. Continue.\n", len);
}
else
{
sleep(some_timeout_in_sec);
}
}
}
Now I want to rewrite this in an asynchronous manner, using NSInputStream or some other async Objective-C technique.
The problem with NSInputStream: if I use the scheduleInRunLoop: method, then once I get the NSStreamEventEndEncountered event, I stop receiving any events.
Can I still use NSInputStream, or should I switch to NSFileHandle somehow? What would you recommend?
I see a few problems.
1) some_timeout, as passed to select(), needs to be a struct timeval *.
2) For sleep(), the argument (some_timeout_in_sec) needs to be an integer number of seconds.
3) The value pointed to by the timeout argument may be modified by select() (which is why the last parameter is a pointer to a struct timeval); on some systems it is decremented by the time spent waiting. The struct therefore needs to be re-initialized before each call to select().
4) The parameters to select() are the highest fd of interest + 1, then three separate fd_set * arguments. The first is for input files, the second is for output files, and the third is for exceptions. However, the posted code uses the same fd_set for both the inputs and the exceptions, which is probably not what is needed.
When the above problems are corrected, the code should work.

How to get access to WriteableBitmap.PixelBuffer pixels with C++?

There are a lot of samples for C#, but only some code snippets for C++ on MSDN. I have put it together and I think it will work, but I am not sure if I am releasing all the COM references I have to.
Your code is correct--the reference count on the IBufferByteAccess interface of *buffer is incremented by the call to QueryInterface, and you must call Release once to release that reference.
However, if you use ComPtr<T>, this becomes much simpler--with ComPtr<T>, you cannot call any of the three members of IUnknown (AddRef, Release, and QueryInterface); it prevents you from calling them. Instead, it encapsulates calls to these member functions in a way that makes it difficult to screw things up. Here's an example of how this would look:
// Get the buffer from the WriteableBitmap:
IBuffer^ buffer = bitmap->PixelBuffer;
// Convert from C++/CX to the ABI IInspectable*:
ComPtr<IInspectable> bufferInspectable(AsInspectable(buffer));
// Get the IBufferByteAccess interface:
ComPtr<IBufferByteAccess> bufferBytes;
ThrowIfFailed(bufferInspectable.As(&bufferBytes));
// Use it:
byte* pixels(nullptr);
ThrowIfFailed(bufferBytes->Buffer(&pixels));
The call to bufferInspectable.As(&bufferBytes) performs a safe QueryInterface: it computes the IID from the type of bufferBytes, performs the QueryInterface, and attaches the resulting pointer to bufferBytes. When bufferBytes goes out of scope, it will automatically call Release. The code has the same effect as yours, but without the error-prone explicit resource management.
The example uses the following two utilities, which help to keep the code clean:
auto AsInspectable(Object^ const object) -> Microsoft::WRL::ComPtr<IInspectable>
{
return reinterpret_cast<IInspectable*>(object);
}
auto ThrowIfFailed(HRESULT const hr) -> void
{
if (FAILED(hr))
throw Platform::Exception::CreateException(hr);
}
Observant readers will notice that because this code uses a ComPtr for the IInspectable* we get from buffer, this code actually performs an additional AddRef/Release compared to the original code. I would argue that the chance of this impacting performance is minimal, and it's best to start from code that is easy to verify as correct, then optimize for performance once the hot spots are understood.
This is what I tried so far:
// Get the buffer from the WriteableBitmap
IBuffer^ buffer = bitmap->PixelBuffer;
// Get access to the base COM interface of the buffer (IUnknown)
IUnknown* pUnk = reinterpret_cast<IUnknown*>(buffer);
// Use IUnknown to get the IBufferByteAccess interface of the buffer to get access to the bytes
// This requires #include <Robuffer.h>
IBufferByteAccess* pBufferByteAccess = nullptr;
HRESULT hr = pUnk->QueryInterface(IID_PPV_ARGS(&pBufferByteAccess));
if (FAILED(hr))
{
throw Platform::Exception::CreateException(hr);
}
// Get the pointer to the bytes of the buffer
byte *pixels = nullptr;
pBufferByteAccess->Buffer(&pixels);
// *** Do the work on the bytes here ***
// Release reference to IBufferByteAccess created by QueryInterface.
// Perhaps this might be done before doing more work with the pixels buffer,
// but it's possible that without it - the buffer might get released or moved
// by the time you are done using it.
pBufferByteAccess->Release();
When using C++/WinRT (instead of C++/CX) there's a more convenient (and more dangerous) alternative. The language projection generates a data() helper function on the IBuffer interface that returns a uint8_t* into the memory buffer.
Assuming that bitmap is of type WriteableBitmap the code can be trimmed down to this:
uint8_t* pixels{ bitmap.PixelBuffer().data() };
// *** Do the work on the bytes here ***
// No cleanup required; it has already been dealt with inside data()'s implementation
In the code pixels is a raw pointer into data controlled by the bitmap instance. As such it is only valid as long as bitmap is alive, but there is nothing in the code that helps the compiler (or a reader) track that dependency.
For reference, there's an example in the WriteableBitmap::PixelBuffer documentation illustrating the use of the (otherwise undocumented) helper function data().