How to decode binary/raw google protobuf data - serialization

I have a coredump with encoded protobuf data and I want to decode this data and see the content. I have the .proto file which defines this message in raw protocol buffer.
My proto file looks like this:
$ cat my.proto
message header {
required uint32 u1 = 1;
required uint32 u2 = 2;
optional uint32 u3 = 3 [default=0];
optional bool b1 = 4 [default=true];
optional string s1 = 5;
optional uint32 u4 = 6;
optional uint32 u5 = 7;
optional string s2 = 9;
optional string s3 = 10;
optional uint32 u6 = 8;
}
And protoc version:
$ protoc --version
libprotoc 2.3.0
I have tried the following:
Dump the raw data from the core
(gdb) dump memory b.bin 0x7fd70db7e964 0x7fd70db7e96d
Pass it to protoc
//proto file (my.proto) is in the current dir
$ protoc --decode --proto_path=$pwd my.proto < b.bin
Missing value for flag: --decode
To decode an unknown message, use --decode_raw.
$ protoc --decode_raw < /tmp/b.bin
Failed to parse input.
Any thoughts on how to decode it? The documentation doesn’t explain much on how to go about it.
Edit:
Data in binary format (10 bytes)
(gdb) x/10xb 0x7fd70db7e964
0x7fd70db7e964: 0x08 0xff 0xff 0x01 0x10 0x08 0x40 0xf7
0x7fd70db7e96c: 0xd4 0x38

You used --decode_raw correctly, but your input does not seem to be a protobuf.
For --decode, you need to specify the type name, like:
protoc --decode header my.proto < b.bin
However, if --decode_raw reports a parse error than --decode will too.
It would seem that the bytes you extracted via gdb are not a valid protobuf. Perhaps your addresses aren't exactly right: if you added or removed a byte at either end, it probably won't parse.
I note that according to the addresses you specified, the protobuf is only 9 bytes long, which is only enough space for three or four of the fields to be set. Is that what you are expecting? Perhaps you could post the bytes here.
EDIT:
The 10 bytes you added to your question appear to decode successfully using --decode_raw:
$ echo 08ffff01100840f7d438 | xxd -r -p | protoc --decode_raw
1: 32767
2: 8
8: 928375
Cross-referencing the field numbers, we get:
u1: 32767
u2: 8
u6: 928375

protoc --decode [message_name] [.proto_file_path] < [binary_file_path],
where
[message_name] is the name of the message object in the .proto file. If the message is inside a package in the .proto file, use package_name.message_name.
[.proto_file_path] is the path to the .proto file where the message is defined.
[binary_file_path] is the path to the file you want to decode.
Example for the situation in the question (assuming that my.proto and b.bin are in your current working directory):
protoc --decode header my.proto < b.bin

proto file:
syntax = "proto3";
package response;
// protoc --gofast_out=. response.proto
message Response {
int64 UID
....
}
use protoc:
protoc --decode=response.Response response.proto < response.bin
protoc --decode=[package].[Message type] proto.file < protobuf.response

Related

Why doesn't 'utf8-c8' encoding work when reading filehandles

I wish to read byte sequences that will not decode as valid UTF-8, specifically byte sequences that correspond to high and low surrogates code points. The result should be a raku string.
I read that, in raku, the 'utf8-c8' encoding can be used for this purpose.
Consider code point U+D83F. It is a high surrogate (reserved for the high half of UTF-16 surrogate pairs).
U+D83F has a byte sequence of 0xED 0xA0 0xBF, if encoded as UTF-8.
Slurping a file? Works
If I slurp a file containing this byte sequence, using 'utf8-c8' as the encoding, I get the expected result:
echo -n $'\ud83f' >testfile # Create a test file containing the byte sequence
myprog1.raku:
#!/usr/local/bin/raku
$*OUT.encoding('utf8-c8');
print slurp('testfile', enc => 'utf8-c8');
$ ./myprog1.raku | od -An -tx1
ed a0 bf
✔️ expected result
Slurping a filehandle? Doesn't work
But if I switch from slurping a file path to slurping a filehandle, it doesn't work, even though I set the filehandle's encoding to 'utf8-c8':
myprog2.raku
#!/usr/local/bin/raku
$*OUT.encoding('utf8-c8');
my $fh = open "testfile", :r, :enc('utf8-c8');
print slurp($fh, enc => 'utf8-c8');
#print $fh.slurp; # I tried this too: same error
$ ./myprog2.raku
Error encoding UTF-8 string: could not encode Unicode Surrogate codepoint 55359 (0xD83F)
in block <unit> at ./myprog2.raku line 4
Environment
Edit 2022-10-30: I originally used my distro's package (Fedora Linux 36: Rakudo version 2020.07). I just downloaded the latest Rakudo binary release (2022.07-01). Result was the same.
$ /usr/local/bin/raku --version
Welcome to Rakudo™ v2022.07.
Implementing the Raku® Programming Language v6.d.
Built on MoarVM version 2022.07.
$ uname -a
Linux hx90 5.19.16-200.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Oct 16 22:50:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: Fedora
Description: Fedora release 36 (Thirty Six)
Release: 36
Codename: ThirtySix

perl gunzip to buffer and gunzip to file have different byte orders

I'm using Perl v5.22.1, Storable 2.53_01, and IO::Uncompress::Gunzip 2.068.
I want to use Perl to gunzip a Storable file in memory, without using an intermediate file.
I have a variable $zip_file = '/some/storable.gz' that points to this zipped file.
If I gunzip directly to a file, this works fine, and %root is correctly set to the Storable hash.
gunzip($zip_file, '/home/myusername/Programming/unzipped');
my %root = %{retrieve('/home/myusername/Programming/unzipped')};
However if I gunzip into memory like this:
my $file;
gunzip($zip_file, \$file);
my %root = %{thaw($file)};
I get the error
Storable binary image v56.115 more recent than I am (v2.10)`
so the Storable's magic number has been butchered: it should never be that high.
However, the strings in the unzipped buffer are still correct; the buffer starts with pst which is the correct Storable header. It only seems to be multi-byte variables like integers which are being broken.
Does this have something to do with byte ordering, such that writing to a file works one way while writing to a file buffer works in another? How can I gunzip to a buffer without it ruining my integers?
That's not related to unzip but to using retrieve vs. thaw. They both expect different input, i.e. thaw expect the output from freeze while retrieve expects the output from store.
This can be verified with a simple test:
$ perl -MStorable -e 'my $x = {}; store($x,q[file.store])'
$ perl -MStorable=freeze -e 'my $x = {}; print freeze($x)' > file.freeze
On my machine this gives 24 bytes for the file created by store and 20 bytes for freeze. If I remove the leading 4 bytes from file.store the file is equivalent to file.freeze, i.e. store just added a 4 byte header. Thus you might try to uncompress the file in memory, remove the leading 4 bytes and run thaw on the rest.

Generating .gcda coverage files via QEMU/GDB

Executive summary: I want to use GDB to extract the coverage execution counts stored in memory in my embedded target, and use them to create .gcda files (for feeding to gcov/lcov).
The setup:
I can successfully cross-compile my binary, targeting my specific embedded target - and then execute it under QEMU.
I can also use QEMU's GDB support to debug the binary (i.e. use tar extended-remote localhost:... to attach to the running QEMU GDB server, and fully control the execution of my binary).
Coverage:
Now, to perform "on-target" coverage analysis, I cross-compile with
-fprofile-arcs -ftest-coverage. GCC then emits 64-bit counters to keep track of execution counts of specific code blocks.
Under normal (i.e. host-based, not cross-compiled) execution, when the app finishes __gcov_exit is called - and gathers all these execution counts into .gcdafiles (that gcov then uses to report coverage details).
In my embedded target however, there's no filesystem to speak of - and libgcov basically contains empty stubs for all __gcov_... functions.
Workaround via QEMU/GDB: To address this, and do it in a GCC-version-agnostic way, I could list the coverage-related symbols in my binary via MYPLATFORM-readelf, and grep-out the relevant ones (e.g. __gcov0.Task1_EntryPoint, __gcov0.worker, etc):
$ MYPLATFORM-readelf -s binary | grep __gcov
...
46: 40021498 48 OBJECT LOCAL DEFAULT 4 __gcov0.Task1_EntryPoint
...
I could then use the offsets/sizes reported to automatically create a GDB script - a script that extracts the counters' data via simple memory dumps (from offset, dump length bytes to a local file).
What I don't know (and failed to find any relevant info/tool), is how to convert the resulting pairs of (memory offset,memory data) into .gcda files. If such a tool/script exists, I'd have a portable (platform-agnostic) way to do coverage on any QEMU-supported platform.
Is there such a tool/script?
Any suggestions/pointers would be most appreciated.
UPDATE: I solved this myself, as you can read below - and wrote a blog post about it.
Turned out there was a (much) better way to do what I wanted.
The Linux kernel includes portable GCOV related functionality, that abstracts away the GCC version-specific details by providing this endpoint:
size_t convert_to_gcda(char *buffer, struct gcov_info *info)
So basically, I was able to do on-target coverage via the following steps:
Step 1
I added three slightly modified versions of the linux gcov files to my project: base.c, gcc_4_7.c and gcov.h. I had to replace some linux-isms inside them - like vmalloc,kfree, etc - to make the code portable (and thus, compileable on my embedded platform, which has nothing to do with Linux).
Step 2
I then provided my own __gcov_init...
typedef struct tagGcovInfo {
struct gcov_info *info;
struct tagGcovInfo *next;
} GcovInfo;
GcovInfo *headGcov = NULL;
void __gcov_init(struct gcov_info *info)
{
printf(
"__gcov_init called for %s!\n",
gcov_info_filename(info));
fflush(stdout);
GcovInfo *newHead = malloc(sizeof(GcovInfo));
if (!newHead) {
puts("Out of memory!");
exit(1);
}
newHead->info = info;
newHead->next = headGcov;
headGcov = newHead;
}
...and __gcov_exit:
void __gcov_exit()
{
GcovInfo *tmp = headGcov;
while(tmp) {
char *buffer;
int bytesNeeded = convert_to_gcda(NULL, tmp->info);
buffer = malloc(bytesNeeded);
if (!buffer) {
puts("Out of memory!");
exit(1);
}
convert_to_gcda(buffer, tmp->info);
printf("Emitting %6d bytes for %s\n", bytesNeeded, gcov_info_filename(tmp->info));
free(buffer);
tmp = tmp->next;
}
}
Step 3
Finally, I scripted my GDB (driving QEMU remotely) via this:
$ cat coverage.gdb
tar extended-remote :9976
file bin.debug/fputest
b base.c:88 <================= This breaks on the "Emitting" printf in __gcov_exit
commands 1
silent
set $filename = tmp->info->filename
set $dataBegin = buffer
set $dataEnd = buffer + bytesNeeded
eval "dump binary memory %s 0x%lx 0x%lx", $filename, $dataBegin, $dataEnd
c
end
c
quit
And finally, executed both QEMU and GDB - like this:
$ # In terminal 1:
qemu-system-MYPLATFORM ... -kernel bin.debug/fputest -gdb tcp::9976 -S
$ # In terminal 2:
MYPLATFORM-gdb -x coverage.gdb
...and that's it - I was able to generate the .gcda files in my local filesystem, and then see coverage results over gcov and lcov.
UPDATE: I wrote a blog post showing the process in detail.

Linux ioctl return value interpreted by who?

I'm working with a custom kernel char device which sometimes returns large negative values (around the thousands, say -2000) for its ioctl().
In userspace, I don't get these values returned from the ioctl call. Instead I get a return value of -1 back with errno set to the negated value from the kernel module (+2000).
As far as I can read and google, __syscall_return() is the macro which is supposed to interpret negative return values as errors. But, it only seems to look for values between -1 and -125. So I didn't expect these large negative values to be translated.
Where are these return values translated? Is it expected behaviour?
I am on Linux 2.6.35.10 with EGLIBC 2.11.3-4+deb6u6.
The translation and move to errno occur on the libc level. Both Gnu libc and μClibc treat negative numbers down to at least -4095 as error conditions, per http://www.makelinux.net/ldd3/chp-6-sect-1
See https://github.molgen.mpg.de/git-mirror/glibc/blob/85b290451e4d3ab460a57f1c5966c5827ca807ca/sysdeps/unix/sysv/linux/aarch64/ioctl.S for the Gnu libc implementation of ioctl.
So, with the help of BRPocock I will report my findings here.
The linux kernel will do a error check for all syscalls along the lines of (from unistd.h):
#define __syscall_return(type, res) \
do { \
if ((unsigned long)(res) >= (unsigned long)(-125)) { \
errno = -(res); \
res = -1; \
} \
return (type) (res); \
} while (0)
Libc will also do an error check for all syscalls along the lines of (from syscall.S):
.text
ENTRY (syscall)
PUSHARGS_6 /* Save register contents. */
_DOARGS_6(44) /* Load arguments. */
movl 20(%esp), %eax /* Load syscall number into %eax. */
ENTER_KERNEL /* Do the system call. */
POPARGS_6 /* Restore register contents. */
cmpl $-4095, %eax /* Check %eax for error. */
jae SYSCALL_ERROR_LABEL /* Jump to error handler if error. */
ret /* Return to caller. */
PSEUDO_END (syscall)
Glibc gives a reason for the 4096 value (from sysdep.h):
/* Linux uses a negative return value to indicate syscall errors,
unlike most Unices, which use the condition codes' carry flag.
Since version 2.1 the return value of a system call might be
negative even if the call succeeded. E.g., the `lseek' system call
might return a large offset. Therefore we must not anymore test
for < 0, but test for a real error by making sure the value in %eax
is a real error number. Linus said he will make sure the no syscall
returns a value in -1 .. -4095 as a valid result so we can savely
test with -4095. */
__syscall_return seems to be missing from newer kernels, I haven't researched that yet.

PhantomJS: exported PDF to stdout

Is there a way to trigger the PDF export feature in PhantomJS without specifying an output file with the .pdf extension? We'd like to use stdout to output the PDF.
You can output directly to stdout without a need for a temporary file.
page.render('/dev/stdout', { format: 'pdf' });
See here for history on when this was added.
If you want to get HTML from stdin and output the PDF to stdout, see here
Sorry for the extremely long answer; I have a feeling that I'll need to refer to this method several dozen times in my life, so I'll write "one answer to rule them all". I'll first babble a little about files, file descriptors, (named) pipes, and output redirection, and then answer your question.
Consider this simple C99 program:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[])
{
if (argc < 2) {
printf("Usage: %s file_name\n", argv[0]);
return 1;
}
FILE* file = fopen(argv[1], "w");
if (!file) {
printf("No such file: %s\n", argv[1]);
return 2;
}
fprintf(file, "some text...");
fclose(file);
return 0;
}
Very straightforward. It takes an argument (a file name) and prints some text into it. Couldn't be any simpler.
Compile it with clang write_to_file.c -o write_to_file.o or gcc write_to_file.c -o write_to_file.o.
Now, run ./write_to_file.o some_file (which prints into some_file). Then run cat some_file. The result, as expected, is some text...
Now let's get more fancy. Type (./write_to_file.o /dev/stdout) > some_file in the terminal. We're asking the program to write to its standard output (instead of a regular file), and then we're redirecting that stdout to some_file (using > some_file). We could've used any of the following to achieve this:
(./write_to_file.o /dev/stdout) > some_file, which means "use stdout"
(./write_to_file.o /dev/stderr) 2> some_file, which means "use stderr, and redirect it using 2>"
(./write_to_file.o /dev/fd/2) 2> some_file, which is the same as above; stderr is the third file descriptor assigned to Unix processes by default (after stdin and stdout)
(./write_to_file.o /dev/fd/5) 5> some_file, which means "use your sixth file descriptor, and redirect it to some_file"
In case it's not clear, we're using a Unix pipe instead of an actual file (everything is a file in Unix after all). We can do all sort of fancy things with this pipe: write it to a file, or write it to a named pipe and share it between different processes.
Now, let's create a named pipe:
mkfifo my_pipe
If you type ls -l now, you'll see:
total 32
prw-r--r-- 1 pooriaazimi staff 0 Jul 15 09:12 my_pipe
-rw-r--r-- 1 pooriaazimi staff 336 Jul 15 08:29 write_to_file.c
-rwxr-xr-x 1 pooriaazimi staff 8832 Jul 15 08:34 write_to_file.o
Note the p at the beginning of second line. It means that my_pipe is a (named) pipe.
Now, let's specify what we want to do with our pipe:
gzip -c < my_pipe > out.gz &
It means: gzip what I put inside my_pipe and write the results in out.gz. The & at the end asks the shell to run this command in the background. You'll get something like [1] 10449 and the control gets back to the terminal.
Then, simply redirect the output of our C program to this pipe:
(./write_to_file.o /dev/fd/5) 5> my_pipe
Or
./write_to_file.o my_pipe
You'll get
[1]+ Done gzip -c < my_pipe > out.gz
which means the gzip command has finished.
Now, do another ls -l:
total 40
prw-r--r-- 1 pooriaazimi staff 0 Jul 15 09:14 my_pipe
-rw-r--r-- 1 pooriaazimi staff 32 Jul 15 09:14 out.gz
-rw-r--r-- 1 pooriaazimi staff 336 Jul 15 08:29 write_to_file.c
-rwxr-xr-x 1 pooriaazimi staff 8832 Jul 15 08:34 write_to_file.o
We've successfully gziped our text!
Execute gzip -d out.gz to decompress this gziped file. It will be deleted and a new file (out) will be created. cat out gets us:
some text...
which is what we expected.
Don't forget to remove the pipe with rm my_pipe!
Now back to PhantomJS.
This is a simple PhantomJS script (render.coffee, written in CoffeeScript) that takes two arguments: a URL and a file name. It loads the URL, renders it and writes it to the given file name:
system = require 'system'
renderUrlToFile = (url, file, callback) ->
page = require('webpage').create()
page.viewportSize = { width: 1024, height : 800 }
page.settings.userAgent = 'Phantom.js bot'
page.open url, (status) ->
if status isnt 'success'
console.log "Unable to render '#{url}'"
else
page.render file
delete page
callback url, file
url = system.args[1]
file_name = system.args[2]
console.log "Will render to #{file_name}"
renderUrlToFile "http://#{url}", file_name, (url, file) ->
console.log "Rendered '#{url}' to '#{file}'"
phantom.exit()
Now type phantomjs render.coffee news.ycombinator.com hn.png in the terminal to render Hacker News front page into file hn.png. It works as expected. So does phantomjs render.coffee news.ycombinator.com hn.pdf.
Let's repeat what we did earlier with our C program:
(phantomjs render.coffee news.ycombinator.com /dev/fd/5) 5> hn.pdf
It doesn't work... :( Why? Because, as stated on PhantomJS's manual:
render(fileName)
Renders the web page to an image buffer and save it
as the specified file.
Currently the output format is automatically set based on the file
extension. Supported formats are PNG, JPEG, and PDF.
It fails, simply because neither /dev/fd/2 nor /dev/stdout end in .PNG, etc.
But no fear, named pipes can help you!
Create another named pipe, but this time use the extension .pdf:
mkfifo my_pipe.pdf
Now, tell it to simply cat its inout to hn.pdf:
cat < my_pipe.pdf > hn.pdf &
Then run:
phantomjs render.coffee news.ycombinator.com my_pipe.pdf
And behold the beautiful hn.pdf!
Obviously you want to do something more sophisticated that just cating the output, but I'm sure it's clear now what you should do :)
TL;DR:
Create a named pipe, using ".pdf" file extension (so it fools PhantomJS to think it's a PDF file):
mkfifo my_pipe.pdf
Do whatever you want to do with the contents of the file, like:
cat < my_pipe.pdf > hn.pdf
which simply cats it to hn.pdf
In PhantomJS, render to this file/pipe.
Later on, you should remove the pipe:
rm my_pipe.pdf
As pointed out by Niko you can use renderBase64() to render the web page to an image buffer and return the result as a base64-encoded string.But for now this will only work for PNG, JPEG and GIF.
To write something from a phantomjs script to stdout just use the filesystem API.
I use something like this for images :
var base64image = page.renderBase64('PNG');
var fs = require("fs");
fs.write("/dev/stdout", base64image, "w");
I don't know if the PDF format for renderBase64() will be in a future version of phanthomjs but as a workaround something along these lines may work for you:
page.render(output);
var fs = require("fs");
var pdf = fs.read(output);
fs.write("/dev/stdout", pdf, "w");
fs.remove(output);
Where output is the path to the pdf file.
I don't know if it would address your problem, but you may also check the new renderBase64() method added to PhantomJS 1.6: https://github.com/ariya/phantomjs/blob/master/src/webpage.cpp#L623
Unfortunately, the feature is not documented on the wiki yet :/