Does AVRO schema also get encoded in the binary part? - serialization

An Avro file contains the schema in plain text followed by the data in binary format. I'd like to know whether the schema (or some part of it) also exists in the binary part. I have a hunch that the schema (or just the field names) is also encoded in the binary part, because when I change the plain-text schema part of an Avro file, I get an error message when exporting the schema using avro-tools.jar.

When the binary encoding is used, the whole file is in a binary format.
The file starts with a 4-byte header, immediately followed by a map containing some metadata. This map contains an "avro.schema" entry whose value is the schema, stored as a string. After the map (and a 16-byte sync marker) you will find your data.
If you edit the schema manually (that is, change its size), the length prefix stored just before this string no longer matches and the file is corrupted.
See the binary encoding specification to learn how the various types are encoded.
I'm not sure what you are trying to achieve, and I'm quite sure that it should not be done. But for fun, let's try to edit the schema in place.
For this example I will use the weather.avro file from Avro's source tree:
$ java -jar avro-tools-1.8.0.jar getmeta weather-orig.avro
avro.codec null
avro.schema {"type":"record","name":"Weather","namespace":"test","fields":[{"name":"station","type":"string"},{"name":"time","type":"long"},{"name":"temp","type":"int"}],"doc":"A weather reading."}
$ java -jar avro-tools-1.8.0.jar getschema weather-orig.avro
{
"type" : "record", "name" : "Weather", "namespace" : "test", "doc" : "A weather reading.",
"fields" : [
{"name" : "station", "type" : "string"},
{"name" : "time", "type" : "long"},
{"name" : "temp", "type" : "int"}
]
}
$ java -jar /avro-tools-1.8.0.jar tojson weather-orig.avro
{"station":"011990-99999","time":-619524000000,"temp":0}
{"station":"011990-99999","time":-619506000000,"temp":22}
{"station":"011990-99999","time":-619484400000,"temp":-11}
{"station":"012650-99999","time":-655531200000,"temp":111}
{"station":"012650-99999","time":-655509600000,"temp":78}
OK. This is our source file. Plain and simple: two metadata entries, and the schema defines three fields. Now we will try to understand how things are stored in binary and how we can edit the file to rename station to station-id.
$ hexdump weather-orig.avro -n 256 -C
00000000 4f 62 6a 01 04 14 61 76 72 6f 2e 63 6f 64 65 63 |Obj...avro.codec|
00000010 08 6e 75 6c 6c 16 61 76 72 6f 2e 73 63 68 65 6d |.null.avro.schem|
00000020 61 f2 02 7b 22 74 79 70 65 22 3a 22 72 65 63 6f |a..{"type":"reco|
00000030 72 64 22 2c 22 6e 61 6d 65 22 3a 22 57 65 61 74 |rd","name":"Weat|
00000040 68 65 72 22 2c 22 6e 61 6d 65 73 70 61 63 65 22 |her","namespace"|
00000050 3a 22 74 65 73 74 22 2c 22 66 69 65 6c 64 73 22 |:"test","fields"|
00000060 3a 5b 7b 22 6e 61 6d 65 22 3a 22 73 74 61 74 69 |:[{"name":"stati|
00000070 6f 6e 22 2c 22 74 79 70 65 22 3a 22 73 74 72 69 |on","type":"stri|
00000080 6e 67 22 7d 2c 7b 22 6e 61 6d 65 22 3a 22 74 69 |ng"},{"name":"ti|
00000090 6d 65 22 2c 22 74 79 70 65 22 3a 22 6c 6f 6e 67 |me","type":"long|
000000a0 22 7d 2c 7b 22 6e 61 6d 65 22 3a 22 74 65 6d 70 |"},{"name":"temp|
000000b0 22 2c 22 74 79 70 65 22 3a 22 69 6e 74 22 7d 5d |","type":"int"}]|
000000c0 2c 22 64 6f 63 22 3a 22 41 20 77 65 61 74 68 65 |,"doc":"A weathe|
000000d0 72 20 72 65 61 64 69 6e 67 2e 22 7d 00 b0 81 b3 |r reading."}....|
000000e0 c4 0a 0c f6 62 fa c9 38 fd 7e 52 00 a7 0a cc 01 |....b..8.~R.....|
000000f0 18 30 31 31 39 39 30 2d 39 39 39 39 39 ff a3 90 |.011990-99999...|
The first four bytes, 4f 62 6a 01, are the header ("Obj" followed by the version byte 1).
The next thing is a long giving the number of entries in the first block of the "metadata" map. Longs are encoded using variable-length zig-zag coding, so here 04 means 2, which is consistent with the output of getmeta. (Remember to read the Avro specification to see how the various data types are encoded.)
Just after that you will find the first key of the map. A key is a string, and a string is prefixed by its length in bytes. Here 0x14 means 10 bytes, which is the length of "avro.codec" when encoded in UTF-8.
You can then skip the next 10 bytes and go to the next element, and so on. Advance until you spot the avro.schema part.
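For illustration, the walk through the header can be sketched in Python. This is only a minimal sketch, not code from the Avro library: read_long, read_bytes and read_header_meta are hypothetical helper names, and the sample bytes are a shortened stand-in header (its schema is just "string"), not weather.avro itself.

```python
import io

def read_long(buf):
    """Read one Avro long: base-128 varint, then zig-zag decode."""
    raw = 0
    shift = 0
    while True:
        byte = buf.read(1)[0]
        raw |= (byte & 0x7F) << shift
        if not byte & 0x80:      # high bit clear: last byte of the varint
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)

def read_bytes(buf):
    """Read a length-prefixed byte string (Avro 'bytes' or 'string')."""
    return buf.read(read_long(buf))

def read_header_meta(buf):
    """Check the magic bytes, then read the metadata map block by block."""
    if buf.read(4) != b"Obj\x01":
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(buf)
        if count == 0:           # a zero count terminates the map
            break
        if count < 0:            # negative count: a block byte size follows
            read_long(buf)
            count = -count
        for _ in range(count):
            key = read_bytes(buf).decode("utf-8")
            meta[key] = read_bytes(buf)
    return meta

# Stand-in header: magic, 2 entries, avro.codec=null, avro.schema="string", end of map.
sample = b'Obj\x01\x04\x14avro.codec\x08null\x16avro.schema\x10"string"\x00'
print(read_header_meta(io.BytesIO(sample)))
```

To try it on a real file, pass an open binary file object (for example open("weather-orig.avro", "rb")) instead of the BytesIO sample.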
Just after this string is the length of the map value (which is a string, since it is our schema). That is what you want to modify. We are renaming station to station-id, so you want to add 3 to the current length: f2 02 (185) should become f8 02 (188). Remember the variable-length zig-zag coding.
You can now update the schema string to add "-id".
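To double-check the new length prefix, here is a small Python sketch of the zig-zag plus variable-length encoding. encode_long is a hypothetical helper name, not part of avro-tools.

```python
def encode_long(n):
    """Encode an integer as an Avro long: zig-zag, then base-128 varint."""
    raw = (n << 1) ^ (n >> 63)   # zig-zag maps small magnitudes to small codes
    out = bytearray()
    while raw & ~0x7F:           # more than 7 bits left: emit a continuation byte
        out.append((raw & 0x7F) | 0x80)
        raw >>= 7
    out.append(raw)
    return bytes(out)

print(encode_long(185).hex())    # f202, the original schema length
print(encode_long(188).hex())    # f802, the length after adding "-id"
```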
Enjoy
java -jar /home/cmathieu/Sources/avro-trunk/lang/java/tools/target/avro-tools-1.8.0-SNAPSHOT.jar tojson weather.avro
{"station-id":"011990-99999","time":-619524000000,"temp":0}
{"station-id":"011990-99999","time":-619506000000,"temp":22}
{"station-id":"011990-99999","time":-619484400000,"temp":-11}
{"station-id":"012650-99999","time":-655531200000,"temp":111}
{"station-id":"012650-99999","time":-655509600000,"temp":78}
But as I said, you most likely don't want to do that.

Related

Computer paths in compiled binaries

I am developing a project in Platformio using the mbed framework, and I find my computer paths in the compiled binary, e.g.:
00010440h: 9E 46 70 47 43 3A 5C 55 73 65 72 73 5C 47 47 47 ; žFpGC:\Users\Ggg
00010450h: 5C 2E 70 6C 61 74 66 6F 72 6D 69 6F 5C 70 61 63 ; \.platformio\pac
00010460h: 6B 61 67 65 73 5C 66 72 61 6D 65 77 6F 72 6B 2D ; kages\framework-
00010470h: 6D 62 65 64 2F 70 6C 61 74 66 6F 72 6D 2F 53 69 ; mbed/platform/Si
00010480h: 6E 67 6C 65 74 6F 6E 50 74 72 2E 68 00 70 20 3D ; ngletonPtr.h.p =
00010490h: 3D 20 72 65 69 6E 74 65 72 70 72 65 74 5F 63 61 ; = reinterpret_ca
000104a0h: 73 74 3C 54 20 2A 3E 28 26 5F 64 61 74 61 29 00 ; st<T *>(&_data).
I could find the error strings in the mentioned files, and it points to .platformio\packages\framework-mbed\platform\mbed_error.h. This file says
/** Define this macro to include filenames in error context. For release builds, do not include filename to save memory.
* MBED_PLATFORM_CONF_ERROR_FILENAME_CAPTURE_ENABLED
*/
So I tried to compile in release mode, but nothing changes.
I also tried putting this at the top of my main.cpp:
#undef MBED_PLATFORM_CONF_ERROR_FILENAME_CAPTURE_ENABLED
#undef MBED_CONF_PLATFORM_ERROR_FILENAME_CAPTURE_ENABLED
Notice that there are 2 similar defines. I found the other one in an auto-generated (so non-editable) mbed_config.h in my build directory, and I'm confident that's the right one because I can see it's used in mbed_error.h (so the comment at the top of that file is wrong).
I tried finding that define in all the paths I could think of, but I can't find the code that auto-generates the mbed_config.h file.
I also tried
#define MBED_CONF_PLATFORM_MAX_ERROR_FILENAME_LEN 1
but it still shows the full path.
How can I compile my binaries without showing the full path of my files?

SQLCMD command runs in CMD, but not as BAT

When in a command prompt, even without admin access, I can run:
sqlcmd -S .\SQLEXPRESS01 –E -Q "EXEC sp_BackupDatabases @backupLocation='C:\SQLBackups\full\', @backupType='F'"
and it runs no problem, but when I try to run it through a batch file, I get the following error:
Sqlcmd: 'ûE': Unexpected argument. Enter '-?' for help.
I have created the script, run it both with and without admin rights, and spent around 3 hours Googling without finding a solution that works. I have tried various permutations with and without quotes around assorted parts, and nothing takes.
I am trying to get this to run as an automated script, so I need to make sure that I can just tell Windows to run this and it will go through.
Whatever tool you used to create the batch file changed one of your hyphens:
sqlcmd -S .\SQLEXPRESS01 –E -Q "EXEC sp_BackupDatabases @backupLocation='C:\SQLBackups\full\', @backupType='F'"
If you dump out this line in a hex editor:
00000000 65 63 68 6F 20 73 71 6C 63 6D 64 20 2D 53 20 2E echo sqlcmd -S .
00000010 5C 53 51 4C 45 58 50 52 45 53 53 30 31 20 96 45 \SQLEXPRESS01 .E
00000020 20 2D 51 20 22 45 58 45 43 20 73 70 5F 42 61 63 -Q "EXEC sp_Bac
00000030 6B 75 70 44 61 74 61 62 61 73 65 73 20 40 62 61 kupDatabases @ba
00000040 63 6B 75 70 4C 6F 63 61 74 69 6F 6E 3D 27 43 3A ckupLocation='C:
00000050 5C 53 51 4C 42 61 63 6B 75 70 73 5C 66 75 6C 6C \SQLBackups\full
00000060 5C 27 2C 20 40 62 61 63 6B 75 70 54 79 70 65 3D \', @backupType=
00000070 27 46 27 22 0D 0A 'F'"..
You'll note the character just after SQLEXPRESS01 isn't a normal hyphen, but character 0x96. Change it to a normal hyphen and your script should work.
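If you don't have a hex editor at hand, a short Python sketch can point out such bytes for you. find_non_ascii is a hypothetical helper name, and the sample line is shortened from the command above.

```python
def find_non_ascii(data):
    """Return (offset, byte value) for every byte outside 7-bit ASCII."""
    return [(offset, byte) for offset, byte in enumerate(data) if byte > 0x7F]

# Shortened copy of the batch line, with the 0x96 dash in place of the hyphen.
line = b"sqlcmd -S .\\SQLEXPRESS01 \x96E -Q \"...\""
for offset, byte in find_non_ascii(line):
    print(f"offset {offset}: 0x{byte:02X}")
```

For a real batch file, read it in binary mode (open(path, "rb").read()) and pass the result to the function.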

POST over SSL returns HTTP 400 - Bad Request

We are having trouble sending a 'large' SOAP request to one of our sources over SSL. When we send the same request with less data in it, it works without any problems. The small file is 10 KB, the larger file is 30 KB. The SOAP requests are sent from OSB (11.1.1.4) in Weblogic (10.3.4).
Our source has checked what happens in their proxy and they see that the proxy waits for a part of the message, but never receives it:
[14/Sep/2016:11:04:56 +0200] [someurl.something.com/sid#7fcdebef0ae8][rid#7fcdebe080a0][/cxf/someServiceService-01-01][4] Input filter: Reading request body.
[14/Sep/2016:11:04:56 +0200] [someurl.something.com/sid#7fcdebef0ae8][rid#7fcdebe080a0][/cxf/someServiceService-01-01][9] Input filter: Bucket type HEAP contains 1 bytes.
[14/Sep/2016:11:04:56 +0200] [someurl.something.com/sid#7fcdebef0ae8][rid#7fcdebe080a0][/cxf/someServiceService-01-01][9] Input filter: Bucket type HEAP contains 8000 bytes.
[14/Sep/2016:11:04:56 +0200] [someurl.something.com/sid#7fcdebef0ae8][rid#7fcdebe080a0][/cxf/someServiceService-01-01][9] Input filter: Bucket type HEAP contains 192 bytes.
[14/Sep/2016:11:04:56 +0200] [someurl.something.com/sid#7fcdebef0ae8][rid#7fcdebe080a0][/cxf/someServiceService-01-01][9] Input filter: Bucket type HEAP contains 534 bytes.
[14/Sep/2016:11:04:56 +0200] [someurl.something.com/sid#7fcdebef0ae8][rid#7fcdebe080a0][/cxf/someServiceService-01-01][9] Input filter: Bucket type HEAP contains 7376 bytes.
[14/Sep/2016:11:05:38 +0200] [someurl.something.com/sid#7fcdebef0ae8][rid#7fcdebe080a0][/cxf/someServiceService-01-01][4] Error reading request body: The timeout specified has expired
We have an HTTP dump on our server, and from the logging I can see that the SSL handshake is OK. We start sending our message, but it stops before it is completely done.
Padded plaintext before ENCRYPTION: len = 328
0000: 50 4F 53 54 20 2F 63 78 00 00 00 00 00 00 00 00 POST /cxf/
0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 some
0020: 53 65 72 76 69 63 65 2D 30 31 2D 30 31 20 48 54 Service-01-01 HT
0030: 54 50 2F 31 2E 31 0D 0A 43 6F 6E 74 65 6E 74 2D TP/1.1..Content-
0040: 54 79 70 65 3A 20 74 65 78 74 2F 78 6D 6C 3B 20 Type: text/xml;
0050: 63 68 61 72 73 65 74 3D 75 74 66 2D 38 0D 0A 53 charset=utf-8..S
0060: 4F 41 50 41 63 74 69 6F 6E 3A 20 22 42 65 77 61 OAPAction: "
0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 someAction"..
0080: 55 73 65 72 2D 41 67 65 6E 74 3A 20 4A 61 76 61 User-Agent: Java
0090: 31 2E 36 2E 30 5F 33 31 0D 0A 48 6F 73 74 3A 20 1.6.0_31..Host:
00A0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 some.host
00B0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00C0: 6E 65 74 2E 62 65 0D 0A 41 63 63 65 70 74 3A 20 ..Accept:
00D0: 74 65 78 74 2F 68 74 6D 6C 2C 20 69 6D 61 67 65 text/html, image
00E0: 2F 67 69 66 2C 20 69 6D 61 67 65 2F 6A 70 65 67 /gif, image/jpeg
00F0: 2C 20 2A 2F 2A 3B 20 71 3D 2E 32 0D 0A 43 6F 6E , */*; q=.2..Con
0100: 6E 65 63 74 69 6F 6E 3A 20 4B 65 65 70 2D 41 6C nection: Keep-Al
0110: 69 76 65 0D 0A 43 6F 6E 74 65 6E 74 2D 4C 65 6E ive..Content-Len
0120: 67 74 68 3A 20 34 32 35 39 36 0D 0A 0D 0A CB F2 gth: 42596......
0130: DB 39 D1 16 D4 4C D3 05 BB 08 3C 2B A0 1E 39 BF .9...L....<+..9.
0140: A9 15 05 05 05 05 05 05 ........
Padded plaintext before ENCRYPTION: len = 16128
0000: 3F 78 6D 6C 20 76 65 72 73 69 6F 6E 3D 22 31 2E ?xml version="1.
0010: 30 22 20 65 6E 63 6F 64 69 6E 67 3D 22 55 54 46 0" encoding="UTF
0020: 2D 38 22 3F 3E 0A 3C 73 6F 61 70 65 6E 76 3A 45 -8"?>.<soapenv:E
0030: 6E 76 65 6C 6F 70 65 20 78 6D 6C 6E 73 3A 73 6F nvelope xmlns:so
0040: 61 70 65 6E 76 3D 22 68 74 74 70 3A 2F 2F 73 63 apenv="http://sc
... and so forth
This ends before I've seen the complete SOAP request pass through. And we receive:
Padded plaintext after DECRYPTION: len = 328
0000: 48 54 54 50 2F 31 2E 31 20 34 30 30 20 42 61 64 HTTP/1.1 400 Bad
0010: 20 52 65 71 75 65 73 74 0D 0A 44 61 74 65 3A 20 Request..Date:
0020: 4D 6F 6E 2C 20 31 30 20 4F 63 74 20 32 30 31 36 Mon, 10 Oct 2016
0030: 20 30 37 3A 35 32 3A 30 37 20 47 4D 54 0D 0A 53 07:52:07 GMT..S
0040: 65 72 76 65 72 3A 20 41 70 61 63 68 65 0D 0A 53 erver: Apache..S
0050: 74 72 69 63 74 2D 54 72 61 6E 73 70 6F 72 74 2D trict-Transport-
0060: 53 65 63 75 72 69 74 79 3A 20 6D 61 78 2D 61 67 Security: max-ag
0070: 65 3D 33 31 35 33 36 30 30 30 3B 20 69 6E 63 6C e=31536000; incl
0080: 75 64 65 53 75 62 44 6F 6D 61 69 6E 73 0D 0A 4C udeSubDomains..L
0090: 61 73 74 2D 4D 6F 64 69 66 69 65 64 3A 20 54 75 ast-Modified: Tu
00A0: 65 2C 20 30 33 20 4D 61 72 20 32 30 31 35 20 31 e, 03 Mar 2015 1
00B0: 32 3A 32 35 3A 32 31 20 47 4D 54 0D 0A 45 54 61 2:25:21 GMT..ETa
00C0: 67 3A 20 22 62 65 38 2D 35 31 30 36 31 36 64 61 g: "be8-510616da
00D0: 38 63 62 31 64 22 0D 0A 41 63 63 65 70 74 2D 52 8cb1d"..Accept-R
00E0: 61 6E 67 65 73 3A 20 62 79 74 65 73 0D 0A 43 6F anges: bytes..Co
00F0: 6E 74 65 6E 74 2D 4C 65 6E 67 74 68 3A 20 33 30 ntent-Length: 30
0100: 34 38 0D 0A 43 6F 6E 6E 65 63 74 69 6F 6E 3A 20 48..Connection:
0110: 63 6C 6F 73 65 0D 0A 43 6F 6E 74 65 6E 74 2D 54 close..Content-T
0120: 79 70 65 3A 20 74 65 78 74 2F 68 74 6D 6C 0D 0A ype: text/html..
0130: 0D 0A 7E 01 14 86 D8 1F DA 05 97 49 26 2B 2F 65 ...........I&+/e
0140: DB 5E ED 05 F2 AA 01 01 .^......
Our server team has already tried increasing the timeouts in weblogic under Servers - Protocols - HTTP by a factor of 10, but with no success.
Increasing any possible timeout settings in OSB didn't help.
We are pretty sure it is a Weblogic (maybe OSB) issue as sending the larger request from the same server with curl doesn't give any problems.
Our Development environment does NOT have this issue. The problem is, we currently have an upgraded Weblogic (10.3.6)/OSB (11.1.1.7) installed there for an upcoming upgrade of the software on other environments. Same configuration though.
Any insight into what could be going wrong or what we could try would be helpful. Let me know if you need any additional information.

Redis mass insertion ghost entries?

This is the hexdump of the file I pipe into redis via a nodejs converter:
me@myself ~/scripts $ cat example.txt | node redisProtocol.js | hexdump -C
00000000 2a 39 0d 0a 24 34 0d 0a 53 41 44 44 0d 0a 24 37 |*9..$4..SADD..$7|
00000010 0d 0a 64 6f 6d 61 69 6e 73 0d 0a 24 31 34 0d 0a |..domains..$14..|
00000020 77 77 77 2e 72 65 64 64 69 74 2e 63 6f 6d 0d 0a |www.reddit.com..|
00000030 24 34 0d 0a 53 41 44 44 0d 0a 24 37 0d 0a 64 6f |$4..SADD..$7..do|
00000040 6d 61 69 6e 73 0d 0a 24 31 33 0d 0a 77 77 77 2e |mains..$13..www.|
00000050 34 63 68 61 6e 2e 6f 72 67 0d 0a 24 34 0d 0a 53 |4chan.org..$4..S|
00000060 41 44 44 0d 0a 24 37 0d 0a 64 6f 6d 61 69 6e 73 |ADD..$7..domains|
00000070 0d 0a 24 31 36 0d 0a 77 77 77 2e 66 61 63 65 62 |..$16..www.faceb|
00000080 6f 6f 6b 2e 63 6f 6d 0d 0a |ook.com..|
00000089
when piping to redis-cli --pipe I get:
All data transferred. Waiting for the last reply...
Last reply received from server.
errors: 0, replies: 1
Which is good.
Now looking into the redis DB executing smembers domains I get:
redis 127.0.0.1:6379> smembers domains
1) "domains"
2) "SADD"
3) "www.reddit.com"
4) "www.4chan.org"
5) "www.facebook.com"
Where do the additional entries "domains" and "SADD" come from? The hexdump looks good, doesn't it?
Using redis version redis-cli 2.6.7. Thanks a lot for any help provided.
Best,
Alex
No, it does not look good. I suppose you expect the hexdump to represent:
SADD domains www.reddit.com
SADD domains www.4chan.org
SADD domains www.facebook.com
However, it starts with '*9', which means Redis expects a single command with nine arguments. So Redis processes:
SADD domains www.reddit.com SADD domains www.4chan.org SADD domains www.facebook.com
which gives the result you had.
You need to either send 3 commands in your stream, each of them starting with *3, or send just one command containing:
SADD domains www.reddit.com www.4chan.org www.facebook.com
starting with *5.
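As a rough illustration of the framing (written in Python rather than as a fix to your Node.js converter, which isn't shown), each command gets its own *N header counting only its own arguments. resp_command is a hypothetical helper name.

```python
def resp_command(*args):
    """Encode one command in the Redis protocol: *<argc>, then $<len> + data per argument."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)

domains = ["www.reddit.com", "www.4chan.org", "www.facebook.com"]

# Option 1: three commands, each starting with *3.
stream = b"".join(resp_command("SADD", "domains", d) for d in domains)

# Option 2: one command with five arguments, starting with *5.
single = resp_command("SADD", "domains", *domains)

print(stream.decode())
print(single.decode())
```

Either byte string can be piped to redis-cli --pipe.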

Viewing body of POST request using curl

I'm making an HTTP POST request using curl like this:
curl -v -i -d 'var=something' 'http://mysite.com/whatever.json'
How can I see the body of the outgoing request?
Motivation: I'm debugging an Objective-C HTTP library. POST requests that are failing in this library are working from the Terminal using curl. I think comparing the bodies of the requests will help debug.
You can use --trace or --trace-ascii to get a traffic dump:
curl --trace - -d 'var=something' 'http://mysite.com/whatever.json'
[...]
=> Send header, 246 bytes (0xf6)
0000: 50 4f 53 54 20 2f 77 68 61 74 65 76 65 72 2e 6a POST /whatever.j
0010: 73 6f 6e 20 48 54 54 50 2f 31 2e 31 0d 0a 55 73 son HTTP/1.1..Us
0020: 65 72 2d 41 67 65 6e 74 3a 20 63 75 72 6c 2f 37 er-Agent: curl/7
0030: 2e 32 31 2e 36 20 28 78 38 36 5f 36 34 2d 70 63 .21.6 (x86_64-pc
0040: 2d 6c 69 6e 75 78 2d 67 6e 75 29 20 6c 69 62 63 -linux-gnu) libc
0050: 75 72 6c 2f 37 2e 32 31 2e 36 20 4f 70 65 6e 53 url/7.21.6 OpenS
0060: 53 4c 2f 31 2e 30 2e 30 65 20 7a 6c 69 62 2f 31 SL/1.0.0e zlib/1
0070: 2e 32 2e 33 2e 34 20 6c 69 62 69 64 6e 2f 31 2e .2.3.4 libidn/1.
0080: 32 32 20 6c 69 62 72 74 6d 70 2f 32 2e 33 0d 0a 22 librtmp/2.3..
0090: 48 6f 73 74 3a 20 6d 79 73 69 74 65 2e 63 6f 6d Host: mysite.com
00a0: 0d 0a 41 63 63 65 70 74 3a 20 2a 2f 2a 0d 0a 43 ..Accept: */*..C
00b0: 6f 6e 74 65 6e 74 2d 4c 65 6e 67 74 68 3a 20 31 ontent-Length: 1
00c0: 33 0d 0a 43 6f 6e 74 65 6e 74 2d 54 79 70 65 3a 3..Content-Type:
00d0: 20 61 70 70 6c 69 63 61 74 69 6f 6e 2f 78 2d 77 application/x-w
00e0: 77 77 2d 66 6f 72 6d 2d 75 72 6c 65 6e 63 6f 64 ww-form-urlencod
00f0: 65 64 0d 0a 0d 0a ed....
=> Send data, 13 bytes (0xd)
0000: 76 61 72 3d 73 6f 6d 65 74 68 69 6e 67 var=something
[...]
--trace-ascii has less detail:
curl --trace-ascii - -d 'var=something' 'http://mysite.com/whatever.json'
[...]
=> Send header, 246 bytes (0xf6)
0000: POST /whatever.json HTTP/1.1
001e: User-Agent: curl/7.21.6 (x86_64-pc-linux-gnu) libcurl/7.21.6 Ope
005e: nSSL/1.0.0e zlib/1.2.3.4 libidn/1.22 librtmp/2.3
0090: Host: mysite.com
00a2: Accept: */*
00af: Content-Length: 13
00c3: Content-Type: application/x-www-form-urlencoded
00f4:
=> Send data, 13 bytes (0xd)
0000: var=something
[...]
You won't get friendly parsing, but you could always use tcpdump or Wireshark to see exactly what's being sent back and forth.
If you have it available, setting up netcat to listen on a port and then pointing both curl and your failing requests at that host/port would also work.
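If netcat is not available, a few lines of Python can play the same role. This is only a sketch of the idea: capture_request is a hypothetical helper name, and the port (8080) is an arbitrary choice. It accepts one connection and returns the raw bytes it receives until the client closes the connection or goes quiet.

```python
import socket

def capture_request(server_sock, idle_timeout=0.5):
    """Accept one connection and return everything the client sends."""
    conn, _ = server_sock.accept()
    conn.settimeout(idle_timeout)    # stop once the client stops sending
    chunks = []
    try:
        while True:
            chunk = conn.recv(4096)
            if not chunk:            # client closed the connection
                break
            chunks.append(chunk)
    except socket.timeout:
        pass
    finally:
        conn.close()
    return b"".join(chunks)

if __name__ == "__main__":
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", 8080))
    server.listen(1)
    print(capture_request(server).decode("utf-8", errors="replace"))
    server.close()
```

Point both curl and the failing library at http://127.0.0.1:8080/whatever.json and compare the two dumps. The client will report an empty reply, since nothing is sent back.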