WCF serialization over HTTP and NetTcp binding

I am wondering what the differences are between binary and text-based protocols.
I read that binary protocols are more compact and faster to process.
How does that work out, since you have to send the same amount of data either way?
For example, how would the string "hello" differ in size in a binary format?

If all you are doing is transmitting text, then yes, the difference between the two isn't very significant. But consider trying to transmit things like:
Numbers - do you use a string representation of a number, or the binary? Especially for large numbers, the binary will be more compact.
Data Structures - How do you denote the beginning and ending of a field in a text protocol? Sometimes a binary protocol with fixed length fields is more compact.
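To put numbers on the first point, here is a small C++ sketch (the function names are made up for illustration) that encodes an unsigned 32-bit value as decimal text and as a fixed-width 4-byte little-endian binary field.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Text encoding: one byte per decimal digit, so the size depends on the value.
std::string encode_as_text(std::uint32_t value) {
    return std::to_string(value);
}

// Binary encoding: always exactly 4 bytes, least significant byte first
// (little-endian is an arbitrary choice for this sketch).
std::vector<std::uint8_t> encode_as_binary(std::uint32_t value) {
    std::vector<std::uint8_t> out;
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
    }
    return out;
}

// Decoding the fixed-width form needs no digit parsing and no delimiter.
std::uint32_t decode_binary(const std::vector<std::uint8_t>& bytes) {
    assert(bytes.size() == 4);
    std::uint32_t value = 0;
    for (int i = 3; i >= 0; --i) {
        value = (value << 8) | bytes[i];
    }
    return value;
}
```

For 4294967295 the text form takes 10 bytes against the binary form's 4, while a small value like 7 takes just 1 byte as text, so fixed-width binary only wins for larger numbers. The text form also needs a delimiter or length prefix because its size varies, which is the second point above.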

Text protocols are better in terms of readability, ease of reimplementing, and ease of debugging. Binary protocols are more compact.
However, you can compress your text using a library like LZO or zlib, and the result is often almost as compact as binary, with very little performance hit for compression/decompression.
You can read more info on the subject here:
http://www.faqs.org/docs/artu/ch05s01.html

Binary protocols are better if you are using control bits/bytes.
For example, instead of sending msg:Hello,
in binary it can be 0x01 followed by your message (assuming 0x01 is a control byte that stands for msg).
So in a text protocol you send msg:hello\0, which is 10 bytes,
whereas in a binary protocol it would be 0x01Hello\0, which is 7 bytes.
Another example: suppose you want to send the number 255. In text it's 3 bytes,
whereas in binary it's 1 byte, i.e. 0xFF.
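The byte counts above can be checked with a short sketch. It assumes, as the answer does, a hypothetical control byte 0x01 meaning "msg" and a NUL-terminated payload; frame_text and frame_binary are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical control byte from the example: 0x01 means "msg".
constexpr std::uint8_t kMsg = 0x01;

// Text framing: the literal keyword "msg:", the text, then a NUL terminator.
std::vector<std::uint8_t> frame_text(const std::string& text) {
    std::string framed = "msg:" + text;
    std::vector<std::uint8_t> out(framed.begin(), framed.end());
    out.push_back('\0');
    return out;
}

// Binary framing: a single control byte replaces the 4-byte keyword.
std::vector<std::uint8_t> frame_binary(const std::string& text) {
    std::vector<std::uint8_t> out{kMsg};
    out.insert(out.end(), text.begin(), text.end());
    out.push_back('\0');
    return out;
}
```

The saving comes entirely from the keyword shrinking from 4 bytes ("msg:") to 1; the "Hello" payload itself is the same 5 bytes in both.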

The string "hello" itself wouldn't differ in size. The size/performance difference is in the additional information that serialization introduces (serialization is how the program represents the data to be transferred so that it can be reconstructed once it gets to the other end of the pipe).
For example, when serializing the following in .NET using XML (one of the text serialization methods):
string helloWorld = "Hello World!";
You might get something like (I know this isn't exact):
<helloWorld type="String">Hello World!</helloWorld>
Whereas Binary Serialization would be able to represent that data natively in binary without all the extra markup.
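As a rough illustration of the markup overhead, the sketch below builds the answer's approximate XML string and a hypothetical binary layout (1-byte type tag, 4-byte length, raw characters). The binary layout is an assumption chosen for the example, not the format any .NET serializer actually emits.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Text form, following the answer's (approximate) XML example.
std::string to_xml(const std::string& name, const std::string& value) {
    return "<" + name + " type=\"String\">" + value + "</" + name + ">";
}

// Hypothetical binary form: 1-byte type tag, 4-byte little-endian length,
// then the raw characters. The field name is not stored; both sides are
// assumed to agree on the field order through a shared schema.
std::vector<std::uint8_t> to_binary(const std::string& value) {
    constexpr std::uint8_t kStringTag = 0x02;  // arbitrary tag value
    std::vector<std::uint8_t> out{kStringTag};
    const auto len = static_cast<std::uint32_t>(value.size());
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<std::uint8_t>(len >> (8 * i)));
    }
    out.insert(out.end(), value.begin(), value.end());
    return out;
}
```

For "Hello World!" the XML form is 51 bytes and the hypothetical binary form 17; the 12 payload bytes are identical in both, so the whole difference is framing.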

You need to be clear as to what is part of the protocol and what is part of the data.
Text protocols can send binary data and binary protocols can send text data.
The protocol is the part of the message that states "Hi, can I connect? I've got some data, where should I put it? You've got a reply for me? Great! Thanks, bye!"
Each part of the conversation is (probably) much smaller in a binary protocol. Take HTTP for example (which is text based):
if you had a binary encoding standard, I bet you could come up with a sequence shorter than the 4 bytes needed for the word 'POST'.
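To make the HTTP point concrete, a binary protocol could replace the method name with a one-byte code. The Method enum and its values below are hypothetical; HTTP/1.1 itself transmits the method as ASCII text.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical one-byte method codes for an imagined binary protocol.
enum class Method : std::uint8_t { Get = 1, Put = 2, Post = 3, Delete = 4 };

// Maps HTTP/1.1's textual method names onto the one-byte codes.
std::optional<Method> method_from_text(const std::string& name) {
    if (name == "GET") return Method::Get;
    if (name == "PUT") return Method::Put;
    if (name == "POST") return Method::Post;
    if (name == "DELETE") return Method::Delete;
    return std::nullopt;  // unknown names have no code: the price of a fixed table
}
```

The trade-off is extensibility: the text form can carry any new method name, while a fixed table of codes needs some agreed escape mechanism for methods it does not list.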

Some argue that binary protocols are more secure; see, for example, Mike Hearn in "What should follow the web?".

I wouldn't say that binary formats are faster to process. If you look at CSV or a fixed-field-length text format, it can still be processed fast.
I would say everything depends on who the consumer is. If a human being is at the end (as with HTTP or RSS), then there is no need to compact the data, except perhaps by compressing it.
Binary protocols need parsers/converters and are difficult to extend while keeping backward compatibility. The higher you go in the protocol stack, the more human-oriented the protocols are (TCP/IP headers are binary, as packets have to be processed by routers at high speed, but XML is more human-friendly).
I think size variations don't matter much today. In your example, "hello" takes the same amount of space in binary format as in text format, because a text format is also "binary" to the computer; only the way we interpret the data matters.
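The fixed-field-length point can be illustrated with a made-up layout (a 10-character space-padded name followed by a 5-digit zero-padded age). Because field boundaries are known in advance, the parser slices at fixed offsets instead of scanning for delimiters; Person and parse_fixed_width are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical fixed-width text record:
// columns 0-9 hold a space-padded name, columns 10-14 a zero-padded age.
struct Person {
    std::string name;
    int age;
};

Person parse_fixed_width(const std::string& line) {
    assert(line.size() >= 15);
    std::string name = line.substr(0, 10);
    name.erase(name.find_last_not_of(' ') + 1);  // strip the right padding
    const int age = std::stoi(line.substr(10, 5));
    return {name, age};
}
```

Since every line has the same width, record i also starts at a computable offset (i * 16 with a newline terminator), so random access works much as it would in a fixed-width binary format.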

Related

Serialize unity3d C# objects to string and back

Which of the two is the recommended approach, given that my server API expects a C# string? Which one will result in the shortest string?
1) Protobuf-net
Using protobuf-net to convert object <-> byte array
Use Convert.ToBase64String / Convert.FromBase64String to convert byte array <-> string
2) Use Json.NET directly to convert object <-> string
We have protobuf-net working in our project with byte[] server APIs. Now our server is migrating to string APIs instead of byte[]. We are not sure whether we should move to Json.NET or stay with protobuf-net and use Base64 for the extra byte[] <-> string conversion.
What do you suggest?
Okay, so this is my thought process which I'm hoping can help you decide between the two:
Before deciding which one is better we need to have a better grasp of the context of the problem. Optimization is always something that has to be done under well defined "fitness" parameters.
What I mean by this is:
If you're most constrained by CPU usage, I would test to see which code uses more CPU to execute.
If bandwidth is an issue, you'd want to look at the method that sends the smallest packets. (In which case base64 of binary serialization should be the answer.)
If code readability is a factor, you should probably look at which code is easier to read / understand while taking less text to write. (In which case I suspect that the JSON route will have better readability)
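The bandwidth point hinges on Base64's overhead: every 3 input bytes become 4 output characters, so the string is about 4/3 the size of the protobuf payload (rounded up to a multiple of 4). Base64 of protobuf therefore only beats JSON when the protobuf bytes are at least roughly 25% smaller than the JSON text, which is worth measuring on your own objects. The sketch below is a plain RFC 4648 encoder, which should match the output format of .NET's Convert.ToBase64String; the function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Standard Base64 (RFC 4648 alphabet, '=' padding).
std::string base64_encode(const std::vector<std::uint8_t>& data) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    std::size_t i = 0;
    for (; i + 2 < data.size(); i += 3) {  // full 3-byte groups -> 4 chars
        std::uint32_t n = (data[i] << 16) | (data[i + 1] << 8) | data[i + 2];
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += kAlphabet[n & 63];
    }
    const std::size_t rest = data.size() - i;
    if (rest == 1) {  // 1 leftover byte -> 2 chars + "=="
        std::uint32_t n = data[i] << 16;
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += "==";
    } else if (rest == 2) {  // 2 leftover bytes -> 3 chars + "="
        std::uint32_t n = (data[i] << 16) | (data[i + 1] << 8);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += '=';
    }
    return out;
}

// Encoded length: 4 characters per started 3-byte group.
std::size_t base64_length(std::size_t n) { return 4 * ((n + 2) / 3); }
```

For example, a 100-byte protobuf payload becomes a 136-character string.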
In general, I would caution against over-optimization. Mainly because you might spend more time thinking and comparing than would be lost by your "unoptimized" code :)
That is to say, only optimize when you can clearly define your bottleneck.
Hope this helped :)

Identify compression method used on blob/binary data

I have some binary data (blobs) from a database, and I need to know what compression method was used on them to decompress them.
How do I determine what method of compression that has been used?
In practice it is often easier. Assuming one of the standard methods was used, there are probably some magic bytes at the beginning. I suggest taking the hex values of the first 3-4 bytes and searching for them on Google.
It makes no sense to develop your own compression, so unless the case was special, or the programmer stupid, one of the well-known compression methods was used. You could also take libraries for the most popular ones and simply try each of them on the data.
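Checking the leading bytes can be automated. The sketch below tests a handful of well-known signatures (gzip, bzip2, xz, zstd, zip) plus the zlib header check; the list is deliberately incomplete, raw DEFLATE streams have no signature at all, and the zlib test can match random data by chance, so treat the result as a hint rather than proof. sniff_compression is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <string>
#include <vector>

// Guesses the compression format from well-known leading "magic" bytes.
std::string sniff_compression(const std::vector<std::uint8_t>& b) {
    auto starts_with = [&b](std::initializer_list<std::uint8_t> sig) {
        if (b.size() < sig.size()) return false;
        std::size_t i = 0;
        for (std::uint8_t s : sig)
            if (b[i++] != s) return false;
        return true;
    };
    if (starts_with({0x1F, 0x8B})) return "gzip";
    if (starts_with({0x42, 0x5A, 0x68})) return "bzip2";  // "BZh"
    if (starts_with({0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00})) return "xz";
    if (starts_with({0x28, 0xB5, 0x2F, 0xFD})) return "zstd";
    if (starts_with({0x50, 0x4B, 0x03, 0x04})) return "zip";  // "PK\3\4"
    // zlib has no fixed magic, but its 2-byte header is constrained: the low
    // nibble of the first byte is 8 (DEFLATE), the high nibble is at most 7,
    // and the two bytes read as a big-endian number are a multiple of 31.
    if (b.size() >= 2 && (b[0] & 0x0F) == 8 && (b[0] >> 4) <= 7 &&
        ((b[0] << 8) | b[1]) % 31 == 0)
        return "zlib";
    return "unknown";
}
```

The common zlib headers 78 01, 78 9C and 78 DA all pass that check.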
The only way to do this, in general, would be to store which compression method was used when you store the BLOB.
Starting from the blob in the DB, you can do the following:
1) Store each blob in a file. For my use case I used DBeaver to export multiple blobs to separate files.
2) Find out more about the magic numbers in the file by running
file -i filename
In my case the files are application/zlib; charset=binary.

Making a file format extensible

I'm writing a particular serialisation system. The first version works well. It's a hierarchical string-key, data-value system. So to get a particular value, you navigate to a particular node and say getInt("some key") etc. etc.
My issue with the current system is that the file size gets quite large very quickly.
I'm going to combat this by adding a string table. The issue with this is that I can't think of a way to support the old system. All I have is a file identifier which is 32 bits long.
I can change the file identifier, but every time I make another change to the format, I'll need to change the identifier again.
What's an elegant way to implement new features while still supporting the old features?
I've studied the PNG format and creating chunks seems like a good way to go.
Is there any other advice you can give me on chunk dependencies and so forth?
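A minimal sketch of the PNG-style chunk idea, with made-up chunk types: each chunk is a 4-byte big-endian length, a 4-character type, and the payload, so a reader can always skip what it does not understand. It borrows PNG's convention for version dependencies: a lowercase first letter marks an optional ("ancillary") chunk that old readers may ignore, while an unknown uppercase ("critical") chunk makes the reader refuse the file. PNG's per-chunk CRC is left out, and all names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// A chunk: 4-byte big-endian payload length, 4-character type, payload.
struct Chunk {
    std::string type;  // exactly 4 ASCII letters
    std::vector<std::uint8_t> payload;
};

void write_chunk(std::vector<std::uint8_t>& out, const Chunk& c) {
    assert(c.type.size() == 4);
    const auto n = static_cast<std::uint32_t>(c.payload.size());
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>(n >> shift));
    out.insert(out.end(), c.type.begin(), c.type.end());
    out.insert(out.end(), c.payload.begin(), c.payload.end());
}

// Returns the chunks whose types are in `known`. Unknown lowercase-initial
// (ancillary) chunks are skipped; an unknown uppercase-initial (critical)
// chunk means this reader cannot safely interpret the file.
std::vector<Chunk> read_chunks(const std::vector<std::uint8_t>& in,
                               const std::vector<std::string>& known) {
    std::vector<Chunk> result;
    std::size_t pos = 0;
    while (pos < in.size()) {
        if (in.size() - pos < 8) throw std::runtime_error("truncated chunk header");
        std::uint32_t n = 0;
        for (int i = 0; i < 4; ++i) n = (n << 8) | in[pos + i];
        std::string type(in.begin() + pos + 4, in.begin() + pos + 8);
        pos += 8;
        if (n > in.size() - pos) throw std::runtime_error("truncated chunk payload");
        bool understood = false;
        for (const auto& k : known) understood = understood || k == type;
        if (understood) {
            result.push_back({type, std::vector<std::uint8_t>(in.begin() + pos,
                                                              in.begin() + pos + n)});
        } else if (type[0] >= 'A' && type[0] <= 'Z') {
            throw std::runtime_error("unknown critical chunk: " + type);
        }
        pos += n;  // skip the payload whether or not it was used
    }
    return result;
}
```

With this layout, a string table could be a new critical chunk (old readers then correctly reject files they cannot resolve), while purely informational additions go in ancillary chunks.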
If you need a binary format, look at Protocol Buffers, which Google uses internally for RPCs as well as long-term serialization of records. Each field of a protocol buffer is identified by an integer ID. Old applications ignore (and pass through) the fields that they don't understand, so you can safely add new fields. You never reuse deprecated field IDs or change the type of a field.
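Protocol buffers support primitive types (bool, int32, int64, string, byte arrays) as well as repeated and even recursively nested messages. Older versions didn't support maps, so you had to turn a map into a list of (key, value) pairs; since protobuf 3.0 there is a map<key_type, value_type> field type, which is encoded on the wire as exactly such a repeated list of key/value entries.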
Don't spend all your time fretting about serialization and deserialization. It's not as fun as designing protobufs.
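The reason old readers can skip new fields is that every protobuf field on the wire starts with a key, (field_number << 3) | wire_type, and the wire type alone says how many bytes follow. The hand-written decoder below sketches just that skipping logic for illustration; real code would use the classes generated by protoc, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <vector>

// Decodes a base-128 varint: 7 payload bits per byte, least significant group
// first, high bit set on every byte except the last.
std::uint64_t read_varint(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    std::uint64_t value = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= in.size()) throw std::runtime_error("truncated varint");
        const std::uint8_t byte = in[pos++];
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return value;
    }
    throw std::runtime_error("varint longer than 10 bytes");
}

// Collects the varint-typed fields whose numbers are in `known` and skips all
// other fields using only their wire type, as an older reader would.
std::map<std::uint64_t, std::uint64_t> read_known_varints(
        const std::vector<std::uint8_t>& in, const std::vector<std::uint64_t>& known) {
    std::map<std::uint64_t, std::uint64_t> fields;
    std::size_t pos = 0;
    while (pos < in.size()) {
        const std::uint64_t key = read_varint(in, pos);
        const std::uint64_t field = key >> 3;
        std::uint64_t skip = 0;
        switch (key & 7) {
            case 0: {  // VARINT
                const std::uint64_t v = read_varint(in, pos);
                for (std::uint64_t k : known)
                    if (k == field) fields[field] = v;
                break;
            }
            case 1: skip = 8; break;                     // fixed 64-bit
            case 2: skip = read_varint(in, pos); break;  // length-delimited
            case 5: skip = 4; break;                     // fixed 32-bit
            default: throw std::runtime_error("unsupported wire type");
        }
        if (skip > in.size() - pos) throw std::runtime_error("truncated field");
        pos += skip;
    }
    return fields;
}
```

The bytes 08 96 01 are field 1 holding the varint 150; a reader that only knows field 1 steps over any later fields without understanding them.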

How to display or view encrypted data in encrypted form?

The Wikipedia article on block cipher modes has a neat little diagram of an unencrypted image, the same image encrypted using ECB mode, and another version of the same image encrypted using another method.
At university I developed my own implementation of DES (you can find it here), and we must demo our implementation in a presentation.
I would like to display a similar example using our implementation. However, most image files have header blocks, and when we encrypt the whole file with our implementation, the headers get encrypted too. So when you open the result in an image viewer, it is assumed to be corrupted and can't be viewed.
I was wondering if anybody knew of a simple header-less image format we could use to display these? Or does anyone have any ideas as to how the original creator of the images above achieved that result?
Any help would be appreciated,
Thanks
Note: I realise rolling your own cryptography library is stupid, and DES is considered broken, and ECB mode is very flawed for any useful cryptography, this was purely an academic exercise for school. So please, no lectures, I know the drill.
If you are using a high-level language like Java or Python, one thing you could do is load the image and read the pixel data into an array in memory. Then perform the encryption on those raw bytes, and save the image when you are done. Let all of the header data be handled by the libraries of whatever language you are using. In other words, don't treat the file as a raw sequence of bytes. Hope that helps.
Just cut off the headers before you encrypt (save them somewhere). Then encrypt only the rest. Then add the headers in front of the result.
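This is especially easy with the Netpbm format, because you only have to know how many lines to cut off. In the plain (ASCII) variants such as P3, the data is stored as decimal numbers, so you should take that into account when encrypting (convert them to binary first); the binary variants such as P6 store raw bytes after the header, so those can be encrypted directly.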
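The cut-off-the-header approach can be sketched for the binary PPM variant (P6). The EncryptBlock callback stands in for your own DES block function used in ECB mode; its signature is an assumption, and the parser assumes a header without # comment lines.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

using Block = std::array<std::uint8_t, 8>;                // DES works on 64-bit blocks
using EncryptBlock = std::function<Block(const Block&)>;  // hypothetical hook for your DES

// Encrypts only the pixel bytes of a binary PPM ("P6") image in ECB mode and
// copies the header through unchanged, so image viewers can still open it.
std::vector<std::uint8_t> encrypt_ppm_pixels(const std::vector<std::uint8_t>& file,
                                             const EncryptBlock& encrypt) {
    if (file.size() < 2 || file[0] != 'P' || file[1] != '6')
        throw std::runtime_error("not a binary PPM (P6) file");
    // Header = four whitespace-separated tokens (magic, width, height, maxval)
    // followed by exactly one whitespace byte; pixel data starts after it.
    std::size_t pos = 0;
    for (int token = 0; token < 4; ++token) {
        while (pos < file.size() && std::isspace(file[pos])) ++pos;
        while (pos < file.size() && !std::isspace(file[pos])) ++pos;
    }
    if (pos >= file.size()) throw std::runtime_error("truncated PPM header");
    const std::size_t header_len = pos + 1;

    std::vector<std::uint8_t> out(file.begin(), file.begin() + header_len);
    std::size_t i = header_len;
    for (; i + 8 <= file.size(); i += 8) {  // ECB: every block on its own
        Block plain;
        std::copy(file.begin() + i, file.begin() + i + 8, plain.begin());
        const Block cipher = encrypt(plain);
        out.insert(out.end(), cipher.begin(), cipher.end());
    }
    // A final partial block (< 8 bytes) is copied unencrypted in this sketch.
    out.insert(out.end(), file.begin() + i, file.end());
    return out;
}
```

For a quick test, any deterministic per-block transform (even a toy XOR, which is of course not a cipher) shows the ECB property that keeps the picture visible: identical 8-byte plaintext blocks come out as identical ciphertext blocks.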

Binary file & saved game formatting

I am working on a small roguelike game, and need some help with creating save games. I have tried several ways of saving games, but the load always fails, because I am not exactly sure what is a good way to mark the beginning of different sections for the player, entities, and the map.
What would be a good way of marking the beginning of each section, so that the data can be read back reliably without knowing the length of each section?
Edit: The language is C++. It looks like a readable format would be a better shot. Thanks for all the quick replies.
The easiest solution is usually to use a library to write the data as XML or INI, then compress it. This will be easier for you to parse, and can result in smaller files than a custom binary format.
Of course, they will take slightly longer to load (though not by much, unless your data files are hundreds of MB).
If you're determined to use a binary format, take a look at BER (the ASN.1 Basic Encoding Rules).
Are you really sure you need a binary format?
Why not store it in some text format so that it can be easily parsed, be it plain text, XML or YAML?
Since you're saving binary data, you can't rely on markers alone; you need lengths.
Simply write the number of records of each type followed by the structured data; then it will be easy to read back. If you have variable-length elements like strings, they also need length information.
2
player record
player record
3
entities record
entities record
entities record
1
map
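The count-then-records layout above might look like this in C++. The Entity struct and helper names are hypothetical; the sketch writes to an in-memory byte buffer, which you would then save with an std::ofstream opened in std::ios::binary mode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical entity record; the player and map sections would look similar.
struct Entity {
    std::string name;
    std::int32_t x;
    std::int32_t y;
};

// Fixed-width little-endian integers, written byte by byte, so the file does
// not depend on the compiler's struct layout or the machine's byte order.
void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

std::uint32_t get_u32(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    if (in.size() - pos < 4) throw std::runtime_error("truncated save file");
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) v |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    pos += 4;
    return v;
}

// Variable-length data (strings) is always preceded by its length.
void put_string(std::vector<std::uint8_t>& out, const std::string& s) {
    put_u32(out, static_cast<std::uint32_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

std::string get_string(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    const std::uint32_t n = get_u32(in, pos);
    if (in.size() - pos < n) throw std::runtime_error("truncated string");
    std::string s(in.begin() + pos, in.begin() + pos + n);
    pos += n;
    return s;
}

// One section: the record count, then exactly that many records.
void save_entities(std::vector<std::uint8_t>& out, const std::vector<Entity>& entities) {
    put_u32(out, static_cast<std::uint32_t>(entities.size()));
    for (const Entity& e : entities) {
        put_string(out, e.name);
        put_u32(out, static_cast<std::uint32_t>(e.x));
        put_u32(out, static_cast<std::uint32_t>(e.y));
    }
}

std::vector<Entity> load_entities(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    const std::uint32_t count = get_u32(in, pos);
    std::vector<Entity> entities;
    for (std::uint32_t k = 0; k < count; ++k) {
        Entity e;
        e.name = get_string(in, pos);
        e.x = static_cast<std::int32_t>(get_u32(in, pos));
        e.y = static_cast<std::int32_t>(get_u32(in, pos));
        entities.push_back(e);
    }
    return entities;
}
```

Player and map sections would be written and read in the same fixed order, so no markers are needed: the counts and string lengths tell the reader exactly where each section ends.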
If you have a marker, you have to guarantee that the pattern doesn't exist elsewhere in your binary stream. If it does exist, you must use a special escape sequence to differentiate it. The Telnet protocol uses 0xFF to mark special commands that aren't part of the data stream. Whenever the data stream contains a naturally occurring 0xFF, it must be replaced by 0xFFFF (the byte sent twice).
So you'd use a 2-byte marker to start a new section, like 0xFF01. If your reader sees 0xFF01, it's a new section. If it sees 0xFFFF, you'd collapse it into a single 0xFF. Naturally you can expand this approach to use any length marker you want.
(Although my personal preference is a text format (optionally compressed) or a binary format with length bytes instead of markers. I don't understand how you're serializing it without knowing when you're done reading a data structure.)
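The marker-plus-escape scheme described above can be sketched as follows, using 0xFF as the escape byte and 0xFF 0x01 as a hypothetical "new section" marker; the function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

constexpr std::uint8_t kEscape = 0xFF;
constexpr std::uint8_t kNewSection = 0x01;  // 0xFF 0x01 = "new section"

// Writes each section as 0xFF 0x01 followed by its data, doubling any
// 0xFF that occurs naturally in the data (the Telnet-style escape).
std::vector<std::uint8_t> write_sections(
        const std::vector<std::vector<std::uint8_t>>& sections) {
    std::vector<std::uint8_t> out;
    for (const auto& s : sections) {
        out.push_back(kEscape);
        out.push_back(kNewSection);
        for (std::uint8_t b : s) {
            out.push_back(b);
            if (b == kEscape) out.push_back(kEscape);  // 0xFF -> 0xFF 0xFF
        }
    }
    return out;
}

// Splits the stream back into sections, collapsing 0xFF 0xFF into one 0xFF.
std::vector<std::vector<std::uint8_t>> read_sections(const std::vector<std::uint8_t>& in) {
    std::vector<std::vector<std::uint8_t>> sections;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint8_t byte = in[i];
        if (byte == kEscape) {
            if (i + 1 >= in.size()) throw std::runtime_error("dangling escape byte");
            const std::uint8_t next = in[++i];
            if (next == kNewSection) {
                sections.emplace_back();
                continue;
            }
            if (next != kEscape) throw std::runtime_error("unknown control sequence");
            // next == kEscape: an escaped literal 0xFF, stored below
        }
        if (sections.empty()) throw std::runtime_error("data before first marker");
        sections.back().push_back(byte);
    }
    return sections;
}
```

Note the cost: every 0xFF in the data doubles in size, and the reader has to inspect every byte, which is one reason length prefixes are usually simpler.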