Does protobuf-net have built-in compression for serialization? - serialization

I was doing some comparison between BinaryFormatter and protobuf-net serializer and was quite pleased with what I found, but what was strange is that protobuf-net managed to serialize the objects into a smaller byte array than what I would get if I just wrote the value of every property into an array of bytes without any metadata.
I know protobuf-net supports string interning if you set AsReference to true, but I'm not doing that in this case, so does protobuf-net provide some compression by default?
Here's some code you can run to see for yourself:
var simpleObject = new SimpleObject
{
    Id = 10,
    Name = "Yan",
    Address = "Planet Earth",
    Scores = Enumerable.Range(1, 10).ToList()
};

using (var memStream = new MemoryStream())
{
    var binaryWriter = new BinaryWriter(memStream);
    // 4 bytes for int
    binaryWriter.Write(simpleObject.Id);
    // 3 bytes + 1 more for the length prefix
    binaryWriter.Write(simpleObject.Name);
    // 12 bytes + 1 more for the length prefix
    binaryWriter.Write(simpleObject.Address);
    // 40 bytes for 10 ints
    foreach (var score in simpleObject.Scores) binaryWriter.Write(score);
    // 61 bytes, which is what I expect
    Console.WriteLine("BinaryWriter wrote [{0}] bytes", memStream.Length);
}

using (var memStream = new MemoryStream())
{
    ProtoBuf.Serializer.Serialize(memStream, simpleObject);
    // 41 bytes!
    Console.WriteLine("Protobuf serialize wrote [{0}] bytes", memStream.Length);
}
EDIT: forgot to add, the SimpleObject class looks like this:
[DataContract]
public class SimpleObject
{
    [DataMember(Order = 1)]
    public int Id { get; set; }
    [DataMember(Order = 2)]
    public string Name { get; set; }
    [DataMember(Order = 3)]
    public string Address { get; set; }
    [DataMember(Order = 4)]
    public List<int> Scores { get; set; }
}

No it does not; there is no "compression" as such specified in the protobuf spec; however, it does (by default) use "varint encoding" - a variable-length encoding for integer data that means small values use less space; so 0-127 take 1 byte plus the header. Note that varint by itself goes pretty loopy for negative numbers, so "zigzag" encoding is also supported which allows small magnitude numbers to be small (basically, it interleaves positive and negative pairs).
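To make the varint and zigzag ideas concrete, here is a minimal Python sketch of both encodings as the protobuf wire format describes them. The function names `encode_varint` and `zigzag32` are made up for this illustration and are not part of protobuf-net.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 data bits per byte)."""
    if value < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def zigzag32(value: int) -> int:
    """Interleave signed 32-bit values: 0, -1, 1, -2 become 0, 1, 2, 3."""
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


for n in (1, 127, 128, 300):
    print(n, encode_varint(n).hex())

# Without zigzag, -1 is written as a huge unsigned 64-bit number.
plain_minus_one = encode_varint(-1 & 0xFFFFFFFFFFFFFFFF)
zigzag_minus_one = encode_varint(zigzag32(-1))
print(len(plain_minus_one), len(zigzag_minus_one))
```

Values up to 127 fit in a single byte, which is why the small ints in the question cost so little.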
Actually, in your case for Scores you should also look at "packed" encoding, which requires either [ProtoMember(4, IsPacked = true)] or the equivalent via TypeModel in v2 (v2 supports either approach). This avoids the overhead of a header per value, by writing a single header and the combined length. "Packed" can be used with varint/zigzag. There are also fixed-length encodings for scenarios where you know the values are likely large and unpredictable.
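As a rough check of where the 41 bytes come from, the following Python sketch hand-builds the message with the standard wire format, once with one header per score and once packed. It assumes protobuf-net emits plain varints and length-delimited strings for these members; the helper names are invented for the example.

```python
def varint(value: int) -> bytes:
    """Base-128 varint for a non-negative integer."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


VARINT, LENGTH_DELIMITED = 0, 2  # wire types from the protobuf encoding spec


def header(field_number: int, wire_type: int) -> bytes:
    return varint((field_number << 3) | wire_type)


def int_field(field_number: int, value: int) -> bytes:
    return header(field_number, VARINT) + varint(value)


def bytes_field(field_number: int, payload: bytes) -> bytes:
    return header(field_number, LENGTH_DELIMITED) + varint(len(payload)) + payload


scores = list(range(1, 11))
common = int_field(1, 10) + bytes_field(2, b"Yan") + bytes_field(3, b"Planet Earth")

# Default: every score gets its own header byte.
unpacked = common + b"".join(int_field(4, s) for s in scores)
# Packed: one header, one length, then the raw varints.
packed = common + bytes_field(4, b"".join(varint(s) for s in scores))

print(len(unpacked), len(packed))
```

The unpacked total matches the 41 bytes reported in the question; packing the ten scores saves another 8 bytes.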
Note also that if your data has lots of text, you may benefit from additionally running it through gzip or deflate; if it doesn't, then both gzip and deflate could make it bigger.
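A quick way to see both effects is to compress a tiny binary payload and a repetitive text payload. This Python sketch uses the standard `zlib` module (deflate with a small wrapper); the sample payloads are invented for the demonstration.

```python
import zlib

# A short, already-compact binary payload with little redundancy.
small_binary = bytes([0x08, 0x0A, 0x12, 0x03]) + b"Yan" + bytes(range(1, 11))
# A text-heavy payload with lots of repetition.
text_heavy = b"Planet Earth; " * 200

small_compressed = zlib.compress(small_binary)
text_compressed = zlib.compress(text_heavy)

print("small binary:", len(small_binary), "->", len(small_compressed))
print("text heavy:  ", len(text_heavy), "->", len(text_compressed))
```

The small payload grows because the container and block overhead outweigh anything deflate can save, while the repetitive text shrinks considerably.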
An overview of the wire format is here; it isn't very tricky to understand, and may help you plan how best to further optimize.

At least the C++ library does support writing to and from compressed streams:
I'm not sure, though, whether that has been ported to the .NET implementation.


reading large semicolon separated files via streamreader and inserting in sql db in

I need to read large CSV files and insert them into SQL. My idea was to use a StreamReader and read the file line by line, because if I store the whole content in a variable, the program crashes. So this is what I thought:
' Requires: Imports System.IO
Dim list As New List(Of String)
Dim MyFile As String = "C:\Test.txt"
Try
    ' file is opened in a protected mode
    Using fs As FileStream = File.Open(MyFile, FileMode.Open, FileAccess.ReadWrite, FileShare.None)
        Using reader As New StreamReader(fs)
            Dim firstline As String = reader.ReadLine() 'treat the first line as column names
            Dim rest As String = reader.ReadLine() 'the rest as data rows
            Do While rest IsNot Nothing 'read the complete file
                list.Add(rest)
                rest = reader.ReadLine()
            Loop
        End Using
    End Using
Catch ex As Exception
    ResultBlock.Text = "File not readable"
End Try
I wrote list.Add(rest), which is actually a bad idea because the content is then stored in a variable, but I need to read and insert line by line into a SQL database, which seems to be pretty complicated. Does anyone have an idea how I could handle that?
If you can't read the file into memory because it's too big then what you need is some sort of buffer that holds the records in memory and writes to the database when the list gets to a certain size.
If you really want to keep it manageable then the reader, the writer, and the buffer should all be completely separate from each other. That sounds like more work because it's more classes, but it's actually simpler because each class only does one thing.
I would create a class that represents the item that you're reading from the file, with a property for each field of a record. For example, if each line in the file represents a person with a name and employee number, create a class like
public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string EmployeeNumber { get; set; }
}
You'll need a buffer. The job of the buffer is to have items put into it, and flush to a writer when it reaches its maximum size. Perhaps like this:
public interface IBuffer<T> { void AddItem(T item); }
public interface IWriter<T> { void Write(IEnumerable<T> items); }

public class WriterBuffer<T> : IBuffer<T>
{
    private readonly IWriter<T> _writer;
    private readonly int _maxSize;
    private readonly List<T> _buffer = new List<T>();

    public WriterBuffer(IWriter<T> writer, int maxSize)
    {
        _writer = writer;
        _maxSize = maxSize;
    }

    public void AddItem(T item)
    {
        // Collect the item, then hand the batch to the writer once the buffer is full
        _buffer.Add(item);
        if (_buffer.Count >= _maxSize)
        {
            _writer.Write(_buffer);
            _buffer.Clear();
        }
    }
}
Then, your reader class doesn't know about the writer at all. All it knows is that it writes to the buffer.
public class PersonFileReader
{
    private readonly string _filename;
    private readonly IBuffer<Person> _buffer;

    public PersonFileReader(string filename, IBuffer<Person> buffer)
    {
        _filename = filename;
        _buffer = buffer;
    }

    public void ReadFile()
    {
        //Reads from file.
        //Creates a new Person for each record
        //Calls _buffer.AddItem(person) for each Person.
    }
}

public class PersonSqlWriter : IWriter<Person>
{
    private readonly string _connectionString;
    public PersonSqlWriter(string connectionString)
    {
        _connectionString = connectionString;
    }

    public void Write(IEnumerable<Person> items)
    {
        //Writes the list of items to the database
        //using _connectionString;
    }
}
The result is that each of these classes does only one thing. You can use them separately from the others and test them separately from the others. That applies the Single Responsibility Principle. No one class is too complicated because each one has only one responsibility. It also applies the Dependency Inversion principle. The reader doesn't know what the buffer does. It just depends on the interface. The buffer doesn't know what the writer does. And the writer doesn't care where the data comes from.
Now the complexity is in creating the objects. You need a file name, a connection string, and a maximum buffer size. That means something like
var filename = "your file name";
var maxBufferSize = 50;
var connectionString = "your connection string";
var reader = new PersonFileReader(
    filename,
    new WriterBuffer<Person>(
        new PersonSqlWriter(connectionString),
        maxBufferSize));
Your classes are simpler, but wiring them all together has gotten a little more complicated. That's where dependency injection comes in. It manages this for you. I won't go into that yet because it might be information overload. But if you mention what sort of application this is - web, WCF service, etc., then I might be able to provide a concrete example of how a dependency injection container like Windsor, Autofac, or Unity can manage this for you.
This was all new to me several years ago. At first it just looked like more code. But it actually makes it easier to write small, simple classes, which in turn makes building complex applications much easier.
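For a feel of how the reader, buffer and writer fit together end to end, here is a small self-contained Python sketch of the same split. It is only an illustration under assumptions: SQLite in memory stands in for the real SQL database, the table and column names are invented, and the buffer gets an explicit `flush` so the last partial batch is not lost.

```python
import csv
import io
import sqlite3


class SqlWriter:
    """Writes a batch of rows to the database (SQLite here, as a stand-in)."""

    def __init__(self, connection):
        self._connection = connection

    def write(self, rows):
        self._connection.executemany(
            "INSERT INTO person (first_name, last_name, employee_number) VALUES (?, ?, ?)",
            rows,
        )
        self._connection.commit()


class WriterBuffer:
    """Collects rows and passes them to the writer once max_size is reached."""

    def __init__(self, writer, max_size):
        self._writer = writer
        self._max_size = max_size
        self._items = []

    def add_item(self, item):
        self._items.append(item)
        if len(self._items) >= self._max_size:
            self.flush()

    def flush(self):
        if self._items:
            self._writer.write(self._items)
            self._items = []


def read_file(lines, buffer):
    """Streams semicolon-separated lines into the buffer, one record at a time."""
    reader = csv.reader(lines, delimiter=";")
    next(reader, None)  # first line holds the column names
    for first_name, last_name, employee_number in reader:
        buffer.add_item((first_name, last_name, employee_number))
    buffer.flush()  # write whatever is left after the last full batch


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE person (first_name TEXT, last_name TEXT, employee_number TEXT)"
)
sample = io.StringIO(
    "FirstName;LastName;EmployeeNumber\nAda;Lovelace;1\nAlan;Turing;2\nGrace;Hopper;3\n"
)
read_file(sample, WriterBuffer(SqlWriter(connection), max_size=2))
row_count = connection.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(row_count)
```

With a real file you would pass an open file object instead of the `StringIO`, so only one batch is ever held in memory.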
Have a look at the links below:
BulkCopy: How can I insert 10 million records in the shortest time possible?
This one contains code samples:
You can also use the Import Wizard.

Using JSON with VB.NET ASP.NET 2.0

Total newbie question here, I've been struggling with it for hours!
I'm trying to understand how to actually use and create JSON data. I've been Googling all afternoon and trying to understand what I find here, having downloaded the Newtonsoft DLLs.
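StringBuilder sb = new StringBuilder();
StringWriter sw = new StringWriter(sb);
using (JsonWriter jsonWriter = new JsonTextWriter(sw))
{
    jsonWriter.Formatting = Formatting.Indented;
    jsonWriter.WriteStartObject();
    jsonWriter.WritePropertyName("CPU"); jsonWriter.WriteValue("Intel");
    jsonWriter.WritePropertyName("PSU"); jsonWriter.WriteValue("500W");
    jsonWriter.WritePropertyName("Drives"); jsonWriter.WriteStartArray();
    jsonWriter.WriteValue("DVD read/writer");
    jsonWriter.WriteValue("500 gigabyte hard drive");
    jsonWriter.WriteValue("200 gigabype hard drive");
    jsonWriter.WriteEndArray(); jsonWriter.WriteEndObject();
}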
Should create something that looks like:
{
  "CPU": "Intel",
  "PSU": "500W",
  "Drives": [
    "DVD read/writer",
    "500 gigabyte hard drive",
    "200 gigabype hard drive"
  ]
}
and I am sure it does... but how do I view it? How do I turn that into an object that the browser can output.
It seems to me that the first stage I need to resolve is "how to create" JSON files/strings, next stage will be how to actually use them. If it helps answer the question, what I'm aiming for initially is to be able to use AJAX Autocomplete from a search page generated from my MySQL database, I was hoping I could write a simple SQL query and have that returned using something similar to the above, but I'm clearly going about it all wrong!
BTW, the example above is in C#, I have successfully converted the process to VB, as that's what I am using, but any responses would be much appreciated as VB examples!
I came across this post about two years after it was posted, but I had the exact same question and noticed that the question wasn't really answered. To answer OP's question, this will get you the JSON string in his example.
The upshot is that you need to get the JSON string back to the browser. You can either place it in a javascript variable (be sure to clean up line enders and single quotes if you do this) or pass it back as the result of an ajax query.
We actually use the built-in JavaScript serializer since it has support both on the server and the client side and is quite easy to use. Assuming that you have an existing object, this code goes on the server side:
''' <summary>
''' This method safely serializes an object for JSON by removing all of the special characters (i.e. CRLFs, quotes, etc)
''' </summary>
''' <param name="oObject"></param>
''' <param name="fForScript">Set this to true when the JSON will be embedded directly in a web page (as opposed to being passed through an ajax call)</param>
''' <returns></returns>
''' <remarks></remarks>
Public Function SerializeObjectForJSON(ByVal oObject As Object, Optional ByVal fForScript As Boolean = False) As String
    If oObject IsNot Nothing Then
        Dim sValue As String
        sValue = (New System.Web.Script.Serialization.JavaScriptSerializer).Serialize(oObject)
        If fForScript Then
            ' If this serialized object is being placed directly on the page, then we need to ensure that its CRLFs are not interpreted literally (i.e. as the actual JS values)
            ' If we don't do this, the script will not deserialize correctly if there are any embedded crlfs.
            sValue = sValue.Replace("\r\n", "\\r\\n")
            ' Fix quote marks
            Return CleanString(sValue)
        Else
            Return sValue
        End If
    Else
        Return String.Empty
    End If
End Function
On the client side, deserialization is trivial:
// The result should be a json-serialized record
oRecord = Sys.Serialization.JavaScriptSerializer.deserialize(result.value);
Once you have deserialized the object, you can use its properties directly in javascript:
alert('CPU = ' + oRecord.CPU);
In terms of generating the JSON try
public class HardwareInfo
{
    [JsonProperty(PropertyName = "CPU")]
    public string Cpu { get; set; }
    [JsonProperty(PropertyName = "PSU")]
    public string Psu { get; set; }
    public ICollection<string> Drives { get; set; }
}

public string SerializeHardwareInfo()
{
    var info = new HardwareInfo
    {
        Cpu = "Intel",
        Psu = "500W",
        Drives = new List<string> { "DVD read/writer", "500 gigabyte hard drive", "200 gigabype hard drive" }
    };
    var json = JsonConvert.SerializeObject(info, Formatting.Indented);
    // {
    //   "CPU": "Intel",
    //   "PSU": "500W",
    //   "Drives": [
    //     "DVD read/writer",
    //     "500 gigabyte hard drive",
    //     "200 gigabype hard drive"
    //   ]
    // }
    return json;
}
The formatting argument is optional. Best of luck.
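If it helps to see the whole round trip in one place, here is the same idea as a language-neutral sketch in Python using only the standard `json` module: build the object, serialize it to a string (which is what you would send back to the browser), and parse it again. The variable names are invented for the example.

```python
import json

hardware_info = {
    "CPU": "Intel",
    "PSU": "500W",
    "Drives": [
        "DVD read/writer",
        "500 gigabyte hard drive",
        "200 gigabyte hard drive",
    ],
}

# Serialize: this string is the JSON payload you would return from the server.
json_text = json.dumps(hardware_info, indent=2)
print(json_text)

# Deserialize: the consumer turns the string back into an object.
round_tripped = json.loads(json_text)
print("CPU =", round_tripped["CPU"])
```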

Performant Entity Serialization: BSON vs MessagePack (vs JSON)

Recently I've found MessagePack, an alternative binary serialization format to Google's Protocol Buffers and JSON which also outperforms both.
Also there's the BSON serialization format that is used by MongoDB for storing data.
Can somebody elaborate the differences and the dis-/advantages of BSON vs MessagePack?
Just to complete the list of performant binary serialization formats: There are also Gobs, which are going to be the successor of Google's Protocol Buffers. However, in contrast to all the other mentioned formats, those are not language-agnostic and rely on Go's built-in reflection; there are also Gobs libraries for at least one other language than Go.
// Please note that I'm author of MessagePack. This answer may be biased.
Format design
Compatibility with JSON
In spite of its name, BSON's compatibility with JSON is not so good compared with MessagePack.
BSON has special types like "ObjectId", "Min key", "UUID" or "MD5" (I think these types are required by MongoDB). These types are not compatible with JSON. That means some type information can be lost when you convert objects from BSON to JSON, but of course only when these special types are in the BSON source. It can be a disadvantage to use both JSON and BSON in single service.
MessagePack is designed to be transparently converted from/to JSON.
MessagePack is smaller than BSON
MessagePack's format is less verbose than BSON. As a result, MessagePack can serialize objects into fewer bytes than BSON.
For example, a simple map {"a":1, "b":2} is serialized in 7 bytes with MessagePack, while BSON uses 19 bytes.
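The two byte counts can be reproduced by hand-encoding the map according to the published MessagePack and BSON layouts. The Python sketch below only covers what this example needs (short string keys, small non-negative ints), and the helper names are invented.

```python
import struct


def msgpack_small_map(pairs):
    """Hand-encode a map of short string keys to ints in the range 0..127."""
    out = bytearray([0x80 | len(pairs)])       # fixmap header (up to 15 entries)
    for key, value in pairs:
        key_bytes = key.encode("utf-8")
        out.append(0xA0 | len(key_bytes))      # fixstr header (up to 31 bytes)
        out += key_bytes
        out.append(value)                      # positive fixint: the value itself
    return bytes(out)


def bson_int32_document(pairs):
    """Hand-encode a BSON document whose values are all int32."""
    body = bytearray()
    for key, value in pairs:
        body += b"\x10"                        # element type: int32
        body += key.encode("utf-8") + b"\x00"  # key as a C string
        body += struct.pack("<i", value)       # 4-byte little-endian value
    total = 4 + len(body) + 1                  # size prefix + elements + terminator
    return struct.pack("<i", total) + bytes(body) + b"\x00"


pairs = [("a", 1), ("b", 2)]
msgpack_bytes = msgpack_small_map(pairs)
bson_bytes = bson_int32_document(pairs)
print(len(msgpack_bytes), msgpack_bytes.hex())
print(len(bson_bytes), bson_bytes.hex())
```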
BSON supports in-place updating
With BSON, you can modify part of a stored object without re-serializing the whole object. Let's suppose a map {"a":1, "b":2} is stored in a file and you want to update the value of "a" from 1 to 2000.
With MessagePack, 1 uses only 1 byte but 2000 uses 3 bytes. So "b" must be moved backward by 2 bytes, even though "b" itself is not modified.
With BSON, both 1 and 2000 use 5 bytes. Because of this verbosity, you don't have to move "b".
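Here is a small Python sketch of that update on hand-built bytes. It assumes the int32 layout from the BSON spec and the fixint/uint16 encodings from the MessagePack spec; the helper names and the fixed offset are specific to this two-key example.

```python
import struct

# BSON {"a": 1, "b": 2}: size prefix, two int32 elements, terminator.
bson_doc = bytearray(
    struct.pack("<i", 19)
    + b"\x10" + b"a\x00" + struct.pack("<i", 1)
    + b"\x10" + b"b\x00" + struct.pack("<i", 2)
    + b"\x00"
)
A_VALUE_OFFSET = 4 + 1 + 2           # size prefix + type byte + "a\0"
tail_before = bytes(bson_doc[A_VALUE_OFFSET + 4:])

# Overwrite the 4 value bytes in place; nothing after them moves.
struct.pack_into("<i", bson_doc, A_VALUE_OFFSET, 2000)
tail_after = bytes(bson_doc[A_VALUE_OFFSET + 4:])


def msgpack_uint(value):
    """Smallest MessagePack encoding for an unsigned int up to 65535."""
    if value < 0x80:
        return bytes([value])                       # positive fixint
    if value <= 0xFF:
        return b"\xcc" + bytes([value])             # uint 8
    if value <= 0xFFFF:
        return b"\xcd" + struct.pack(">H", value)   # uint 16
    raise ValueError("sketch only covers values up to 65535")


def msgpack_map_ab(a, b):
    return b"\x82" + b"\xa1" + b"a" + msgpack_uint(a) + b"\xa1" + b"b" + msgpack_uint(b)


old = msgpack_map_ab(1, 2)
new = msgpack_map_ab(2000, 2)
key_b = b"\xa1" + b"b"
print(len(bson_doc), tail_before == tail_after)
print(len(old), len(new), old.index(key_b), new.index(key_b))
```

The BSON buffer keeps its length and its tail, while the MessagePack encoding grows by two bytes and the "b" entry shifts accordingly.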
MessagePack has RPC
MessagePack, Protocol Buffers, Thrift and Avro support RPC. But BSON doesn't.
These differences imply that MessagePack was originally designed for network communication while BSON is designed for storage.
Implementation and API design
MessagePack has type-checking APIs (Java, C++ and D)
MessagePack supports static-typing.
Dynamic typing as used with JSON or BSON is useful for dynamic languages like Ruby, Python or JavaScript, but troublesome for static languages: you must write boring type-checking code.
MessagePack provides type-checking API. It converts dynamically-typed objects into statically-typed objects. Here is a simple example (C++):
#include <msgpack.hpp>
#include <string>
#include <vector>

class myclass {
private:
    std::string str;
    std::vector<int> vec;
public:
    // This macro enables this class to be serialized/deserialized
    MSGPACK_DEFINE(str, vec);
};

int main(void) {
    // serialize
    myclass m1 = ...;
    msgpack::sbuffer buffer;
    msgpack::pack(&buffer, m1);

    // deserialize
    msgpack::unpacked result;
    msgpack::unpack(&result, buffer.data(), buffer.size());

    // you get dynamically-typed object
    msgpack::object obj = result.get();

    // convert it to statically-typed object
    myclass m2 = obj.as<myclass>();
}
MessagePack has IDL
It's related to the type-checking API, MessagePack supports IDL. (specification is available from:
Protocol Buffers and Thrift require IDL (don't support dynamic-typing) and provide more mature IDL implementation.
MessagePack has streaming API (Ruby, Python, Java, C++, ...)
MessagePack supports streaming deserializers. This feature is useful for network communication. Here is an example (Ruby):
require 'msgpack'

# write objects to stdout
$stdout.write [1,2,3].to_msgpack
$stdout.write [1,2,3].to_msgpack

# read objects from stdin using streaming deserializer
unpacker = MessagePack::Unpacker.new($stdin)
# use iterator
unpacker.each {|obj|
  p obj
}
I think it's very important to mention that it depends on what your client/server environment looks like.
If you are passing bytes multiple times without inspection, such as with a message queue system or streaming log entries to disk, then you may well prefer a binary encoding to emphasize the compact size. Otherwise it's a case by case issue with different environments.
Some environments can have very fast serialization and deserialization to/from msgpack/protobuf's, others not so much. In general, the more low-level the language/environment the better binary serialization will work. In higher level languages (node.js, .Net, JVM) you will often see that JSON serialization is actually faster. The question then becomes is your network overhead more or less constrained than your memory/cpu?
With regards to msgpack vs bson vs protocol buffers... msgpack uses the fewest bytes of the group, with protocol buffers being about the same. BSON defines broader native types than the other two, and may be a better match to your object model, but this makes it more verbose. Protocol buffers have the advantage of being designed to stream... which makes it a more natural format for binary transfer/storage.
Personally, I would lean towards the transparency that JSON offers directly, unless there is a clear need for lighter traffic. Over HTTP with gzipped data, the differences in network overhead between the formats are even less of an issue.
A key difference not yet mentioned is that BSON contains size information in bytes for the entire document and further nested sub-documents.
document ::= int32 e_list "\x00"
This has two major benefits for restricted environments (e.g. embedded) where size and performance are important.
You can immediately check if the data you're going to parse represents a complete document or if you're going to need to request more at some point (be it from some connection or storage). Since this is most likely an asynchronous operation you might already send a new request before parsing.
Your data might contain entire sub-documents with irrelevant information for you. BSON allows you to easily traverse to the next object past the sub-document by using the size information of the sub-document to skip it. msgpack, on the other hand, contains the number of elements inside what's called a map (similar to BSON's sub-documents). While this is undoubtedly useful information, it doesn't help the parser. You'd still have to parse every single object inside the map and can't just skip it. Depending on the structure of your data this might have a huge impact on performance.
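Both benefits can be sketched in a few lines of Python on hand-built BSON bytes. This is a minimal reader that only understands int32 elements and embedded documents; all function names are invented for the illustration.

```python
import struct


def int32_element(name, value):
    return b"\x10" + name.encode() + b"\x00" + struct.pack("<i", value)


def subdocument_element(name, doc):
    return b"\x03" + name.encode() + b"\x00" + doc


def document(*elements):
    body = b"".join(elements)
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


def read_top_level_int32(data, wanted):
    """Return the int32 stored under `wanted`, jumping over embedded documents."""
    (total,) = struct.unpack_from("<i", data, 0)
    if total != len(data):
        raise ValueError("document is incomplete, more bytes are needed")
    pos = 4
    while data[pos] != 0x00:
        element_type = data[pos]
        name_end = data.index(b"\x00", pos + 1)
        name = data[pos + 1:name_end].decode()
        pos = name_end + 1
        if element_type == 0x03:
            # Embedded document: its size prefix tells us how far to jump.
            (size,) = struct.unpack_from("<i", data, pos)
            pos += size
        elif element_type == 0x10:
            (value,) = struct.unpack_from("<i", data, pos)
            if name == wanted:
                return value
            pos += 4
        else:
            raise ValueError("element type not covered by this sketch")
    raise KeyError(wanted)


nested = document(int32_element("x", 1), int32_element("y", 2))
doc = document(subdocument_element("skip", nested), int32_element("keep", 7))
found = read_top_level_int32(doc, "keep")
print(found)
```

The nested document is never parsed: the reader moves past it with a single addition, and a truncated buffer is rejected up front by comparing the size prefix with the number of bytes available.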
Well, as the author said, MessagePack was originally designed for network communication while BSON is designed for storage.
MessagePack is compact while BSON is verbose.
MessagePack is meant to be space-efficient while BSON is designed for CRUD (time-efficient).
Most importantly, MessagePack's type system (prefix) follows Huffman encoding; here I drew a Huffman tree of MessagePack (click link to see image):
A quick test shows minified JSON is deserialized faster than binary MessagePack. In the tests, Article.json is 550kb of minified JSON and Article.mpack is the 420kb MP version of it. It may be an implementation issue, of course.
// test_mp.js
var msg = require('msgpack');
var fs = require('fs');
var article = fs.readFileSync('Article.mpack');
for (var i = 0; i < 10000; i++) {
    msg.unpack(article);
}
// test_json.js
var msg = require('msgpack');
var fs = require('fs');
var article = fs.readFileSync('Article.json', 'utf-8');
for (var i = 0; i < 10000; i++) {
    JSON.parse(article);
}
So times are:
Anarki:Downloads oleksii$ time node test_mp.js
real 2m45.042s
user 2m44.662s
sys 0m2.034s
Anarki:Downloads oleksii$ time node test_json.js
real 2m15.497s
user 2m15.458s
sys 0m0.824s
So space is saved, but faster? No.
Tested versions:
Anarki:Downloads oleksii$ node --version
Anarki:Downloads oleksii$ npm list msgpack
└── msgpack@0.1.7
I made a quick benchmark to compare the encoding and decoding speed of MessagePack vs BSON. BSON is faster, at least if you have large binary arrays:
BSON writer: 2296 ms (243487 bytes)
BSON reader: 435 ms
MESSAGEPACK writer: 5472 ms (243510 bytes)
MESSAGEPACK reader: 1364 ms
Using C# Newtonsoft.Json and MessagePack by neuecc:
public class TestData
{
    public byte[] buffer;
    public bool foobar;
    public int x, y, w, h;
}

static void Main(string[] args)
{
    try
    {
        int loop = 10000;
        var buffer = new TestData();
        TestData data2;
        byte[] data = null;
        int val = 0, val2 = 0, val3 = 0;
        buffer.buffer = new byte[243432];

        // Time each phase separately by restarting the stopwatch before it
        var sw = new Stopwatch();
        sw.Start();
        for (int i = 0; i < loop; i++)
        {
            data = SerializeBson(buffer);
            val2 = data.Length;
        }
        var rc1 = sw.ElapsedMilliseconds;

        sw.Restart();
        for (int i = 0; i < loop; i++)
        {
            data2 = DeserializeBson(data);
            val += data2.buffer[0];
        }
        var rc2 = sw.ElapsedMilliseconds;

        sw.Restart();
        for (int i = 0; i < loop; i++)
        {
            data = SerializeMP(buffer);
            val3 = data.Length;
            val += data[0];
        }
        var rc3 = sw.ElapsedMilliseconds;

        sw.Restart();
        for (int i = 0; i < loop; i++)
        {
            data2 = DeserializeMP(data);
            val += data2.buffer[0];
        }
        var rc4 = sw.ElapsedMilliseconds;

        Console.WriteLine("Results:", val);
        Console.WriteLine("BSON writer: {0} ms ({1} bytes)", rc1, val2);
        Console.WriteLine("BSON reader: {0} ms", rc2);
        Console.WriteLine("MESSAGEPACK writer: {0} ms ({1} bytes)", rc3, val3);
        Console.WriteLine("MESSAGEPACK reader: {0} ms", rc4);
    }
    catch (Exception e)
    {
        Console.WriteLine(e);
    }
}

static private byte[] SerializeBson(TestData data)
{
    var ms = new MemoryStream();
    using (var writer = new Newtonsoft.Json.Bson.BsonWriter(ms))
    {
        var s = new Newtonsoft.Json.JsonSerializer();
        s.Serialize(writer, data);
        return ms.ToArray();
    }
}

static private TestData DeserializeBson(byte[] data)
{
    var ms = new MemoryStream(data);
    using (var reader = new Newtonsoft.Json.Bson.BsonReader(ms))
    {
        var s = new Newtonsoft.Json.JsonSerializer();
        return s.Deserialize<TestData>(reader);
    }
}

static private byte[] SerializeMP(TestData data)
{
    return MessagePackSerializer.Typeless.Serialize(data);
}

static private TestData DeserializeMP(byte[] data)
{
    return (TestData)MessagePackSerializer.Typeless.Deserialize(data);
}

resuming files when uploading to a server using wcf

I'm using WCF and I am trying to resume my upload with the following code in the server app:
class DataUploader : IDataUploader
{
    public void Upload(UploadMessage msg)
    {
        int speed = msg.AvgSpeed * 1024; // convert to KB
        Stream stream = msg.DataStream;
        string name = msg.VirtualPath;
        int seekPoint; // this is obtained by reading the partially uploaded file
        using (FileStream fs = new FileStream(@"C:\savedfile.dat", FileMode.Append))
        {
            int bufferSize = 4 * 1024; // 4KB buffer
            byte[] buffer = new byte[bufferSize];
            int bytes;
            while ((bytes = stream.Read(buffer, startPoint, bufferSize)) > 0)
            {
                fs.Write(buffer, 0, bytes);
            }
        }
    }
}
I'm trying to begin reading the stream from a specified point (startPoint) because the first bytes have already been uploaded, so that I can append only the remaining bytes to the partially uploaded file. This way I get an error with the buffer size, and I can't use seeking because it throws a method-not-supported exception, so I think maybe this approach is not right. Help!!
My service contract:
[ServiceContract]
interface IDataUploader
{
    [OperationContract]
    void Upload(UploadMessage msg);
}
My message contract:
[MessageContract]
public class UploadMessage
{
    [MessageHeader(MustUnderstand = true)]
    public string VirtualPath { get; set; }
    [MessageHeader(MustUnderstand = true)]
    public int AvgSpeed { get; set; }
    [MessageBodyMember(Order = 1)]
    public Stream DataStream { get; set; }
}
It seems like you are using a standard SOAP message rather than the streaming binding. Check out this link
If you don't want to use WCF's streaming API, which is proprietary to WCF, I would consider creating a 'chunking' method on the client if the client is uploading the file. Similar to how FTP can resume, I would query the server for the current offset, send up a block or set of blocks, write them to my persistence (memory, db, file, etc.), and then continue with multiple calls from the client sending smaller blocks (be careful of serialization, as that can introduce unnecessary delays). This technique may be something you want to investigate, since it sounds like the client is 'streaming' to the server.
Btw, you may want to look at the following article to determine if your use of MessageContract is appropriate, as opposed to a DataContract.
If you want resume functionality you cannot do it this way. Your client must send the file in chunks and it must maintain the id of last successfully updated chunk. The service must process chunks and append them to storage.
In the most basic implementation, this means that your client must divide the file into chunks of a well-known size and call the upload operation for each chunk. The message must also contain the chunk ID and probably also the chunk size (or something identifying the last chunk). This can also be combined with a reliable session to allow automatic resending of lost chunks and to enforce in-order delivery.
There is also an example of a channel implementation which does chunking internally.
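The chunk-and-resume protocol described in these answers can be sketched independently of WCF. In the Python sketch below, the "service" is just an in-memory object, and every name (`UploadStore`, `get_offset`, `upload_chunk`, `fail_after`) is hypothetical; a real service would persist the chunks to a file and expose the two operations over the wire.

```python
class UploadStore:
    """Stand-in for the service: appends chunks and reports the current offset."""

    def __init__(self):
        self._files = {}

    def get_offset(self, name):
        return len(self._files.get(name, b""))

    def upload_chunk(self, name, offset, chunk):
        current = self._files.setdefault(name, bytearray())
        if offset != len(current):
            # Reject out-of-order or duplicate chunks instead of corrupting the file.
            raise ValueError("chunk does not start at the current offset")
        current.extend(chunk)

    def content(self, name):
        return bytes(self._files[name])


def upload(store, name, data, chunk_size, fail_after=None):
    """Send `data` in chunks, starting at whatever offset the store already has."""
    sent = 0
    offset = store.get_offset(name)
    while offset < len(data):
        if fail_after is not None and sent >= fail_after:
            raise ConnectionError("simulated network failure")
        chunk = data[offset:offset + chunk_size]
        store.upload_chunk(name, offset, chunk)
        offset += len(chunk)
        sent += 1


store = UploadStore()
payload = bytes(range(256)) * 40  # 10240 bytes of sample data

try:
    upload(store, "savedfile.dat", payload, chunk_size=4096, fail_after=1)
except ConnectionError:
    pass
offset_after_failure = store.get_offset("savedfile.dat")

# Resuming is just calling upload again: it asks for the offset and continues.
upload(store, "savedfile.dat", payload, chunk_size=4096)
print(offset_after_failure, store.content("savedfile.dat") == payload)
```

Because the client always asks the server where to continue, it does not matter whether the failure happened before or after the last chunk was stored.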

Does the "Cartridge" pattern exist?

AndroMDA uses the term "cartridge" (e.g. for out-of-the-box NHibernate support).
As I understood it, it takes an API/component, wraps it, never adds new features, simplifies it, often taking away "the full power", but works well for most cases.
My questions:
Is the term widely used?
Can one properly define it?
Should the suffix "Cartridge" be used in class/method names?
An example: is the following Base64 helper a cartridge for Base64 conversion?
You give away all the power for performance-tuning, but if you simply want to decode a simple (and small) string it works fine:
public static string Decode(string data)
{
    try
    {
        System.Text.UTF8Encoding encoder = new System.Text.UTF8Encoding();
        System.Text.Decoder utf8Decode = encoder.GetDecoder();
        byte[] todecode_byte = System.Convert.FromBase64String(data);
        int charCount = utf8Decode.GetCharCount(todecode_byte, 0, todecode_byte.Length);
        char[] decoded_char = new char[charCount];
        utf8Decode.GetChars(todecode_byte, 0, todecode_byte.Length, decoded_char, 0);
        string result = new String(decoded_char);
        return result;
    }
    catch (Exception)
    {
        return "";
    }
}
It's called the Facade Pattern. Presumably the AndroMDA folks are big fans of old video game machines...
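For comparison, the same kind of facade takes only a few lines in Python: one narrow function hides the decoding and error-handling details of the underlying API and gives up its tuning options in return. The function name `decode_base64_text` is made up for this sketch.

```python
import base64


def decode_base64_text(data: str) -> str:
    """Facade over base64 + UTF-8 decoding: returns "" for any invalid input."""
    try:
        return base64.b64decode(data, validate=True).decode("utf-8")
    except ValueError:
        # binascii.Error and UnicodeDecodeError are both ValueError subclasses.
        return ""


print(decode_base64_text("SGVsbG8="))
print(repr(decode_base64_text("not base64!")))
```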