How to minimize serialized data - binary serialization has huge overhead

I'd like to reduce the message size when sending a serialized integer over the network.
In the section below, buff.Length is 256 - too great an overhead to be efficient!
How can it be reduced to the minimum (4 bytes + minimal overhead)?
int val = RollDice(6);
// Should 'memoryStream' be allocated each time?
MemoryStream memoryStream = new MemoryStream();
BinaryFormatter formatter = new BinaryFormatter();
formatter.Serialize(memoryStream, val);
byte[] buff = memoryStream.GetBuffer();
Thanks in advance,
--- KostaZ

Have a look at protobuf-net; it is a very good serialization library (you can get it on NuGet). Also, ideally you should wrap your memory stream in a "using" statement.
To respond to the comment below: the most efficient method depends on your use case. If you know exactly what you need to serialize and don't need a general-purpose serializer, then you could write your own binary formatter, which might have no overhead at all (there is some detail on custom formatters here).
This link has a comparison of the BinaryFormatter and protobuf-net for your reference.
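To illustrate the zero-overhead point for this specific case (a single int), here's a minimal sketch using BitConverter - my suggestion, not from the linked comparison - which produces exactly 4 bytes with no formatter metadata:
```csharp
using System;

int val = 4; // e.g. the result of RollDice(6)

// Exactly 4 bytes, no BinaryFormatter metadata.
byte[] buff = BitConverter.GetBytes(val); // buff.Length == 4

// Receiving side:
int received = BitConverter.ToInt32(buff, 0);
```
Note that BitConverter uses the machine's native byte order; if the two ends of the connection may differ in endianness, fix the byte order explicitly (e.g. with IPAddress.HostToNetworkOrder).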

Related

FileStream faster way to read and write big file

I have a speed and memory-efficiency problem: I'm reading a big chunk of bytes from a .bin file and then writing it to another file. The problem is that to read the file I need to create a huge byte array for it:
Dim data3(endOFfile) As Byte ' endOFfile is around 270 MB here
Using fs As New FileStream(path, FileMode.Open, FileAccess.Read, FileShare.None)
fs.Seek(startOFfile, SeekOrigin.Begin)
fs.Read(data3, 0, endOFfile)
End Using
Using vFs As New FileStream(Environment.GetFolderPath(Environment.SpecialFolder.Desktop) & "\test.bin", FileMode.Create) 'save
vFs.Write(data3, 0, endOFfile)
End Using
so the procedure takes a long time; what's a more efficient way to do it?
Can I somehow read and write using the file streams directly, without a byte array in between?
I've never done it this way but I would think that the Stream.CopyTo method should be the easiest method and as quick as anything.
Using inputStream As New FileStream(...),
outputStream As New FileStream(...)
inputStream.CopyTo(outputStream)
End Using
I'm not sure whether that overload will read all the data in one go or use a default buffer size. If it's the former or you want to specify a buffer size other than the default, there's an overload for that:
inputStream.CopyTo(outputStream, bufferSize)
You can experiment with different buffer sizes to see whether it makes a difference to performance. Smaller is better for memory usage but I would expect bigger to be faster, at least up to a point.
Note that the CopyTo method requires at least .NET Framework 4.0. If you're executing this code on the UI thread, you might like to call CopyToAsync instead, to avoid freezing the UI. The same two overloads are available, plus a third that accepts a CancellationToken. I'm not going to teach you how to use Async/Await here, so research that yourself if you want to go that way. Note that CopyToAsync requires at least .NET Framework 4.5.
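As a sketch of the async variant described above (in C#, since a VB.NET translation is mechanical; the method name and paths are placeholders of mine):
```csharp
using System.IO;
using System.Threading.Tasks;

public static async Task CopyFileAsync(string sourcePath, string destPath)
{
    using (var input = new FileStream(sourcePath, FileMode.Open, FileAccess.Read))
    using (var output = new FileStream(destPath, FileMode.Create, FileAccess.Write))
    {
        // 81920 bytes is the default buffer size used by CopyTo/CopyToAsync.
        await input.CopyToAsync(output, 81920);
    }
}
```
Awaiting this from a UI event handler keeps the UI responsive while the copy runs.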

How to read byte data from a StorageFile in cppwinrt?

A 2012 answer at StackOverflow (“How do I read a binary file in a Windows Store app”) suggests this method of reading byte data from a StorageFile in a Windows Store app:
IBuffer buffer = await FileIO.ReadBufferAsync(theStorageFile);
byte[] bytes = buffer.ToArray();
That looks simple enough. As I am working in cppwinrt I have translated that to the following, within the same IAsyncAction that produced a vector of StorageFiles. First I obtain a StorageFile from the VectorView using theFilesVector.GetAt(index);
//Then this line compiles without error:
IBuffer buffer = co_await FileIO::ReadBufferAsync(theStorageFile);
//But I can’t find a way to make the buffer call work.
byte[] bytes = buffer.ToArray();
“byte[]” can’t work, to begin with, so I change that to byte*, but then
the error is “class ‘winrt::Windows::Storage::Streams::IBuffer’ has no member ‘ToArray’”
And indeed Intellisense lists no such member for IBuffer. Yet IBuffer was specified as the return type for ReadBufferAsync. It appears the above sample code cannot function as it stands.
In the documentation for FileIO I find it recommended to use DataReader to read from the buffer, which in cppwinrt should look like
DataReader dataReader = DataReader::FromBuffer(buffer);
That compiles. It should then be possible to read bytes with the following DataReader method, which is fortunately supplied in the UWP docs in cppwinrt form:
void ReadBytes(Byte[] value) const;
However, that does not compile because the type Byte is not recognized in cppwinrt. If I create a byte array instead:
byte* fileBytes = new byte(buffer.Length());
that is not accepted. The error is
‘No suitable constructor exists to convert from “byte*” to “winrt::array_view<uint8_t>”’
uint8_t is of course a byte, so let’s try
uint8_t fileBytes = new uint8_t(buffer.Length());
That is wrong - clearly we really need to create a winrt::array_view. Yet a 2015 Reddit post says that array_view “died” and I’m not sure how to declare one, or if it will help. That original one-line method for reading bytes from a buffer is looking so beautiful in retrospect. This is a long post, but can anyone suggest the best current method for simply reading raw bytes from a StorageFile reference in cppwinrt? It would be so fine if there were simply GetFileBytes() and GetFileBytesAsync() methods on StorageFile.
---Update: here's a step forward. I found a comment from Kenny Kerr last year explaining that array_view should not be declared directly, but that std::vector or std::array can be used instead. And that is accepted as an argument for the ReadBytes method of DataReader:
std::vector<unsigned char> fileBytes;
dataReader.ReadBytes(fileBytes);
Only trouble now is that the std::vector is receiving no bytes, even though the size of the referenced file is correctly returned in buffer.Length() as 167,513 bytes. That seems to suggest the buffer is good, so I'm not sure why the ReadBytes method applied to that buffer would produce no data.
Update #2: Kenny suggests reserving space in the vector, which is something I had tried, this way:
m_file_bytes.reserve(buffer.Length());
But it didn't make a difference. Here is a sample of the code as it now stands, using DataReader.
buffer = co_await FileIO::ReadBufferAsync(nextFile);
dataReader = DataReader::FromBuffer(buffer);
//The following line may be needed, but crashes
//co_await dataReader.LoadAsync(buffer.Length());
if (buffer.Length())
{
m_file_bytes.reserve(buffer.Length());
dataReader.ReadBytes(m_file_bytes);
}
The crash, btw, is
throw hresult_error(result, hresult_error::from_abi);
Is it confirmed, then, that the original 2012 solution quoted above cannot work in today's world? But of course there must be some way to read bytes from a file, so I'm just missing something that may be obvious to another.
Final (I think) update: Kenny's suggestion that the vector needs a size has hit the mark. If the vector is first prepared with m_file_bytes.assign(buffer.Length(),0) then it does get filled with file data. Now my only worry is that I don't really understand the way IAsyncAction is working and maybe could have trouble looping this asynchronously, but we'll see.
The array_view bridges the gap between Windows APIs and C++ array types. In this example, the ReadBytes method expects the caller to provide some array that it can copy bytes into. The array_view forwards a pointer to the caller's array as well as its size. In this case, you're passing an empty vector. Try resizing the vector before calling ReadBytes.
When you know how many bytes to expect (in this case 2 bytes), this worked for me:
std::vector<unsigned char> fileBytes;
fileBytes.resize(2);
DataReader reader = DataReader::FromBuffer(buffer);
reader.ReadBytes(fileBytes);
cout << fileBytes[0] << endl;
cout << fileBytes[1] << endl;

Serializing Multiple Objects into ByteArray

I am wondering one thing: how can I serialize multiple objects to a byte array? My goal is to send the serialized object over TCP, receive it, deserialize it, and recreate it.
My concept is:
The first thing in the byte array will be the "Packet Header" - this will tell the receiver what type of packet it is: "Chat Message", "File Transfer", etc. After the header I will add the packet itself. Finally there will be an "EOF Header" (this will tell the server whether the whole packet has been received). The headers are Enums (as Byte).
Knowing where you get these errors would be helpful (essential, even), but it is probably related to this:
Public Shared Function Deserialize(Data As Byte()) As Packet
    Dim MS As New MemoryStream(Data)
    Dim BF As New BinaryFormatter
    MS.Position = 0
    ' or
    'MS.Seek(0, SeekOrigin.Begin)
    Return DirectCast(BF.Deserialize(MS), Packet)
End Function
After seeding the MemoryStream, the stream position is left at the end. You need to reset it so the BinaryFormatter can read all the bytes. (And you really don't need things like BOF and EOF markers in the serialized data - even if you are sending multiple things, if you put them in a list they will either de/serialize in toto or not at all.)
Also look at protobuf-net - a much faster serializer that makes much smaller packets, and it will let you deserialize into a different assembly/culture/class, which .NET's BinaryFormatter does not do without basically tricking it.
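As an alternative to BOF/EOF markers over TCP (a sketch of mine, not the poster's code): since TCP is a byte stream, a common approach is to prefix each serialized packet with its length so the receiver knows exactly how many bytes to read. The helper names here are illustrative:
```csharp
using System;
using System.IO;

static class Framing
{
    // Frame a payload as [4-byte little-endian length][payload bytes].
    public static byte[] Frame(byte[] payload)
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(BitConverter.GetBytes(payload.Length), 0, 4);
            ms.Write(payload, 0, payload.Length);
            return ms.ToArray();
        }
    }

    // Read one frame back from a stream (e.g. a NetworkStream).
    public static byte[] ReadFrame(Stream stream)
    {
        byte[] lenBuf = new byte[4];
        ReadExactly(stream, lenBuf, 4);
        int length = BitConverter.ToInt32(lenBuf, 0);
        byte[] payload = new byte[length];
        ReadExactly(stream, payload, length);
        return payload;
    }

    // Stream.Read may return fewer bytes than requested, so loop until done.
    private static void ReadExactly(Stream stream, byte[] buffer, int count)
    {
        int read = 0;
        while (read < count)
        {
            int n = stream.Read(buffer, read, count - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }
    }
}
```
The packet-type header byte can then simply be the first byte of the payload.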

File Compressed by GZIP grows instead of shrinking

I used the code below to compress files and they keep growing instead of shrinking. I compressed a 4 KB file and it became 6 KB. That is understandable for a small file because of the compression overhead. But I tried a 400 MB file and it became 628 MB after compressing. What is wrong? See the code. (.NET 2.0)
Public Sub Compress(ByVal inFile As String, ByVal outFile As String)
    Dim sourceFile As FileStream = File.OpenRead(inFile)
    Dim destFile As FileStream = File.Create(outFile)
    Dim compStream As New GZipStream(destFile, CompressionMode.Compress)
    Dim myByte As Integer = sourceFile.ReadByte()
    While myByte <> -1
        compStream.WriteByte(CType(myByte, Byte))
        myByte = sourceFile.ReadByte()
    End While
    sourceFile.Close()
    compStream.Close() ' close the GZipStream so the final compressed block is flushed
    destFile.Close()
End Sub
If the underlying file is itself highly unpredictable (already compressed or largely random) then attempting to compress it will cause the file to become bigger.
Going from 400 MB to 628 MB sounds highly improbable as an expansion factor, since the deflate algorithm (used for GZip) tends towards a maximum expansion of about 0.03%. The overhead of the GZip header should be negligible.
Edit: The .NET 4.0 release notes indicate that the compression libraries have been improved so that they no longer significantly expand uncompressable data. This suggests the earlier versions were not implementing the "fall back to raw stream blocks" mode. Try SharpZipLib as a quick test; it should give you close to identical output size when the stream is incompressible by deflate. If it does, consider moving to that library, or wait for the 4.0 release for a better-behaved BCL implementation. Note that the lack of compression you are getting strongly suggests there is no point in trying to compress this data further anyway.
Are you sure that writing to the stream byte by byte is a really good idea? It certainly won't have ideal performance characteristics, and maybe that's what confuses the gzip compression algorithm too.
Also, it might happen that the data you are trying to compress is just not very compressible. If I were you I would try your code with a text document of the same size, as text documents tend to compress much better than random binary data.
Also, you could try using a pure DeflateStream as opposed to a GZipStream, as they both use the same compression algorithm (deflate); the only difference is that gzip adds some additional data (like error checking), so a DeflateStream might yield smaller results.
My VB.NET is a bit rusty, so I'd rather not write a code example in it. Instead, here's how you could do it in C#; it should be relatively straightforward to translate to VB.NET for someone with a bit of experience (or maybe someone who is good at VB.NET could edit my post and translate it):
FileStream sourceFile = File.OpenRead(inFile);
FileStream destFile = File.Create(outFile);
GZipStream compStream = new GZipStream(destFile, CompressionMode.Compress);
byte[] buffer = new byte[65536];
int bytesRead;
// Note the parentheses: assign first, then compare against 0.
while ((bytesRead = sourceFile.Read(buffer, 0, buffer.Length)) > 0)
{
    compStream.Write(buffer, 0, bytesRead);
}
This is a known anomaly with the built-in GZipStream (and DeflateStream).
I can think of two workarounds:
use an alternative compressor.
build some logic that examines the size of the "compressed" output and compares it to the size of the input. If larger, chuck the output and just store the data.
DotNetZip includes a "fixed" GZipStream based on a managed port of zlib. (It takes approach #1 from above). The Ionic.Zlib.GZipStream can replace the built-in GZipStream in your apps with a simple namespace swap.
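The second workaround could be sketched like this (illustrative C# of mine; the `CompressOrStore` name and the one-byte flag are assumptions, not part of any library):
```csharp
using System.IO;
using System.IO.Compression;

static class SafeCompressor
{
    // Compress the input; if the "compressed" result is larger,
    // store the original bytes instead, with a 1-byte flag up front.
    public static byte[] CompressOrStore(byte[] input)
    {
        byte[] compressed;
        using (var ms = new MemoryStream())
        {
            using (var gz = new GZipStream(ms, CompressionMode.Compress))
            {
                gz.Write(input, 0, input.Length);
            } // disposing the GZipStream flushes the final block
            compressed = ms.ToArray();
        }

        using (var outMs = new MemoryStream())
        {
            if (compressed.Length < input.Length)
            {
                outMs.WriteByte(1); // flag: gzip-compressed
                outMs.Write(compressed, 0, compressed.Length);
            }
            else
            {
                outMs.WriteByte(0); // flag: stored as-is
                outMs.Write(input, 0, input.Length);
            }
            return outMs.ToArray();
        }
    }
}
```
The reader inspects the flag byte to decide whether to decompress or use the bytes directly.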
Thank you all for good answers. Earlier on I tried to compress .wmv files and one text file. I changed the code to DeflateStream and it seems to work now. Cheers.

Hacky Sql Compact Workaround

So, I'm trying to use ADO.NET to stream file data stored in an image column in a SQL Compact database.
To do this, I wrote a DataReaderStream class that takes a data reader, opened for sequential access, and represents it as a stream, redirecting calls to Read(...) on the stream to IDataReader.GetBytes(...).
One "weird" aspect of IDataReader.GetBytes(...), when compared to the Stream class, is that GetBytes requires the client to increment an offset and pass that in each time it's called. It does this even though access is sequential, and it's not possible to read "backwards" in the data reader stream.
The SqlCeDataReader implementation of IDataReader enforces this by incrementing an internal counter that identifies the total number of bytes it has returned. If you pass in a number either less than or greater than that number, the method will throw an InvalidOperationException.
The problem with this, however, is that there is a bug in the SqlCeDataReader implementation that causes it to set the internal counter to the wrong value. This results in subsequent calls to Read on my stream throwing exceptions when they shouldn't be.
I found some information about the bug on this MSDN thread.
I was able to come up with a disgusting, horribly hacky workaround, that basically uses reflection to update the field in the class to the correct value.
The code looks like this:
public override int Read(byte[] buffer, int offset, int count)
{
    m_length = m_length ?? m_dr.GetBytes(0, 0, null, offset, count);
    if (m_fieldOffSet < m_length)
    {
        var bytesRead = m_dr.GetBytes(0, m_fieldOffSet, buffer, offset, count);
        m_fieldOffSet += bytesRead;
        if (m_dr is SqlCeDataReader)
        {
            //BEGIN HACK
            //This is a horrible HACK.
            m_field = m_field ?? typeof (SqlCeDataReader).GetField("sequentialUnitsRead", BindingFlags.NonPublic | BindingFlags.Instance);
            var length = (long)(m_field.GetValue(m_dr));
            if (length != m_fieldOffSet)
            {
                m_field.SetValue(m_dr, m_fieldOffSet);
            }
            //END HACK
        }
        return (int) bytesRead;
    }
    else
    {
        return 0;
    }
}
For obvious reasons, I would prefer to not use this.
However, I do not want to buffer the entire contents of the blob in memory either.
Does any one know of a way I can get streaming data out of a SQL Compact database without having to resort to such horrible code?
I contacted Microsoft (through the SQL Compact Blog) and they confirmed the bug, and suggested I use OLEDB as a workaround. So, I'll try that and see if that works for me.
Actually, I decided to fix the problem by just not storing blobs in the database to begin with.
This eliminates the problem (I can stream data from a file), and also fixes some issues I might have run into with Sql Compact's 4 GB size limit.