FileStream faster way to read and write big file - vb.net

I have a speed and memory-efficiency problem. I'm cutting a big chunk of bytes out of a .bin file and then writing it to another file, and the problem is that to read the data I need to create a huge byte array for it:
Dim data3(endOFfile) As Byte ' end of file is here around 270mb size
Using fs As New FileStream(path, FileMode.Open, FileAccess.Read, FileShare.None)
    fs.Seek(startOFfile, SeekOrigin.Begin)
    fs.Read(data3, 0, endOFfile)
End Using
Using vFs As New FileStream(Environment.GetFolderPath(Environment.SpecialFolder.Desktop) & "\test.bin", FileMode.Create) 'save
    vFs.Write(data3, 0, endOFfile)
End Using
It takes a long time to process, so what's a more efficient way to do it?
Can I somehow read and write with the same file stream, without using a byte array?

I've never done it this way, but I would think that the Stream.CopyTo method should be the easiest option and as quick as anything.
Using inputStream As New FileStream(...),
      outputStream As New FileStream(...)
    inputStream.CopyTo(outputStream)
End Using
I'm not sure whether that overload will read all the data in one go or use a default buffer size. If it's the former or you want to specify a buffer size other than the default, there's an overload for that:
inputStream.CopyTo(outputStream, bufferSize)
You can experiment with different buffer sizes to see whether it makes a difference to performance. Smaller is better for memory usage but I would expect bigger to be faster, at least up to a point.
Note that the CopyTo method requires at least .NET Framework 4.0. If you're executing this code on the UI thread, you might like to call CopyToAsync instead, to avoid freezing the UI. The same two overloads are available, plus a third that accepts a CancellationToken. I'm not going to teach you how to use Async/Await here, so research that yourself if you want to go that way. Note that CopyToAsync requires at least .NET Framework 4.5.
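If you only need to copy a specific byte range (as in the original code, which seeks to startOFfile first), note that CopyTo copies everything from the current position to the end of the stream. For a bounded range, a small buffered loop does the job without ever holding the whole chunk in memory. A rough sketch, where the paths, offsets and 64 KB buffer size are placeholders to adjust:
Imports System.IO

' Copies `length` bytes starting at `startOffset` from sourcePath to destPath
' using a small reusable buffer instead of one huge array.
Public Sub CopyRange(ByVal sourcePath As String, ByVal destPath As String, ByVal startOffset As Long, ByVal length As Long)
    Using inputStream As New FileStream(sourcePath, FileMode.Open, FileAccess.Read, FileShare.Read),
          outputStream As New FileStream(destPath, FileMode.Create, FileAccess.Write)
        inputStream.Seek(startOffset, SeekOrigin.Begin)
        Dim buffer(65535) As Byte
        Dim remaining As Long = length
        While remaining > 0
            Dim toRead As Integer = CInt(Math.Min(CLng(buffer.Length), remaining))
            Dim bytesRead As Integer = inputStream.Read(buffer, 0, toRead)
            If bytesRead = 0 Then Exit While ' hit end of file early
            outputStream.Write(buffer, 0, bytesRead)
            remaining -= bytesRead
        End While
    End Using
End Sub
If the range you want actually runs to the end of the file, you can simply Seek on the input stream and then call CopyTo as shown above.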

Related

How to read all bytes of a file in winrt with ReadBufferAsync?

I have an array of objects that each needs to load itself from binary file data. I create an array of these objects and then call an AsyncAction for each of them that starts it reading in its data. Trouble is, they are not loading entirely - they tend to get only part of the data from the files. How can I make sure that the whole thing is read? Here is an outline of the code: first I enumerate the folder contents to get a StorageFile for each file it contains. Then, in a for loop, each receiving object is created and passed the next StorageFile, and it creates its own Buffer and DataReader to handle the read. m_file_bytes is a std::vector.
m_buffer = co_await FileIO::ReadBufferAsync(nextFile);
m_data_reader = winrt::Windows::Storage::Streams::DataReader::FromBuffer(m_buffer);
m_file_bytes.resize(m_buffer.Length());
m_data_reader.ReadBytes(m_file_bytes);
My thought was that since the buffer and reader are class members of the object they would not go out of scope and could finish their work uninterrupted as the next objects were asked to load themselves in separate AsyncActions. But the DataReader only gets maybe half of the file data or less. What can be done to make sure it completes? Thanks for any insights.
[Update] Perhaps what is going on is that the file system can handle only one read task at a time, and by starting all these async reads each one is interrupting the previous one? But there must be a way to progressively read a folder full of files.
[Update] I think I have it working, by adopting the principle of consecutive loads - the idea is not to proceed to the next load until the previous one has completed. I think - someone can correct me if I'm wrong - that the file system cannot do simultaneous reads. If there is an accepted and reliable example of how to do this I would still love to hear about it, so I'm not answering my own question.
#include <wrl.h>
#include <robuffer.h>

// Returns a raw pointer to the bytes backing an IBuffer (nullptr on failure).
uint8_t* GetBufferData(winrt::Windows::Storage::Streams::IBuffer& buffer)
{
    ::IUnknown* unknown = winrt::get_unknown(buffer);
    ::Microsoft::WRL::ComPtr<::Windows::Storage::Streams::IBufferByteAccess> bufferByteAccess;
    HRESULT hr = unknown->QueryInterface(__uuidof(::Windows::Storage::Streams::IBufferByteAccess), &bufferByteAccess);
    if (FAILED(hr))
        return nullptr;
    byte* bytes = nullptr;
    bufferByteAccess->Buffer(&bytes);
    return bytes;
}
https://learn.microsoft.com/en-us/cpp/cppcx/obtaining-pointers-to-data-buffers-c-cx?view=vs-2017
https://learn.microsoft.com/en-us/windows/uwp/xbox-live/storage-platform/connected-storage/connected-storage-using-buffers

Inserting large blob (bigger than memory) to SQLite

Please do not bother asking why I want to write a 13 GB (or more) file to the database - it's just the nature of the beast that I have to do this.
My problem is that I think it's loading it all into memory, and I don't want to do that; I need to stream it into the blob either in chunks or on the fly, using as little memory as possible.
Here is the code I currently have:
Using fs As New System.IO.FileStream(InPutFile, System.IO.FileMode.Open, System.IO.FileAccess.Read, System.IO.FileShare.Read)
    Using sr As New System.IO.StreamReader(fs)
        Dim data(fs.Length - 1) As Byte
        fs.Read(data, 0, fs.Length)
        Using cmd As New SQLite.SQLiteCommand(TSQLDB)
            data = IO.File.ReadAllBytes(InPutFile)
            cmd.CommandText = "UPDATE [FileSpace] SET [FileData]=#DAT WHERE [ID]=" & MyID
            cmd.Prepare()
            cmd.Parameters.Add("#DAT", DbType.Binary, fs.Length)
            cmd.Parameters("#DAT").Value = data
            cmd.ExecuteNonQuery()
            cmd.Dispose()
        End Using
        data = Nothing
    End Using
End Using
Any help would be most appreciated :) Thanks!
I think there might be limits in SQLite - just looking at the sqlite.org website, blobs are limited to 2,147,483,647 bytes, which only equates to approx. 2 GB. There is a related limit setting I have found: SQLITE_LIMIT_LENGTH.
I think the only way around this is going to be to split the file into 2 GB (or smaller) chunks, insert them as separate blobs, and then re-join the blobs into one file when saving. Nothing is ever easy! Thanks for the comments though - they gave me food for thought and led me to this possible resolution.
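A rough sketch of that chunked approach with System.Data.SQLite, assuming a hypothetical FileChunks table with (FileID, ChunkIndex, Data) columns; the 64 MB chunk size and all names are illustrative:
Imports System.IO
Imports System.Data
Imports System.Data.SQLite

' Streams a large file into the database as a series of smaller blob rows,
' keeping only one chunk in memory at a time. Table and column names are hypothetical.
Public Sub InsertFileInChunks(ByVal conn As SQLiteConnection, ByVal inputFile As String, ByVal fileId As Integer)
    Const ChunkSize As Integer = 64 * 1024 * 1024 ' 64 MB per row; tune as needed
    Using fs As New FileStream(inputFile, FileMode.Open, FileAccess.Read, FileShare.Read)
        Dim buffer(ChunkSize - 1) As Byte
        Dim chunkIndex As Integer = 0
        Dim bytesRead As Integer = fs.Read(buffer, 0, buffer.Length)
        While bytesRead > 0
            ' Copy only the bytes actually read, so the last (short) chunk is sized correctly.
            Dim chunk(bytesRead - 1) As Byte
            Array.Copy(buffer, chunk, bytesRead)
            Using cmd As New SQLiteCommand("INSERT INTO FileChunks (FileID, ChunkIndex, Data) VALUES (@id, @idx, @dat)", conn)
                cmd.Parameters.Add("@id", DbType.Int32).Value = fileId
                cmd.Parameters.Add("@idx", DbType.Int32).Value = chunkIndex
                cmd.Parameters.Add("@dat", DbType.Binary).Value = chunk
                cmd.ExecuteNonQuery()
            End Using
            chunkIndex += 1
            bytesRead = fs.Read(buffer, 0, buffer.Length)
        End While
    End Using
End Sub
To reassemble the file, select the rows ordered by ChunkIndex and write each blob to a FileStream in turn.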

How to minimize serialized data - binary serialization has huge overhead

I'd like to reduce the message size when sending a serialized integer over the network.
In the code below, buff.Length is 256 - too great an overhead to be efficient!
How can it be reduced to the minimum (4 bytes + minimal overhead)?
int val = RollDice(6);
// Should 'memoryStream' be allocated each time!
MemoryStream memoryStream = new MemoryStream();
BinaryFormatter formatter = new BinaryFormatter();
formatter.Serialize(memoryStream, val);
byte[] buff = memoryStream.GetBuffer();
Thanks in advance,
--- KostaZ
Have a look at protobuf.net... it is a very good serialization lib (you can get it on NuGet). Also, ideally you should be using a "using" statement around your memory stream.
To respond to the comment below: the most efficient method depends on your use case. If you know exactly what you need to serialize and don't need a general-purpose serializer, then you could write your own binary formatter, which might have no overhead at all (there is some detail on custom formatters here).
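For a single Int32 there is a zero-overhead option that needs no formatter at all: send the four raw bytes produced by BitConverter. A minimal sketch (the value is illustrative):
Imports System

Module IntWireFormat
    Sub Main()
        ' Serialize a single Int32 to exactly 4 bytes - no formatter, no per-message overhead.
        Dim val As Integer = 4 ' e.g. the result of RollDice(6)
        Dim buff As Byte() = BitConverter.GetBytes(val) ' buff.Length = 4

        ' Receiving side: turn the 4 bytes back into an Int32.
        Dim restored As Integer = BitConverter.ToInt32(buff, 0)
        Console.WriteLine(restored)
    End Sub
End Module
BitConverter uses the machine's byte order, so if the two ends of the connection might differ, agree on an order explicitly (for example via IPAddress.HostToNetworkOrder).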
This link has a comparison of the BinaryFormatter and protobuf.net for your reference.

Multithreading a filestream in vb2005

I am trying to build a resource file for a website - basically jamming all the images into one compressed file that is then unpacked into the output buffers to the client.
My question is: in VB 2005, can a FileStream be multi-threaded if you know the size of the finished file - a la BitTorrent - so that you can work on pieces of the file stream (the individual files in this case) and add them to the resource file stream as they are done, instead of one at a time?
If you need something similar to the way torrents write to a file, this is how I would implement it (a sketch follows below):
Open a FileStream on thread T1, and create a queue "monitor" for step 2.
Create a queue that is read from T1 but written to by multiple network reader threads (each queue entry would hold: the position to write at, the size of the data buffer, and the data buffer itself).
Fire up the threads.
:)
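A minimal sketch of that single-writer pattern, assuming .NET 2.0 (VB 2005), so it uses Queue(Of T) with SyncLock and Monitor rather than the later concurrent collections; all names are illustrative:
Imports System.Collections.Generic
Imports System.IO
Imports System.Threading

Public Class ChunkWriter
    Private Class Chunk
        Public Position As Long
        Public Data As Byte()
    End Class

    Private ReadOnly _queue As New Queue(Of Chunk)
    Private ReadOnly _lock As New Object()
    Private _done As Boolean = False

    ' Called by the network/reader threads.
    Public Sub Enqueue(ByVal position As Long, ByVal data As Byte())
        Dim c As New Chunk()
        c.Position = position
        c.Data = data
        SyncLock _lock
            _queue.Enqueue(c)
            Monitor.Pulse(_lock)
        End SyncLock
    End Sub

    ' Called once all reader threads have finished.
    Public Sub Complete()
        SyncLock _lock
            _done = True
            Monitor.Pulse(_lock)
        End SyncLock
    End Sub

    ' Runs on the single writer thread (T1) and owns the FileStream.
    Public Sub WriteLoop(ByVal outputPath As String)
        Using fs As New FileStream(outputPath, FileMode.OpenOrCreate, FileAccess.Write)
            Do
                Dim c As Chunk = Nothing
                SyncLock _lock
                    While _queue.Count = 0 AndAlso Not _done
                        Monitor.Wait(_lock)
                    End While
                    If _queue.Count > 0 Then c = _queue.Dequeue()
                End SyncLock
                If c Is Nothing Then Exit Do ' queue drained and Complete() was called
                fs.Seek(c.Position, SeekOrigin.Begin)
                fs.Write(c.Data, 0, c.Data.Length)
            Loop
        End Using
    End Sub
End Class
The point of the design is that only the writer thread ever touches the FileStream; the reader threads just hand it positioned chunks.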
Anyway, from your comments, your problem seems to be a different one.
I have found the following in a book, but I'm not sure if it helps:
If you want to write data to a file, two parallel methods are available, WriteByte() and Write(). WriteByte() writes a single byte to the stream:
byte NextByte = 100;
fs.WriteByte(NextByte);
Write(), on the other hand, writes out an array of bytes. For instance, if you initialized the ByteArray mentioned before with some values, you could use the following code to write out the first nBytes of the array:
fs.Write(ByteArray, 0, nBytes);
Citation from: Nagel, Christian, Bill Evjen, Jay Glynn, Morgan Skinner, and Karli Watson. "Chapter 24 - Manipulating Files and the Registry". Professional C# 2005 with .NET 3.0. Wrox Press. © 2007. Books24x7. http://common.books24x7.com/book/id_20568/book.asp (accessed July 22, 2009)
I'm not sure if you're asking if a System.IO.FileStream object can be read from or written to in a multi-threaded fashion. But the answer in both cases is no. This is not a supported scenario. You will need to add some form of locking to ensure serialized access to the resource.
The documentation calls out multi-threaded access to the object as an unsupported scenario
http://msdn.microsoft.com/en-us/library/system.io.filestream.aspx
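A sketch of the kind of locking that implies, with illustrative names: every thread that touches the shared FileStream takes the same lock, so the Seek and Write always happen together as one serialized operation.
Imports System.IO

Public Class SharedFileWriter
    Private ReadOnly _fsLock As New Object()
    Private ReadOnly _fs As FileStream

    Public Sub New(ByVal path As String)
        _fs = New FileStream(path, FileMode.OpenOrCreate, FileAccess.Write)
    End Sub

    ' Safe to call from any thread; access to the stream is serialized.
    Public Sub WriteAt(ByVal position As Long, ByVal data As Byte())
        SyncLock _fsLock
            _fs.Seek(position, SeekOrigin.Begin)
            _fs.Write(data, 0, data.Length)
        End SyncLock
    End Sub

    Public Sub Close()
        SyncLock _fsLock
            _fs.Close()
        End SyncLock
    End Sub
End Class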

File Compressed by GZIP grows instead of shrinking

I used the code below to compress files and they keep growing instead of shrinking. I compressed a 4 KB file and it became 6 KB. That is understandable for a small file because of the compression overhead, but I tried a 400 MB file and it became 628 MB after compressing. What is wrong? See the code. (.NET 2.0)
Public Sub Compress(ByVal infile As String, ByVal outfile As String)
    Dim sourceFile As FileStream = File.OpenRead(infile)
    Dim destFile As FileStream = File.Create(outfile)
    Dim compStream As New GZipStream(destFile, CompressionMode.Compress)
    Dim myByte As Integer = sourceFile.ReadByte()
    While myByte <> -1
        compStream.WriteByte(CType(myByte, Byte))
        myByte = sourceFile.ReadByte()
    End While
    sourceFile.Close()
    destFile.Close()
End Sub
If the underlying file is itself highly unpredictable (already compressed or largely random) then attempting to compress it will cause the file to become bigger.
Going from 400 MB to 628 MB sounds highly improbable as an expansion factor, since the deflate algorithm (used by GZip) tends towards a maximum expansion factor of around 0.03%, and the overhead of the GZip header should be negligible.
Edit: The .NET 4.0 release notes indicate that the compression libraries have been improved so that they no longer cause significant expansion of uncompressible data, which suggests the earlier versions were not implementing the "fall back to raw stream blocks" mode. As a quick test, try SharpZipLib; it should give you a nearly unchanged output size when the stream is incompressible by deflate. If it does, consider moving to that library, or wait for the 4.0 release for a better BCL implementation. Note that the lack of compression you are getting strongly suggests there is no point in compressing this data at all.
Are you sure that writing byte by byte to the stream is really a good idea? It certainly won't have ideal performance characteristics, and maybe that's what confuses the gzip compression algorithm too.
Also, it might be that the data you are trying to compress is just not very compressible. If I were you, I would try your code with a text document of the same size, as text documents tend to compress much better than random binary data.
Also, you could try using a plain DeflateStream instead of a GZipStream. They both use the same compression algorithm (deflate); the only difference is that gzip adds some additional data (such as error checking), so a DeflateStream might yield slightly smaller results.
My VB.NET is a bit rusty, so I'd rather not try to write a code example in VB.NET. Instead, here's how you would do it in C#; it should be relatively straightforward to translate to VB.NET for someone with a bit of experience (or maybe someone who is good at VB.NET could edit my post and translate it):
// Buffered version: read up to 64 KB at a time instead of one byte at a time.
FileStream sourceFile = File.OpenRead(infile);
GZipStream compStream = new GZipStream(File.Create(outfile), CompressionMode.Compress);
byte[] buffer = new byte[65536];
int bytesRead;
while ((bytesRead = sourceFile.Read(buffer, 0, buffer.Length)) > 0)
{
    compStream.Write(buffer, 0, bytesRead);
}
compStream.Close(); // closing the GZipStream flushes the compressed footer
sourceFile.Close();
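A rough VB.NET translation of that buffered loop, as invited above (a sketch only; it reuses the parameter names from the original Compress routine):
Imports System.IO
Imports System.IO.Compression

Public Sub Compress(ByVal infile As String, ByVal outfile As String)
    Using sourceFile As FileStream = File.OpenRead(infile)
        Using destFile As FileStream = File.Create(outfile)
            Using compStream As New GZipStream(destFile, CompressionMode.Compress)
                Dim buffer(65535) As Byte ' 64 KB read buffer
                Dim bytesRead As Integer = sourceFile.Read(buffer, 0, buffer.Length)
                While bytesRead > 0
                    compStream.Write(buffer, 0, bytesRead)
                    bytesRead = sourceFile.Read(buffer, 0, buffer.Length)
                End While
            End Using ' disposing the GZipStream flushes the final block and gzip footer
        End Using
    End Using
End Sub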
This is a known anomaly with the built-in GZipStream (and DeflateStream).
I can think of two workarounds:
Use an alternative compressor.
Build some logic that examines the size of the "compressed" output and compares it to the size of the input; if it is larger, throw the output away and just store the data uncompressed.
DotNetZip includes a "fixed" GZipStream based on a managed port of zlib. (It takes approach #1 from above). The Ionic.Zlib.GZipStream can replace the built-in GZipStream in your apps with a simple namespace swap.
Thank you all for the good answers. Earlier on I tried to compress .wmv files and one text file. I changed the code to use a DeflateStream and it seems to work now. Cheers.