We have an app that exports very long CSV files, thousands to millions of lines. We use a StringBuilder and lots of .Append calls to construct the file, and this runs out of memory at around 500,000 lines. StringWriter is backed by a StringBuilder, so it fails at the same point in my tests.
MemoryStream (or Tributary) should not have any problem dealing with a file this large, but its API is based on byte[]. I know I can convert, but that makes the code somewhat (a lot?) harder to read.
Is there a simpler solution for writing multiple strings to a stream that I'm missing, or perhaps a clean way to implement such functionality (an extension method, perhaps)?
Just so there's some closure on this: Mark's answer was the one I was looking for. Building up the stream with a StreamWriter works; StringWriter fails. I still have some encoding issues, but those are minor in comparison.
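The fix generalizes beyond .NET: write each row straight to the underlying stream instead of accumulating the whole file in one in-memory string. A minimal sketch of the same pattern in Python (the file name and row shape are made up for illustration):

```python
import csv

def export_rows(path, rows):
    # Stream each row straight to the file; memory use stays flat
    # no matter how many rows the iterable yields.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for row in rows:
            writer.writerow(row)

# Works the same for three rows or three million.
export_rows("out.csv", ([i, i * i] for i in range(3)))
```

Because the generator is consumed one row at a time, nothing larger than a single row ever lives in memory, which is exactly why StreamWriter-over-FileStream succeeds where StringBuilder fails.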
A little while ago I started learning Kotlin, and I have covered its basics: variables, classes, lists, arrays, etc. But the book I was learning from seemed to skip one important topic: reading from and writing to a file, something like the "fwrite" function in C++.
So I searched Google, and yes, reading and writing bytes was easy enough. However, being used to C++'s permissive nature, I wanted to make a "kind of" database.
In C++ I would simply define a struct and keep appending it to a file, then read all the stored objects back one by one by placing "fread" in a for loop, or read them into an array of that struct in one go, since the struct was simply the bytes allocated to the variables inside it.
However, Kotlin has no struct; instead we use a data class to group data. I was hoping there was an equally easy way to store data in a file in the form of a data class and read it back into, say, a List of that class, or, failing that, some other way to store grouped data that would be easy to read and write.
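For reference, the C++ pattern described above (fixed-size records appended with fwrite, read back with fread in a loop) can be sketched in Python with the struct module; the record layout here is hypothetical:

```python
import struct

REC = struct.Struct("<i16s")          # hypothetical record: int id + 16-byte name field

def append_record(path, rec_id, name):
    with open(path, "ab") as f:       # append, like fwrite at the end of the file
        f.write(REC.pack(rec_id, name.encode("utf-8")))

def read_records(path):
    out = []
    with open(path, "rb") as f:       # fread in a loop, one record at a time
        while chunk := f.read(REC.size):
            rec_id, raw = REC.unpack(chunk)
            out.append((rec_id, raw.rstrip(b"\x00").decode("utf-8")))
    return out

append_record("people.bin", 1, "Ada")
append_record("people.bin", 2, "Grace")
```

The fixed record size is what makes the read loop trivial, just as with a plain C++ struct.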
The easiest way is to use a serialization library, and Kotlin already provides something for that.
TL;DR:
Add KotlinX Serialization to your project, choose the serialization format you prefer (protobuf or CBOR will fit; go for JSON if you prefer something more human-readable, although bigger in size), use the proper serializer to generate your ByteArray, and write it to a file using Kotlin's methods for that.
Generating the ByteArray might be tricky; I'm not sure, as I'm writing this from memory. What I can say for sure is that if you choose JSON you can get the string representation and write it to a file, so I'm assuming the same will be valid for the binary formats (writing binary to the file instead of strings).
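To make the TL;DR concrete, here is the same record-to-file round trip sketched in Python, with a dataclass standing in for a Kotlin data class and JSON Lines as the human-readable format (all names here are invented for illustration):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Person:          # plays the role of a Kotlin data class
    name: str
    age: int

def save(path, people):
    # One JSON object per line; appending new records later is trivial.
    with open(path, "w", encoding="utf-8") as f:
        for p in people:
            f.write(json.dumps(asdict(p)) + "\n")

def load(path):
    # Read the whole file back into a list of typed objects.
    with open(path, encoding="utf-8") as f:
        return [Person(**json.loads(line)) for line in f]

save("people.jsonl", [Person("Ada", 36), Person("Grace", 45)])
```

A binary format (protobuf, CBOR) follows the same shape, only writing bytes instead of text.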
What you need can be fulfilled by a Room database.
It is officially recommended by Google, and it uses your Android application's internal database, which is built on SQLite.
You can read more about Room at
https://developer.android.com/jetpack/androidx/releases/room
It provides Data Access Objects (DAOs) and entity classes through which you can access database tables using SQL queries.
It will also check your queries at compile time for any errors.
Note: you need basic SQL knowledge to build the queries for CRUD operations.
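Under the DAOs, Room ultimately runs plain SQL against SQLite. The four CRUD statements look roughly like this, sketched here with Python's built-in sqlite3 module and a made-up schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # Room would manage this file for you
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

# Create
conn.execute("INSERT INTO person (name, age) VALUES (?, ?)", ("Ada", 36))
# Read
rows = conn.execute("SELECT name, age FROM person").fetchall()
# Update
conn.execute("UPDATE person SET age = ? WHERE name = ?", (37, "Ada"))
updated = conn.execute("SELECT age FROM person WHERE name = ?", ("Ada",)).fetchone()
# Delete
conn.execute("DELETE FROM person WHERE name = ?", ("Ada",))
remaining = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

Room's advantage is that it verifies queries like these against your schema at compile time instead of failing at runtime.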
Which of the two approaches below is recommended, given that my server API expects a C# string? Which one will result in the smallest string?
1) Protobuf-net
Use protobuf-net to convert object <-> byte array
Use Convert.ToBase64String / Convert.FromBase64String to convert byte array <-> string
2) Use Json.NET directly to convert object <-> string
We have protobuf-net working in our project with byte[] server APIs. Now our server is migrating to string APIs instead of byte[]. We are not sure whether we should move to Json.NET or stay with protobuf-net and add the extra Base64 string <-> byte[] conversion.
What do you suggest?
Okay, so this is my thought process which I'm hoping can help you decide between the two:
Before deciding which one is better, we need a better grasp of the context of the problem. Optimization always has to be done against well-defined "fitness" parameters.
What I mean by this is:
If you're most constrained by CPU usage, I would test to see which code uses more CPU to execute.
If bandwidth is an issue, you'd want to look at the method that sends the smallest packets. (In which case base64 of binary serialization should be the answer.)
If code readability is a factor, you should probably look at which code is easier to read / understand while taking less text to write. (In which case I suspect that the JSON route will have better readability)
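On the bandwidth point: Base64 inflates a binary payload by about a third (every 3 bytes become 4 characters), but a compact binary encoding usually still beats JSON. A rough illustration in Python, with struct standing in for protobuf-net and a made-up record:

```python
import base64, json, struct

record = {"id": 123456, "score": 0.875}

# Text encoding: field names and punctuation travel with every record.
as_json = json.dumps(record).encode("utf-8")

# Binary encoding: a 4-byte int plus an 8-byte double, 12 bytes total.
as_binary = struct.pack("<id", record["id"], record["score"])

# Base64 adds the ~33% overhead needed to ship bytes inside a string.
as_b64 = base64.b64encode(as_binary)

print(len(as_json), len(as_binary), len(as_b64))  # 30 12 16
```

Here the Base64-of-binary string (16 chars) is still roughly half the JSON (30 chars); the gap widens with repeated field names and larger payloads.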
In general, I would caution against over-optimization, mainly because you might spend more time thinking and comparing than would ever be lost to your "unoptimized" code :)
That is to say: only optimize when you can clearly define your bottleneck.
Hope this helped :)
How may I create and read a packet in VB.NET?
I want to create an application that sends an object of some sort, then have the client deserialize that object, and perhaps establish two-way communication where the client sends a piece of info and the server replies with an appropriate object.
Check out protobuf-net. Fast, small, robust, reasonably easy (sparse docs), and free. Lots of info here on SO and at this link. It will serialize something to a file or memory stream in fewer than 10 lines of code (plus some class/property attributes) and output something much, MUCH smaller than the .NET binary serializer. The basic code is simple:
Try
    ' Using guarantees the stream is closed and disposed even if Serialize throws
    Using fs As New FileStream(mUserFile, FileMode.Create, FileAccess.Write)
        Serializer.Serialize(fs, _Profiles)
    End Using
Catch ex As Exception
    MessageBox.Show("PBN Error: " & ex.Message, MsgTitle, MessageBoxButtons.OK, _
                    MessageBoxIcon.Exclamation)
End Try
In this case, a collection of 5 or 6 List(Of T) items was serialized (i.e. nested), but it could just as easily have been a class. Loading/deserializing is just as easy.
There might be a way around it which I never found, but when I tried something like what you describe, the .NET binary serializer would only deserialize into the same assembly-class-culture type which created it. That is good for making the output proprietary to your project, but very bad for data exchange. The output was also gigantic (serializing an empty Dictionary in .NET results in about 3000 bytes, while PBN needed 300). The ONLY place the .NET serializer is a little better suited is when the assembly is obfuscated; MS knows how to get the data and is not sharing with the rest of the class. Even then, it only adds a few steps to the process.
PBN works with all the collection types like List(Of T), Dictionary, etc., but won't natively handle types like Rectangle, Point, and Size. It is not hard to write a converter to feed it something that will work (I wrote one for Bitmap yesterday).
The biggest downside for VB developers is that all the docs, examples, and discussion are from/for C#. That not only makes some VB people's eyes glaze over, it makes protobuf-net look like a C#-specific solution. Likewise, the terminology (wire types, packets, etc.) makes it sound like a pure network data-exchange solution. In reality, it works just as well from VB in a variety of situations.
The XHR2 differences page states:
The ability to transfer ArrayBuffer, Blob, File and FormData objects.
What are the differences between ArrayBuffer and Blob?
Why should I care about being able to send them over XHR2? (I can understand the value of File and FormData.)
This is an effort to replace the old method, which would take a "string" and cut sections out of it.
You would use an ArrayBuffer when you need a typed array because you intend to work with the data, and a Blob when you just need the data of the file.
Blobs (according to the spec, anyway) have space for a MIME type and are easier to hand to the HTML5 File API than other formats (they're more native to it).
An ArrayBuffer lets us work with typed arrays, which is much faster than string manipulation for working with specific bytes, and lets us define what type the array segments actually are. Since JavaScript is not strictly typed, it's hard to handle a file that might be broken into an array of 32-bit ints or perhaps 64-bit floats (just imagine 8-bit ints: that would be a nightmare in terms of performance with string manipulation and bitwise calculations, especially with Unicode).
As far as I can tell you can always convert a Blob to an ArrayBuffer or to a string representation, but these types being native to XHR allows scripts to be faster, which is the main advantage.
I'd use a Blob for working with the File API, but I'd use an ArrayBuffer for performing computation on the data.
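The typed-view idea is what makes this fast: the same raw bytes are reinterpreted in place rather than parsed out of a string. A rough analogue in Python, using memoryview.cast as a stand-in for an Int32Array view over an ArrayBuffer:

```python
import struct

raw = struct.pack("4i", 10, 20, 30, 40)   # 16 raw bytes, native byte order
ints = memoryview(raw).cast("i")          # typed view: four 32-bit int slots, no copy
print(ints[2], list(ints))                # 30 [10, 20, 30, 40]
```

Indexing `ints[2]` reads 4 bytes directly at offset 8, with no string slicing or number parsing in between, which is exactly the advantage typed arrays give JavaScript.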
VB.NET, .NET 4
Hello all,
I have a List(Of Byte) that is filled with bytes from the serial buffer in a SerialPort.DataReceived event handler. I then try to parse the data. Part of the parsing process involves deleting elements from the List(Of Byte). Should I be concerned about the List being modified by a DataReceived event that might be raised during the parsing process? I realize that probably depends on what I'm trying to do, but, assuming I should be concerned (e.g., the parsing process needs List.Count to not change until parsing is finished), how should I go about making sure any Add calls wait until the parser is done? I guess the answer is something like SyncLock, but I've never really understood how SyncLock works. Any basic help would be appreciated!
Thanks in advance,
Brian
Well, it's not the greatest use of CPU cycles: removing bytes from a List(Of Byte) is an O(n) operation, making the overall processing step O(n^2). It is still quite difficult to put any real pressure on the CPU doing so; serial ports are glacially slow. You should only ever modify working code once you have measured it to be a perf problem.
If you're not there yet, then consider creating a new array or List from the old one instead of removing elements. That's O(n), the extra storage cannot hurt considering the slow data rates, and the code should be cleaner too.
As far as threading goes, be sure to do this work in the DataReceived handler itself. That keeps it thread-safe and avoids putting undue pressure on the UI thread in case you Invoke to it.
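For the SyncLock part of the question, the usual shape is: both the receive handler and the parser take the same lock, and the parser snapshots and clears the buffer under the lock so the count cannot change mid-parse. A sketch of that pattern in Python, with threading.Lock playing the role of SyncLock (all names here are invented):

```python
import threading

buffer = []                 # shared between the receive thread and the parser
lock = threading.Lock()     # the SyncLock equivalent

def on_data_received(chunk):
    # Called from the port's worker thread; Add calls block while parse()
    # holds the lock, which is exactly the guarantee the question asks for.
    with lock:
        buffer.extend(chunk)

def parse():
    # Snapshot and clear under the lock, then work on the private copy
    # outside it, so the lock is held only briefly.
    with lock:
        pending = buffer[:]
        buffer.clear()
    return pending
```

Keeping the critical section down to the copy-and-clear step means the receive thread is never blocked for the duration of a full parse.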