A little while ago I started learning Kotlin and have covered the basics: variables, classes, lists, arrays, and so on. But the book I was learning from seemed to miss one important aspect: reading and writing to a file, perhaps with a function like "fwrite" in C++.
So I searched Google, and yes, reading and writing bytes was easy enough. However, being used to C++'s open personality, I wanted to make a "kind of" database.
In C++ I would simply make a struct and keep appending it to a file, then read the stored objects back one by one by placing "fread" in a for loop, or read them all into an array of the struct in one go, since the struct is just the bytes allocated to the variables inside it.
However, Kotlin has no struct; instead, we use a data class to group data. I was hoping there was an equally easy way to store data in a file in the form of a data class and read it back into, say, a List of that class, or, if that is not possible, some other way to store grouped data that would be easy to read and write.
The easiest way is to use a serialization library. Kotlin already provides something for that.
TL;DR:
Add KotlinX Serialization to your project, choose the serialization format you prefer (ProtoBuf or CBOR will fit; go for JSON if you prefer something more human-readable, although bigger in size), use the matching serializer to generate your ByteArray, and write it to a file using Kotlin's file methods.
Generating the ByteArray might be tricky; I'm not sure, as I'm writing this from memory. What I can say for sure is that if you choose JSON you can get the string representation and write it to a file. So I'm assuming the same holds for the binary formats (writing bytes to the file instead of a string).
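The "serialize each record to JSON and write the string to a file" idea can be sketched as follows. This is Python, used only as a neutral illustration and not the KotlinX Serialization API; the `Person` record, the file name, and the helper names are all hypothetical. One JSON object is written per line so that records can be appended later, much like repeated "fwrite" calls.

```python
import json
import os
import tempfile
from dataclasses import dataclass, asdict


@dataclass
class Person:  # hypothetical record type, standing in for a Kotlin data class
    name: str
    age: int


def save_records(path, records):
    """Append one JSON object per line, so the file can keep growing."""
    with open(path, "a", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(asdict(record)) + "\n")


def load_records(path):
    """Read every stored record back into a list of Person objects."""
    with open(path, encoding="utf-8") as f:
        return [Person(**json.loads(line)) for line in f if line.strip()]


# Hypothetical usage: two separate appends, then one read of everything.
path = os.path.join(tempfile.mkdtemp(), "people.jsonl")
save_records(path, [Person("Ada", 36)])
save_records(path, [Person("Linus", 28)])
people = load_records(path)
```

A binary format would follow the same shape, with the file opened in binary mode and some length prefix or delimiter between records; that part is an assumption and is not shown here.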
What you need can be fulfilled by the Room database.
It is officially recommended by Google. It uses your Android application's internal database, which is built on SQLite.
You can read more about Room at
https://developer.android.com/jetpack/androidx/releases/room
It provides Data Access Objects (DAOs) and entity classes through which you can access database tables using SQL queries.
It also checks your queries for errors at compile time.
Note: You need basic SQL knowledge to build the queries for CRUD operations.
We have an app that exports very long CSV files, thousands to millions of lines. We use a StringBuilder and lots of .Appends to construct the file, and this runs out of memory around 500,000 lines. StringWriter is based on StringBuilder, and fails at the same point in my tests.
MemoryStream (or Tributary) should not have any problems dealing with a file this large, but its API is based on byte[]. I know I can convert, but this makes the code somewhat (lots?) more difficult to read.
Is there a simpler solution for writing multiple strings to a stream that I'm missing, or perhaps a clean way to implement such functionality (an extension method, perhaps)?
Just so there's some closure on this: Mark's answer was the one I was looking for. By building up the stream using StreamWriter it works. StringWriter fails. I still have some encoding issues, but those are minor in comparison.
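The difference that matters here is between accumulating the whole file in memory and writing each line to the stream as it is produced. A minimal sketch of the streaming approach, in Python for illustration only (it is not the .NET StreamWriter API; the row generator, column layout, and file name are hypothetical):

```python
import csv
import os
import tempfile


def generate_rows(n):
    """Hypothetical data source: yields rows one at a time, never a full list."""
    for i in range(n):
        yield (i, f"item-{i}", i * 0.5)


def export_rows(path, rows):
    """Write each row straight to the file; memory use stays roughly constant."""
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for row in rows:
            writer.writerow(row)
            count += 1
    return count


path = os.path.join(tempfile.mkdtemp(), "export.csv")
written = export_rows(path, generate_rows(100_000))

# Read back only the first line to check the format.
with open(path, encoding="utf-8") as f:
    first_line = f.readline().rstrip("\r\n")
```

Because rows are consumed from a generator and written immediately, the line count is limited by disk space and not by how large a single in-memory string can grow.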
Which of the two is the recommended approach, given that my server API expects a C# string? Which one will result in the shortest string?
1) Protobuf-net
   Use protobuf-net to convert object <-> byte array
   Use the Convert.ToBase64String / Convert.FromBase64String methods to convert byte array <-> string
2) Json.NET
   Use Json.NET directly to convert object <-> string
We have protobuf-net working in our project with byte[] server APIs. Now our server is migrating to string APIs instead of byte[]. We are not sure whether we should move to Json.NET, or stay with protobuf-net and add a Base64 conversion step between byte[] and string.
What do you suggest?
Okay, so this is my thought process which I'm hoping can help you decide between the two:
Before deciding which one is better, we need a better grasp of the context of the problem. Optimization always has to be done against well-defined "fitness" parameters.
What I mean by this is:
If you're most constrained by CPU usage, I would test to see which code uses more CPU to execute.
If bandwidth is an issue, you'd want to look at the method that sends the smallest packets. (In which case Base64 of a binary serialization is likely the answer, although Base64 adds roughly 33% to the binary size, so it is worth measuring on your own data.)
If code readability is a factor, you should probably look at which code is easier to read / understand while taking less text to write. (In which case I suspect that the JSON route will have better readability)
In general, I would caution against over-optimization. Mainly because you might spend more time thinking and comparing than would be lost by your "unoptimized" code :)
That is to say, only optimize when you can clearly define your bottleneck.
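To make the bandwidth comparison concrete, here is a small measurement sketch in Python, for illustration only. It does not use protobuf-net or Json.NET: `struct.pack` is a stand-in for "some compact binary encoding", and the record and its field layout are hypothetical. The one firm fact it relies on is that Base64 turns every 3 bytes into 4 characters.

```python
import base64
import json
import struct

record = {"id": 12345, "price": 19.99, "qty": 3}  # hypothetical payload

# Stand-in for a compact binary format (NOT protobuf): int32, float64, int32,
# little-endian with no padding, so 4 + 8 + 4 = 16 bytes.
binary = struct.pack("<idi", record["id"], record["price"], record["qty"])

# Route 1: binary serialization, then Base64 to get a string.
b64_text = base64.b64encode(binary).decode("ascii")

# Route 2: JSON directly, with whitespace stripped.
json_text = json.dumps(record, separators=(",", ":"))


def base64_length(n_bytes):
    """Length of padded Base64 output: 4 characters per 3 input bytes, rounded up."""
    return 4 * ((n_bytes + 2) // 3)


sizes = {"binary": len(binary), "base64": len(b64_text), "json": len(json_text)}
```

For this particular record the Base64 string comes out shorter than the JSON, but the gap depends heavily on the data: long text fields shrink the advantage, while many numeric fields widen it. Running the same measurement on real payloads is the reliable way to decide.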
Hope this helped :)
First of all, thanks in advance for your help.
I've decided to ask for help in forums like this one because, after several months of hard work, I couldn't find a solution to my problem.
It can be described as: "Why isn't an object created in VB.NET released by the GC when it is disposed, even when the GC is forced to run?"
Please consider the following piece of code. Obviously my project is much more complex, but I was able to isolate the problem:
Imports System.Data.Odbc
Imports System.Threading
Module Module1
    Sub Main()
        'Declarations-------------------------------------------------
        Dim connex As OdbcConnection 'Connection to the DB
        Dim db_Str As String 'ODBC connection String
        'Sentences----------------------------------------------------
        db_Str = "My ODBC connection String to my MySQL database"
        While True
            'Condition: Infinite loop.
            connex = New OdbcConnection(db_Str)
            connex.Open()
            connex.Close()
            'Release created objects
            connex.Dispose()
            'Force the GC to be launched
            GC.Collect()
            'Send the application to sleep half a second
            System.Threading.Thread.Sleep(500)
        End While
    End Sub
End Module
This simulates a multithreaded application making connections to a MySQL database. As you can see, the connection is created as a new object, then released. Finally, the GC is forced to run. I've seen this pattern in several forums and also in the MSDN online help, so as far as I can tell, I am not doing anything wrong.
The problem begins when the application is launched. The object created is disposed within the code, but after a while the available memory is exhausted and the application crashes.
Of course, this problem is hard to see in this little version, but in the real project the application runs out of memory very quickly (due to the number of connections made over time), and as a result the uptime is only two days. Then I need to restart the application.
I installed a memory profiler on my machine (SciTech .NET Memory Profiler 4.5, downloadable trial version here). There is a section called 'Investigate memory leaks'. I was absolutely astonished when I saw this on the 'Real Time' tab. If I am correct, this graphic is telling me that none of the objects created in the code have actually been released:
The surprise was even bigger when I saw this other screen. According to it, all undisposed objects are of System.Transactions types, which I assume are managed internally by the .NET libraries, as I am not creating any object of this type in my code. Does this mean there is a bug in the VB.NET standard libraries?
Please notice that in my code I am not executing any query. If I do, the OdbcDataReader object won't be released either, even if I call the .Close() method (surprisingly enough, the number of unreleased objects of this type is exactly the same as the number of unreleased System.Transactions objects).
Another important thing is the GC.Collect() statement. It is used by the memory profiler to refresh the information being displayed. If you remove it from the code, the profiler won't update the real-time diagram properly, giving you the false impression that everything is correct.
Finally, if you omit the connex.Open() statement, screenshot #1 will render a flat line (meaning all the objects created have been successfully released), but unfortunately we can't run any query against the database if the connection hasn't been opened.
Can someone find a logical explanation to this and also, a workaround for effectively releasing the objects?
Thank you all folks.
Nico
Dispose has nothing to do with garbage collection. Garbage collection is exclusively about managed resources (memory). Dispose has no bearing on memory at all, and is only relevant for unmanaged resources (database connections, file handles, GDI resources, sockets... anything that is not memory). The only relationship between the two has to do with how an object is finalized: many objects are implemented such that disposing them suppresses finalization and finalizing them calls .Dispose(). Explicitly calling Dispose() on an object will never cause it to be collected [1].
Explicitly calling the garbage collector is almost always a bad idea. .NET uses a generational garbage collector, so the main effect of calling it yourself is that you'll hold onto memory longer: by forcing the collection earlier you're likely to check the items before they are eligible for collection at all, which promotes them into a higher generation that is collected less often. These items would otherwise have stayed in the lower generation and been eligible for collection when the GC next ran on its own. You may need to use GC.Collect() now for the profiler, but you should try to remove it from your production code.
You mention your app runs for two days before crashing, and are not profiling (or showing results for) your actual production code, so I also think the profiler is in part misleading you here. You've pared down the code to something that produced a memory leak, but I'm not sure it's the memory leak you are seeing in production. This is partly because of the difference in time to reproduce the error, but it's also "instinct". I mention that because some of what I'm going to suggest might not make sense immediately in light of your profiler results. That out of the way, I don't know for sure what is going on with your lost memory, but I can make a few guesses.
The first guess is that your real code has a try/catch block. An exception is thrown... perhaps not on every connection, but sometimes. When that happens, the catch block allows your program to keep running, but you skip over the connex.Dispose() line and therefore leave open connections hanging around. These connections will eventually create a denial-of-service situation for the database, which can manifest itself in a number of ways. The correction is to make sure you always use a finally block for anything you .Dispose(). This is true whether or not you currently have a try/catch block, and it's important enough that I would say the code you've posted so far is fundamentally wrong: you need a try/finally. There is a shortcut for this, via a Using block.
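The failure mode described above, and the fix, can be shown with a small simulation. This is Python for illustration only, not VB.NET: `FakeConnection` is a hypothetical stand-in that merely counts how many connections are open, and Python's `with` statement plays a role similar to a Using block in that cleanup runs even when an exception is thrown.

```python
class FakeConnection:
    """Hypothetical stand-in for a database connection; counts open instances."""
    open_count = 0

    def __init__(self):
        FakeConnection.open_count += 1

    def close(self):
        FakeConnection.open_count -= 1

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()   # always runs, like a finally block
        return False   # do not swallow the exception here


def query_leaky(fail):
    """Cleanup sits on the normal path only, so an exception skips it."""
    conn = FakeConnection()
    try:
        if fail:
            raise RuntimeError("query failed")
        conn.close()
    except RuntimeError:
        pass  # program keeps running, but conn was never closed


def query_safe(fail):
    """Cleanup is guaranteed by the with block, whatever happens inside."""
    try:
        with FakeConnection():
            if fail:
                raise RuntimeError("query failed")
    except RuntimeError:
        pass


for _ in range(3):
    query_leaky(fail=True)
leaked = FakeConnection.open_count      # three connections left open

for _ in range(3):
    query_safe(fail=True)
after_safe = FakeConnection.open_count  # unchanged: the safe version leaks nothing
```

The leak only appears on the failing path, which is why it can take days to surface in a long-running process.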
The next guess is that some of your real commands end up fairly large, possibly with large strings or image (byte[]) data involved. In this case, items end up on a special garbage collector generation called the Large Object Heap (LOH). The LOH is rarely collected, and almost never compacted. Think of compaction as analogous to what happens when you defrag a hard drive. If you have items going to the LOH, you can end up in a situation where the physical memory itself is freed (collected), but the address space within your process (a 32-bit process is normally limited to 2GB) is not freed (compacted). You have holes in your memory address space that will not be reclaimed. The physical RAM is available to your system for other processes, but over time this still results in the same kind of OutOfMemory exception you're seeing. Most of the time this doesn't matter: most .NET programs are short-lived user-facing apps, or ASP.NET apps where the entire thread can be torn down after a page is served. Since you're building something like a service that should run for days, you have to be more careful. The fix may involve significantly reworking some code to avoid creating the large objects at all. That may mean reusing a single byte array or a small set of them over and over, or using streaming techniques instead of string concatenation or string builders for very large SQL queries or SQL query data. It may also mean you find this easier to do as a scheduled task that runs daily and shuts itself down at the end of the day, or as a program that is invoked on demand.
A final guess is that something you are doing results in your connection objects still being in some way reachable by your program. Event handlers are a common source of mistakes of this sort, though I would find it strange to have event handlers on your connections, especially as this is not part of your example.
[1] I suppose I could contrive a scenario that would make this happen. A simple way would be to build an object that assumes a global collection for all objects of that type... the objects add themselves to the collection at construction and remove themselves at disposal. In this way, the object could not be collected before disposal, because before that point it would still be reachable... but that would be a very flawed program design.
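The contrived scenario in this footnote can be sketched as below. It is written in Python for illustration and relies on Python's collector, not .NET's, so it only demonstrates the reachability argument; the `Tracked` class and the `registry` set are hypothetical.

```python
import gc
import weakref

registry = set()  # global collection that keeps every live instance reachable


class Tracked:
    """Hypothetical type that registers itself on construction."""

    def __init__(self):
        registry.add(self)

    def dispose(self):
        registry.discard(self)  # the only thing that makes the object unreachable


obj = Tracked()
ref = weakref.ref(obj)  # lets us observe the object without keeping it alive

# Drop our own reference and force a collection: the registry still holds it.
del obj
gc.collect()
alive_before_dispose = ref() is not None

# Dispose through the weak reference, then collect again.
ref().dispose()
gc.collect()
alive_after_dispose = ref() is not None
```

Here disposal appears to "cause" collection, but only because disposal removes the last reference. The collector itself still acts purely on reachability.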
Thank you all guys for your very helpful answers.
Joel, you're right. This code produces 'a leak' which is not necessarily the same as 'the leak' I have in my real project, though they show the same symptoms: with the code above, the number of unreleased objects keeps growing (and will eventually exhaust the memory). So I wonder what's wrong with it, as everything seems to be properly coded. I don't understand why the objects are not disposed/collected. But according to the profiler, they are still in memory and will eventually prevent new objects from being created.
One of your guesses about my 'real' project hit the nail on the head. I realized that my 'catch' blocks didn't dispose of the objects, and this has now been fixed. Thanks for your valuable suggestion. However, I added the 'using' clause to the example code above and it didn't actually fix the problem.
Hans, you are also right. After posting the question, I've changed the libraries on the code above to make connections to MySQL.
The old libraries (in the example):
System.Data.Odbc
The new libraries:
System.Data
Microsoft.Data.Odbc
With the new ones, the profiler rendered a flat line without any further changes to the code, which is what I had been looking for. So my conclusion is the same as yours: there may be some internal error in the old ones that causes this, which makes them a real 'troublemaker'.
Now I remember that I originally used the new ones in my project (System.Data and Microsoft.Data.Odbc), but I soon switched to the old ones (System.Data.Odbc) because the new ones don't allow Multiple Active Result Sets (MARS). My application makes a huge number of queries against the MySQL database, but unfortunately the number of connections is limited. So I initially implemented my real code in such a way that it made only a few connections, which were shared across the code (passing the connection between functions as a parameter). This was great because, for example, I needed to retrieve a recordset (let's say clients) and make a lot of checks at the same time (the client has at least one invoice, the client has a duplicated email address, etc., which involves a lot of side queries). With the 'old' libraries, the same connection could create multiple commands and execute different queries.
The 'new' libraries don't allow MARS. I can only create one command (that is, execute one query) per session/connection. If I need to execute another one, I have to close the previous recordset (which isn't actually possible, as I am iterating over it) and then make the new query.
I had to find a balance between both problems. So I ended up using the 'new' libraries because of the memory problems, and I recoded my application not to share connections (so each procedure creates a new one when needed), as well as reducing the number of connections the application can make at the same time so as not to exhaust the connection pool.
The solution is far from ideal, as it introduces spurious logic into the application (the ideal scenario would be to migrate to SQL Server), but it is giving me better results and the application is more stable, at least in the early stages of the new version.
Thanks again for your suggestions. I hope you will find mine useful too.
Cheers.
Nico
Good morning all.
I'm relatively new to the Visual Basic realm (although I'm a traditional web-based script developer), and I've come to ask you a question. I am reading data from an XML file. This local XML file will be updated by another application, and I will need to periodically re-evaluate it and import only new data into a list box. Furthermore, I want to be able to click on a particular item in the list box and display the other values for that particular XML entry.
So, I suppose this is a multi part question. What is the proper way to import only NEW data into the program, what is the proper way to store the data, and how do I associate a value in a listbox with the data stored elsewhere?
I've considered multidimensional arrays, but have been told that strings to char arrays and then back to strings is a terrible way to manage the data, but was never offered an alternative.
I will be satisfied with a list of topics to study up on and/or an example for an answer to this question.
I would probably use classes that implement INotifyPropertyChanged and a BindingList. Then you just need to listen to ListChanged events off of the list and update the list box then.
I have a blog post that discusses binding classes and interfaces if you want to learn more about them: Data Binding Classes, Interfaces, and Attributes in Windows Forms 2.0. It might be a little dated by now, I haven't reviewed it since I wrote it in March, 2007.
As a start look at the XmlDocument and XmlReader classes.
http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.aspx
http://msdn.microsoft.com/en-us/library/system.xml.xmlreader.aspx
XmlDocument loads a document into memory and lets you look at it in any way you like; depending on the size of the file, loading it may take a noticeable amount of time.
XmlReader allows access on the fly, and gives you access very much like a DataReader. I.e. keeping track of your position in the dataset and not retaining any data once you have inspected it.
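The two access styles, plus the "import only new entries" requirement from the question, can be sketched like this. It is Python for illustration only, using the standard library's ElementTree in place of XmlDocument and XmlReader; the `entry` element, its `id` attribute, and the `title` child are hypothetical, since the question does not show the real schema.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document layout: each entry carries a unique id attribute.
XML = (
    b"<entries>"
    b"<entry id='1'><title>First</title></entry>"
    b"<entry id='2'><title>Second</title></entry>"
    b"</entries>"
)


def load_all(data):
    """Whole-document style: parse everything into memory, then query freely."""
    root = ET.fromstring(data)
    return {e.get("id"): e.findtext("title") for e in root.iter("entry")}


def stream_new(data, seen_ids):
    """Forward-only style: visit entries one at a time, keep only unseen ids."""
    new_entries = {}
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "entry":
            entry_id = elem.get("id")
            if entry_id not in seen_ids:
                new_entries[entry_id] = elem.findtext("title")
                seen_ids.add(entry_id)
            elem.clear()  # release the entry's children once inspected
    return new_entries


all_entries = load_all(XML)
seen = {"1"}                       # ids already shown in the list box
fresh = stream_new(XML, seen)      # only entry 2 is new
again = stream_new(XML, seen)      # nothing new on a second pass
```

Keeping the imported entries in a dictionary keyed by id also answers the list box part of the question: the list shows the key or title, and the selected key is used to look up the remaining values.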
For keeping track of updates, it depends on where the XML is stored.
If it is in a file, a FileSystemWatcher may help in determining when you need to update.
http://msdn.microsoft.com/en-us/library/system.io.filesystemwatcher.aspx