Potential memory leaks with SignalR and Newtonsoft serialization - asp.net-core

We are using SignalR to synchronize a large (about 200MB) object (serialized using Newtonsoft) from one application to another. But every time the object is serialized and sent through SignalR, the memory footprint of both applications increases drastically (by about 200MB each time, what a surprise).
From our analysis, it seems that the memory growth is caused by something within SignalR.
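For reference, the send path looks roughly like this (a minimal sketch, assuming Newtonsoft JSON is the hub protocol, either as the default or via AddNewtonsoftJsonProtocol; SyncHub, PushState and AppState are placeholder names, not our real types):

using System.Threading.Tasks;
using Microsoft.AspNetCore.SignalR;

// Placeholder for the ~200MB object graph that gets synchronized.
public class AppState
{
    public byte[] Payload { get; set; }
}

public class SyncHub : Hub
{
    // The whole object is serialized and pushed to the other
    // application in a single hub invocation.
    public Task PushState(AppState state)
        => Clients.Others.SendAsync("ReceiveState", state);
}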
Any ideas?
Thanks

Related

Akka Streams application using more memory than the JVM's heap

Summary:
I have a Java application that uses Akka Streams and is using more memory than I have allowed the JVM to use. The values below are what I have set through JAVA_OPTS.
maximum heap size (-Xmx) = 700MB
metaspace (-XX) = 250MB
stack size (-Xss) = 1025kb
Using those values and plugging them into the formula below, one would expect the application to use around 950MB. However, that is not the case: it is using over 1.5GB.
Max memory = [-Xmx] + [-XX:MetaspaceSize] + number_of_threads * [-Xss]
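For illustration, plugging in the values above with, say, 50 threads (the thread count here is just an assumed figure for the example):
Max memory = 700MB + 250MB + 50 * 1MB ≈ 1000MB
which is still well short of the 1.5GB+ the process actually uses.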
Question: Thoughts on how this is possible?
Application overview:
This Java application uses Alpakka to connect to Pub/Sub and consume messages. It uses Akka Streams' parallelism to perform logic on the consumed messages and then produces them to a Kafka instance. See the heap dump below. Note that the heap is only 912.9MB, so something else is taking up 587.1MB and pushing the memory usage over 1.5GB.
Why is this a problem?
This application is deployed on a Kubernetes cluster and the pod has a memory limit of 1.5GB. So when the container where the Java application is running consumes more than 1.5GB, the container is killed and restarted.
The short answer is that those settings do not account for all the memory consumed by the JVM.
Outside of the heap, for instance, memory is allocated for:
compressed class space (governed by the MaxMetaspaceSize)
direct byte buffers (especially if your application performs network I/O and cares about performance, it's virtually certain to make somewhat heavy use of those)
threads (each thread has a stack governed by -Xss ... note that if mixing different concurrency models, each model will tend to allocate its own threads and not necessarily provide a means to share threads)
if native code is involved (e.g. perhaps in the library Alpakka is using to interact with Pub/Sub?), it can allocate arbitrary amounts of memory outside of the heap
the code cache (typically 48MB)
the garbage collector's state (will vary based on the GC in use, including the presence of any tunable options)
various other things that generally aren't going to be that large
In my experience you're generally fairly safe with a heap that's at most (pod memory limit minus 1 GB), but if you're performing exceptionally large I/Os etc. you can pretty easily get OOM even then.
Your JVM may ship with support for native memory tracking (on OpenJDK, start the JVM with -XX:NativeMemoryTracking=summary and inspect it with jcmd <pid> VM.native_memory summary), which can shed light on at least some of that non-heap consumption. Most of these allocations tend to happen soon after the application is fully loaded, so running with a much higher resource limit and then stopping the application (e.g. via SIGTERM, with enough time to allow it to save results) should give you an idea of what you're dealing with.

Lucene Index converting gen0 memory to unmanaged resource

I am using Lucene.NET v4.8 beta, and I have noticed that my service consumes more and more memory whenever I add large amounts of data to the index.
I analyzed this with dotMemory, as seen below. The memory spike at minute 4 is when I added a batch of data to the index. I understand the increase in gen0 memory, because I am creating IndexWriter objects and performing write/commit operations. However, it seems as if, once the GC runs, the gen0 objects it collects become part of unmanaged memory instead (for example, at minute 5:30 in the screenshot, gen0 memory was collected, but total memory remained the same: unmanaged memory grew by the same amount that gen0 shrank).
I have already implemented the Dispose pattern properly for IndexWriter and IndexReader, but the issue remains. The only 'fix' I found was forcing a collection manually via GC.Collect() in the code, but even that doesn't remove all the extra unmanaged memory.
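For context, the indexing path is roughly this shape (a simplified sketch; the method name, field name and analyzer are placeholders, not my actual code):

using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class Indexer
{
    public static void AddBatch(string indexPath, IEnumerable<string> texts)
    {
        using (var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48))
        using (var dir = FSDirectory.Open(indexPath))
        {
            var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer);
            using (var writer = new IndexWriter(dir, config))
            {
                foreach (var text in texts)
                {
                    var doc = new Document();
                    doc.Add(new TextField("body", text, Field.Store.YES));
                    writer.AddDocument(doc);
                }

                // The commit after a large batch is where the memory growth shows up.
                writer.Commit();
            }   // writer, directory and analyzer are all disposed here
        }
    }
}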
My questions are: what is causing Lucene.NET to behave like this (turning gen0 objects into unmanaged resources), and is there a way to fix it without calling GC.Collect() every so often?

How to tackle huge memory allocation in an ASP.NET Core 2.0 application using EF Core?

I stumbled across a very strange issue. When the web application starts, dotnet.exe has a modest memory usage (about 300MB). However, when it hits certain parts of the code (I suspect it is related to EF Core usage), it allocates a huge amount of memory in a short time (about 8GB in 2-3 seconds).
This spike lasts about 10-15 seconds; after that the memory settles at about 600MB and the application operates normally.
I tried both dotTrace and the built-in Diagnostic Tools to understand what allocates so much memory, but cannot find anything meaningful:
dotTrace shows the most memory-consuming thread, but I could not capture a snapshot while the memory was at its peak (it only shows about 1GB in total and about 800MB of managed memory).
VS Diagnostic Tools shows the delta between the baseline and immediately after the memory spike.
How can I get to the root cause of this memory allocation? Strangely, it does not seem to be a leak, since the memory is eventually released.
I think the issue is indeed related to the number of injected services, but first let me provide more detail about the application architecture. I rely on a set of generic repositories that are injected into a scoped data access class, which wraps the data context and helps with saving data for several repositories in a single transaction when needed:
Repository<T> : IRepository<T>
    <- DbContext
ScopedDataAccess : IScopedDataAccess
    <- DbContext
    <- logging service
    <- dozens of IRepository<T>
Everything is "scoped":
services.AddScoped<IScopedDataAccess, ScopedDataAccess>();
services.AddScoped(typeof(IRepository<>), typeof(Repository<>));
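Roughly, the types involved have this shape (a trimmed-down sketch; the member names, MyDbContext and the Order entity are illustrative, not the actual implementation):

using System.Linq;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging;

public class Order { public int Id { get; set; } }

public class MyDbContext : DbContext
{
    public MyDbContext(DbContextOptions<MyDbContext> options) : base(options) { }
    public DbSet<Order> Orders { get; set; }
}

public interface IRepository<T> where T : class
{
    IQueryable<T> Query();
    void Add(T entity);
}

public class Repository<T> : IRepository<T> where T : class
{
    private readonly MyDbContext _context;

    public Repository(MyDbContext context) => _context = context;

    public IQueryable<T> Query() => _context.Set<T>();
    public void Add(T entity) => _context.Set<T>().Add(entity);
}

public interface IScopedDataAccess
{
    IRepository<Order> Orders { get; }   // ...dozens of these in the real class
    int SaveChanges();
}

public class ScopedDataAccess : IScopedDataAccess
{
    private readonly MyDbContext _context;
    private readonly ILogger<ScopedDataAccess> _logger;

    public ScopedDataAccess(
        MyDbContext context,
        ILogger<ScopedDataAccess> logger,
        IRepository<Order> orders /*, ...dozens more repositories */)
    {
        _context = context;
        _logger = logger;
        Orders = orders;
    }

    public IRepository<Order> Orders { get; }

    public int SaveChanges()
    {
        _logger.LogDebug("Saving changes for the current scope");
        return _context.SaveChanges();
    }
}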
I removed about half of the injected repositories in ScopedDataAccess, and the required memory dropped to about half.
Even stranger, Diagnostic Tools shows a decrease in memory that is not directly tied to a GC kicking in (see the following graph; GC events are the yellow marks at the top):
Also, I double-checked that I have stopped all background jobs (e.g. Quartz).
Not a full answer, but I have done the following and greatly reduced the memory (and CPU) usage:
simplified the dependency graph by splitting up large services that required many injected services
upgraded to ASP.NET Core 2.1
The last step had the most visible effect, and Diagnostic Tools now shows a much friendlier graph:

boost::serialization high memory consumption during serialization

Just as the title suggests, I've come across an issue with boost::serialization when serializing a huge amount of data to a file. The problem is that the serialization part of the application has a memory footprint of around 3 to 3.5 times the size of the objects being serialized.
It is important to note that my data structure is a three-dimensional vector of base class pointers, and I hold a pointer to that structure, like this:
using namespace std;
vector<vector<vector<MyBase*> > >* data;
This is later serialised with code analogous to this:
ar & BOOST_SERIALIZATION_NVP(data);
boost/serialization/vector.hpp is included.
Classes being serialised all inherit from "MyBase".
Since the start of my project I've used different archives for serialization, from the typical binary archive to text, XML, and finally the polymorphic binary/XML/text archives. Every single one of them behaves exactly the same way.
Typically this wouldn't be a problem if I had to serialize small amounts of data, but the number of objects I have is in the millions (ideally around 10 million), and my tests consistently show that the memory allocated by the boost::serialization part of the code is around 2/3 of the application's whole memory footprint while writing the file.
This amounts to around 13.5GB of RAM taken for 4 million objects, where the objects themselves take 4.2GB. This is as far as I've been able to push my code, since I don't have access to a machine with more than 8GB of physical RAM. I should also note that this is a 64-bit application running on Windows 7 Professional x64, but the situation is similar on an Ubuntu box.
Does anyone have any idea how I would go about troubleshooting this? It is unacceptable for me to have such high memory requirements for an application that uses far less memory while running than it does while serializing.
Deserialization isn't as bad, as it allocates around 1.5 times the needed memory. This is something I could live with.
I tried turning tracking off with boost::archive::archive_flags::no_tracking, but it behaves exactly the same.
Anyone have any idea what I should do?
Using valgrind, I found that the main cause of the memory consumption is a map the library keeps internally to track pointers. If you are certain that you do not need pointer tracking (i.e. you are sure there is no pointer aliasing), disable tracking. You can find the main concepts of disabling tracking here. In short, you must do something like this:
BOOST_CLASS_TRACKING(vector<vector<vector<MyBase*> > >, boost::serialization::track_never)
In my question I wrote a version of this macro that lets you disable tracking for a template class. This should have a significant impact on your memory consumption.
Also note that if there are pointers inside any of the containers and you want track_never, you must disable tracking for them too. So far I have not found a way to do this properly.

Recover memory from w3wp.exe

Is it possible to recover memory lost from w3wp.exe? I thought a Session.Abandon() should clear up resources like that? The thing is, I have a web application and certain pages make w3wp.exe grow significantly, like from 40MB to 400MB. I am definitely going to optimize my code to reduce this, but whatever amount w3wp.exe grows by, is there no way to recover that memory, even when the user has logged out and closed the browser?
I know this worker process will recycle after 30 minutes of idle time (the default), but what if there is no idle period for a long time and the worker process still holds that portion of memory and just keeps growing? Any thoughts, people?
The garbage collector will take care of whatever memory needs to be freed, provided that you dispose things correctly, etc. The GC doesn't immediately kick in every time you call Session.Abandon(), as that would be a major performance hit.
That said, every application has a "normal" memory usage, i.e. a stable memory usage (again, provided you don't have leaks), and this figure is different for every application. 400MB can be a lot or it can be nothing, depending on what your app does. I have apps that hover around 400MB and others around 1.5GB and that's OK as long as memory usage stabilizes somewhere. If you see unbounded memory usage then you most likely have a leak somewhere in your app.
Storing large amounts of data in the in-proc session can also quickly rack up the memory usage. Instead, use a file or a database to store this data.
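As a minimal sketch of that suggestion (the ReportStore name, folder path and byte[] payload are illustrative assumptions, not a prescribed API), keep only a small key in the in-proc session and persist the heavy payload outside the worker process:

using System;
using System.IO;

public static class ReportStore
{
    // Assumed location; a database table works just as well.
    private static readonly string Folder = @"C:\AppData\Reports";

    public static Guid Save(byte[] payload)
    {
        var id = Guid.NewGuid();
        Directory.CreateDirectory(Folder);
        File.WriteAllBytes(Path.Combine(Folder, id + ".bin"), payload);
        return id;
    }

    public static byte[] Load(Guid id) =>
        File.ReadAllBytes(Path.Combine(Folder, id + ".bin"));
}

// In a page or handler, store only the key in session:
//   Session["reportId"] = ReportStore.Save(largePayload);
//   var payload = ReportStore.Load((Guid)Session["reportId"]);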
Unless you are leaking memory, the memory manager will re-use it, so you should not see the process memory keep growing.