If my heap dump Dominator Tree view looks like below, can I assume the major contributor (~1GB) to my heap is the Thread instance created by WebLogic, and that a reference to a ByteArrayOutputStream in that Thread is the reason for the 1GB?
Follow-up question: Can I also assume it is due to object serialization, maybe triggered by Terracotta Ehcache (from the third line; not because it is next to the ByteArrayOutputStream, but because in our code that is the only place serialization can happen)?
Ehcache does indeed rely on Java Serialization the moment your cache goes beyond heap.
However, it will only ever serialize what you put in the cache. So is it possible that some cached mappings have a huge value, or even a huge key?
The Dominator Tree is saying that you have a WebLogic thread holding a ByteArrayOutputStream (and a SerializerObjectOutputStream). The WebLogic thread is a classic worker thread currently processing a request, and it is currently stuck on something.
So, this is the equivalent of
ByteArrayOutputStream out = new ByteArrayOutputStream();
synchronized (lock) {
    lock.wait(); // the thread parks here, still holding a reference to "out"
}
The thread is holding a ByteArrayOutputStream, which can't be garbage collected since the thread isn't done with it.
Seeing the serializer makes me think that you are currently deserializing from the Ehcache disk or off-heap tier.
Is it possible that you are putting pretty huge objects in your cache, as #louis-jacomet already mentioned?
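To make this concrete, here is a minimal sketch against the Ehcache 2.x API (the cache name, key, and value size are made up for illustration). A single oversized value forces the serializer to stream the whole object graph into an in-memory buffer when the entry moves beyond heap:

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class HugeValueDemo {
    public static void main(String[] args) {
        CacheManager manager = CacheManager.newInstance(); // reads ehcache.xml from the classpath
        Cache cache = manager.getCache("submissions");     // hypothetical cache with a disk or off-heap tier
        byte[] huge = new byte[100 * 1024 * 1024];         // one 100 MB value
        // When this entry overflows to disk or off-heap, Ehcache serializes the
        // entire array, which is where a giant ByteArrayOutputStream shows up.
        cache.put(new Element("report-2023", huge));
        manager.shutdown();
    }
}

So if the heap dump shows a ~1GB buffer, it is worth checking whether any single cached key or value (or the object graph reachable from it) is in that size range.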
I am creating several ByteBuddy classes (using DynamicTypeBuilder) and loading them. The creation of these classes and the loading of them happens on a single thread (the main thread; I do not spawn any threads myself nor do I submit anything to an ExecutorService) in a relatively simple sequence.
I have noticed that running this in a unit test several times in a row yields different results. Sometimes the classes are created and loaded fine. Other times I get errors from the generated bytecode when it is subsequently used (often in the general area of where I am using withArgumentArrayElements, if it matters; ArrayIndexOutOfBoundsExceptions and the like); and yet other times this all works fine (with the same inputs).
This feels like a race condition, but as I said I'm not spawning any threads. Since I am not using threads, only ByteBuddy (or the JDK) could be. I am not sure where that would be. Is there a ByteBuddy synchronization mechanism I should be using when creating and loading classes with DynamicTypeBuilder.make() and getLoaded()? Maybe some kind of class resolution is happening (or not happening!) on a background thread or something at make() time, and I am accidentally somehow preventing it from completing? Maybe if I'm going to use these classes immediately (I am) I need to supply a different TypeResolutionStrategy? I am baffled, as should be clear, and cannot figure out why a single-threaded program with the same inputs should produce generated classes that behave differently from run to run.
My pattern for loading these classes is:
1. Try to load the (normally non-existent) class using Class#forName(name, true, Thread.currentThread().getContextClassLoader()).
2. If (when) that fails, create the ByteBuddy-generated class and load it using the usual ByteBuddy recipes.
3. If that fails, it would be only because some other thread might have created the class already. In this unit test, there is no other thread. In any case, if a failure were to occur here, I repeat step 1 and then throw an exception if the load fails.
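In code, the pattern is roughly the following (a hedged sketch; the placeholder class definition and the exception handling in step 3 are illustrative, not my actual generation logic):

import net.bytebuddy.ByteBuddy;

public class LoadOrGenerate {
    public static Class<?> loadOrGenerate(String name) throws ClassNotFoundException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        try {
            return Class.forName(name, true, loader);      // step 1: does the class already exist?
        } catch (ClassNotFoundException expected) {
            try {
                return new ByteBuddy()                     // step 2: generate and load it
                        .subclass(Object.class)            // placeholder definition
                        .name(name)
                        .make()
                        .load(loader)
                        .getLoaded();
            } catch (RuntimeException alreadyDefined) {    // step 3: another thread beat us to it?
                return Class.forName(name, true, loader);  // retry step 1; throws if still absent
            }
        }
    }
}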
Are there any ByteBuddy-specific steps I should be taking in addition or instead of these?
Phew! I think I can chalk this up to a bug in my code (thank goodness). Briefly, what looked like concurrency issues was (most likely) an issue with accidentally shared class names and HashMap iteration order: when one particular subclass was created-and-then-loaded, the other would simply be loaded (not created), and vice versa. The net effect looked just like a race condition.
Byte Buddy is fully thread-safe. But it does attempt to create a class every time you invoke load, which is a fairly expensive operation. To avoid this, Byte Buddy offers the TypeCache mechanism that allows you to implement an efficient cache.
Note that libraries like cglib offer automatic caching. Byte Buddy does not do this, since such a cache uses all inputs as keys and references them statically, which can easily create memory leaks. Also, the keys are rather inefficient, which is why Byte Buddy chose this approach.
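A minimal TypeCache sketch (the choice of cache key here, the requested supertype, is an assumption; use whatever uniquely identifies a generated class in your application):

import net.bytebuddy.ByteBuddy;
import net.bytebuddy.TypeCache;

public class CachedGenerator {
    // Sort.SOFT: cached classes may be collected under memory pressure.
    private final TypeCache<Class<?>> cache = new TypeCache<>(TypeCache.Sort.SOFT);

    public Class<?> subclassOf(Class<?> type, ClassLoader loader) {
        // Generates at most once per (loader, key); later calls return the cached class.
        return cache.findOrInsert(loader, type, () ->
                new ByteBuddy()
                        .subclass(type)
                        .make()
                        .load(loader)
                        .getLoaded());
    }
}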
I'm wondering if there is a benefit to clearing data tables of information once you are done with them, or whether there is a noticeable problem if I don't clear the tables out. I know the process of clearing a table is only one line, but I'm wondering what benefit it provides, and whether the tables will be cleared automatically when I exit a run of the application or will remain until the computer is restarted.
Example:
Me.dtSet.Tables("ExampleTable").Clear()
Please see this thread.
It essentially states that there is no benefit to disposing of a DataSet / DataTable.
Also:
DataSet and DataTable don't actually have any unmanaged resources, so Dispose() doesn't actually do much. The Dispose() methods in DataSet and DataTable exists ONLY because of side effect of inheritance - in other words, it doesn't actually do anything useful in the finalization.
It turns out that DataSets, DataViews, and DataTables suppress finalization in their constructors; this is why calling Dispose() on them explicitly does nothing.
Presumably, this happens because, as mentioned above, they don’t have unmanaged resources; so despite the fact that MarshalByValueComponent makes allowances for unmanaged resources, these particular implementations don’t have the need and can therefore forgo finalization.
Overview of this Immense Answer:
Without a doubt, Dispose should be called on any Finalizable objects.
DataTables are Finalizable.
Calling Dispose significantly speeds up the reclaiming of memory.
MarshalByValueComponent calls GC.SuppressFinalize(this) in its Dispose() - skipping this means having to wait for dozens if not hundreds of Gen0 collections before memory is reclaimed.
From here, by Killercam.
If those don't fully answer your question, perhaps read this answer.
An important takeaway for your question:
"Does Dispose() method does not free up the memory & make object as null ??
Dispose and the disposal pattern are not for reclaiming managed memory or "deleting" managed objects (both things you cannot do, and what the Garbage Collector is there for); they are for handling the disposal/release of unmanaged resources or other managed resources that have releasable items, such as SqlConnection. It certainly won't null the reference, but may make it unusable from the time of disposal forwards."
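To illustrate the distinction, a hedged VB.NET sketch (connectionString and the table name are placeholders): Dispose matters for the SqlConnection, which wraps an unmanaged handle, while the DataTable it fills holds only managed memory and can simply be left for the garbage collector.

Imports System.Data
Imports System.Data.SqlClient

Module DisposeDemo
    Sub FillExample(ByVal connectionString As String)
        Using conn As New SqlConnection(connectionString)
            Using adapter As New SqlDataAdapter("SELECT * FROM ExampleTable", conn)
                Dim table As New DataTable("ExampleTable")
                adapter.Fill(table) ' opens and closes the connection as needed
                ' ... use table; no Clear or Dispose is required for the table itself ...
            End Using
        End Using ' deterministically returns the connection to the pool
    End Sub
End Module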
I have a question about thread safety using XML in VB.NET.
I have an application that manages an XmlDocument object as the user creates new items/makes changes to existing items. I already know that I need to synchronize calls to XmlDocument.CreateElement(...). My question is, can I then proceed to build the returned element without synchronization, then just synchronize again when appending that element into the XmlDocument?
This is what I think I can do, I just need to make sure it is thread-safe like I think it is:
' "doc" object already exists as an XmlDocument
SyncLock doc
Dim newsub As XmlElement = doc.CreateElement("submission")
End SyncLock
' use "newsub" here without synchronization
SyncLock doc
doc.Item("submissions").AppendChild(newsub)
End SyncLock
When adding the children of "newsub", I would likewise synchronize only when creating each element.
As a follow-up to this question: would I be better off just synchronizing the entire building of the "newsub" object? The reason I think doing it like the above is better is performance, but I am not by any means an expert in whether I am actually making a meaningful impact on performance, or just complicating things.
In general, when using any class derived from XmlNode, you will need synchronization, as its documentation explicitly states:
Any public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.
This means you'll need synchronization when appending children, as you've shown.
As to the follow-up question, whether to just synchronize the entire building of the "newsub" object:
It depends: if you're going to be doing anything that may cause it to be used from multiple threads, then you may need to synchronize it.
In your above code, it should be safe to work with newsub outside of the synchronization, since it's not part of the actual document tree until you append it as a child. This will reduce the amount of time where doc is locked, which could reduce contention if doc is being used from multiple threads.
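Building on the snippet above, a hedged sketch of that approach (the element names are made up): each CreateElement call needs the lock because it touches the document's shared state, but the detached subtree can be assembled lock-free, and batching the CreateElement calls into one SyncLock cuts contention further.

' "doc" is the shared XmlDocument from above
Dim newsub As XmlElement
Dim title As XmlElement

SyncLock doc
    newsub = doc.CreateElement("submission")
    title = doc.CreateElement("title")
End SyncLock

' Safe without a lock: neither element is attached to the shared tree yet.
newsub.SetAttribute("id", "123")
title.InnerText = "Example"
newsub.AppendChild(title)

SyncLock doc
    doc.Item("submissions").AppendChild(newsub)
End SyncLock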
I'm still fairly new to Core Data and am trying to understand why it requires the passing around of a NSManagedObjectContext. As I understand it, passing the context is needed so that multiple threads don't affect the same context, but I was also under the impression that this pattern is sometimes considered an antipattern, as noted here.
Could Core Data theoretically be implemented in a thread-safe way that would avoid using this pattern? How do other ORMs (such as Ruby's ActiveRecord, for example) avoid it? For instance, couldn't Core Data implement a per-NSManagedObject saving method such as in this extension? That light framework doesn't handle multithreading, but couldn't NSManagedObjects use some kind of internal GCD queue(s) to support it, with an internal context they don't expose?
Sorry if I'm missing anything major.
The NSManagedObjectContext is the in-memory container of your application's object graph, just as the persistent store (XML, SQLite, etc.) usually represents the on-disk container of your object graph.
There are some advantages to this approach:
Faulting can be applied to a set of objects, or in the case of Core Data the entire object graph.
It's a convenient abstraction for forcing the application to batch its I/O.
It provides a single point of contact for efficiently performing operations over the entire object graph (NSFetchRequests, etc.).
Undo can be applied to the object graph, not just individual objects.
It's also important to remember that Core Data is not an ORM framework; it's an object persistence framework. The primary responsibility of Core Data is to make accessing data stored in a persistent format on disk more efficient. However, it doesn't attempt to emulate the functionality of relational databases.
To your point about concurrency: new concurrency models have been introduced in the upcoming release of Mac OS X. You can read more about that at developer.apple.com.
In the abstract though, the concurrency model chosen for a managed object context has more to do with the specifics of an individual application than the context pattern itself. Instances of NSManagedObjectContext should generally never be shared between threads.
In the same way that each thread requires its own instance of NSAutoreleasePool, each thread should also have its own MOC. That way, when the thread is done executing, it can commit its changes to the store on disk and then release the context, freeing up all the memory consumed by objects processed on the thread.
This is a much more efficient paradigm than allowing a single context to continuously consume system resources during the lifecycle of a given application. Of course, this can also be achieved by invoking -reset on the context, which will cause all of the NSManagedObjects in use by the context to be turned back into faults.
You need one NSManagedObjectContext per thread. So you would have one to fill your UI on the main thread, and for longer operations you would have another for each background thread. If you want the results merged in from those other threads, there is a notification you can subscribe to, NSManagedObjectContextDidSaveNotification, which provides what you need to quickly merge the changes into your main MOC.
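A hedged Swift sketch of that merge pattern (mainContext is assumed to exist; this uses the classic thread-confined style discussed above rather than the newer queue-based API):

import CoreData

// One context per background thread, sharing the main context's coordinator.
let backgroundContext = NSManagedObjectContext()
backgroundContext.persistentStoreCoordinator = mainContext.persistentStoreCoordinator

// When the background context saves, merge its changes into the main-thread MOC.
let observer = NotificationCenter.default.addObserver(
    forName: .NSManagedObjectContextDidSave,
    object: backgroundContext,
    queue: .main
) { note in
    mainContext.mergeChanges(fromContextDidSave: note)
}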
There are certain places in my code where I invoke an activity using the WorkflowInvoker.Invoke method. I'm having a lot of performance issues because I create a new activity every time I need to invoke it.
According to this MSDN blog post, I should cache the activity and run the same activity instance rather than creating a new one.
However, my application is multi-threaded. Would it be safe for many threads to use the same instance of the Activity? According to the MSDN documentation, it is not thread-safe, but that looks like the standard boilerplate message for almost all classes.
I suspect that it should be thread-safe, since the data that the activity uses is stored in a separate context (as Variables and Arguments) rather than a normal instance member of the activity class.
I have found no problems with threads sharing the same Activity instance. This makes sense because data is passed into the activity through the context (rather than the properties of the Activity object). Activity caching significantly improves performance.
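In case it helps others, a hedged VB.NET sketch of the caching approach (MyWorkflow and its "Input" argument are hypothetical; per-invocation data travels through arguments, not through instance state on the Activity):

Imports System.Activities
Imports System.Collections.Generic

Module ActivityCache
    ' One shared, reused Activity instance; all per-call state goes via arguments.
    Private ReadOnly CachedActivity As Activity = New MyWorkflow()

    Public Function Run(ByVal input As String) As IDictionary(Of String, Object)
        Dim args As New Dictionary(Of String, Object) From {{"Input", input}}
        Return WorkflowInvoker.Invoke(CachedActivity, args)
    End Function
End Module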