GC may free the memory and resources, so how do I make an object reusable when there is a cache hit?
I want to deploy Ignite with Spark, and I am confused about how objects are kept across multiple nodes. Can someone explain this to me?
Once the cache is populated, a simple Cache.get() operation will return the same object. You can take a look at the Data Grid documentation in Ignite: https://ignite.incubator.apache.org/features/datagrid.html
Also, Ignite comes with a Shared RDD implementation for sharing state across Spark jobs and applications. I think this is perhaps what you need: https://ignite.incubator.apache.org/features/igniterdd.html
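For illustration, a minimal sketch of the Data Grid put/get cycle, assuming a node started with the default configuration; the cache name "myCache" is illustrative, not anything from the original question:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class CacheHitExample {
    public static void main(String[] args) {
        // Start an Ignite node with the default configuration.
        try (Ignite ignite = Ignition.start()) {
            // "myCache" is an assumed name for this sketch.
            IgniteCache<Integer, String> cache = ignite.getOrCreateCache("myCache");

            cache.put(1, "hello");

            // On a cache hit, get() returns the cached value. The entry is
            // owned by the grid, not by local references in your job, so a
            // GC cycle on one node does not invalidate it.
            String value = cache.get(1);
            System.out.println(value); // prints "hello"
        }
    }
}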
How can I use dotMemory to identify all objects that were created and then collected, either as of a snapshot or between two snapshots? It seems like it should be able to do this, but I can't find it discussed anywhere (or I don't know the right terms to search with).
You need the memory traffic view. Note that memory traffic data can't be collected when dotMemory is attached to an already running application, due to a restriction of the Microsoft Profiling API.
https://www.jetbrains.com/help/dotmemory/Analyzing_Traffic.html
I am currently building a Google Dataflow pipeline that writes to multiple BigQuery tables at run-time. The problem I am facing is that I need to re-use resources such as the BigQuery service instance, table info, etc. (I do not want to re-create them every time), but I am not able to cache them in an efficient way.
Currently I am using a simple factory to cache them (a static concurrent hash map). The pipeline does not seem to pick them up from the cache (it actually does a couple of times, but most of them are re-created).
I saw some workarounds with fixed-size session windows, but I need a simpler solution if one exists.
So, are there any best practices or solutions to the problem I am facing?
Is there any way to share resources between windows ?
Actually, I had misplaced the logging statements, which inverted the result (my bad). The solution with a static factory kept separate from the pipeline job does resolve the resource-sharing issue. Hope this helps anyone who runs into a similar issue :)
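For reference, a minimal sketch of that approach; ServiceFactory and createBigQueryService are illustrative names, not Dataflow or BigQuery API. The cache lives in a static field, so it is shared by all DoFn instances in the same worker JVM and survives across bundles and windows:

import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Illustrative static factory, kept separate from the pipeline job.
public final class ServiceFactory {
    private static final ConcurrentHashMap<String, Object> CACHE = new ConcurrentHashMap<>();

    private ServiceFactory() { }

    // computeIfAbsent runs the expensive supplier at most once per key per
    // JVM, even when several pipeline threads request the same resource.
    @SuppressWarnings("unchecked")
    public static <T> T getOrCreate(String key, Supplier<T> supplier) {
        return (T) CACHE.computeIfAbsent(key, k -> supplier.get());
    }
}

A DoFn would then ask for the shared client with something like ServiceFactory.getOrCreate("bigquery", () -> createBigQueryService()), where createBigQueryService() stands in for whatever client construction the pipeline already does.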
How does Apache Ignite's indexing work? I haven't found these technical details in the documentation.
Is it using a B-tree?
Where is the index stored?
How is it stored?
What performance (in big-O notation) does the index provide in use, once it is built?
How fast does it build, and when is it built?
Ignite can store arbitrary serializable Java objects. How does it deal with composites when I want to index a field of a sub-sub-object?
Ignite Cache is a key-value store. Am I able to have different classes (i.e. different types of objects) as values? In other words, is Ignite Cache schemaless? If yes, how does this fit with my SQL queries?
Ignite Cache is a key-value store. How do the keys come into play if I SQL-query for my values? What am I querying for?
The keys can be arbitrary, serializable Java objects - am I able to query for the keys or only the values?
This information is not really covered in the docs because it is mostly implementation detail and can change from version to version. After all, the source code is available if you are interested in the details.
To be specific, I'm talking about Ignite 1.5, which is about to be released.
Before 1.5 the default data structure was a snap-tree (a variant of an AVL tree); since 1.5 a skip-list option was added as well, and it is now the default.
In the Java heap or in off-heap memory, depending on the configuration.
Reliably :) I don't understand this question.
O(log N) on update and lookup.
The index is updated on each transaction commit (or on each cache update, in the case of an atomic cache); there is no separate build phase. You can expect your indexes to be in a correct state after each update.
Ignite has two options (since 1.5): either store objects in a binary format that allows reading individual field values, or keep the whole object deserialized and use reflection.
etc.
Have fun!
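To make the indexing part concrete, here is a hedged sketch of marking fields for indexing with Ignite's query annotations (the Person class and its fields are illustrative):

import java.io.Serializable;
import org.apache.ignite.cache.query.annotations.QuerySqlField;

public class Person implements Serializable {
    // index = true asks Ignite to maintain a sorted index over this field,
    // which is what gives the O(log N) lookup/update cost mentioned above.
    @QuerySqlField(index = true)
    private String name;

    // Queryable from SQL but not indexed: scanned without an index.
    @QuerySqlField
    private int age;
}

A query such as cache.query(new SqlQuery<>(Person.class, "name = ?").setArgs("Bob")) can then use the index; the exact behavior depends on how the cache's query types are configured.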
I decided to use the Infinispan distributed grid to extend my application to support clustering, but I ran into a limitation when using this kind of shared resource.
How can I retrieve all the values or keys in the distributed cache? I'm asking because, in the documentation, the collection methods are not recommended for production use (including keySet()).
Right now I have a local bucket/cache with the key/value pairs, but in order to process the values I need to retrieve the keys and iterate through the set.
Set<Object> keys = cache.keySet(); // returns a copy of the key set
With a large number of entries in the local cache, keySet() returns a copy, and this is a heavy load on memory.
I tried the query feature, but it makes network calls when I look up the values, and I don't need that. The query feature also does not support complex filters.
Do you know which is the best approach when using Infinispan in production?
As this is an experimental phase, I'm using the latest Infinispan version.
Thanks a lot.
The Map/Reduce functionality allows you to iterate over all the stored entries, and it also migrates the logic to where the data lives, so it doesn't add much of a burden.
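A hedged sketch using the MapReduceTask API from the Infinispan versions of that era (the per-entry work here is illustrative; newer versions replaced this API with distributed streams):

import java.util.Iterator;
import java.util.Map;

import org.infinispan.Cache;
import org.infinispan.distexec.mapreduce.Collector;
import org.infinispan.distexec.mapreduce.MapReduceTask;
import org.infinispan.distexec.mapreduce.Mapper;
import org.infinispan.distexec.mapreduce.Reducer;

public class LocalProcessing {

    // The Mapper runs on the node that owns each entry, so values are
    // processed in place instead of being copied to the caller.
    public static class ValueLengthMapper implements Mapper<String, String, String, Integer> {
        @Override
        public void map(String key, String value, Collector<String, Integer> collector) {
            collector.emit(key, value.length()); // illustrative per-entry work
        }
    }

    public static class SumReducer implements Reducer<String, Integer> {
        @Override
        public Integer reduce(String reducedKey, Iterator<Integer> iter) {
            int sum = 0;
            while (iter.hasNext()) sum += iter.next();
            return sum;
        }
    }

    public static Map<String, Integer> run(Cache<String, String> cache) {
        return new MapReduceTask<String, String, String, Integer>(cache)
                .mappedWith(new ValueLengthMapper())
                .reducedWith(new SumReducer())
                .execute();
    }
}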
We use keySet() in production for informational purposes only. Performance does not seem to be a big issue under low data loads, but of course you should use such methods with great care, because they can have a large performance impact depending on how you use the cache. Remote cache queries seem like a pretty handy feature to me.
We are setting up a JBoss cluster, and we are building our own distributed cache solution on top of JBoss Cache (we can't use it as a 2nd-level cache for the ORM layer in our case). We want to use invalidation, not replication, as the cache mode. As far as I can see after (very) little testing, both solutions seem to work: objects are put into the cache, and objects seem to be evicted when they are updated on any of the servers.
This leads me to believe that PojoCache with AOP instrumentation is only needed when using replication, so that you can replicate only updated field values and not whole objects. Am I correct here, or are there other advantages to using PojoCache over TreeCache in our scenario? And if PojoCache has advantages, do we still need AOP instrumentation and to annotate our entities with #PojoCacheable (yes, we are using JBoss Cache 1.4.1), since we are not using replication?
Regards
Jonas Heineson
PojoCache has the ability, through AOP, to:
- replicate only changed fields and not whole objects. This makes a difference if, for example, your person object contains a huge image of the person and you only change the password;
- detect changes automatically and put them on the list to be replicated.
TreeCache (plain) does not need AOP, but consequently cannot replicate individual fields or detect what has changed, so you need to trigger replication yourself.
If you don't replicate, those points are probably irrelevant.
IIRC, you don't need the #PojoCacheable annotation for PojoCache - without it, you need to specify the classes to be enhanced in a different way.
I have the feeling that if you are not replicating, the plain TreeCache will be enough.
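For completeness, a minimal sketch of plain TreeCache usage, assuming the JBoss Cache 1.4-era API (the configuration file name and FQN are illustrative; invalidation vs. replication is chosen by the CacheMode in that file):

import org.jboss.cache.PropertyConfigurator;
import org.jboss.cache.TreeCache;

public class TreeCacheExample {
    public static void main(String[] args) throws Exception {
        TreeCache cache = new TreeCache();

        // "cache-service.xml" is an assumed file; its CacheMode would be
        // set to INVALIDATION_SYNC for the scenario discussed above.
        new PropertyConfigurator().configure(cache, "cache-service.xml");
        cache.startService();

        // Whole objects are stored under an FQN; with invalidation, an
        // update here evicts the stale copy on the other cluster nodes.
        cache.put("/person/42", "name", "Jonas");
        Object name = cache.get("/person/42", "name");
        System.out.println(name);

        cache.stopService();
    }
}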