ClassNotFoundException while deploying new application version (with changed session object) in active Apache Ignite grid - ignite

We are currently integrating Apache Ignite in our application to share sessions in a cluster.
At this point we can successfully share sessions between two local tomcat instances, but there's one use case, which is not working so far.
When running the two local instances with the exact same code, it all works great. But when the Ignite logic is integrated in our production cluster, we'll encounter the following use case:
Node 1 and Node 2, runs version 1 of the application
At this point we'd like to deploy version 2 of the application
Tomcat is stopped at Node 1, version 2 is deployed, and at the end of the deployment Tomcat at Node 1 is started again.
We now have Node 1 with version 2 of the code and Node 2, still with version 1
Tomcat is stopped at Node 2, version 2 is deployed, and at the end of the deployment Tomcat at Node 2 is started again.
We now have Node 1 with version 2 of the code and Node 2, with version 2
Deployment is finished
When reproducing this use case locally with two tomcat instances in the same grid, the Ignite web session clustering fails. What I tested, was removing one 'String property' of a class (Profile) which resided in the users session. When starting Node 1 with this changed class, I get the following exception:
Caused by: java.lang.ClassNotFoundException:
Optimized stream class checksum mismatch
(is same version of marshalled class present on all nodes?) [expected=4981, actual=-27920, cls=class nl.package.profile.Profile]
This will be a common/regular use case for our deployments. My question is: how to handle this use cases? Are there ways in Ignite to resolve/workaround this kind of issue?

In my understanding your use case perfectly fits for Ignite Binary objects [1].
This feature allows to store objects in class-free format and to modify objects structure in runtime without full cluster restart when a version of an object is changed.
Your Person class should implement org.apache.ignite.binary.Binarylizable interface that will give you full control on serialization and deserialization logic. With this interface you can even have two nodes in the cluster that use different versions of Person class at both deserialization & serialization time by reading/writing only required fields from/to binary format.
[1] https://apacheignite.readme.io/docs/binary-marshaller

Related

AWS EKS node group migration stopped sending logs to Kibana

I encounter a problem while using EKS with fluent bit and I will be grateful for the community help, first I'll describe the cluster.
We are running EKS cluster in a VPC that had an unmanaged node group.
The EKS cluster network configuration is marked as "public and private" and
using fluent-bit with Elasticsearch service we show logs in Kibana.
We've decided that we want to move to managed node group in that cluster and therefore migrated from the unmanaged node group to a managed node group successfully.
Since our migration we cannot see any logs in Kibana, when getting the logs manually from the fluent bit pods there are no errors.
I toggled debug level logs for fluent bit to get better look at it.
I can see that fluent-bit gathers all the log files and then I saw that we get messages:
[debug] [out_es] HTTP Status=403 URI=/_bulk
[debug] [retry] re-using retry for task_id=63 attemps=3
[debug] [sched] retry=0x7ff56260a8e8 63 in 321 seconds
Furthermore, we have managed node group in other EKS clusters but we did not migrate to them they were created with managed node group.
The created managed node group were created from the same template we have from working managed node group with the only difference is the compute power.
The template has nothing special in it except auto scale.
I compared between the node group IAM role of working node group logs and my non working node group and the Roles seems to be the same.
As far for my fluent bit configuration I have the same configuration in few EKS clusters and it works so I don't think that the root cause but if anyone thinks something else I can add it if requested.
Someone had that kind of problem? why node group migration could cause such issue?
Thanks in advance!
Lesson learned, always look at the access policy of the resource you are having issue with, maybe it does not match your node group role

Infinspan console shows only one node for clustered servers in the cache node view

We are working with infinspan version 9.4.8 in a domain mode with cluster of two hosts servers with two nodes.
In the statistics of the cluster view we can see that both nodes get hits but when we look at the view of the cache nodes for a distributed cache we can see only one node in the nodes view
In console of infinspan 8 we used to have the two nodes in the cache nodes view but after upgrading to version 9 it is not the case
Could you please advise if it is bug in the console for version 9.4.8 or something is missed in the configuration
This is a bug which has just been fixed and will be included in the upcoming 9.4.18.Final release. The issue is tracked by ISPN-11265.
In the future please utilise the Infinispan JIRA directly if you suspect a bug.

Ignite thin Client unstable behavior

I am newbie to ignite and trying to play around with the example https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/client/ClientPutGetExample.java
i first tried the example with one server node and executed the client everything work fine.
then i started a second node with the following config
IgniteClient igniteClient = Ignition.startClient(new ClientConfiguration().setAddresses("127.0.0.1:10800","127.0.0.1:10801" )))
with CacheMode.REPLICATED;
i re-run the code it work fine, then i kept the same config and i shut down
one of the nodes
then i re-run the code the result is unstable sometimes it gives me Ignite cluster is unavailable sometimes it gives me an empty cache
Thin client put-get example started.
Created cache [put-get-example].
Loaded [null] from the cache.
1-as per the documentation ignite thin client is supposed to failover one of the
running nodes.
2- why the cache is note replicated?
is there something that i am missing here
thank you for your help
This looks like IGNITE-11599 - Thin Client will not failover properly if some of addresses were not up when it started.
It is fixed recently but did not get in any released versions. I'm afraid you will have to work around it by doing manual failovers.

What can i do when allow_store_upgrade fails?

I'm using neo4j in a glassfish server through a modified version of Alex Smirnov neo4j JCA connector.
My version is available here : https://github.com/Riduidel/neo4j-connector
I'm using this connector with neo4j 1.8.
As a consequence, when i want to use it, i first install the connector in my Glassfish application server, then use this connector in applications wishing to connect to.
It works OK when using it with fresh stores.
But, when using it with stores created with previous version, I encounter weird bugs.
Typically, I got today the following stack
javax.resource.spi.ResourceAllocationException: Error in allocating a connection. Cause: Failed to transition org.neo4j.kernel.InternalAbstractGraphDatabase$DefaultKernelExtensionLoader#3bbd53b1 from NONE to STOPPED
...
...
.../* JCA internal exception stack */
...
...
Caused by: com.sun.appserv.connectors.internal.api.PoolingException: Failed to transition org.neo4j.kernel.InternalAbstractGraphDatabase$DefaultKernelExtensionLoader#494b584c from NONE to STOPPED
at com.sun.enterprise.resource.pool.ConnectionPool.createSingleResource(ConnectionPool.java:924)
at com.sun.enterprise.resource.pool.ConnectionPool.createResource(ConnectionPool.java:1185)
at com.sun.enterprise.resource.pool.datastructure.RWLockDataStructure.addResource(RWLockDataStructure.java:98)
... 66 more
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Failed to transition org.neo4j.kernel.InternalAbstractGraphDatabase$DefaultKernelExtensionLoader#494b584c from NONE to STOPPED
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.init(LifeSupport.java:388)
at org.neo4j.kernel.lifecycle.LifeSupport.init(LifeSupport.java:82)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:116)
at org.neo4j.kernel.InternalAbstractGraphDatabase.run(InternalAbstractGraphDatabase.java:227)
at org.neo4j.kernel.EmbeddedGraphDatabase.<init>(EmbeddedGraphDatabase.java:79)
at org.neo4j.kernel.EmbeddedGraphDatabase.<init>(EmbeddedGraphDatabase.java:70)
at com.netoprise.neo4j.AbstractNeo4jManagedConnectionFactory.createDatabase(AbstractNeo4jManagedConnectionFactory.java:165)
at com.netoprise.neo4j.AbstractNeo4jManagedConnectionFactory.createDatabase(AbstractNeo4jManagedConnectionFactory.java:127)
at com.netoprise.neo4j.Neo4jManagedConnectionFactory.createManagedConnection(Neo4jManagedConnectionFactory.java:163)
at com.sun.enterprise.resource.allocator.ConnectorAllocator.createResource(ConnectorAllocator.java:160)
at com.sun.enterprise.resource.pool.ConnectionPool.createSingleResource(ConnectionPool.java:907)
... 68 more
Caused by: java.lang.AssertionError
at org.neo4j.index.impl.lucene.LuceneDataSource.cleanWriteLocks(LuceneDataSource.java:265)
at org.neo4j.index.impl.lucene.LuceneDataSource.cleanWriteLocks(LuceneDataSource.java:260)
at org.neo4j.index.impl.lucene.LuceneDataSource.cleanWriteLocks(LuceneDataSource.java:260)
at org.neo4j.index.impl.lucene.LuceneDataSource.cleanWriteLocks(LuceneDataSource.java:260)
at org.neo4j.index.impl.lucene.LuceneDataSource.<init>(LuceneDataSource.java:185)
at org.neo4j.index.lucene.LuceneIndexProvider.load(LuceneIndexProvider.java:72)
at org.neo4j.kernel.InternalAbstractGraphDatabase$DefaultKernelExtensionLoader.loadIndexImplementations(InternalAbstractGraphDatabase.java:1171)
at org.neo4j.kernel.InternalAbstractGraphDatabase$DefaultKernelExtensionLoader.init(InternalAbstractGraphDatabase.java:1143)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.init(LifeSupport.java:382)
... 78 more
A fast inspection reveals that this exception is linked to an undeletable "write.lock" file. My write.lock file can't be deleted because I guess migration is not over.
How can I make sure the migration is done before using it without migrating it outside of Glassfish ?
Is there a way to ahve exclusive store migrations in that context ? And if so, how ?
And is it the solution for my problem ?
EDIT 1 Added exception message.
EDIT 2 All this only happen when loaded graph was previously used with a Neo4j 1.5 and now with a Neo4j 1.8 connector. when graph is created by connector, absolutely no error happens.
EDIT 3 Strangely enough, this happens as long as there is no debugger plugged into that code : as soon as I try to debug it, the issue stop appearing. Which make me thinking there may be a migration cleanup mechanism that remvoe the write lock once migration is done, and this cleanup is not performed when using my neo4j JCA connector. Is it a valid observation ?
I am not too familiar with the JCA connector, but to be sure, I would just write a very small migration java class that opens the database, lets it migrate and shut down. Then try it again with the JCA connector?
After further investigations, truth revealed to not be in multiple calls to the EmbeddedGraphDatabase constructor, but instead to multiple identicail IndexProvider being loaded.
I use neo4j embedded in an open-source JCA connector.
In this connector, the org.neo4j.kernel.Service class is replaced by a custom one which contains a workaround regarding service loading for JBoss non shared libraries.
Unfortunatly, in our context, this workaround implies loading twice the index provider :
once using the EAR classloader
once using the Glassfish library classloader.
Why ?
Because, as our neo4j instance is using for application data AND for authentication, neo4j connector jar is put in ${domain}/lib. As a consequence, due to Classloader delegation in application server, the EAR classloader delegates to the Glassfish library classloader, and find this way the LuceneIndexProvider. Then, the Glassfish library classloader is directly used to load the same LuceneIndexProvider class.
This concludes by us having two LuceneIndexProvider objects, both trying to migrate the lucene index. Which lead to the AssertionError as the write.lock file created by the first object should be deleted by the second one, which can't do that.
I've then changed slightly that very specific class to use JBoss workaround only when default loading mechanism do not return any class (seee commit here). This small change worked like a charm, so I think you can considered this issue as fixed.

Servlet Exception + Class Cast Exception + Glassfish + Netbeans + JPA Entities + Vaadin

I get this error:
StandardWrapperValve[Vaadin Servlet]: PWC1406: Servlet.service() for servlet Vaadin Servlet threw exception
java.lang.ClassCastException: com.delhi.entities.Category cannot be cast to com.delhi.entities.Category
when I try to run my webapps on glassfish v2.
Category is a JPA entity object
the offending code according to the server log is:
for (Category c : categories) {
mymethod();
}
categories is derived from:
List<Category> categories = q.getResultList();
Any idea what went wrong?
This is a class loader issue. If a class is loaded by different class loaders, it's objects cannot be assigned to each other. You have probably passed an object from one WAR into another one. There are several options to resolve this:
Put all your code into a single WAR.
Use some form of remoting between your WARs. Serialization takes care of the class loader problem.
Try putting all you WARs into a single EAR. If that doesn't work, put all code into JARs that are on the EAR's Classpath in the MANIFEST.MF.
I once had the same problem and the environment I had was following:
I had Glassfish v4
Netbeans with following projects
webpage war project containing entities
and ear project with that webpage war project
The problem was that in war's project settings I had checked [x] Run>Deploy on save. This was causing deploying war project everyime I hit save. It was sometimes leading to PermGen (memory) problems and unability to deploy EAR correctly (because e.g. in between undeploying and deploying EAR - this "crazy" Netbeans was deploying this war).
Solution: If Netbeans && using EAR, then uncheck deploy on save in project properties.
EDIT:
it seems that this error is connected with
SEVERE: The web application [/faces] created a ThreadLocal with key of type [org.glassfish.pfl.dynamic.codegen.impl.CurrentClassLoader$1] (value [org.glassfish.pfl.dynamic.codegen.impl.CurrentClassLoader$1#249ea63a]) and a value of type [org.glassfish.web.loader.WebappClassLoader] (value [WebappClassLoader (delegate=true; repositories=WEB-INF/classes/)]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
I've had same problem today. Solution was closing EntityManagerFactory after use.
This answer helped me:
https://stackoverflow.com/a/13823219/2455506
I'm experiencing this problem too with Glassfish v2 and Glassfish v3.
Can I ask you a question: Are you attempting to initialize any persistence object when the application is deployed (through a servlet loaded on startup or a context listener)?
Like bguiz, I've noticed this problem only happens on redeploy. A new deploy to a freshly restarted Glassfish server, never has this problem.
Like FelixM mentioned, I'm convinced this is a class loader issue, however I don't believe it's an issue with multiple wars (I only have 1 deployed to my server). In Glassfish 3, I can see that my WAR is utilizing 2 Glassfish "engines". One for the web(war) and one for the jpa. From what I understand, these are different containers each with their own classloader. I'm guessing Glassfish v2 works in the same manner.
I'm using Spring and (re)initialize some persistence objects on (re)deploy. What I'm thinking, is that while the web engine is reinitializing the war, the jpa engine is still using the old class definitions. Often if I retry the redeploy after this initial failure, it may succeed (sometimes it may take more than one retry but eventually I can get it to succeed without a restart - having better success with Glassfish v3 than v2).
At this point I'm thinking that either these two classloaders are out of sync or there is some sort of race condition on redeploy allowing this operation to sometimes succeed. I've tried to force the classloader, writing code like this
HashMap<Object, Object> properties = new HashMap<Object, Object>();
properties.put(PersistenceUnitProperties.CLASSLOADER, this.getClass().getClassLoader());
entityManagerFactory = Persistence.createEntityManagerFactory(jpaContext, properties);
but it didn't seem to have any affect.
I'm also wondering if eliminating the initialization at startup could fix the problem, giving the appserver time to resynchronize both engines before using any jpa classes (which is why I asked my follow up question).
My observation is that it only happens when using a hot redeploy or a static redeploy. This only applies, of course, if you get a class cast exception where both the to and from classes are the same.
Workarounds:
Don't use undeploy and deploy instead of redeploy
Restart app server
Remove static members of the affected classes
Use a remote interface (serialization makes this go away)
IMO I think the class loader was unable to reload the class and the old version was reused, resulting in the error.
This article doesn't talk about this error directly, but it is good background info on how the class loader works.