Recently we deployed our application with the Ignite cache enabled. We have three servers, the cache mode is REPLICATED, and all three servers are server nodes. After deploying the application, the exception "Cache has been closed or destroyed" is thrown randomly.
I've checked a previous question on Stack Overflow (Apache Ignite Cache Error: caused by java.lang.IllegalStateException: Cache has been closed or destroyed: cacheName), but it did not solve my problem.
Has anyone encountered the same exception before? If so, how did you solve it?
This exception means that either IgniteCache.close() was called on the particular IgniteCache instance you're using, or IgniteCache.destroy()/Ignite.destroyCache(..) was called for this cache anywhere in the cluster. Check your code and make sure this is not happening.
Another scenario is client disconnection, described in the thread you linked to.
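If the client-disconnect scenario from the linked thread is what you are hitting, a minimal Java sketch of re-acquiring the cache proxy after a reconnect could look like the following (the cache name, key/value types and configuration path are placeholders, not taken from your setup):

import javax.cache.CacheException;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteClientDisconnectedException;
import org.apache.ignite.Ignition;

public class CacheAccess {
    public static void main(String[] args) {
        // Placeholder configuration path.
        Ignite ignite = Ignition.start("config/ignite.xml");

        // Keep one long-lived proxy per cache; never call cache.close() or
        // ignite.destroyCache("myCache") unless you really intend to remove
        // the cache cluster-wide.
        IgniteCache<Integer, String> cache = ignite.getOrCreateCache("myCache");

        try {
            cache.put(1, "value");
        }
        catch (CacheException e) {
            if (e.getCause() instanceof IgniteClientDisconnectedException) {
                // The node dropped out of the topology: wait for it to
                // reconnect, then re-obtain the cache proxy instead of
                // reusing the old, now invalid, instance.
                IgniteClientDisconnectedException cause =
                    (IgniteClientDisconnectedException) e.getCause();

                cause.reconnectFuture().get();

                cache = ignite.getOrCreateCache("myCache");
                cache.put(1, "value");
            }
        }
    }
}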
When I try to run two Ignite servers I get the following errors:
1) Failed to find class with given class loader for unmarshalling.
2) Caused by: java.lang.ClassNotFoundException: rg.netlink.app.config.ServerConfigurationFactory$2
Even with peerClassLoadingEnabled set on both servers, this error persists.
Please help.
How can I run two Ignite servers? Has anybody successfully run two Ignite servers?
Can you figure out what ServerConfigurationFactory$2 is? The $2 suffix means it is an anonymous inner class declared inside ServerConfigurationFactory.
I would imagine that for some reason your Ignite node has some class in its configuration which is absent on the other nodes. Nodes exchange their configuration on discovery, so this will cause problems; note that peer class loading covers compute tasks and closures, not classes referenced from the node configuration. Make sure that you only use stock Ignite configuration classes and do not override them with custom implementations/wrappers.
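For illustration, a minimal Java sketch of a server node that sticks to stock configuration classes and enables peer class loading could look like this (hostnames and ports are placeholders):

import java.util.Arrays;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

public class ServerStartup {
    public static void main(String[] args) {
        // Stock discovery SPI and IP finder only -- no custom subclasses or
        // anonymous wrappers that other nodes would fail to deserialize.
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        ipFinder.setAddresses(Arrays.asList("host1:47500..47509", "host2:47500..47509"));

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setDiscoverySpi(new TcpDiscoverySpi().setIpFinder(ipFinder))
            // Peer class loading helps with compute tasks and closures, but
            // not with classes referenced from this configuration object.
            .setPeerClassLoadingEnabled(true);

        Ignite ignite = Ignition.start(cfg);
    }
}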
I've noticed a strange behaviour of Apache Ignite which occurs fairly reliably on my 5-node Apache Ignite cluster but can be replicated even with a two-node cluster. I use Apache Ignite 2.7 for .NET on Linux, deployed in a Kubernetes cluster (each pod hosts one node).
The problem is as follows. Assume we've got a cluster consisting of two Apache Ignite nodes, A and B. Both nodes start and initialize, and a couple of Ignite services are deployed on each node during the initialization phase. Among others, a service named QuoteService is deployed on node B.
So far so good; the cluster works as expected. Then node B crashes or gets stopped for whatever reason and restarts. All the Ignite services hosted on node B get redeployed, and the node rejoins the cluster.
However, when a service on node A tries to call the QuoteService expected to be available on node B, an exception gets thrown with the following message: Failed to find deployed service: QuoteService. This is strange, as the line registering the service did run during the restart of node B:
services.DeployMultiple("QuoteGenerator", new Services.Ignite.QuoteGenerator(), 8, 2);
(deploying the service as a singleton makes no difference)
A restart of either node A or node B separately does not help. The problem can only be resolved by shutting down the entire Ignite cluster and restarting all the nodes.
This condition can be reproduced even when 5 nodes are running.
This bug report may look a bit unspecific, but it is hard to give concrete steps to reproduce, since replicating the issue involves setting up at least two Ignite nodes and stopping and restarting them in sequence. So let me pose the questions this way:
1. Have you ever noticed such a condition, or have you received similar reports from other users?
2. If so, what steps can you recommend to address this problem?
3. Should I wait for the next version of Apache Ignite as I read that the service deployment mechanism is currently being overhauled?
UPD:
I am getting a similar problem on a running cluster even when I don't stop/start nodes. It seems to have a different cause, so I will open another question for it.
I've figured out what caused the described behavior (although I don't understand why exactly).
I wanted to ensure that the Ignite service is only deployed on the current node so I used the following C# code to deploy the service:
var services = ignite.GetCluster().ForLocal().GetServices();
services.DeployMultiple("FlatFileService", new Services.Ignite.FlatFileService(), 8, 2);
When I changed my code to rely only on a NodeFilter to limit the deployment of the service to a specific set of nodes and got rid of "GetCluster().ForLocal().", the bug disappeared. The final code is as follows:
var flatFileServiceCfg = new ServiceConfiguration
{
    Service = new Services.Ignite.FlatFileService(),
    Name = "FlatFileService",
    NodeFilter = new ProductServiceNodeFilter(),
    MaxPerNodeCount = 2,
    TotalCount = 8
};
var services = ignite.GetServices();
services.DeployAll(new[] { flatFileServiceCfg, /* ... other services ... */ });
It is still strange, however, that the old code worked until the topology changed.
We are creating a SimpleMessageListenerContainer for every host, and after every message we stop the container. Is it possible to close the RabbitMQ connection from the container? Currently we are running into a memory leak in our application due to the many RabbitMQ threads connecting to the hosts.
Why start/stop a new container for each message? Why not use rabbitTemplate.receive() instead?
There is only one connection by default; channels are cached according to the configuration and, when closed, are only kept in the cache if you have increased the cache size.
What is the nature of the "memory leak"?
When asking questions like this, show your configuration.
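For reference, a minimal sketch of the polling approach with RabbitTemplate (the host and queue name are placeholders; the receive overload with a timeout assumes Spring AMQP 1.5 or later):

import org.springframework.amqp.core.Message;
import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
import org.springframework.amqp.rabbit.core.RabbitTemplate;

public class PollingConsumer {
    public static void main(String[] args) {
        // One shared connection factory: channels are cached and reused, so
        // no per-message listener container threads are created.
        CachingConnectionFactory connectionFactory = new CachingConnectionFactory("rabbit-host");
        RabbitTemplate rabbitTemplate = new RabbitTemplate(connectionFactory);

        // Polling receive: waits up to 5 seconds for a message and returns
        // null if the queue stays empty.
        Message message = rabbitTemplate.receive("my.queue", 5000);

        if (message != null) {
            System.out.println(new String(message.getBody()));
        }

        // Closes the underlying connection when the application is done.
        connectionFactory.destroy();
    }
}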
While checking SystemOut.log during reported slowness in the application, I found StaleConnectionException occurring frequently. This exception was not observed earlier, and I suspect it may be the reason for the slowness and needs to be resolved.
StaleConnectionException usually happens when WebSphere gets disconnected from the database. It can be caused by a database restart or by a network issue, e.g. a firewall which drops connections after some idle time. If it happens frequently, make sure that the Purge policy for that datasource is set to Entire Pool, not Failing Connections. If you have a firewall between WAS and the DB, set the Aged timeout to a lower value than the timeout on the firewall (try 1200, for example).
Can this be a reason for slowness?
It can, to an extent: when the application gets a StaleConnectionException, that request fails, and either the application has logic to retry it or the end user gets an error and retries the same request.
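For completeness, a rough Java sketch of the retry pattern on the application side (the JNDI name, SQL and retry count are made up for illustration; the exception class is WebSphere's com.ibm.websphere.ce.cm.StaleConnectionException):

import java.sql.Connection;
import java.sql.PreparedStatement;
import javax.naming.InitialContext;
import javax.sql.DataSource;

import com.ibm.websphere.ce.cm.StaleConnectionException;

public class OrderDao {
    private static final int MAX_RETRIES = 3;

    public void updateStatus(long orderId, String status) throws Exception {
        // Placeholder JNDI name for the datasource.
        DataSource ds = (DataSource) new InitialContext().lookup("jdbc/appDS");

        for (int attempt = 1; ; attempt++) {
            try (Connection con = ds.getConnection();
                 PreparedStatement ps = con.prepareStatement(
                     "UPDATE orders SET status = ? WHERE id = ?")) {
                ps.setString(1, status);
                ps.setLong(2, orderId);
                ps.executeUpdate();
                return; // success
            }
            catch (StaleConnectionException e) {
                // The pooled connection was already dead (database restart,
                // firewall cut, etc.); getting a fresh connection and
                // retrying usually succeeds.
                if (attempt >= MAX_RETRIES)
                    throw e;
            }
        }
    }
}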
My application has 50 service endpoints (such as /mysite/myService.svc), hosted in IIS. Intermittently (once every two or three days) a service stops responding; it's never the same service that hangs. While one service is hung, some of the other services work fine and some others are also hung.
All clients (from different computers) get this error:
ServiceModel.CommunicationException
Message: An error occurred while receiving the HTTP response to
https://server/mysite/myservice1.svc.
This could be due to the service endpoint binding not using the HTTP
protocol. This could also be due to an HTTP request context being
aborted by the server (possibly due to the service shutting down).
See server logs for more details.
No exceptions are raised by the server when the client attempts to call the service that is hung. All I have is that error on the client side.
I have to manually recycle the application pool to fix the problem.
Do you know what could be the cause? How can I investigate this issue? I'm willing to take a memory dump of the worker process when a service is hung but I would not know what to search for in the dump.
Update (Aug 13 2009): I have almost ruled out the idea that the server runs out of connections (see the comment in Shiraz Bhaiji's answer). I might have a new lead: I log all server-side exceptions in a log file, so in theory, when this occurs on the client, no exceptions are raised on the server; otherwise I'd have proof of that in my logs. But what if an error does occur on the server, but at a low level where exceptions are not routed to my exception-handling code? I have posted this question about scenarios where low-level exceptions cannot be handled. I'll keep you informed of the progress of my investigation.
Sounds like you are running out of connections.
By default WCF has a timeout and therefore holds a connection open for 10 mins.
When you recycle the app pool all connections are closed, and therefore things work again.
To fix it check your code to make sure that you close connections / dispose of proxies.
To resolve this, we set establishSecurityContext to False on the binding.
I have not come across this particular issue, but I would suggest turning on tracing/message logging for the WCF service in the config for the service and/or the client app (if you have control over that). I've done this in the last few days for a service that I needed to troubleshoot.
The MSDN link here is a good starting point.
Also see the table in this post for the varying levels of trace detail you can configure. There are several levels, ranging from exception-only logging to full message details. It is quite quick to set this up in the app.config file.
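For example, a minimal tracing section for app.config/web.config could look like this (the listener name, switch value and log path are placeholders to adjust for your environment):

<system.diagnostics>
  <sources>
    <source name="System.ServiceModel"
            switchValue="Warning, ActivityTracing"
            propagateActivity="true">
      <listeners>
        <add name="traceListener" />
      </listeners>
    </source>
  </sources>
  <sharedListeners>
    <add name="traceListener"
         type="System.Diagnostics.XmlWriterTraceListener"
         initializeData="c:\logs\WcfTrace.svclog" />
  </sharedListeners>
</system.diagnostics>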
To parse the log file output, use SvcTraceViewer.exe, which comes with the Windows SDK; if you have it installed, it should be located in this folder: C:\Program Files\Microsoft SDKs\Windows\v6.0\Bin