Create Cluster Configuration in Ignite web console is not working

The "Create Cluster Configuration" button is not working in the web console at https://console.gridgain.com/configuration/overview.
Moreover, when I open console.gridgain.com in my browser, I get the following error:
Failed to load clusters: Cannot start/stop cache within lock or transaction [cacheNames=ClusterCache, operation=dynamicStartCache]

I think this means you have tried to use getOrCreateCache from within an Apache Ignite transaction.
I recommend getting all of your caches before you start a transaction. Maybe there's something else going on, but you would need to share more details.
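For example, a minimal sketch of that recommendation (the cache name is taken from your error message; everything else is illustrative):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.transactions.Transaction;

public class TxBeforeCacheExample {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // Get or create every cache BEFORE opening a transaction, so that
            // no dynamicStartCache operation happens inside the transaction.
            CacheConfiguration<Integer, String> cfg =
                new CacheConfiguration<Integer, String>("ClusterCache")
                    .setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
            IgniteCache<Integer, String> cache = ignite.getOrCreateCache(cfg);

            try (Transaction tx = ignite.transactions().txStart()) {
                cache.put(1, "value"); // safe: the cache already exists here
                tx.commit();
            }
        }
    }
}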

It seems the GridGain Ignite team has made a fix, and the issue is now resolved.

Related

AWS EKS node group migration stopped sending logs to Kibana

I encountered a problem while using EKS with Fluent Bit, and I would be grateful for the community's help. First I'll describe the cluster.
We are running an EKS cluster in a VPC that had an unmanaged node group.
The EKS cluster's network configuration is marked as "public and private", and
we use Fluent Bit with the Elasticsearch service to show logs in Kibana.
We decided we wanted to move to a managed node group in that cluster, and therefore migrated from the unmanaged node group to a managed node group successfully.
Since our migration we cannot see any logs in Kibana; when getting the logs manually from the Fluent Bit pods, there are no errors.
I enabled debug-level logs for Fluent Bit to get a better look at it.
I can see that Fluent Bit gathers all the log files, and then we get these messages:
[debug] [out_es] HTTP Status=403 URI=/_bulk
[debug] [retry] re-using retry for task_id=63 attemps=3
[debug] [sched] retry=0x7ff56260a8e8 63 in 321 seconds
Furthermore, we have managed node groups in other EKS clusters, but we did not migrate to those; they were created with managed node groups from the start.
The new managed node group was created from the same template as a working managed node group; the only difference is the compute power.
The template has nothing special in it except auto-scaling.
I compared the node group IAM role of a working node group with that of my non-working node group, and the roles seem to be the same.
As for my Fluent Bit configuration, I have the same configuration in a few EKS clusters and it works, so I don't think that is the root cause, but I can add it if requested.
Has anyone had this kind of problem? Why could a node group migration cause such an issue?
Thanks in advance!
Lesson learned: always look at the access policy of the resource you are having an issue with; it may not match your node group's role. Here, the HTTP 403 responses from the Elasticsearch _bulk endpoint indicate that the domain's access policy did not authorize the new node group's role.
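For illustration, the Elasticsearch domain's resource-based access policy has to allow the IAM role attached to the new managed node group; something along these lines (the account ID, region, and names below are hypothetical):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/eks-managed-nodegroup-role"
      },
      "Action": "es:ESHttp*",
      "Resource": "arn:aws:es:us-east-1:123456789012:domain/my-domain/*"
    }
  ]
}

If the policy still references only the old unmanaged node group's role, bulk requests from Fluent Bit on the new nodes are rejected with HTTP 403.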

Ignite thin client unstable behavior

I am a newbie to Ignite and am trying to play around with the example https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/client/ClientPutGetExample.java
I first tried the example with one server node and executed the client; everything worked fine.
Then I started a second node, with the following client config:
IgniteClient igniteClient = Ignition.startClient(new ClientConfiguration().setAddresses("127.0.0.1:10800", "127.0.0.1:10801"));
and the cache in CacheMode.REPLICATED.
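For reference, that setup through the thin client would look roughly like this (a sketch based on the linked example; the cache name is the one the example uses):

import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.client.ClientCache;
import org.apache.ignite.client.ClientCacheConfiguration;
import org.apache.ignite.client.IgniteClient;
import org.apache.ignite.configuration.ClientConfiguration;

public class ReplicatedPutGetExample {
    public static void main(String[] args) {
        ClientConfiguration cfg = new ClientConfiguration()
            .setAddresses("127.0.0.1:10800", "127.0.0.1:10801");

        try (IgniteClient igniteClient = Ignition.startClient(cfg)) {
            // REPLICATED mode keeps a full copy of the data on every server node.
            ClientCacheConfiguration cacheCfg = new ClientCacheConfiguration()
                .setName("put-get-example")
                .setCacheMode(CacheMode.REPLICATED);

            ClientCache<Integer, String> cache = igniteClient.getOrCreateCache(cacheCfg);
            cache.put(1, "Hello");
            System.out.println("Loaded [" + cache.get(1) + "] from the cache.");
        }
    }
}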
I re-ran the code and it worked fine. Then I kept the same config and shut down
one of the nodes.
When I re-ran the code, the result was unstable: sometimes it tells me the Ignite cluster is unavailable, and sometimes it gives me an empty cache:
Thin client put-get example started.
Created cache [put-get-example].
Loaded [null] from the cache.
1. As per the documentation, the Ignite thin client is supposed to fail over to one of the
running nodes.
2. Why is the cache not replicated?
Is there something I am missing here?
Thank you for your help.
This looks like IGNITE-11599: the thin client will not fail over properly if some of the addresses were not up when it started.
It was fixed recently but has not made it into any released version. I'm afraid you will have to work around it by doing manual failovers, as sketched below.
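A rough sketch of such a manual failover (a workaround under the assumption that your version still has the bug): create a fresh client per address and retry on connection failure.

import java.util.function.Consumer;
import org.apache.ignite.Ignition;
import org.apache.ignite.client.ClientConnectionException;
import org.apache.ignite.client.IgniteClient;
import org.apache.ignite.configuration.ClientConfiguration;

public class ManualFailover {
    private static final String[] ADDRESSES = {"127.0.0.1:10800", "127.0.0.1:10801"};

    // Runs the given action against the first reachable server node.
    static void withClient(Consumer<IgniteClient> action) {
        for (String addr : ADDRESSES) {
            try (IgniteClient client = Ignition.startClient(
                    new ClientConfiguration().setAddresses(addr))) {
                action.accept(client);
                return; // succeeded against this node
            } catch (ClientConnectionException e) {
                // This node appears to be down; try the next address.
            }
        }
        throw new IllegalStateException("No Ignite server node is reachable");
    }

    public static void main(String[] args) {
        withClient(client -> System.out.println(
            client.getOrCreateCache("put-get-example").get(1)));
    }
}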

Unable to configure Hazelcast clustering at openfire server

I am trying to set up a Hazelcast cluster on my LAN. There are two Openfire servers, openfire1 and openfire2, behind an HAProxy and sharing a common DB server.
I have installed and configured Openfire and Hazelcast on both. Now, when I look at the clustering section in the Openfire Admin Console: openfire1 does not list the other node in the cluster, and openfire2 does not allow me to enable clustering. Whenever I see the error, I see the following log:
2014.12.06 12:12:46 com.jivesoftware.util.cache.ClusteredCacheFactory - Failed to execute cluster task within 30 seconds
java.util.concurrent.TimeoutException
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.resolveResponse(InvocationImpl.java:466)
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.get(InvocationImpl.java:314)
at com.hazelcast.util.executor.DelegatingFuture.get(DelegatingFuture.java:66)
at com.jivesoftware.util.cache.ClusteredCacheFactory.doSynchronousClusterTask(ClusteredCacheFactory.java:334)
at org.jivesoftware.util.cache.CacheFactory.doSynchronousClusterTask(CacheFactory.java:586)
at org.jivesoftware.openfire.admin.system_002dclustering_jsp._jspService(system_002dclustering_jsp.java:123)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:547)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1359)
at com.opensymphony.module.sitemesh.filter.PageFilter.parsePage(PageFilter.java:118)
at com.opensymphony.module.sitemesh.filter.PageFilter.doFilter(PageFilter.java:52)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1330)
at org.jivesoftware.util.LocaleFilter.doFilter(LocaleFilter.java:74)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1330)
at org.jivesoftware.util.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:50)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1330)
at org.jivesoftware.admin.PluginFilter.doFilter(PluginFilter.java:78)
Please tell me if I am doing something wrong.
I have now overcome this problem, simply by using the hazelcast.jar that corresponds to Openfire 3.9.3: the Hazelcast plugin version matching the Openfire release (3.9.3) is 1.2.0.
Direct download link: https://community.igniterealtime.org/external-link.jspa?url=http%3A%2F%2Fwww.igniterealtime.org%2Fprojects%2Fopenfire%2Fplugins-dev%2Fhazelcast.jar
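Besides the version match, it is worth checking that the two nodes can actually discover each other. A hedged sketch of a TCP/IP member list in the plugin's Hazelcast config (the exact file name and location vary by plugin version; hazelcast-cache-config.xml inside the plugin directory is an assumption here):

<hazelcast>
  <network>
    <join>
      <!-- Multicast is often blocked on LANs; list the members explicitly. -->
      <multicast enabled="false"/>
      <tcp-ip enabled="true">
        <member>openfire1</member>
        <member>openfire2</member>
      </tcp-ip>
    </join>
  </network>
</hazelcast>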

How to submit code to a remote Spark cluster from IntelliJ IDEA

I have two clusters, one in a local virtual machine and another in a remote cloud. Both clusters are in standalone mode.
My Environment:
Scala: 2.10.4
Spark: 1.5.1
JDK: 1.8.40
OS: CentOS Linux release 7.1.1503 (Core)
The local cluster:
Spark Master: spark://local1:7077
The remote cluster:
Spark Master: spark://remote1:7077
What I want to do is this:
Write code (just a simple word count) in IntelliJ IDEA locally (on my laptop), set the Spark master URL to spark://local1:7077 or spark://remote1:7077, and then run it from IntelliJ IDEA. That is, I don't want to use spark-submit to submit a job.
But I got some problem:
When I use the local cluster, everything goes well: running the code from IntelliJ IDEA or using spark-submit both submit the job to the cluster, and the job finishes.
But when I use the remote cluster, I get a warning log:
TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Note that the warning says sufficient resources, not sufficient memory!
This log keeps printing, with no further progress. Both spark-submit and running the code from IntelliJ IDEA give the same result.
I want to know:
Is it possible to submit code from IntelliJ IDEA to the remote cluster?
If so, what configuration is needed?
What are the possible causes of my problem?
How can I handle it?
Thanks a lot!
Update
There is a similar question here, but I think my scenario is different. When I run my code in IntelliJ IDEA with the Spark master set to the local virtual machine cluster, it works; against the remote cluster I get the Initial job has not accepted any resources;... warning instead.
I would like to know whether a security policy or firewall could cause this.
Submitting code programmatically (e.g. via SparkSubmit) is quite tricky. At the least, there is a variety of environment settings and considerations, handled by the spark-submit script, that are quite difficult to replicate within a Scala program. I am still uncertain how to achieve it, and there have been a number of long-running threads in the Spark developer community on the topic.
My answer here addresses one portion of your post, specifically the
TaskSchedulerImpl: Initial job has not accepted any resources; check
your cluster UI to ensure that workers are registered and have
sufficient resources
The reason is typically a mismatch between the memory and/or number of cores your job requests and what is available on the cluster. Possibly, when submitting from IntelliJ, the settings in
$SPARK_HOME/conf/spark-defaults.conf
did not match the parameters required by your task on the existing cluster. You may need to update:
spark.driver.memory 4g
spark.executor.memory 8g
spark.executor.cores 8
You can check the Spark UI on port 8080 to verify that the parameters you requested are actually available on the cluster.
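Also note that when you launch from IntelliJ IDEA, the driver is your IDE's JVM, which is already running, so spark.driver.memory cannot take effect there; executor-side resource settings can instead be passed on the SparkConf. A minimal sketch (Spark 1.5.1 Java API; the values are illustrative):

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class RemoteSubmitSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("word-count")
            .setMaster("spark://remote1:7077")
            // Request no more than the remote workers can provide
            // (check the master UI on port 8080 for the real limits):
            .set("spark.executor.memory", "1g")
            .set("spark.cores.max", "2");
            // Classes used in closures must also reach the executors,
            // e.g. via conf.setJars(new String[]{"target/your-app.jar"}).

        JavaSparkContext sc = new JavaSparkContext(conf);
        System.out.println(sc.parallelize(Arrays.asList(1, 2, 3)).count());
        sc.stop();
    }
}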

Where are Ambari Macros set

In the Knox config file in Ambari we have defined:
<url>http://{{namenode_host}}:{{namenode_http_port}}/webhdfs</url>
The problem is we have two namenodes, one active and one passive, for high availability. Our active namenode01 failed, so namenode02 became active.
This caused problems for a lot of scripts, as they were hardcoded to point to namenode01. So we ran a command in a terminal, not Ambari, to fail back from namenode02 to namenode01.
Now the macro {{namenode_host}} is defined as namenode02, not namenode01.
So, where is {{namenode_host}} defined?
Or do we need to fail over from namenode01 to namenode02, then fail over again to namenode01 using Ambari, to update the macro?
If we need to fail over the namenode using Ambari, I'm assuming we need to select the "Restart" option? There isn't a direct failover command.
See the issue here:
https://issues.apache.org/jira/browse/AMBARI-12763
This was committed to Ambari to support HA mode for Knox. However, if you're still looking for the location, take a look at the file that is edited in the patch. That file is the place where the macros are set. You'll have to find it on your own machine, though.
It should be something like params_linux.py.