Frequent TcpCommunicationSpi error in Gridgain Client node - ignite

We have a 3-node GridGain cluster in Kubernetes. The GridGain version is 8.8.8, and the client nodes, written in Java, log the following error frequently. As far as I can see, data is being inserted and SQL queries are working. What is the reason for this error, and can it affect data? We use GridGain servers with persistence enabled, so data loss should not be possible.
2021-10-01 06:07:45:113 ERROR TcpCommunicationSpi: Failed to read data from remote connection (will wait for 2000ms).
org.apache.ignite.IgniteCheckedException: Failed to select events on selector.
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2300)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1891)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException: null
at java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:204)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2109)
... 3 common frames omitted
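For reference, the client node is started roughly like this (a minimal sketch; the class name and timeout values are illustrative placeholders, not our exact production configuration):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;

public class ClientNode {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setClientMode(true); // thick client joining the 3-node cluster

        // Communication SPI with explicit timeouts; the values are illustrative only.
        TcpCommunicationSpi commSpi = new TcpCommunicationSpi();
        commSpi.setConnectTimeout(10_000);         // ms to establish a connection
        commSpi.setIdleConnectionTimeout(600_000); // ms before an idle connection is closed
        cfg.setCommunicationSpi(commSpi);

        Ignite ignite = Ignition.start(cfg);
        System.out.println("Joined cluster with " + ignite.cluster().nodes().size() + " nodes");
    }
}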

Related

Exception in thread "main" org.apache.kafka.common.KafkaException: Failed to construct kafka consumer: SSL Error

I have this error
Caused by: org.apache.kafka.common.KafkaException: SSL trust store is specified, but trust store password is not specified.
at org.apache.kafka.common.security.ssl.SslFactory.createTruststore(SslFactory.java:184)
at org.apache.kafka.common.security.ssl.SslFactory.configure(SslFactory.java:104)
at org.apache.kafka.common.network.SslChannelBuilder.configure(SslChannelBuilder.java:41)
The same code deployed in other environments works just fine.
The code acts as a Kafka consumer.
If I query using the Kafka console consumer, it fetches data properly.
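For reference, the consumer is set up roughly like this (a sketch; the broker, group id, topic, and file paths are placeholders, and ssl.truststore.password is the property the exception complains about):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SslConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker.example.com:9093"); // placeholder
        props.put("group.id", "example-group");                    // placeholder
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("security.protocol", "SSL");
        props.put("ssl.truststore.location", "/path/to/truststore.jks"); // placeholder path
        props.put("ssl.truststore.password", "changeit"); // omitting this triggers the exception above

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic")); // placeholder topic
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
            records.forEach(r -> System.out.println(r.value()));
        }
    }
}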

Strange intermittent issue with MariaDB Connector/J and AWS RDS Aurora

Config goes as follows:
AWS RDS Aurora cluster, single region, single master, two read replicas
Java nodes running within the same VPC
MariaDB Connector/J with Aurora failover mode
HikariCP as the connection pool
HikariCP has the default maxLifetime of 30 minutes
Aurora has the default wait_timeout of 8 hours
In order to instruct MariaDB Connector/J to execute a given query against an Aurora read replica rather than the master, I'm calling Connection.setReadOnly(true) after obtaining the connection from the HikariDataSource, as sketched below.
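Roughly like this (a simplified sketch; the JDBC URL, credentials, and query are placeholders):

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ReplicaQuery {
    public static void main(String[] args) throws Exception {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mariadb:aurora://xxx.cluster-xxx.us-east-1.rds.amazonaws.com:3306/mydb"); // placeholder
        config.setUsername("user");   // placeholder
        config.setPassword("secret"); // placeholder
        // maxLifetime is left at the HikariCP default of 30 minutes

        try (HikariDataSource ds = new HikariDataSource(config);
             Connection conn = ds.getConnection()) {
            conn.setReadOnly(true); // ask Connector/J to route the query to a read replica
            try (PreparedStatement ps = conn.prepareStatement("SELECT 1"); // placeholder query
                 ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getInt(1));
                }
            }
        }
    }
}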
Everything works well except... about twice a week on average, 2% of the Java node fleet will hit an exception while trying to query a read replica. The exceptions occur on different nodes within a second or two of one another, against the same replica.
The underlying exception stack looks as follows:
Caused by: java.sql.SQLInvalidAuthorizationSpecException: (conn=2445137) Communications link failure with secondary host xxx.xxx.us-east-1.rds.amazonaws.com:3306. (conn=1644010) Connection is closed
at org.mariadb.jdbc.internal.util.exceptions.ExceptionFactory.createException(ExceptionFactory.java:66)
at org.mariadb.jdbc.internal.util.exceptions.ExceptionFactory.create(ExceptionFactory.java:153)
at org.mariadb.jdbc.MariaDbStatement.executeExceptionEpilogue(MariaDbStatement.java:273)
at org.mariadb.jdbc.ClientSidePreparedStatement.executeInternal(ClientSidePreparedStatement.java:229)
at org.mariadb.jdbc.ClientSidePreparedStatement.execute(ClientSidePreparedStatement.java:149)
at org.mariadb.jdbc.ClientSidePreparedStatement.executeQuery(ClientSidePreparedStatement.java:163)
at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52)
at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java)
at com.querydsl.sql.AbstractSQLQuery.iterateSingle(AbstractSQLQuery.java:370)
... 21 more
Caused by: java.sql.SQLException: Communications link failure with secondary host xxx.xxx.us-east-1.rds.amazonaws.com:3306. (conn=1644010) Connection is closed
on HostAddress{host='xxx.xxx.us-east-1.rds.amazonaws.com', port=3306},master=false. Driver has reconnect connection
at org.mariadb.jdbc.internal.failover.AbstractMastersListener.throwFailoverMessage(AbstractMastersListener.java:563)
at org.mariadb.jdbc.internal.failover.FailoverProxy.handleFailOver(FailoverProxy.java:391)
at org.mariadb.jdbc.internal.failover.FailoverProxy.executeInvocation(FailoverProxy.java:324)
at org.mariadb.jdbc.internal.failover.FailoverProxy.invoke(FailoverProxy.java:294)
at com.sun.proxy.$Proxy27.executeQuery(Unknown Source)
at org.mariadb.jdbc.ClientSidePreparedStatement.executeInternal(ClientSidePreparedStatement.java:220)
... 26 more
Next, if I review the MySQL error log for that Aurora instance, it reports aborting the connections seconds, or sometimes a minute, after the exceptions occur on the Java nodes:
1644010 [Note] Aborted connection 1644010 to db: 'xxx' user: 'xxx' host: 'xxx' (Got an error reading communication packets)
Has anyone run into anything similar? Any ideas as to what could be causing this?

Ignite Connectivity Issues

We have a 3-node Ignite 2.7.6 cluster where we defined an off-heap memory/data region "datawarm" with a max size of 10 GB and persistence enabled. We have been facing a very strange issue for more than a month. When connecting via SQLLine or the Java thin client API, suddenly only 2 of the three nodes, or sometimes only 1, respond (in a random order each time). Restarting the cluster resolves the issue each time, but 3-4 hours after the restart it begins again. When checking the logs on the node where the connection is not established, we only found the line below. We don't have any clue how to resolve this.
Jul 21 18:08:13 node1.example.com Ignite[6097]: 2020-07-21 18:08:13,219 DEBUG c.e.d.Logger [grid-nio-worker-client-listener-0-#30] Got client connection from address: /10.0.0.12:42262
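For reference, the "datawarm" region is defined roughly like this on the server nodes (a sketch; everything except the region name and size is simplified):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ServerNode {
    public static void main(String[] args) {
        DataRegionConfiguration dataWarm = new DataRegionConfiguration();
        dataWarm.setName("datawarm");
        dataWarm.setMaxSize(10L * 1024 * 1024 * 1024); // 10 GB off-heap
        dataWarm.setPersistenceEnabled(true);

        DataStorageConfiguration storageCfg = new DataStorageConfiguration();
        storageCfg.setDataRegionConfigurations(dataWarm);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDataStorageConfiguration(storageCfg);

        Ignite ignite = Ignition.start(cfg);
        ignite.cluster().active(true); // persistence requires explicit activation
    }
}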

classNotFoundException when running two ignite servers

When I try to run two Ignite servers, I get the following errors:
1) Failed to find class with given class loader for unmarshalling.
2) Caused by: java.lang.ClassNotFoundException: rg.netlink.app.config.ServerConfigurationFactory$2
Even with peerClassLoadingEnabled set on both servers, this error persists.
Please help.
How can I run two Ignite servers? Has anybody successfully run two Ignite servers?
Can you figure out what ServerConfigurationFactory$2 is?
I would imagine that for some reason your Ignite node contains some class in its configuration which is absent on the other nodes. Nodes pass their configuration to each other on discovery, so this will cause problems. Make sure that you only use stock Ignite configuration classes and do not override them with custom implementations/wrappers.
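To illustrate the point about custom classes: a name like ServerConfigurationFactory$2 is what the compiler gives the second anonymous inner class inside the poster's ServerConfigurationFactory. A hedged sketch of how such a class can end up in the node configuration (the LifecycleBean here is only an assumed example of where the anonymous class might come from):

import org.apache.ignite.IgniteException;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.lifecycle.LifecycleBean;
import org.apache.ignite.lifecycle.LifecycleEventType;

public class ServerConfigurationFactory {
    public IgniteConfiguration create() {
        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setPeerClassLoadingEnabled(true); // helps with task/closure classes, not configuration objects

        // This anonymous class compiles to ServerConfigurationFactory$1, $2, ...
        // If, as described above, it reaches other nodes as part of the configuration,
        // a node without this class on its classpath fails to unmarshal it and
        // throws ClassNotFoundException.
        cfg.setLifecycleBeans(new LifecycleBean() {
            @Override public void onLifecycleEvent(LifecycleEventType evt) throws IgniteException {
                System.out.println("Lifecycle event: " + evt);
            }
        });
        return cfg;
    }
}

Replacing such anonymous classes with stock Ignite classes, or with named classes deployed on every node's classpath, avoids the unmarshalling failure.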

SAP HANA Vora distributed log service refused to start

I installed SAP HANA Vora on a 3 node MapR cluster. While trying to bring up Vora service via Vora Manager UI, I get the following error:
Error occurred while starting all services: vora-dlog refused to start. Cannot continue Start All Jobs. Error: There are no health checks registered for service vora-dlog.
The vora-manager log file displays the following error:
vora.vora-dlog: [c.xxxxxxx] : Error while creating dlog store.
nomad[xxxxx]: client: failed to query for node allocations: no known servers
nomad[xxxxx]: client:rpcproxy: No servers available.
All 3 nodes in the cluster have 2 IPs in different subnets. Can anyone suggest how to configure a health check for Consul? And what else could be wrong here?
The messages from the VoraMgr log file are not sufficient to understand the actual problem. Are there other messages from dlog before 'Error while creating dlog store.'? I have seen that message, for example, when the disk was full and the dlog could not create its local persistence.
Also, the 2 different networks could cause an issue like the one you described. You can configure the use of different network interface names on different nodes. However, on each node all Vora services as well as the Vora Manager must use the same network interface name. If you use 2 different subnets, the configuration must allow network traffic between them. Could you give some additional info on your topology and network configuration?