<redisson.version>3.16.3</redisson.version>
readMode: SLAVE
I'm not sure why I am seeing this error.
caused by: org.redisson.client.RedisException: ERR WAIT cannot be used with slave instances. Please also note that since Redis 4.0 if a slave is configured to be writable (which is not the default) writes to slaves are just local and are not propagated.. channel: [id: 0x90c58500, L:/172.16.189.108:43192 - R:10.112.94.132/10.112.94.132:15003] command: (WAIT), promise: RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise#35eeb35e(incomplete)], params: [1, 1000]
at org.redisson.client.handler.CommandDecoder.decode(CommandDecoder.java:370)
at org.redisson.client.handler.CommandDecoder.decodeCommandBatch(CommandDecoder.java:271)
at org.redisson.client.handler.CommandDecoder.decodeCommand(CommandDecoder.java:210)
at org.redisson.client.handler.CommandDecoder.decode(CommandDecoder.java:137)
at org.redisson.client.handler.CommandDecoder.decode(CommandDecoder.java:113)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502)
at io.netty.handler.codec.ReplayingDecoder.callDecode(ReplayingDecoder.java:366)
I have a service running a task definition with three containers:
service itself
envoy
x-ray daemon
And I want to trace and monitor my services interacting with each other with x-ray.
But I don't see any data in x-ray.
I can see the request logs and everything in the envoy logs but there are no error messages about missing connection to the x-ray daemon.
Envoy container has three env variables:
APPMESH_VIRTUAL_NODE_NAME = mesh/mesh-name/virtualNode/service-virtual-node
ENABLE_ENVOY_XRAY_TRACING = 1
ENVOY_LOG_LEVEL = trace
The x-ray daemon is pretty plain and has just a name and an image (amazon/aws-xray-daemon:1).
But when looking at the logs of the x-ray daemon, there is only the following:
2022-05-31T14:48:05.042+02:00 2022-05-31T12:48:05Z [Info] Initializing AWS X-Ray daemon 3.0.0
2022-05-31T14:48:05.042+02:00 2022-05-31T12:48:05Z [Info] Using buffer memory limit of 76 MB
2022-05-31T14:48:05.042+02:00 2022-05-31T12:48:05Z [Info] 1216 segment buffers allocated
2022-05-31T14:48:05.051+02:00 2022-05-31T12:48:05Z [Info] Using region: eu-central-1
2022-05-31T14:48:05.788+02:00 2022-05-31T12:48:05Z [Error] Get instance id metadata failed: RequestError: send request failed
2022-05-31T14:48:05.788+02:00 caused by: Get http://169.254.169.254/latest/meta-data/instance-id: dial tcp xxx.xxx.xxx.254:80: connect: invalid argument
2022-05-31T14:48:05.789+02:00 2022-05-31T12:48:05Z [Info] Starting proxy http server on 127.0.0.1:2000
As far as I read, the error you can see in these logs doesn't affect the functionality (https://repost.aws/questions/QUr6JJxyeLRUK5M4tadg944w).
I'm pretty sure I'm missing a configuration or access right.
It's already running on staging, but I set that up several weeks ago and I can't find any differences between the configurations.
Thanks in advance!
In my case, I made a copy-paste mistake: I copied a trailing line break into the name of the environment variable ENABLE_ENVOY_XRAY_TRACING. It wasn't visible in the overview, only inside the text field.
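A stray line break in a variable name is easy to miss by eye, so a small check can help. The following is a minimal sketch in Java, assuming the environment entries of the container definition have already been loaded into a map. The class name EnvNameCheck and the sample map are hypothetical and not part of any AWS tooling.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: flags environment variable names that contain invisible characters. */
final class EnvNameCheck {
    private EnvNameCheck() {}

    /** Returns the variable names that contain whitespace or control characters. */
    static List<String> suspiciousNames(Map<String, String> env) {
        List<String> bad = new ArrayList<>();
        for (String name : env.keySet()) {
            boolean suspicious = name.chars()
                    .anyMatch(c -> Character.isWhitespace(c) || Character.isISOControl(c));
            if (suspicious) {
                bad.add(name);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Sample data standing in for the container's environment entries.
        Map<String, String> env = new LinkedHashMap<>();
        env.put("ENABLE_ENVOY_XRAY_TRACING\n", "1"); // name with a pasted trailing line break
        env.put("ENVOY_LOG_LEVEL", "trace");

        for (String name : suspiciousNames(env)) {
            // Escape the line break so the problem is visible in the output.
            System.out.println("suspicious name: \"" + name.replace("\n", "\\n") + "\"");
        }
    }
}
```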
We have a setup in which one Ignite server node serves 15 to 20 thick client nodes and 40 to 50 thin client nodes. The thin client connection is a singleton.
During operation, we sometimes get the error below:
org.apache.ignite.client.ClientConnectionException: Ignite cluster is unavailable [sock=Socket[addr=hostnm19.hostx.com/10.13.10.19,port=30519,localport=57552]]
On the server node, we insert data into a third-party store using CacheStoreAdapters.
I don't know where it goes wrong, since about one out of 100 operations fails with the above error.
Also, please let me know what we can do to handle this failure.
Apache Ignite version: 2.8
Edits: (Code Snippet)
ClientConfiguration cfg = new ClientConfiguration()
    .setAddresses("host:port");
IgniteClient client = Ignition.startClient(cfg); // this client is a singleton
client.getOrCreateCache("ABC_CACHE").put(key, val);
Stack trace:
org.apache.ignite.client.ClientConnectionException: Ignite cluster is unavailable [sock=Socket[addr=hostnm19.hostx.com/10.13.10.19,port=30519,localport=57552]]
at org.apache.ignite.internal.client.thin.TcpClientChannel.handleIOError(TcpClientChannel.java:499)
at org.apache.ignite.internal.client.thin.TcpClientChannel.handleIOError(TcpClientChannel.java:491)
at org.apache.ignite.internal.client.thin.TcpClientChannel.access$100(TcpClientChannel.java:92)
at org.apache.ignite.internal.client.thin.TcpClientChannel$ByteCountingDataInput.read(TcpClientChannel.java:538)
at org.apache.ignite.internal.client.thin.TcpClientChannel$ByteCountingDataInput.readInt(TcpClientChannel.java:572)
at org.apache.ignite.internal.client.thin.TcpClientChannel.processNextResponse(TcpClientChannel.java:272)
at org.apache.ignite.internal.client.thin.TcpClientChannel.receive(TcpClientChannel.java:234)
at org.apache.ignite.internal.client.thin.TcpClientChannel.service(TcpClientChannel.java:171)
at org.apache.ignite.internal.client.thin.ReliableChannel.service(ReliableChannel.java:160)
at org.apache.ignite.internal.client.thin.ReliableChannel.request(ReliableChannel.java:187)
at org.apache.ignite.internal.client.thin.TcpIgniteClient.getOrCreateCache(TcpIgniteClient.java:114)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.apache.ignite.internal.client.thin.TcpClientChannel$ByteCountingDataInput.read(TcpClientChannel.java:535)
... 36 more
You probably have a network or NAT configuration that resets connections when they are idle, or even sporadically.
In this case, you will have to reconnect.
Another possibility: are you sure you are connecting to the thin client port and not some other port?
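One way to handle the occasional reset is to retry the failed operation a few times before giving up. The following is a minimal, library-independent sketch. RetryingCall and its parameters are hypothetical names, and the demo simulates the failure with IllegalStateException. In real code you would pass ClientConnectionException.class instead, assuming it is an unchecked exception in your Ignite version.

```java
import java.util.function.Supplier;

/** Hypothetical retry helper for transient connection failures; not part of Ignite. */
final class RetryingCall {
    private RetryingCall() {}

    /**
     * Runs the operation, retrying when it throws an exception of the given type.
     *
     * @param op          operation to run, e.g. a cache put through the thin client
     * @param retryOn     exception type treated as transient
     * @param maxAttempts total number of attempts, must be at least 1
     * @param backoffMs   pause between attempts in milliseconds
     */
    static <T> T call(Supplier<T> op, Class<? extends RuntimeException> retryOn,
                      int maxAttempts, long backoffMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                if (!retryOn.isInstance(e)) {
                    throw e; // not a transient failure, do not retry
                }
                last = e;
                if (attempt < maxAttempts) {
                    sleep(backoffMs);
                }
            }
        }
        throw last; // all attempts failed with the transient exception
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", ie);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = call(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("connection reset"); // simulated failure
            }
            return "ok";
        }, IllegalStateException.class, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

With the thin client, the wrapped operation would be the cache call itself, for example the put on ABC_CACHE from the question. Whether a retried put is safe depends on whether the operation is idempotent in your application.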
I'm using the RPC pattern to process my objects with RabbitMQ.
Suppose I have an object: I want its processing to finish first, and only after that send the ack to the RPC client.
By default, the ack has a timeout of about 3 minutes.
My processing takes a long time.
How can I change this ack timeout for each object, or what must I do to handle long-running processes like this?
Modern versions of RabbitMQ have a delivery acknowledgement timeout:
In modern RabbitMQ versions, a timeout is enforced on consumer delivery acknowledgement. This helps detect buggy (stuck) consumers that never acknowledge deliveries. Such consumers can affect node's on disk data compaction and potentially drive nodes out of disk space.
If a consumer does not ack its delivery for more than the timeout value (30 minutes by default), its channel will be closed with a PRECONDITION_FAILED channel exception. The error will be logged by the node that the consumer was connected to.
Error message will be:
Channel error on connection <####> :
operation none caused a channel exception precondition_failed: consumer ack timed out on channel 1
The timeout is 30 minutes (1,800,000 ms) by default [note 1] and is configured by the consumer_timeout parameter in rabbitmq.conf.
Note 1: The timeout was 15 minutes (900,000 ms) before RabbitMQ 3.8.17.
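Since consumer_timeout is given in milliseconds, it can help to make the conversion explicit. The following is a small sketch that renders the rabbitmq.conf line from a duration. The class name ConsumerTimeout and the two-hour value are illustrative assumptions, not recommendations.

```java
import java.time.Duration;

/** Hypothetical helper that renders a consumer_timeout line for rabbitmq.conf. */
final class ConsumerTimeout {
    private ConsumerTimeout() {}

    /** Returns the config line for the given ack timeout; the value is in milliseconds. */
    static String confLine(Duration timeout) {
        if (timeout.isNegative() || timeout.isZero()) {
            throw new IllegalArgumentException("timeout must be positive");
        }
        return "consumer_timeout = " + timeout.toMillis();
    }

    public static void main(String[] args) {
        System.out.println(confLine(Duration.ofMinutes(30))); // prints "consumer_timeout = 1800000"
        System.out.println(confLine(Duration.ofHours(2)));    // prints "consumer_timeout = 7200000"
    }
}
```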
If you run RabbitMQ in Docker, you can mount a rabbitmq.conf file into the container as a volume and set consumer_timeout in that file.
For example, with Docker Compose:
version: "2.4"
services:
  rabbitmq:
    image: rabbitmq:3.9.13-management-alpine
    network_mode: host
    container_name: 'your name'
    ports:
      - 5672:5672
      - 15672:15672  # only if you use the RabbitMQ management GUI
    volumes:
      - /etc/rabbitmq/rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf
You also need to create the file rabbitmq.conf on your server in /etc/rabbitmq/.
Documentation with the available parameters: https://github.com/rabbitmq/rabbitmq-server/blob/v3.8.x/deps/rabbit/docs/rabbitmq.conf.example
Slave status :
Last_IO_Errno: 1595
Last_IO_Error: Relay log write failure: could not queue event from master
Last_SQL_Errno: 0
From the error log:
[ERROR] Slave I/O for channel 'db12': Unexpected master's heartbeat data: heartbeat is not compatible with local info; the event's data: log_file_name toku10-bin.000063<D1> log_pos 97223067, Error_code: 1623
[ERROR] Slave I/O for channel 'db12': Relay log write failure: could not queue event from master, Error_code: 1595
I tried restarting the slave I/O thread many times, but the result is still the same.
We have to keep starting the io_thread manually whenever it stops; I suspect this is a bug in Percona.
I simply wrote a shell script and scheduled it to run every 10 minutes: if the io_thread is not running, it executes start slave io_thread for channel 'db12';. It is working for now.
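The original workaround is a shell script. The decision it makes can also be sketched in Java. This is a hypothetical version that only looks at the text of one SHOW SLAVE STATUS block and returns the statement to run. The class name IoThreadWatchdog is an assumption, and fetching the status and executing the statement are left out.

```java
import java.util.Optional;

/** Hypothetical sketch of the watchdog decision; the original workaround is a shell script. */
final class IoThreadWatchdog {
    private IoThreadWatchdog() {}

    /**
     * Inspects one block of SHOW SLAVE STATUS output and returns the statement
     * to run when the I/O thread is stopped, or empty when nothing should be done.
     */
    static Optional<String> restartStatement(String statusText, String channel) {
        for (String line : statusText.split("\\R")) {
            String[] kv = line.trim().split(":", 2);
            if (kv.length == 2 && kv[0].trim().equals("Slave_IO_Running")) {
                // "Yes" and "Connecting" both mean the thread is alive; only "No" means stopped.
                boolean stopped = kv[1].trim().equalsIgnoreCase("No");
                return stopped
                        ? Optional.of("START SLAVE IO_THREAD FOR CHANNEL '" + channel + "'")
                        : Optional.empty();
            }
        }
        return Optional.empty(); // field not found: do nothing rather than guess
    }

    public static void main(String[] args) {
        // Sample status text standing in for real server output.
        String status = "Slave_IO_Running: No\nSlave_SQL_Running: Yes\nLast_IO_Errno: 1595";
        restartStatement(status, "db12").ifPresent(System.out::println);
    }
}
```

Note that this only automates the restart. It does not address the cause of error 1595.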
I would like to SSH into a remote machine running a gridgain instance and connect to it from a local gridgain instance. Can this be done?
How does GridGain establish its network connections? As far as I could see, the node spins up and listens on the first available port in the range 47100-47200, but it opens some more ports too.
It does not seem to be sufficient to just forward, e.g., 47100 on the remote machine (the remote machine's GridGain port) to local 47100. Probably the communication is not just client-server but symmetrical, with the remote node trying to connect to my home node?
Is there documentation on the network protocol?
I tried symmetrically forwarding the
GridTcpCommunicationSpi.DFLT_PORTs (47100+) and
GridTcpDiscoverySpi.DFLT_PORTs (47500+)
ports.
The nodes are able to connect. On the local node I first get this warning:
WARN GridTcpCommunicationSpi - Connect timed out (consider increasing 'connTimeout' configuration property) [addr=/10.240.136.167:47100]
WARN GridTcpDiscoverySpi - Timed out waiting for message delivery receipt (most probably, the reason is in long GC pauses on remote node; consider tuning GC and increasing 'ackTimeout' configuration property). Will retry to send message with increased timeout. Current timeout: 5000.
WARN GridDhtPreloader - <gg-utility-sys-cache> Failed to wait for initial partition map exchange. Possible reasons are:
^-- Transactions in deadlock.
^-- Long running transactions (ignore if this is the case).
^-- Unreleased explicit locks.
WARN GridTcpDiscoverySpi - Timed out waiting for message to be read (most probably, the reason is in long GC pauses on remote node. Current timeout: 5000.
This is a timeout while trying to connect to 10.240.136.167:47100, which is the remote machine's local IP and is obviously unreachable from my side.
But it looks promising, as I then get the following:
INFO GridDiscoveryManager - Topology snapshot [ver=2, nodes=2, CPUs=6, heap=2.7GB]
On executing the following broadcast test:
grid.compute().broadcast(new GridRunnable() {
    @Override
    public void run() {
        System.out.println("hello!");
    }
});
I get this fatal error on the remote machine, whatever it may be:
[SEVERE][gridgain-#9%pub-null%][GridJobProcessor] Task was not deployed or was redeployed since task execution [taskName=nix.GoogleGridRun$Test, taskClsName=at$
at org.gridgain.grid.kernal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1732)
at org.gridgain.grid.kernal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:654)
at org.gridgain.grid.kernal.managers.communication.GridIoManager.access$1800(GridIoManager.java:62)
at org.gridgain.grid.kernal.managers.communication.GridIoManager$6.body(GridIoManager.java:615)
at org.gridgain.grid.util.worker.GridWorker.run(GridWorker.java:151)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[19:58:02,237][SEVERE][gridgain-#11%pub-null%][GridJobProcessor] Task was not deployed or was redeployed since task execution [taskName=nix.GoogleGridRun$1, taskClsName=at.a$
For more information see:
Troubleshooting: http://bit.ly/GridGain-Troubleshooting
Documentation Center: http://bit.ly/GridGain-Documentation
class org.gridgain.grid.GridDeploymentException: Task was not deployed or was redeployed since task execution [taskName=nix.GoogleGridRun$1, taskClsName=at.ac.ait.is.infrase$
For more information see:
Troubleshooting: http://bit.ly/GridGain-Troubleshooting
Documentation Center: http://bit.ly/GridGain-Documentation
at org.gridgain.grid.kernal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1107)
at org.gridgain.grid.kernal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1732)
at org.gridgain.grid.kernal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:654)
at org.gridgain.grid.kernal.managers.communication.GridIoManager.access$1800(GridIoManager.java:62)
at org.gridgain.grid.kernal.managers.communication.GridIoManager$6.body(GridIoManager.java:615)
On the client side I don't see anything but:
INFO GridDeploymentLocalStore - Class locally deployed: class nix.GoogleGridRun$1
hello!
When I try to push the broadcast again via the debugger, then I get the following on the local machine and the same error message as before on the remote machine:
ERROR GridTaskWorker - Failed to obtain remote job result policy for result from GridComputeTask.result(..) method (will fail the whole task): GridJobResultImpl [job=o.g.g.kernal.processors.closure.GridClosureProcessor$10#7e89183d, sib=GridJobSiblingImpl [sesId=4c17983b841-43f8b9fa-87ae-4a20-99a1-8d36f5eb74a4, jobId=0d17983b841-ef0084a6-f6a7-4501-87a0-3c5eb7c72bca, nodeId=ef0084a6-f6a7-4501-87a0-3c5eb7c72bca, isJobDone=false], jobCtx=GridJobContextImpl [jobId=0d17983b841-ef0084a6-f6a7-4501-87a0-3c5eb7c72bca, attrs={}], node=GridTcpDiscoveryNode [id=ef0084a6-f6a7-4501-87a0-3c5eb7c72bca, addrs=[10.240.136.167, 127.0.0.1], sockAddrs=[/10.240.136.167:47500, /10.240.136.167:47500, /127.0.0.1:47500], discPort=47500, order=1, loc=false, ver=6.5.0#20140925-sha1:6dc3d773], ex=class o.g.g.GridDeploymentException: Task was not deployed or was redeployed since task execution [taskName=nix.GoogleGridRun$Test, taskClsName=nix.GoogleGridRun$Test, codeVer=0, clsLdrId=eb17983b841-43f8b9fa-87ae-4a20-99a1-8d36f5eb74a4, seqNum=1411761402302, depMode=SHARED, dep=null]
For more information see:
Troubleshooting: http://bit.ly/GridGain-Troubleshooting
Documentation Center: http://bit.ly/GridGain-Documentation
, hasRes=true, isCancelled=false, isOccupied=true]
class org.gridgain.grid.GridException: Remote job threw user exception (override or implement GridComputeTask.result(..) method if you would like to have automatic failover for this exception).
at org.gridgain.grid.compute.GridComputeTaskAdapter.result(GridComputeTaskAdapter.java:109)
at org.gridgain.grid.kernal.processors.task.GridTaskWorker$3.apply(GridTaskWorker.java:819)
at org.gridgain.grid.kernal.processors.task.GridTaskWorker$3.apply(GridTaskWorker.java:812)
at org.gridgain.grid.util.GridUtils.wrapThreadLoader(GridUtils.java:6093)
at org.gridgain.grid.kernal.processors.task.GridTaskWorker.result(GridTaskWorker.java:812)
at org.gridgain.grid.kernal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:708)
at org.gridgain.grid.kernal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:906)
at org.gridgain.grid.kernal.processors.task.GridTaskProcessor$JobMessageListener.onMessage(GridTaskProcessor.java:1138)
at org.gridgain.grid.kernal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:654)
at org.gridgain.grid.kernal.managers.communication.GridIoManager.access$1800(GridIoManager.java:62)
at org.gridgain.grid.kernal.managers.communication.GridIoManager$6.body(GridIoManager.java:615)
at org.gridgain.grid.util.worker.GridWorker.run(GridWorker.java:151)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: class org.gridgain.grid.GridDeploymentException: Task was not deployed or was redeployed since task execution [taskName=nix.GoogleGridRun$Test, taskClsName=nix.GoogleGridRun$Test, codeVer=0, clsLdrId=eb17983b841-43f8b9fa-87ae-4a20-99a1-8d36f5eb74a4, seqNum=1411761402302, depMode=SHARED, dep=null]
For more information see:
Troubleshooting: http://bit.ly/GridGain-Troubleshooting
Documentation Center: http://bit.ly/GridGain-Documentation
at org.gridgain.grid.kernal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1107)
at org.gridgain.grid.kernal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1732)
at org.gridgain.grid.kernal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:654)
at org.gridgain.grid.kernal.managers.communication.GridIoManager.access$1800(GridIoManager.java:62)
at org.gridgain.grid.kernal.managers.communication.GridIoManager$6.body(GridIoManager.java:615)
at org.gridgain.grid.util.worker.GridWorker.run(GridWorker.java:151)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
... 1 more
On the local host side I have connections between the virtual and real ports
tcp6 0 0 127.0.0.1:47100 127.0.0.1:38272 ESTABLISHED 12280/java
tcp6 0 0 127.0.0.1:38272 127.0.0.1:47100 ESTABLISHED 12280/java
And some more to and from the ssh client (also java)
tcp6 45832 0 78.101.12.107:47101 146.148.119.62:51867 ESTABLISHED 12280/java
tcp6 231 0 78.101.12.107:47501 146.148.119.62:46219 CLOSE_WAIT 12280/java
tcp6 48 0 78.101.12.107:37129 146.148.119.62:22 ESTABLISHED 12280/java
tcp6 1 0 78.101.12.107:47501 146.148.119.62:44391 CLOSE_WAIT 12280/java
78.101.12.107 = local ip
146.148.119.62 = remote ip
When I look at netstat on a successful local two-node grid, I see the following connections being made:
tcp6 0 0 ::1:47501 ::1:43143 ESTABLISHED 10218/java
tcp6 0 0 ::1:47500 ::1:34708 ESTABLISHED 9496/java
tcp6 0 0 ::1:34708 ::1:47500 ESTABLISHED 10218/java
tcp6 0 0 ::1:43143 ::1:47501 ESTABLISHED 9496/java
These are between the GridTcpCommunicationSpi.DFLT_PORTs and GridTcpDiscoverySpi.DFLT_PORTs - so these should maybe be enough.
Any ideas on what could be wrong?
The home node must be reachable from the cluster as well. You have two options:
Set up a VPN.
Implement and configure a GridAddressResolver for all nodes, which translates their local addresses to external addresses. This also requires setting up port forwarding in your home network.
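The core of such a resolver is a lookup from each node's internal address to the address other nodes can actually reach. The following is a library-independent sketch of that mapping, using the remote machine's addresses from the question. The class StaticAddressMapper and its methods are hypothetical. A real implementation would wrap this logic in GridAddressResolver, whose exact method signature should be checked against the GridGain version in use.

```java
import java.net.InetSocketAddress;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical internal-to-external address table; not a GridGain class. */
final class StaticAddressMapper {
    private final Map<InetSocketAddress, InetSocketAddress> internalToExternal = new HashMap<>();

    /** Registers a mapping for one port; returns this mapper to allow chaining. */
    StaticAddressMapper map(String internalIp, String externalIp, int port) {
        internalToExternal.put(new InetSocketAddress(internalIp, port),
                               new InetSocketAddress(externalIp, port));
        return this;
    }

    /** Returns the external address for an internal one, or an empty collection if unknown. */
    Collection<InetSocketAddress> externalAddresses(InetSocketAddress internal) {
        InetSocketAddress external = internalToExternal.get(internal);
        return external == null
                ? Collections.<InetSocketAddress>emptyList()
                : Collections.singletonList(external);
    }

    public static void main(String[] args) {
        // 10.240.136.167 is the remote machine's local IP, 146.148.119.62 its public IP.
        StaticAddressMapper mapper = new StaticAddressMapper()
                .map("10.240.136.167", "146.148.119.62", 47100)  // communication port
                .map("10.240.136.167", "146.148.119.62", 47500); // discovery port

        System.out.println(mapper.externalAddresses(new InetSocketAddress("10.240.136.167", 47100)));
    }
}
```

The home node would need an equivalent entry mapping its private address to 78.101.12.107, together with the port forwarding mentioned above.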