Client node hangs on startup - Ignite

The Ignite server runs in a Docker container and works fine. I can connect to it with a JDBC client and also with a Java thin client from IntelliJ.
My issue is with starting a client node from IntelliJ.
When the following line runs,
"Ignite ignite = Ignition.start(cfg);",
both the server and the client log the following message to the console, and nothing happens after that.
server: Topology snapshot [ver=18, locNode=5f0fa7bc, servers=1, clients=1, state=ACTIVE, CPUs=6, offheap=1.8GB, heap=5.0GB]
client: Topology snapshot [ver=18, locNode=6182dc8d, servers=1, clients=1, state=ACTIVE, CPUs=6, offheap=1.8GB, heap=5.0GB]
I have some test code in the client after "Ignition.start", but it never executes:
IgniteCache<Integer, String> cache = Ignition.start(cfg).getOrCreateCache("HelloWorld");
// put some cache elements
for (int i = 1; i <= 100; i++) {
    cache.put(i, Integer.toString(i));
}
for (int i = 1; i <= 100; i++) {
    System.out.println("Cache get:" + cache.get(i));
}

Related

Apache ignite client node reconnect getting error org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed to perform cache operation

I have started ignite server as well as app as client node using the following configuration
public IgniteConfigurer config() {
return cfg -> {
// The node will be started as a client node.
cfg.setClientMode(true);
// Peer class loading is disabled, so classes of custom Java logic are not transferred over the wire from this app.
cfg.setPeerClassLoadingEnabled(false);
// Setting up an IP Finder to ensure the client can locate the servers.
final TcpDiscoveryMulticastIpFinder ipFinder = new TcpDiscoveryMulticastIpFinder();
ipFinder.setAddresses(Arrays.asList(ip));
cfg.setDiscoverySpi(new TcpDiscoverySpi().setIpFinder(ipFinder));
// Cache Metrics log frequency. If 0 then log print disable.
cfg.setMetricsLogFrequency(Integer.parseInt(cacheMetricsLogFrequency));
// setting up storage configuration
final DataStorageConfiguration storageCfg = new DataStorageConfiguration();
storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
storageCfg.setStoragePath(cacheStorage);
// setting up data region for storage
final DataRegionConfiguration defaultRegion = new DataRegionConfiguration();
defaultRegion.setName(cacheDefaultRegionName);
// Sets initial memory region size. When the used memory size exceeds this value, new chunks of memory will be allocated
defaultRegion.setInitialSize(Long.parseLong(cacheRegionInitSize));
storageCfg.setDefaultDataRegionConfiguration(defaultRegion);
cfg.setDataStorageConfiguration(storageCfg);
cfg.setWorkDirectory(cacheStorage);
final TcpCommunicationSpi communicationSpi = new TcpCommunicationSpi();
// Sets message queue limit for incoming and outgoing messages
communicationSpi.setMessageQueueLimit(Integer.parseInt(cacheTcpCommunicationSpiMessageQueueLimit));
cfg.setCommunicationSpi(communicationSpi);
final CacheCheckpointSpi cpSpi = new CacheCheckpointSpi();
cfg.setCheckpointSpi(cpSpi);
final FifoQueueCollisionSpi colSpi = new FifoQueueCollisionSpi();
// Execute all jobs sequentially by setting parallel job number to 1.
colSpi.setParallelJobsNumber(Integer.parseInt(cacheParallelJobs));
cfg.setCollisionSpi(colSpi);
// set failure handler for auto connection if ignite server stop/starts.
cfg.setFailureHandler(new StopNodeFailureHandler());
};
}
Everything works fine. I then stopped the Ignite server and restarted it. After the restart, any cache operation fails with an error like:
Caused by: class org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed to perform cache operation (cache is stopped): mycache1
... 63 more
When I check the Ignite server logs, they show that the client is connected. See the logs below:
[17:25:41] ^-- Baseline [id=0, size=1, online=1, offline=0]
[17:25:42] Topology snapshot [ver=2, locNode=ea964803, servers=1, clients=1, state=ACTIVE, CPUs=8, offheap=6.3GB, heap=4.5GB]
[17:25:42] ^-- Baseline [id=0, size=1, online=1, offline=0]
So why is the application, which runs as a client node, not allowed to perform any cache operation?
It looks like you are creating your "mycache1" inside the default data region which is not configured to be persistent.
I.e. you first define a default region to be persistent:
storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
storageCfg.setStoragePath(cacheStorage);
But then you replace it with a new region that never calls setPersistenceEnabled:
final DataRegionConfiguration defaultRegion = new DataRegionConfiguration();
defaultRegion.setName(cacheDefaultRegionName);
// Sets initial memory region size. When the used memory size exceeds this value, new chunks of memory will be allocated
defaultRegion.setInitialSize(Long.parseLong(cacheRegionInitSize));
storageCfg.setDefaultDataRegionConfiguration(defaultRegion);
So instead of calling storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true); you need to call defaultRegion.setPersistenceEnabled(true); on the new region before passing it to storageCfg.setDefaultDataRegionConfiguration(defaultRegion);. That enables persistence, and I think you won't get the CacheStoppedException anymore.
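A minimal configuration sketch of that fix, for illustration only. The class name, method name and parameters are hypothetical stand-ins for the fields used in the question, and it assumes the same region settings are also applied on the server node, since that is where the data is stored:

```java
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

// Hypothetical helper; not part of the original question or answer.
final class PersistentRegionConfig {
    private PersistentRegionConfig() {}

    static IgniteConfiguration withPersistentDefaultRegion(
            IgniteConfiguration cfg, String regionName, long initialSizeBytes, String storagePath) {
        // Build the region once and enable persistence on this same instance.
        DataRegionConfiguration defaultRegion = new DataRegionConfiguration();
        defaultRegion.setName(regionName);
        defaultRegion.setInitialSize(initialSizeBytes);
        defaultRegion.setPersistenceEnabled(true);

        // Register the region as the default one, so the flag is not lost
        // by replacing the region after the flag was set.
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();
        storageCfg.setStoragePath(storagePath);
        storageCfg.setDefaultDataRegionConfiguration(defaultRegion);

        cfg.setDataStorageConfiguration(storageCfg);
        return cfg;
    }
}
```

The only change relative to the question's code is the order: the persistence flag is set on the region object that actually ends up as the default region.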
As for the in-memory configuration (which I think was applied here instead) and dynamically created caches, this is expected behavior: in this case the server knows nothing about the previously created caches, and you need to recreate them explicitly, doing something like:
try {
    ...
}
catch (Exception exception) {
    if (exception instanceof IgniteException) {
        final Throwable rootCause = getRootCause(exception);
        if (rootCause instanceof CacheStoppedException) {
            // Recreate the cache; ignite.cache("mycache1") would only look up an existing one.
            ignite.getOrCreateCache("mycache1");
            mylogger.info("Connection re-established with the cache.");
        }
    }
}
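The snippet above calls getRootCause without showing it. Here is one possible sketch using only the JDK; the class name RootCauses is hypothetical, and libraries such as Apache Commons Lang ship a similar ExceptionUtils.getRootCause:

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical helper class; the name is not from the original answer.
final class RootCauses {
    private RootCauses() {}

    /** Walks the cause chain and returns the innermost throwable (the input itself if it has no cause). */
    static Throwable getRootCause(Throwable t) {
        if (t == null) {
            return null;
        }
        // Track visited throwables so a cyclic cause chain cannot loop forever.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable current = t;
        while (current.getCause() != null && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        Throwable root = new IllegalStateException("cache is stopped");
        Exception wrapped = new RuntimeException("outer", new RuntimeException("middle", root));
        System.out.println(getRootCause(wrapped) == root); // prints true
    }
}
```

With a helper like this, the instanceof check on the root cause works no matter how many wrapper exceptions sit between the caught exception and the CacheStoppedException.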

JedisConnectionFactory fails to connect to Elasticache even though redis-cli ping is successful

I'm running a Java/Tomcat project in Elastic Beanstalk. I've set up an ElastiCache group in the same VPC. Currently, I'm just testing with a single EC2 instance. The app is Spring Boot with spring-boot-starter-redis. It tries to ping Redis with template.getConnectionFactory().getConnection().ping(); on startup and throws an exception. The root cause is java.net.ConnectException: Connection refused. If I telnet to the server and port, it works. I installed redis-cli on the same instance and was able to connect to and ping the group and each node. The code also works fine on my local machine with a local Redis. Is Jedis connecting to anything other than the visible ElastiCache nodes?
@Autowired
private RedisConnectionFactory connectionFactory;
/**
* Configure connection factory as per redis server.
*/
@PostConstruct
public void configureConnectionManager() {
if (cachingEnabled && connectionFactory instanceof JedisConnectionFactory) {
LOGGER.info("Connecting to Redis cache.");
JedisConnectionFactory jedisConnectionFactory =
(JedisConnectionFactory) connectionFactory;
if (port > 0) {
jedisConnectionFactory.setPort(port);
}
if (StringUtils.isNotBlank(hostname)) {
jedisConnectionFactory.setHostName(hostname);
}
jedisConnectionFactory.setUsePool(true);
RedisTemplate<Object, Object> template = new RedisTemplate<>();
template.setConnectionFactory(jedisConnectionFactory);
template.afterPropertiesSet();
LOGGER.info("Testing connection to Redis server on "
+ jedisConnectionFactory.getHostName()
+ ":" + jedisConnectionFactory.getPort());
// This will test the connection and throw a runtime exception
// if the server can't be reached.
template.getConnectionFactory().getConnection().ping();
final RedisCacheManager redisCacheManager =
new RedisCacheManager(template);
redisCacheManager.setDefaultExpiration(ttl);
this.cm = redisCacheManager;
} else {
// Default implementation in case the cache is turned off or an exception occurs.
LOGGER.info("Caching disabled for this session.");
this.cm = new NoOpCacheManager();
}
}
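Since the configuration code above only overrides the host and port when they are set, it can help to check from inside the JVM which endpoint is actually reachable, mirroring the telnet test. The probe below is a rough sketch using plain sockets; the class name TcpProbe and the placeholder defaults are assumptions, not part of the original code:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic class; not part of the original question.
final class TcpProbe {
    private TcpProbe() {}

    /** Returns true if a TCP connection to host:port can be opened within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers connection refused, timeouts, and unresolvable host names.
            System.err.println("Cannot reach " + host + ":" + port + " - " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder defaults; pass the real endpoint and port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 6379;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 2000));
    }
}
```

Calling canConnect with the values returned by jedisConnectionFactory.getHostName() and getPort() just before the ping shows whether the refused connection is to the intended endpoint or to some other host and port.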

RabbitMQ cluster fail-over issue

I created a cluster with two RabbitMQ nodes. The configuration for the rabbit1 and rabbit2 nodes is as follows.
1> CachingConnectionFactory connectionFactory = new CachingConnectionFactory();
connectionFactory.setAddresses("rabbit1:5672,rabbit2:5672");
2> node types
rabbit1 - disc node
rabbit2 - ram node
3> The producer and consumer programs run on the rabbit2 node (i.e. the RAM node)
4> Producer sample code -
String QueueName = "Queue.";
for(int m=0; m<50000; m++){
// send message
System.out.println(this.rabbitTemplate.getConnectionFactory().getHost());
this.rabbitTemplate.convertAndSend(m);
/*Thread.sleep(100);*/
}
5> consumer code -
String QueueName = "Queue.";
public void run() {
System.out.println("Consumer running host : " + this.connectionFactory.getHost());
SimpleMessageListenerContainer container = new SimpleMessageListenerContainer();
container.setConnectionFactory(this.connectionFactory);
container.setQueueNames(this.queueName);
container.setMessageListener(new MessageListenerAdapter(new TestMessageHandler(this.connectionFactory.getHost()), new JsonMessageConverter()));
container.start();
}
TestMessageHandler class sample code-
public TestMessageHandler(String hostName){
System.out.println("Host: " + hostName);
this.hostName = hostName;
}
// Handle message
public void handleMessage(int message) {
System.out.println("handleMessage Host: " + this.hostName);
System.out.println("Int : " + message);
}
6> Executed the policy below on each node
cmd> rabbitmqctl set_policy ha-all "^Queue\." "{""ha-mode"":""all""}"
7> Started the producer and consumer simultaneously. I could see the host name "rabbit1", then stopped the "rabbit1" node with the "rabbitmqctl stop_app" command to test the fail-over scenario, and got the error below:
WARN [.listener.SimpleMessageListenerContainer]: Consumer raised exception, processing can restart if the connection factory supports it
com.rabbitmq.client.ShutdownSignalException: connection error; reason: {#method<connection.close>(reply-code=541, reply-text=INTERNAL_ERROR, class-id=0, method-id=0), null, ""}
at com.rabbitmq.client.impl.AMQConnection.startShutdown(AMQConnection.java:678)
at com.rabbitmq.client.impl.AMQConnection.shutdown(AMQConnection.java:668)
at com.rabbitmq.client.impl.AMQConnection.handleConnectionClose(AMQConnection.java:624)
at com.rabbitmq.client.impl.AMQConnection.processControlCommand(AMQConnection.java:598)
at com.rabbitmq.client.impl.AMQConnection$1.processAsync(AMQConnection.java:96)
at com.rabbitmq.client.impl.AMQChannel.handleCompleteInboundCommand(AMQChannel.java:144)
at com.rabbitmq.client.impl.AMQChannel.handleFrame(AMQChannel.java:91)
at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:523)
INFO [.listener.SimpleMessageListenerContainer]: Restarting Consumer: tag=[amq.ctag-5CJ3YJYfMZDnJOnXsds6_Q], channel=Cached Rabbit Channel: AMQChannel(amqp://guest@192.168.97.70:5672/,1), acknowledgeMode=AUTO local queue size=0
After this warning I still get the host name "rabbit1". As I understand it, it should be "rabbit2", but that is not happening.
So, here are my questions -
1> Why am I getting the host name "rabbit1" even after stopping it?
2> Do we need a load balancer to test fail-over?
3> If my steps for testing fail-over are wrong, please provide the correct steps.
4> How can queues/messages be distributed to particular nodes, e.g. messages/queues 1-500 to node1, 501-1000 to node2, etc.?
5> Is there any other approach to testing a fail-over scenario?
I appreciate any help with this.

WCF service very slow response on large loads

I have a WCF service, and when we make around 40 simultaneous calls to it we start seeing a lot of timeouts on the client. Please see the attached picture of the WCF trace log, sorted by duration. The service and client are on the same machine, using the standard TCP binding.
The errors we see are:
1) The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was '10675199.02:48:05.4775807'.
2) System.IO.PipeException, System.ServiceModel, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
There was an error writing to the pipe: The pipe is being closed. (232, 0xe8).
Update on further testing...
It looks like this happens only when the service does some work; if I comment out the work done in the service, it handles the same 1000 requests in 3 min with no problems.
The client has about 40 active connections to the server.
Even with a sample app, I see the same errors. I do the following in the server:
public double Add(double n1, double n2) {
for (int i = 0; i < 1000000; i++)
{
Trace.WriteLine("this is a test");
}
return n1 + n2;
}
and the following on the client:
Action<string> a = CallService5000TimesInLoop;
for (int i = 0; i < 30; i++)
{
a.BeginInvoke("Url",null,null);
}

Can't connect to a remote zookeeper from a Kafka producer

I've been playing with Apache Kafka for a few days, and here is my problem.
If I set up the local test described in the "quick start" section on the website, everything is fine: the Kafka producer/consumer, ZooKeeper server and Kafka broker work perfectly.
Now if I run on a remote server (let's call it node2) :
- Zookeeper - port 2181
- Kafka Broker - port 9092
- kafka consumer
And then if I run from my local computer :
- kafka producer
Assuming that there is no firewall on node2.
The connection ends with a timeout.
Here is the error log :
/etc/java/jdk1.6.0_41/bin/java -Didea.launcher.port=7533 -Didea.launcher.bin.path=/home/kevin/Documents/idea-IU-123.169/bin -Dfile.encoding=UTF-8 -classpath /etc/java/jdk1.6.0_41/lib/dt.jar:/etc/java/jdk1.6.0_41/lib/tools.jar:/etc/java/jdk1.6.0_41/lib/jconsole.jar:/etc/java/jdk1.6.0_41/lib/htmlconverter.jar:/etc/java/jdk1.6.0_41/lib/sa-jdi.jar:/home/kevin/Desktop/kafka-0.7.2/examples/target/scala_2.8.0/classes:/home/kevin/Desktop/kafka-0.7.2/project/boot/scala-2.8.0/lib/scala-compiler.jar:/home/kevin/Desktop/kafka-0.7.2/project/boot/scala-2.8.0/lib/scala-library.jar:/home/kevin/Desktop/kafka-0.7.2/core/target/scala_2.8.0/classes:/home/kevin/Desktop/kafka-0.7.2/core/lib_managed/scala_2.8.0/compile/jopt-simple-3.2.jar:/home/kevin/Desktop/kafka-0.7.2/core/lib_managed/scala_2.8.0/compile/log4j-1.2.15.jar:/home/kevin/Desktop/kafka-0.7.2/core/lib_managed/scala_2.8.0/compile/zookeeper-3.3.4.jar:/home/kevin/Desktop/kafka-0.7.2/core/lib_managed/scala_2.8.0/compile/zkclient-0.1.jar:/home/kevin/Desktop/kafka-0.7.2/core/lib_managed/scala_2.8.0/compile/snappy-java-1.0.4.1.jar:/home/kevin/Desktop/kafka-0.7.2/examples/lib_managed/scala_2.8.0/compile/jopt-simple-3.2.jar:/home/kevin/Desktop/kafka-0.7.2/examples/lib_managed/scala_2.8.0/compile/log4j-1.2.15.jar:/home/kevin/Documents/idea-IU-123.169/lib/idea_rt.jar com.intellij.rt.execution.application.AppMain kafka.examples.KafkaConsumerProducerDemo
log4j:WARN No appenders could be found for logger (org.I0Itec.zkclient.ZkConnection).
log4j:WARN Please initialize the log4j system properly.
Exception in thread "Thread-0" java.net.ConnectException: Connection timed out
at sun.nio.ch.Net.connect(Native Method)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:532)
at kafka.producer.SyncProducer.connect(SyncProducer.scala:173)
at kafka.producer.SyncProducer.getOrMakeConnection(SyncProducer.scala:196)
at kafka.producer.SyncProducer.send(SyncProducer.scala:92)
at kafka.producer.SyncProducer.send(SyncProducer.scala:125)
at kafka.producer.ProducerPool$$anonfun$send$1.apply$mcVI$sp(ProducerPool.scala:114)
at kafka.producer.ProducerPool$$anonfun$send$1.apply(ProducerPool.scala:100)
at kafka.producer.ProducerPool$$anonfun$send$1.apply(ProducerPool.scala:100)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:57)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:43)
at kafka.producer.ProducerPool.send(ProducerPool.scala:100)
at kafka.producer.Producer.zkSend(Producer.scala:137)
at kafka.producer.Producer.send(Producer.scala:99)
at kafka.javaapi.producer.Producer.send(Producer.scala:103)
at kafka.examples.Producer.run(Producer.java:53)
Process finished with exit code 0
And here is my Producer's code :
import java.util.Properties;
import kafka.javaapi.producer.ProducerData;
import kafka.producer.ProducerConfig;
public class Producer extends Thread{
private final kafka.javaapi.producer.Producer<String, String> producer;
private final String topic;
private final Properties props = new Properties();
public Producer(String topic)
{
props.put("zk.connect", "node2:2181");
props.put("connect.timeout.ms", "5000");
props.put("socket.timeout.ms", "30000");
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("producer.type", "sync");
props.put("compression.codec", "0");
producer = new kafka.javaapi.producer.Producer<String, String>(new ProducerConfig(props));
this.topic = topic;
}
public void run() {
String messageStr = new String("Message_test");
producer.send(new ProducerData<String, String>(topic, messageStr));
}
}
I also tried replacing
props.put("zk.connect", "node2:2181");
with
props.put("broker.list", "0:node2:9082");
and in that case I can connect successfully.
See item #3 in http://kafka.apache.org/faq.html
The workaround is to explicitly set the hostname property in Kafka's server.properties.
You can verify this using ZooKeeper. If you are using Kafka 0.7.*, open the zkCli console and run get /brokers/ids/0; you should get the broker's metadata. Make sure the IP address/hostname there matches the one in the ZooKeeper connect string you are using in the producer code -
props.put("zk.connect", "node2:2181");
In my case, I was using a producer running on my local machine and connecting to an Ubuntu VM (same box, different IP), and this workaround helped.