How to disable Ignite baseline auto-adjust? - ignite

Ignite 2.8.0, I enabled persistence with code like this:
IgniteConfiguration igniteCfg = new IgniteConfiguration();
//igniteCfg.setClientMode(true);
DataStorageConfiguration dataStorageCfg = new DataStorageConfiguration();
dataStorageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
igniteCfg.setDataStorageConfiguration(dataStorageCfg);
Ignite ignite = Ignition.start(igniteCfg);
Then I get an exception like the one below:
Caused by: class org.apache.ignite.spi.IgniteSpiException: Joining persistence node to in-memory cluster couldn't be allowed due to baseline auto-adjust is enabled and timeout equal to 0
at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.checkFailedError(TcpDiscoverySpi.java:1997)
at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:1116)
at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:427)
at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2099)
at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)
... 15 more
Can anyone help me?
Thanks.

After starting the first node, invoke ignite.cluster().baselineAutoAdjustEnabled(false).
You can also use bin/control.(sh|bat) --baseline auto_adjust [disable|enable] [timeout <timeoutMillis>] [--yes]
Please note that we don't recommend running mixed persistent/non-persistent clusters, since they see very little testing. If you must, make sure that data regions have the same persistenceEnabled settings on all nodes.
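For reference, a minimal sketch of doing this programmatically (assuming Ignite 2.8 and a single first node; the class name is illustrative):
```
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class StartFirstNode {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
        cfg.setDataStorageConfiguration(storageCfg);

        Ignite ignite = Ignition.start(cfg);

        // Disable baseline auto-adjust before any other nodes join.
        ignite.cluster().baselineAutoAdjustEnabled(false);

        // Persistent clusters start inactive; activate once the topology is ready.
        ignite.cluster().active(true);
    }
}
```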

Related

Cache partition not replicated

I have 2 nodes with persistence enabled. I create a cache like so:
// all the queues across the frontier instances
CacheConfiguration cacheCfg2 = new CacheConfiguration("queues");
cacheCfg2.setBackups(backups);
cacheCfg2.setCacheMode(CacheMode.PARTITIONED);
globalQueueCache = ignite.getOrCreateCache(cacheCfg2);
where backups is a value > 1
When one of the nodes dies, I get
Exception in thread "Thread-2" javax.cache.CacheException: class org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute query because cache partition has been lostParts [cacheName=queues, part=2]
at org.apache.ignite.internal.processors.cache.query.GridCacheQueryAdapter.executeScanQuery(GridCacheQueryAdapter.java:597)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl$1.applyx(IgniteCacheProxyImpl.java:519)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl$1.applyx(IgniteCacheProxyImpl.java:517)
at org.apache.ignite.internal.util.lang.IgniteOutClosureX.apply(IgniteOutClosureX.java:36)
at org.apache.ignite.internal.processors.query.GridQueryProcessor.executeQuery(GridQueryProcessor.java:3482)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:516)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:843)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.query(GatewayProtectedCacheProxy.java:418)
at crawlercommons.urlfrontier.service.ignite.IgniteService$QueueCheck.run(IgniteService.java:270)
Caused by: class org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute query because cache partition has been lostParts [cacheName=queues, part=2]
... 9 more
I expected the content to have been replicated onto the other node. Why isn't that the case?
Most likely there is a misconfiguration somewhere. Check the following:
- you are not working with a pre-existing cache (replace getOrCreateCache with createCache);
- you do not have more server nodes than the backup factor;
- inspect the logs for a "Detected lost partitions" message and whatever happened prior to it.
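As an illustration of the first two points, here is a sketch of an explicit cache definition with one backup, plus a partition-loss policy and an explicit reset, which are one common way to handle lost partitions (the class and key/value types are assumptions, not the poster's actual code):
```
import java.util.Collections;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.PartitionLossPolicy;
import org.apache.ignite.configuration.CacheConfiguration;

public class QueueCacheSetup {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            CacheConfiguration<String, Object> cacheCfg = new CacheConfiguration<>("queues");
            cacheCfg.setCacheMode(CacheMode.PARTITIONED);
            cacheCfg.setBackups(1); // one backup is enough to survive a single node failure
            // Fail reads/writes on lost partitions instead of silently serving partial data.
            cacheCfg.setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE);

            IgniteCache<String, Object> queues = ignite.createCache(cacheCfg);

            // ... after the lost node (or its persistent data) is back in the topology:
            ignite.resetLostPartitions(Collections.singleton("queues"));
        }
    }
}
```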

How to disable all operations on ignite cache when topology is not valid

I have 2 server nodes and one client node. I am using a TopologyValidator to validate the topology.
If any server node leaves the cluster, I want to disable all operations. TopologyValidator disables only update operations, not get operations. Can you help me do this?
Currently TopologyValidator disables update operations only.
You can use IgniteCache#close() to disable all operations on specific caches.
See: https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/IgniteCache.html#close--
If you do the following:
IgniteCache cache = ignite.getOrCreateCache(config);
cache.put(1L, new Person(1L, "A", "B"));
cache.close();
System.out.println(cache.get(1L)); //exception here.
you will get the following exception on the get call:
[INFO ][exchange-worker-#43%node1%][GridCacheProcessor] Finish proxy initialization, cacheName=test1, localNodeId=...
Exception in thread "main" java.lang.IllegalStateException: Cache has been closed: test1
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.checkProxyIsValid(GatewayProtectedCacheProxy.java:1548)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.onEnter(GatewayProtectedCacheProxy.java:1580)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.get(GatewayProtectedCacheProxy.java:634)
In addition to Alex's answer, you might implement a custom analog of the TopologyValidator. All you need is to listen for the EVT_NODE_LEFT and EVT_NODE_JOINED events and trigger your custom logic, like stopping a cache or switching an application-level access validator.
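A rough sketch of that approach (illustrative only; it assumes the discovery event types are enabled in the node configuration and uses printing as a stand-in for the real reaction):
```
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.DiscoveryEvent;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

public class NodeLeftWatcher {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        // Discovery events are not recorded unless explicitly enabled.
        cfg.setIncludeEventTypes(EventType.EVT_NODE_LEFT, EventType.EVT_NODE_JOINED);

        Ignite ignite = Ignition.start(cfg);

        ignite.events().localListen((IgnitePredicate<DiscoveryEvent>) evt -> {
            if (evt.type() == EventType.EVT_NODE_LEFT) {
                // Custom reaction: e.g. close the cache proxy or flip an application-level flag.
                System.out.println("Node left: " + evt.eventNode().id());
            } else {
                System.out.println("Node joined: " + evt.eventNode().id());
            }
            return true; // keep listening
        }, EventType.EVT_NODE_LEFT, EventType.EVT_NODE_JOINED);
    }
}
```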

Inconsistent behavior of Quartz2 scheduler in Apache Camel

I have an Apache Camel project that is using Quartz2 as the scheduler. The requirement is to make it a cluster. The code is deployed to WebLogic 12c, and Quartz is configured as per many samples with clustering enabled.
This is my properties file (without the datasource)
org.quartz.scheduler.instanceName = MyScheduler
org.quartz.scheduler.instanceId = AUTO
org.quartz.scheduler.skipUpdateCheck = true
org.quartz.scheduler.jobFactory.class = org.quartz.simpl.SimpleJobFactory
org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount = 10
org.quartz.threadPool.threadPriority = 5
org.quartz.jobStore.misfireThreshold = 60000
org.quartz.jobStore.class=org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.oracle.OracleDelegate
org.quartz.jobStore.useProperties=true
org.quartz.JobBuilder.requestRecovery=true
org.quartz.jobStore.isClustered = true
org.quartz.jobStore.clusterCheckinInterval = 20000
When I deploy and start both nodes, I see that the QRTZ_SCHEDULER_STATE table has an extra entry for one of the nodes:
MyScheduler-routerContext server_node21567108546690
MyScheduler-routerContext-1 server_node11565896495100
MyScheduler-routerContext-1 server_node11567108547295
And I am guessing that because of this, one node is only invoked once in a while, while the other node gets called all the time (so occasionally both nodes are invoked at the same time).
I have tried doing a clean restart of the WebLogic nodes, but the issue is still there.
This is what my route(s) look like:
from("quartz2://provRegGroup/createUsersTrigger?cron={{create_users_cron}}&job.name=createUsersJob")
.routeId("createUsersRB")
.log("**** starting check for create users");
//where
//create_users_cron=0+0,5,10,15,20,25,30,35,40,45,50,55+*+*+*+?
//expecting one node being called by the scheduler at a time..
I figured out what caused the issue. Apparently there were orphan WebLogic processes running on one (or even both) of the nodes; why it was such a mess is a question for our tech architects. ps showed two WebLogic servers running on a node: one that I had started recently and one that had been sitting there for about a month.
Assuming this would never happen in a production environment, I consider the issue resolved.

Apache Ignite + Spark Dataframes: Client vs Server Doubts

I've been trying to integrate Ignite and Spark. The goal of my application is to write and read Spark dataframes to/from Ignite. However, I'm facing several issues with larger datasets (> 200,000,000 rows).
I have a 6-node Ignite cluster running on YARN. It has 160 GB of memory and 12 cores. I am trying to save a dataframe (around 20 GB of raw text data) from Spark into an Ignite cache (partitioned, 1 backup):
def main(args: Array[String]) {
  val ignite = setupIgnite
  closeAfter(ignite) { _ ⇒
    implicit val spark: SparkSession = SparkSession.builder
      .appName("Ignite Benchmark")
      .getOrCreate()

    val customer = readDF("csv", "|", Schemas.customerSchema, "hdfs://master.local:8020/apps/hive/warehouse/ssbplus100/customer")
    val part = readDF("csv", "|", Schemas.partSchema, "hdfs://master.local:8020/apps/hive/warehouse/ssbplus100/part")
    val supplier = readDF("csv", "|", Schemas.supplierSchema, "hdfs://master.local:8020/apps/hive/warehouse/ssbplus100/supplier")
    val dateDim = readDF("csv", "|", Schemas.dateDimSchema, "hdfs://master.local:8020/apps/hive/warehouse/ssbplus100/date_dim")
    val lineorder = readDF("csv", "|", Schemas.lineorderSchema, "hdfs://master.local:8020/apps/hive/warehouse/ssbplus100/lineorder")

    writeDF(customer, "customer", List("custkey"), TEMPLATES.REPLICATED)
    writeDF(part, "part", List("partkey"), TEMPLATES.REPLICATED)
    writeDF(supplier, "supplier", List("suppkey"), TEMPLATES.REPLICATED)
    writeDF(dateDim, "date_dim", List("datekey"), TEMPLATES.REPLICATED)
    writeDF(lineorder.limit(200000000), "lineorder", List("orderkey, linenumber"), TEMPLATES.NO_BACKUP)
  }
}
At some point, the Spark application fails with this error:
class org.apache.ignite.internal.mem.IgniteOutOfMemoryException: Out of memory in data region [name=default, initSize=256.0 MiB, maxSize=12.6 GiB, persistenceEnabled=false] Try the following:
^-- Increase maximum off-heap memory size (DataRegionConfiguration.maxSize)
^-- Enable Ignite persistence (DataRegionConfiguration.persistenceEnabled)
^-- Enable eviction or expiration policies
at org.apache.ignite.internal.pagemem.impl.PageMemoryNoStoreImpl.allocatePage(PageMemoryNoStoreImpl.java:304)
at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList.allocateDataPage(AbstractFreeList.java:463)
at org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList.insertDataRow(AbstractFreeList.java:501)
at org.apache.ignite.internal.processors.cache.persistence.RowStore.addRow(RowStore.java:97)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.createRow(IgniteCacheOffheapManagerImpl.java:1302)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$UpdateClosure.call(GridCacheMapEntry.java:4426)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$UpdateClosure.call(GridCacheMapEntry.java:4371)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(BPlusTree.java:3083)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$6200(BPlusTree.java:2977)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1726)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1703)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1703)
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1610)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1249)
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:352)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:3602)
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:2774)
at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$IsolatedUpdater.receive(DataStreamerImpl.java:2125)
at org.apache.ignite.internal.processors.datastreamer.DataStreamerUpdateJob.call(DataStreamerUpdateJob.java:140)
at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.localUpdate(DataStreamProcessor.java:400)
at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.processRequest(DataStreamProcessor.java:305)
at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor.access$000(DataStreamProcessor.java:60)
at org.apache.ignite.internal.processors.datastreamer.DataStreamProcessor$1.onMessage(DataStreamProcessor.java:90)
at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
at org.apache.ignite.internal.util.StripedExecutor$Stripe.run(StripedExecutor.java:511)
at java.lang.Thread.run(Thread.java:748)
I think the problem lies in the Ignite server being started before the Spark session, as in the official Ignite examples. This server starts caching the data that I am writing to the Ignite cache and exceeds its default region maximum size (12 GB, which is different from the 20 GB I defined for my YARN cluster). However, I don't understand why the examples and documentation tell us to create an Ignite server before the Spark context (and session, I assume). I understand that without it the application would hang once all the Spark jobs are finished, but I don't understand the logic of having a server inside the Spark application that starts caching data. I'm very confused by this concept, and for now I have set up this Ignite instance inside Spark as a client.
This is strange behavior, as all my Ignite nodes (running on YARN) have 20 GB defined for the default region (I changed it and verified it). This indicates that the error must come from the Ignite servers started inside Spark (I think there is one on the driver and one per worker), as I did not change the default region size in the ignite-config.xml of the Spark application (it defaults to the 12 GB shown in the error). However, does this make sense? Should Spark throw this error if its only job is to read and write data from/to Ignite? Is Spark participating in caching any data, and does this mean that I should set client mode in the ignite-config.xml of my application, even though the official examples do not use client mode?
Best regards,
Carlos
First, the Spark-Ignite connector already connects in client mode.
I'm going to assume that you have enough memory, but you can follow the example in the Capacity Planning guide to be sure.
However, I think the problem is that you're following the sample application a bit too closely(!). The sample -- so as to be self-contained -- includes both a server and a Spark client. If you already have an Ignite cluster, you don't need to start a server in your Spark client.
This is a slightly hacked down example from a real application (in Java, sorry):
try (SparkSession spark = SparkSession
        .builder()
        .appName("AppName")
        .master(sparkMaster)
        .config("spark.executor.extraClassPath", igniteClassPath())
        .getOrCreate()) {

    // Get source DataFrame
    Dataset<Row> results = ....

    results.write()
        .mode("append")
        .format(IgniteDataFrameSettings.FORMAT_IGNITE())
        .option(IgniteDataFrameSettings.OPTION_CONFIG_FILE(), igniteCfgFile)
        .option(IgniteDataFrameSettings.OPTION_TABLE(), "Results")
        .option(IgniteDataFrameSettings.OPTION_STREAMER_ALLOW_OVERWRITE(), true)
        .option(IgniteDataFrameSettings.OPTION_CREATE_TABLE_PRIMARY_KEY_FIELDS(), "name")
        .option(IgniteDataFrameSettings.OPTION_CREATE_TABLE_PARAMETERS(), "backups=1")
        .save();
}
I didn't test this, but you should get the idea: you provide the URL of an Ignite configuration file, and the connector creates the client that connects to that server behind the scenes.
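For completeness, reading the table back goes through the same connector. A minimal sketch under the same assumptions (the configuration file path and table name are illustrative):
```
import org.apache.ignite.spark.IgniteDataFrameSettings;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadFromIgnite {
    public static void main(String[] args) {
        // Path to an Ignite client configuration file (illustrative name).
        String igniteCfgFile = "ignite-client-config.xml";

        try (SparkSession spark = SparkSession.builder()
                .appName("ReadFromIgnite")
                .getOrCreate()) {

            Dataset<Row> results = spark.read()
                .format(IgniteDataFrameSettings.FORMAT_IGNITE())
                .option(IgniteDataFrameSettings.OPTION_CONFIG_FILE(), igniteCfgFile)
                .option(IgniteDataFrameSettings.OPTION_TABLE(), "Results")
                .load();

            results.show(10);
        }
    }
}
```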

vertx hazelcast class serialization on OSGi karaf

I want to use a Vert.x cluster with Hazelcast on Karaf. When I try to write messages to the bus (after the cluster is formed) I get the serialization error below. I was thinking about adding a class definition to Hazelcast to tell it where to find the Vert.x server ID class (io.vertx.spi.cluster.hazelcast.impl.HazelcastServerID), but I am not sure how.
On Karaf I had to wrap the vertx-hazelcast jar because it doesn't have a proper manifest file.
<bundle start-level="80">wrap:mvn:io.vertx/vertx-hazelcast/${vertx.version}</bundle>
Here is my error:
com.hazelcast.nio.serialization.HazelcastSerializationException: Problem while reading DataSerializable, namespace: 0, id: 0, class: 'io.vertx.spi.cluster.hazelcast.impl.HazelcastServerID', exception: io.vertx.spi.cluster.hazelcast.impl.
HazelcastServerID
at com.hazelcast.internal.serialization.impl.DataSerializer.read(DataSerializer.java:130)[11:com.hazelcast:3.6.3]
at com.hazelcast.internal.serialization.impl.DataSerializer.read(DataSerializer.java:47)[11:com.hazelcast:3.6.3]
at com.hazelcast.internal.serialization.impl.StreamSerializerAdapter.read(StreamSerializerAdapter.java:46)[11:com.hazelcast:3.6.3]
at com.hazelcast.internal.serialization.impl.AbstractSerializationService.toObject(AbstractSerializationService.java:170)[11:com.hazelcast:3.6.3]
at com.hazelcast.map.impl.DataAwareEntryEvent.getOldValue(DataAwareEntryEvent.java:82)[11:com.hazelcast:3.6.3]
at io.vertx.spi.cluster.hazelcast.impl.HazelcastAsyncMultiMap.entryRemoved(HazelcastAsyncMultiMap.java:147)[64:wrap_file__C__Users_gadei_development_github_effectus.io_effectus-core_core.test_core.test.exam_target_paxexam_unpack_
5bf4439f-01ff-4db4-bd3d-e3b6a1542596_system_io_vertx_vertx-hazelcast_3.4.0-SNAPSHOT_vertx-hazelcast-3.4.0-SNAPSHOT.jar:0.0.0]
at com.hazelcast.multimap.impl.MultiMapEventsDispatcher.dispatch0(MultiMapEventsDispatcher.java:111)[11:com.hazelcast:3.6.3]
at com.hazelcast.multimap.impl.MultiMapEventsDispatcher.dispatchEntryEventData(MultiMapEventsDispatcher.java:84)[11:com.hazelcast:3.6.3]
at com.hazelcast.multimap.impl.MultiMapEventsDispatcher.dispatchEvent(MultiMapEventsDispatcher.java:55)[11:com.hazelcast:3.6.3]
at com.hazelcast.multimap.impl.MultiMapService.dispatchEvent(MultiMapService.java:371)[11:com.hazelcast:3.6.3]
at com.hazelcast.multimap.impl.MultiMapService.dispatchEvent(MultiMapService.java:65)[11:com.hazelcast:3.6.3]
at com.hazelcast.spi.impl.eventservice.impl.LocalEventDispatcher.run(LocalEventDispatcher.java:56)[11:com.hazelcast:3.6.3]
at com.hazelcast.util.executor.StripedExecutor$Worker.process(StripedExecutor.java:187)[11:com.hazelcast:3.6.3]
at com.hazelcast.util.executor.StripedExecutor$Worker.run(StripedExecutor.java:171)[11:com.hazelcast:3.6.3]
Caused by: java.lang.ClassNotFoundException: io.vertx.spi.cluster.hazelcast.impl.HazelcastServerID
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)[:1.8.0_101]
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)[:1.8.0_101]
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)[:1.8.0_101]
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)[:1.8.0_101]
at com.hazelcast.nio.ClassLoaderUtil.tryLoadClass(ClassLoaderUtil.java:137)[11:com.hazelcast:3.6.3]
at com.hazelcast.nio.ClassLoaderUtil.loadClass(ClassLoaderUtil.java:115)[11:com.hazelcast:3.6.3]
at com.hazelcast.nio.ClassLoaderUtil.newInstance(ClassLoaderUtil.java:68)[11:com.hazelcast:3.6.3]
at com.hazelcast.internal.serialization.impl.DataSerializer.read(DataSerializer.java:119)[11:com.hazelcast:3.6.3]
... 13 more
Any suggestions appreciated.
Thanks.
This normally happens when a class's serialization is asymmetric (it reads one property fewer or more than it wrote). In that case you are at the wrong stream position, which means you end up reading the wrong datatype.
Another possible reason is multiple different Hazelcast versions on the classpath (please check that) or different versions on different nodes.
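To illustrate the first point, a DataSerializable must read back exactly the fields it wrote, in the same order. A minimal sketch with a made-up class (unrelated to HazelcastServerID):
```
import java.io.IOException;

import com.hazelcast.nio.ObjectDataInput;
import com.hazelcast.nio.ObjectDataOutput;
import com.hazelcast.nio.serialization.DataSerializable;

public class Point implements DataSerializable {
    private int x;
    private int y;

    public Point() {
        // Required no-arg constructor for deserialization.
    }

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void writeData(ObjectDataOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readData(ObjectDataInput in) throws IOException {
        // Must mirror writeData exactly: same fields, same order, same types.
        x = in.readInt();
        y = in.readInt();
    }
}
```
If readData consumed one field fewer (or more) than writeData produced, every subsequent read from that stream would be misaligned and fail with errors of the kind shown above.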
The solution involved classloading magic!
.setClassLoader(HazelcastClusterManager.class.getClassLoader())
I ended up rolling my own Hazelcast instance and configuring it the way the Vert.x documentation instructs, with the additional classloader configuration trick.
```
ServiceReference serviceRef = context.getServiceReference(HazelcastOSGiService.class);
log.info("Hazelcast OSGi Service Reference: {}", serviceRef);

hazelcastOsgiService = context.getService(serviceRef);
log.info("Hazelcast OSGi Service: {}", hazelcastOsgiService);
hazelcastOsgiService.getClass().getClassLoader();

Map<String, SemaphoreConfig> semaphores = new HashMap<>();
semaphores.put("__vertx.*", new SemaphoreConfig().setInitialPermits(1));

Config hazelcastConfig = new Config("effectus-instance")
    // The key part: use a classloader that can see the Vert.x cluster-manager classes.
    .setClassLoader(HazelcastClusterManager.class.getClassLoader())
    .setGroupConfig(new GroupConfig("dev").setPassword("effectus"))
    // .setSerializationConfig(new SerializationConfig().addClassDefinition()
    .addMapConfig(new MapConfig()
        .setName("__vertx.subs")
        .setBackupCount(1)
        .setTimeToLiveSeconds(0)
        .setMaxIdleSeconds(0)
        .setEvictionPolicy(EvictionPolicy.NONE)
        .setMaxSizeConfig(new MaxSizeConfig().setSize(0).setMaxSizePolicy(MaxSizeConfig.MaxSizePolicy.PER_NODE))
        .setEvictionPercentage(25)
        .setMergePolicy("com.hazelcast.map.merge.LatestUpdateMapMergePolicy"))
    .setSemaphoreConfigs(semaphores);

hazelcastOSGiInstance = hazelcastOsgiService.newHazelcastInstance(hazelcastConfig);
log.info("New Hazelcast OSGI instance: {}", hazelcastOSGiInstance);

hazelcastOsgiService.getAllHazelcastInstances().stream().forEach(instance -> {
    log.info("Registered Hazelcast OSGI Instance: {}", instance.getName());
});

clusterManager = new HazelcastClusterManager(hazelcastOSGiInstance);

VertxOptions options = new VertxOptions().setClusterManager(clusterManager).setHAGroup("effectus");
Vertx.clusteredVertx(options, res -> {
    if (res.succeeded()) {
        Vertx v = res.result();
        log.info("Vertx is running in cluster mode: {}", v);
        // some more code...
    }
});
```
So the issue is that the Hazelcast instance doesn't have access to the classes inside the vertx-hazelcast bundle.
I am sure there is a shorter, cleaner way somewhere.
Any better suggestions would be great.