Ignite control tool command "reset_lost_partitions" does not work - ignite

After all server nodes restarted and rejoined the baseline, reads and writes against the cache still failed:
javax.cache.CacheException: class org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute the cache operation (all partition owners have left the grid, partition data has been lost) [cacheName=SQL_DEV_MONI_MONITOR, partition=840, key=UserKeyCacheObjectImpl [part=840, val=iii, hasValBytes=false]]
at org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1278)
The partitions owned by each node have been rebalanced:
cacheGroups_public_LocalNodeOwningPartitionsCount{iin="fintech-test-grid-mcpl323", inci="7ffed004-2058-4045-875e-bca5ac37e12a", instance="10.16.23.47:9090", job="prometheus"}
544
cacheGroups_public_LocalNodeOwningPartitionsCount{iin="fintech-test-grid-mwpl037", inci="fcea873e-3822-4f1b-ab86-acf71d2e740b", instance="10.16.50.124:9090", job="prometheus"}
382
cacheGroups_public_LocalNodeOwningPartitionsCount{iin="fintech-test-grid-mwpl038", inci="d52c1baf-c34e-4c49-af93-89bff06f55f1", instance="10.16.50.123:9090", job="prometheus"}
210
Persistence is enabled for the data region, but reset_lost_partitions does not work:
<property name="dataStorageConfiguration">
<bean class="org.apache.ignite.configuration.DataStorageConfiguration">
<!--
Default memory region that grows endlessly. A cache is bound to this memory region
unless it sets another one in its CacheConfiguration.
-->
<property name="defaultDataRegionConfiguration">
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
<property name="name" value="Default_Region"/>
<property name="persistenceEnabled" value="true"/>
<!-- 100 MB initial / 100 GB max memory region -->
<property name="initialSize" value="#{100L * 1024 * 1024}"/>
<property name="maxSize" value="#{100L * 1024 * 1024 * 1024}"/>
<!-- Enabling SEGMENTED_LRU page replacement for this region. -->
<property name="pageReplacementMode" value="SEGMENTED_LRU"/>
<property name="metricsEnabled" value="true"/>
<property name="warmUpConfiguration">
<bean class="org.apache.ignite.configuration.LoadAllWarmUpConfiguration"/>
</property>
</bean>
</property>
</bean>
</property>

If a node has permanently left the grid, you need to update the baseline topology. There are a number of ways of doing that, but the simplest is the control script:
./control.sh --baseline
Then add or remove nodes as needed:
./control.sh --baseline add [node]
./control.sh --baseline remove [node]
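Once the baseline is correct and all partition owners are back online, the LOST state of the partitions still needs to be cleared explicitly, either with control.sh --cache reset_lost_partitions <cacheName> or through the Java API. Below is a minimal sketch using the Java API; the configuration path is illustrative and the cache name is taken from the exception above:

import java.util.Collections;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class ResetLostPartitions {
    public static void main(String[] args) {
        Ignition.setClientMode(true);

        // Connect a client node to the running cluster; the config path is illustrative.
        try (Ignite ignite = Ignition.start("client-config.xml")) {
            // Clears the LOST state for the cache's partitions so the cache becomes
            // readable/writable again, provided the owning baseline nodes have rejoined.
            ignite.resetLostPartitions(Collections.singleton("SQL_DEV_MONI_MONITOR"));
        }
    }
}

Note that resetting only clears the lost flag; it does not restore data from nodes that never return.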

Related

GridGain Near Cache Not storing data

I have a query regarding the setup of the GridGain near cache. We have a single server node with the configuration listed below, and a single thick client connecting successfully to it:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd">
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
<!-- PEER CLASS LOADING -->
<property name="peerClassLoadingEnabled" value="true"/>
<!-- CACHE CONFIG-->
<property name="cacheConfiguration">
<list>
<!-- ENTER CACHE TEMPLATE-->
<bean class="org.apache.ignite.configuration.CacheConfiguration">
<property name="name" value="cache1"/>
<property name="cacheMode" value="PARTITIONED"/>
<property name="rebalanceMode" value="SYNC"/>
<property name="nearConfiguration">
<bean class="org.apache.ignite.configuration.NearCacheConfiguration">
<property name="nearEvictionPolicyFactory">
<bean class="org.apache.ignite.cache.eviction.lru.LruEvictionPolicyFactory">
<property name="maxSize" value="100000"/>
</bean>
</property>
</bean>
</property>
</bean>
<bean class="org.apache.ignite.configuration.CacheConfiguration">
<property name="name" value="cache2"/>
<property name="cacheMode" value="PARTITIONED"/>
<property name="rebalanceMode" value="SYNC"/>
<property name="nearConfiguration">
<bean class="org.apache.ignite.configuration.NearCacheConfiguration">
<property name="nearEvictionPolicyFactory">
<bean class="org.apache.ignite.cache.eviction.lru.LruEvictionPolicyFactory">
<property name="maxSize" value="100000"/>
</bean>
</property>
</bean>
</property>
</bean>
<bean class="org.apache.ignite.configuration.CacheConfiguration">
<property name="name" value="cache3"/>
<property name="cacheMode" value="PARTITIONED"/>
<property name="rebalanceMode" value="SYNC"/>
<property name="nearConfiguration">
<bean class="org.apache.ignite.configuration.NearCacheConfiguration">
<property name="nearEvictionPolicyFactory">
<bean class="org.apache.ignite.cache.eviction.lru.LruEvictionPolicyFactory">
<property name="maxSize" value="100000"/>
</bean>
</property>
</bean>
</property>
</bean>
</list>
</property>
<!-- DISCOVERY-->
<property name="discoverySpi">
<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
<property name="ipFinder">
<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder">
<property name="namespace" value="gridgain"/>
<property name="serviceName" value="gridgain-service"/>
</bean>
</property>
</bean>
</property>
</bean>
</beans>
In setting the server up like this, it was my understanding, as per the documentation here, that "Once configured in this way, the near cache is created on any node that requests data from the underlying cache, including both server nodes and client nodes. When you get an instance of the cache, as shown in the following example, the data requests go through the near cache."
IgniteCache<Integer, Integer> cache = ignite.cache("myCache");
int value = cache.get(1);
Based on this, I do not believe that I need to create the near cache configuration on our client, and have just implemented the code as:
IgniteCache<Object, Object> cache = ignite.cache(ourCacheName);
The issue I see is that when I peek at the local cache to try to find values there, after having requested them:
cache_.localPeek(key, CachePeekMode.NEAR)
The objects are not found, despite being looked up several times, and it looks like they are never added to our near cache; everything just goes to the underlying cache. Previously we had created the near cache programmatically on the client and it worked, but we would like to configure the solution on the server if possible. Our client node is just using the default configuration, if that makes a difference.
Any thoughts why we are not seeing a near cache?
Thanks,
LS
To get a near cache on the client, I suggest you create it explicitly using the following syntax:
IgniteCache<Integer, Integer> clientCache = client.getOrCreateNearCache(cacheCfg.getName(), nearCfg);
...
clientCache.get(1);
System.out.println(clientCache.localPeek(1, CachePeekMode.NEAR));
There are tickets such as IGNITE-15960 and IGNITE-1163 with discussions about API improvements. I suppose the cache has to be declared on the servers first, and then you would be able to create the near cache explicitly on the clients. I agree that the docs and API are quite confusing and need to be reworked.
Also, a near cache is local to a node, i.e. you might want one on some clients/servers and not create it on others.
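For completeness, here is a minimal client-side sketch of that approach. The cache name and config path are illustrative; the cache itself is assumed to already be defined on the server nodes, as in the XML above:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CachePeekMode;
import org.apache.ignite.cache.eviction.lru.LruEvictionPolicyFactory;
import org.apache.ignite.configuration.NearCacheConfiguration;

public class NearCacheClient {
    public static void main(String[] args) {
        Ignition.setClientMode(true);

        try (Ignite client = Ignition.start("client-config.xml")) {
            // Near cache settings mirroring the server-side configuration.
            NearCacheConfiguration<Integer, Integer> nearCfg = new NearCacheConfiguration<>();
            nearCfg.setNearEvictionPolicyFactory(new LruEvictionPolicyFactory<Integer, Integer>(100_000));

            // "cache1" must already exist on the servers.
            IgniteCache<Integer, Integer> cache = client.getOrCreateNearCache("cache1", nearCfg);

            cache.put(1, 42);
            cache.get(1); // the entry is pulled through (and kept in) the near cache

            // Should print the value if key 1 was fetched through the near cache.
            System.out.println(cache.localPeek(1, CachePeekMode.NEAR));
        }
    }
}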

Ignite FileAlreadyExistsException during WAL Archival

We are using Gridgain Version: 8.8.10
JDK Version: 1.8
We have an Ignite cluster with 3 nodes in Azure Kubernetes, and we have enabled native persistence. Some of our Ignite pods are going into CrashLoopBackOff with the exception below:
[07:45:45,477][WARNING][main][FileWriteAheadLogManager] Content of WAL working directory needs rearrangement, some WAL segments will be moved to archive: /gridgain/walarchive/node00-71fcf5d3-faf7-4d2b-abae-bd0621bb12a1. Segments from 0000000000000001.wal to 0000000000000008.wal will be moved, total number of files: 8. This operation may take some time.
[07:45:45,480][SEVERE][main][IgniteKernal] Exception during start processors, node will be stopped and close connections
class org.apache.ignite.IgniteCheckedException: Failed to start processor: GridProcessorAdapter []
at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1938)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1159)
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1787)
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1711)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1141)
at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1059)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:945)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:844)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:714)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:683)
at org.apache.ignite.Ignition.start(Ignition.java:344)
at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:290)
Caused by: class org.apache.ignite.internal.processors.cache.persistence.StorageException: Failed to move WAL segment [src=/gridgain/wal/node00-71fcf5d3-faf7-4d2b-abae-bd0621bb12a1/0000000000000001.wal, dst=/gridgain/walarchive/node00-71fcf5d3-faf7-4d2b-abae-bd0621bb12a1/0000000000000001.wal]
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.moveSegmentsToArchive(FileWriteAheadLogManager.java:3326)
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.prepareAndCheckWalFiles(FileWriteAheadLogManager.java:1542)
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.start0(FileWriteAheadLogManager.java:494)
at org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:60)
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:605)
at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1935)
... 11 more
Caused by: java.nio.file.FileAlreadyExistsException: /gridgain/walarchive/node00-71fcf5d3-faf7-4d2b-abae-bd0621bb12a1/0000000000000001.wal
at java.base/sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:450)
at java.base/sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:267)
at java.base/java.nio.file.Files.move(Files.java:1422)
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.moveSegmentsToArchive(FileWriteAheadLogManager.java:3307)
... 16 more
[07:45:45,482][SEVERE][main][IgniteKernal] Got exception while starting (will rollback startup routine).
It seems that during WAL archival a file with the same name is created, and it cannot overwrite the existing file. Is there any specific WAL archival configuration that we are missing?
<property name="dataStorageConfiguration">
<bean class="org.apache.ignite.configuration.DataStorageConfiguration">
<!-- set the size of wal segments to 128MB -->
<property name="walSegmentSize" value="#{128 * 1024 * 1024}"/>
<property name="writeThrottlingEnabled" value="true"/>
<!-- Set the page size to 8 KB -->
<property name="pageSize" value="#{8 * 1024}"/>
<property name="defaultDataRegionConfiguration">
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
<property name="name" value="Default_Region"/>
<!-- Memory region of 20 MB initial size. -->
<property name="initialSize" value="#{20 * 1024 * 1024}"/>
<!-- Memory region of 8 GB max size. -->
<property name="maxSize" value="#{8L * 1024 * 1024 * 1024}"/>
<!-- Enabling eviction for this memory region. -->
<property name="pageEvictionMode" value="RANDOM_2_LRU"/>
<property name="persistenceEnabled" value="true"/>
<!-- Increasing the buffer size to 1 GB. -->
<property name="checkpointPageBufferSize" value="#{1024L * 1024 * 1024}"/>
</bean>
</property>
<property name="walPath" value="/gridgain/wal"/>
<property name="walArchivePath" value="/gridgain/walarchive"/>
</bean>
</property>
Has anyone faced a similar issue with an Ignite Kubernetes cluster?
We are observing this in GKE. In AKS it works fine. We are using the Apache Ignite Operator.
https://ignite.apache.org/docs/latest/installation/kubernetes/gke-deployment

Shared Folder as Ignite Store Folder

I want to use a network shared folder as the persistent store path in DataStorageConfiguration. Ignite gets stuck there.
Can anyone please tell me how to do this in Ignite?
I wouldn’t recommend putting Ignite’s persistent files on a network volume. The performance and locking characteristics often lead to problems. Fast, local disks are very much preferable.
But to directly answer your question, as per the documentation:
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
<property name="dataStorageConfiguration">
<bean class="org.apache.ignite.configuration.DataStorageConfiguration">
<property name="defaultDataRegionConfiguration">
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
<property name="persistenceEnabled" value="true"/>
</bean>
</property>
<property name="storagePath" value="/opt/storage"/>
<property name="walPath" value="/opt/wal"/>
<property name="walArchivePath" value="/opt/walarch"/>
</bean>
</property>
</bean>
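If you prefer to configure this in code rather than XML, the same settings can be applied programmatically. A minimal sketch using the same illustrative paths as above:

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class StoragePathsConfig {
    public static void main(String[] args) {
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setPersistenceEnabled(true);

        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(region)
            .setStoragePath("/opt/storage")     // partition and index files
            .setWalPath("/opt/wal")             // active WAL segments
            .setWalArchivePath("/opt/walarch"); // archived WAL segments

        Ignition.start(new IgniteConfiguration().setDataStorageConfiguration(storage));
    }
}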

Apache Ignite zone(rack)-aware partitions

I'm battling to configure Apache Ignite to distribute partitions in a zone-aware manner. I have Ignite 2.8.0 with 4 nodes running as StatefulSet pods in GKE 1.14, split across two zones. I followed the guide and the example:
I propagated the zone names into the pods under the AVAILABILITY_ZONE env var.
Then, using the Web Console, I verified that this env var was loaded correctly for each node.
I set up a cache template in the node XML config as shown below and created a cache from it using GET /ignite?cmd=getorcreate&cacheName=zone-aware-cache&templateName=zone-aware-cache (I can't see the affinityBackupFilter settings in the UI, but the other parameters from the template got applied, so I assume it worked).
To simplify verification of the partition distribution, I set the number of partitions to just 2. After creating the cache I looked at the resulting partition distribution, then mapped the node ids to the AVAILABILITY_ZONE values reported by each node, with the following results:
AA146954 us-central1-a
3943ECC8 us-central1-c
F7B7AB67 us-central1-a
A94EE82C us-central1-c
As one can easily see, partition 0's primary/backup reside on nodes 3943ECC8 and A94EE82C, which are both in the same zone. What am I missing to make it work?
Another odd thing is that when specifying a low number of partitions (e.g. 2 or 4), only 3 out of 4 nodes are used. When using 1024 partitions, all nodes are utilized, but the problem still exists: 346 out of 1024 partitions had their primary and backup colocated in the same zone.
Here is my node config XML:
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd">
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
<!-- Enabling Apache Ignite Persistent Store. -->
<property name="dataStorageConfiguration">
<bean class="org.apache.ignite.configuration.DataStorageConfiguration">
<property name="defaultDataRegionConfiguration">
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
<property name="persistenceEnabled" value="true"/>
</bean>
</property>
</bean>
</property>
<!-- Explicitly configure TCP discovery SPI to provide list of initial nodes. -->
<property name="discoverySpi">
<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
<property name="ipFinder">
<!-- Enables Kubernetes IP finder and setting custom namespace and service names. -->
<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder">
<property name="namespace" value="ignite"/>
</bean>
</property>
</bean>
</property>
<property name="cacheConfiguration">
<list>
<bean id="zone-aware-cache-template" abstract="true" class="org.apache.ignite.configuration.CacheConfiguration">
<!-- when you create a template via XML configuration, you must add an asterisk to the name of the template -->
<property name="name" value="zone-aware-cache*"/>
<property name="cacheMode" value="PARTITIONED"/>
<property name="atomicityMode" value="ATOMIC"/>
<property name="backups" value="1"/>
<property name="readFromBackup" value="true"/>
<property name="partitionLossPolicy" value="READ_WRITE_SAFE"/>
<property name="copyOnRead" value="true"/>
<property name="eagerTtl" value="true"/>
<property name="statisticsEnabled" value="true"/>
<property name="affinity">
<bean class="org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction">
<property name="partitions" value="2"/> <!-- for debugging only! -->
<property name="excludeNeighbors" value="true"/>
<property name="affinityBackupFilter">
<bean class="org.apache.ignite.cache.affinity.rendezvous.ClusterNodeAttributeAffinityBackupFilter">
<constructor-arg>
<array value-type="java.lang.String">
<!-- Backups must go to different AZs -->
<value>AVAILABILITY_ZONE</value>
</array>
</constructor-arg>
</bean>
</property>
</bean>
</property>
</bean>
</list>
</property>
</bean>
</beans>
Update: In the end, excludeNeighbors makes or breaks zone awareness. I'm not sure why it didn't work with excludeNeighbors=false previously for me. I made some scripts to automate my testing, and now it is definite that it is the excludeNeighbors setting. It's all here: https://github.com/doitintl/ignite-gke. Regardless, I also opened a bug in the Ignite Jira: https://issues.apache.org/jira/browse/IGNITE-12896. Many thanks to #alamar for his suggestions.
I recommend setting excludeNeighbors to false. It is true in your case, but it is not needed, and I get the correct partition mapping when I set it to false (of course, I also ran all four nodes locally).
The environment variable was enough; there was no need to add it manually to the user attributes.
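If you want to verify the distribution from code, here is a small sketch that walks the affinity mapping and prints the zone of each partition's primary and backup owners. It relies on the AVAILABILITY_ZONE environment variable being visible as a node attribute, as in the question; the cache name and config path are illustrative:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.affinity.Affinity;
import org.apache.ignite.cluster.ClusterNode;

public class ZoneDistributionCheck {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start("node-config.xml")) {
            Affinity<Object> aff = ignite.affinity("zone-aware-cache");

            for (int part = 0; part < aff.partitions(); part++) {
                StringBuilder zones = new StringBuilder();

                // Primary owner first, then backups.
                for (ClusterNode node : aff.mapPartitionToPrimaryAndBackups(part)) {
                    String zone = node.attribute("AVAILABILITY_ZONE");
                    zones.append(zone).append(' ');
                }

                System.out.println("partition " + part + " -> " + zones);
            }
        }
    }
}

A partition whose line shows the same zone twice is one with its primary and backup colocated.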

Apache Ignite CacheConfiguration repeat for each data set?

I am trying to modify default-config.xml by adding cacheConfiguration tags. Do I need to repeat the cacheConfiguration XML tag for each data set (RDD) that I am trying to keep in memory? Can I set backups to 0 if I don't want them?
ex:
<property name="cacheConfiguration">
<bean class="org.apache.ignite.configuration.CacheConfiguration">
<property name="name" value="TEST1_RDD"/>
<property name="cacheMode" value="PARTITIONED"/>
<property name="backups" value="0"/>
</bean>
</property>
<property name="cacheConfiguration">
<bean class="org.apache.ignite.configuration.CacheConfiguration">
<property name="name" value="TEST2_RDD"/>
<property name="cacheMode" value="PARTITIONED"/>
<property name="backups" value="0"/>
</bean>
</property>
Also, do I need to explicitly specify the write synchronization mode? And which one does Ignite use by default?
ex:
<property name="writeSynchronizationMode" value="FULL_SYNC"/>
Appreciate your response.
Yes, you have to write a configuration for each cache, since each cache may have a different purpose and you need to configure it accordingly.
The default value for backups is 0, and the default CacheWriteSynchronizationMode is PRIMARY_SYNC.
There is also the possibility to define cache templates if you don't want to repeat the same configuration for multiple caches: https://apacheignite.readme.io/docs/cache-template
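To illustrate the same points programmatically, here is a hedged sketch using the cache names from the question; backups are left at the default of 0, and writeSynchronizationMode is set only to make the default visible:

import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class MultiCacheConfig {
    public static void main(String[] args) {
        CacheConfiguration<Object, Object> test1 = new CacheConfiguration<>("TEST1_RDD");
        test1.setCacheMode(CacheMode.PARTITIONED);
        test1.setBackups(0); // 0 is already the default

        CacheConfiguration<Object, Object> test2 = new CacheConfiguration<>("TEST2_RDD");
        test2.setCacheMode(CacheMode.PARTITIONED);
        // PRIMARY_SYNC is the default; set FULL_SYNC only if you need synchronous backups.
        test2.setWriteSynchronizationMode(CacheWriteSynchronizationMode.PRIMARY_SYNC);

        // One cache configuration entry per cache, passed as a single list.
        IgniteConfiguration cfg = new IgniteConfiguration()
            .setCacheConfiguration(test1, test2);

        Ignition.start(cfg);
    }
}

In XML this corresponds to a single cacheConfiguration property containing a <list> of CacheConfiguration beans, as in the near cache example earlier on this page, rather than repeating the property.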