Ignite FileAlreadyExistsException during WAL Archival

We are using GridGain version: 8.8.10
JDK version: 1.8
We have an Ignite cluster with 3 nodes in Azure Kubernetes, with native persistence enabled. Some of our Ignite pods are going into CrashLoopBackOff with the exception below:
[07:45:45,477][WARNING][main][FileWriteAheadLogManager] Content of WAL working directory needs rearrangement, some WAL segments will be moved to archive: /gridgain/walarchive/node00-71fcf5d3-faf7-4d2b-abae-bd0621bb12a1. Segments from 0000000000000001.wal to 0000000000000008.wal will be moved, total number of files: 8. This operation may take some time.
[07:45:45,480][SEVERE][main][IgniteKernal] Exception during start processors, node will be stopped and close connections
class org.apache.ignite.IgniteCheckedException: Failed to start processor: GridProcessorAdapter []
at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1938)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1159)
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1787)
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1711)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1141)
at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1059)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:945)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:844)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:714)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:683)
at org.apache.ignite.Ignition.start(Ignition.java:344)
at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:290)
Caused by: class org.apache.ignite.internal.processors.cache.persistence.StorageException: Failed to move WAL segment [src=/gridgain/wal/node00-71fcf5d3-faf7-4d2b-abae-bd0621bb12a1/0000000000000001.wal, dst=/gridgain/walarchive/node00-71fcf5d3-faf7-4d2b-abae-bd0621bb12a1/0000000000000001.wal]
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.moveSegmentsToArchive(FileWriteAheadLogManager.java:3326)
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.prepareAndCheckWalFiles(FileWriteAheadLogManager.java:1542)
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.start0(FileWriteAheadLogManager.java:494)
at org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:60)
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:605)
at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1935)
... 11 more
Caused by: java.nio.file.FileAlreadyExistsException: /gridgain/walarchive/node00-71fcf5d3-faf7-4d2b-abae-bd0621bb12a1/0000000000000001.wal
at java.base/sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:450)
at java.base/sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:267)
at java.base/java.nio.file.Files.move(Files.java:1422)
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.moveSegmentsToArchive(FileWriteAheadLogManager.java:3307)
... 16 more
[07:45:45,482][SEVERE][main][IgniteKernal] Got exception while starting (will rollback startup routine).
It seems that during WAL archival a file with the same name already exists in the archive directory and the segment cannot be moved over it. Is there any specific configuration for WAL archival that we are missing?
<property name="dataStorageConfiguration">
<bean class="org.apache.ignite.configuration.DataStorageConfiguration">
<!-- set the size of wal segments to 128MB -->
<property name="walSegmentSize" value="#{128 * 1024 * 1024}"/>
<property name="writeThrottlingEnabled" value="true"/>
<!-- Set the page size to 8 KB -->
<property name="pageSize" value="#{8 * 1024}"/>
<property name="defaultDataRegionConfiguration">
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
<property name="name" value="Default_Region"/>
<!-- Memory region of 20 MB initial size. -->
<property name="initialSize" value="#{20 * 1024 * 1024}"/>
<!-- Memory region of 8 GB max size. -->
<property name="maxSize" value="#{8L * 1024 * 1024 * 1024}"/>
<!-- Enabling eviction for this memory region. -->
<property name="pageEvictionMode" value="RANDOM_2_LRU"/>
<property name="persistenceEnabled" value="true"/>
<!-- Increasing the buffer size to 1 GB. -->
<property name="checkpointPageBufferSize" value="#{1024L * 1024 * 1024}"/>
</bean>
</property>
<property name="walPath" value="/gridgain/wal"/>
<property name="walArchivePath" value="/gridgain/walarchive"/>
</bean>
</property>
Has anyone faced a similar issue with an Ignite Kubernetes cluster?
We are observing this in GKE. In AKS it works fine. We are using the Apache Ignite Operator.
https://ignite.apache.org/docs/latest/installation/kubernetes/gke-deployment

Related

Ignite control tool command "reset_lost_partitions" does not work

After all server nodes restarted and rejoined the baseline, the cache still cannot be read or written:
javax.cache.CacheException: class org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute the cache operation (all partition owners have left the grid, partition data has been lost) [cacheName=SQL_DEV_MONI_MONITOR, partition=840, key=UserKeyCacheObjectImpl [part=840, val=iii, hasValBytes=false]]
at org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1278)
The partitions owned by the nodes are rebalanced:
cacheGroups_public_LocalNodeOwningPartitionsCount{iin="fintech-test-grid-mcpl323", inci="7ffed004-2058-4045-875e-bca5ac37e12a", instance="10.16.23.47:9090", job="prometheus"}
544
cacheGroups_public_LocalNodeOwningPartitionsCount{iin="fintech-test-grid-mwpl037", inci="fcea873e-3822-4f1b-ab86-acf71d2e740b", instance="10.16.50.124:9090", job="prometheus"}
382
cacheGroups_public_LocalNodeOwningPartitionsCount{iin="fintech-test-grid-mwpl038", inci="d52c1baf-c34e-4c49-af93-89bff06f55f1", instance="10.16.50.123:9090", job="prometheus"}
210
Persistence is enabled for the data region, but reset_lost_partitions does not work:
<property name="dataStorageConfiguration">
<bean class="org.apache.ignite.configuration.DataStorageConfiguration">
<!--
Default memory region that grows endlessly. A cache is bound to this memory region
unless it sets another one in its CacheConfiguration.
-->
<property name="defaultDataRegionConfiguration">
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
<property name="name" value="Default_Region"/>
<property name="persistenceEnabled" value="true"/>
<!-- 100 MB initial / 100 GB max memory region -->
<property name="initialSize" value="#{100L * 1024 * 1024}"/>
<property name="maxSize" value="#{100L * 1024 * 1024 * 1024}"/>
<!-- Enabling SEGMENTED_LRU page replacement for this region. -->
<property name="pageReplacementMode" value="SEGMENTED_LRU"/>
<property name="metricsEnabled" value="true"/>
<property name="warmUpConfiguration">
<bean class="org.apache.ignite.configuration.LoadAllWarmUpConfiguration"/>
</property>
</bean>
</property>
</bean>
</property>
If a node has permanently left the grid, you need to update the baseline topology. There are a number of ways of doing that, but the simplest is the control script:
./control.sh --baseline
Then
./control.sh --baseline add [node]
./control.sh --baseline remove [node]
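Once the baseline is back to the intended set of nodes, the LOST partition state still has to be cleared explicitly, either with ./control.sh --cache reset_lost_partitions <cacheNames> or through the Java API. Below is a minimal sketch of the programmatic variant (the Spring config path is a placeholder; the cache name is taken from the exception above):

import java.util.Collections;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class ResetLostPartitions {
    public static void main(String[] args) {
        // Start any node that joins the cluster (a thick client works too).
        try (Ignite ignite = Ignition.start("config/client.xml")) {
            // Clear the LOST state so the cache becomes readable/writable again.
            ignite.resetLostPartitions(Collections.singleton("SQL_DEV_MONI_MONITOR"));
        }
    }
}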

Ignite query result fetched from cache or disk

With Ignite Native Persistence enabled, is there a way to know if the query result is being fetched from cache or disk?
I am using Apache Ignite 2.7.5 with 2 nodes running in PARTITIONED mode with the following configuration at each node.
<bean class="org.apache.ignite.configuration.DataStorageConfiguration">
<!-- Redefining the default region's settings -->
<property name="pageSize" value="#{4 * 1024}"/>
<!--<property name="writeThrottlingEnabled" value="true"/>-->
<property name="defaultDataRegionConfiguration">
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
<property name="persistenceEnabled" value="true"/>
<property name="initialSize" value="#{105L * 1024 * 1024 * 1024}"/>
<property name="name" value="Default_Region"/>
<!-- Setting the max size of the default region to 120 GB. -->
<property name="maxSize" value="#{120L * 1024 * 1024 * 1024}"/>
<property name="checkpointPageBufferSize"
value="#{4096L * 1024 * 1024}"/>
<!--<property name="pageEvictionMode" value="RANDOM_2_LRU"/>-->
</bean>
</property>
</bean>
All data is stored in so-called pages located in off-heap memory. A page can reside either in RAM or on disk; for the latter, Ignite first loads the page into off-heap memory, it does not read from disk directly. On-heap memory is required for data processing, like merging data sets for an SQL query, processing communication requests, and so on.
There is no solid way of detecting whether a particular piece of data has already been preloaded into RAM, though there are some metrics that can help you see what is happening to the cluster in general, e.g. how often page replacement happens and so on.
You might want to check the following metrics for a data region.
These three give an estimate of the data size loaded to a data region:
TotalAllocatedPages
PagesFillFactor
EmptyDataPages
When persistence is enabled, these provide information on how intensively we use disk for reads (smaller is better):
PagesReplaceRate
PagesRead
PagesReplaced
Some implementation details that might be useful:
Ignite Durable Memory, Ignite Persistent Store - under the hood
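If you prefer to read these values programmatically rather than over JMX, a minimal sketch using the public DataRegionMetrics API might look like the following (the config path is a placeholder, and metricsEnabled must be set on the data region for the rate metrics to be populated):

import org.apache.ignite.DataRegionMetrics;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class RegionMetricsDump {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start("config/node.xml")) {
            for (DataRegionMetrics m : ignite.dataRegionMetrics()) {
                // Allocated pages and fill factor estimate how much data the
                // region holds; the replace rate shows how often pages are
                // rotated from disk (smaller is better).
                System.out.printf("region=%s allocatedPages=%d fillFactor=%.2f replaceRate=%.2f%n",
                    m.getName(),
                    m.getTotalAllocatedPages(),
                    m.getPagesFillFactor(),
                    m.getPagesReplaceRate());
            }
        }
    }
}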

Apache Ignite zone(rack)-aware partitions

I'm battling to configure Apache Ignite to distribute partitions in a zone-aware manner. I have Ignite 2.8.0 with 4 nodes running as StatefulSet pods in GKE 1.14, split across two zones. I followed the guide and the example:
Propagated the zone names into the pods under the AVAILABILITY_ZONE env var.
Then, using Web Console, I verified that this env var was loaded correctly for each node.
I set up a cache template in the node XML config as shown below and created a cache from it using GET /ignite?cmd=getorcreate&cacheName=zone-aware-cache&templateName=zone-aware-cache (I can't see the affinityBackupFilter settings in the UI, but other parameters from the template got applied, so I assume it worked).
To simplify verification of the partition distribution, the partition count is set to just 2. After creating the cache I observed the following partition distribution:
Then I mapped the node IDs to the values of the AVAILABILITY_ZONE env var, as reported by the nodes, with the following results:
AA146954 us-central1-a
3943ECC8 us-central1-c
F7B7AB67 us-central1-a
A94EE82C us-central1-c
As one can easily see, partition 0's primary and backup reside on nodes 3943ECC8 and A94EE82C, which are both in the same zone. What am I missing to make it work?
Another odd thing is that when the partition count is low (e.g. 2 or 4), only 3 out of 4 nodes are used. When using 1024 partitions, all nodes are utilized, but the problem still exists: 346 out of 1024 partitions had their primary/backup colocated in the same zone.
Here is my node config XML:
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd">
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
<!-- Enabling Apache Ignite Persistent Store. -->
<property name="dataStorageConfiguration">
<bean class="org.apache.ignite.configuration.DataStorageConfiguration">
<property name="defaultDataRegionConfiguration">
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
<property name="persistenceEnabled" value="true"/>
</bean>
</property>
</bean>
</property>
<!-- Explicitly configure TCP discovery SPI to provide list of initial nodes. -->
<property name="discoverySpi">
<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
<property name="ipFinder">
<!-- Enables Kubernetes IP finder and setting custom namespace and service names. -->
<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder">
<property name="namespace" value="ignite"/>
</bean>
</property>
</bean>
</property>
<property name="cacheConfiguration">
<list>
<bean id="zone-aware-cache-template" abstract="true" class="org.apache.ignite.configuration.CacheConfiguration">
<!-- when you create a template via XML configuration, you must add an asterisk to the name of the template -->
<property name="name" value="zone-aware-cache*"/>
<property name="cacheMode" value="PARTITIONED"/>
<property name="atomicityMode" value="ATOMIC"/>
<property name="backups" value="1"/>
<property name="readFromBackup" value="true"/>
<property name="partitionLossPolicy" value="READ_WRITE_SAFE"/>
<property name="copyOnRead" value="true"/>
<property name="eagerTtl" value="true"/>
<property name="statisticsEnabled" value="true"/>
<property name="affinity">
<bean class="org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction">
<property name="partitions" value="2"/> <!-- for debugging only! -->
<property name="excludeNeighbors" value="true"/>
<property name="affinityBackupFilter">
<bean class="org.apache.ignite.cache.affinity.rendezvous.ClusterNodeAttributeAffinityBackupFilter">
<constructor-arg>
<array value-type="java.lang.String">
<!-- Backups must go to different AZs -->
<value>AVAILABILITY_ZONE</value>
</array>
</constructor-arg>
</bean>
</property>
</bean>
</property>
</bean>
</list>
</property>
</bean>
</beans>
Update: Eventually excludeNeighbors false/true makes or breaks zone awareness. I'm not sure why it didn't work with excludeNeighbors=false previously for me. I made some scripts to automate my testing. And now it's definite that it's the excludeNeighbors setting. It's all here: https://github.com/doitintl/ignite-gke. Regardless I also opened a bug with IGNITE Jira: https://issues.apache.org/jira/browse/IGNITE-12896. Many thanks to #alamar for his suggestions.
I recommend setting excludeNeighbors to false. It is set to true in your case; it is not needed, and I get the correct partition mapping when I set it to false (of course, I ran all four nodes locally).
The environment property was enough; there was no need to add it manually to the user attributes.
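For reference, here is a minimal Java sketch of the same template configured programmatically, with excludeNeighbors left at its default of false and the backup filter keyed on the AVAILABILITY_ZONE node attribute (the cache name, backups count and partition count mirror the XML above; treat it as an illustration, not a drop-in replacement):

import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.affinity.rendezvous.ClusterNodeAttributeAffinityBackupFilter;
import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
import org.apache.ignite.configuration.CacheConfiguration;

public class ZoneAwareCacheTemplate {
    public static CacheConfiguration<Object, Object> template() {
        RendezvousAffinityFunction aff = new RendezvousAffinityFunction();
        aff.setPartitions(1024);
        // Do NOT set excludeNeighbors: the backup filter below is what keeps
        // primary and backup copies in different availability zones.
        aff.setAffinityBackupFilter(
            new ClusterNodeAttributeAffinityBackupFilter("AVAILABILITY_ZONE"));

        CacheConfiguration<Object, Object> ccfg = new CacheConfiguration<>("zone-aware-cache*");
        ccfg.setCacheMode(CacheMode.PARTITIONED);
        ccfg.setBackups(1);
        ccfg.setAffinity(aff);
        return ccfg;
    }
}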

Apache Ignite JDBC Thin Client Does not work with Existing Cache

I have created an Ignite cache "contact" and added "Person" object to it.
When I use the Ignite JDBC client-node driver I am able to query this cache, but when I use the JDBC thin client it says that the table Person does not exist.
I tried the query this way:
Select * from Person
Select * from contact.Person
Neither worked with the thin client. I am using Ignite 2.1.
I would appreciate your help on how to query an existing cache using the thin client.
Thank you.
Cache Configuration in default-config.xml
<bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
<!-- Enabling Apache Ignite Persistent Store. -->
<property name="persistentStoreConfiguration">
<bean class="org.apache.ignite.configuration.PersistentStoreConfiguration"/>
</property>
<property name="binaryConfiguration">
<bean class="org.apache.ignite.configuration.BinaryConfiguration">
<property name="compactFooter" value="false"/>
</bean>
</property>
<property name="memoryConfiguration">
<bean class="org.apache.ignite.configuration.MemoryConfiguration">
<!-- Setting the page size to 4 KB -->
<property name="pageSize" value="#{4 * 1024}"/>
</bean>
</property>
<!-- Explicitly configure TCP discovery SPI to provide a list of initial nodes. -->
<property name="discoverySpi">
<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
<property name="ipFinder">
<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
<property name="addresses">
<list>
<!-- In distributed environment, replace with actual host IP address. -->
<value>127.0.0.1:55500..55502</value>
</list>
</property>
</bean>
</property>
</bean>
</property>
</bean>
</beans>
Cache Configuration in the Server Side of the Code
CacheConfiguration<Long, Person> cc = new CacheConfiguration<>(cacheName);
cc.setCacheMode(CacheMode.REPLICATED);
cc.setRebalanceMode(CacheRebalanceMode.ASYNC);
cc.setIndexedTypes(Long.class, Person.class);
cache = ignite.getOrCreateCache(cc);
Thin Client JDBC URL
Class.forName("org.apache.ignite.IgniteJdbcThinDriver");
// Open the JDBC connection.
Connection conn = DriverManager.getConnection("jdbc:ignite:thin://192.168.1.111:10800");
Statement st = conn.createStatement();
If you want to query data from an existing cache using SQL, you should specify an SQL schema in the cache configuration. Add the following code before the cache creation:
cc.setSqlSchema("PUBLIC");
Note that you have persistence configured, so when you call ignite.getOrCreateCache(cc), the new configuration won't be applied if a cache with this name has already been persisted. You should, for example, remove the persistence data or use the createCache(...) method instead.
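Putting both suggestions together, a minimal sketch might look like the following (Person is the question's own class, the config path and address are placeholders, and createCache is used instead of getOrCreateCache so that the schema setting actually takes effect on a fresh cache):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class ThinClientSchemaExample {
    public static void main(String[] args) throws Exception {
        try (Ignite ignite = Ignition.start("default-config.xml")) {
            // With persistence enabled the cluster starts inactive; activate it first.
            ignite.active(true);

            // Server side: expose Person under the PUBLIC schema so the thin
            // client can query it without a schema prefix. If the "contact"
            // cache was already persisted, remove its persistence files first,
            // otherwise the old configuration is reused.
            CacheConfiguration<Long, Person> cc = new CacheConfiguration<>("contact");
            cc.setCacheMode(CacheMode.REPLICATED);
            cc.setIndexedTypes(Long.class, Person.class);
            cc.setSqlSchema("PUBLIC");
            ignite.createCache(cc);

            // Thin client side: the table is now visible as plain Person.
            Class.forName("org.apache.ignite.IgniteJdbcThinDriver");
            try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1:10800");
                 Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT * FROM Person")) {
                while (rs.next())
                    System.out.println(rs.getObject(1));
            }
        }
    }
}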

Ignite 2.0 how to swap to hard disk

While doing some tests with Ignite memory, I ran into a problem.
The documentation says I can enable swapping to disk in the cache configuration and set the swap file path in MemoryPolicyConfiguration.
However, swapEnabled is missing in Ignite 2.0 while setSwapFilePath still exists. So I wonder whether swapping to disk is still available in Ignite 2.0, and if so, how to configure it.
Define your memory policy, then reference it from your cache configuration, like this:
<!-- Defining a custom memory policy. -->
<property name="memoryPolicies">
<list>
<bean class="org.apache.ignite.configuration.MemoryPolicyConfiguration">
<property name="name" value="Default_Region"/>
<!-- 100 MB memory region with disabled eviction -->
<property name="initialSize" value="#{100 * 1024 * 1024}"/>
<!-- Setting a name of the swapping file. -->
<property name="swapFilePath" value="mindMemoryPolicySwap"/>
</bean>
</list>
</property>
Try it.
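For completeness, here is a minimal Java sketch of the same idea: a memory policy with a swap file path, plus a cache bound to it through setMemoryPolicyName (the cache name is a placeholder, and the exact size setters may differ slightly between 2.0.x and 2.1):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.MemoryConfiguration;
import org.apache.ignite.configuration.MemoryPolicyConfiguration;

public class SwapPolicyExample {
    public static void main(String[] args) {
        // Memory policy backed by a swap file (pre-2.3 API; later versions
        // replaced MemoryPolicyConfiguration with DataRegionConfiguration).
        MemoryPolicyConfiguration plc = new MemoryPolicyConfiguration();
        plc.setName("Default_Region");
        plc.setInitialSize(100L * 1024 * 1024);       // 100 MB, as in the XML above
        plc.setSwapFilePath("mindMemoryPolicySwap");  // swap file path from the XML above

        MemoryConfiguration memCfg = new MemoryConfiguration();
        memCfg.setMemoryPolicies(plc);

        // Bind a cache to the swap-backed policy.
        CacheConfiguration<Long, String> cacheCfg = new CacheConfiguration<>("myCache");
        cacheCfg.setMemoryPolicyName("Default_Region");

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setMemoryConfiguration(memCfg);
        cfg.setCacheConfiguration(cacheCfg);

        try (Ignite ignite = Ignition.start(cfg)) {
            ignite.cache("myCache").put(1L, "value");
        }
    }
}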