Apache Ignite cache long running transaction - ignite

Sometimes, but not always, when the topology of my cluster changes, my application hangs for a minute or more. The log then shows the Ignite warning below, so I assume the application is blocked on a cache operation.
What is causing the long-running transaction? I suspect either network issues or GC pauses.
I wasn't able to find out which cache operation in my code starts this long transaction. Does the warning help me identify that operation?
22:00:30.456 [grid-timeout-worker-#63][101] WARN org.apache.ignite.internal.diagnostic-[warning] Found long running transaction [startTime=21:58:57.176, curTime=22:00:30.456, tx=GridNearTxLocal [mappings=IgniteTxMappingsImpl [], nearLocallyMapped=false, colocatedLocallyMapped=false, needCheckBackup=null, hasRemoteLocks=false, trackTimeout=false, lb=null, thread=<failed to find active thread 1498>, mappings=IgniteTxMappingsImpl [], super=GridDhtTxLocalAdapter [nearOnOriginatingNode=false, nearNodes=[], dhtNodes=[], explicitLock=false, super=IgniteTxLocalAdapter [completedBase=null, sndTransformedVals=false, depEnabled=false, txState=IgniteTxStateImpl [activeCacheIds=[], recovery=null, txMap=[]], super=IgniteTxAdapter [xidVer=GridCacheVersion [topVer=147994093, order=1536523104974, nodeOrder=74], writeVer=null, implicit=false, loc=true, threadId=1498, startTime=1536523137176, nodeId=e8153238-1d5a-4149-8db8-83a9fc820750, startVer=GridCacheVersion [topVer=147994093, order=1536523104974, nodeOrder=74], endVer=null, isolation=REPEATABLE_READ, concurrency=PESSIMISTIC, timeout=0, sysInvalidate=false, sys=false, plc=2, commitVer=null, finalizing=NONE, invalidParts=null, state=ACTIVE, timedOut=false, topVer=AffinityTopologyVersion [topVer=-1, minorTopVer=0], duration=93280ms, onePhaseCommit=false], size=0]]]]
My caches are created like this:
<bean class="org.apache.ignite.configuration.CacheConfiguration">
<property name="name" value="MFDB_JobList" />
<property name="cacheMode" value="PARTITIONED" />
<property name="backups" value="0" />
<property name="atomicityMode" value="TRANSACTIONAL"/>
<property name="writeSynchronizationMode" value="FULL_SYNC"/>
<property name="indexedTypes">
<list>
<value>java.util.UUID</value>
<value>CacheJobQueueEntry</value>
</list>
</property>
</bean>
The relevant ignite configuration looks like this:
<property name="networkTimeout" value="60000" />
<property name="networkSendRetryCount" value="10" />
<property name="failureDetectionTimeout" value="100000" />
<property name="clientFailureDetectionTimeout" value="100000" />

Have a look at this issue:
https://issues.apache.org/jira/browse/IGNITE-6980
Also try
<property name="atomicityMode" value="ATOMIC"/>
with asynchronous cache operations. It might help to reduce the problem described above.
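The warning in the question reports timeout=0, meaning the transaction has no timeout and can stay open indefinitely. Independently of the atomicity mode, one option is to bound transaction lifetime through TransactionConfiguration. This is a minimal sketch, assuming an Ignite version that supports both properties (txTimeoutOnPartitionMapExchange is not available in older 2.x releases). The timeout values are illustrative assumptions, not recommendations.

```xml
<!-- Goes inside the IgniteConfiguration bean. -->
<property name="transactionConfiguration">
    <bean class="org.apache.ignite.configuration.TransactionConfiguration">
        <!-- Assumed value: time out any transaction that stays open longer than 30 s. -->
        <property name="defaultTxTimeout" value="30000"/>
        <!-- Assumed value: time out transactions that hold up a topology change for more than 10 s. -->
        <property name="txTimeoutOnPartitionMapExchange" value="10000"/>
    </bean>
</property>
```

With a timeout in place, a transaction that is left open should be rolled back rather than holding up the topology change indefinitely.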

Related

Why is Ignite allocating 32GB of additional internal memory for the JVM upon grid activation?

Hi, we are using Apache Ignite 2.7 (8 nodes, 120GB each), configured with a 16GB heap and a 100GB data region (with persistence on). Using native memory tracking, we see that the usual categories such as heap and thread are as expected, but "Internal" (i.e. off-heap) is a whopping 132GB. That is on top of everything else the JVM needs to run. With such a huge memory request by the JVM, the system is being driven into out-of-memory conditions (the OS runs out of RAM).
As an experiment, we reduced the data region to 1GB and measured JVM internal memory use before and after grid activation (the grid is activated by a client node we attach). We saw Internal (read: unsafe off-heap) memory jump from 62,154 KB to 32,897,187 KB on grid activation. So the 32GB overhead seems to be independent of the size of the data region.
This 32GB of extra system RAM usage is a real problem for us. Why is Ignite doing this, and how do we control it?
Thanks
Here is a typical native memory summary we are seeing. Note the HUGE Internal allocation.
native memory Total: reserved=156688325KB, committed=156439245KB
- Java Heap (reserved=16777216KB, committed=16777216KB) (mmap: reserved=16777216KB, committed=16777216KB)
- Class (reserved=112257KB, committed=111489KB) (classes #17951) (malloc=1665KB #17624) (mmap: reserved=110592KB, committed=109824KB)
- Thread (reserved=229015KB, committed=229015KB) (thread #223) (stack: reserved=228032KB, committed=228032KB) (malloc=723KB #1128) (arena=260KB #432)
- Code (reserved=255790KB, committed=40250KB) (malloc=6190KB #11547) (mmap: reserved=249600KB, committed=34060KB)
- GC (reserved=704014KB, committed=704014KB) (malloc=48654KB #22251) (mmap: reserved=655360KB, committed=655360KB)
- Compiler (reserved=420KB, committed=420KB) (malloc=289KB #1284) (arena=131KB #15)
- Internal (reserved=138544815KB, committed=138544811KB) (malloc=138544779KB #35177) (mmap: reserved=36KB, committed=32KB)
- Symbol (reserved=26536KB, committed=26536KB) (malloc=24002KB #216741) (arena=2533KB #1)
- Native Memory Tracking (reserved=4822KB, committed=4822KB) (malloc=30KB #346) (tracking overhead=4791KB)
- Arena Chunk (reserved=673KB, committed=673KB) (malloc=673KB)
- Unknown (reserved=32768KB, committed=0KB) (mmap: reserved=32768KB, committed=0KB)
PS
We have the default data region set to 128MB, the systemRegionMaxSize set to 8GB and systemRegionInitialSize set to 512MB.
Config:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation=" http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd">
<bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
<property name="gridLogger">
<bean class="org.apache.ignite.logger.log4j.Log4JLogger">
<constructor-arg type="java.lang.String" value="/opt/ignite/apache-ignite/config/log4j.xml"/>
</bean>
</property>
<property name="metricsLogFrequency" value="600000"/>
<property name="rebalanceThreadPoolSize" value="12"/>
<property name="peerClassLoadingEnabled" value="true"/>
<property name="publicThreadPoolSize" value="32"/>
<property name="systemThreadPoolSize" value="32"/>
<property name="workDirectory" value="/data/ignite/work"/>
<property name="segmentationPolicy" value="RESTART_JVM"/>
<property name="dataStorageConfiguration">
<bean class="org.apache.ignite.configuration.DataStorageConfiguration">
<property name="checkpointReadLockTimeout" value="0"/>
<property name="systemRegionInitialSize" value="#{512L * 1024 * 1024}"/>
<property name="systemRegionMaxSize" value="#{8L * 1024 * 1024 * 1024}"/>
<property name="storagePath" value="/data/ignite/persistentStore"/>
<property name="defaultDataRegionConfiguration">
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
<property name="name" value="Default_Region"/>
<property name="initialSize" value="67108864"/>
<property name="maxSize" value="134217728"/>
<property name="persistenceEnabled" value="false"/>
<property name="metricsEnabled" value="true"/>
</bean>
</property>
<property name="dataRegionConfigurations">
<list>
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
<property name="name" value="Tiered_Region"/>
<property name="initialSize" value="53687091200"/>
<property name="maxSize" value="53687091200"/>
<property name="persistenceEnabled" value="true"/>
<property name="pageEvictionMode" value="RANDOM_2_LRU"/>
<property name="evictionThreshold" value="0.75"/>
<property name="metricsEnabled" value="true"/>
</bean>
</list>
</property>
</bean>
</property>
<property name="cacheConfiguration">
<list>
<bean class="org.apache.ignite.configuration.CacheConfiguration">
<property name="name" value="default"/>
<property name="atomicityMode" value="ATOMIC"/>
<property name="backups" value="0"/>
</bean>
</list>
</property>
<property name="communicationSpi">
<bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
<property name="messageQueueLimit" value="#{1 * 1024}"/>
<property name="idleConnectionTimeout" value="30000"/>
</bean>
</property>
<property name="discoverySpi">
<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
<property name="ipFinder">
<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.s3.TcpDiscoveryS3IpFinder">
<property name="awsCredentials" ref="aws.creds"/>
<property name="bucketName" value="project-test-xyz"/>
</bean>
</property>
</bean>
</property>
</bean>
<bean id="aws.creds" class="com.amazonaws.auth.BasicAWSCredentials">
<constructor-arg value="foo"/>
<constructor-arg value="bar"/>
</bean>
</beans>
[ Adding logs below ]
[2019-05-17 22:28:39,592][WARN ][main][IgniteKernal] Peer class loading is enabled (disable it in production for performance and deployment consistency reasons)
[2019-05-17 22:28:39,593][WARN ][main][IgniteKernal] Please set system property '-Djava.net.preferIPv4Stack=true' to avoid possible problems in mixed environments.
[2019-05-17 22:28:40,141][WARN ][main][NoopCheckpointSpi] Checkpoints are disabled (to enable configure any GridCheckpointSpi implementation)
[2019-05-17 22:28:40,214][WARN ][main][GridCollisionManager] Collision resolution is disabled (all jobs will be activated upon arrival).
[2019-05-17 22:28:41,690][WARN ][main][GridCacheDatabaseSharedManager] DataRegionConfiguration.maxWalArchiveSize instead DataRegionConfiguration.walHistorySize would be used for removing old archive wal files
[2019-05-17 22:28:41,826][WARN ][main][PartitionsEvictManager] Logging at INFO level without checking if INFO level is enabled: Evict partition permits=4
[2019-05-17 22:28:46,291][WARN ][main][IgniteKernal] Nodes started on local machine require more than 80% of physical RAM what can lead to significant slowdown due to swapping (please decrease JVM heap size, data region size or checkpoint buffer size) [required=12516MB, available=14008MB]
log4j: Finalizing appender named [null].
[2019-05-17 22:31:19,958][WARN ][disco-event-worker-#42][GridDiscoveryManager] Local node's value of 'java.net.preferIPv4Stack' system property differs from remote node's (all nodes in topology should have identical value) [locPreferIpV4=null, rmtPreferIpV4=true, locId8=f25228c0, rmtId8=eac4211d, rmtAddrs=[192.168.1.5/127.0.0.1, /192.168.1.5], rmtNode=ClusterNode [id=eac4211d-c272-4eb0-9bd5-f91dfa34a0e9, order=2, addr=[127.0.0.1, 192.168.1.5], daemon=false]]
[2019-05-17 22:32:24,265][WARN ][exchange-worker-#43][GridAffinityAssignmentCache] Logging at INFO level without checking if INFO level is enabled: Local node affinity assignment distribution is not ideal [cache=default, expectedPrimary=1024.00, actualPrimary=1024, expectedBackups=1024.00, actualBackups=0, warningThreshold=50.00%]
[2019-05-17 22:32:24,269][WARN ][exchange-worker-#43][GridAffinityAssignmentCache] Logging at INFO level without checking if INFO level is enabled: Local node affinity assignment distribution is not ideal [cache=default, expectedPrimary=1024.00, actualPrimary=1024, expectedBackups=1024.00, actualBackups=0, warningThreshold=50.00%]
[2019-05-17 22:32:24,850][WARN ][exchange-worker-#43][GridAffinityAssignmentCache] Logging at INFO level without checking if INFO level is enabled: Local node affinity assignment distribution is not ideal [cache=default, expectedPrimary=1024.00, actualPrimary=1024, expectedBackups=1024.00, actualBackups=0, warningThreshold=50.00%]
[2019-05-17 22:32:24,911][WARN ][disco-notifier-worker-#41][GridClusterStateProcessor] Logging at INFO level without checking if INFO level is enabled: Received state change finish message: true
22:33:49.086 [exchange-worker-#43] INFO
c.b.aa.ceres.loader.S3CacheLoader - load
eb5445c7-d7fa-4018-95b6-63c4a0911eae received inject ignite instance
IgniteKernal [longJVMPauseDetector=LongJVMPauseDetector
[workerRef=Thread[jvm-pause-detector-worker,5,main], longPausesCnt=0,
longPausesTotalDuration=0, longPausesTimestamps=[0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], longPausesDurations=[0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]],
cfg=IgniteConfiguration [igniteInstanceName=null, pubPoolSize=32,
svcPoolSize=32, callbackPoolSize=8, stripedPoolSize=8, sysPoolSize=16,
mgmtPoolSize=4, igfsPoolSize=8, dataStreamerPoolSize=8,
utilityCachePoolSize=8, utilityCacheKeepAliveTime=60000,
p2pPoolSize=2, qryPoolSize=8, igniteHome=/opt/ignite/apache-ignite,
igniteWorkDir=/data/ignite/work,
mbeanSrv=com.sun.jmx.mbeanserver.JmxMBeanServer#6f94fa3e,
nodeId=f25228c0-afbc-4626-990a-68f97fd5b258, marsh=BinaryMarshaller
[], marshLocJobs=false, daemon=false, p2pEnabled=true,
netTimeout=5000, sndRetryDelay=1000, sndRetryCnt=3,
metricsHistSize=10000, metricsUpdateFreq=2000,
metricsExpTime=9223372036854775807, discoSpi=TcpDiscoverySpi
[addrRslvr=null, sockTimeout=5000, ackTimeout=5000,
marsh=JdkMarshaller
[clsFilter=org.apache.ignite.marshaller.MarshallerUtils$1#44a3f602],
reconCnt=10, reconDelay=2000, maxAckTimeout=600000,
forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null],
segPlc=NOOP, segResolveAttempts=2, waitForSegOnStart=true,
allResolversPassReq=true, segChkFreq=60000,
commSpi=TcpCommunicationSpi
[connectGate=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$ConnectGateway#6020964a,
connPlc=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$FirstConnectionPolicy#3f2874d5,
enableForcibleNodeKill=false, enableTroubleshootingLog=true,
locAddr=null, locHost=0.0.0.0/0.0.0.0, locPort=47100,
locPortRange=100, shmemPort=-1, directBuf=true, directSndBuf=false,
idleConnTimeout=600000, connTimeout=5000, maxConnTimeout=600000,
reconCnt=10, sockSndBuf=32768, sockRcvBuf=32768, msgQueueLimit=1024,
slowClientQueueLimit=0, nioSrvr=GridNioServer [selectorSpins=0,
filterChain=FilterChain[filters=[GridNioCodecFilter
[parser=org.apache.ignite.internal.util.nio.GridDirectParser#7873ad1,
directMode=true], GridConnectionBytesVerifyFilter], closed=false,
directBuf=true, tcpNoDelay=true, sockSndBuf=32768, sockRcvBuf=32768,
writeTimeout=2000, idleTimeout=600000, skipWrite=false,
skipRead=false, locAddr=0.0.0.0/0.0.0.0:47100, order=LITTLE_ENDIAN,
sndQueueLimit=1024, directMode=true, sslFilter=null,
msgQueueLsnr=null, readerMoveCnt=0, writerMoveCnt=0,
readWriteSelectorsAssign=false], shmemSrv=null,
usePairedConnections=false, connectionsPerNode=1, tcpNoDelay=true,
filterReachableAddresses=false, ackSndThreshold=32,
unackedMsgsBufSize=0, sockWriteTimeout=2000, boundTcpPort=47100,
boundTcpShmemPort=-1, selectorsCnt=4, selectorSpins=0, addrRslvr=null,
ctxInitLatch=java.util.concurrent.CountDownLatch#7b757828[Count = 0],
stopping=false],
evtSpi=org.apache.ignite.spi.eventstorage.NoopEventStorageSpi#282cb7c7,
colSpi=NoopCollisionSpi [], deploySpi=LocalDeploymentSpi [],
indexingSpi=org.apache.ignite.spi.indexing.noop.NoopIndexingSpi#50de186c,
addrRslvr=null,
encryptionSpi=org.apache.ignite.spi.encryption.noop.NoopEncryptionSpi#5a3bc7ed,
clientMode=false, rebalanceThreadPoolSize=1,
txCfg=TransactionConfiguration [txSerEnabled=false,
dfltIsolation=REPEATABLE_READ, dfltConcurrency=PESSIMISTIC,
dfltTxTimeout=0, txTimeoutOnPartitionMapExchange=0,
pessimisticTxLogSize=0, pessimisticTxLogLinger=10000,
tmLookupClsName=null, txManagerFactory=null, useJtaSync=false],
cacheSanityCheckEnabled=true, discoStartupDelay=60000,
deployMode=SHARED, p2pMissedCacheSize=100, locHost=null,
timeSrvPortBase=31100, timeSrvPortRange=100,
failureDetectionTimeout=60000, sysWorkerBlockedTimeout=null,
clientFailureDetectionTimeout=30000, metricsLogFreq=60000,
hadoopCfg=null, connectorCfg=ConnectorConfiguration [jettyPath=null,
host=null, port=11211, noDelay=true, directBuf=false,
sndBufSize=32768, rcvBufSize=32768, idleQryCurTimeout=600000,
idleQryCurCheckFreq=60000, sndQueueLimit=0, selectorCnt=4,
idleTimeout=7000, sslEnabled=false, sslClientAuth=false,
sslCtxFactory=null, sslFactory=null, portRange=100, threadPoolSize=8,
msgInterceptor=null], odbcCfg=null, warmupClos=null,
atomicCfg=AtomicConfiguration [seqReserveSize=1000,
cacheMode=PARTITIONED, backups=1, aff=null, grpName=null],
classLdr=null, sslCtxFactory=null, platformCfg=null, binaryCfg=null,
memCfg=null, pstCfg=null, dsCfg=DataStorageConfiguration
[sysRegionInitSize=41943040, sysRegionMaxSize=104857600,
pageSize=1024, concLvl=0, dfltDataRegConf=DataRegionConfiguration
[name=Default_Region, maxSize=134217728, initSize=67108864,
swapPath=null, pageEvictionMode=DISABLED, evictionThreshold=0.9,
emptyPagesPoolSize=100, metricsEnabled=true,
metricsSubIntervalCount=5, metricsRateTimeInterval=60000,
persistenceEnabled=false, checkpointPageBufSize=0],
dataRegions=[DataRegionConfiguration [name=Tiered_Region,
maxSize=8589934592, initSize=8589934592, swapPath=null,
pageEvictionMode=DISABLED, evictionThreshold=0.9,
emptyPagesPoolSize=100, metricsEnabled=true,
metricsSubIntervalCount=5, metricsRateTimeInterval=60000,
persistenceEnabled=true, checkpointPageBufSize=0]],
storagePath=/data/ignite/persistentStore, checkpointFreq=180000,
lockWaitTime=30000, checkpointThreads=8,
checkpointWriteOrder=SEQUENTIAL, walHistSize=20,
maxWalArchiveSize=1073741824, walSegments=10, walSegmentSize=67108864,
walPath=db/wal, walArchivePath=db/wal/archive, metricsEnabled=false,
walMode=LOG_ONLY, walTlbSize=131072, walBuffSize=0, walFlushFreq=2000,
walFsyncDelay=1000, walRecordIterBuffSize=67108864,
alwaysWriteFullPages=false,
fileIOFactory=org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIOFactory#2fb68ec6,
metricsSubIntervalCnt=5, metricsRateTimeInterval=60000,
walAutoArchiveAfterInactivity=-1, writeThrottlingEnabled=true,
walCompactionEnabled=false, walCompactionLevel=1,
checkpointReadLockTimeout=null], activeOnStart=true,
autoActivation=true, longQryWarnTimeout=3000, sqlConnCfg=null,
cliConnCfg=ClientConnectorConfiguration [host=null, port=10800,
portRange=100, sockSndBufSize=0, sockRcvBufSize=0, tcpNoDelay=true,
maxOpenCursorsPerConn=128, threadPoolSize=8, idleTimeout=0,
jdbcEnabled=true, odbcEnabled=true, thinCliEnabled=true,
sslEnabled=false, useIgniteSslCtxFactory=true, sslClientAuth=false,
sslCtxFactory=null], mvccVacuumThreadCnt=2, mvccVacuumFreq=5000,
authEnabled=false, failureHnd=null, commFailureRslvr=null],
igniteInstanceName=null, startTime=1558132126418,
rsrcCtx=org.apache.ignite.internal.processors.resource.GridSpringResourceContextImpl#556d0e12,
reconnectState=ReconnectState [firstReconnectFut=GridFutureAdapter
[ignoreInterrupts=false, state=INIT, res=null, hash=1426466647],
curReconnectFut=null, reconnectDone=null]]
I think that's the checkpoint page buffer, which is 20% of the data region's size by default.
You may specify it explicitly so that you don't forget about it, and decrease the region size accordingly to make sure you don't run out of RAM.
This should only apply to regions with persistence enabled.
Note that you should also expect your OS to take a few GB for its own data structures and block caches, so I don't think you should allocate 116GB of 120GB to Ignite's off-heap memory. Don't forget about the heap either.
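Setting the buffer explicitly, as suggested above, could look like the following. This is a minimal sketch that reuses the Tiered_Region name and the SpEL size notation from the question's config. The 80GB region and 2GB buffer are illustrative assumptions, chosen only so that heap, region, checkpoint buffer and some OS headroom fit into 120GB.

```xml
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
    <property name="name" value="Tiered_Region"/>
    <property name="persistenceEnabled" value="true"/>
    <!-- Assumed size: smaller than the original region to leave room for heap and the OS. -->
    <property name="maxSize" value="#{80L * 1024 * 1024 * 1024}"/>
    <!-- Assumed size: explicit checkpoint page buffer instead of the implicit default. -->
    <property name="checkpointPageBufferSize" value="#{2L * 1024 * 1024 * 1024}"/>
</bean>
```

The configuration dump in the question prints this setting as checkpointPageBufSize=0, which suggests no explicit value was set there.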

Ignite Persistence not working in Yarn offline deployment

I am trying to enable Ignite native persistence in an Ignite YARN deployment.
The purpose is to have data written to disk when RAM overflows.
But when I try to add a large number of records to the Ignite grid, the node gets disconnected and I get the exception below.
Error :class org.apache.ignite.internal.NodeStoppingException: Operation has been cancelled (node is stopping).
javax.cache.CacheException: class org.apache.ignite.internal.NodeStoppingException: Operation has been cancelled (node is stopping).
ERROR com.project$$anonfun$startWritingToGrid$1: org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1287)
ERROR com.project$$anonfun$startWritingToGrid$1: org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.cacheException(IgniteCacheProxyImpl.java:1648)
ERROR com.project$$anonfun$startWritingToGrid$1: org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.putAll(IgniteCacheProxyImpl.java:1071)
ERROR com.project$$anonfun$startWritingToGrid$1: org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.putAll(GatewayProtectedCacheProxy.java:928)
Please find below the details.
Ignite Version : 2.3.0
Cluster details for Yarn Deployment:
IGNITE_NODE_COUNT=10
IGNITE_RUN_CPU_PER_NODE=5
IGNITE_MEMORY_PER_NODE=10096
IGNITE_VERSION=2.3.0
IGNITE_PATH=/tmp/ignite/2.3.0/apache-ignite-fabric-2.3.0-bin.zip
IGNITE_RELEASES_DIR=/tmp/ignite/2.3.0/releases
IGNITE_WORKING_DIR=/tmp/ignite/2.3.0/work
IGNITE_XML_CONFIG=/tmp/ignite/2.3.0/config/ignite-config.xml
IGNITE_USERS_LIBS=/tmp/ignite/2.3.0/libs
IGNITE_LOCAL_WORK_DIR=/local/home/ignite/2.3.0
Ignite Configuration for Yarn deployment:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:util="http://www.springframework.org/schema/util"
xsi:schemaLocation="
http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
http://www.springframework.org/schema/util
http://www.springframework.org/schema/util/spring-util-2.0.xsd">
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
<property name="clientMode" value="false"/>
<property name="dataStorageConfiguration">
<bean
class="org.apache.ignite.configuration.DataStorageConfiguration">
<property name="defaultDataRegionConfiguration">
<bean
class="org.apache.ignite.configuration.DataRegionConfiguration">
<property name="persistenceEnabled" value="true"/>
</bean>
</property>
</bean>
</property>
<property name="peerClassLoadingEnabled" value="true"/>
<property name="networkTimeout" value="10000000"/>
<property name="networkSendRetryCount" value="50"/>
<property name="discoverySpi">
<bean
class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
<property name="ipFinder">
<bean
class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
<property name="addresses">
<list>
<value><hosts>:47500</value>
</list>
</property>
</bean>
</property>
<property name="networkTimeout"
value="10000000"/>
<property name="joinTimeout"
value="10000000"/>
<property name="maxAckTimeout"
value="10000000"/>
<property name="reconnectCount" value="50"/>
<property name="socketTimeout" value="10000000"/>
</bean>
</property>
</bean>
</beans>
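Code to add data to the grid:
import org.apache.ignite.cache.CacheMode
import org.apache.ignite.configuration.CacheConfiguration

// Partitioned cache with SQL indexing enabled for (Long, Data) entries.
val cacheConf = new CacheConfiguration[Long, Data]("DataCache")
cacheConf.setCacheMode(CacheMode.PARTITIONED)
cacheConf.setIndexedTypes(classOf[Long], classOf[Data])
val cache = ignite.getOrCreateCache(cacheConf)
// getDataMap() builds the map of records to load (defined elsewhere).
val dataMap = getDataMap()
cache.putAll(dataMap)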
Code to Count records:
val sql1 = "select * from DataCache"
val count = cache.query(new SqlFieldsQuery(sql1)).getAll.size()

JAVA - Spring Integration Flow Transaction + com.atomikos.icatch.HeurHazardException: Heuristic Exception

I'm trying to make an entire Spring Integration flow transactional. The flow starts with an adapter on an IBM MQ queue and continues through a complex flow with ActiveMQ queues. I'm getting a com.atomikos.icatch.HeurHazardException: Heuristic Exception when Atomikos tries to register the resources.
Here's my applicationContext.xml.
<bean id="mqConnectionFactory" class="com.ibm.mq.jms.MQQueueConnectionFactory">
<property name="hostName" value="${ibm.mq.connection.url}" />
<property name="port" value="${ibm.mq.connection.port}" />
<property name="transportType" value="${ibm.mq.conection.type}" />
<property name="queueManager" value="${ibm.mq.conection.queuemanager}" />
<property name="channel" value="${ibm.mq.conection.channel}" />
</bean>
<bean id="mqConnectionFactoryCache"
class="org.springframework.jms.connection.CachingConnectionFactory">
<property name="targetConnectionFactory" ref="mqConnectionFactory" />
<property name="sessionCacheSize" value="10" />
<property name="cacheConsumers" value="true"></property>
</bean>
<bean id="mqInboundQueue" class="com.ibm.mq.jms.MQQueue">
<constructor-arg value="${ibm.mq.conection.queue}" />
</bean>
<bean id="amqConnectionFactory" class="org.apache.activemq.ActiveMQXAConnectionFactory">
<!-- brokerURL -->
<property name="brokerURL" value="${active.mq.connection.url}" />
<property name="redeliveryPolicy" ref="amqRedeliveryPolicy"></property>
</bean>
<bean id="amqRedeliveryPolicy" class="org.apache.activemq.RedeliveryPolicy">
<property name="maximumRedeliveries" value="5"></property>
<property name="redeliveryDelay" value="1000"></property>
</bean>
<bean id="atmConnectionFactory" class="com.atomikos.jms.AtomikosConnectionFactoryBean"
init-method="init" destroy-method="close">
<property name="uniqueResourceName" value="INPUT" />
<property name="xaConnectionFactory" ref="amqConnectionFactory" />
</bean>
<bean id="inputChannelQueue" class="org.apache.activemq.command.ActiveMQQueue">
<constructor-arg value="INPUT" />
</bean>
<jms:channel id="inputChannel" queue="inputChannelQueue">
<jms:interceptors>
<int:wire-tap channel="inputLoggingChannel"/>
</jms:interceptors>
</jms:channel>
<jms:message-driven-channel-adapter id="MQInboundGateway"
connection-factory="mqConnectionFactoryCache"
destination="mqInboundQueue"
channel="inputChannel"
concurrent-consumers="20"
cache-level="3" acknowledge="transacted" />
<!-- ... OTHER SPRING INTEGRATION ELEMENTS ... -->
<int:channel id="concentradorOutputChannel" /> <!-- Final channel to the output adapter. -->
<!-- ATOMIKOS BEANS -->
<bean id="userTransactionService" class="com.atomikos.icatch.config.UserTransactionServiceImp" init-method="init" destroy-method="shutdownForce">
<constructor-arg>
<!-- IMPORTANT: specify all Atomikos properties here -->
<props>
<prop key="com.atomikos.icatch.service">
com.atomikos.icatch.standalone.UserTransactionServiceFactory
</prop>
</props>
</constructor-arg>
</bean>
<bean id="AtomikosTransactionManager" class="com.atomikos.icatch.jta.UserTransactionManager" init-method="init" destroy-method="close" depends-on="userTransactionService">
<property name="startupTransactionService" value="false" />
<property name="forceShutdown" value="false" />
</bean>
<bean id="AtomikosUserTransaction" class="com.atomikos.icatch.jta.UserTransactionImp" depends-on="userTransactionService">
<property name="transactionTimeout" value="300" />
</bean>
<!-- //================= TX BEANS =================// -->
<bean id="JtaTransactionManager" class="org.springframework.transaction.jta.JtaTransactionManager" depends-on="userTransactionService">
<property name="transactionManager" ref="AtomikosTransactionManager" />
<property name="userTransaction" ref="AtomikosUserTransaction" />
</bean>
<aop:config>
<aop:advisor advice-ref="txAdvice" pointcut="bean(inputChannel)" />
</aop:config>
<tx:advice id="txAdvice" transaction-manager="JtaTransactionManager">
<tx:attributes>
<tx:method name="send" />
</tx:attributes>
</tx:advice>
This is the exception:
INFO com.atomikos.icatch.config.imp.AbstractUserTransactionService USING core version: 3.9.3
INFO com.atomikos.icatch.config.imp.AbstractUserTransactionService USING com.atomikos.icatch.automatic_resource_registration = true
INFO com.atomikos.icatch.config.imp.AbstractUserTransactionService USING com.atomikos.icatch.client_demarcation = false
INFO com.atomikos.icatch.config.imp.AbstractUserTransactionService USING com.atomikos.icatch.threaded_2pc = false
INFO com.atomikos.icatch.config.imp.AbstractUserTransactionService USING com.atomikos.icatch.serial_jta_transactions = true
INFO com.atomikos.icatch.config.imp.AbstractUserTransactionService USING com.atomikos.icatch.serializable_logging = true
INFO com.atomikos.icatch.config.imp.AbstractUserTransactionService USING com.atomikos.icatch.log_base_dir = .\
INFO com.atomikos.icatch.config.imp.AbstractUserTransactionService USING com.atomikos.icatch.max_actives = 50
INFO com.atomikos.icatch.config.imp.AbstractUserTransactionService USING com.atomikos.icatch.checkpoint_interval = 500
INFO com.atomikos.icatch.config.imp.AbstractUserTransactionService USING com.atomikos.icatch.enable_logging = true
INFO com.atomikos.icatch.config.imp.AbstractUserTransactionService USING com.atomikos.icatch.output_dir = .\
INFO com.atomikos.icatch.config.imp.AbstractUserTransactionService USING com.atomikos.icatch.log_base_name = tmlog
INFO com.atomikos.icatch.config.imp.AbstractUserTransactionService USING com.atomikos.icatch.max_timeout = 300000
INFO com.atomikos.icatch.config.imp.AbstractUserTransactionService USING com.atomikos.icatch.tm_unique_name = 10.200.204.8.tm
INFO com.atomikos.icatch.config.imp.AbstractUserTransactionService USING java.naming.factory.initial = com.sun.jndi.rmi.registry.RegistryContextFactory
INFO com.atomikos.icatch.config.imp.AbstractUserTransactionService USING java.naming.provider.url = rmi://localhost:1099
INFO com.atomikos.icatch.config.imp.AbstractUserTransactionService USING com.atomikos.icatch.service = com.atomikos.icatch.standalone.UserTransactionServiceFactory
INFO com.atomikos.icatch.config.imp.AbstractUserTransactionService USING com.atomikos.icatch.force_shutdown_on_vm_exit = false
INFO com.atomikos.icatch.config.imp.AbstractUserTransactionService USING com.atomikos.icatch.default_jta_timeout = 10000
WARN com.atomikos.datasource.xa.XAResourceTransaction XAResourceTransaction 10.200.204.8.tm001540002610.200.204.8.tm157: no XAResource to commit - the required resource is probably not yet intialized?
WARN com.atomikos.icatch.imp.CommitMessage Unexpected error in commit
com.atomikos.icatch.HeurHazardException: Heuristic Exception
at com.atomikos.datasource.xa.XAResourceTransaction.commit(XAResourceTransaction.java:707)
at com.atomikos.icatch.imp.CommitMessage.send(CommitMessage.java:72)
at com.atomikos.icatch.imp.PropagationMessage.submit(PropagationMessage.java:83)
at com.atomikos.icatch.imp.Propagator$PropagatorThread.run(Propagator.java:79)
at com.atomikos.icatch.imp.Propagator.submitPropagationMessage(Propagator.java:58)
at com.atomikos.icatch.imp.HeurHazardStateHandler.onTimeout(HeurHazardStateHandler.java:131)
at com.atomikos.icatch.imp.CoordinatorImp.alarm(CoordinatorImp.java:933)
at com.atomikos.timing.PooledAlarmTimer.notifyListeners(PooledAlarmTimer.java:112)
at com.atomikos.timing.PooledAlarmTimer.run(PooledAlarmTimer.java:99)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
I need the entire flow from the inputChannel to concentradorOutputChannel to be within the TX.
Thanks for your help.
Solved: I just needed to remove the Atomikos transaction log files tmlog.lck and tmlog6.log.
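The startup log above reports com.atomikos.icatch.log_base_dir = .\ and com.atomikos.icatch.log_base_name = tmlog, so these files end up in whatever directory the process is started from. This is a minimal sketch that extends the props of the userTransactionService bean from the question and pins the logs to a dedicated directory, so leftover files from an earlier run are easier to find and clean up. The path is a hypothetical placeholder.

```xml
<props>
    <prop key="com.atomikos.icatch.service">
        com.atomikos.icatch.standalone.UserTransactionServiceFactory
    </prop>
    <!-- Hypothetical directory: use any writable path dedicated to this application. -->
    <prop key="com.atomikos.icatch.log_base_dir">/var/lib/myapp/atomikos</prop>
</props>
```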

Spring integration deadlock using Aggregator + MessageStoreReaper + Redis?

This question follows up on a post in the Spring Integration forum. Since the forum is now closed, I am posting here to continue the thread:
http://forum.spring.io/forum/spring-projects/integration/748192-messages-not-flowing-when-using-jms-channels
To sum up, I have an aggregator with a Redis message store and a reaper scheduled every 60 seconds. Messages are sent to the aggregator through a JMS channel. Here's the config:
<bean id="jedisPoolConfigBean" class="redis.clients.jedis.JedisPoolConfig">
<property name="maxActive" value="10" />
<property name="maxIdle" value="5" />
<property name="minIdle" value="1" />
<property name="testOnBorrow" value="true" />
<property name="testOnReturn" value="true" />
<property name="testWhileIdle" value="true" />
</bean>
<bean id="loyaltyRedisConnectionFactory"
class="org.springframework.data.redis.connection.jedis.JedisConnectionFactory">
<property name="hostName" value="${redis.hostName}" />
<property name="database" value="${redis.loyalty.database}" />
<property name="port" value="${redis.port}" />
<property name="poolConfig" ref="jedisPoolConfigBean" />
</bean>
<bean id="loyaltyAggregatorRedisMessageStore"
class="org.springframework.integration.redis.store.RedisMessageStore">
<constructor-arg ref="loyaltyRedisConnectionFactory" />
</bean>
<task:scheduler id="loyaltyScheduler"
pool-size="${loyalty.aggregator.reaper.pool_size}"/>
<task:scheduled-tasks scheduler="loyaltyScheduler">
<task:scheduled ref="loyaltyReaperBean" method="run" fixed-rate="60000" />
</task:scheduled-tasks>
<bean id="loyaltyReaperBean" class="org.springframework.integration.store.MessageGroupStoreReaper">
<property name="messageGroupStore" ref="loyaltyAggregatorRedisMessageStore" />
<property name="timeout" value="30000" />
</bean>
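To make the reaper settings above concrete, here is a minimal plain-Java sketch of what a periodic "expire groups older than a timeout" pass does. It is not Spring Integration code: `ReaperSketch`, `addGroup` and `expireGroups` are hypothetical names, and it assumes the timeout is measured from the moment a group was created.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReaperSketch {
    // Hypothetical stand-in for a message group store: group id -> creation time in ms.
    private final Map<String, Long> groupCreatedAt = new ConcurrentHashMap<>();

    public void addGroup(String groupId, long createdAtMillis) {
        groupCreatedAt.put(groupId, createdAtMillis);
    }

    /** Removes and returns every group that is older than timeoutMillis at time nowMillis. */
    public List<String> expireGroups(long nowMillis, long timeoutMillis) {
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = groupCreatedAt.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > timeoutMillis) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        ReaperSketch store = new ReaperSketch();
        store.addGroup("A", 0);       // created at t = 0 s
        store.addGroup("B", 45_000);  // created at t = 45 s
        // Reaper run at t = 60 s with timeout = 30 s: only "A" is old enough.
        System.out.println(store.expireGroups(60_000, 30_000));  // [A]
        // Next run at t = 120 s: "B" is now 75 s old and expires as well.
        System.out.println(store.expireGroups(120_000, 30_000)); // [B]
    }
}
```

Under this model, a 60 second run interval combined with a 30 second timeout means an incomplete group can sit in the store for anywhere between 30 and 90 seconds before it is reaped.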
I'm fairly new to reading thread dumps, but as far as I can see the DefaultMessageListenerContainer threads are blocked by the taskScheduler thread that launches the MessageGroupStoreReaper. In particular, they are all waiting on a ReentrantLock owned by that thread.
Any ideas? Maybe I need some additional configuration to avoid this.
Any help is appreciated.
Thanks in advance!
Guzman
"loyaltyScheduler-1" - Thread t#57
java.lang.Thread.State: TIMED_WAITING
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <47e54701> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Locked ownable synchronizers:
- locked <368788cf> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
"DefaultMessageListenerContainer-6" - Thread t#73
java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Native Method)
- waiting to lock <368788cf> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) owned by "loyaltyScheduler-1" t#57
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireInterruptibly(AbstractQueuedSynchronizer.java:894)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1221)
at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
at org.springframework.integration.aggregator.AbstractCorrelatingMessageHandler.handleMessageInternal(AbstractCorrelatingMessageHandler.java:223)
"DefaultMessageListenerContainer-7" - Thread t#80
java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Native Method)
- waiting to lock <368788cf> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) owned by "loyaltyScheduler-1" t#57
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireInterruptibly(AbstractQueuedSynchronizer.java:894)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1221)
at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
at org.springframework.integration.aggregator.AbstractCorrelatingMessageHandler.handleMessageInternal(AbstractCorrelatingMessageHandler.java:223)
"DefaultMessageListenerContainer-8" - Thread t#83
java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Native Method)
- waiting to lock <368788cf> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) owned by "loyaltyScheduler-1" t#57
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireInterruptibly(AbstractQueuedSynchronizer.java:894)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1221)
at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
at org.springframework.integration.aggregator.AbstractCorrelatingMessageHandler.handleMessageInternal(AbstractCorrelatingMessageHandler.java:223)
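In the dump above, "loyaltyScheduler-1" is idle in DelayedWorkQueue.take yet is still listed as the owner of <368788cf>. That is the picture you would get if a pooled thread acquired a ReentrantLock inside a task and went back to the pool without releasing it. The sketch below reproduces only that general pattern in plain Java. It is not Spring Integration code, it makes no claim about why the lock was left held here, and `LeakedLockDemo` and `otherThreadCanAcquire` are made-up names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LeakedLockDemo {

    /**
     * Runs one task on a single pooled thread that takes a lock and optionally
     * releases it, then reports whether the calling thread can acquire the lock.
     */
    public static boolean otherThreadCanAcquire(boolean release) throws Exception {
        ReentrantLock lock = new ReentrantLock();
        ExecutorService scheduler = Executors.newSingleThreadExecutor();
        try {
            // The pooled thread takes the lock; the task ends and the thread goes idle.
            scheduler.submit(() -> {
                lock.lock();
                if (release) {
                    lock.unlock();
                }
            }).get();

            // A different thread (the caller) now tries to take the same lock.
            boolean acquired = lock.tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.unlock();
            }
            return acquired;
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("lock leaked by idle pool thread: " + otherThreadCanAcquire(false)); // false
        System.out.println("lock released by task:           " + otherThreadCanAcquire(true));  // true
    }
}
```

A ReentrantLock stays owned by the thread that locked it for as long as that thread is alive and has not called unlock(), even when the thread is parked waiting for its next task. Any thread that calls lockInterruptibly() without a timeout then waits indefinitely, as the listener threads do in the dump.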

What is the proper setup for spring defaultmessagelistener container on glassfish for redelivery after exception?

I'm trying to set up Spring's DefaultMessageListenerContainer class to redeliver messages after an exception is thrown or session.rollback() is called. I am also trying to get this running on GlassFish 3.1.2 web profile.
When calling session.rollback() in the onMessage() method of my SessionAwareMessageListener, I get an exception with the message saying: MessageDispatcher - [C4024]: The session is not transacted. I don't see this problem with ActiveMQ, but of course that configuration is different because I'm not using it in an application server.
Has anyone here gotten this working? My configuration follows:
<bean id="jndiTemplate" class="org.springframework.jndi.JndiTemplate">
<property name="environment">
<props>
<prop key="java.naming.factory.initial">com.sun.enterprise.naming.SerialInitContextFactory</prop>
<prop key="java.naming.provider.url">${jms.jndicontext.url}</prop>
<prop key="java.naming.factory.state">com.sun.corba.ee.impl.presentation.rmi.JNDIStateFactoryImpl</prop>
<prop key="java.naming.factory.url.pkgs">com.sun.enterprise.naming</prop>
</props>
</property>
</bean>
<bean id="jmsConnectionFactory" class="org.springframework.jndi.JndiObjectFactoryBean">
<property name="jndiTemplate" ref="jndiTemplate" />
<property name="jndiName" value="${jms.connection.factory}" />
</bean>
<bean id="jmsTemplate"
class="org.springframework.jms.core.JmsTemplate">
<property name="connectionFactory" ref="jmsConnectionFactory"/>
<property name="defaultDestination" ref="jmsServiceQueue"/>
</bean>
<bean id="jmsServiceProducer"
class="net.exchangesolutions.services.messaging.service.jms.JmsMessageServiceProducerImpl">
<property name="serviceTemplate" ref="jmsTemplate"/>
<property name="serviceDestination" ref="jmsServiceQueue"/>
</bean>
<bean id="myMessageListener"
class="com.myorg.jms.MessageDispatcher"/>
<bean id="jmsServiceContainer"
class="org.springframework.jms.listener.DefaultMessageListenerContainer">
<property name="connectionFactory" ref="jmsConnectionFactory"/>
<property name="destination" ref="jmsServiceQueue"/>
<property name="messageListener" ref="myMessageListener"/>
<property name="errorHandler" ref="jmsErrorHandler" />
<property name="receiveTimeout" value="180000"/>
<property name="concurrentConsumers" value="1"/>
<property name="cacheLevelName" value="CACHE_NONE"/>
<property name="pubSubNoLocal" value="true"/>
<property name="sessionTransacted" value="true"/>
<property name="sessionAcknowledgeMode" value="2" />
<property name="transactionManager" ref="jmsTransactionManager"/>
</bean>
<bean id="jmsTransactionManager" class="org.springframework.jms.connection.JmsTransactionManager">
<property name="connectionFactory" ref="jmsConnectionFactory"/>
</bean>
With acknowledge="auto", the message is acknowledged before the listener executes, so it is removed from the queue even if the listener then fails.
I have also achieved the DLQ scenario in a Spring application by making the following changes to your code.
First, set acknowledge="transacted", since we want guaranteed redelivery when an exception is thrown and a transactional acknowledgment when the listener completes successfully.
<jms:listener-container container-type="default" connection-factory="connectionFactory" acknowledge="transacted">
Next, since we want to throw the JMSException, we are implementing SessionAwareMessageListener.
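import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.Session;
import org.springframework.jms.listener.SessionAwareMessageListener;

public class MyMessageQueueListener implements SessionAwareMessageListener<Message> {
    @Override
    public void onMessage(Message message, Session session) throws JMSException {
        // Do something; process() is a placeholder for your own logic.
        boolean success = process(message);
        if (!success) {
            // Throwing rolls back the transacted session, so the message is redelivered.
            throw new JMSException("..exception");
        }
        // Returning normally lets the container commit, which acknowledges the message.
    }

    private boolean process(Message message) {
        return true; // placeholder result
    }
}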
I have tested this, and it seems to work fine.
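The difference between the two acknowledge modes described in this answer can be illustrated with a toy in-memory model. This is only a sketch: it uses a plain Deque instead of a JMS broker, a Predicate that returns false stands in for a listener that throws, and `AckModeSketch` and `deliverOnce` are hypothetical names, not part of Spring or JMS.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

public class AckModeSketch {

    /**
     * Delivers the head of the queue to the listener once.
     * The listener returns false to signal failure (standing in for a thrown exception).
     */
    public static void deliverOnce(Deque<String> queue, boolean transacted, Predicate<String> listener) {
        String message = queue.pollFirst();
        if (message == null) {
            return; // nothing to deliver
        }
        if (!transacted) {
            // "auto" model: the message is already gone before the listener runs,
            // so a failure cannot bring it back.
            listener.test(message);
            return;
        }
        // "transacted" model: the removal only becomes final if the listener succeeds.
        if (!listener.test(message)) {
            queue.addFirst(message); // rollback: the message is available for redelivery
        }
    }

    public static void main(String[] args) {
        Predicate<String> failing = m -> false;

        Deque<String> autoQueue = new ArrayDeque<>();
        autoQueue.add("m1");
        deliverOnce(autoQueue, false, failing);
        System.out.println("auto, after failure: " + autoQueue);             // []

        Deque<String> txQueue = new ArrayDeque<>();
        txQueue.add("m1");
        deliverOnce(txQueue, true, failing);
        System.out.println("transacted, after failure: " + txQueue);         // [m1]
    }
}
```

In a real broker, repeated rollbacks of the same message are what eventually trigger the dead letter queue handling, subject to the broker's own redelivery settings.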