Apache Ignite: What causes the ArrayIndexOutOfBoundsException in StripedCompositeReadWriteLock?

We have services written in Kotlin which behave as clients for Ignite. We use the ignite-core and ignite-spring dependencies in these services.
These services start to produce errors like the following after several days of uptime:
java.lang.ArrayIndexOutOfBoundsException: Index -4 out of bounds for length 16
at org.apache.ignite.internal.util.StripedCompositeReadWriteLock.getReadHoldCount(StripedCompositeReadWriteLock.java:120)
at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.holdsLock(GridDhtPartitionTopologyImpl.java:262)
at org.apache.ignite.internal.processors.cache.CacheGroupContext.isTopologyLocked(CacheGroupContext.java:678)
at org.apache.ignite.internal.processors.cache.GridCacheContext.toCacheObject(GridCacheContext.java:1863)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapSingleUpdate(GridNearAtomicSingleUpdateFuture.java:553)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.map(GridNearAtomicSingleUpdateFuture.java:458)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapOnTopology(GridNearAtomicSingleUpdateFuture.java:447)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.map(GridNearAtomicAbstractUpdateFuture.java:255)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$26.apply(GridDhtAtomicCache.java:1159)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$26.apply(GridDhtAtomicCache.java:1157)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.asyncOp(GridDhtAtomicCache.java:782)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update0(GridDhtAtomicCache.java:1157)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.putAsync0(GridDhtAtomicCache.java:661)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.putAsync(GridCacheAdapter.java:2998)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.putAsync(GridCacheAdapter.java:2981)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.putAsync0(IgniteCacheProxyImpl.java:1339)
... 21 common frames omitted
Wrapped by: org.apache.ignite.IgniteCheckedException: Index -4 out of bounds for length 16
at org.apache.ignite.internal.util.IgniteUtils.cast(IgniteUtils.java:7628)
at org.apache.ignite.internal.util.future.GridFutureAdapter.resolve(GridFutureAdapter.java:260)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:172)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl$7.applyx(IgniteCacheProxyImpl.java:1344)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl$7.applyx(IgniteCacheProxyImpl.java:1341)
at org.apache.ignite.internal.util.lang.IgniteClosureX.apply(IgniteClosureX.java:38)
at org.apache.ignite.internal.util.future.GridFutureChainListener.applyCallback(GridFutureChainListener.java:78)
at org.apache.ignite.internal.util.future.GridFutureChainListener.apply(GridFutureChainListener.java:70)
at org.apache.ignite.internal.util.future.GridFutureChainListener.apply(GridFutureChainListener.java:30)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:399)
at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:354)
at org.apache.ignite.internal.util.future.GridFutureAdapter$ChainFuture.<init>(GridFutureAdapter.java:588)
at org.apache.ignite.internal.util.future.GridFutureAdapter.chain(GridFutureAdapter.java:361)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.putAsync0(IgniteCacheProxyImpl.java:1341)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.putAsync(IgniteCacheProxyImpl.java:1326)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.putAsync(GatewayProtectedCacheProxy.java:880)
... 19 common frames omitted
The index specified in the ArrayIndexOutOfBoundsException differs between exception instances, but it is always a negative value.
After restarting the services, the error goes away and the services work fine for the next several days. But then the error appears again.
I investigated a bit and found that the error may be caused by the IDX_GEN value in StripedCompositeReadWriteLock overflowing, since this value is only ever incremented. If there are enough calls to the lock class, it may overflow. But this is just my guess and I'm not 100% sure.
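To illustrate the guess, here is a small self-contained sketch. It is not the actual Ignite source; I'm only assuming the stripe index is derived roughly as an ever-growing int counter modulo the stripe count (16 here, matching "length 16" in the exception).

import java.util.concurrent.atomic.AtomicInteger;

public class IdxGenOverflowSketch {
    public static void main(String[] args) {
        // Pretend billions of increments already happened and the counter is near Integer.MAX_VALUE.
        AtomicInteger idxGen = new AtomicInteger(Integer.MAX_VALUE - 2);
        int stripeCount = 16;

        for (int i = 0; i < 6; i++) {
            int idx = idxGen.incrementAndGet() % stripeCount;
            // Prints 14, 15, 0, -15, -14, -13: once the counter wraps past
            // Integer.MAX_VALUE, Java's % yields negative remainders, and using
            // such an index on a 16-element stripe array would throw
            // ArrayIndexOutOfBoundsException with a negative index.
            System.out.println("idx = " + idx);
        }
    }
}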
Are there other possible causes for this error?
Or is there some workaround to avoid this issue? I'd like to avoid periodically restarting the whole service just to reset IDX_GEN.
The Ignite version and the versions of the ignite dependencies are 2.11.0.
Java 18

Related

Spring Batch Throwing an error 'Object 'BATCH_JOB_SEQ' is not a sequence object'

I am working on a Spring Batch project. I used SQL Server as a local database and used this link to create the DB script for the batch tables, but now I get the error below.
23463 [main] WARN  o.s.b.c.c.a.DefaultBatchConfigurer - No transaction manager was provided, using a DataSourceTransactionManager
24014 [main] WARN  o.s.b.a.o.j.JpaBaseConfiguration$JpaWebConfiguration - spring.jpa.open-in-view is enabled by default. Therefore, database queries may be performed during view rendering. Explicitly configure spring.jpa.open-in-view to disable this warning
92338 [HikariPool-2 housekeeper] WARN  com.zaxxer.hikari.pool.HikariPool - HikariPool-2 - Thread starvation or clock leap detected (housekeeper delta=49s173ms145µs619ns).
102416 [HikariPool-1 housekeeper] WARN  com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Thread starvation or clock leap detected (housekeeper delta=1m969ms212µs331ns).
102682 [http-nio-8080-exec-1] ERROR o.a.c.c.C.[.[.[.[dispatcherServlet] - Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is org.springframework.dao.DataAccessResourceFailureException: Could not obtain sequence value; nested exception is com.microsoft.sqlserver.jdbc.SQLServerException: Object 'BATCH_JOB_SEQ' is not a sequence object.] with root cause
com.microsoft.sqlserver.jdbc.SQLServerException: Object 'BATCH_JOB_SEQ' is not a sequence object.
When I run the SQL query
SELECT name, type_desc FROM sys.objects WHERE name=N'BATCH_JOB_SEQ';
The returned result is
name           type_desc
BATCH_JOB_SEQ  SEQUENCE_OBJECT
In Spring Batch 4, tables were used to emulate sequences for SQL Server and a SqlServerMaxValueIncrementer was used to increment IDs.
In Spring Batch v5, we changed SQL Server support to use sequences instead of emulating them with tables. Version 5 uses a SqlServerSequenceMaxValueIncrementer to increment IDs.
The link you shared points to the DDL script of the main branch, which is for Spring Batch v5. So in your case, you either need to use the DDL script for v4 (which you can find here), or upgrade your application to Spring Batch 5 (the latest version is 5.0.0-RC2; the GA release is planned for later in November 2022).
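For illustration, here is a minimal sketch of the difference between the two incrementers mentioned above. The wiring is assumed (both classes live in spring-jdbc's org.springframework.jdbc.support.incrementer package, and the "ID" column name is only illustrative); Spring Batch configures these for you, so this only shows what each version expects BATCH_JOB_SEQ to be.

import javax.sql.DataSource;
import org.springframework.jdbc.support.incrementer.SqlServerMaxValueIncrementer;
import org.springframework.jdbc.support.incrementer.SqlServerSequenceMaxValueIncrementer;

public class BatchSequenceSketch {

    // Spring Batch 4 style: BATCH_JOB_SEQ is a table that emulates a sequence.
    static long nextIdBatch4Style(DataSource dataSource) {
        return new SqlServerMaxValueIncrementer(dataSource, "BATCH_JOB_SEQ", "ID")
                .nextLongValue();
    }

    // Spring Batch 5 style: BATCH_JOB_SEQ must be a real SQL Server sequence
    // (which is what the main-branch DDL creates); SQL Server rejects
    // NEXT VALUE FOR on anything that is not a sequence object.
    static long nextIdBatch5Style(DataSource dataSource) {
        return new SqlServerSequenceMaxValueIncrementer(dataSource, "BATCH_JOB_SEQ")
                .nextLongValue();
    }
}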

Spark - Failed to load collect frame - "RetryingBlockFetcher - Exception while beginning fetch"

We have a Scala Spark application that reads roughly 70K records from the DB into a data frame, each record having 2 fields.
After reading the data from the DB, we do some minor mapping and load the result as a broadcast for later use.
Now, in the local environment, there is an exception, a timeout from the RetryingBlockFetcher, while running the following code:
dataframe.select("id", "mapping_id")
.rdd.map(row => row.getString(0) -> row.getLong(1))
.collectAsMap().toMap
The exception is:
2022-06-06 10:08:13.077 task-result-getter-2 ERROR org.apache.spark.network.shuffle.RetryingBlockFetcher Exception while beginning fetch of 1 outstanding blocks
java.io.IOException: Failed to connect to /1.1.1.1:62788
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:253)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:195)
at org.apache.spark.network.netty.NettyBlockTransferService$$anon$2.createAndStart(NettyBlockTransferService.scala:122)
In the local environment, I simply create the Spark session with a local "spark.master".
When I limit the maximum number of records to 20K, it works well.
Can you please help? Maybe I need to configure something in my local environment so that the original code works properly?
Update:
I tried changing a lot of Spark-related configuration in my local environment: memory, the number of executors, timeout-related settings, and more, but nothing helped; I just got the timeout after more time.
I then realized that the data frame I'm reading from the DB has 1 partition of 62K records; when I repartitioned it into 2 or more partitions, the process worked correctly and I managed to map and collect as needed (roughly as sketched below).
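Roughly what the working version looks like (a sketch only; shown with the Java Dataset API to keep the code examples in this thread in one language, and the Scala Dataset has the same repartition method; 4 is just an example partition count):

import java.util.Map;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import scala.Tuple2;

public class MappingLoader {
    static Map<String, Long> collectMapping(Dataset<Row> dataframe) {
        // Same select/map/collectAsMap as before, but repartitioned first so the
        // 62K rows are no longer collected from a single huge partition.
        JavaPairRDD<String, Long> pairs = dataframe
                .repartition(4)
                .select("id", "mapping_id")
                .javaRDD()
                .mapToPair(row -> new Tuple2<>(row.getString(0), row.getLong(1)));
        return pairs.collectAsMap();
    }
}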
Any idea why this solves the issue? Is there a Spark configuration that could solve this instead of repartitioning?
Thanks!

Setting a timeout on webservice consumer built with org.apache.axis.client.Call and running on Domino

I'm maintaining an antediluvian Notes application which connects to an SAP back-end via a hand-built 'web service'.
The server is running Domino Release 7.0.4FP2 HF97.
The web service is not the more recent Web Service Consumer, but a large Java agent which uses the Apache soap.jar (org.apache.soap). Below is an example of the calling code.
private Call setupSOAPCall() {
    Call call = new Call();
    SOAPHTTPConnection conn = new SOAPHTTPConnection();
    call.setSOAPTransport(conn);
    call.setEncodingStyleURI(Constants.NS_URI_SOAP_ENC);
There has been a change in the SAP system, and the call now takes 8 minutes to complete (verified by the SAP team).
I'm getting an error message as follows:
[SOAPException: faultCode=SOAP-ENV:Client; msg=For input string: "906 "; targetException=java.lang.NumberFormatException: For input string: "906 "]
I found a blog article describing the error message quite closely:
https://thejavablog.wordpress.com/category/jmeter/
and I've come to the hypothesis that it is a timeout message that is returning to my Call object, and that this timeout message is being incorrectly parsed, hence the NumberFormatException.
Looking at my logs I can see that there is a time difference of 62 seconds between my call and the response.
I recommended that the server setting in the server document, tab Internet Protocols/HTTP/Timeouts/Request timeouts, be changed from 60 seconds to 600 seconds, and the HTTP task restarted with
tell http restart
I've re-run the tests and I am getting the same error, and the time difference is still slightly more than 60 seconds, which is not what I was expecting.
I read Michael Ruhnau's blog entry
http://www.mruhnau.net/2014/06/how-to-overcome-domino-webservice.html
which points to this APAR
http://www-01.ibm.com/support/docview.wss?uid=swg1LO48272
but I'm not convinced that this applies in my case, since there is no way that IBM would know that my Java agent is in fact making a SOAP call.
My current hypothesis is that I have to use either the setTimeout() method on
org.apache.axis.client.Call
https://axis.apache.org/axis/java/apiDocs/org/apache/axis/client/Call.html
or on the org.apache.soap.transport.http.SOAPHTTPConnection
https://docs.oracle.com/cd/B13789_01/appdev.101/b12024/org/apache/soap/transport/http/SOAPHTTPConnection.html
and that the timeout value is an Apache default, not something that is controlled by the Domino server.
I'd be grateful for any help.
I understand your approach, and I hope this is the correct one to solve your problem.
Add a debug statement (a console write would be fine) that displays the default timeout, then try to increase it to 10 minutes.
SOAPHTTPConnection conn = new SOAPHTTPConnection();
System.out.println("time out is :" + conn.getTimeout());
conn.setTimeout(600000);//10 min in ms
System.out.println("after setting it, time out is :" + conn.getTimeout());
call.setSOAPTransport(conn);
Now keep in mind that Domino also has a Max LotusScript/Java execution time; check this value and (at least as a test) change it: http://www.ibm.com/support/knowledgecenter/SSKTMJ_9.0.1/admin/othr_servertasksagentmanagertab_r.html (it's the version 9 help, but this part should be identical).
I've since discovered that it wasn't my code generating the error; the default timeout for the Apache SOAP SOAPHTTPConnection is 0, i.e. no timeout.

I am getting 'Local Search phase started with an uninitialized Solution' when I run on a larger dataset

I am developing a solver using OptaPlanner 6.1.0, similar to the Vehicle Routing Problem. When I run my solver on 700 installers and 200 bookings, it successfully solves the planning problem. But when I run it against a larger dataset (700 installers and 1220 bookings), I get
Caused by: java.lang.IllegalStateException: Local Search phase started with an uninitialized Solution. First initialize the Solution. For example, run a Construction Heuristic phase first.
but right before the exception, the log shows:
16:10:40,378 INFO [DefaultConstructionHeuristicPhase] [http-listener-1(4)] Construction Heuristic phase (0) ended: step total (194), time spent (30693), best score (-1hard/-688803soft).
I am using <constructionHeuristicType>FIRST_FIT_DECREASING</constructionHeuristicType>
in my config.
Am I using it wrong?
Maybe the value range for a planning variable is empty. Especially with a value range provider from an entity, this is more likely. Feel free to file a JIRA requesting that the error message be improved in such a case.
Diagnostic to-do: comment out the local search phase, run the solver (so it only does the construction heuristic), then iterate through the planning entities and print out the value of each planning variable. Check if there are any nulls in there; a sketch follows below.
The fact that you have 194 steps instead of 200 steps in your CH indicates this. (If those other 6 planning entities are immovable, this won't trigger this exception (more info), so that's not the problem.)
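A minimal sketch of that diagnostic. The Solver calls are the standard OptaPlanner 6.x API; MySolution, Booking, getBookingList() and getInstaller() are placeholders for your own solution class, planning entity and planning variable, and the solver config is assumed to have the local search phase commented out.

import org.optaplanner.core.api.solver.Solver;

public class ConstructionHeuristicCheck {

    static void checkAfterConstructionHeuristic(Solver solver, MySolution planningProblem) {
        // Config contains only the construction heuristic (local search commented out).
        solver.solve(planningProblem);
        MySolution best = (MySolution) solver.getBestSolution();

        int uninitialized = 0;
        for (Booking booking : best.getBookingList()) {
            if (booking.getInstaller() == null) { // planning variable never received a value
                System.out.println("Uninitialized booking: " + booking);
                uninitialized++;
            }
        }
        System.out.println(uninitialized + " of " + best.getBookingList().size()
                + " bookings were left uninitialized by the construction heuristic");
    }
}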

What is going wrong with my etl process?

I'm using GoodData's CloudConnect (based on CloverETL) to read a massive JSON file and write certain elements to a .csv.
Unfortunately, I'm seeing the error pasted below in the console log. Am I running out of memory as a result of the error, or is insufficient memory itself the actual error?
ERROR [WatchDog_0] - Component [JSONReader:JSONREADER1] finished with status ERROR.
Java heap space
ERROR [WatchDog_0] - Error details:
org.jetel.exception.JetelRuntimeException: Component [JSONReader:JSONREADER1] finished with status ERROR.
at org.jetel.graph.Node.createNodeException(Node.java:543)
at org.jetel.graph.Node.run(Node.java:522)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.Exception: java.lang.OutOfMemoryError: Java heap space
at org.jetel.component.TreeReader$StreamConvertingXPathProcessor.checkThrownException(TreeReader.java:766)
at org.jetel.component.TreeReader$StreamConvertingXPathProcessor.manageThread(TreeReader.java:757)
at org.jetel.component.TreeReader$StreamConvertingXPathProcessor.processInput(TreeReader.java:732)
at org.jetel.component.TreeReader.execute(TreeReader.java:412)
at org.jetel.graph.Node.run(Node.java:493)
... 1 more
Caused by: java.lang.OutOfMemoryError: Java heap space
at net.sf.saxon.tinytree.TinyTree.condense(TinyTree.java:379)
at net.sf.saxon.tinytree.TinyBuilder.close(TinyBuilder.java:177)
at net.sf.saxon.event.ReceivingContentHandler.endDocument(ReceivingContentHandler.java:219)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endDocument(AbstractSAXParser.java:745)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:515)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:649)
at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:404)
at net.sf.saxon.event.Sender.send(Sender.java:193)
at net.sf.saxon.event.Sender.send(Sender.java:50)
at net.sf.saxon.Configuration.buildDocument(Configuration.java:2973)
at net.sf.saxon.sxpath.XPathExpression.evaluate(XPathExpression.java:154)
at org.jetel.component.tree.reader.xml.XmlXPathEvaluator.iterate(XmlXPathEvaluator.java:79)
at org.jetel.component.tree.reader.XPathPushParser.handleContext(XPathPushParser.java:104)
at org.jetel.component.tree.reader.XPathPushParser.parse(XPathPushParser.java:84)
at org.jetel.component.TreeReader$StreamConvertingXPathProcessor$PipeParser.work(TreeReader.java:827)
at org.jetel.graph.runtime.CloverWorker.run(CloverWorker.java:87)
... 1 more
This looks like the second case: the error is caused by insufficient memory for your task.
The error occurred while evaluating (one of) your JSONReader component(s).
The JSON seems to be really huge, and you should consider splitting this task into smaller ones if possible.
Did you run your transformation locally or on the GoodData server?
It is really hard to advise something specific without knowing the details.
Try using JSONExtract instead of JSONReader - it uses less memory, but also reads JSON files.
From the respective help documents:
JSONReader uses DOM, so the whole input is stored in memory and therefore the component can be memory-greedy.
JSONExtract uses SAX instead of DOM, so it uses less memory than JSONReader.
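To illustrate the DOM-versus-SAX difference those help documents describe, here is a rough analogy in plain Java using Jackson (this is not CloverETL's actual implementation, just an illustration of why a tree-based reader is memory-greedy and a streaming one is not):

import java.io.File;
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonMemorySketch {
    public static void main(String[] args) throws Exception {
        File huge = new File("huge.json"); // placeholder path

        // DOM-style (conceptually what JSONReader does): the whole document is
        // materialized as a tree in memory before any element can be read.
        JsonNode root = new ObjectMapper().readTree(huge);
        System.out.println("tree loaded, top-level fields: " + root.size());

        // Streaming/SAX-style (conceptually what JSONExtract does): only the
        // current token is held in memory, so huge files stay cheap to scan.
        try (JsonParser parser = new JsonFactory().createParser(huge)) {
            long fieldCount = 0;
            while (parser.nextToken() != null) {
                if (parser.getCurrentToken() == JsonToken.FIELD_NAME) {
                    fieldCount++;
                }
            }
            System.out.println("fields seen while streaming: " + fieldCount);
        }
    }
}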