Could not restore Flink keyed state backend from savepoint when upgrading from 1.10 to 1.11 - migration

We tried to migrate to Flink 1.11 recovering the job from a savepoint taken in 1.10. The job code was not changed, only updated the Flink version of the dependencies to 1.11 (in SBT, we use Scala) and re-built the jar.
All operators have uids and the job correctly recovers from that savepoint if run on a 1.10 cluster, we are getting the following exception and have no clue:
java.lang.Exception: Exception while creating StreamOperatorStateContext.
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:204)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:247)
at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:290)
at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$0(StreamTask.java:473)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47)
at org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:469)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:522)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for CoStreamFlatMap_8a6da66867c6cf8469bae55e9f834297_(1/1) from any of the 1 provided restore options.
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:317)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:144)
... 9 more
Caused by: org.apache.flink.runtime.state.BackendBuildingException: Failed when trying to restore heap backend
at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:116)
at org.apache.flink.runtime.state.filesystem.FsStateBackend.createKeyedStateBackend(FsStateBackend.java:540)
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:301)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
... 11 more
Caused by: java.lang.IllegalStateException: Missing value for the key 'org.apache.flink.runtime.checkpoint.savepoint.Savepoint'
at org.apache.flink.util.LinkedOptionalMap.unwrapOptionals(LinkedOptionalMap.java:190)
at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializerSnapshot.restoreSerializer(KryoSerializerSnapshot.java:86)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:546)
at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:505)
at org.apache.flink.api.common.typeutils.NestedSerializersSnapshotDelegate.snapshotsToRestoreSerializers(NestedSerializersSnapshotDelegate.java:225)
at org.apache.flink.api.common.typeutils.NestedSerializersSnapshotDelegate.getRestoredNestedSerializers(NestedSerializersSnapshotDelegate.java:83)
at org.apache.flink.api.common.typeutils.CompositeTypeSerializerSnapshot.restoreSerializer(CompositeTypeSerializerSnapshot.java:204)
at org.apache.flink.runtime.state.StateSerializerProvider.previousSchemaSerializer(StateSerializerProvider.java:189)
at org.apache.flink.runtime.state.StateSerializerProvider.currentSchemaSerializer(StateSerializerProvider.java:164)
at org.apache.flink.runtime.state.RegisteredKeyValueStateBackendMetaInfo.getStateSerializer(RegisteredKeyValueStateBackendMetaInfo.java:136)
at org.apache.flink.runtime.state.heap.StateTable.getStateSerializer(StateTable.java:315)
at org.apache.flink.runtime.state.heap.CopyOnWriteStateTable.createStateMap(CopyOnWriteStateTable.java:54)
at org.apache.flink.runtime.state.heap.CopyOnWriteStateTable.createStateMap(CopyOnWriteStateTable.java:36)
at org.apache.flink.runtime.state.heap.StateTable.<init>(StateTable.java:98)
at org.apache.flink.runtime.state.heap.CopyOnWriteStateTable.<init>(CopyOnWriteStateTable.java:49)
at org.apache.flink.runtime.state.heap.AsyncSnapshotStrategySynchronicityBehavior.newStateTable(AsyncSnapshotStrategySynchronicityBehavior.java:41)
at org.apache.flink.runtime.state.heap.HeapSnapshotStrategy.newStateTable(HeapSnapshotStrategy.java:243)
at org.apache.flink.runtime.state.heap.HeapRestoreOperation.createOrCheckStateForMetaInfo(HeapRestoreOperation.java:185)
at org.apache.flink.runtime.state.heap.HeapRestoreOperation.restore(HeapRestoreOperation.java:152)
at org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:114)
Can anyone help?
Thanks
UPDATE
The savepoint comes from a savepoint processed with the stateprocessor API and the state in the KeyedStateBootstrapFunction is made of:
var mapToDetector: MapState[String, Map[String, Detector]] = null
var detectorsConfigs: MapState[String,AnomalyStepConfiguration] = null
var outputTopic : ValueState[String]= null
var pipeStatus: MapState[String, String] = null
var debounceMap: MapState[String, Map[String, DebounceStats]] = null

org.apache.flink.runtime.checkpoint.savepoint.Savepoint is renamed in FLINK-16247. However, this class is used in savepoint metadata and should not exist in keyed state serializer on task side. In other words, did you use something related to checkpoint or savepoint on task side in state access?
I also try to use StateMachineExample to create savepoint in Flink-1.10.2 and it resumes successfully within Flink-1.11.1 cluster. The program also used CopyOnWriteStateTable by default which is what you see in your exception stack trace.

Related

Spring Batch Throwing an error 'Object 'BATCH_JOB_SEQ' is not a sequence object'

I am working on a spring batch project, I have used sql server as a local database, used this link to create the db script for batches but now get below given error.
23463 [main] WARN  o.s.b.c.c.a.DefaultBatchConfigurer - No transaction manager was provided, using a DataSourceTransactionManager
24014 [main] WARN  o.s.b.a.o.j.JpaBaseConfiguration$JpaWebConfiguration - spring.jpa.open-in-view is enabled by default. Therefore, database queries may be performed during view rendering. Explicitly configure spring.jpa.open-in-view to disable this warning
92338 [HikariPool-2 housekeeper] WARN  com.zaxxer.hikari.pool.HikariPool - HikariPool-2 - Thread starvation or clock leap detected (housekeeper delta=49s173ms145µs619ns).
102416 [HikariPool-1 housekeeper] WARN  com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Thread starvation or clock leap detected (housekeeper delta=1m969ms212µs331ns).
102682 [http-nio-8080-exec-1] ERROR o.a.c.c.C.[.[.[.[dispatcherServlet] - Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is org.springframework.dao.DataAccessResourceFailureException: Could not obtain sequence value; nested exception is com.microsoft.sqlserver.jdbc.SQLServerException: Object 'BATCH_JOB_SEQ' is not a sequence object.] with root cause
com.microsoft.sqlserver.jdbc.SQLServerException: Object 'BATCH_JOB_SEQ' is not a sequence object.
When I call the SQL query
SELECT name, type_desc FROM sys.objects WHERE name=N'BATCH_JOB_SEQ';
The returned result is
Name: type_desc:
BATCH_JOB_SEQ SEQUENCE_OBJECT
In Spring Batch 4, tables were used to emulate sequences for SQL Server and a SqlServerMaxValueIncrementer was used to increment IDs.
In Spring Batch v5, we changed SQL Server support to use sequences instead of emulating them with tables. Version 5 uses a SqlServerSequenceMaxValueIncrementer to increment IDs.
The link you shared points to the DDL script of the main branch, which is for Spring Batch v5. So in your case, you either need to use the DDL script for v4 (which you can find here), or upgrade your application to Spring Batch 5 (the last version is 5.0.0-RC2, the GA is planned later this November 2022).

Issue in saving the content of a dataframe to table

I have a data source (hive external tables) which refresh the data in adhoc manner. To avoid any discrepancies in the execution i'm trying to save the data as a table in my location.
Initially, i have loaded the data from data source to a dataframe
source = hqlContext.table("datasourcedb.table1") // this is working fine
Then, trying to save it the my application location -
source.write.mode('overwrite').saveAsTable("appdb.table1") //No read/write operations on appdb.table1 while doing this action
Above actions throwing exceptions:
java.io.IOException: The file being written is in an invalid state. Probably caused by an error thrown previously. Current state: BLOCK
at org.apache.parquet.hadoop.ParquetFileWriter$STATE.error(ParquetFileWriter.java:146)
at org.apache.parquet.hadoop.ParquetFileWriter$STATE.startBlock(ParquetFileWriter.java:138)
at org.apache.parquet.hadoop.ParquetFileWriter.startBlock(ParquetFileWriter.java:195)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:153)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:113)
at org.apache.parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:112)
at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.close(ParquetRelation.scala:101)
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.abortTask$1(WriterContainer.scala:294)
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:271)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
18/03/02 04:31:32 ERROR TaskSetManager: Task 9 in stage 1.0 failed 4 times; aborting job
18/03/02 04:31:32 ERROR InsertIntoHadoopFsRelation: Aborting job.
**Note: The size of the source is abot 6GB. Hence, no persist action is planned **

Datastax: Block not found error from DSEFS

Spark streaming job running in DSE using DSEFS for check-pointing directory. I see this error in debug log file. How to resolve this error?
ERROR [dsefs-netty-worker-5] 2017-12-01 05:23:02,679 DSE-FS RestServerHandler.scala:126 - [id: 0x9964e082, /<>:58874 :> 0.0.0.0/0.0.0.0:5598] Streaming data to remote end failed.
java.io.IOException: Block not found a3859f30-aa23-11e7-80b9-4b8bdaf197cd
at com.datastax.bdp.fs.server.blocks.BlockService$stateMachine$33$1.apply(BlockService.scala:706) ~[dsefs-server_2.10-5.0.19.jar:5.0.19]
at com.datastax.bdp.fs.server.blocks.BlockService$stateMachine$33$1.apply(BlockService.scala:703) ~[dsefs-server_2.10-5.0.19.jar:5.0.19]
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) [scala-library-2.10.6.jar:na]
at com.datastax.bdp.fs.exec.SameThreadExecutionContext$class.executeInSameThread(SameThreadExecutionContext.scala:24) ~[dsefs-common_2.10-5.0.19.jar:5.0.19]
at com.datastax.bdp.fs.exec.SameThreadExecutionContext$class.execute(SameThreadExecutionContext.scala:33) ~[dsefs-common_2.10-5.0.19.jar:5.0.19]
at com.datastax.bdp.fs.exec.SerialExecutionContextProvider$$anon$5$$anon$2.execute(SerialExecutionContextProvider.scala:24) ~[dsefs-common_2.10-5.0.19.jar:5.0.19]
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) [scala-library-2.10.6.jar:na]
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) ~[scala-library-2.10.6.jar:na]
at scala.concurrent.Promise$class.complete(Promise.scala:55) ~[scala-library-2.10.6.jar:na]
at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153) ~[scala-library-2.10.6.jar:na]
at com.datastax.bdp.fs.server.blocks.BlockService$stateMachine$1$1.apply(BlockService.scala:60) ~[dsefs-server_2.10-5.0.19.jar:5.0.19]
at com.datastax.bdp.fs.server.blocks.BlockService$stateMachine$1$1.apply(BlockService.scala:60) ~[dsefs-server_2.10-5.0.19.jar:5.0.19]
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) [scala-library-2.10.6.jar:na]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358) [netty-all-4.0.34.Final.jar:4.0.34.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) [netty-all-4.0.34.Final.jar:4.0.34.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112) [netty-all-4.0.34.Final.jar:4.0.34.Final]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_112]
This error means DSEFS server failed to find metadata of the data block in the dsefs.blocks Cassandra table. The ids of the file blocks are stored in the dsefs.block_offsets table and they reference blocks stored in dsefs.blocks. If a row exists in dsefs.block_offsets and points to the block id that is absent in dsefs.blocks, you get this error when reading the file.
This error should not happen under normal circumstances and it means the filesystem metadata somehow got into inconsistent state. This may be a bug in the DSEFS implementation, a result of a data loss caused by setting up dsefs keyspace with insufficient replication factor or a result of a write operation that did not finish successfully and was applied only partially.
Please make sure you set dsefs keyspace RF to at least 3 and run nodetool repair to avoid accidental data loss or unavailability of some DSEFS metadata.
If this doesn't help, please contact me directly or through DataStax technical support and provide more details, including logs from the time before the error and more context on what the job was doing when the failure occurred.

Unable to create a VIEW in Apache Ignite

I am trying to create a view on Apache Ignite using the following syntax:
emplCache.query(new SqlFieldsQuery(
"CREATE VIEW EmployeeCopy AS (SELECT * FROM Employee);")).getAll();
(Assume that table Employee is available and has data in it).
When this line gets executed, I get the following exception:
javax.cache.CacheException: class org.apache.ignite.IgniteCheckedException: null
at org.apache.ignite.internal.processors.query.GridQueryProcessor.querySqlFields(GridQueryProcessor.java:1823)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxy.query(IgniteCacheProxy.java:795)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxy.query(IgniteCacheProxy.java:765)
at com.demo.ignite.test1.EmployeeQuery2.createCopyTable(EmployeeQuery2.java:71)
at com.demo.ignite.test1.EmployeeQuery2.main(EmployeeQuery2.java:55)
Caused by: class org.apache.ignite.IgniteCheckedException: null
at org.apache.ignite.internal.processors.query.GridQueryProcessor.executeQuery(GridQueryProcessor.java:2316)
at org.apache.ignite.internal.processors.query.GridQueryProcessor.querySqlFields(GridQueryProcessor.java:1820)
... 4 more
Caused by: java.lang.NullPointerException
at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.queryDistributedSqlFields(IgniteH2Indexing.java:1343)
at org.apache.ignite.internal.processors.query.GridQueryProcessor$5.applyx(GridQueryProcessor.java:1815)
at org.apache.ignite.internal.processors.query.GridQueryProcessor$5.applyx(GridQueryProcessor.java:1813)
at org.apache.ignite.internal.util.lang.IgniteOutClosureX.apply(IgniteOutClosureX.java:36)
at org.apache.ignite.internal.processors.query.GridQueryProcessor.executeQuery(GridQueryProcessor.java:2293)
... 5 more
I see that the variable "twoStepQry" is null at line 1343 of class IgniteH2Indexing.java. I am not able to understand if I have missed something.
I am using apache-ignite-2.1.0.
Also if I create a VIEW, how does it work internally? Does it lock on those entries in the cache, or does it copy it to some other cache?
Ignite does not support CREATE VIEW for now.
I have created a ticket for this: https://issues.apache.org/jira/browse/IGNITE-5951
List of supported DDL statements can be found at https://apacheignite.readme.io/docs/distributed-ddl

HSQLDB throws Asset failed exception and file io error on db.script.new file during Checkpoint

Our application is a Java based desktop application which will download the binary data from the source, parses it and add it to HSQLDB database. When downloading from the sources individually, application works perfectly. But when doing the same from multiple sources simultaneously with each source in an individual thread, I am getting an error of
java.sql.SQLException: Assert failed: java.lang.ArrayIndexOutOfBoundsException: 23 in statement [CHECKPOINT]
at org.hsqldb.jdbc.Util.throwError(Unknown Source)
at org.hsqldb.jdbc.jdbcPreparedStatement.execute(Unknown Source)
or sometimes,
java.sql.SQLException: Assert failed: java.lang.ArrayIndexOutOfBoundsException: 1016 in statement [CHECKPOINT]
followed by
java.sql.SQLException: File input/output error: C:\ProgramData\test\data\database\db.script.new in statement [CHECKPOINT]
at org.hsqldb.jdbc.Util.throwError(Unknown Source)
at org.hsqldb.jdbc.jdbcPreparedStatement.execute(Unknown Source)
Java: 1.8;
HSQL version: 1.8.10
We are not in the position to migrate the HSQLDB to latest version because of various reasons.
HSQL Properties:
hsqldb.script_format=0
runtime.gc_interval=0
sql.enforce_strict_size=false
hsqldb.cache_size_scale=8
readonly=false
hsqldb.nio_data_file=true
hsqldb.cache_scale=14
version=1.8.0
hsqldb.default_table_type=memory
hsqldb.cache_file_scale=1
hsqldb.log_size=200
modified=yes
hsqldb.cache_version=1.7.0
hsqldb.original_version=1.8.0
hsqldb.compatible_version=1.8.0
Any help or hint will be appreciated.
This is an 7 year old version which is not ideal for multi-threaded usage.
The simple solution is to perform the database updates with a single thread. You can retrofit your multi-threaded application with a synchronized block over a singleton object around the code that performs the database update.