NServiceBus with RavenDB - Failed to fetch timeouts from the timeout storage - ravendb

We are having a similar problem to the one described here: timeout index not present on ravendb
Additional information beyond what is in that question:
Everything (the service and RavenDB) is running on the same machine
NServiceBus 4.6.6
RavenDB 2.0.3.2375
The startup log is:
2015-03-19 15:21:37,644 [1] INFO NServiceBus.Configure [(null)] - Invocation of NServiceBus.IWantToRunBeforeConfiguration completed in 0.14 s
2015-03-19 15:21:37,801 [1] INFO NServiceBus.Configure [(null)] - Invocation of NServiceBus.Config.INeedInitialization completed in 0.00 s
2015-03-19 15:21:37,832 [1] INFO NServiceBus.Licensing.LicenseManager [(null)] - Expires on 12/31/9999 00:00:00
2015-03-19 15:21:37,863 [1] INFO NServiceBus.Configure [(null)] - Invocation of NServiceBus.INeedInitialization completed in 0.06 s
2015-03-19 15:21:38,004 [1] INFO NServiceBus.Configure [(null)] - Invocation of NServiceBus.IWantToRunBeforeConfigurationIsFinalized completed in 0.13 s
2015-03-19 15:21:38,082 [1] INFO NServiceBus.Features.FeatureInitializer [(null)] - Features:
Audit [4.6.6] - Enabled
AutoSubscribe [4.6.6] - Enabled
BinarySerialization [4.6.6] - Controlled by category Serializers
BsonSerialization [4.6.6] - Controlled by category Serializers
JsonSerialization [4.6.6] - Controlled by category Serializers
XmlSerialization [4.6.6] - Controlled by category Serializers
MsmqTransport [4.6.6] - Enabled
Gateway [4.6.6] - Disabled
TimeoutManager [4.6.6] - Enabled
Sagas [4.6.6] - Disabled
SecondLevelRetries [4.6.6] - Enabled
StorageDrivenPublisher [4.6.6] - Disabled
MessageDrivenSubscriptions [4.6.6] - Enabled
2015-03-19 15:21:38,098 [1] INFO NServiceBus.Features.FeatureInitializer [(null)] - Feature categories:
- Serializers
* BinarySerialization - Disabled
* BsonSerialization - Disabled
* JsonSerialization - Disabled
* XmlSerialization - Enabled
2015-03-19 15:21:38,129 [1] INFO NServiceBus.Unicast.Config.FinalizeUnicastBusConfiguration [(null)] - Number of messages found: 2
2015-03-19 15:21:38,129 [1] INFO NServiceBus.Configure [(null)] - Invocation of NServiceBus.Config.IFinalizeConfiguration completed in 0.13 s
2015-03-19 15:21:39,223 [1] INFO NServiceBus.ConfigureRavenPersistence [(null)] - Connection to RavenDB at http://localhost:8080 verified. Detected version: Product version: 2.0.3 / 5a4b7ea, Build version: 2375
2015-03-19 15:21:39,238 [1] INFO NServiceBus.Unicast.Transport.Monitoring.ReceivePerformanceDiagnostics [(null)] - NServiceBus performance counter for # of msgs successfully processed / sec is not set up correctly, no statistics will be emitted for the FooBar.Api queue. Execute the Install-NServiceBusPerformanceCounters cmdlet to create the counter.
2015-03-19 15:21:39,285 [18] INFO NServiceBus.Unicast.Transport.Monitoring.ReceivePerformanceDiagnostics [(null)] - NServiceBus performance counter for # of msgs successfully processed / sec is not set up correctly, no statistics will be emitted for the FooBar.Api.Retries queue. Execute the Install-NServiceBusPerformanceCounters cmdlet to create the counter.
2015-03-19 15:21:39,285 [16] INFO NServiceBus.Unicast.Transport.Monitoring.ReceivePerformanceDiagnostics [(null)] - NServiceBus performance counter for # of msgs successfully processed / sec is not set up correctly, no statistics will be emitted for the FooBar.Api.TimeoutsDispatcher queue. Execute the Install-NServiceBusPerformanceCounters cmdlet to create the counter.
2015-03-19 15:21:39,285 [1] INFO NServiceBus.Unicast.Transport.Monitoring.ReceivePerformanceDiagnostics [(null)] - NServiceBus performance counter for # of msgs successfully processed / sec is not set up correctly, no statistics will be emitted for the FooBar.Api.Timeouts queue. Execute the Install-NServiceBusPerformanceCounters cmdlet to create the counter.
2015-03-19 15:21:39,285 [18] INFO NServiceBus.Satellites.SatelliteLauncher [(null)] - Started 3/3 NServiceBus.SecondLevelRetries.SecondLevelRetriesProcessor, NServiceBus.Core, Version=4.6.0.0, Culture=neutral, PublicKeyToken=9fc386479f8a226c satellite
2015-03-19 15:21:39,285 [1] INFO NServiceBus.Satellites.SatelliteLauncher [(null)] - Started 1/3 NServiceBus.Timeout.Hosting.Windows.TimeoutMessageProcessor, NServiceBus.Core, Version=4.6.0.0, Culture=neutral, PublicKeyToken=9fc386479f8a226c satellite
2015-03-19 15:21:39,285 [16] INFO NServiceBus.Satellites.SatelliteLauncher [(null)] - Started 2/3 NServiceBus.Timeout.Hosting.Windows.TimeoutDispatcherProcessor, NServiceBus.Core, Version=4.6.0.0, Culture=neutral, PublicKeyToken=9fc386479f8a226c satellite
2015-03-19 15:21:39,535 [21] WARN NServiceBus.Timeout.Hosting.Windows.TimeoutPersisterReceiver [(null)] - Failed to fetch timeouts from the timeout storage
2015-03-19 15:21:39,535 [21] INFO NServiceBus.CircuitBreakers.RepeatedFailuresOverTimeCircuitBreaker [(null)] - The circuit breaker for TimeoutStorageConnectivity is now in the armed state
2015-03-19 15:21:40,551 [16] WARN NServiceBus.Timeout.Hosting.Windows.TimeoutPersisterReceiver [(null)] - Failed to fetch timeouts from the timeout storage
Any ideas on what to change would be greatly appreciated.

It turns out that the API had NServiceBus.Host installed via NuGet. For a send-only endpoint this is not necessary. Removing the NuGet package and reconfiguring slightly fixed the issue.
Now only the NServiceBus and NServiceBus.Core NuGet packages are installed.
The config is now:
protected void Application_Start(object sender, EventArgs e)
{
    log4net.Config.XmlConfigurator.Configure();
    Configure.Serialization.Xml();
    Configure.Transactions.Disable();

    Bus = Configure.With()
        .Log4Net()
        .DefaultBuilder()
        .UseTransport<Msmq>()
        .PurgeOnStartup(false)
        .UnicastBus()
        .ImpersonateSender(false)
        .SendOnly();
}
The warning messages have now stopped, as the API is no longer trying to create a bus that receives messages.

Related

server.HiveServer2: Error starting priviledge synchonizer

Hive version 3.1.2
Hadoop components (HDFS/YARN/job history) are set up with Kerberos authentication.
Hive Kerberos config:
hive.server2.authentication=KERBEROS
hive.server2.authentication.kerberos.principal=hiveserver2/_HOST@BDP.COM
hive.server2.authentication.kerberos.keytab=/etc/kerberos/hadoop/hiveserver2.bdp-05.keytab
hive.metastore.sasl.enabled=true
hive.metastore.kerberos.keytab.file=/etc/kerberos/hadoop/metastore.bdp-05.keytab
hive.metastore.kerberos.principal=metastore/_HOST@BDP.COM
First, start the Metastore:
./bin/hive --service metastore > /dev/null &
Nothing abnormal in the log.
Then start HiveServer2:
./bin/hive --service hiveserver2 > /dev/null &
Here are the startup logs:
2020-12-30T11:28:48,746 INFO [main] server.HiveServer2: Starting HiveServer2
2020-12-30T11:28:49,168 INFO [main] security.UserGroupInformation: Login successful for user hiveserver2/bigdata-server-05#BDP.COM using keytab file /etc/kerberos/hadoop/hiveserver2.bdp-05.keytab
2020-12-30T11:28:49,171 INFO [main] cli.CLIService: SPNego httpUGI not created, spNegoPrincipal: , ketabFile:
2020-12-30T11:28:49,187 INFO [main] SessionState: Hive Session ID = 0754b9bc-f2f9-4d4c-ab95-a7359764bc49
2020-12-30T11:28:50,052 INFO [main] session.SessionState: Created HDFS directory: /tmp/hive/hiveserver2/0754b9bc-f2f9-4d4c-ab95-a7359764bc49
2020-12-30T11:28:50,066 INFO [main] session.SessionState: Created local directory: /tmp/hive/0754b9bc-f2f9-4d4c-ab95-a7359764bc49
2020-12-30T11:28:50,069 INFO [main] session.SessionState: Created HDFS directory: /tmp/hive/hiveserver2/0754b9bc-f2f9-4d4c-ab95-a7359764bc49/_tmp_space.db
2020-12-30T11:28:50,600 INFO [main] metastore.HiveMetaStoreClient: Trying to connect to metastore with URI thrift://bigdata-server-05:9083
2020-12-30T11:28:50,605 INFO [main] metastore.HiveMetaStoreClient: HMSC::open(): Could not find delegation token. Creating KERBEROS-based thrift connection.
2020-12-30T11:28:50,653 INFO [main] metastore.HiveMetaStoreClient: Opened a connection to metastore, current connections: 1
2020-12-30T11:28:50,653 INFO [main] metastore.HiveMetaStoreClient: Connected to metastore.
2020-12-30T11:28:50,654 INFO [main] metastore.RetryingMetaStoreClient: RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=hiveserver2/bigdata-server-05#BDP.COM (auth:KERBEROS) retries=1 delay=1 lifetime=0
2020-12-30T11:28:50,781 INFO [main] service.CompositeService: Operation log root directory is created: /tmp/hive/operation_logs
2020-12-30T11:28:50,783 INFO [main] service.CompositeService: HiveServer2: Background operation thread pool size: 100
2020-12-30T11:28:50,783 INFO [main] service.CompositeService: HiveServer2: Background operation thread wait queue size: 100
2020-12-30T11:28:50,783 INFO [main] service.CompositeService: HiveServer2: Background operation thread keepalive time: 10 seconds
2020-12-30T11:28:50,784 INFO [main] service.CompositeService: Connections limit are user: 0 ipaddress: 0 user-ipaddress: 0
2020-12-30T11:28:50,787 INFO [main] service.AbstractService: Service:OperationManager is inited.
2020-12-30T11:28:50,787 INFO [main] service.AbstractService: Service:SessionManager is inited.
2020-12-30T11:28:50,787 INFO [main] service.AbstractService: Service:CLIService is inited.
2020-12-30T11:28:50,787 INFO [main] service.AbstractService: Service:ThriftBinaryCLIService is inited.
2020-12-30T11:28:50,787 INFO [main] service.AbstractService: Service:HiveServer2 is inited.
2020-12-30T11:28:50,835 INFO [pool-7-thread-1] SessionState: Hive Session ID = 693b0399-aabd-42b5-a4b2-a4cebbd325d4
2020-12-30T11:28:50,838 INFO [main] results.QueryResultsCache: Initializing query results cache at /tmp/hive/_resultscache_
2020-12-30T11:28:50,844 INFO [pool-7-thread-1] session.SessionState: Created HDFS directory: /tmp/hive/hiveserver2/693b0399-aabd-42b5-a4b2-a4cebbd325d4
2020-12-30T11:28:50,844 INFO [main] results.QueryResultsCache: Query results cache: cacheDirectory /tmp/hive/_resultscache_/results-23ae949b-6894-4a17-8141-0eacf5fe5a63, maxCacheSize 2147483648, maxEntrySize 10485760, maxEntryLifetime 3600000
2020-12-30T11:28:50,846 INFO [pool-7-thread-1] session.SessionState: Created local directory: /tmp/hive/693b0399-aabd-42b5-a4b2-a4cebbd325d4
2020-12-30T11:28:50,849 INFO [pool-7-thread-1] session.SessionState: Created HDFS directory: /tmp/hive/hiveserver2/693b0399-aabd-42b5-a4b2-a4cebbd325d4/_tmp_space.db
2020-12-30T11:28:50,861 INFO [main] events.NotificationEventPoll: Initializing lastCheckedEventId to 0
2020-12-30T11:28:50,862 INFO [main] server.HiveServer2: Starting Web UI on port 10002
2020-12-30T11:28:50,885 INFO [pool-7-thread-1] metadata.HiveMaterializedViewsRegistry: Materialized views registry has been initialized
2020-12-30T11:28:50,894 INFO [main] util.log: Logging initialized #4380ms
2020-12-30T11:28:51,009 INFO [main] service.AbstractService: Service:OperationManager is started.
2020-12-30T11:28:51,009 INFO [main] service.AbstractService: Service:SessionManager is started.
2020-12-30T11:28:51,010 INFO [main] service.AbstractService: Service:CLIService is started.
2020-12-30T11:28:51,010 INFO [main] service.AbstractService: Service:ThriftBinaryCLIService is started.
2020-12-30T11:28:51,013 WARN [main] security.HadoopThriftAuthBridge: Client-facing principal not set. Using server-side setting: hiveserver2/_HOST#BDP.COM
2020-12-30T11:28:51,013 INFO [main] security.HadoopThriftAuthBridge: Logging in via CLIENT based principal
2020-12-30T11:28:51,019 INFO [main] security.UserGroupInformation: Login successful for user hiveserver2/bigdata-server-05#BDP.COM using keytab file /etc/kerberos/hadoop/hiveserver2.bdp-05.keytab
2020-12-30T11:28:51,019 INFO [main] security.HadoopThriftAuthBridge: Logging in via SERVER based principal
2020-12-30T11:28:51,023 INFO [main] security.UserGroupInformation: Login successful for user hiveserver2/bigdata-server-05#BDP.COM using keytab file /etc/kerberos/hadoop/hiveserver2.bdp-05.keytab
2020-12-30T11:28:51,030 INFO [main] delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2020-12-30T11:28:51,033 INFO [main] security.TokenStoreDelegationTokenSecretManager: New master key with key id=0
2020-12-30T11:28:51,034 INFO [Thread[Thread-8,5,main]] security.TokenStoreDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
2020-12-30T11:28:51,035 INFO [Thread[Thread-8,5,main]] delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2020-12-30T11:28:51,035 INFO [Thread[Thread-8,5,main]] security.TokenStoreDelegationTokenSecretManager: New master key with key id=1
2020-12-30T11:28:51,040 INFO [main] thrift.ThriftCLIService: Starting ThriftBinaryCLIService on port 10000 with 5...500 worker threads
2020-12-30T11:28:51,040 INFO [main] service.AbstractService: Service:HiveServer2 is started.
2020-12-30T11:28:51,041 ERROR [main] server.HiveServer2: Error starting priviledge synchonizer:
java.lang.NullPointerException: null
at org.apache.hive.service.server.HiveServer2.startPrivilegeSynchonizer(HiveServer2.java:985) ~[hive-service-3.1.2.jar:3.1.2]
at org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:726) [hive-service-3.1.2.jar:3.1.2]
at org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:1037) [hive-service-3.1.2.jar:3.1.2]
at org.apache.hive.service.server.HiveServer2.access$1600(HiveServer2.java:140) [hive-service-3.1.2.jar:3.1.2]
at org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:1305) [hive-service-3.1.2.jar:3.1.2]
at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:1149) [hive-service-3.1.2.jar:3.1.2]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_271]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_271]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_271]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_271]
at org.apache.hadoop.util.RunJar.run(RunJar.java:318) [hadoop-common-3.1.3.jar:?]
at org.apache.hadoop.util.RunJar.main(RunJar.java:232) [hadoop-common-3.1.3.jar:?]
2020-12-30T11:28:51,044 INFO [main] server.HiveServer2: Shutting down HiveServer2
In my case, hiveserver2-site.xml was created by Apache Ranger when the ranger-hive-plugin was turned on. Once I disabled the ranger-hive-plugin, hiveserver2-site.xml was edited again by Ranger.
Here are the remaining configurations:
<configuration>
  <property>
    <name>hive.security.authorization.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.security.authorization.manager</name>
    <value>org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider</value>
  </property>
  <property>
    <name>hive.security.authenticator.manager</name>
    <value>org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator</value>
  </property>
  <property>
    <name>hive.conf.restricted.list</name>
    <value>hive.security.authorization.enabled,hive.security.authorization.manager,hive.security.authenticator.manager</value>
  </property>
</configuration>
Starting HiveServer2 with this file in place hits the exception above.
Removing hiveserver2-site.xml makes it work fine.
I don't know why. Can somebody explain?
Is this still an issue? If yes, check the logs. You should see that HiveServer2 tries to connect to ZooKeeper; if no quorum is configured it will try to connect to localhost:2181, so either ZooKeeper must be running there or the ZooKeeper quorum servers must be configured.
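For reference, the ZooKeeper quorum the answer refers to is normally described in hive-site.xml; a minimal sketch (the host names below are placeholders, not values from the question):
<property>
  <name>hive.zookeeper.quorum</name>
  <!-- Placeholder hosts; point this at your own ZooKeeper ensemble -->
  <value>zk-host-1,zk-host-2,zk-host-3</value>
</property>
<property>
  <name>hive.zookeeper.client.port</name>
  <value>2181</value>
</property>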

Flink streaming file sink cannot recover from savepoint

We have a Flink streaming job which reads data from Kinesis and sinks it to S3 in Parquet format.
Sink:
val parquetEventsWriterSink: StreamingFileSink[GenericRecord] = StreamingFileSink
.forBulkFormat(new Path(s"s3a://some-bucket-name/"), ParquetAvroWriters.forGenericRecord(AvroSchema[ParquetEvent]))
.withBucketCheckInterval(1000)
.withBucketAssigner(new DateTimeBucketAssigner("yyyyMMdd"))
.build()
When I want to update the Flink app, I do the following (stop the Flink app with a savepoint and rerun it from that savepoint):
/usr/bin/flink stop ${FLINK_APP_ID} -p s3a://bucket-to-save/savepoint -d -yid ${YARN_APP_ID}
/usr/bin/flink run -m yarn-cluster -s s3a://bucket-to-save/savepoint/savepoint-588ff0-7a7febf4f80a --allowNonRestoredState /path/to/flink.jar
Flink app shutdown log:
...
2020-07-02 09:30:37,448 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 32 # 1593682237275 for job af23c08acc92229281cd28a12f8c42da.
2020-07-02 09:30:41,129 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Completed checkpoint 32 for job af23c08acc92229281cd28a12f8c42da (145557 bytes in 2408 ms).
2020-07-02 09:30:41,294 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source -> Map -> Timestamps/Watermarks -> (....) (1/1) (67dc5d50d1713166b4e06d59c044806d) switched from RUNNING to FINISHED.
2020-07-02 09:30:41,303 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Window(TumblingEventTimeWindows(30000), EventTimeTrigger, ScalaProcessWindowFunctionWrapper) (1/1) (7820bd149b8a2bf2c18a7dbef24dea2a) switched from RUNNING to FINISHED.
2020-07-02 09:30:41,303 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - KeyedProcess (1/1) (7edae003a3d86bd0b0b6ccd6978d7225) switched from RUNNING to FINISHED.
2020-07-02 09:30:42,331 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - KeyedProcess -> (Map -> Sink: Unnamed, Map -> Map -> Sink: Unnamed) (1/1) (e5ab39b6b0841aa5f90dfddf7035a014) switched from RUNNING to FINISHED.
2020-07-02 09:30:42,331 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Window(TumblingEventTimeWindows(30000), EventTimeTrigger, ScalaProcessWindowFunctionWrapper) -> (Map -> Sink: Unnamed, Map -> Map -> Sink: Unnamed) (1/1) (54c9e5818615472a94bd4164748af8ab) switched from RUNNING to FINISHED.
2020-07-02 09:30:42,420 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Window(TumblingEventTimeWindows(120000), EventTimeTrigger, ScalaProcessWindowFunctionWrapper) -> (Map -> Sink: Unnamed, Map -> Map -> Sink: Unnamed) (1/1) (45b06521f0dc628909f36e526281120e) switched from RUNNING to FINISHED.
2020-07-02 09:30:42,430 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Window(TumblingEventTimeWindows(4000), EventTimeTrigger, ScalaProcessWindowFunctionWrapper) -> (Map -> Sink: Unnamed, Map -> Map -> Sink: Unnamed) (1/1) (4290ac65e1cfa537c6fc2eda6f82030f) switched from RUNNING to FINISHED.
2020-07-02 09:30:42,431 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Window(TumblingEventTimeWindows(30000), EventTimeTrigger, ScalaProcessWindowFunctionWrapper) -> (Map -> Sink: Unnamed, Map -> Map -> Sink: Unnamed) (1/1) (2031b6da4375863db5b5496d359e4afe) switched from RUNNING to FINISHED.
2020-07-02 09:30:42,466 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Window(SlidingEventTimeWindows(120000, 2000), EventTimeTrigger, ScalaProcessWindowFunctionWrapper) -> (Map -> Sink: Unnamed, Map -> Map -> Sink: Unnamed) (1/1) (fb335e9cc26f3973e335fb5e52abc62c) switched from RUNNING to FINISHED.
2020-07-02 09:30:42,468 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job JOB_NAME (af23c08acc92229281cd28a12f8c42da) switched from state RUNNING to FINISHED.
2020-07-02 09:30:42,468 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Stopping checkpoint coordinator for job af23c08acc92229281cd28a12f8c42da.
2020-07-02 09:30:42,468 INFO org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore - Shutting down
2020-07-02 09:30:42,468 INFO org.apache.flink.runtime.checkpoint.CompletedCheckpoint - Checkpoint with ID 32 at 's3a://bucket-savepoint/savepoint-af23c0-91247f37c4e0' not discarded.
2020-07-02 09:30:42,479 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Job af23c08acc92229281cd28a12f8c42da reached globally terminal state FINISHED.
2020-07-02 09:30:42,496 INFO org.apache.flink.runtime.jobmaster.JobMaster - Stopping the JobMaster for job JOB_NAME(af23c08acc92229281cd28a12f8c42da).
2020-07-02 09:30:42,499 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl - Suspending SlotPool.
2020-07-02 09:30:42,500 INFO org.apache.flink.runtime.jobmaster.JobMaster - Close ResourceManager connection b1b71407ad4d004faf36b9ca5ff59897: JobManager is shutting down..
2020-07-02 09:30:42,500 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl - Stopping SlotPool.
2020-07-02 09:30:42,500 INFO org.apache.flink.yarn.YarnResourceManager - Disconnect job manager 00000000000000000000000000000000#akka.tcp://flink#ip-10-60-37-55.ec2.internal:39455/user/jobmanager_0 for job af23c08acc92229281cd28a12f8c42da from the resource manager.
2020-07-02 09:30:42,502 INFO org.apache.flink.runtime.jobmaster.JobManagerRunner - JobManagerRunner already shutdown.
Resume job from savepoint:
...
2020-07-02 09:32:59,878 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Starting job c9b11c5904dd3182c68a967b1af8623c from savepoint s3a://bucket-savepoint/sp-s/savepoint-af23c0-91247f37c4e0 (allowing non restored state)
2020-07-02 09:33:00,225 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Reset the checkpoint ID of job c9b11c5904dd3182c68a967b1af8623c to 33.
2020-07-02 09:33:00,225 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Restoring job c9b11c5904dd3182c68a967b1af8623c from latest valid checkpoint: Checkpoint 32 # 0 for c9b11c5904dd3182c68a967b1af8623c.
2020-07-02 09:33:00,232 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - No master state to restore
2020-07-02 09:33:00,234 INFO org.apache.flink.runtime.jobmaster.JobManagerRunner - JobManager runner for job JOB_NAME (c9b11c5904dd3182c68a967b1af8623c) was granted leadership with session id 00000000-0000-0000-0000-000000000000 at akka.tcp://flink#ip-10-60-37-55.ec2.internal:45829/user/jobmanager_0.
2020-07-02 09:33:00,236 INFO org.apache.flink.runtime.jobmaster.JobMaster - Starting execution of job JOB_NAME (c9b11c5904dd3182c68a967b1af8623c) under job master id 00000000000000000000000000000000.
2020-07-02 09:33:00,237 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job JOB_NAME (c9b11c5904dd3182c68a967b1af8623c) switched from state CREATED to RUNNING.
2020-07-02 09:33:00,245 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source -> Map -> Timestamps/Watermarks -> (Map -> files to s3 -> Sink: Unnamed, Filter, Filter, Filter, Filter, Filter) (1/1) (d9b4be9984c2ab3f468d7f286f8106cf) switched from CREATED to SCHEDULED.
...
2020-07-02 09:33:15,286 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Window(TumblingEventTimeWindows(4000), EventTimeTrigger, ScalaProcessWindowFunctionWrapper) -> (Map -> Sink: Unnamed, Map -> Map -> Sink: Unnamed) (1/1) (79f5b69107d200d78bb8a32dfa32e126) switched from RUNNING to FAILED.
java.io.IOException: Inconsistent result for object 20200702/part-0-29: conflicting lengths. Recovered committer for upload 4jAOkQuDaQiMt53wlO3z8bzO6KtwkaDXXZIA6sTUE1f5pmvZs8EDPlRvu6VYI4y34sK5Zwr4p8fa8EglWvVvC_8z5sn_Dk6L6b5YJnWNIdThRHQ4qfMUK3dj1Eoi7cYweq0J42PcRrK7VOJLJmo8hg-- indicates 12810 bytes, present object is 5825 bytes
at org.apache.flink.fs.s3.common.writer.S3Committer.commitAfterRecovery(S3Committer.java:98)
at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.commitRecoveredPendingFiles(Bucket.java:156)
at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.<init>(Bucket.java:128)
at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.restore(Bucket.java:399)
at org.apache.flink.streaming.api.functions.sink.filesystem.DefaultBucketFactoryImpl.restoreBucket(DefaultBucketFactoryImpl.java:64)
at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.handleRestoredBucketState(Buckets.java:177)
at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.initializeActiveBuckets(Buckets.java:165)
at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.initializeState(Buckets.java:149)
at org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink.initializeState(StreamingFileSink.java:334)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:281)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:881)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:395)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: Completing multipart commit on 20200702/part-0-29: org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: The specified upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchUpload; Request ID: _hidden_; S3 Extended Request ID: nBiv1K/zDVx/FuA+/UJVY/1xODua8n8=), S3 Extended Request ID: nBiv1K/zDVx/FuA+/UJVY/1xODua8n8=:NoSuchUpload
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:225)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:111)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:260)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:317)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:256)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.WriteOperationHelper.finalizeMultipartUpload(WriteOperationHelper.java:222)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.WriteOperationHelper.completeMPUwithRetries(WriteOperationHelper.java:267)
at org.apache.flink.fs.s3hadoop.HadoopS3AccessHelper.commitMultiPartUpload(HadoopS3AccessHelper.java:84)
at org.apache.flink.fs.s3.common.writer.S3Committer.commitAfterRecovery(S3Committer.java:85)
... 17 more
Caused by: org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: The specified upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchUpload; Request ID:; S3 Extended Request ID: nBiv1K/zDVx/FuA+/UJVY/1xODua8n8=), S3 Extended Request ID: nBiv1K/zDVx/FuA+bEPoiI1idEj3UQQzSYOz3V6uesSrV3fXtcLkkYGuexCL/UJVY/1xODua8n8=
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1639)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1304)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.completeMultipartUpload(AmazonS3Client.java:3065)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.WriteOperationHelper.lambda$finalizeMultipartUpload$1(WriteOperationHelper.java:229)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
... 24 more
2020-07-02 09:33:15,287 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job JOB_NAME (c9b11c5904dd3182c68a967b1af8623c) switched from state RUNNING to FAILING.
java.io.IOException: Inconsistent result for object 20200702/part-0-29: conflicting lengths. Recovered committer for upload 4jAOkQuDaQiMt53wlO3z8bzO6KtwkaDXXZIA6sTUE1f5pmvZs8EDPlRvu6VYI4y34sK5Zwr4p8fa8EglWvVvC_8z5sn_Dk6L6b5YJnWNIdThRHQ4qfMUK3dj1Eoi7cYweq0J42PcRrK7VOJLJmo8hg-- indicates 12810 bytes, present object is 5825 bytes
at org.apache.flink.fs.s3.common.writer.S3Committer.commitAfterRecovery(S3Committer.java:98)
at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.commitRecoveredPendingFiles(Bucket.java:156)
at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.<init>(Bucket.java:128)
at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.restore(Bucket.java:399)
at org.apache.flink.streaming.api.functions.sink.filesystem.DefaultBucketFactoryImpl.restoreBucket(DefaultBucketFactoryImpl.java:64)
at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.handleRestoredBucketState(Buckets.java:177)
at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.initializeActiveBuckets(Buckets.java:165)
at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.initializeState(Buckets.java:149)
at org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink.initializeState(StreamingFileSink.java:334)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:281)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:881)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:395)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: Completing multipart commit on 20200702/part-0-29: org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: The specified upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchUpload; Request ID: 26483B8A3458BB00; S3 Extended Request ID: nBiv1K/zDVx/FuA+/UJVY/1xODua8n8=), S3 Extended Request ID: nBiv1K/zDVx/FuA+/UJVY/1xODua8n8=:NoSuchUpload
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:225)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:111)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.lambda$retry$3(Invoker.java:260)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:317)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:256)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.WriteOperationHelper.finalizeMultipartUpload(WriteOperationHelper.java:222)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.WriteOperationHelper.completeMPUwithRetries(WriteOperationHelper.java:267)
at org.apache.flink.fs.s3hadoop.HadoopS3AccessHelper.commitMultiPartUpload(HadoopS3AccessHelper.java:84)
at org.apache.flink.fs.s3.common.writer.S3Committer.commitAfterRecovery(S3Committer.java:85)
... 17 more
Caused by: org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: The specified upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchUpload; Request ID: 26483B8A3458BB00; S3 Extended Request ID: nBiv1K/zDVx/FuA+/UJVY/1xODua8n8=), S3 Extended Request ID: nBiv1K/zDVx/FuA+bEPoiI1idEj3UQQzSYOz3V6uesSrV3fXtcLkkYGuexCL/UJVY/1xODua8n8=
at org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1639)
at
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
... 24 more
2020-07-02 09:33:15,316 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom Source -> Map -> Timestamps/Watermarks -> (Map -> packets to s3 -> Sink: Unnamed, Filter, Filter, Filter, Filter, Filter) (1/1) (d9b4be9984c2ab3f468d7f286f8106cf) switched from RUNNING to CANCELING.
Flink 1.9.1 (EMR emr-5.29.0)
We use these dependencies:
"org.apache.flink" %% "flink-parquet" % "1.10.0",
"org.apache.parquet" % "parquet-avro" % "1.10.0",
If you restart with the same Flink version and the same job (unmodified), does it work or does it throw the same exception? Also, could it be that you have configured a pruning policy for your pending multipart uploads that aborts them before they are committed?
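On the second point, a quick way to check is to look at the sink bucket's lifecycle configuration and at any still-pending uploads (a sketch; the bucket name is taken from the sink path in the question):
# An AbortIncompleteMultipartUpload rule with a short DaysAfterInitiation can
# cancel uploads that a savepointed job still intends to commit on restore.
aws s3api get-bucket-lifecycle-configuration --bucket some-bucket-name

# Pending (uncommitted) multipart uploads can also be listed directly:
aws s3api list-multipart-uploads --bucket some-bucket-name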

Yarn report flink job as FINISHED and SUCCEED when flink job failure

I am running a Flink job on YARN. We use "flink run" on the command line to submit our job to YARN. One day the job hit an exception, and since we had not enabled a Flink restart strategy it simply failed. Eventually, however, we found that the job status was "SUCCEEDED" in the YARN application list, where we expected "FAILED".
Flink CLI log:
06/12/2018 03:13:37 FlatMap (getTagStorageMapper.flatMap)(23/32) switched to CANCELED
06/12/2018 03:13:37 GroupReduce (ResultReducer.reduceGroup)(31/32) switched to CANCELED
06/12/2018 03:13:37 FlatMap (SubClassEDFJoinMapper.flatMap)(29/32) switched to CANCELED
06/12/2018 03:13:37 CHAIN DataSource (SubClassInventory.AvroInputFormat.createInput) -> FlatMap (SubClassInventoryMapper.flatMap)(27/32) switched to CANCELED
06/12/2018 03:13:37 GroupReduce (OutputReducer.reduceGroup)(28/32) switched to CANCELED
06/12/2018 03:13:37 CHAIN DataSource (SubClassInventory.AvroInputFormat.createInput) -> FlatMap (BIMBQMInstrumentMapper.flatMap)(27/32) switched to CANCELED
06/12/2018 03:13:37 GroupReduce (BIMBQMGovCorpReduce.reduceGroup)(30/32) switched to CANCELED
06/12/2018 03:13:37 FlatMap (BIMBQMEVMJoinMapper.flatMap)(32/32) switched to CANCELED
06/12/2018 03:13:37 Job execution switched to status FAILED.
No JobSubmissionResult returned, please make sure you called ExecutionEnvironment.execute()
2018-06-12 03:13:37,625 INFO org.apache.flink.yarn.YarnClusterClient - Sending shutdown request to the Application Master
2018-06-12 03:13:37,625 INFO org.apache.flink.yarn.YarnClusterClient - Start application client.
2018-06-12 03:13:37,630 INFO org.apache.flink.yarn.ApplicationClient - Notification about new leader address akka.tcp://flink#ip-10-97-46-149.tr-fr-nonprod.aws-int.thomsonreuters.com:45663/user/jobmanager with session ID 00000000-0000-0000-0000-000000000000.
2018-06-12 03:13:37,632 INFO org.apache.flink.yarn.ApplicationClient - Sending StopCluster request to JobManager.
2018-06-12 03:13:37,633 INFO org.apache.flink.yarn.ApplicationClient - Received address of new leader akka.tcp://flink#ip-10-97-46-149.tr-fr-nonprod.aws-int.thomsonreuters.com:45663/user/jobmanager with session ID 00000000-0000-0000-0000-000000000000.
2018-06-12 03:13:37,634 INFO org.apache.flink.yarn.ApplicationClient - Disconnect from JobManager null.
2018-06-12 03:13:37,635 INFO org.apache.flink.yarn.ApplicationClient - Trying to register at JobManager akka.tcp://flink#ip-10-97-46-149.tr-fr-nonprod.aws-int.thomsonreuters.com:45663/user/jobmanager.
2018-06-12 03:13:37,688 INFO org.apache.flink.yarn.ApplicationClient - Successfully registered at the ResourceManager using JobManager Actor[akka.tcp://flink#ip-10-97-46-149.tr-fr-nonprod.aws-int.thomsonreuters.com:45663/user/jobmanager#182802345]
2018-06-12 03:13:38,648 INFO org.apache.flink.yarn.ApplicationClient - Sending StopCluster request to JobManager.
2018-06-12 03:13:39,480 INFO org.apache.flink.yarn.YarnClusterClient - Application application_1528772982594_0001 finished with state FINISHED and final state SUCCEEDED at 1528773218662
2018-06-12 03:13:39,480 INFO org.apache.flink.yarn.YarnClusterClient - YARN Client is shutting down
2018-06-12 03:13:39,582 INFO org.apache.flink.yarn.ApplicationClient - Stopped Application client.
2018-06-12 03:13:39,583 INFO org.apache.flink.yarn.ApplicationClient - Disconnect from JobManager Actor[akka.tcp://flink#ip-10-97-46-149.tr-fr-nonprod.aws-int.thomsonreuters.com:45663/user/jobmanager#182802345].
Flink job manager Log:
FlatMap (BIMBQMEVMJoinMapper.flatMap) (32/32) (67a002e07fe799c1624a471340c8cf9d) switched from CANCELING to CANCELED.
Try to restart or fail the job Flink Java Job at Tue Jun 12 03:13:17 UTC 2018 (1086cedb3617feeee8aace29a7fc6bd0) if no longer possible.
Requesting new TaskManager container with 8192 megabytes memory. Pending requests: 1
Job Flink Java Job at Tue Jun 12 03:13:17 UTC 2018 (1086cedb3617feeee8aace29a7fc6bd0) switched from state FAILING to FAILED.
Could not restart the job Flink Java Job at Tue Jun 12 03:13:17 UTC 2018 (1086cedb3617feeee8aace29a7fc6bd0) because the restart strategy prevented it.
Unregistered task manager ip-10-97-44-186/10.97.44.186. Number of registered task managers 31. Number of available slots 31
Stopping JobManager with final application status SUCCEEDED and diagnostics: Flink YARN Client requested shutdown
Shutting down cluster with status SUCCEEDED : Flink YARN Client requested shutdown
Unregistering application from the YARN Resource Manager
Waiting for application to be successfully unregistered.
Can anybody help me understand why YARN says my Flink job "SUCCEEDED"?
The application status reported by YARN does not reflect the status of the executed job but the status of the Flink cluster, since the cluster is what YARN sees as the application. Thus, the final status of the YARN application only depends on whether the Flink cluster shut down properly. Put differently, if a job fails, that does not necessarily mean the Flink cluster failed. These are two different things.
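As a rough sketch of how to see the job's own final state rather than the cluster's (the application ID below is the one from the CLI log; exact flags and REST paths vary a little between Flink versions):
# List all jobs, including finished and failed ones, on the YARN session
# that hosted the job:
flink list -a -yid application_1528772982594_0001

# Or, while the cluster is still up, ask the JobManager REST API
# (older releases expose this as /joboverview instead of /jobs/overview):
curl http://<jobmanager-host>:8081/jobs/overview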

Not able to enable the timeout manager in a distributor with the Production profile on a Master node

After deploying my service as a Master with the production profile, I don't see the timeout manager get enabled during initialization. I have tried everything with no success.
The way I install the service is:
NServiceBus.Host.exe /install /serviceName:BusinessServices
/displayName:BusinessServices /description:BusinessServices
/userName:machinename\Administrator /password:pass
NServiceBus.Master nservicebus.production
The endpoint config is:
public class EndpointConfig : IConfigureThisEndpoint, AsA_Server, AsA_Publisher
{
}
And the configuration is:
<MasterNodeConfig Node="machinename"/>
<TransportConfig MaximumConcurrencyLevel="50" />
RavenDB is installed on that machine and works well for sagas; however, during initialization the log says:
2014-05-19 17:02:55,428 [1] INFO NServiceBus.Configure [(null)] <(null)> - Invocation of NServiceBus.IWantToRunBeforeConfiguration completed in 0.14 s
2014-05-19 17:02:55,785 [1] INFO NServiceBus.Configure [(null)] <(null)> - Invocation of NServiceBus.Config.INeedInitialization completed in 0.00 s
2014-05-19 17:02:56,296 [1] INFO NServiceBus.Licensing.LicenseManager [(null)] <(null)> - Expires on 07/03/2014 00:00:00
2014-05-19 17:02:56,576 [1] INFO NServiceBus.Configure [(null)] <(null)> - Invocation of NServiceBus.INeedInitialization completed in 0.79 s
2014-05-19 17:02:56,724 [1] INFO NServiceBus.Distributor.T5.BusinessServices.High [(null)] <(null)> - Endpoint configured to host the distributor, applicative input queue re routed to T5.BusinessServices.High.worker#WIN-74CD8F6BJ66
2014-05-19 17:02:57,118 [1] INFO NServiceBus.Configure [(null)] <(null)> - Invocation of NServiceBus.IWantToRunBeforeConfigurationIsFinalized completed in 0.54 s
2014-05-19 17:02:57,356 [1] INFO NServiceBus.Features.Sagas [(null)] <(null)> - Sagas found in scanned types, saga persister enabled
2014-05-19 17:02:57,371 [1] INFO NServiceBus.Features.FeatureInitializer [(null)] <(null)> - Features:
Audit [4.6.1] - Enabled
AutoSubscribe [4.6.1] - Enabled
BinarySerialization [4.6.1] - Controlled by category Serializers
BsonSerialization [4.6.1] - Controlled by category Serializers
JsonSerialization [4.6.1] - Controlled by category Serializers
XmlSerialization [4.6.1] - Controlled by category Serializers
MsmqTransport [4.6.1] - Enabled
Gateway [4.6.1] - Enabled
TimeoutManager [4.6.1] - Disabled
Sagas [4.6.1] - Enabled
SecondLevelRetries [4.6.1] - Enabled
StorageDrivenPublisher [4.6.1] - Enabled
MessageDrivenSubscriptions [4.6.1] - Enabled
Heartbeats [1.0.0] - Enabled
SagaAudit [1.0.0] - Enabled
Do I have to do anything more?
Thanks in advance
As Jon said in the comment ... if you are configuring a server as a distributor or master, you don't want the MasterNodeConfig section. Having that section disables the timeout manager, because the endpoint then expects to defer that logic to the configured master node.
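In other words (a sketch based on the configuration shown in the question), drop the MasterNodeConfig element and keep the rest:
<!-- Remove this when the endpoint itself runs the Master profile; with it
     present, timeouts are deferred to the configured master node and the
     local TimeoutManager stays disabled. -->
<!-- <MasterNodeConfig Node="machinename"/> -->
<TransportConfig MaximumConcurrencyLevel="50" />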

How can I trace the failure of TSaslTransport (Hive related)

I've been debugging a JDBC connection error in Hive, similar to what was asked here:
Hive JDBC getConnection does not return.
By turning on log4j properly, I finally got down to seeing the output below before getConnection() hangs. What is Thrift waiting for? If this is related to using the wrong Thrift APIs, how can I determine versioning differences between client and server?
I have tried copying all libraries from my Hive server onto my client app to test whether it is some kind of minor Thrift class versioning error, but that didn't solve the problem; my JDBC connection still hangs.
0 [main] DEBUG org.apache.thrift.transport.TSaslTransport - opening transport org.apache.thrift.transport.TSaslClientTransport#219ba640
0 [main] DEBUG org.apache.thrift.transport.TSaslTransport - opening transport org.apache.thrift.transport.TSaslClientTransport#219ba640
3 [main] DEBUG org.apache.thrift.transport.TSaslClientTransport - Sending mechanism name PLAIN and initial response of length 14
3 [main] DEBUG org.apache.thrift.transport.TSaslClientTransport - Sending mechanism name PLAIN and initial response of length 14
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport - CLIENT: Writing message with status START and payload length 5
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport - CLIENT: Writing message with status START and payload length 5
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport - CLIENT: Writing message with status COMPLETE and payload length 14
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport - CLIENT: Writing message with status COMPLETE and payload length 14
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport - CLIENT: Start message handled
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport - CLIENT: Start message handled
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport - CLIENT: Main negotiation loop complete
5 [main] DEBUG org.apache.thrift.transport.TSaslTransport - CLIENT: Main negotiation loop complete
6 [main] DEBUG org.apache.thrift.transport.TSaslTransport - CLIENT: SASL Client receiving last message
6 [main] DEBUG org.apache.thrift.transport.TSaslTransport - CLIENT: SASL Client receiving last message
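For anyone trying to get the same level of tracing, the DEBUG lines above come from the Thrift transport loggers; a minimal client-side log4j.properties sketch (the appender and pattern are assumptions chosen to match the output format shown):
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
# %r prints milliseconds since startup, matching the leading 0/3/5/6 above
log4j.appender.console.layout.ConversionPattern=%r [%t] %p %c - %m%n
# Enable the Thrift SASL transport tracing shown in the question
log4j.logger.org.apache.thrift.transport=DEBUG
The doubled lines in the output are most likely the same logger reaching two appenders (log4j additivity), not two separate handshake attempts.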