Having a Redis Timeout/Replica issue

We are having a few issues between our program and Redis that occur at random, and we do not understand what is going on.
The code is open source, so anyone who wants to help can look at these two repos:
https://github.com/pythonology/BeatTogether.Core
https://github.com/pythonology/BeatTogether.MasterServer/tree/feature/dedicated-servers (using the branch at the moment for dev purposes)
2020-12-10 03:22:17.829 +00:00 [ERR] An error occurred while handling message (MessageType='BroadcastServerStatusRequest').
StackExchange.Redis.RedisServerException: ERR Error running script (call to f_dce2ce9e149bc775618a47cd07eca24d80a33664): #user_script:4: #user_script: 4: -READONLY You can't write against a read only replica.
at BeatTogether.MasterServer.Data.Implementations.Repositories.ServerRepository.AddServer(Server server) in /app/BeatTogether.MasterServer.Data/Implementations/Repositories/ServerRepository.cs:line 80
at BeatTogether.MasterServer.Kernel.Implementations.UserService.BroadcastServerStatus(MasterServerSession session, BroadcastServerStatusRequest request) in /app/BeatTogether.MasterServer.Kernel/Implementations/UserService.cs:line 156
at BeatTogether.Core.Messaging.Implementations.BaseMessageHandler.<>c__DisplayClass6_0`2.<<Register>b__0>d.MoveNext()
--- End of stack trace from previous location --
and
2020-12-10 06:20:23.198 +00:00 [ERR] An error occurred while handling message (MessageType='BroadcastServerStatusRequest').
StackExchange.Redis.RedisConnectionException: No connection is active/available to service this operation: EVAL, mc: 1/1/0, mgr: 10 of 10 available, clientName: master-server, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=3,Free=32764,Min=1024,Max=32767), v: 2.2.4.27433 at StackExchange.Redis.ConnectionMultiplexer.ThrowFailed[T](TaskCompletionSource`1 source, Exception unthrownException) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 2760
--- End of stack trace from previous location ---
at BeatTogether.MasterServer.Data.Implementations.Repositories.ServerRepository.AddServer(Server server) in /app/BeatTogether.MasterServer.Data/Implementations/Repositories/ServerRepository.cs:line 80
at BeatTogether.MasterServer.Kernel.Implementations.UserService.BroadcastServerStatus(MasterServerSession session, BroadcastServerStatusRequest request) in /app/BeatTogether.MasterServer.Kernel/Implementations/UserService.cs:line 156
at BeatTogether.Core.Messaging.Implementations.BaseMessageHandler.<>c__DisplayClass6_0`2.<<Register>b__0>d.MoveNext()
--- End of stack trace from previous location ---
at BeatTogether.Core.Messaging.Implementations.BaseMessageSource.<>c__DisplayClass15_0.<<Signal>b__0>d.MoveNext()
Our Redis instance has no replicas: there is only a single master instance, running on the same machine.
Any help is greatly appreciated. Thanks.

How to find batch element in Websphere commerce error

When I run buildindex in my WebSphere application, I get this error in the buildindex log:
[2021/05/10 15:41:57:590 GMT] I Data import pre-processing completed in 0.389 seconds for table TI_CAT_EXTENDED_41060.
[2021/05/10 15:41:57:591 GMT] I /opt/IBM/WebSphere/CommerceServer80/instances/auth/search/pre-processConfig/MC_41060/DB2/wc-dataimport-preprocess-catentry-metainf.xml
[2021/05/10 15:41:57:591 GMT] I
Table name: TI_X_CATENT_META_INF_410600
Fetch size: 500
Batch size: 500
[2021/05/10 15:41:58:048 GMT] I Error for batch element #415: DB2 SQL Error: SQLCODE=-302, SQLSTATE=22001, SQLERRMC=null, DRIVER=4.19.77
[2021/05/10 15:41:58:048 GMT] I SQL: SELECT CATENTRY_ID, TITLE, TITLE_KEYWORDS, SHORT_DESC, SHORT_DESC_KEYWORDS, LONG_DESC, LONG_DESC_KEYWORDS, LOCALE FROM X_CATENT_META_INF WHERE STORE_ID = 41006
[2021/05/10 15:41:58:087 GMT] I
The program exiting with exit code: 1.
Data import pre-processing was unsuccessful. An unrecoverable error has occurred.
[2021/05/10 15:41:58:091 GMT] E com.ibm.commerce.foundation.dataimport.preprocess.DataImportPreProcessorMain:handleExecutionException Exception message: CWFDIH0002: An SQL exception was caught. The following error occurred: [jcc][t4][102][10040][4.19.77] Batch failure. The batch was submitted, but at least one exception occurred on an individual member of the batch.
Use getNextException() to retrieve the exceptions for specific batched elements. ERRORCODE=-4229, SQLSTATE=null., stack trace: com.ibm.commerce.foundation.dataimport.exception.DataImportSystemException: CWFDIH0002: An SQL exception was caught. The following error occurred: [jcc][t4][102][10040][4.19.77] Batch failure. The batch was submitted, but at least one exception occurred on an individual member of the batch.
Use getNextException() to retrieve the exceptions for specific batched elements. ERRORCODE=-4229, SQLSTATE=null.
at com.ibm.commerce.foundation.dataimport.preprocess.DataImportPreProcessorMain.processDataConfig(DataImportPreProcessorMain.java:1515)
at com.ibm.commerce.foundation.dataimport.preprocess.DataImportPreProcessorMain.execute(DataImportPreProcessorMain.java:1331)
at com.ibm.commerce.foundation.dataimport.preprocess.DataImportPreProcessorMain.main(DataImportPreProcessorMain.java:534)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:56)
at java.lang.reflect.Method.invoke(Method.java:620)
at com.ibm.ws.bootstrap.WSLauncher.main(WSLauncher.java:280)
Caused by: com.ibm.db2.jcc.am.BatchUpdateException: [jcc][t4][102][10040][4.19.77] Batch failure. The batch was submitted, but at least one exception occurred on an individual member of the batch.
Use getNextException() to retrieve the exceptions for specific batched elements. ERRORCODE=-4229, SQLSTATE=null
at com.ibm.db2.jcc.am.b4.a(b4.java:475)
at com.ibm.db2.jcc.am.Agent.endBatchedReadChain(Agent.java:414)
at com.ibm.db2.jcc.am.ki.a(ki.java:5342)
at com.ibm.db2.jcc.am.ki.c(ki.java:4929)
at com.ibm.db2.jcc.am.ki.executeBatch(ki.java:3045)
at com.ibm.commerce.foundation.dataimport.preprocess.AbstractDataPreProcessor.populateTable(AbstractDataPreProcessor.java:373)
at com.ibm.commerce.foundation.dataimport.preprocess.StaticAttributeDataPreProcessor.process(StaticAttributeDataPreProcessor.java:461)
at com.ibm.commerce.foundation.dataimport.preprocess.DataImportPreProcessorMain.processDataConfig(DataImportPreProcessorMain.java:1482)
... 7 more
The exception seems clear, but I can't identify which element is #415 in the batch. The log doesn't help either, because it doesn't point to another, more detailed log. Do you have any suggestions for finding it?
Thanks to a comment from user @mao, I followed a link that gave this advice:
The failing table first must be identified. Enable more detailed tracing for di-preprocess:
Navigate to:
WC_installdir/instances/instance_name/xml/config/dataimport
and open the logging.properties file. Find all instances of INFO and
change it to FINEST. Optionally increase the size of the log file and
the number of historical log files while editing this file.
Following this suggestion, I re-ran the buildindex process and found that Solr was wrongly grouping fields from the original table, which produced a field too long for the destination and caused the error.
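For code that calls executeBatch() directly, the getNextException() hint in the log can be followed with plain JDBC. The following is a minimal sketch using only java.sql; the class name BatchFailureInspector and its two helper methods are hypothetical, and nothing here is part of the WebSphere Commerce preprocessor, which does not expose this hook. Drivers differ in whether they keep processing after the first failure, so the update-count array may be shorter than the batch.

```java
import java.sql.BatchUpdateException;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: inspects a BatchUpdateException without needing a database.
public class BatchFailureInspector {

    // Returns the zero-based positions that the driver marked as EXECUTE_FAILED.
    public static List<Integer> failedIndexes(BatchUpdateException e) {
        List<Integer> failed = new ArrayList<>();
        int[] counts = e.getUpdateCounts();
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] == Statement.EXECUTE_FAILED) {
                failed.add(i);
            }
        }
        return failed;
    }

    // Walks the getNextException() chain and formats each nested error.
    public static List<String> chainedErrors(BatchUpdateException e) {
        List<String> errors = new ArrayList<>();
        for (SQLException next = e.getNextException(); next != null; next = next.getNextException()) {
            errors.add("SQLSTATE=" + next.getSQLState()
                    + " code=" + next.getErrorCode()
                    + " message=" + next.getMessage());
        }
        return errors;
    }

    public static void main(String[] args) {
        // Simulated failure (made-up values) so the sketch runs without a database.
        int[] counts = {1, Statement.EXECUTE_FAILED, 1};
        BatchUpdateException e = new BatchUpdateException("Batch failure", counts);
        e.setNextException(new SQLException("value too long", "22001", -302));

        System.out.println("Failed positions: " + failedIndexes(e));
        System.out.println("Nested errors: " + chainedErrors(e));
    }
}
```

In real code the two helpers would be called from the catch block around executeBatch(), and the reported positions matched against the rows that were added to the batch, in order.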

RavenDB build 35215 Concurrent merge failed warning

This warning is spamming the logs; it appears every few seconds. Can someone tell me what it means, how it affects the Raven server, and how to fix it?
Message: Concurrent merge failed
Exception:
Lucene.Net.Index.MergePolicy+MergeException: Exception of type 'Lucene.Net.Index.MergePolicy+MergeException' was thrown. ---> Lucene.Net.Index.CorruptIndexException: docs out of order (0 <= 0 )
at Lucene.Net.Index.IndexWriter.HandleMergeException(Exception t, OneMerge merge)
at Lucene.Net.Index.IndexWriter.Merge(OneMerge merge)
at Lucene.Net.Index.ConcurrentMergeScheduler.MergeThread.Run()
--- End of inner exception stack trace ---
at Lucene.Net.Index.ConcurrentMergeScheduler.HandleMergeException(Exception exc)
at Raven.Database.Indexing.ErrorLoggingConcurrentMergeScheduler.HandleMergeException(Exception exc) in C:\Builds\RavenDB-Stable-3.5\Raven.Database\Indexing\ErrorLoggingConcurrentMergeScheduler.cs:line 15
Logged: 3 hours ago (01/09/18, 11:57am)
Level: Warn
Logger: Raven.Database.Indexing.ErrorLoggingConcurrentMergeScheduler

NServiceBus 6 (6.0.0-beta0004) - The requested service 'NServiceBus.RecoverabilityExecutorFactory' has not been registered

Just running with a vanilla NServiceBus 6.0.0-beta0004
var endpointConfiguration = new EndpointConfiguration("endpoint");
endpointInstance = await Endpoint.Start(endpointConfiguration).ConfigureAwait(false);
This throws an exception:
Bus.ProgramService An unhandled error has occurred.
Autofac.Core.Registration.ComponentNotRegisteredException: The requested service 'NServiceBus.RecoverabilityExecutorFactory' has not been registered. To avoid this exception, either register a component to provide the service, check for service registration using IsRegistered(), or use the ResolveOptional() method to resolve an optional dependency.
at Autofac.ResolutionExtensions.ResolveService(IComponentContext context, Service service, IEnumerable`1 parameters)
at Autofac.ResolutionExtensions.Resolve(IComponentContext context, Type serviceType)
at NServiceBus.AutofacObjectBuilder.Build(Type typeToBuild) in D:\Code\GitHub\agupta-au\NServiceBus\src\NServiceBus.Core\ObjectBuilder\Autofac\AutofacObjectBuilder.cs:line 39
at NServiceBus.CommonObjectBuilder.Build[T]() in D:\Code\GitHub\agupta-au\NServiceBus\src\NServiceBus.Core\ObjectBuilder\Common\CommonObjectBuilder.cs:line 28
at NServiceBus.StartableEndpoint.CreateReceivers() in D:\Code\GitHub\agupta-au\NServiceBus\src\NServiceBus.Core\StartableEndpoint.cs:line 93
at NServiceBus.StartableEndpoint.<Start>d__1.MoveNext() in D:\Code\GitHub\agupta-au\NServiceBus\src\NServiceBus.Core\StartableEndpoint.cs:line 45
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
at NServiceBus.Endpoint.<Start>d__1.MoveNext() in D:\Code\GitHub\agupta-au\NServiceBus\src\NServiceBus.Core\Endpoint.cs:line 28
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
What am I missing? My understanding is that the Recoverability stuff is to do with the First Level Retries + Second Level Retries but I've not set this up yet. Moreover, there seems to be discussion between the guys about this already: https://github.com/Particular/NServiceBus/pull/3828
Any help would be greatly appreciated :)
As Daniel already mentioned in the comments, this code should not be part of the available beta packages. I highly recommend sticking with the beta packages available on NuGet, because otherwise you are very likely to run into issues when using any persistence or transport package. The release-6.0.0 branch is still under active development.
(But of course that exception should not occur at all ;) )

NServiceBus queue not found exception

I have a connector which receives messages from different connectors. While receiving a message, my connector logs the following:
[Worker.11] WARN NServiceBus.Unicast.Transport.Transactional.TransactionalTransport [(null)] <(null)> - Failed raising 'transport message received' event for message with ID=fd970068-55ad-49c0-8abc-4133b7f7fe12\213847
NServiceBus.Unicast.Queuing.QueueNotFoundException ---> System.Messaging.MessageQueueException: Cannot enlist the transaction.
at System.Messaging.MessageQueue.SendInternal(Object obj, MessageQueueTransaction internalTransaction, MessageQueueTransactionType transactionType)
at NServiceBus.Unicast.Queuing.Msmq.MsmqMessageSender.NServiceBus.Unicast.Queuing.ISendMessages.Send(TransportMessage message, Address address)
--- End of inner exception stack trace ---
at NServiceBus.Unicast.UnicastBus.HandleTransportMessage(IBuilder childBuilder, TransportMessage msg)
at NServiceBus.Unicast.UnicastBus.TransportMessageReceived(Object sender, TransportMessageReceivedEventArgs e)
at System.EventHandler`1.Invoke(Object sender, TEventArgs e)
at NServiceBus.Unicast.Transport.Transactional.TransactionalTransport.OnTransportMessageReceived(TransportMessage msg)
The weird thing is that everything worked fine during internal testing; it only started blowing up once it was hosted on an Amazon server.
This is how I have configured NServiceBus:
NServiceBus.Configure.With(
AllAssemblies.Except("libBL.dll").And("libCommon.dll").And("libExtra.dll")).StructureMapBuilder()
.JsonSerializer()
.UnicastBus()
.DoNotAutoSubscribe()
.TransactionTimeout(TimeSpan.FromMinutes(5));
Any help would be much appreciated. If any more information is needed please let me know
Thanks
Searching through the code, I found that when an incoming message failed, I was not rolling back the transaction in which the whole operation began. The database therefore stayed locked and NServiceBus could not query it. After adding the transaction rollback logic, my connector worked.

The destination queue '<QueueName>#<servername>' could not be found

While testing the pub/sub model, I changed the name of the subscriber queue while the subscription for the old queue still existed in the DB, leaving a dangling subscription.
When the publisher and subscriber started and I tried to send a message from the publisher, the following exception occurred; the publisher stopped and sent no more messages:
2011-02-09 09:56:21,115 [6] ERROR Publisher.ServerEndpoint [(null)] <(null)> - Problem occurred when starting the endpoint.
System.Configuration.ConfigurationErrorsException: The destination queue 'StoreInputQueue#' could not be found. You may have misconfigured the destination for this kind of message (Message.EventMessage) in the MessageEndpointMappings of the UnicastBusConfig section in your configuration file.It may also be the case that the given queue just hasn't been created yet, or has been deleted. ---> System.Messaging.MessageQueueException: The queue does not exist or you do not have sufficient permissions to perform the operation.
at System.Messaging.MessageQueue.MQCacheableInfo.get_WriteHandle()
at System.Messaging.MessageQueue.StaleSafeSendMessage(MQPROPS properties, IntPtr transaction)
at System.Messaging.MessageQueue.SendInternal(Object obj, MessageQueueTransaction internalTransaction, MessageQueueTransactionType transactionType)
at System.Messaging.MessageQueue.Send(Object obj, MessageQueueTransactionType transactionType)
at NServiceBus.Unicast.Transport.Msmq.MsmqTransport.Send(TransportMessage m, String destination) in d:\BuildAgent-02\work\20b5f701adefe8f8\src\impl\unicast\NServiceBus.Unicast.Msmq\MsmqTransport.cs:line 334
--- End of inner exception stack trace ---
at NServiceBus.Unicast.Transport.Msmq.MsmqTransport.Send(TransportMessage m, String destination) in d:\BuildAgent-02\work\20b5f701adefe8f8\src\impl\unicast\NServiceBus.Unicast.Msmq\MsmqTransport.cs:line 346
at NServiceBus.Unicast.UnicastBus.SendMessage(IEnumerable`1 destinations, String correlationId, MessageIntentEnum messageIntent, IMessage[] messages) in d:\BuildAgent-02\work\20b5f701adefe8f8\src\unicast\NServiceBus.Unicast\UnicastBus.cs:line 593
at NServiceBus.Unicast.UnicastBus.Publish[T](T[] messages) in d:\BuildAgent-02\work\20b5f701adefe8f8\src\unicast\NServiceBus.Unicast\UnicastBus.cs:line 343
at Publisher.ServerEndpoint.Run() in C:\Downloads\ESB\NServiceBus\publisher\publisher\ServerEndpoint.cs:line 26
at NServiceBus.Host.Internal.ConfigManager.<>c_DisplayClass1.b_0() in d:\BuildAgent-02\work\20b5f701adefe8f8\src\host\NServiceBus.Host\Internal\ConfigurationManager.cs:line 56
Is there a timeout period after which it will try to send the message to the rest of the subscribers? I waited quite a long time...
I don't think it will retry.
Pulling the rug (queue) out from under a running endpoint is not a good thing to do. In production this really should never happen.
Since you're just testing, delete the offending subscription row from the database, and restart the endpoint, and everything should be fine.