Fiware Orion Context Broker 2.2.0 in Docker - stops sending notifications after a while - notifications

We are experiencing problems with the notifications sent by Orion for new data after the last update to version 2.2.0.
Orion is run inside a Docker container.
Specifically, the problem is this:
when we start the Docker container, every endpoint is contacted when new data arrives. But then, after a while (less than 1 day), some endpoints (currently, the one hosted by Amazon Web Services) stop being contacted. The error obtained is: 'notification failure for sender-thread: Timeout was reached'
As additional information, if we try to send data manually (through a curl request performed in a bash shell inside the Docker container) it works fine, while Orion cannot contact the endpoint and fails with a "Timeout" exception.
Furthermore, if we restart the container (with the consequent deletion of contextBroker.pid from the dedicated folder in /var/lib/docker/overlay2/), it starts to push data again.
Linked issue on github
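To gather more detail on the failure, a minimal diagnostic sketch along the lines of the manual check described above; the host, port and the AWS endpoint URL are placeholders, and the exact failure fields reported under the subscription's notification block depend on the Orion version:
# List the subscriptions; the "notification" block reports timesSent,
# lastNotification and, on recent Orion versions, failure details.
curl -s http://localhost:1026/v2/subscriptions
# Reproduce the notification by hand against the affected endpoint (placeholder URL),
# i.e. the manual check that succeeds while Orion itself times out.
curl -v -X POST 'https://aws-endpoint.example.com/notify' \
  -H 'Content-Type: application/json' \
  -d '{"subscriptionId": "<sub-id>", "data": []}'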

Related

NestJS, Bull Queue and Redis in production

I have a project using NestJS version 9.2.0 with Bull Queue version 4.9.0 which is connecting to Redis to schedule jobs.
Locally this is working as expected; however, when deploying to staging, Bull won't connect and it fails with a maxRetriesPerRequest error.
My assumption is that on staging Bull is not able to connect over the TLS connection; however, I am unable to confirm that this is definitely the issue (a connectivity check is sketched below).
I have tried the following without any success:
set the Redis url as rediss://.....
pass the parameter ?tls=true at the end of the Redis url
pass an empty tls: {} object as part of the Redis configuration, which I then use within the BullModule that I import via useFactory in the app.module file
The reasons why I suspect it is a TLS-related issue:
the local connection works just fine
when I set an invalid URL locally, mocking a scenario where Bull cannot connect to Redis, I see the same behaviour in the console as on staging and production
I ensured that the Redis instance on staging and production is up and running and that the url I am using in the app is correct
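One way to confirm the TLS hypothesis from the staging environment itself is a quick check like the sketch below; REDIS_HOST, REDIS_PORT and REDIS_PASSWORD are placeholders for the staging instance, and the redis-cli call assumes a Redis 6+ build with TLS support:
# Raw TLS handshake; a certificate chain in the output confirms the endpoint serves TLS.
openssl s_client -connect "$REDIS_HOST:$REDIS_PORT" </dev/null 2>/dev/null | head -n 20
# Application-level check over TLS using the rediss:// scheme.
redis-cli -u "rediss://:$REDIS_PASSWORD@$REDIS_HOST:$REDIS_PORT" ping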

Dynatrace OneAgent container in ECS Fargate stops but application container is running

I am trying to install OneAgent in my ECS Fargate task. Along with the application container, I have added another container definition for OneAgent with the image alpine:latest and used runtime injection.
While running the task, the OneAgent container is initially in the running state, and after a minute it goes to the stopped state while the application container is still running.
In Dynatrace the same host is available and keeps being recreated every 5-10 minutes.
Actually, the issue I had was that the task was in draining status because of an application issue, due to which the host in Dynatrace kept being recreated. At the same time, I used runtime injection for my ECS Fargate task, so once the binaries are downloaded and injected into the volume, the OneAgent container stops while the application container keeps running and injecting logs into Dynatrace.
I had the same problem, and connecting via SSH to the cluster I saw that the agent needs to be privileged. The only thing that worked for me was sending traces and metrics through OpenTelemetry.
https://aws-otel.github.io/docs/components/otlp-exporter
Alternative:
use sleep infinity in the command field of your oneAgent container.

What if I don't close Service Bus Queue Client and the process stops?

I'm trying to deploy a never-ending message pump to process Service Bus queue messages as a .NET Core console app on Azure AKS (Kubernetes).
The app will auto-scale based on the number of messages in the queue, so more instances may be deployed, and when started they will connect to the Service Bus and call RegisterMessageHandler.
The auto-scaler may also tear down the app, but without an event signalling the shutdown to the console app I won't be able to close the queue client properly to stop receiving or processing messages.
How does one handle that in .NET Core on AKS?
There are two mechanisms available on pod shutdown. The PreStop hook can be configured to trigger a shell command or an HTTP request against your container. You should also expect your running process to receive a TERM signal before the pod stops.
Here's a blog post which covers some basics on hooking the TERM signal by implementing IApplicationLifetime within the context of a Kestrel server.
This article has a thorough end-to-end example of a simple implementation of IApplicationLifetime.
For .NET Core 3.x and later, IHostApplicationLifetime supersedes IApplicationLifetime.
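As a rough way to exercise the TERM path before relying on the AKS autoscaler, something along these lines (container and pod names are placeholders):
# docker stop sends SIGTERM, waits the grace period, then SIGKILL,
# mirroring what Kubernetes does when a pod is torn down.
docker stop --time=30 message-pump
# On the cluster, deleting the pod triggers the same sequence: preStop hook,
# then SIGTERM, then SIGKILL after terminationGracePeriodSeconds.
kubectl delete pod message-pump-pod-abc123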

How to properly restart a Kafka S3 sink connector?

I started a Kafka S3 sink connector (the bundled connector from the Confluent package) on 1 May. It worked fine until 8 May. Checking the status, it reported that an AWS exception had crashed the connector. This should not be a big problem, so I want to restore it.
I tried the following steps:
I POST /connectors/s3sink/restart. Then I see the connector in RUNNING mode, but the task is still FAILED.
Then I POST /connectors/s3sink/tasks/0/restart. OK, now the task is in RUNNING mode.
But then I tail the log and find that it has started to rewrite old data, such as the 3 May data. And it messed up the old data!
So, does the Connect restart REST API reset the offsets? I thought it would keep the offsets and just resume from the point where it failed.
And how do I restart a failed connector task correctly? By deleting the pods (using Kubernetes), or via REST /tasks/0/restart? When should I use /connectors/s3sink/restart?
/connectors/:name/restart is a rolling restart operation on the worker leader that needs to propagate to all worker tasks asynchronously, so you need to ensure network connectivity between the leader worker and all the others.
/connectors/:name/tasks/:num/restart sends the request straight to the worker running that task, restarting its thread.
A restart should not reset the offsets, since they are stored in the consumer offsets topic for that Connect cluster. If anything, the tasks were not able to commit offsets back to the __consumer_offsets topic, but you should see logs for that.
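For reference, the usual sequence against the Connect REST API looks like the sketch below; localhost:8083 is a placeholder for the worker's REST endpoint and s3sink is the connector name from the question:
# Check connector and task state first.
curl -s http://localhost:8083/connectors/s3sink/status
# Restart only the failed task; this does not touch committed offsets.
curl -s -X POST http://localhost:8083/connectors/s3sink/tasks/0/restart
# Restart the connector instance itself; note this does not necessarily restart its tasks.
curl -s -X POST http://localhost:8083/connectors/s3sink/restart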

Service not activated when message in queue after worker process shut down

I have a (dead-letter) queue on my local machine called logging/logdeadletterservice.svc. I have a corresponding service running at appdev.me.com/logging/logdeadletterservice.svc to pull the messages from the queue and resubmit them. This works great so long as the worker process is running. However, once the worker process is shut down (or if it hasn't come up yet), the service no longer gets messages from the queue unless I browse to the SVC manually.
According to this post, NETWORK SERVICE needs permissions to peek the queue. I went ahead and added that permission, but the message was not pulled from the queue. I tried restarting the Net.Msmq Listener Adapter (which is, indeed, running under Network Service), but still no go.
Any ideas on what I'm doing wrong?
EDIT: I've tried running sc sidtype netmsmqactivator unrestricted and restarting the service, but no go. Switched it back to restricted (original) after it didn't resolve the issue.
EDIT2: Also tried running the Net.Msmq Listener Adapter as myself (which is the user under which the service is running), but no go.
Ended up using AppFabric and running the following commands against appcmd.exe to get the pool to be always available and always warmed up:
%windir%\system32\inetsrv\appcmd.exe set apppool "My Site" /startMode:AlwaysRunning
%windir%\system32\inetsrv\appcmd.exe set app /app.name:"My Site/My App" /serviceAutoStartEnabled:True /serviceAutoStartMode:All /serviceAutoStartProvider:Service