I have an Azure App Service (based on Docker) that uses Redis as a cache. When I reboot/scale the Redis server, the Redis client inside the App Service loses the connection to the server and throws the following exception:
Timeout awaiting response (outbound=0KiB, inbound=0KiB, 2852ms elapsed, timeout is 2000ms), command=SETEX, next: GET test, inst: 0, qu: 0, qs: 45, aw: False, rs: ReadAsync, ws: Idle, in: 0, serverEndpoint: Unspecified/redis-server-com:6380, mgr: 10 of 10 available, clientName: wallet-api, IOCP: (Busy=0,Free=1000,Min=4,Max=1000), WORKER: (Busy=1,Free=32766,Min=4,Max=32767), v: 2.0.601.3402 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
It takes up to 15 minutes to reconnect to the Redis server from the App Service; however, if I restart the App Service, the Redis client connection is established successfully as soon as the app is up. According to the documentation, the ConnectionMultiplexer object should manage reconnection, but it does not look like it is doing its job.
Here is the Redis client code:
public class RedisStore : IRedisStore, IDisposable
{
private readonly ConfigurationOptions _options;
private static IConnectionMultiplexer _connection;
public RedisStore(RedisConfiguration redisConfiguration)
{
_options = ConfigurationOptions.Parse(redisConfiguration.ConnectionString);
_options.ReconnectRetryPolicy = new ExponentialRetry(redisConfiguration.RetryFromMilliSeconds);
}
async Task IRedisStore.InitializeConnection()
{
if (_connection == null)
{
_connection = await ConnectionMultiplexer.ConnectAsync(_options);
}
}
async Task<T> IRedisStore.SetGet<T>(string key)
{
var value = await _connection.GetDatabase().StringGetAsync(key);
if (value.IsNull)
return default(T);
return JsonConvert.DeserializeObject<T>(value);
}
async Task IRedisStore.SetStore<T>(string key, T value)
{
var serialized = JsonConvert.SerializeObject(value);
await _connection.GetDatabase().StringSetAsync(key, serialized);
}
void IDisposable.Dispose()
{
_connection.Dispose();
}
}
The Redis connection is initialized from the bootstrap code:
private async Task InitializeRedis()
{
var redis = Container.GetInstance<IRedisStore>();
await redis.InitializeConnection();
}
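For reference, these are the reconnection-related settings exposed by StackExchange.Redis ConfigurationOptions (a minimal sketch; the values below are illustrative assumptions, not the ones from my connection string):
using StackExchange.Redis;

var options = ConfigurationOptions.Parse("redis-server-com:6380,ssl=true");
options.AbortOnConnectFail = false;                        // keep retrying in the background instead of failing
options.ConnectRetry = 3;                                  // connect attempts during the initial connect
options.ConnectTimeout = 5000;                             // milliseconds allowed per connect attempt
options.KeepAlive = 60;                                    // seconds between keep-alive pings on an idle connection
options.ReconnectRetryPolicy = new ExponentialRetry(5000); // back-off between reconnect attempts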
Also, while the App Service was throwing Redis timeout exceptions, netstat showed that the Redis connection was established:
Just before the connection was established again, I got the following two exceptions, I guess one for each connection:
SocketFailure on redis-server.com:6380/Interactive, Idle/Faulted, last: GET, origin: ReadFromPipe, outstanding: 52, last-read: 982s ago, last-write: 6s ago, unanswered-write: 938s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.0.601.3402 <--- Unable to read data from the transport connection: Connection timed out. <--- Connection timed out
SocketFailure on redis-server.com:6380/Subscription, Idle/Faulted, last: PING, origin: ReadFromPipe, outstanding: 16, last-read: 998s ago, last-write: 36s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.0.601.3402 <--- Unable to read data from the transport connection: Connection timed out. <--- Connection timed out
Why is the connection not refreshed? Is there any way to improve reconnection? 15 minutes is too long for a production environment.
UPDATE 03/09/2020. I did a quick test rebooting the Redis server with the same client, once using a secured connection via SSL (port 6380) and once using a plain connection (port 6379). Checking netstat (netstat -ptona) with the plain connection, the Redis client reconnects successfully. However, checking again with SSL enabled, the connection stays established but there is no response from the Redis server.
Possible workaround: It looks like something related to the framework. As @Json Pan suggested in his reply, I will try upgrading to .NET Core 3.1 and forcing the app to refresh the connection periodically.
UPDATE
After reading this blog, I modified the source code and upgraded the project from .NET Core 1.0 to 3.1.
I suggest you try it, or adapt it in your project, to test the reconnect time.
You can download my sample code.
PREVIOUS
I recommend you use the reconnecting with Lazy<ConnectionMultiplexer> pattern.
And the answer to How does ConnectionMultiplexer deal with disconnects? will be useful to you.
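A minimal sketch of that pattern, combined with a throttled forced reconnect (the endpoint string, class name and 60-second throttle are illustrative assumptions, loosely following the ForceReconnect sample from the Azure Cache for Redis documentation):
using System;
using StackExchange.Redis;

public static class RedisConnectionFactory
{
    private static readonly object _reconnectLock = new object();
    private static DateTimeOffset _lastReconnect = DateTimeOffset.MinValue;
    private static Lazy<ConnectionMultiplexer> _lazyConnection = CreateConnection();

    public static ConnectionMultiplexer Connection => _lazyConnection.Value;

    private static Lazy<ConnectionMultiplexer> CreateConnection() =>
        new Lazy<ConnectionMultiplexer>(() =>
            ConnectionMultiplexer.Connect("redis-server-com:6380,ssl=true,abortConnect=false"));

    // Call this from a catch block when RedisConnectionException/RedisTimeoutException keeps occurring.
    public static void ForceReconnect()
    {
        lock (_reconnectLock)
        {
            // Throttle forced reconnects so a burst of timeouts does not cause a reconnect storm.
            if (DateTimeOffset.UtcNow - _lastReconnect < TimeSpan.FromSeconds(60))
                return;

            var old = _lazyConnection;
            _lazyConnection = CreateConnection(); // later callers get a fresh multiplexer
            _lastReconnect = DateTimeOffset.UtcNow;

            if (old.IsValueCreated)
            {
                try { old.Value.Close(); } catch { /* ignore errors while closing the stale connection */ }
            }
        }
    }
}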
Related
I'm trying to use MQTTnet as a service broker that takes requests from web clients over MQTT.js. However, this is not working for unknown reasons.
When I test the service broker using the Windows application "MQTT Explorer" as a client, it works fine.
When I test the MQTT.js client against an open broker like broker.emqx.io, it also works fine.
But the connection between my service broker and the MQTT.js client always has a problem. The following error is thrown by the MQTTnet server:
Client '[::1]:58434' accepted by TCP listener '[::]:8883, ipv6'.
Expected at least 21538 bytes but there are only 69 bytes
MQTTnet.Exceptions.MqttProtocolViolationException: Expected at least 21538 bytes but there are only 69 bytes
   at MQTTnet.Formatter.MqttBufferReader.ReadString()
   at MQTTnet.Formatter.MqttPacketFormatterAdapter.ParseProtocolVersion(ReceivedMqttPacket receivedMqttPacket)
   at MQTTnet.Formatter.MqttPacketFormatterAdapter.DetectProtocolVersion(ReceivedMqttPacket receivedMqttPacket)
   at MQTTnet.Adapter.MqttChannelAdapter.ReceivePacketAsync(CancellationToken cancellationToken)
   at MQTTnet.Server.MqttClientSessionsManager.ReceiveConnectPacket(IMqttChannelAdapter channelAdapter, CancellationToken cancellationToken)
   at MQTTnet.Server.MqttClientSessionsManager.HandleClientConnectionAsync(IMqttChannelAdapter channelAdapter, CancellationToken cancellationToken)
Client '[::1]:58434' disconnected at TCP listener '[::]:8883, ipv6'.
The configuration of my server is as follows:
static async Task<MqttServer> StartMqttServer(bool isDevelopment, ConsoleLogger consoleLogger = null)
{
MqttFactory mqttFactory = new MqttFactory();
if (consoleLogger != null)
{
mqttFactory = new MqttFactory(consoleLogger);
}
// Due to security reasons the "default" endpoint (which is unencrypted) is not enabled by default!
var mqttServerOptions = mqttFactory.CreateServerOptionsBuilder()
.WithDefaultEndpoint()
.Build();
var server = mqttFactory.CreateMqttServer(mqttServerOptions);
await server.StartAsync();
return server;
}
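For comparison, a plain MQTTnet TCP client (a minimal sketch; host, port and client id are assumptions) should be able to connect to that listener in the same way MQTT Explorer does:
using System;
using System.Threading;
using System.Threading.Tasks;
using MQTTnet;
using MQTTnet.Client;

static async Task TestPlainTcpConnect()
{
    var factory = new MqttFactory();
    using var client = factory.CreateMqttClient();

    var options = new MqttClientOptionsBuilder()
        .WithTcpServer("localhost", 8883) // assumed host/port of the TCP listener above
        .WithClientId("tcp-test-client")
        .Build();

    // Connect over raw TCP (no WebSocket transport involved) and report the result code.
    var result = await client.ConnectAsync(options, CancellationToken.None);
    Console.WriteLine($"Connect result: {result.ResultCode}");

    await client.DisconnectAsync();
}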
Does anybody know why this is happening? And perhaps has an idea how I can fix it?
Thanks in advance.
I have a .NET 5 web API application. Another .NET 5 console application sends requests to the web API. I have a 10-second timeout on each request, and I send requests one by one with a short interval. On my PC (which is very fast) I send thousands of requests and all of them succeed. But on another PC (which is pretty slow) a few requests fail, approximately every 2 minutes. The first request takes a few seconds to send when the application starts because of connection establishment to the web API.
var httpClient = new HttpClient
{
Timeout = TimeSpan.FromSeconds(10)
};
await PerformPingAsync(httpClient, interval, cts.Token).ConfigureAwait(false);
private static async Task PerformPingAsync(HttpClient httpClient, int interval, CancellationToken cancellationToken)
{
while (!cancellationToken.IsCancellationRequested)
{
try
{
var request = new HttpRequestMessage
{
RequestUri = new Uri("http://127.0.0.1/api/service/ping"),
Method = HttpMethod.Get
};
var response = await httpClient.SendAsync(request, cancellationToken).ConfigureAwait(false);
response.EnsureSuccessStatusCode();
_consoleLogger.Info("Ping attempt succeed");
}
catch (Exception e)
{
_consoleLogger.Error(e);
}
await Task.Delay(TimeSpan.FromSeconds(interval), cancellationToken).ConfigureAwait(false);
}
}
The interval between requests is 1 second.
I have tried .NET Core 3.1, .NET 5 and .NET 6 for my console ping application, and all of them have request timeouts (10 seconds). I created the same console application in .NET Framework 4.8 and I get no timeouts. I also created an HTML page with a JS fetch to send requests and I get no timeouts.
Also, what I have discovered is that these timeouts can start to happen after a PC reboot. After some reboots I might not get any timeouts, but after another reboot I do.
I found out that HttpClient behaves a bit differently in .NET Core/.NET 5/6 than in .NET Framework, so I tried using SocketsHttpHandler:
var socketsHandler = new SocketsHttpHandler
{
PooledConnectionLifetime = TimeSpan.FromMinutes(10),
PooledConnectionIdleTimeout = TimeSpan.FromMinutes(10)
};
return new HttpClient(socketsHandler)
{
Timeout = TimeSpan.FromSeconds(10)
};
But I had no success. According to the official documentation, the PooledConnectionIdleTimeout timer is reset after each request, so the connection would only be closed after 10 minutes without requests. It seems obvious that my Nth request times out because the connection is being re-established. But why is that happening, and why is my SocketsHttpHandler not working? Any ideas?
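One more knob that may be relevant here is SocketsHttpHandler.ConnectTimeout, which bounds connection establishment separately from the overall request timeout (a minimal sketch; the 5-second value is an assumption):
var socketsHandler = new SocketsHttpHandler
{
    PooledConnectionLifetime = TimeSpan.FromMinutes(10),
    PooledConnectionIdleTimeout = TimeSpan.FromMinutes(10),
    // Fail fast if establishing a new connection takes too long,
    // instead of letting the handshake eat into the 10-second request timeout.
    ConnectTimeout = TimeSpan.FromSeconds(5)
};

return new HttpClient(socketsHandler)
{
    Timeout = TimeSpan.FromSeconds(10)
};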
I have a Ratchet server that I try to access via WebSocket. It is similar to the tutorial: it logs when there is a new client or when it receives a message. The Ratchet server reports having successfully established a connection, while the Kotlin client does not (the connect event in Kotlin is never fired). I am using the socket-io-java module v2.0.1. The client shows a timeout after the specified timeout period, gets detached at the server and attaches again after a short while, as if it thinks the connection was not properly established (because of a missing connection response?).
The successful connection confirmation gets reported to the client if the client is a WebSocket client in the JS console of Chrome, but not to my Kotlin app. Even an Android emulator running on the same computer doesn't get a response (so I think the problem is not Wi-Fi related).
The connection works fine with JS, completing the full handshake, but with the Android app it only reaches the server and never comes back to the client.
This is my server code:
<?php
namespace agroSMS\Websockets;
use Ratchet\ConnectionInterface;
use Ratchet\MessageComponentInterface;
class SocketConnection implements MessageComponentInterface
{
protected \SplObjectStorage $clients;
public function __construct() {
$this->clients = new \SplObjectStorage;
}
function onOpen(ConnectionInterface $conn)
{
$this->clients->attach($conn);
error_log("New client attached");
}
function onClose(ConnectionInterface $conn)
{
$this->clients->detach($conn);
error_log("Client detached");
}
function onError(ConnectionInterface $conn, \Exception $e)
{
echo "An error has occurred: {$e->getMessage()}\n";
$conn->close();
}
function onMessage(ConnectionInterface $from, $msg)
{
error_log("Received message: $msg");
// TODO: Implement onMessage() method.
}
}
And the script that I run in the terminal:
<?php
use Ratchet\Server\IoServer;
use agroSMS\Websockets\SocketConnection;
use Ratchet\WebSocket\WsServer;
use Ratchet\Http\HttpServer;
require dirname(__DIR__) . '/vendor/autoload.php';
$server = IoServer::factory(
new HttpServer(
new WsServer(
new SocketConnection()
)
)
);
$server->run();
What I run in the browser for testing (it logs "Connection established!" in Chrome, but for some reason not in the Brave browser):
var conn = new WebSocket('ws://<my-ip>:80');
conn.onopen = function(e) {
console.log("Connection established!");
};
conn.onmessage = function(e) {
console.log(e.data);
};
This is what my Kotlin code looks like:
try {
val uri = URI.create("ws://<my-ip>:80")
val options = IO.Options.builder()
.setTimeout(60000)
.setTransports(arrayOf(WebSocket.NAME))
.build()
socket = IO.socket(uri, options)
socket.connect()
.on(Socket.EVENT_CONNECT) {
Log.d(TAG, "[INFO] Connection established")
socket.send(jsonObject)
}
.once(Socket.EVENT_CONNECT_ERROR) {
val itString = gson.toJson(it)
Log.d(TAG, itString)
}
}catch(e : Exception) {
Log.e(TAG, e.toString())
}
After a minute, the Kotlin code logs a "timeout" error, detaches from the server, and attaches again.
When I stop the script on the server, it gives an error: "connection reset, websocket error" (which makes sense, but why doesn't it get the connection in the first place?)
I also tried to "just" change the protocol to "wss" in the URL, in case that was the problem, even though my server doesn't even use SSL, but this just gave me another error:
[{"cause":{"bytesTransferred":0,"detailMessage":"Read timed out","stackTrace":[],"suppressedExceptions":[]},"detailMessage":"websocket error","stackTrace":[],"suppressedExceptions":[]}]
And the connection isn't even established at the server, so this attempt was more of a downgrade.
I went to the GitHub page of socket.io-java-client to find a solution to my problem, and it turned out the whole problem was that I had misunderstood a very important concept:
The fact that socket.io uses WebSockets doesn't mean it is compatible with plain WebSockets.
So, in plain words:
If you use socket.io on the client side, you also need to use it on the server side, and vice versa. Since socket.io sends a lot of metadata with its packets, a pure WebSocket server will accept the connection establishment, but its acknowledgement coming back will not be accepted by the socket.io client.
You have to go for either full socket.io or pure WebSockets.
I am running an xUnit functional test within a docker-compose stack based on Debian 3.1-buster, with the Confluent Kafka .NET client v1.5.3 connecting to the broker confluentinc/cp-kafka:6.0.1. I am fairly new to Kafka...
The architecture is illustrated below:
I am testing with xUnit and have a class fixture that starts an in-process generic Kestrel host for the lifetime of the test collection/class. I am using an in-process generic host since I have an additional SignalR service which uses WebSockets. From what I understand, WebApplicationFactory is in-memory only and does not use network sockets.
The generic host contains a Kafka producer and consumer. The producer is a singleton service that produces using the Produce method. The consumer is a BackgroundService that runs a Consume loop with a cancellation token (see the listing further below). The consumer has the following configuration:
EnableAutoCommit: true
EnableAutoOffsetStore: false
AutoOffsetReset: AutoOffsetReset.Latest
It is a single consumer with 3 partitions. The group.initial.rebalance.delay is configured as 1000ms.
The test spawns a thread that sends an event to trigger the producer to post data onto the Kafka topic. The test then waits for a time delay via ManualResetEvent to allow time for the consumer to process the topic data.
Problem: Consumer is Blocking
When I run the test within a docker-compose environment I can see from the logs (included below) that:
The producer and consumer are connected to the same broker and topic
The producer sends the data to the topic but the consumer is blocking
The xUnit test and the in-process Kestrel host run within a docker-compose service on the same network as the kafka service. The Kafka producer is able to successfully post data onto the Kafka topic, as demonstrated by the logs below.
I have created an additional docker-compose service that runs a Python client consumer. This uses a poll loop to consume data posted while running the test. Data is consumed by the Python client.
Does anyone have any ideas, to assist with fault finding, why the consumer would be blocking within this environment?
Would the wait performed in the xUnit test block the in-process Kestrel host started by the xUnit fixture?
If I run the Kestrel host locally on macOS Catalina 10.15.7, connecting to Kafka (image lensesio/fast-data-dev:2.5.1-L0) in docker-compose, it produces and consumes successfully.
Update - Works with lensesio image
The local docker-compose that works uses the docker image lensesio/fast-data-dev:2.5.1-L0. This uses Apache Kafka 2.5.1 and Confluent components 5.5.1. I have also tried:
Downgrading to Confluent Kafka images 5.5.1
Upgrading the .Net Confluent Client to 1.5.3
The result remains the same: the producer produces fine; however, the consumer blocks.
What is the difference between the lensesio/fast-data-dev:2.5.1-L0 configuration and the confluentinc/cp images that would cause the blocking?
I have tagged the working docker-compose configuration onto the end of this query.
Update - Works with the confluentinc/cp-kafka image when group.initial.rebalance.delay is 0 ms
Originally the group.initial.rebalance.delay was 1000 ms, the same as the lensesio/fast-data-dev:2.5.1-L0 image. The 1000 ms setting on the confluentinc/cp-kafka image exhibits the blocking behaviour.
If I change the group.initial.rebalance.delay to 0 ms, then no blocking occurs with the confluentinc/cp-kafka image.
Does the lensesio/fast-data-dev:2.5.1-L0 image offer better performance in a docker-compose development environment when used with the confluent-kafka-dotnet client?
Test
[Fact]
public async Task MotionDetectionEvent_Processes_Data()
{
var m = new ManualResetEvent(false);
// spawn a thread to publish a message and wait for 14 seconds
var thread = new Thread(async () =>
{
await _Fixture.Host.MqttClient.PublishAsync(_Fixture.Data.Message);
// allow time for kafka to consume event and process
Console.WriteLine($"TEST THREAD IS WAITING FOR 14 SECONDS");
await Task.Run(() => Task.Delay(14000));
Console.WriteLine($"TEST THREAD IS COMPLETED WAITING FOR 14 SECONDS");
m.Set();
});
thread.Start();
// wait for the thread to have completed
await Task.Run(() => { m.WaitOne(); });
// TO DO, ASSERT DATA AVAILABLE ON S3 STORAGE ETC.
}
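For reference, the same wait can be expressed without the Thread/ManualResetEvent combination (a sketch reusing the fixture members from the test above); note that inside a Thread the async lambda runs as async void, so the thread itself finishes at the first await:
[Fact]
public async Task MotionDetectionEvent_Processes_Data()
{
    // Publish the trigger message, then give the consumer time to process it.
    await _Fixture.Host.MqttClient.PublishAsync(_Fixture.Data.Message);

    Console.WriteLine("TEST IS WAITING FOR 14 SECONDS");
    await Task.Delay(TimeSpan.FromSeconds(14));
    Console.WriteLine("TEST COMPLETED WAITING FOR 14 SECONDS");

    // TO DO, ASSERT DATA AVAILABLE ON S3 STORAGE ETC.
}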
Test Output - Producer has produced data onto the topic but consumer has not consumed
Test generic host example
SettingsFile::GetConfigMetaData ::: Directory for executing assembly :: /Users/simon/Development/Dotnet/CamFrontEnd/Tests/Temp/WebApp.Test.Host/bin/Debug/netcoreapp3.1
SettingsFile::GetConfigMetaData ::: Executing assembly :: WebApp.Testing.Utils, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null
AutofacTestHost is using settings file /Users/simon/Development/Dotnet/CamFrontEnd/Tests/Temp/WebApp.Test.Host/bin/Debug/netcoreapp3.1/appsettings.Local.json
info: WebApp.Mqtt.MqttService[0]
Mqtt Settings :: mqtt://mqtt:*********#localhost:1883
info: WebApp.Mqtt.MqttService[0]
Mqtt Topic :: shinobi/+/+/trigger
info: WebApp.S3.S3Service[0]
Minio client created for endpoint localhost:9000
info: WebApp.S3.S3Service[0]
minio://accesskey:12345678abcdefgh#localhost:9000
info: Extensions.Hosting.AsyncInitialization.RootInitializer[0]
Starting async initialization
info: Extensions.Hosting.AsyncInitialization.RootInitializer[0]
Starting async initialization for WebApp.Kafka.Admin.KafkaAdminService
info: WebApp.Kafka.Admin.KafkaAdminService[0]
Admin service trying to create Kafka Topic...
info: WebApp.Kafka.Admin.KafkaAdminService[0]
Topic::eventbus, ReplicationCount::1, PartitionCount::3
info: WebApp.Kafka.Admin.KafkaAdminService[0]
Bootstrap Servers::localhost:9092
info: WebApp.Kafka.Admin.KafkaAdminService[0]
Admin service successfully created topic eventbus
info: WebApp.Kafka.Admin.KafkaAdminService[0]
Kafka Consumer thread started
info: Extensions.Hosting.AsyncInitialization.RootInitializer[0]
Async initialization for WebApp.Kafka.Admin.KafkaAdminService completed
info: Extensions.Hosting.AsyncInitialization.RootInitializer[0]
Async initialization completed
info: Microsoft.AspNetCore.DataProtection.KeyManagement.XmlKeyManager[0]
User profile is available. Using '/Users/simon/.aspnet/DataProtection-Keys' as key repository; keys will not be encrypted at rest.
info: WebApp.Kafka.ProducerService[0]
ProducerService constructor called
info: WebApp.Kafka.SchemaRegistry.Serdes.JsonDeserializer[0]
Kafka Json Deserializer Constructed
info: WebApp.Kafka.ConsumerService[0]
Kafka consumer listening to camera topics =>
info: WebApp.Kafka.ConsumerService[0]
Camera Topic :: shinobi/RHSsYfiV6Z/xi5cncrNK6/trigger
info: WebApp.Kafka.ConsumerService[0]
Camera Topic :: shinobi/group/monitor/trigger
%7|1607790673.462|INIT|rdkafka#consumer-3| [thrd:app]: librdkafka v1.5.3 (0x10503ff) rdkafka#consumer-3 initialized (builtin.features gzip,snappy,ssl,sasl,regex,lz4,sasl_gssapi,sasl_plain,sasl_scram,plugins,zstd,sasl_oauthbearer, STATIC_LINKING CC GXX PKGCONFIG OSXLD LIBDL PLUGINS ZLIB SSL SASL_CYRUS ZSTD HDRHISTOGRAM SNAPPY SOCKEM SASL_SCRAM SASL_OAUTHBEARER CRC32C_HW, debug 0x2000)
info: WebApp.Kafka.ConsumerService[0]
Kafka consumer created => Name :: rdkafka#consumer-3
%7|1607790673.509|SUBSCRIBE|rdkafka#consumer-3| [thrd:main]: Group "consumer-group": subscribe to new subscription of 1 topics (join state init)
%7|1607790673.509|REBALANCE|rdkafka#consumer-3| [thrd:main]: Group "consumer-group" is rebalancing in state init (join-state init) without assignment: unsubscribe
info: WebApp.Kafka.ConsumerService[0]
Kafka consumer has subscribed to topic eventbus
info: WebApp.Kafka.ConsumerService[0]
Kafka is waiting to consume...
info: WebApp.Mqtt.MqttService[0]
MQTT managed client connected
info: Microsoft.Hosting.Lifetime[0]
Now listening on: http://127.0.0.1:65212
info: Microsoft.Hosting.Lifetime[0]
Application started. Press Ctrl+C to shut down.
info: Microsoft.Hosting.Lifetime[0]
Hosting environment: Production
info: Microsoft.Hosting.Lifetime[0]
Content root path: /Users/simon/Development/Dotnet/CamFrontEnd/Tests/Temp/WebApp.Test.Host/bin/Debug/netcoreapp3.1/
MQTT HAS PUBLISHED...SPAWNING TEST THREAD TO WAIT
TEST THREAD IS WAITING FOR 14 SECONDS
info: WebApp.S3.S3Service[0]
Loading json into JSON DOM and updating 'img' property with key 2d8e2438-e674-4d71-94ac-e54df0143a29
info: WebApp.S3.S3Service[0]
Extracting UTF8 bytes from base64
info: WebApp.S3.S3Service[0]
Updated JSON payload with img: 2d8e2438-e674-4d71-94ac-e54df0143a29, now uploading 1.3053922653198242 MB to S3 storage
%7|1607790674.478|JOIN|rdkafka#consumer-3| [thrd:main]: Group "consumer-group": postponing join until up-to-date metadata is available
%7|1607790674.483|REJOIN|rdkafka#consumer-3| [thrd:main]: Group "consumer-group": subscription updated from metadata change: rejoining group
%7|1607790674.483|REBALANCE|rdkafka#consumer-3| [thrd:main]: Group "consumer-group" is rebalancing in state up (join-state init) without assignment: group rejoin
%7|1607790674.483|JOIN|rdkafka#consumer-3| [thrd:main]: 127.0.0.1:9092/1: Joining group "consumer-group" with 1 subscribed topic(s)
%7|1607790674.541|JOIN|rdkafka#consumer-3| [thrd:main]: 127.0.0.1:9092/1: Joining group "consumer-group" with 1 subscribed topic(s)
info: WebApp.S3.S3Service[0]
Converting modified payload back to UTF8 bytes for Kafka processing
info: WebApp.Kafka.ProducerService[0]
Produce topic : eventbus, key : shinobi/group/monitor/trigger, value : System.Byte[]
info: WebApp.Kafka.ProducerService[0]
Delivered message to eventbus [[2]] #0
%7|1607790675.573|ASSIGNOR|rdkafka#consumer-3| [thrd:main]: Group "consumer-group": "range" assignor run for 1 member(s)
%7|1607790675.588|ASSIGN|rdkafka#consumer-3| [thrd:main]: Group "consumer-group": new assignment of 3 partition(s) in join state wait-sync
%7|1607790675.588|OFFSET|rdkafka#consumer-3| [thrd:main]: GroupCoordinator/1: Fetch committed offsets for 3/3 partition(s)
%7|1607790675.717|FETCH|rdkafka#consumer-3| [thrd:main]: Partition eventbus [0] start fetching at offset 0
%7|1607790675.719|FETCH|rdkafka#consumer-3| [thrd:main]: Partition eventbus [1] start fetching at offset 0
%7|1607790675.720|FETCH|rdkafka#consumer-3| [thrd:main]: Partition eventbus [2] start fetching at offset 1
** EXPECT SOME CONSUMER DATA HERE - INSTEAD IT IS BLOCKING WITH confluent/cp-kafka image **
TEST THREAD IS COMPLETED WAITING FOR 14 SECONDS
Timer Elapsed
Shutting down generic host
info: Microsoft.Hosting.Lifetime[0]
Application is shutting down...
info: WebApp.Mqtt.MqttService[0]
Mqtt managed client disconnected
info: WebApp.Kafka.ConsumerService[0]
The Kafka consumer thread has been cancelled
info: WebApp.Kafka.ConsumerService[0]
Kafka Consumer background service disposing
%7|1607790688.191|CLOSE|rdkafka#consumer-3| [thrd:app]: Closing consumer
%7|1607790688.191|CLOSE|rdkafka#consumer-3| [thrd:app]: Waiting for close events
%7|1607790688.191|REBALANCE|rdkafka#consumer-3| [thrd:main]: Group "consumer-group" is rebalancing in state up (join-state started) with assignment: unsubscribe
%7|1607790688.191|UNASSIGN|rdkafka#consumer-3| [thrd:main]: Group "consumer-group": unassigning 3 partition(s) (v5)
%7|1607790688.191|LEAVE|rdkafka#consumer-3| [thrd:main]: 127.0.0.1:9092/1: Leaving group
%7|1607790688.201|CLOSE|rdkafka#consumer-3| [thrd:app]: Consumer closed
%7|1607790688.201|DESTROY|rdkafka#consumer-3| [thrd:app]: Terminating instance (destroy flags NoConsumerClose (0x8))
%7|1607790688.201|CLOSE|rdkafka#consumer-3| [thrd:app]: Closing consumer
%7|1607790688.201|CLOSE|rdkafka#consumer-3| [thrd:app]: Disabling and purging temporary queue to quench close events
%7|1607790688.201|CLOSE|rdkafka#consumer-3| [thrd:app]: Consumer closed
%7|1607790688.201|DESTROY|rdkafka#consumer-3| [thrd:main]: Destroy internal
%7|1607790688.201|DESTROY|rdkafka#consumer-3| [thrd:main]: Removing all topics
info: WebApp.Mqtt.MqttService[0]
Disposing Mqtt Client
info: WebApp.Kafka.ProducerService[0]
Flushing remaining messages to produce...
info: WebApp.Kafka.ProducerService[0]
Disposing Kafka producer...
info: WebApp.S3.S3Service[0]
Disposing of resources
Stopping...
Kafka Consumer
using System;
using System.Threading;
using System.Threading.Tasks;
using Confluent.Kafka;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using Microsoft.AspNetCore.SignalR;
using WebApp.Data;
using WebApp.Kafka.Config;
using WebApp.Realtime.SignalR;
namespace WebApp.Kafka
{
public delegate IConsumer<string, MotionDetection> ConsumerFactory(
KafkaConfig config,
IAsyncDeserializer<MotionDetection> serializer
);
public class ConsumerService : BackgroundService, IDisposable
{
private KafkaConfig _config;
private readonly IConsumer<string, MotionDetection> _kafkaConsumer;
private ILogger<ConsumerService> _logger;
private IHubContext<MotionHub, IMotion> _messagerHubContext;
private IAsyncDeserializer<MotionDetection> _serializer { get; }
public ConsumerFactory _factory { get; set; }
// Using SignalR with background services:
// https://learn.microsoft.com/en-us/aspnet/core/signalr/background-services?view=aspnetcore-2.2
public ConsumerService(
IOptions<KafkaConfig> config,
ConsumerFactory factory,
IHubContext<MotionHub, IMotion> messagerHubContext,
IAsyncDeserializer<MotionDetection> serializer,
ILogger<ConsumerService> logger
)
{
if (config is null)
throw new ArgumentNullException(nameof(config));
_config = config.Value;
_factory = factory ?? throw new ArgumentNullException(nameof(factory));
_logger = logger ?? throw new ArgumentNullException(nameof(logger));
_messagerHubContext = messagerHubContext ?? throw new ArgumentNullException(nameof(messagerHubContext));
_serializer = serializer ?? throw new ArgumentNullException(nameof(serializer));
// enforced configuration
_config.Consumer.EnableAutoCommit = true; // allow consumer to autocommit offsets
_config.Consumer.EnableAutoOffsetStore = false; // allow control over which offsets stored
_config.Consumer.AutoOffsetReset = AutoOffsetReset.Latest; // if no offsets committed for topic for consumer group, default to latest
_config.Consumer.Debug = "consumer";
_logger.LogInformation("Kafka consumer listening to camera topics =>");
foreach (var topic in _config.MqttCameraTopics) { _logger.LogInformation($"Camera Topic :: {topic}"); }
_kafkaConsumer = _factory(_config, _serializer);
_logger.LogInformation($"Kafka consumer created => Name :: {_kafkaConsumer.Name}");
}
protected override Task ExecuteAsync(CancellationToken cancellationToken)
{
new Thread(() => StartConsumerLoop(cancellationToken)).Start();
return Task.CompletedTask;
}
private void StartConsumerLoop(CancellationToken cancellationToken)
{
_kafkaConsumer.Subscribe(_config.Topic.Name);
_logger.LogInformation($"Kafka consumer has subscribed to topic {_config.Topic.Name}");
while (!cancellationToken.IsCancellationRequested)
{
try
{
_logger.LogInformation("Kafka is waiting to consume...");
var consumerResult = _kafkaConsumer.Consume(cancellationToken);
_logger.LogInformation("Kafka Consumer consumed message => {}", consumerResult.Message.Value);
if (_config.MqttCameraTopics.Contains(consumerResult.Message.Key))
{
// we need to consider here security for auth, only want for user
// await _messagerHubContext.Clients.All.ReceiveMotionDetection(consumerResult.Message.Value);
_logger.LogInformation("Kafka Consumer dispatched message to SignalR");
// instruct background thread to commit this offset
_kafkaConsumer.StoreOffset(consumerResult);
}
}
catch (OperationCanceledException)
{
_logger.LogInformation("The Kafka consumer thread has been cancelled");
break;
}
catch (ConsumeException ce)
{
_logger.LogError($"Consume error: {ce.Error.Reason}");
if (ce.Error.IsFatal)
{
// https://github.com/edenhill/librdkafka/blob/master/INTRODUCTION.md#fatal-consumer-errors
_logger.LogError(ce, ce.Message);
break;
}
}
catch (Exception e)
{
_logger.LogError(e, $"Unexpected exception while consuming motion detection {e}");
break;
}
}
}
public override void Dispose()
{
_logger.LogInformation("Kafka Consumer background service disposing");
_kafkaConsumer.Close();
_kafkaConsumer.Dispose();
base.Dispose();
}
}
}
Kestrel Host Configuration
/// <summary>
/// Build the server, with Autofac IOC.
/// </summary>
protected override IHost BuildServer(HostBuilder builder)
{
// build the host instance
return new HostBuilder()
.UseServiceProviderFactory(new AutofacServiceProviderFactory())
.ConfigureLogging(logging =>
{
logging.ClearProviders();
logging.AddConsole();
logging.AddFilter("Microsoft.AspNetCore.SignalR", LogLevel.Information);
})
.ConfigureWebHost(webBuilder =>
{
webBuilder.ConfigureAppConfiguration((context, cb) =>
{
cb.AddJsonFile(ConfigMetaData.SettingsFile, optional: false)
.AddEnvironmentVariables();
})
.ConfigureServices(services =>
{
services.AddHttpClient();
})
.UseStartup<TStartup>()
.UseKestrel()
.UseUrls("http://127.0.0.1:0");
}).Build();
}
docker-compose stack
---
version: "3.8"
services:
zookeeper:
image: confluentinc/cp-zookeeper:6.0.1
hostname: zookeeper
container_name: zookeeper
ports:
- "2181:2181"
networks:
- camnet
environment:
ZOOKEEPER_CLIENT_PORT: 2181
ZOOKEEPER_TICK_TIME: 2000
ZOOKEEPER_LOG4J_ROOT_LOGLEVEL: WARN
kafka:
image: confluentinc/cp-kafka:6.0.1
hostname: kafka
container_name: kafka
depends_on:
- zookeeper
networks:
- camnet
ports:
- "9092:9092"
- "19092:19092"
environment:
CONFLUENT_METRICS_ENABLE: 'false'
KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka:19092,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9092
KAFKA_BROKER_ID: 1
KAFKA_CONFLUENT_BALANCER_TOPIC_REPLICATION_FACTOR: 1
KAFKA_CONFLUENT_LICENSE_TOPIC_REPLICATION_FACTOR: 1
KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 1000
KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
KAFKA_LOG4J_ROOT_LOGLEVEL: WARN
KAFKA_LOG4J_LOGGERS: "org.apache.zookeeper=WARN,org.apache.kafka=WARN,kafka=WARN,kafka.cluster=WARN,kafka.controller=WARN,kafka.coordinator=WARN,kafka.log=WARN,kafka.server=WARN,kafka.zookeeper=WARN,state.change.logger=WARN"
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
mqtt:
container_name: mqtt
image: eclipse-mosquitto:1.6.9
ports:
- "8883:8883"
- "1883:1883"
- "9901:9001"
environment:
- MOSQUITTO_USERNAME=${MQTT_USER}
- MOSQUITTO_PASSWORD=${MQTT_PASSWORD}
networks:
- camnet
volumes:
- ./Mqtt/Config/mosquitto.conf:/mosquitto/config/mosquitto.conf
- ./Mqtt/Certs/localCA.crt:/mosquitto/config/ca.crt
- ./Mqtt/Certs/server.crt:/mosquitto/config/server.crt
- ./Mqtt/Certs/server.key:/mosquitto/config/server.key
minio:
container_name: service-minio
image: dcs3spp/minio:version-1.0.2
ports:
- "127.0.0.1:9000:9000"
environment:
- MINIO_BUCKET=images
- MINIO_ACCESS_KEY=${MINIO_USER}
- MINIO_SECRET_KEY=${MINIO_PASSWORD}
networks:
- camnet
networks:
camnet:
Works with the lensesio/fast-data-dev image. Why?
version: "3.8"
services:
kafka:
image: lensesio/fast-data-dev:2.5.1-L0
container_name: kafka
networks:
- camnet
ports:
- 2181:2181 # zookeeper
- 3030:3030 # ui
- 9092:9092 # broker
- 8081:8081 # schema registry
- 8082:8082 # rest proxy
- 8083:8083 # kafka connect
environment:
- ADV_HOST=127.0.0.1
- SAMPLEDATA=0
- REST_PORT=8082
- FORWARDLOGS=0
- RUNTESTS=0
- DISABLE_JMX=1
- CONNECTORS=${CONNECTOR}
- WEB_PORT=3030
- DISABLE=hive-1.1
mqtt:
container_name: mqtt
image: eclipse-mosquitto:1.6.9
ports:
- "8883:8883"
- "1883:1883"
- "9901:9001"
environment:
- MOSQUITTO_USERNAME=${MQTT_USER}
- MOSQUITTO_PASSWORD=${MQTT_PASSWORD}
networks:
- camnet
volumes:
- ./Mqtt/Config/mosquitto.conf:/mosquitto/config/mosquitto.conf
- ./Mqtt/Certs/localCA.crt:/mosquitto/config/ca.crt
- ./Mqtt/Certs/server.crt:/mosquitto/config/server.crt
- ./Mqtt/Certs/server.key:/mosquitto/config/server.key
minio:
container_name: service-minio
image: dcs3spp/minio:version-1.0.2
ports:
- "127.0.0.1:9000:9000"
environment:
- MINIO_BUCKET=images
- MINIO_ACCESS_KEY=${MINIO_USER}
- MINIO_SECRET_KEY=${MINIO_PASSWORD}
networks:
- camnet
networks:
camnet:
"peerConnection new connection state: connected"
{
"janus": "webrtcup",
"session_id": 3414770196795261,
"sender": 4530256184020316
}
{
"janus": "media",
"session_id": 3414770196795261,
"sender": 4530256184020316,
"type": "audio",
"receiving": true
}
... 1 minute passes
"peerConnection new connection state: disconnected"
{
"janus": "timeout",
"session_id": 3414770196795261
}
"peerConnection new connection state: failed"
See pastebin for the full logs.
I'm trying to join a videoroom on my Janus server. All requests seem to succeed, and my device shows a connected WebRTC status for around one minute before the connection is canceled because of a timeout.
The WebRTC connection breaking off seems to match up with the WebSocket connection to Janus' API breaking.
I tried adding a heartbeat WebSocket message every 10 seconds, but that didn't help. I'm
joining the room
receiving my local SDP plus candidates
configuring the room with said SDP
receiving an answer from janus
accepting that answer with my WebRTC peer connection.
Not sure what goes wrong here.
I also tried setting a STUN server inside the Janus config, to no avail. Same issue.
Added the server logs to the pastebin too.
RTFM: Janus' websocket connections require a keepalive every <60s.
An important aspect to point out is related to keep-alive messages for WebSockets Janus channels. A Janus session is kept alive as long as there's no inactivity for 60 seconds: if no messages have been received in that time frame, the session is torn down by the server. A normal activity on a session is usually enough to prevent that; for a more prolonged inactivity with respect to messaging, on plain HTTP the session is usually kept alive through the regular long poll requests, which act as activity as long as the session is concerned. This aid is obviously not possible when using WebSockets, where a single channel is used both for sending requests and receiving events and responses. For this reason, an ad-hoc message for keeping alive a Janus session should to be triggered on a regular basis. Link.
You need to send a 'keepalive' message with the same 'session_id' to keep the session going. Janus closes the session after 60 seconds.
Look for the implementation: https://janus.conf.meetecho.com/docs/rest.html
Or do it my way: I do it every 30 seconds in a Runnable posted to a Handler.
private Handler mHandler;
private Runnable fireKeepAlive = new Runnable() {
@Override
public void run() {
String transactionId = getRandomStringId();
JSONObject request = new JSONObject();
try {
request.put("janus", "keepalive");
request.put("session_id", yourSessionId);
request.put("transaction", transactionId);
} catch (JSONException e) {
e.printStackTrace();
}
myWebSocketConnection.sendTextMessage(request.toString());
mHandler.postDelayed(fireKeepAlive, 30000);
}
};
Then in onCreate():
mHandler = new Handler();
Then call this where the WebSocket connection opens:
mHandler.post(fireKeepAlive);
Be sure to remove the callback on destroy:
mHandler.removeCallbacks(fireKeepAlive);