java.lang.OutOfMemoryError: Direct buffer memory Apache Ignite - jvm

Recently we moved to a 64-bit JVM with Ignite 2.10.0. Our cache configuration looks like the following (it was working fine with 2.9.0):
_cfg = new IgniteConfiguration()
{
    IgniteInstanceName = MKT_TICK_DATA_CACHE,
    DiscoverySpi = new TcpDiscoverySpi
    {
        LocalPort = 48500,
        LocalPortRange = 60,
        IpFinder = new TcpDiscoveryStaticIpFinder
        {
            Endpoints = new[] { "127.0.0.1:48500..48560" }
        }
    },
    CommunicationSpi = new TcpCommunicationSpi
    {
        LocalPort = 48100
    },
    DataStorageConfiguration = new DataStorageConfiguration
    {
        DefaultDataRegionConfiguration = new DataRegionConfiguration
        {
            Name = MKT_TICK_DATA_CACHE,
            InitialSize = 500L * 1024 * 1024,
            MaxSize = 4L * 1024 * 1024 * 1024,
            PersistenceEnabled = false,
            PageEvictionMode = Apache.Ignite.Core.Configuration.DataPageEvictionMode.Random2Lru
        },
        SystemRegionInitialSize = 100 * 1024 * 1024
    },
    ClientConnectorConfiguration = new ClientConnectorConfiguration
    {
        // Thin client port range: 10800..10860
        Port = 10800,
        PortRange = 60,
        HandshakeTimeout = TimeSpan.FromSeconds(30),
        IdleTimeout = Timeout.InfiniteTimeSpan,
        MaxOpenCursorsPerConnection = 10000,
        SocketReceiveBufferSize = 100 * 1024 * 1024,
        SocketSendBufferSize = 100 * 1024 * 1024,
        TcpNoDelay = true,
        ThinClientEnabled = true,
        ThreadPoolSize = 256
    },
    JvmOptions = new System.Collections.Generic.List<String>() { "-Xms1g", "-Xmx20g", "-XX:+AlwaysPreTouch", "-XX:+UseG1GC", "-XX:+ScavengeBeforeFullGC", "-XX:+DisableExplicitGC", "-Djava.net.preferIPv4Stack=true", "-XX:MaxDirectMemorySize=30G" }
    //JvmOptions = new System.Collections.Generic.List<String>() { "-XX:+AlwaysPreTouch", "-XX:+UseG1GC", "-XX:+ScavengeBeforeFullGC", "-XX:+DisableExplicitGC", "-Djava.net.preferIPv4Stack=true" }
    //JvmOptions = new[] { "-Xmx10g" },//, "-XX:+AlwaysPreTouch", "-XX:+UseG1GC", "-XX:+ScavengeBeforeFullGC", "-XX:+DisableExplicitGC" }
};
_cfg.ServiceThreadPoolSize = 256;
_cfg.SystemThreadPoolSize = 128;
_cfg.StripedThreadPoolSize = 128;
_cfg.WorkDirectory = _dir;
var cacheConfig = new CacheConfiguration
{
    Name = MKT_TICK_DATA_CACHE,
    CacheMode = CacheMode.Partitioned,
    Backups = 0,
    AtomicityMode = CacheAtomicityMode.Atomic,
    LoadPreviousValue = false,
    OnheapCacheEnabled = true,
    WriteBehindEnabled = false,
    WriteSynchronizationMode = CacheWriteSynchronizationMode.FullAsync,
    MaxConcurrentAsyncOperations = 300000
};
_cfg.CacheConfiguration = new CacheConfiguration[] { cacheConfig };
Ignition.ClientMode = false;
var ignite = Ignition.Start(_cfg);
_cache = ignite.GetOrCreateCache<Int32, MarketTickS>(MKT_TICK_DATA_CACHE);
_oCache = ignite.GetOrCreateCache<Int64, byte[]>(ORION_L_CACHE);
_oSecCache = ignite.GetOrCreateCache<Int64, CacheSecurity>(ORION_SEC_CACHE);
Our cache crashes after about 15 minutes of running. The role of this cache is to store market tick data, which comes in at a rate of roughly 60K ticks per second.
[14:03:01,528][SEVERE][grid-nio-worker-client-listener-13-#197%QTICKDATA%][] JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=CRITICAL_ERROR, err=java.lang.OutOfMemoryError: Direct buffer memory]]
Could you help please?
*********** Update ******************************
After a JDK reinstall on the server, the crash did not happen after 60 minutes of continuous operation. However, I am still curious why Ignite 2.9 uses about 3.x GB of total RAM while Ignite 2.10 uses about 32.x GB with the same cache configuration (the same set of C# binaries; the process using 32.x GB runs Ignite 2.10, the one using 3.x GB runs Ignite 2.9). Snapshots of the memory usage with 2.9 and with 2.10 were attached.

Try removing the -XX:+DisableExplicitGC switch; it is known to interfere with direct buffer memory allocation and is therefore no longer recommended.
Not sure why it takes more RAM than before, need more details.
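For example, the JvmOptions line in the configuration above could be changed to the following (a minimal sketch that only drops the flag; the heap and direct-memory sizes are the question's own values, not recommendations):
// JVM options without -XX:+DisableExplicitGC, so the JVM is free to run the
// explicit GC that reclaims direct (off-heap) buffers when the limit is reached.
JvmOptions = new System.Collections.Generic.List<String>()
{
    "-Xms1g",
    "-Xmx20g",
    "-XX:+AlwaysPreTouch",
    "-XX:+UseG1GC",
    "-XX:+ScavengeBeforeFullGC",
    "-Djava.net.preferIPv4Stack=true",
    "-XX:MaxDirectMemorySize=30G"
}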

Related

Routing Query Not Working in Azure IoT Hub Event Grid

I created a device simulator with the following code:
private static async void SendDeviceToCloudMessagesAsync()
{
    while (true)
    {
        var tdsLevel = Rand.Next(10, 1000);
        var filterStatus = tdsLevel % 2 == 0 ? "Good" : "Bad";
        var waterUsage = Rand.Next(0, 500);
        var currentTemperature = Rand.Next(-30, 100);
        var motorStatus = currentTemperature >= 50 ? "Good" : "Bad";
        var telemetryDataPoint = new
        {
            deviceId = DeviceId,
            temperature = currentTemperature,
            filter = filterStatus,
            motor = motorStatus,
            usage = waterUsage,
            tds = tdsLevel
        };
        var messageString = JsonConvert.SerializeObject(telemetryDataPoint);
        var message = new Message(Encoding.UTF8.GetBytes(messageString));
        message.ContentType = "application/json";
        message.Properties.Add("Topic", "WaterUsage");
        await _deviceClient.SendEventAsync(message);
        Console.WriteLine("{0} > Sending message: {1}", DateTime.Now, messageString);
        await Task.Delay(5000);
    }
}
The output in Azure IoT Explorer is the following:
"body": {
"deviceId": "MyFirstDevice",
"temperature": 60,
"filter": "Bad",
"motor": "Good",
"usage": 302,
"tds": 457
},
"enqueuedTime": "Sun Jan 29 2023 13:55:51 GMT+0800 (Philippine Standard Time)",
"properties": {
"Topic": "WaterUsage"
}
}
I want to use Azure IoT Hub Message Routing to route only the messages with temperature >= 50. The routing query $body.body.temperature >= 50 does not work. Any idea what the query should be?
I used the following code, which worked for me. Instead of Encoding.UTF8.GetBytes, I used Encoding.ASCII.GetBytes and explicitly set the ContentEncoding to utf-8, as shown below.
var messageString = JsonConvert.SerializeObject(telemetryDataPoint);
var message = new Message(Encoding.ASCII.GetBytes(messageString));
message.ContentEncoding = "utf-8";
message.ContentType = "application/json";
message.Properties.Add("Topic", "WaterUsage");
Even though the messages you see in Azure IoT Explorer include the property information, if you monitor with Visual Studio Code's Start Monitoring Built-in Endpoint option you will notice that the messages routed to the built-in endpoint have a different format.
I used the routing query $body.temperature >= 50 to route the messages to an endpoint, and I could validate from the blob storage container endpoint that the routed messages all have a temperature greater than or equal to 50.
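For reference, here is how that fix might slot back into the simulator loop from the question (a sketch; only the encoding-related lines differ from the original code):
// Sketch: the question's send code with the answer's encoding fix applied.
var messageString = JsonConvert.SerializeObject(telemetryDataPoint);
var message = new Message(Encoding.ASCII.GetBytes(messageString))
{
    ContentEncoding = "utf-8",          // declare the body encoding explicitly
    ContentType = "application/json"    // JSON content type, needed for $body routing queries
};
message.Properties.Add("Topic", "WaterUsage");
await _deviceClient.SendEventAsync(message);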

Audio Length is 0 and so is the results for DeepSpeech Example

I am following the DeepSpeech example for nodejs_wav and I keep getting the following result:
audio length 0
result:
The audio files are present as well. Here is the additional console output I get when I run the code with node index.js:
TensorFlow: v2.3.0-6-g23ad988fcd
DeepSpeech: v0.9.3-0-gf2e9c858
2022-11-29 11:01:35.452488: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the
following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Code from sample
const DeepSpeech = require("deepspeech");
const Fs = require("fs");
const Sox = require("sox-stream");
const MemoryStream = require("memory-stream");
const Duplex = require("stream").Duplex;
const Wav = require("node-wav");
let modelPath = "./models/deepspeech-0.9.3-models.pbmm";
let model = new DeepSpeech.Model(modelPath);
let desiredSampleRate = model.sampleRate();
let scorerPath = "./models/deepspeech-0.9.3-models.scorer";
model.enableExternalScorer(scorerPath);
let audioFile = process.argv[2] || "./audio/2830-3980-0043.wav";
if (!Fs.existsSync(audioFile)) {
  console.log("file missing:", audioFile);
  process.exit();
}
const buffer = Fs.readFileSync(audioFile);
const result = Wav.decode(buffer);
if (result.sampleRate < desiredSampleRate) {
  console.error(
    "Warning: original sample rate (" +
      result.sampleRate +
      ") is lower than " +
      desiredSampleRate +
      "Hz. Up-sampling might produce erratic speech recognition."
  );
}
function bufferToStream(buffer) {
  let stream = new Duplex();
  stream.push(buffer);
  stream.push(null);
  return stream;
}
let audioStream = new MemoryStream();
bufferToStream(buffer)
  .pipe(
    Sox({
      global: {
        "no-dither": true,
      },
      output: {
        bits: 16,
        rate: desiredSampleRate,
        channels: 1,
        encoding: "signed-integer",
        endian: "little",
        compression: 0.0,
        type: "raw",
      },
    })
  )
  .pipe(audioStream);
audioStream.on("finish", () => {
  let audioBuffer = audioStream.toBuffer();
  const audioLength = (audioBuffer.length / 2) * (1 / desiredSampleRate);
  console.log("audio length", audioLength);
  let result = model.stt(audioBuffer);
  console.log("result:", result);
});
Any idea? I am looking into the TensorFlow binary, but I am not 100% sure this is what is causing the issue.

OfflineAudioContext processing takes increasingly longer in Safari

I am processing an audio buffer with an OfflineAudioContext with the following node layout:
[AudioBufferSourceNode] -> [AnalyserNode] -> [OfflineAudioContext]
This works very well on Chrome (106.0.5249.119), but on Safari 16 (17614.1.25.9.10, 17614) each time I run the analysis it takes longer and longer. Both are running on macOS.
What's curious is that I must quit Safari to "reset" the processing time.
I guess there's a memory leak?
Is there anything that I'm doing wrong in the JavaScript code that would cause Safari to not garbage collect?
async function processFrequencyData(audioBuffer, options) {
  const {
    fps,
    numberOfSamples,
    maxDecibels,
    minDecibels,
    smoothingTimeConstant,
  } = options;
  const frameFrequencies = [];
  const oc = new OfflineAudioContext({
    length: audioBuffer.length,
    sampleRate: audioBuffer.sampleRate,
    numberOfChannels: audioBuffer.numberOfChannels,
  });
  const lengthInMillis = 1000 * (audioBuffer.length / audioBuffer.sampleRate);
  const source = new AudioBufferSourceNode(oc);
  source.buffer = audioBuffer;
  const az = new AnalyserNode(oc, {
    fftSize: numberOfSamples * 2,
    smoothingTimeConstant,
    minDecibels,
    maxDecibels,
  });
  source.connect(az).connect(oc.destination);
  const msPerFrame = 1000 / fps;
  let currentFrame = 0;
  function process() {
    const frequencies = new Uint8Array(az.frequencyBinCount);
    az.getByteFrequencyData(frequencies);
    // const times = new number[](az.frequencyBinCount);
    // az.getByteTimeDomainData(times);
    frameFrequencies[currentFrame] = frequencies;
    const nextTime = (currentFrame + 1) * msPerFrame;
    if (nextTime < lengthInMillis) {
      currentFrame++;
      const nextTimeSeconds = (currentFrame * msPerFrame) / 1000;
      oc.suspend(nextTimeSeconds).then(process);
    }
    oc.resume();
  }
  oc.suspend(0).then(process);
  source.start(0);
  await oc.startRendering();
  return frameFrequencies;
}
const buttonsDiv = document.createElement('div');
document.body.appendChild(buttonsDiv);
const initButton = document.createElement('button');
initButton.onclick = init;
initButton.innerHTML = 'Load audio'
buttonsDiv.appendChild(initButton);
const processButton = document.createElement('button');
processButton.disabled = true;
processButton.innerHTML = 'Process'
buttonsDiv.appendChild(processButton);
const resultElement = document.createElement('pre');
document.body.appendChild(resultElement)
async function init() {
  initButton.disabled = true;
  resultElement.innerText += 'Loading audio... ';
  const audioContext = new AudioContext();
  const arrayBuffer = await fetch('https://gist.githubusercontent.com/marcusstenbeck/da36a5fc2eeeba14ae9f984a580db1da/raw/84c53582d3936ac78625a31029022c8fdb734b2a/base64audio.txt').then(r => r.text()).then(fetch).then(r => r.arrayBuffer());
  resultElement.innerText += 'finished.';
  resultElement.innerText += '\nDecoding audio... ';
  const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
  resultElement.innerText += 'finished.';
  processButton.onclick = async () => {
    processButton.disabled = true;
    resultElement.innerText += '\nStart processing... ';
    const t0 = Date.now();
    await processFrequencyData(audioBuffer, {
      fps: 30,
      numberOfSamples: 2 ** 13,
      maxDecibels: -25,
      minDecibels: -70,
      smoothingTimeConstant: 0.2,
    });
    resultElement.innerText += `finished in ${Date.now() - t0} ms`;
    processButton.disabled = false;
  };
  processButton.disabled = false;
}
I guess this is really a bug in Safari. I'm able to reproduce it by rendering an OfflineAudioContext without any nodes. As soon as I use suspend()/resume() every invocation takes a little longer.
I'm only speculating here but I think it's possible that there is some internal mechanism which tries to prevent the rapid back and forth between the audio thread and the main thread. It almost feels like one of those login forms which takes a bit longer to validate the password every time you try.
Anyway I think you can avoid using suspend()/resume() for your particular use case. It should be possible to create an OfflineAudioContext for each of the slices instead. In order to get the same effect you would only render the particular slice with each OfflineAudioContext.
// buffer, duration, sampleRate and LENGTH_OF_ONE_SLICE (in sample frames)
// are assumed to be defined by the surrounding code.
let currentTime = 0;
while (currentTime < duration) {
  const offlineAudioContext = new OfflineAudioContext({
    length: LENGTH_OF_ONE_SLICE,
    sampleRate
  });
  const audioBufferSourceNode = new AudioBufferSourceNode(
    offlineAudioContext,
    {
      buffer
    }
  );
  const analyserNode = new AnalyserNode(offlineAudioContext);
  audioBufferSourceNode.start(0, currentTime);
  audioBufferSourceNode
    .connect(analyserNode)
    .connect(offlineAudioContext.destination);
  await offlineAudioContext.startRendering();
  const frequencies = new Uint8Array(analyserNode.frequencyBinCount);
  analyserNode.getByteFrequencyData(frequencies);
  // do something with the frequencies ...
  currentTime += LENGTH_OF_ONE_SLICE / sampleRate; // advance by one slice, in seconds
}
I think the only thing missing would be the smoothing, since each of those slices will have its own AnalyserNode.

Azure Container Apps restarts several times even though my batch has terminated

I am implementing a batch in Azure Container Apps.
When a message arrives on a Service Bus queue, my batch runs.
For this, I added a scale rule to scale automatically when a message arrives on the queue.
It works well: when there is a message, the app scales out from 0 to 1 replica. But when my batch terminates, the replica restarts the container several times until it is scaled back in to 0.
Here is my Terraform script to create the container app:
resource "azapi_resource" "container_app" {
name = var.container_app_name
location = "northeurope"
parent_id = data.azurerm_resource_group.resource_group.id
identity {
type = "UserAssigned"
identity_ids = [data.azurerm_user_assigned_identity.aca_identity.id]
}
type = "Microsoft.App/containerApps#2022-03-01"
body = jsonencode({
properties: {
managedEnvironmentId = data.azapi_resource.container_environment.id
configuration = {
secrets = [
{
name = "regitry-password"
value = data.azurerm_container_registry.acr.admin_password
},
{
name = "service-bus-connection-string"
value = data.azurerm_servicebus_namespace.servicebus.default_primary_connection_string
}
]
ingress = null
registries = [
{
server = data.azurerm_container_registry.acr.login_server
username = data.azurerm_container_registry.acr.admin_username,
passwordSecretRef = "regitry-password"
}]
}
template = {
containers = [{
image = "${data.azurerm_container_registry.acr.login_server}/${var.container_repository}:${var.container_image_tag}"
name = "dbt-instance"
resources = {
cpu = var.container_cpu
memory = var.container_memory
}
env = [
{
name = "APP_CONFIG_NAME"
value = var.app_configuration_name
},
{
name = "AZURE_CLIENT_ID"
value = data.azurerm_user_assigned_identity.aca_identity.client_id
}
]
}]
scale = {
minReplicas = 0
maxReplicas = 5
rules = [{
name = "queue-based-autoscaling"
custom = {
type = "azure-servicebus"
metadata = {
queueName = var.service_bus_queue_name
messageCount = "1"
}
auth = [{
secretRef = "service-bus-connection-string"
triggerParameter = "connection"
}]
}
}]
}
}
}
})
How can I run my container only once?
I managed to do this with Azure Container Instances using the property "restartPolicy=Never".

Round Robin Group over two different hosts is not working

I am trying to split the load over more than one Akka actor system.
Unfortunately the round robin group is not forwarding messages to the remote workers. I can see that the actor is activated, but no work is done.
The full code is on GitHub.
Is there any other setting that I could be missing in my configuration?
private void CreateRemoteCrawlerGroup()
{
    var hostname = "374110044f24";
    var hostname2 = "25b360699a27";
    var remoteAddress2 = Address.Parse($"akka.tcp://DeployTarget@{hostname2}:8090");
    var remoteScope2 = new RemoteScope(remoteAddress2);
    var remoteCrawler1 =
        Context.ActorOf(
            Props.Create(() => new WebCrawlerActor(new AppSettingsConfiguration(), Self))
                .WithRouter(new RoundRobinPool(2)) // new DefaultResizer(1, 2, messagesPerResize: 500)
                .WithDispatcher("my-dispatcher")
                .WithDeploy(Deploy.None.WithScope(remoteScope2)), "a");
    var remoteAddress = Address.Parse($"akka.tcp://DeployTarget@{hostname}:8090");
    var remoteScope = new RemoteScope(remoteAddress);
    var remoteCrawler2 =
        Context.ActorOf(
            Props.Create(() => new WebCrawlerActor(new AppSettingsConfiguration(), Self))
                .WithRouter(new RoundRobinPool(2)) // new DefaultResizer(1, 2, messagesPerResize: 500)
                .WithDispatcher("my-dispatcher")
                .WithDeploy(Deploy.None.WithScope(remoteScope)), "remoteCrawler01");
    var workers = new List<string> { remoteCrawler1.Path.ToString(), remoteCrawler2.Path.ToString() };
    var router = Context.ActorOf(Props.Empty.WithRouter(new RoundRobinGroup(workers)), "some-group");
    _actorDictionary.Add("WebCrawlerActor", router);
}
The solution was to switch to Akka.Cluster and use a cluster router pool instead:
var remoteEcho2 =
    Context.ActorOf(
        Props.Create(() => new WebCrawlerActor(new AppSettingsConfiguration(), Self))
            .WithRouter(new ClusterRouterPool(new RoundRobinPool(5), new ClusterRouterPoolSettings(5, 1, true, "crawler"))), "WebCrawlerActor2a");
_actorDictionary.Add("WebCrawlerActor", remoteEcho2);