Issues related to checkPointPageBufferSize and WalAutoArchiveAfterInactivity in Ignite

Issues related to checkPointPageBufferSize and WalAutoArchiveAfterInactivity in Ignite - ignite

I have set checkPointPageBufferSize to 0. It is said that if 0 is mentioned then it calculates automatically. When I saw the logs it prints 0 so what is value of checkPointPageBufferSize which is calculated automatically. I am using Ignite 2.9.0.
Can anyone help me to solve this?

If you explicitly use 0 as the checkPointPageBufferSize or just leave as is then it will be defaulted to a value which is function of the data region size.
less than 1GB - min (256MB, Data_Region_Size)
between 1GB and 8GB - Data_Region_Size / 4
more than 8GB - 2GB
More details could be found in the documentation.

Related

Spark - Failed to load collect frame - "RetryingBlockFetcher - Exception while beginning fetch"

We have a Scala Spark application, that reads something like 70K records from the DB to a data frame, each record has 2 fields.
After reading the data from the DB, we make minor mapping and load this as a broadcast for later usage.
Now, in local environment, there is an exception, timeout from the RetryingBlockFetcher while running the following code:
dataframe.select("id", "mapping_id")
.rdd.map(row => row.getString(0) -> row.getLong(1))
.collectAsMap().toMap
The exception is:
2022-06-06 10:08:13.077 task-result-getter-2 ERROR
org.apache.spark.network.shuffle.RetryingBlockFetcher Exception while
beginning fetch of 1 outstanding blocks
java.io.IOException: Failed to connect to /1.1.1.1:62788
at
org.apache.spark.network.client.
TransportClientFactory.createClient(Transpor .tClientFactory.java:253)
at
org.apache.spark.network.client.
TransportClientFactory.createClient(TransportClientFactory.java:195)
at
org.apache.spark.network.netty.
NettyBlockTransferService$$anon$2.
createAndStart(NettyBlockTransferService.scala:122)
In the local environment, I simply create the spark session with local "spark.master"
When I limit the max of records to 20K, it works well.
Can you please help? maybe I need to configure something in my local environment in order that the original code will work properly?
Update:
I tried to change a lot of Spark-related configurations in my local environment, both memory, a number of executors, timeout-related settings, and more, but nothing helped! I just got the timeout after more time...
I realized that the data frame that I'm reading from the DB has 1 partition of 62K records, while trying to repartition with 2 or more partitions the process worked correctly and I managed to map and collect as needed.
Any idea why this solves the issue? Is there a configuration in the spark that can solve this instead of repartition?
Thanks!

JVM Runtime.availableProcessors() returns 2 when it should be 4

I'm running openjdk11 on alpine linux in a container in an AWS EKS cluster.
The application determines the size of a threadpool based on the number of CPUs as returned by Runtime.getRuntime().availableProcessors()
This call is returning 2 processors even though the container shows that 4 CPUs are available:
# cat /proc/cpuinfo | grep processor
processor : 0
processor : 1
processor : 2
processor : 3
Any idea why and how to solve the problem?
Update
Doing some more digging (prompted by some great questions from #gohm'c in the comments), I found a way to add some trace log prints to the JVM with -Xlog:os+container=trace
[0.001s][trace][os,container] CPU Shares is: 1536
[0.001s][trace][os,container] CPU Share count based on shares: 2
Now, I defined in resources.requests.cpu: "1500m".
I don't know why the slight discrepancy but I changed the value of the CPU request, and indeed the CPU Shares in the log trace changes accordingly.
I understand how the resources.limits.cpu value could affect the CPUs that the JVM sees. But why is the resources.requests.cpu value doing that! This seems like a bug to me? Any thoughts?

How to change the default number of pods per node?

I'm playing around with AWS EKS and I've set up a cluster with a t3.small node. Somehow the number of allocatable pods is set to 8:
Allocatable:
cpu: 2
ephemeral-storage: 19316009748
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 1902056Ki
pods: 8
It seems to me that the resource request on the node is on the low side
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
310m (15%) 0 (0%) 140Mi (7%) 340Mi (18%)
and so I'd like to ask:
Is 8 a default value?
How can I change that?
Thanks.

The calculation of the max number of pods in a given node in AWS is defined by this formula:
Max Pods = (Maximum Network Interfaces for instance type) * (IPv4 Addresses per Interface) - 1
Since you used t3.small, the value comes down to 3 * 4 - 1 = 11. The number 11 also depends upon the compute resources you're associating with each pod.
The list of network interfaces are listed here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html#AvailableIpPerENI

Aerospike cluster not clean available blocks

we use aerospike in our projects and caught strange problem.
We have a 3 node cluster and after some node restarting it stop working.
So, we make test to explain our problem
We make test cluster. 3 node, replication count = 2
Here is our namespace config
namespace test{
replication-factor 2
memory-size 100M
high-water-memory-pct 90
high-water-disk-pct 90
stop-writes-pct 95
single-bin true
default-ttl 0
storage-engine device {
cold-start-empty true
file /tmp/test.dat
write-block-size 1M
}
We write 100Mb test data after that we have that situation
available pct equal about 66% and Disk Usage about 34%
All good :slight_smile:
But we stopped one node. After migration we see that available pct = 49% and disk usage 50%
Return node to cluster and after migration we see that disk usage became previous about 32%, but available pct on old nodes stay 49%
Stop node one more time
available pct = 31%
Repeat one more time we get that situation
available pct = 0%
Our cluster crashed, Clients get AerospikeException: Error Code 8: Server memory error
So how we can clean available pct?

If your defrag-q is empty (and you can see whether it is from grepping the logs) then the issue is likely to be that your namespace is smaller than your post-write-queue. Blocks on the post-write-queue are not eligible for defragmentation and so you would see avail-pct trending down with no defragmentation to reclaim the space. By default the post-write-queue is 256 blocks and so in your case that would equate to 256Mb. If your namespace is smaller than that you will see avail-pct continue to drop until you hit stop-writes. You can reduce the size of the post-write-queue dynamically (i.e. no restart needed) using the following command, here I suggest 8 blocks:
asinfo -v 'set-config:context=namespace;id=<NAMESPACE>;post-write-queue=8'
If you are happy with this value you should amend your aerospike.conf to include it so that it persists after a node restart.

Aerospike: Failed to store record. Error: (13L, 'AEROSPIKE_ERR_RECORD_TOO_BIG', 'src/main/client/put.c', 106)

I get the following error while storing the data to aerospike ( client.put ). I have enough space on the drive.
Aerospike: Failed to store record. Error: (13L, 'AEROSPIKE_ERR_RECORD_TOO_BIG', 'src/main/client/put.c', 106).
Here is my Aerospike server namespace configuration
namespace test {
replication-factor 1
memory-size 1G
default-ttl 30d # 30 days, use 0 to never expire/evict.
storage-engine device {
file /opt/aerospike/data/test.dat
filesize 2G
data-in-memory true # Store data in memory in addition to file.
}
}

By default namespaces have a write-block-size of 1 MiB. This is also the maximum configurable size and will limit the max object size the application is able to write to Aerospike.
If you need to go beyond 1 MiB see Large Data Types as a possible solution.
UPDATE 2019/09/06
Since Aerospike 3.16, the write-block-size limit has been increased from 1 MiB to 8 MiB.

Yes, but unfortunately, Aerospike has deprecated LDT (https://www.aerospike.com/blog/aerospike-ldt/). They now recommend to use Lists or Maps, but as stated in their post:
"the new implementation does not solve the problem of the 1MB Aerospike database row size limit. A future key feature of the product will be an enhanced implementation that transcends the 1MB limit for a number of types"
In other terms, it is still an unsolved problem when storing your data on SSD or HDD. However, you can store larger data on memory namespaces.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas