Environment: JDK 8
VM flags: -Xms6144m -Xmx6144m -XX:+UseG1GC (all other flags at their defaults)
According to the GC log, no mixed GC occurs. Why does a young GC decrease OU (old space utilization)?
Loading 1500 images of size (1000, 1000, 3) breaks the code: the process is killed (kill 9) without any further error. Memory used before this line of code is 16% of total system memory. The total size of the images directory is 7.1 GB.
X = np.asarray(images).astype('float64')
y = np.asarray(labels).astype('float64')
System specs:
OS: macOS Catalina
processor: 2.2 GHz 6-Core Intel Core i7
memory: 16 GB 2400 MHz DDR4
Update:
I get the error below when running the code on a machine with 32 vCPUs and 120 GB of memory:
MemoryError: Unable to allocate 14.1 GiB for an array with shape (1200, 1024, 1024, 3) and data type float32
You would have to provide more details for an exact answer, but assuming that this is a memory error (which is very likely), the size of the images on disk does not represent the size they occupy in memory, so that figure is irrelevant. In practically all cases the images will occupy a lot more space in memory, because files on disk are usually compressed and the in-memory representation adds overhead for pointers and the objects needed to hold it. Intuitively, I would say that 16 GB of RAM is nowhere near enough to load 7 GB of images. It is hard to say exactly how much you would need, but from experience I would suggest bumping it up to 64 GB. If you are using Keras, I would suggest looking into DirectoryIterator, which reads images from a directory in batches instead of all at once.
Edit:
As Cris Luengo pointed out, I missed the fact that you stated the size of the images.
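With the dimensions stated in the question, the array sizes can be estimated before anything is loaded. Below is a minimal sketch in plain Python. The helper name array_gib is made up for illustration, and the table only lists the element sizes of the dtypes that come up here.

```python
from math import prod

# Bytes per element for the dtypes discussed in the question.
ITEMSIZE = {"uint8": 1, "float32": 4, "float64": 8}

def array_gib(shape, dtype):
    """Return the size in GiB of a dense array with this shape and dtype."""
    return prod(shape) * ITEMSIZE[dtype] / 2**30

# 1500 images of (1000, 1000, 3) after .astype('float64')
print(f"{array_gib((1500, 1000, 1000, 3), 'float64'):.1f} GiB")
# The array named in the MemoryError from the update
print(f"{array_gib((1200, 1024, 1024, 3), 'float32'):.1f} GiB")
# The same 1500 images kept as 8-bit integers
print(f"{array_gib((1500, 1000, 1000, 3), 'uint8'):.1f} GiB")
```

Note that astype returns a new array, so at its peak the conversion holds both the original array and the converted copy in memory.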
I have a p3.2xlarge instance. I ran a couple of experiments on it, and now that I want to run a new deep learning experiment, I get the following error:
RuntimeError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 15.78 GiB total capacity; 14.70 GiB already allocated; 34.44 MiB free; 14.76 GiB reserved in total by PyTorch)
Is there any way to free the allocated memory so that I can run my experiment? Is freeing the memory even the right solution for this error?
My problem is with TensorFlow and its CPU usage.
My system:
CPU => AMD FX 8320 (8 cores at 3.5 GHz each, 8 threads)
GPU => GTX 970
RAM => 16 GB, I believe DDR3 2600
I want to run an A3C algorithm for StarCraft 2 (pysc2) on my PC. That works fine, but the CPU usage is somewhat strange.
If I start the algorithm with 4 workers, I get about 150k steps in 1 hour and all CPUs are used at about 25-30%.
If I start the same algorithm with 8 workers, I get about 120k steps in 1 hour and all CPUs are used at about 25-30%.
If I start the algorithm with 4 workers twice, each instance gets 150k steps in 1 hour and the CPU usage is 60-70%.
Why can't I start the algorithm with 8 workers, get double the number of steps in 1 hour, and have the CPU used at 70%?
I am trying to build a horizontally scalable system based on Redis Cluster, so I measured the throughput of Redis Cluster with different numbers of nodes. However, the measured results do not show the linear scalability that the cluster spec promises: "High performance and linear scalability up to 1000 nodes."
[Image: Redis Cluster benchmark]
The image above shows the measured results for Redis Cluster with (3+3), (4+4), (5+5), (6+6), (8+8), (10+10) and (12+12) nodes, where (3+3) means 3 master nodes plus 3 slave nodes. The results for C(reate) and U(pdate) do not show linear scalability, as the following picture shows.
I'd like to know why these measured results do not show linear scalability. Is there anything that could be limiting the scaling?
My test environment and related information are described below.
Server
HW: HP BL460c G9, 24 CPU (E5-2620 v3 @ 2.40GHz), 64G memory, 300G disk
I have two machines. To find the capacity of a single machine, I run all master nodes on one machine and all slave nodes on the other. All Redis nodes are included in one Redis Cluster.
OS: SLES 12
I have updated some system settings to achieve higher performance.
echo 65535 > /proc/sys/net/core/somaxconn
echo 65535 > /proc/sys/net/ipv4/tcp_max_syn_backlog
echo never > /sys/kernel/mm/transparent_hugepage/enabled
sysctl vm.overcommit_memory=1
sysctl vm.swappiness=0
Furthermore, I've turned off swap, which could cause very unstable throughput during an AOF rewrite even with swappiness already set to 0. As observed, the 15 million records in my test occupy around 48G of memory.
Redis 3.0.6: To eliminate the bursts caused by RDB, I turned off all RDB snapshots and enabled only AOF. All other settings in redis.conf are left at their default values.
Client
HW: HP DL380 G7, 16 CPU (E5620 @ 2.40GHz), 24G memory, 600G disk
OS: SLES 12
YCSB (0.6.0) with jedis (2.8.0)
I use a hash key to store each record (1 key and 21 fields) and N sorted sets to store all keys and their random scores. Here N is the number of master nodes in the cluster. The N sorted sets are distributed evenly across the master nodes.
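For context on how such keys end up on particular masters: Redis Cluster maps every key to one of 16384 hash slots by taking CRC16 of the key modulo 16384, and if the key contains a non-empty {...} hash tag, only the tag is hashed. Each sorted set is a single key, so every operation on it goes to the one master that owns its slot. The sketch below illustrates this in Python. key_slot is a hypothetical helper, the key names are invented examples rather than the ones used in the test, and binascii.crc_hqx is used on the assumption that it computes the same CRC16 variant (XMODEM) that Redis uses.

```python
import binascii

def key_slot(key: str) -> int:
    """Hash slot (0..16383) that Redis Cluster assigns to a key."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % 16384

# Invented key names: one hash per record, one sorted set per master.
for key in ["subscriber:user1", "subscriber:user2", "index:0", "index:1", "index:2"]:
    print(key, key_slot(key))
```

Which master serves a slot depends on the cluster's slot assignment, so placing one sorted set on each master means choosing key names whose slots fall into each master's range.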
The YCSB workload configuration is pasted below:
workload=com.yahoo.ycsb.workloads.CoreWorkload
recordcount=15000000
operationcount=150000000
insertstart=0
fieldcount=21
fieldlength=188
readallfields=true
writeallfields=false
fieldlengthdistribution=zipfian
readproportion=0.0
updateproportion=1.0
insertproportion=0
readmodifywriteproportion=0.0
scanproportion=0
maxscanlength=1000
scanlengthdistribution=uniform
insertorder=hashed
requestdistribution=zipfian
hotspotdatafraction=0.2
hotspotopnfraction=0.8
table=subscriber
measurementtype=histogram
histogram.buckets=1000
timeseries.granularity=1000
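With requestdistribution=zipfian, a small share of the records receives a large share of the updates. The rough sketch below gives a feel for that skew: it weights rank i by 1/i^s and reports the share of requests that go to the hottest ranks. The exponent 0.99 is an assumption based on YCSB's default Zipfian constant, and the population is scaled down from recordcount to keep it fast. This is not YCSB's own generator, which also scrambles ranks across the key space, so the real skew may differ.

```python
def top_share(n_items: int, top_fraction: float, s: float = 0.99) -> float:
    """Share of requests hitting the hottest `top_fraction` of `n_items`
    when rank i is requested with probability proportional to 1 / i**s."""
    weights = [1.0 / i**s for i in range(1, n_items + 1)]
    top_n = max(1, int(n_items * top_fraction))
    return sum(weights[:top_n]) / sum(weights)

n = 150_000  # scaled-down stand-in for recordcount
for frac in (0.001, 0.01, 0.2):
    print(f"hottest {frac:.1%} of keys: {top_share(n, frac):.0%} of requests")
```

If the hot keys happen to sit on only a few masters, those masters can saturate while the others stay mostly idle, so it may be worth checking per-node throughput as well as the cluster total.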
In most cases the machine's resources look sufficient to me, even though the throughput has already hit its limit.
CPU: plenty of CPU is left, 60~70% idle.
I/O usage: not very busy, 30~40% utilization at peak time.
Memory: only memory comes close to being exhausted at peak time, i.e. when an AOF rewrite happens. Most of the time usage is around 80%.
In the neo4j-wrapper.conf file I see this:
# Java Heap Size: by default the Java heap size is dynamically
# calculated based on available system resources.
# Uncomment these lines to set specific initial and maximum
# heap size in MB.
#wrapper.java.initmemory=512
#wrapper.java.maxmemory=512
Does that mean that I should not worry about -Xms and -Xmx?
I've read elsewhere that -XX:ParallelGCThreads=4 -XX:+UseNUMA -XX:+UseConcMarkSweepGC would be good.
Should I add those flags on my machine (Intel® Core™ i7-4770 quad-core Haswell, 32 GB DDR3 RAM, 2 x 240 GB 6 Gb/s SSD in software RAID 1)?
I would still configure it manually.
Set both to 12 GB and use 16 GB of the remaining memory for memory mapping in neo4j.properties. Try to match it to your store file sizes.
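As a small sketch of that advice, the snippet below converts a heap size in GB into the two wrapper.java.* lines from neo4j-wrapper.conf (which take values in MB) and shows how much of the 32 GB is left over. The function name and the printed summary are illustrative only.

```python
def heap_lines(heap_gb: int) -> list:
    """Build the neo4j-wrapper.conf heap lines; the values are in MB."""
    heap_mb = heap_gb * 1024
    return [
        f"wrapper.java.initmemory={heap_mb}",
        f"wrapper.java.maxmemory={heap_mb}",
    ]

total_gb, heap_gb, mmap_gb = 32, 12, 16
print("\n".join(heap_lines(heap_gb)))
print(f"left for the OS and everything else: {total_gb - heap_gb - mmap_gb} GB")
```

Setting the initial and maximum heap to the same value means the JVM does not have to resize the heap at runtime.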