Physical CPU in AIX

Can someone let me know why the number of physical CPUs is greater than the number of virtual CPUs in AIX?
Online Virtual CPUs : 8
Active Physical CPUs in system : 48
Desired Virtual CPUs : 8
Partition Number : 30
Type : Shared-SMT-4
Mode : Uncapped
Entitled Capacity : 0.80
Partition Group-ID : 32798
Shared Pool ID : 0
**Online Virtual CPUs : 8**
Maximum Virtual CPUs : 160
Minimum Virtual CPUs : 1
Online Memory : 84992 MB
Maximum Memory : 127488 MB
Minimum Memory : 256 MB
Variable Capacity Weight : 128
Minimum Capacity : 0.10
Maximum Capacity : 16.00
Capacity Increment : 0.01
Maximum Physical CPUs in system : 48
**Active Physical CPUs in system : 48**
Active CPUs in Pool : 48
Shared Physical CPUs in system : 48
Maximum Capacity of Pool : 4800
Entitled Capacity of Pool : 1190
Unallocated Capacity : 0.00
Physical CPU Percentage : 10.00%
Unallocated Weight : 0
Memory Mode : Dedicated
Total I/O Memory Entitlement : -
Variable Memory Capacity Weight : -
Memory Pool ID : -
Physical Memory in the Pool : -
Hypervisor Page Size : -
Unallocated Variable Memory Capacity Weight: -
Unallocated I/O Memory entitlement : -
Memory Group ID of LPAR : -
**Desired Virtual CPUs : 8**
Desired Memory : 84992 MB
Desired Variable Capacity Weight : 128
Desired Capacity : 0.80
Target Memory Expansion Factor : -
Target Memory Expansion Size : -
Power Saving Mode : Disabled
Sub Processor Mode : -

Your "Entitled Capacity" is 0.8. And the each fraction of a single processor equals 0.1 of one physical processor. So you get 8 virtual processors. Here you can get more information about this:
What is the capacity entitlement?
Physical processors are presented to a logical partition's operating
system as virtual processors. Physical processors are virtualized
into portions or fractions. Each fraction of a single processor equals
0.1 of one processor; there is an additional fraction of 0.01. The number of cores assigned to a partition is represented by the Capacity
Entitlement. To display the assigned capacity entitlement for a shared
partition, use the command
# lparstat | awk -F "ent=" '/ent\=/ {print $NF}'
The output is the number of processors this partition is
entitled to use. This is the upper threshold the partition can have
from the processor pool (Capped mode). The partition can use more than
the assigned capacity entitlement (Uncapped mode). Capped and uncapped
modes are illustrated in detail later in this document. The number
of virtual processors and processing units that are assigned to a
partition can be changed through the HMC.
Capacity Entitlement considerations:
Capacity entitlement should be correctly configured for normal
production operation and to cover workload during peak time. Having
enough capacity entitlement is important so as not to impact operating
system performance and processor affinity. Running over entitled
capacity can cause bad affinity and noticeable performance degradation
affecting business operation.
Virtual Processors:
A virtual processor is a representation of a physical processor core
to the operating system of a partition that uses shared processors. It
is the number of physical processors that the logical partition can
spread out across and represents the upper threshold for the number of
physical processors that can be used. We recommend not increasing the
ratio of virtual processors to entitled capacity beyond
1.6. Each partition has its own assigned virtual processors. The partition will work only on the virtual processors needed for its
workload; unneeded virtual processors assigned to a partition will
fold away using the processor folding feature. To display the currently
assigned virtual processors, use the command
# lparstat -i | grep -i "Desired Virtual CPUs"
Using an HMC, you can change the number of
virtual processors and processing units that are assigned to the
partition.
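As a quick illustration of that 1.6 guideline, here is a minimal Python sketch (not an AIX tool) using the Online Virtual CPUs and Entitled Capacity values from the lparstat output in the question:

# Check the virtual-processor-to-entitlement ratio against the quoted 1.6 guideline.
online_virtual_cpus = 8      # "Online Virtual CPUs" from the question
entitled_capacity = 0.80     # "Entitled Capacity" from the question

ratio = online_virtual_cpus / entitled_capacity
print(f"VP : EC ratio = {ratio:.1f}")   # 10.0
print("within the 1.6 guideline" if ratio <= 1.6 else "above the 1.6 guideline")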

The given Physical CPU count is the number of physical CPUs installed in the Power machine where this LPAR is hosted. The Virtual CPU count is the number of virtual CPUs allocated to this particular LPAR.
Also, Desired Virtual CPUs and Online Virtual CPUs are the same thing.
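For a worked view of how those fields relate, here is a small Python sketch of the arithmetic (not AIX tooling), using only numbers from the lparstat output above:

# Relate the lparstat -i fields shown in the question.
entitled_capacity = 0.80     # Entitled Capacity
online_virtual_cpus = 8      # Online Virtual CPUs (= Desired Virtual CPUs)
active_physical_cpus = 48    # Active Physical CPUs in system (the whole frame)

# The entitlement spread over the LPAR's virtual CPUs matches the
# "Physical CPU Percentage : 10.00%" line in the output.
physical_cpu_percentage = entitled_capacity / online_virtual_cpus * 100
print(f"Physical CPU Percentage: {physical_cpu_percentage:.2f}%")   # 10.00%

# The 48 physical CPUs describe the hosting Power frame, not this LPAR,
# which is why that count can exceed the LPAR's 8 virtual CPUs.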

Related

How to get the current consumed CPU % of a vmhost in vCenter using PowerShell

How can I get the current consumed CPU % of a vmhost in vCenter using a PowerShell script?
The command below doesn't give output similar to what we checked manually.
Get-Stat -Entity $command1 -Stat cpu.usagemhz.average -Realtime -MaxSamples 1
Get-Stat -Entity $myHost -Stat cpu.usage.average -Realtime -MaxSamples 1 -Instance ""
From VMware's doc on this cpu usage perf counter:
Actively used CPU, as a percentage of the total available CPU, for
each physical CPU on the host. Active CPU is approximately equal to
the ratio of the used CPU to the available CPU.
Available CPU = # of physical CPUs × clock rate.
100% represents all CPUs on the host. For example, if a four-CPU host
is running a virtual machine with two CPUs, and the usage is 50%, the
host is using two CPUs completely
Explanations from Luc Dekens around the -Instance filter...
If the ESX/ESXi server is equipped with a quadcore CPU, there will be
four instances: 0, 1, 2 and 3. In this case the instance corresponds
with the numeric position within the CPU core
And there will be a so-called aggregate, which is the metric averaged
over all the instances.
These instances each get their own identifier which will be part of
the returned statistical data. The aggregate instance is always
represented by a blank identifier.
...and -MaxSamples
Although I asked for 1 sample (-MaxSamples 1) the cmdlet returned 9
values. The -MaxSamples parameter apparently only looks at the
Timestamp. It doesn’t count the number of returned values
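To put numbers on the quoted formula, here is a small Python sketch (not PowerCLI); the CPU count, clock rate, and sample value are made-up examples chosen to match the four-CPU / 50% illustration in the doc:

# Convert a cpu.usagemhz.average sample into the cpu.usage.average percentage,
# following "Available CPU = # of physical CPUs x clock rate".
num_physical_cpus = 4      # hypothetical host
clock_rate_mhz = 2400      # hypothetical clock rate per CPU
usage_mhz = 4800           # hypothetical cpu.usagemhz.average sample

available_mhz = num_physical_cpus * clock_rate_mhz
usage_percent = usage_mhz / available_mhz * 100
print(f"{usage_percent:.1f}%")   # 50.0% -> two of the four CPUs used completely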

OpenCL Maximum Size of Private Memory per Work Item

I have an AMD RX 570 4G.
OpenCL tells me that I can use a maximum of 256 work groups and 256 work items per group...
Let's say I use all 256 work groups with 256 work items in each of them.
Now, what is the maximum size of private memory per work item?
Is private memory equal to the total VRAM (4 GB) divided by the total work items (256 x 256)?
Or is it equal to the cache? If so, how?
VRAM is represented in OpenCL as global memory.
Private memory is initially allocated from the register file. Your RX 570 is from AMD's Polaris architecture, a.k.a. GCN 4 where each compute unit (64 shader processors) has access to 256 vector (SIMD) registers (64x32 bits wide) and 512 32-bit scalar registers. So that works out to about 66KiB per CU, but it's not as simple as just quoting that total.
A workgroup will always be scheduled on a single compute unit, so if you assign it 256 work items, then it will have to perform every vector instruction 4 times in sequence (64 x 4 = 256) and the vector registers will (simplifying slightly) effectively have to be treated as 64 256-entry registers.
Scalar registers are used for data and calculations which are identical on each work item, e.g. incrementing a loop counter, holding buffer base pointers, etc.
Private memory will usually spill to global if you use more than will fit in your register file. So performance simply drops.
So essentially, on GCN, your optimal workgroup size is usually 64. Use as little private memory as possible; definitely aim for less than half of the available register file so that more than one workgroup can be scheduled so latency from memory access can be papered over, otherwise your shader cores will be spending a lot of time just waiting for data to arrive or be written out.
Cache is used for OpenCL local and constant memory spaces. (Constant will again spill to global if you try to use too much. The size of local memory can be checked via the OpenCL API and again is divided among workgroups scheduled on the same compute unit, so if you use more than half, only one group can run on a CU, etc.)
I don't know where you're getting a limit of 256 work groups from; the limit is essentially set by whether the GPU uses 32-bit or 64-bit addressing. Most applications won't get close to 4 billion work items even in the 32-bit case.
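The ~66 KiB per CU mentioned above is straightforward to reproduce; here is a short Python sketch of just that arithmetic, using the register counts stated in this answer:

# Register file per GCN compute unit, as described above.
vector_registers = 256            # vector (SIMD) registers per CU
lanes = 64                        # each vector register is 64 x 32 bits wide
scalar_registers = 512            # 32-bit scalar registers per CU

vector_bytes = vector_registers * lanes * 4   # 65536 B = 64 KiB
scalar_bytes = scalar_registers * 4           # 2048 B  =  2 KiB
print(f"{(vector_bytes + scalar_bytes) / 1024:.0f} KiB per CU")   # ~66 KiB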
Private memory space is registers on the GPU die (0 cycle access latency) and not related to the amount of VRAM (global memory space) at all. The amount of private memory depends on the device (private memory per compute unit).
I don't know private memory size for the RX 570, but for older HD7000 series GPUs it is 256kB per CU. If you have a work group size of 256, you get 1kB per work item, which is equal to 256 float variables.
Cache size determines the size of local and constant memory space.
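The per-work-item figure quoted for the HD7000-series example is just the per-CU private memory divided by the work-group size; a Python sketch of that arithmetic:

# Private memory per work item for the HD7000-series example above.
private_memory_per_cu = 256 * 1024   # 256 kB per compute unit
work_group_size = 256

per_work_item = private_memory_per_cu // work_group_size
print(f"{per_work_item} bytes per work item")              # 1024 B = 1 kB
print(f"= {per_work_item // 4} float (32-bit) variables")  # 256 floats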

GTX 970 bandwidth calculation

I am trying to calculate the theoretical bandwidth of the GTX 970. As per the specs given at:
http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-970/specifications
Memory clock is 7 Gbps
Memory bus width = 256 bits
Bandwidth = 7 * 256 * 2 / 8 (* 2 because it is DDR)
= 448 GB/s
However, in the specs it is given as 224 GB/s.
Why is there a factor of 2 difference? Am I making a mistake? If so, please correct me.
Thanks
The 7 Gbps seems to be the effective clock, i.e. including the data rate. Also note that the field explanation for this Wikipedia list says that "All DDR/GDDR memories operate at half this frequency, except for GDDR5, which operates at one quarter of this frequency", which suggests that all GDDR5 chips are in fact quad data rate, despite the DDR abbreviation.
Finally, let me point out this note from Wikipedia, which disqualifies the trivial effective clock * bus width formula:
For accessing its memory, the GTX 970 stripes data across 7 of its 8 32-bit physical memory lanes, at 196 GB/s. The last 1/8 of its memory (0.5 GiB on a 4 GiB card) is accessed on a non-interleaved solitary 32-bit connection at 28 GB/s, one seventh the speed of the rest of the memory space. Because this smaller memory pool uses the same connection as the 7th lane to the larger main pool, it contends with accesses to the larger block reducing the effective memory bandwidth not adding to it as an independent connection could.
The clock rate reported is an "effective" clock rate that already takes the transfer on both rising and falling edges into account, so applying another factor of 2 for DDR double-counts it.
Some discussion on devtalk here: https://devtalk.nvidia.com/default/topic/995384/theoretical-bandwidth-vs-effective-bandwidth/
In fact, your formula is correct, but the memory clock is wrong. The GeForce GTX 970's memory clock is 1753 MHz (see https://www.techpowerup.com/gpu-specs/geforce-gtx-970.c2620).
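Putting the answers together, here is a short Python sketch of the corrected arithmetic (the quad-data-rate factor for GDDR5 comes from the first answer, the 1753 MHz real clock from the last one):

# GTX 970 theoretical memory bandwidth, computed two equivalent ways.
bus_width_bits = 256

# 1) The 7 Gbps figure is already the effective per-pin data rate,
#    so no extra x2 for DDR is applied.
effective_rate_gbps = 7
print(f"{effective_rate_gbps * bus_width_bits / 8:.0f} GB/s")        # 224 GB/s

# 2) Starting from the real memory clock with the x4 factor for GDDR5.
real_clock_mhz = 1753
print(f"{real_clock_mhz * 4 * bus_width_bits / 8 / 1000:.1f} GB/s")  # ~224.4 GB/s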

Why no linear scaling of Redis Cluster

I am trying to build a horizontally scalable system based on Redis Cluster, so I've measured the throughput of Redis Cluster with different numbers of nodes. However, the measured results don't show the linear scalability the cluster spec promises: "High performance and linear scalability up to 1000 nodes."
Redis cluster benchmark:
The benchmark covers clusters of (3+3), (4+4), (5+5), (6+6), (8+8), (10+10) and (12+12) nodes, where (3+3) means 3 master nodes plus 3 slave nodes. The measured results for the C(reate) and U(pdate) operations do not show linear scaling with cluster size.
I'd like to know why these measured results don't show linear scalability. Is there anything that could limit the scaling?
My test environment and related information are described as below
Server
HW: HP BL460c G9, 24 CPUs (E5-2620 v3 @ 2.40GHz), 64G memory, 300G disk
I have two machines. In order to measure the capacity of one machine, I run all master nodes on one machine and all slave nodes on the other. All Redis nodes are included in one Redis Cluster.
OS: SLES 12
I have updated some system settings to achieve higher performance:
echo 65535 > /proc/sys/net/core/somaxconn
echo 65535 > /proc/sys/net/ipv4/tcp_max_syn_backlog
echo never > /sys/kernel/mm/transparent_hugepage/enabled
sysctl vm.overcommit_memory=1
sysctl vm.swappiness=0
Furthermore, I've turned off swap, which could otherwise cause very unstable throughput when an AOF rewrite happens, even with swappiness already set to 0. As observed, the 15 million records in my test occupy around 48G of memory.
Redis 3.0.6: To eliminate bursts caused by RDB, I turned off RDB entirely and only enabled AOF. Other configurations in redis.conf were left at their default values.
Client
HW: HP DL380 G7, 16 CPUs (E5620 @ 2.40GHz), 24G memory, 600G disk
OS: SLES 12
YCSB (0.6.0) with jedis (2.8.0)
I use a hash key to store each record (1 key and 21 fields) and N sorted sets to store all keys and their random scores. Here N is the number of master nodes in the cluster; the N sorted sets are distributed evenly across the master nodes.
The YCSB workload configuration is pasted below:
workload=com.yahoo.ycsb.workloads.CoreWorkload
recordcount=15000000
operationcount=150000000
insertstart=0
fieldcount=21
fieldlength=188
readallfields=true
writeallfields=false
fieldlengthdistribution=zipfian
readproportion=0.0
updateproportion=1.0
insertproportion=0
readmodifywriteproportion=0.0
scanproportion=0
maxscanlength=1000
scanlengthdistribution=uniform
insertorder=hashed
requestdistribution=zipfian
hotspotdatafraction=0.2
hotspotopnfraction=0.8
table=subscriber
measurementtype=histogram
histogram.buckets=1000
timeseries.granularity=1000
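As a rough consistency check on the ~48G observation above, here is a Python sketch relating it to the workload parameters (the comparison is approximate; fieldlengthdistribution=zipfian means the average field is shorter than the 188-byte maximum):

# Rough per-record memory estimate for this YCSB workload.
record_count = 15_000_000
field_count = 21
field_length = 188                     # maximum field length

max_payload = field_count * field_length              # ~3.9 kB of raw field data per record
observed_total = 48 * 1024**3                         # "around 48G memory" observed
observed_per_record = observed_total / record_count   # ~3.4 kB per record

print(f"max raw payload   : {max_payload} B per record")
print(f"observed footprint: {observed_per_record:.0f} B per record (fields + Redis overhead)")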
In most cases the machine resources look sufficient to me, even though the throughput has already hit its limit.
CPU: there is plenty of CPU left, 60~70% idle.
I/O usage: it's not very busy, 30~40% utilization at peak time.
Memory: only memory comes close to being exhausted at peak time, i.e. when an AOF rewrite happens. Most of the time it sits at around 80%.

Does Neo4j calculate JVM heap on Ubuntu?

In the neo4j-wrapper.conf file I see this:
# Java Heap Size: by default the Java heap size is dynamically
# calculated based on available system resources.
# Uncomment these lines to set specific initial and maximum
# heap size in MB.
#wrapper.java.initmemory=512
#wrapper.java.maxmemory=512
Does that mean that I should not worry about -Xms and -Xmx?
I've read elsewhere that -XX:ParallelGCThreads=4 -XX:+UseNUMA -XX:+UseConcMarkSweepGC would be good.
Should I add that on my machine (Intel® Core™ i7-4770 quad-core Haswell, 32 GB DDR3 RAM, 2 x 240 GB 6 Gb/s SSD in software RAID 1)?
I would still configure it manually.
Set both to 12 GB and use the remaining 16 GB for memory mapping in neo4j.properties. Try to match it to your store file sizes.
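For example, pinning the heap to 12 GB with the keys shown above would look like this in neo4j-wrapper.conf (a sketch of the suggestion, assuming 12288 MB as the intended 12 GB; the memory-mapping settings in neo4j.properties still need to be sized to your store files):

# neo4j-wrapper.conf: fixed 12 GB heap instead of the dynamic default
wrapper.java.initmemory=12288
wrapper.java.maxmemory=12288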