Cannot see Cache level data movement in Gem5 simulations - gem5

I am using the following CLI:
M5_PATH=/home/febin/Storage/Gem5/gem5ist/m5/system/ Gem5/gem5/build/X86/gem5.opt --debug-flags=Cache,Exec,DRAM,TLB Gem5/gem5/configs/example/fs.py --kernel x86_64-vmlinux-2.6.22.9 --num-cpus=64 --num-dirs=64 --caches --elastic-trace-en --num-l2caches=16 --ruby --network=garnet2.0 --topology=Mesh_XY --mesh-rows=8 --command-line="paper3/Blackscholes/blackscholes.out 1 paper3/Blackscholes/in_16.txt paper3/Blackscholes/output.txt" >> paper3/Gem5_fs
I am able to see Exec, DRAM, and TLB traces, but I cannot see any data from Cache. The same happens for SE simulations. Why is this?

As mentioned by Daniel, you have to use --debug-flags=RubyCache for Ruby.
The flag is different because Ruby models the caches itself, separately from the classic memory system.
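For example, only the debug-flag list in the command above needs to change; a sketch, with the remaining options left exactly as in the original command:
M5_PATH=/home/febin/Storage/Gem5/gem5ist/m5/system/ Gem5/gem5/build/X86/gem5.opt --debug-flags=RubyCache,Exec,DRAM,TLB Gem5/gem5/configs/example/fs.py ...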


How to get information on latest successful pod deployment in OpenShift 3.6

I am currently working on making a CI/CD script to deploy a complex environment into another environment. We have multiple technologies involved, and I currently want to optimize this script because it takes too much time to fetch information on each environment.
In the OpenShift 3.6 section, I need to get the last successful deployment of each application for a specific project. I tried to find a quick way to do so, but so far I have only found this solution:
oc rollout history dc -n <Project_name>
This will give me the following output:
deploymentconfigs "<Application_name>"
REVISION STATUS CAUSE
1 Complete config change
2 Complete config change
3 Failed manual change
4 Running config change
deploymentconfigs "<Application_name2>"
REVISION STATUS CAUSE
18 Complete config change
19 Complete config change
20 Complete manual change
21 Failed config change
....
I then take this output and parse each line to find the latest revision that has the status "Complete".
In the above example, I would get this list:
<Application_name> : 2
<Application_name2> : 20
Then for each application and each revision I do :
oc rollout history dc/<Application_name> -n <Project_name> --revision=<Latest_Revision>
In the above example, the Latest_Revision for Application_name is 2, which is the latest revision that is Complete rather than Running or Failed.
This gives me the output with the information I need, namely the version of the EAR and the version of the configuration that were used to build the image for this successful deployment.
But since I have multiple applications, this process can take up to 2 minutes per environment.
Would anybody have a better way of fetching the information I require?
Unless I am mistaken, it looks like there is no one-liner that can get this information for the currently running and accessible applications.
Thanks
Assuming that the currently active deployment is the latest successful one, you may try the following:
oc get dc -a --no-headers | awk '{print "oc rollout history dc "$1" --revision="$2}' | . /dev/stdin
It gets the list of deployments, feeds it to awk to extract the name ($1) and revision ($2), builds your command to extract the details, and finally sources it from standard input to execute it. It may be frowned upon for not using xargs or the like, but I found it easier for debugging (just drop the last part and see the commands printed out).
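If you do prefer xargs, a roughly equivalent variation (an untested sketch) would be:
oc get dc -a --no-headers | awk '{print $1" --revision="$2}' | xargs -L 1 oc rollout history dc
Here awk emits one "name --revision=N" pair per line, and xargs -L 1 appends each line to the oc rollout history dc command.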
UPDATE:
On second thoughts, you might actually like this one better:
oc get dc -a -o jsonpath='{range .items[*]}{.metadata.name}{"\n\t"}{.spec.template.spec.containers[0].env}{"\n\t"}{.spec.template.spec.containers[0].image}{"\n-------\n"}{end}'
The example output:
daily-checks
[map[name:SQL_QUERIES_DIR value:daily-checks/]]
docker-registry.default.svc:5000/ptrk-testing/daily-checks#sha256:b299434622b5f9e9958ae753b7211f1928318e57848e992bbf33a6e9ee0f6d94
-------
jboss-webserver31-tomcat
registry.access.redhat.com/jboss-webserver-3/webserver31-tomcat7-openshift#sha256:b5fac47d43939b82ce1e7ef864a7c2ee79db7920df5764b631f2783c4b73f044
-------
jtask
172.30.31.183:5000/ptrk-testing/app-txeq:build
-------
lifebicycle
docker-registry.default.svc:5000/ptrk-testing/lifebicycle#sha256:a93cfaf9efd9b806b0d4d3f0c087b369a9963ea05404c2c7445cc01f07344a35
You get the idea: with expressions like .spec.template.spec.containers[0].env you can reach specific variables, labels, etc. Unfortunately, jsonpath output is not available with oc rollout history.
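For instance, a shorter variation along the same lines that lists only names and labels might look like this (a sketch):
oc get dc -a -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels}{"\n"}{end}'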
UPDATE 2:
You could also use post-deployment hooks to collect the data, if you can set up a listener for the hooks. Hopefully the information you need is inherited by the pods. More info here: https://docs.openshift.com/container-platform/3.10/dev_guide/deployments/deployment_strategies.html#lifecycle-hooks
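If you go that route, a post hook can be attached from the CLI, roughly like this (a sketch; the collector URL is hypothetical and only illustrates notifying some external listener):
oc set deployment-hook dc/<Application_name> --post -- /bin/sh -c 'curl -s -X POST http://<your-collector>/deployed'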

How to get the ID of GPU allocated to a SLURM job on a multiple GPUs node?

When I submit a SLURM job with the option --gres=gpu:1 to a node with two GPUs, how can I get the ID of the GPU which is allocated for the job? Is there an environment variable for this purpose? The GPUs I'm using are all nvidia GPUs.
Thanks.
You can get the GPU id with the environment variable CUDA_VISIBLE_DEVICES. This variable is a comma-separated list of the GPU ids assigned to the job.
You can check the environment variables SLURM_STEP_GPUS or SLURM_JOB_GPUS for a given node:
echo ${SLURM_STEP_GPUS:-$SLURM_JOB_GPUS}
Note that CUDA_VISIBLE_DEVICES may not correspond to the physical GPU ids (see @isarandi's comment).
Also, note this should work for non-Nvidia GPUs as well.
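For example, a minimal job script that records which GPU was assigned might look like this (a sketch; which of these variables is set depends on your Slurm version and configuration):
#!/bin/bash
#SBATCH --gres=gpu:1
# GPU ids as Slurm sees them (falls back from the step-level to the job-level variable)
echo "Allocated GPU(s): ${SLURM_STEP_GPUS:-$SLURM_JOB_GPUS}"
# CUDA's view of the same allocation (may be renumbered starting from 0)
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"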
Slurm stores this information in an environment variable, SLURM_JOB_GPUS.
One way to keep track of such information is to log all SLURM-related variables when running a job, for example (following Kaldi's slurm.pl, which is a great script for wrapping Slurm jobs) by including the following command within the script run by sbatch:
set | grep SLURM | while read line; do echo "# $line"; done

Is it possible to query data from Whisper (Graphite DB) from console?

I have configured Graphite to monitor my application metrics, and Zabbix to monitor my servers' CPU and other metrics.
Now I want to pass some critical Graphite metrics to Zabbix to add triggers for them.
So I want to do something like
$ whisper get prefix1.prefix2.metricName
> 155
Is it possible?
P.S. I know about the Graphite-API project; I don't want to install an extra app.
You can use the whisper-fetch program which is provided in the whisper installation package.
Use it like this:
whisper-fetch /path/to/dot.wsp
Or to get e.g. data from the last 5 minutes:
whisper-fetch --from=$(date +%s -d "-5 min") /path/to/dot.wsp
Defaults will result in output like this:
1482318960 21.187000
1482319020 None
1482319080 21.187000
1482319140 None
1482319200 21.187000
You can change the output to JSON using the --json option.
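To feed a single value to Zabbix, one could grab the most recent non-empty datapoint from that output, for example (a sketch):
whisper-fetch --from=$(date +%s -d "-5 min") /path/to/dot.wsp | grep -v None | tail -n 1 | awk '{print $2}'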
OK! I found it myself: http://graphite.readthedocs.io/en/latest/render_api.html?highlight=rawJson (I can use curl and get back CSV or JSON).
The answer was found here: custom querying in graphite
Also see: https://github.com/graphite-project/graphite-web/blob/master/docs/render_api.rst
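For example, the last few minutes of a metric can be pulled with a single curl call against the render API (a sketch; replace the host with your Graphite web host):
curl -s "http://<graphite-host>/render?target=prefix1.prefix2.metricName&format=json&from=-5min"
Using format=csv instead returns CSV.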

Mono human readable GC statistics in runtime

Is there a Mono profiler mode similar to Java -Xloggc?
I would like to see a human-readable GC report while my application is running. Currently Mono can be run with the --profile=log option, but the output is in a binary format, and I need to run mprof-report every time to read it. The output file also contains a lot of information that is not interesting to me.
I tried to reduce the file size by specifying heapshot=14400000ms to collect statistics every few hours, but it didn't help much; in a week I had a log of a few gigabytes.
I also tried to use the "sample" profiler, but the overhead was too high.
You can use Mono's trace filters for this. Just set MONO_LOG_MASK to gc and lower MONO_LOG_LEVEL to debug. Then run your app normally and you will get human-readable GC statistics while it is running:
$ export MONO_LOG_MASK=gc
$ export MONO_LOG_LEVEL=debug
$ mono ... # run your application normally ..
...
# notice the human readable GC output
mono: GC_MAJOR: (LOS overflow) pause 26.00ms, total 26.06ms, bridge 0.00ms major 31472K/0K los 1575K/0K
Mono: GC_MINOR: (Nursery full) pause 2.30ms, total 2.35ms, bridge 0.00ms promoted 31456K major 31456K los 5135K
Mono: GC_MINOR: (Nursery full) pause 2.43ms, total 2.45ms, bridge 0.00ms promoted 31456K major 31456K los 8097K
Mono: GC_MINOR: (Nursery full) pause 1.80ms, total 1.82ms, bridge 0.00ms promoted 31472K major 31472K los 11425K

Hadoop jobs getting poor locality

I have some fairly simple Hadoop streaming jobs that look like this:
yarn jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.2.0.2.0.6.0-101.jar \
-files hdfs:///apps/local/count.pl \
-input /foo/data/bz2 \
-output /user/me/myoutput \
-mapper "cut -f4,8 -d," \
-reducer count.pl \
-combiner count.pl
The count.pl script is just a simple script that accumulates counts in a hash and prints them out at the end - the details are probably not relevant but I can post it if necessary.
The input is a directory containing 5 files encoded with bz2 compression, roughly the same size as each other, for a total of about 5GB (compressed).
When I look at the running job, it has 45 mappers, but they're all running on one node; the particular node changes from run to run, but it is always a single node. Therefore I'm getting poor data locality, as data is transferred over the network to that node, and probably poor CPU utilization too.
The entire cluster has 9 nodes, all the same basic configuration. The blocks of the data for all 5 files are spread out among the 9 nodes, as reported by the HDFS Name Node web UI.
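For reference, block placement can also be double-checked from the command line (a sketch; this prints each block of the input files along with the datanodes holding its replicas):
hdfs fsck /foo/data/bz2 -files -blocks -locations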
I'm happy to share any requested info from my configuration, but this is a corporate cluster and I don't want to upload any full config files.
It looks like this previous thread [ why map task always running on a single node ] is relevant but not conclusive.
EDIT: at @jtravaglini's suggestion I tried the following variation and saw the same problem - all 45 map tasks running on a single node:
yarn jar \
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.2.0.2.0.6.0-101.jar \
wordcount /foo/data/bz2 /user/me/myoutput
At the end of the output of that task in my shell, I see:
Launched map tasks=45
Launched reduce tasks=1
Data-local map tasks=18
Rack-local map tasks=27
which is about the number of data-local tasks you'd expect to see on a single node just by chance alone.