Record CPU usage per minute for a particular process

I want to record CPU usage, CPU time, and VM size in Notepad once per minute for a particular process (not for all processes). Is there any way to do this? I work as a performance/stress tester and it is my duty to take CPU performance readings at particular times; the script takes a long time, so it is sometimes inconvenient for me to take all the readings.
Please suggest.
Thank you.

Use Performance Monitor, if Windows is the system you are working on. It has all kinds of logging options and will do what you need.
Performance Monitor gives you the option of recording performance data for a particular process. Look under 'Process'...
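If you would rather drive this from a script than from the Performance Monitor GUI, a minimal sketch along the following lines could log the same counters once per minute for a single process to a plain text file you can open in Notepad. It assumes the third-party psutil package is installed; the PID and file name are placeholders.

    import time
    import psutil  # third-party package: pip install psutil

    PID = 1234                # placeholder: PID of the process under test
    LOGFILE = "cpu_log.txt"   # plain text file, viewable in Notepad

    proc = psutil.Process(PID)
    with open(LOGFILE, "a") as log:
        while proc.is_running():
            cpu_percent = proc.cpu_percent(interval=1.0)  # % CPU over a 1 s sample
            cpu_times = proc.cpu_times()                  # cumulative user/system CPU time
            vm_size = proc.memory_info().vms              # virtual memory size in bytes
            log.write("%s cpu%%=%.1f cpu_time=%.1fs vms=%d\n" % (
                time.strftime("%Y-%m-%d %H:%M:%S"),
                cpu_percent,
                cpu_times.user + cpu_times.system,
                vm_size))
            log.flush()
            time.sleep(59)    # roughly one sample per minute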

Related

Technique to measure GPU utilization over a given period of time

We run an HPC cluster with GPUs. We would like to report the overall GPU utilization for a job. I know I can do it by periodically sampling in the background and doing the math. I was wondering whether there is a tool where I could basically start the sampling period at the beginning of the job, stop it at the end of the job, and have it report the overall average GPU utilization. For instance, AFAICT nvidia-smi will only do 1-second intervals; I am looking (hoping) for an option on it, or a similar tool, with start/stop functionality. Note that an arbitrary time period won't work unless I can end it early and get the results up to that point, as you never know how long the job will run. I would appreciate any pointers/ideas anyone could provide.
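No answer is recorded here, but the background-sampling approach the question describes could look roughly like the sketch below. It assumes nvidia-smi is on the PATH and that its --query-gpu=utilization.gpu CSV output is available; the job command and the 1-second sampling interval are placeholders.

    import subprocess
    import time

    job = subprocess.Popen(["./my_gpu_job.sh"])   # placeholder for the actual HPC job
    samples = []

    while job.poll() is None:                     # stop sampling as soon as the job ends
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True).stdout
        samples.extend(float(x) for x in out.split())   # one value per GPU
        time.sleep(1)                             # sampling interval

    if samples:
        print("average GPU utilization: %.1f%%" % (sum(samples) / len(samples)))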

Recommended way of measuring execution time in Tensorflow Federated

I would like to know whether there is a recommended way of measuring execution time in TensorFlow Federated. To be more specific: if one would like to extract the execution time for each client in a certain round, e.g., for each client involved in a FedAvg round, saving the time stamp just before the local training starts and the time stamp just before sending back the updates, what is the best (or just correct) strategy to do this? Furthermore, since the clients' code runs in parallel, are such time stamps untruthful (especially considering the hypothesis that different clients may be using differently sized models for local training)?
To be very practical: is using tf.timestamp() at the beginning and at the end of @tf.function client_update(model, dataset, server_message, client_optimizer) -- this is probably a simplified signature -- and then subtracting those time stamps appropriate?
I have the feeling that this is not the right way to do it, given that clients run in parallel on the same machine.
Thanks to anyone who can help me with this.
There are multiple potential places to measure execution time, so a first step might be to define very specifically what the intended measurement is.
Measuring the training time of each client as proposed is a great way to get a sense of the variability among clients. This could help identify whether rounds frequently have stragglers. Using tf.timestamp() at the beginning and end of the client_update function seems reasonable. The question correctly notes that this happens in parallel; summing all of these times would be akin to CPU time.
Measuring the time it takes to complete all client training in a round would generally be the maximum of the values above. This might not be true when simulating FL in TFF, as TFF may decide to run some number of clients sequentially due to system resource constraints, whereas in practice all of these clients would run in parallel.
Measuring the time it takes to complete a full round (the maximum time it takes to run a client, plus the time it takes for the server to update) could be done by moving the tf.timestamp calls to the outer training loop. This would mean wrapping the call to trainer.next() in the snippet on https://www.tensorflow.org/federated. This would be most similar to elapsed real time (wall-clock time).
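As a rough sketch of that last option, assuming trainer, state, and federated_train_data have been constructed as in the snippet linked above (they are not built here, and the return-value unpacking follows the older iterative-process API, which may differ across TFF versions), the wall-clock time per round could be measured like this:

    import time

    for round_num in range(10):                       # arbitrary number of rounds
        start = time.monotonic()
        state, metrics = trainer.next(state, federated_train_data)
        elapsed = time.monotonic() - start            # elapsed real (wall-clock) time for the round
        print("round %d took %.2f s" % (round_num, elapsed))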

SUMO sim time and real time difference

I used traci.simulation.getTime to get the current sim time of SUMO.
However, this time runs faster than real time.
For example, when the sim time grows from 0 to 100, real time only grows from 0 to 20.
How can I make the SUMO simulation time run at the same pace as real time?
I tried --step-length = 1, but this didn't work.
The --step-length property is a value in seconds describing the length of one simulation step. If you put a higher number here, vehicles have less time to react, but your simulation probably runs faster.
For the real-time issue you might have a look at the sumo-user mailing list. I think this mail gives a pretty good answer to your issue:
the current limit to the real time factor is the speed of your
computer. If you want to slow the GUI down you can change the delay
value (which is measured in milliseconds) so a value of 100 would add
100ms to every simulation step (if your simulation is small and runs
with the default step length of 1s this means factor 10). If you want
to speed it up, run without GUI or buy a faster computer ;-).
To check how close your simulation is to wall-clock time, you can check the generated output from SUMO. The thing you're looking for is called the real time factor.
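If you are driving the simulation through TraCI anyway, you can also pace the loop yourself so that each simulated second takes at least one real second. This is only a sketch of that idea, not an official SUMO option; the configuration file name is a placeholder.

    import time
    import traci

    traci.start(["sumo", "-c", "my.sumocfg"])           # placeholder config file
    wall_start = time.monotonic()

    while traci.simulation.getMinExpectedNumber() > 0:  # run until no vehicles are expected
        traci.simulationStep()
        sim_time = traci.simulation.getTime()           # simulated seconds
        wall_time = time.monotonic() - wall_start       # real seconds elapsed
        if sim_time > wall_time:
            time.sleep(sim_time - wall_time)            # hold back until real time catches up

    traci.close()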

Same Length Sql on Local and Production

I have the same SQL (of the same length) on my local and production servers. When I test the SQL locally it takes about 2 seconds to run, and when I run the same thing on production it takes about 7 seconds.
Why so much difference?
The primary factor responsible for variation in SQL response time (especially when running the same query a few times in a row) is caching. Actually, there may be several caching effects at play at the same time:
Code caching (the next time you issue the same query you won't have to do the hard parse -- this saves time and resources)
Data caching, first of all a) database-level caching (use of the buffer cache), but also b) OS-level caching, or even c) hardware-level caching
You can determine what exactly is going on by enabling autotrace and analyzing its output. If the first time you see a lot of recursive calls, and none (or far fewer) subsequently, that tells you about code caching (cursor sharing is preventing you from parsing every time). If the first time you see a lot of physical reads, but many fewer subsequently, then it's the database buffer cache at play. If the number of physical reads stays the same but the elapsed time changes, then it could be low-level data caching (OS or hardware).
There are, of course, other factors that may affect elapsed time -- such as database workload -- but if you are observing this over a short period of time, then it's probably not them.
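One quick way to see these caching effects for yourself is to run the identical statement several times in a row and compare the elapsed times. The sketch below uses a generic Python DB-API connection; the driver, connect string, and query text are placeholders for whatever your environment uses.

    import time

    def time_query(conn, query, runs=3):
        """Execute the same statement several times and print each elapsed time."""
        for attempt in range(runs):
            cur = conn.cursor()
            start = time.monotonic()
            cur.execute(query)
            cur.fetchall()          # force the full result set to be read
            cur.close()
            print("run %d: %.3f s" % (attempt + 1, time.monotonic() - start))

    # usage (placeholder connection and statement):
    # import cx_Oracle
    # conn = cx_Oracle.connect("user/password@prod")
    # time_query(conn, "SELECT * FROM my_table")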
There are two more ways to get this fixed: either have the procedure cache its results in a table, which will make it faster, or add the appropriate indexes.

Algorithmically suggest best node to perform demanding computation

At work we perform demanding numerical computations.
We have a network of several Linux boxes with different processing capabilities. At any given time, there can be anywhere from zero to dozens of people connected to a given box.
I created a script to measure the MFLOPS (Million of Floating Point Operations per Second) using the Linpack Benchmark; it also provides number of cores and memory.
I would like to use this information, together with the load average (obtained using the uptime command), to suggest the best computer for performing a demanding computation. In other words, it's 3:00 pm; I have a meeting in two hours; I need to run a demanding process: which node will get me the answer fastest?
I envision a script which will output a suggestion along the lines of:
SUGGESTED HOSTS (IN ORDER OF PREFERENCE)
HOST1.MYNETWORK
HOST2.MYNETWORK
HOST3.MYNETWORK
Such suggestion should favor fast computers (high MFLOPS) if the load average is low and, as load average increases for a given node, it should favor available nodes instead (i.e., I'd rather run in a slower computer with no users than in an eight-core with forty dudes logged in).
How should I prioritize? What algorithm (rationale) would you use? Again, what I have is:
Load Average (1min, 5min, 15min)
MFLOPS measure
Number of users logged in
RAM (installed and available)
Number of cores (important to normalize the load average)
Any thoughts? Thanks!
You don't have enough data to make a well-informed decision. It sounds as though the scheduling is very volatile: "At any given time, there can be anywhere from zero to dozens of people connected to a given box." So the current load does not necessarily reflect the future load of the machines.
To properly assess which hosts someone should use to minimize computation time would require knowing when the current jobs will terminate. If a powerful machine is about to be done with most of its jobs, it would be a good candidate even though it currently has a high load.
If you want to guess purely based on the current situation, you can do a weighted calculation to find out which hosts have the most MFLOPS available:
MFLOPS available = host's MFLOPS × (number of logical processors − load average) / number of logical processors
Sort the hosts by MFLOPS available and suggest them in a descending order.
This formula assumes that the MFLOPS of a host is linearly related to its load average. This might not be exactly true, but it's probably fairly close.
I would favor the most recent load average, since it's closer to the current/future situation, whereas jobs from 15 minutes ago might have completed by now.
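A minimal sketch of that ranking, assuming the per-host figures have already been gathered by the benchmark script (the host names and numbers below are made up), could look like this:

    # benchmark results gathered earlier: peak MFLOPS, core count, 1-minute load average
    hosts = {
        "HOST1.MYNETWORK": {"mflops": 9500.0, "cores": 8, "load1": 1.2},  # made-up figures
        "HOST2.MYNETWORK": {"mflops": 4200.0, "cores": 4, "load1": 0.3},
        "HOST3.MYNETWORK": {"mflops": 7800.0, "cores": 8, "load1": 7.5},
    }

    def available_mflops(h):
        # fraction of idle capacity, clamped at zero when the box is overloaded
        idle_fraction = max(h["cores"] - h["load1"], 0.0) / h["cores"]
        return h["mflops"] * idle_fraction

    print("SUGGESTED HOSTS (IN ORDER OF PREFERENCE)")
    for name in sorted(hosts, key=lambda n: available_mflops(hosts[n]), reverse=True):
        print(name)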
Have you considered a distributed approach to the computation? Not all computations can be broken up such that more than one CPU can work on them, but perhaps your problem space can benefit from some parallelization. Have a look at Hadoop.
You don't need to know FLOPS. The Beowulf parallel computing center I go to (PDC) definitely has a script for this:
PDC operates leading-edge, high-performance computers on a national level. PDC offers easily accessible computational resources that primarily cater to the ...