Timestamp in telemetry_view.csv file (generated by JProfiler)

I am profiling an application with the sampling method at a 5 ms interval (JProfiler 7):
JProfiler> Offline profiling mode.
JProfiler> Protocol version 39
...
JProfiler> Using sampling (5 ms)
JProfiler> Time measurement: elapsed time
JProfiler> CPU profiling enabled
JProfiler> Saving snapshot memory_view.0.jps ...
JProfiler> Done.
Then I exported the result into a csv file:
"Time [s]","Committed size","Free size","Used size"
0.0,38912000,26542000,12370000
1.0,38912000,21710000,17202000
2.0,38912000,10829000,28083000
3.0,55168000,28363000,26805000
I am surprised here: the CSV file contains one measurement per second, although the sampling was done every 5 ms.
Is there a way to increase the number of rows in the CSV file?
Thanks for your help.

You have exported a telemetry view. Telemetry views are always sampled with a 1s interval. The sampling interval for CPU profiling is only relevant for call stack measurements and the CPU views.

Related

h2o task taking unexpectedly long leading to it getting stuck

I successfully initialise a cluster and train a DRF model. Then, on the same cluster, I try to do a grid search for an XGBoost model:
H2OGridSearch(
H2OXGBoostEstimator(my_model_params),
hyper_params=my_grid_params,
search_criteria=my_search_criteria
)
Sometimes (not always) the grid search never finishes. Upon inspection in the H2O flow I found the job stuck at 0% progress with a 'RUNNING' status.
What I saw in the logs is the following
WARN: XGBoost task of type 'Boosting Iteration (tid=0)' is taking unexpectedly long, it didn't finish in 360 seconds.
WARN: XGBoost task of type 'Boosting Iteration (tid=0)' is taking unexpectedly long, it didn't finish in 420 seconds.
...
WARN: XGBoost task of type 'Boosting Iteration (tid=0)' is taking unexpectedly long, it didn't finish in 60240 seconds.
and after that I get
ERRR: water.api.HDFSIOException: HDFS IO Failure:
but the job's status is still 'RUNNING'.
I'm using h2o 3.30.0.6 via Python 3.7.
The problem is that the error is not reproducible and sometimes it just works fine.
Any hints on how to track down the root cause?
Is there a parameter I can set for killing the whole job when a boosting iteration takes too long?
If XGBoost becomes unresponsive, you may need to allocate additional memory for it, since it uses memory independently of the H2O algorithms.
Why does my H2O cluster on Hadoop become unresponsive when running XGBoost, even when I supplied 4 times the data size in memory?
This is why the extramempercent option exists, and we recommend setting this to a high value, such as 120. What happens internally is that when you specify -node_memory 10G and -extramempercent 120, the h2o driver will ask Hadoop for 10G * (1 + 1.2) = 22G of memory. At the same time, the h2o driver will limit the memory used by the container JVM (the h2o node) to 10G, leaving the 10G * 1.2 = 12G of memory "unused." This memory can then be safely used by XGBoost outside of the JVM. Keep in mind that H2O algorithms will only have access to the JVM memory (10G), while XGBoost will use the native memory for model training. For example:
hadoop jar h2odriver.jar -nodes 1 -mapperXmx 20g -extramempercent 120
Source

Ignite writing big data in a pressure test: IO write and read times too high?

I do some writes and lookups in Ignite with SQL. When I run a pressure test with 100 concurrent users, the Ignite server's CPU usage is low, but iowait is too high, and the IO write and read times are too high. Are there ways to reduce iowait?
I am using version 2.7.6 with two SSD machines as a cluster.
iowait time picture: https://i.stack.imgur.com/P0bka.png
iowait CPU picture: https://i.stack.imgur.com/hyfv1.png
In persistent scenarios, you are bound by WAL writes. You can try changing the WAL mode to LOG_ONLY to mitigate these peaks slightly.
I also recommend spacing checkpoints further apart (by increasing the "checkpoint frequency" setting, in ms) and perhaps increasing the "checkpoint page buffer" size.
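As an illustration only, here is a minimal sketch of where those settings live in an Ignite 2.x Spring XML configuration; the specific values are placeholders to experiment with, not recommendations:

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="dataStorageConfiguration">
        <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
            <!-- LOG_ONLY syncs the WAL in the background instead of on every commit -->
            <property name="walMode" value="LOG_ONLY"/>
            <!-- Checkpoint every 5 minutes (value in ms; default is 3 minutes) -->
            <property name="checkpointFrequency" value="300000"/>
            <property name="defaultDataRegionConfiguration">
                <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                    <property name="persistenceEnabled" value="true"/>
                    <!-- Checkpoint page buffer size in bytes, e.g. 1 GB -->
                    <property name="checkpointPageBufferSize" value="#{1024L * 1024 * 1024}"/>
                </bean>
            </property>
        </bean>
    </property>
</bean>
```

Larger gaps between checkpoints trade longer recovery time for fewer write bursts, so tune against your workload.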

FileChannel.transferTo (supposedly zero-copy) not giving any performance gain

I am working on a REST API that has an endpoint to download a file that could be > 2 GB in size. I have read that Java's FileChannel.transferTo(...) will use zero-copy if the OS supports it. My server is running on localhost during development on my MacBook Pro OS 10.11.6.
I compared the following two methods of writing file to response stream:
Copying a fixed number of bytes from FileChannel to WritableByteChannel using transferTo
Reading a fixed number of bytes from FileInputStream into a byte array (size 4096) and writing to OutputStream in a loop.
The time taken for a 5.2 GB file is between 20 and 23 seconds with both methods. I tried transferTo with the fixed number of bytes per transfer set to the following values: 4 KB (i.e. 4 * 1024), 1 MB, and 50 MB. The time taken to write is in the same range in all three cases.
Time taken is measured from before entering the while-loop to after exiting the while-loop, in which bytes are read from the file. This is all on the server side. The network hop time does not figure into this.
Any ideas on what the reason could be? I am quite sure MacOS 10.11.6 should support zero-copy (i.e. sendfile system call).
EDIT (6/18/2018):
I found the following blog post from 2015, saying that sendfile on MacOS X is broken. Could it be that this problem still exists?
https://blog.phusion.nl/2015/06/04/the-brokenness-of-the-sendfile-system-call/
The (high) transfer rate that you are quoting is likely close to or at the limit of what a SATA device can do anyway. If my guess is right, you will not see a performance gain reflected in the time it takes to run your test - however there will likely be a change in the CPU load during the test. Given that you have a relatively powerful machine, your CPU and memory are fast enough. Any method (zero-copy or not) will work at the same speed - which is the speed of your disk. However, zero-copy will cause a lot less CPU load and will not grab unnecessary bandwidth from your memory, either. Therefore, you should test different methods and see which one ends up using the least amount of CPU and choose that method for your application.
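For reference, here is a minimal sketch of the two copy strategies being compared, assuming plain file-to-file channels (the question's target is a response stream wrapped in a WritableByteChannel, which may behave differently; class and method names are illustrative):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class CopyComparison {

    // Copy using FileChannel.transferTo, which may use a zero-copy
    // mechanism such as sendfile where the OS supports it.
    static void copyWithTransferTo(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long pos = 0;
            long size = in.size();
            while (pos < size) {
                // transferTo may transfer fewer bytes than requested, so loop
                pos += in.transferTo(pos, size - pos, out);
            }
        }
    }

    // Copy via an intermediate 4 KB byte array in user space,
    // i.e. the conventional read/write loop from the question.
    static void copyWithBuffer(Path src, Path dst) throws IOException {
        try (InputStream in = Files.newInputStream(src);
             OutputStream out = Files.newOutputStream(dst)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }
}
```

If the answer's guess is right, timing these two methods against a disk-backed destination should show similar elapsed times; the difference would instead appear in CPU usage during the copy (e.g. as measured with `time` or a profiler).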

OSX - how to get memory usage of NSTask?

I need to start an external program and wait until it ends. After it has run, I need its peak memory usage and CPU time or CPU ticks.
How can I do this in C / Objective-C on OS X 10.11?
I looked at NSTask, but I have no idea how to get peak memory usage and CPU time.
You can get the PID by calling -processIdentifier and then use Mach's task_info() function to get the information you seek. See: Memory used by a process under mac os x
I believe the task info remains available until the process has been waited on, so be sure to query it after the process finishes but before reaping it.

Is there a relation between available RAM and Ring size in OpenStack SWIFT?

I was reading about OpenStack SWIFT and its different components. But I have a doubt: if more RAM is available, can we afford to have rings of a bigger size? And how does the ring size affect the system?
Ring size has nothing to do with the RAM size.
Following is the command to build the object Ring:
swift-ring-builder <builder_file> create <part_power> <replicas> <min_part_hours>
I am quoting the explanation text for the above command from the documentation on ring preparation:
This will start the ring build process, creating the <builder_file> with 2^<part_power> partitions. <min_part_hours> is the time in hours before a specific partition can be moved in succession (24 is a good value for this).
It means that if you choose <part_power> to be 10, then 2^10 = 1024 partitions are going to be created.
You can read in detail from the SWIFT Administrator’s Guide.