I'm having an issue with GC pauses (~400 ms) that I'm trying to reduce. I noticed that one worker is always a lot slower than the others:
2013-06-03T17:24:51.606+0200: 605364.503: [GC pause (mixed)
Desired survivor size 109051904 bytes, new threshold 1 (max 1)
- age 1: 47105856 bytes, 47105856 total
, 0.47251300 secs]
[Parallel Time: 458.8 ms]
[GC Worker Start (ms): 605364503.9 605364503.9 605364503.9 605364503.9 605364503.9 605364504.0
Avg: 605364503.9, Min: 605364503.9, Max: 605364504.0, Diff: 0.1]
--> [Ext Root Scanning (ms): 356.4 3.1 3.7 3.6 3.2 3.0
Avg: 62.2, Min: 3.0, Max: 356.4, Diff: 353.4] <---
[Update RS (ms): 0.0 22.4 33.6 21.8 22.3 22.3
Avg: 20.4, Min: 0.0,
As you can see, one worker took 356 ms while the others took only 3 ms!
If someone has an idea, or thinks this is normal, I'd like to hear it.
[I'd rather post this as a comment, but I still lack the necessary points to do so]
No idea as to whether it is normal, but I've come across the same problem:
2014-01-16T13:52:56.433+0100: 59577.871: [GC pause (young), 2.55099911 secs]
[Parallel Time: 2486.5 ms]
[GC Worker Start (ms): 59577871.3 59577871.4 59577871.4 59577871.4 59577871.4 59577871.5 59577871.5 59577871.5
Avg: 59577871.4, Min: 59577871.3, Max: 59577871.5, Diff: 0.2]
[Ext Root Scanning (ms): 152.0 164.5 159.0 183.7 1807.0 117.4 113.8 138.2
Avg: 354.5, Min: 113.8, Max: 1807.0, Diff: 1693.2]
I've been unable to find much about the subject, but this thread offers an explanation: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2013-February/001484.html
Basically, as you surmise, one of the GC worker threads is being held up
when processing a single root. I've seen a similar issue that's caused
by filling up the code cache (where JIT-compiled methods are held).
The code cache is treated as a single root and so is claimed in its
entirety by a single GC worker thread. As the code cache fills up,
the thread that claims the code cache to scan starts getting held up.
A full GC clears the issue because that's where G1 currently does
class unloading: the full GC unloads a whole bunch of classes, allowing
the compiled code of any of the unloaded classes' methods to be
freed by the nmethod sweeper. So after a full GC the number of
compiled methods in the code cache is smaller.
It could also be just the sheer number of loaded classes, as the
system dictionary is also treated as a single claimable root.
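For reference, here is a sketch of the HotSpot flags involved (flag names as documented for HotSpot; defaults and behavior vary by JVM version, so verify against yours, and the 256m value is just an example):
-XX:+PrintCodeCache               # print code cache usage when the JVM exits
-XX:ReservedCodeCacheSize=256m    # raise the code cache ceiling
-XX:+UseCodeCacheFlushing         # let the sweeper evict cold compiled methods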
I think I'll try enabling code cache flushing and will let you know. If you eventually manage to solve this problem, please let me know; I'm trying to get to the bottom of it as well.
Kind regards
Trying out WSL2 for the first time. Running Ubuntu 18.04 on a Dell Latitude 9510 with an SSD. I noticed build speeds of a React project were brutally slow. Per all the articles on the web, I'm running the project out of ~ and not the Windows mount. I ran a benchmark using sysbench --test=fileio --file-test-mode=seqwr run in ~ and got:
File operations:
reads/s: 0.00
writes/s: 3009.34
fsyncs/s: 3841.15
Throughput:
read, MiB/s: 0.00
written, MiB/s: 47.02
General statistics:
total time: 10.0002s
total number of events: 68520
Latency (ms):
min: 0.01
avg: 0.14
max: 22.55
95th percentile: 0.31
sum: 9927.40
Threads fairness:
events (avg/stddev): 68520.0000/0.00
execution time (avg/stddev): 9.9274/0.00
If I'm reading this correctly, that wrote 47 MB/s. I ran the same test on my Mac mini and got 942 MB/s. Is this normal? It seems like Linux I/O speeds on WSL2 are unusably slow. Any thoughts on ways to speed this up?
---edit---
Not sure if this is a fair comparison, but here is the output of winsat disk -drive c on the same machine from the Windows side. Smoking fast:
> Dshow Video Encode Time 0.00000 s
> Dshow Video Decode Time 0.00000 s
> Media Foundation Decode Time 0.00000 s
> Disk Random 16.0 Read 719.55 MB/s 8.5
> Disk Sequential 64.0 Read 1940.39 MB/s 9.0
> Disk Sequential 64.0 Write 1239.84 MB/s 8.6
> Average Read Time with Sequential Writes 0.077 ms 8.8
> Latency: 95th Percentile 0.219 ms 8.9
> Latency: Maximum 2.561 ms 8.7
> Average Read Time with Random Writes 0.080 ms 8.9
> Total Run Time 00:00:07.55
---edit 2---
Windows version: Windows 10 Pro, Version 20H2 Build 19042
Late answer, but I had the same issue and wanted to post my solution for anyone who has this problem:
Windows Defender seems to destroy read speeds in WSL. I added the entire rootfs folder as an exclusion. If you're comfortable turning off Windows Defender entirely, I recommend that as well. Any antivirus probably has similar issues, so adding the WSL directories as an exclusion is probably your best bet.
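In case it helps, a minimal sketch of adding the exclusion via PowerShell (run as administrator). Add-MpPreference is the Defender cmdlet; <DistroPackageName> is a placeholder, not a real name, so replace it with the folder you find under %LOCALAPPDATA%\Packages for your distribution:
# <DistroPackageName> is hypothetical; look up the actual folder name
# under %LOCALAPPDATA%\Packages for your installed distro
Add-MpPreference -ExclusionPath "$env:LOCALAPPDATA\Packages\<DistroPackageName>\LocalState"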
I tested whether my ROS2 node was subscribing to exactly the /camera/depth/image_rect_raw topic from the realsense ROS2 node. I attached a RealSense camera to a TX2 board and configured it for 15 fps.
However, I expected the rate of subscribing to /camera/depth/image_rect_raw to be close to 15 Hz, but it differs, as shown below. Why is there a rate difference between publishing and subscribing to an image topic? Is it possible to match the subscribing rate to the publishing rate?
$ ros2 topic hz /camera/depth/image_rect_raw
average rate: 10.798
min: 0.040s max: 0.144s std dev: 0.03146s window: 13
average rate: 8.610
min: 0.040s max: 0.357s std dev: 0.06849s window: 22
average rate: 8.085
min: 0.040s max: 0.357s std dev: 0.07445s window: 30
average rate: 9.498
min: 0.015s max: 0.357s std dev: 0.06742s window: 45
average rate: 9.552
min: 0.015s max: 0.415s std dev: 0.07555s window: 55
average rate: 9.265
min: 0.015s max: 0.415s std dev: 0.07543s window: 63
average rate: 8.510
min: 0.015s max: 0.415s std dev: 0.08619s window: 68
average rate: 7.940
min: 0.015s max: 0.480s std dev: 0.09757s window: 73
average rate: 7.539
min: 0.015s max: 0.480s std dev: 0.10456s window: 77
average rate: 7.750
min: 0.015s max: 0.480s std dev: 0.09972s window: 87
The difference is likely due to the transport delay of putting the image onto the network. The significance of this delay depends on whether your subscriber is running on the Jetson or on a separate device on the same physical network. Regardless, I would suggest changing the default QoS policies to get better performance for video streaming, such as setting RELIABILITY=BEST_EFFORT. That said, the biggest improvement (if you are streaming over a network) will likely come from using the image_transport_plugins to compress the images before they are published. Although these are CPU-based (theora, etc.), they will likely help.
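As a minimal rclpy sketch of the QoS suggestion (the node and callback names are made up; only the topic name comes from the question), subscribing with the built-in sensor-data profile, which uses BEST_EFFORT reliability and a small queue:
import rclpy
from rclpy.node import Node
from rclpy.qos import qos_profile_sensor_data
from sensor_msgs.msg import Image

class DepthListener(Node):
    def __init__(self):
        super().__init__('depth_listener')
        # qos_profile_sensor_data = BEST_EFFORT reliability; late frames are
        # dropped instead of being retransmitted and stalling the stream
        self.create_subscription(Image, '/camera/depth/image_rect_raw',
                                 self.on_image, qos_profile_sensor_data)

    def on_image(self, msg):
        self.get_logger().info('got frame %dx%d' % (msg.width, msg.height))

rclpy.init()
rclpy.spin(DepthListener())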
Another thing to consider is using the compression hardware accelerators that are built into the Jetson though that will require some more work until the maintainers of image_transport_plugins or another enterprising developer gets this working.
I'm currently using a graph database on Redis for a Julia project.
Sometimes Redis requests take 300 ms to execute, and I don't understand why.
I ran a simple request 10,000 times (the code of the request is below):
using Redis, BenchmarkTools
conn = RedisConnection(port=6382)
Redis.execute_command(conn, ["FLUSHDB"])
q = string("CREATE (:Type {nature :'Test',val:'test'})")
BenchmarkTools.DEFAULT_PARAMETERS.seconds = 1000
BenchmarkTools.DEFAULT_PARAMETERS.samples = 10000
stats = @benchmark Redis.execute_command(conn, ["GRAPH.QUERY", "GraphDetection", q])
And got these results:
BenchmarkTools.Trial:
  memory estimate:  3.09 KiB
  allocs estimate:  68
  minimum time:     1.114 ms (0.00% GC)
  median time:      1.249 ms (0.00% GC)
  mean time:        18.623 ms (0.00% GC)
  maximum time:     303.269 ms (0.00% GC)
  samples:          10000
  evals/sample:     1
The huge difference between the median time and the mean time comes from the problem I'm talking about (a request takes either 1-3 ms or 300-310 ms).
I'm not familiar with Julia, but please note that RedisGraph reports its internal execution time; I'd suggest using that report for measurement.
In addition, it would be helpful to understand when (on which sample) RedisGraph took over 100 ms to process the query; usually it is the first query, which causes RedisGraph to do some extra work.
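A minimal sketch of finding the slow samples from the Julia side (assuming the same connection and query as in the question):
using Redis
conn = RedisConnection(port=6382)
q = "CREATE (:Type {nature :'Test',val:'test'})"
# Time each call individually, then report which iterations crossed 100 ms
times = [@elapsed Redis.execute_command(conn, ["GRAPH.QUERY", "GraphDetection", q])
         for _ in 1:10_000]
println("slow samples: ", findall(t -> t > 0.1, times))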
My kernel achieves 100% utilization, but the kernel time is only 3% and there is no time overlap between memory copies and kernels.
The combination of high utilization and low kernel time especially doesn't make sense to me.
So how should I proceed in optimizing my kernel?
I have already made sure that I only use coalesced and pinned memory access, as the profiler recommended.
Quadro FX 580 utilization = 100.00% (62117.00/62117.00)
Kernel time = 3.05 % of total GPU time
Memory copy time = 0.9 % of total GPU time
Kernel taking maximum time = Pinned (0.7% of total GPU time)
Memory copy taking maximum time = memcpyHtoD (0.5% of total GPU time)
There is no time overlap between memory copies and kernels on GPU
Furthermore, I have no warp serialization, no divergent branches, and no occupancy-limiting factor.
Kernel details: Grid size: [4 1 1], Block size: [256 1 1]
Register Ratio: 0.9375 ( 7680 / 8192 ) [10 registers per thread]
Shared Memory Ratio: 0.09375 ( 1536 / 16384 ) [60 bytes per Block]
Active Blocks per SM: 3 (Maximum Active Blocks per SM: 8)
Active threads per SM: 768 (Maximum Active threads per SM: 768)
Potential Occupancy: 1 ( 24 / 24 )
Achieved occupancy: 0.333333 (on 4 SMs)
Occupancy limiting factor: None
P.S. I don't claim that I wrote wonder-code, but I just don't know how to proceed from here.
It seems the grid size of your kernel is too small to make full use of the SMs.
Why not decrease the block size and increase the grid size?
I think it will help.
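To illustrate (myKernel, d_data, and n are placeholders, not from the question): the profile shows 4 blocks of 256 threads, i.e. one block per SM on the 4-SM Quadro FX 580, which is exactly the 0.33 achieved occupancy (256 of 768 threads per SM). Sizing the grid from the data instead lets the scheduler keep every SM full:
// With 64-thread blocks, a grid well beyond 4 blocks gives each SM
// several resident blocks to overlap latency with
int block = 64;                        // smaller blocks, as suggested above
int grid  = (n + block - 1) / block;   // grid grows with the problem size
myKernel<<<grid, block>>>(d_data, n);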
I'm on a project upgrading from Rails 2 to Rails 3. We are removing Ultrasphinx (which is not supported in Rails 3) and replacing it with ThinkingSphinx. One problem: the Cucumber tests for searching, which used to work, are failing because ThinkingSphinx is not indexing the files in test mode.
This is the relevant part of env.rb:
require 'cucumber/thinking_sphinx/external_world'
Cucumber::ThinkingSphinx::ExternalWorld.new
Cucumber::Rails::World.use_transactional_fixtures = false
And here is the step (declared in my common_steps.rb file) that indexes my objects:
Given /^ThinkingSphinx is indexed$/ do
puts "Indexing the new database objects"
# Update all indexes
ThinkingSphinx::Test.index
sleep(0.25) # Wait for Sphinx to catch up
end
And this is what I have in my .feature file (after the model objects are created)
And ThinkingSphinx is indexed
This is the output of ThinkingSphinx when it's run in test mode (this is WRONG, it should be finding documents but it is not):
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff
using config file 'C:/Users/PaulG/Programming/Projects/TechTV/config/test.sphinx.conf'...
indexing index 'collection_core'...
collected 0 docs, 0.0 MB
total 0 docs, 0 bytes
total 0.027 sec, 0 bytes/sec, 0.00 docs/sec
distributed index 'collection' can not be directly indexed; skipping.
indexing index 'video_core'...
collected 0 docs, 0.0 MB
total 0 docs, 0 bytes
total 0.018 sec, 0 bytes/sec, 0.00 docs/sec
distributed index 'video' can not be directly indexed; skipping.
total 0 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 8 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
rotating indices: succesfully sent SIGHUP to searchd (pid=4332).
In comparison, this is the output I get when I run
rake ts:index
to index the development environment:
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff
using config file 'C:/Users/PaulG/Programming/Projects/TechTV/config/development.sphinx.conf'...
indexing index 'collection_core'...
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 39 bytes
total 0.031 sec, 1238 bytes/sec, 127.04 docs/sec
distributed index 'collection' can not be directly indexed; skipping.
indexing index 'video_core'...
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 62 bytes
total 0.023 sec, 2614 bytes/sec, 168.66 docs/sec
distributed index 'video' can not be directly indexed; skipping.
total 10 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 20 writes, 0.001 sec, 0.1 kb/call avg, 0.0 msec/call avg
rotating indices: succesfully sent SIGHUP to searchd (pid=5476).
Notice how it's actually finding documents in my development database, but not in my test database. The indexer is working in dev, but not in test. I've spent two days on this and am no closer to a solution. Any help would be overwhelmingly appreciated.
I figured it out this morning; hopefully I can save someone else the trouble I experienced. It turns out it wasn't a fault of Cucumber, but of DatabaseCleaner: with the transaction strategy, the records the tests create are never committed, so the Sphinx indexer, which reads the database over its own connection, never sees them.
I fixed this issue by changing this line in env.rb:
DatabaseCleaner.strategy = :transaction
to
DatabaseCleaner.strategy = :truncation
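Putting it together, a minimal sketch of the relevant env.rb pieces (same requires and setup as in the question):
require 'cucumber/thinking_sphinx/external_world'
Cucumber::ThinkingSphinx::ExternalWorld.new

Cucumber::Rails::World.use_transactional_fixtures = false
# Truncation commits real rows, so the Sphinx indexer (a separate process
# with its own database connection) can actually see the test data.
DatabaseCleaner.strategy = :truncation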