How to optimize golang program that spends most time in runtime.osyield and runtime.usleep

How to optimize golang program that spends most time in runtime.osyield and runtime.usleep - optimization

I've been working on optimizing code that analyzes social graph data (with lots of help from https://blog.golang.org/profiling-go-programs) and I've successfully reworked a lot of slow code.
All data is loaded into memory from db first, and the data analysis from there appears CPU bound (max memory consumption < 10MB, CPU1 # 100%)
But now most of my program's time seems to be in runtime.osyield and runtime.usleep. What's the way to prevent that?
I've set GOMAXPROCS=1 and the code does not spawn any goroutines (other than what the golang libraries may call).
This is my top10 output from pprof
(pprof) top10
62550ms of 72360ms total (86.44%)
Dropped 208 nodes (cum <= 361.80ms)
Showing top 10 nodes out of 77 (cum >= 1040ms)
flat flat% sum% cum cum%
20760ms 28.69% 28.69% 20850ms 28.81% runtime.osyield
14070ms 19.44% 48.13% 14080ms 19.46% runtime.usleep
11740ms 16.22% 64.36% 23100ms 31.92% _/C_/code/sc_proto/cloudgraph.(*Graph).LeafProb
6170ms 8.53% 72.89% 6170ms 8.53% runtime.memmove
4740ms 6.55% 79.44% 10660ms 14.73% runtime.typedslicecopy
2040ms 2.82% 82.26% 2040ms 2.82% _/C_/code/sc_proto.mAvg
890ms 1.23% 83.49% 1590ms 2.20% runtime.scanobject
770ms 1.06% 84.55% 1420ms 1.96% runtime.mallocgc
760ms 1.05% 85.60% 760ms 1.05% runtime.heapBitsForObject
610ms 0.84% 86.44% 1040ms 1.44% _/C_/code/sc_proto/cloudgraph.(*Node).DeepestChildren
(pprof)
The _ /C_/code/sc_proto/* functions are my code.
And the output from web:
(better, SVG version of graph here: https://goo.gl/Tyc6X4)

Found the answer myself, so I'm posting this here for anyone else who is having a similar problem. And special thanks to #JimB for sending me down the right path.
As can be seen from the graph, the paths which lead to osyield and usleep are garbage collection routines. This program was using a linked list which generated a lot of pointers, which created a lot of work for the gc, which occasionally blocked execution of my code while it cleaned up my mess.
Ultimately the solution to this problem came from https://software.intel.com/en-us/blogs/2014/05/10/debugging-performance-issues-in-go-programs (which was an awesome resource btw). I followed the instructions about the memory profiler there; and the recommendation to replace collections of pointers with slices cleared up my garbage collection issues, and my code is much faster now!

Related

GNURadio Companion and OFDM TX and RX in single Graph

I am following this github example for understanding OFDM on gnuradio-companion, I am able to execute ofdm_tx individually (64 and 512 FFT point) without any issues, but when I connect these two in single graph, I am able to get spectrum from ofdm_tx (no output from ofdm_rx or getting straight line).
My question here, each time I close my output spectrum, my tool get hanged and in background (inside gnu-companion) I observe the following message tarin (attached, printscreen). Similar thing also observed when I run ofdm_rx individually.
Error message in Console :
packet_headerparser_b :info: Detected an invalid packet at item 1448.
header_payload_demux :info :parser returned #f
Please guide me in this regard,

by selecting "NO" for vector source "Repeat" variable , issue sorted out (no hang), but not able to see spectrum anymore.

How to interpret the RabbitMQ Message stats?

I to want get and historize queue metrics for the "Enqueued, Dequeued an Size" (Terminology formerly met on ActiveMQ).
The moving charts provided in the management plugin are not enough for the monitoring that I need to do.
So with RabbitMQ, I'm getting data from https://rabbitmq-server:15672/api/queues/myvhost
This returns json.. for a queue, I can obtain real life production data like :
"messages":0, // for "Size"
"message_stats":{
"deliver_get":171528, // for "Dequeued"
"ack":162348,
"redeliver":9513,
"deliver_no_ack":0,
"deliver":171528,
"get":0,
"publish":51293 // for "Enqueued"
(...)
I'm in particular surprised by the publish counter:
Its value can even decrease between 2 measures done with a couple of minutes of delay ! (see sample chart around 17:00)
As you can see on my data, the deliver_get is significantly larger than the publish.
https://my-rabbitmq:15672/doc/stats.html doesn't give a lot of details that could explain what I actually notice.
Also, under the message_stats object that I obtain, I'm missing the some counters like confirm and return which could be related to the enqueuing.
Are there relationships between these metrics ? (like deliver_get + messages = redeliver + publish.. but that one doesn't work with my figures)
Is there another more detailed documentation about these metrics ?

Memory error when running medium sized merge function ipython notebook jupyter

I'm trying to merge around 100 dataframes with a for loop and am getting a memory error. I'm using ipython jupyter notebook
Here is a sample of the data:
timestamp Namecoin_cap
0 2013-04-28 5969081
1 2013-04-29 7006114
2 2013-04-30 7049003
Each frame is around 1000 lines long
Here's the error in detail, I've also include my merge function.
My system is currently using up 64% of it memory
I have searched for similar issues but it seems most are for very large arrays >1GB, my data is relatively small in comparison.
EDIT: Something is suspicious. I wrote a beta program before, this was to test with 4 dataframes, i just exported that through pickle and it is 500kb. Now when i try to export the 100 frames one I get a memory error. It does however export a file that is 2GB. So i suspect somewhere down the line my code has created some kind of loop, creating a very large file. NB the 100 frames are stored in a dictionary
EDIT2: I have exported the scrypt to .py
http://pastebin.com/GqaHr7xc
This is a .xlsx that cointains asset names the script needs
The script fetches data regarding various assets, then cleans it up and saves each asset to a data frame in a dictionary
I'd be really appreciative if someone could have a look and see if there's anything immediately wrong. Other wise please advise on what tests I can run.
EDIT3: I'm finding it really hard to understand why this is happening, the code worked fine in the beta, all i have done now is add more assets.
EDIT4: I ran I size check on the object (dict of dfs) and it is 1,066,793 bytes
EDIT5: The problem is in the merge function for coin 37
for coin in coins[:37]:
data2['merged'] = pd.merge(left=data2['merged'],right=data2[coin], left_on='timestamp', right_on='timestamp', how='left')
This is when the error occurs. for coin in coins[:36]:' doesn't produce an error howeverfor coin in coins[:37]:' produces the error, any ideas ?
EDIT6: the 36th element is 'Syscoin', i did coins.remove('Syscoin') however the memory problem still occurs. So it seems to be a problem with the 36th element in coins no matter what the coin is
EDIT7: goCards suggestions seemed to work however the next part of the code:
merged = data2['merged']
merged['Total_MC'] = merged.drop('timestamp',axis=1).sum(axis=1)
Produces a memory error. I'm stumped

In regard to storage, I would recommend using a simple csv over pickle. Csv is a more generic format. It is human readable,and you can check your data quality easier especially as your data grows.
file_template_string='%s.csv'
for eachKey in dfDict:
filename = file_template_string%(eachKey)
dfDict[eachKey].to_csv(filename)
If you need to date the files you can also put a timestamp in the filename.
import time
from datetime import datetime
cur = time.time()
cur = datetime.fromtimestamp(cur)
file_template_string = "%s_{0}.csv".format(cur.strftime("%m_%d_%Y_%H_%M_%S"))
There are some obvious errors in your code.
for coin in coins: #line 61,89
for coin in data: #should be
df = data2['Namecoin'] #line 87
keys = data2.keys()
keys.remove('Namecoin')
for coin in keys:
df = pd.merge(left=df,right=data2[coin], left_on='timestamp', right_on='timestamp', how='left')

Same issue happened to me!
"MemoryError:" by notebook on execution of pandas. I have also screen printed quite lot of observations before issued happened.
Reinstalling Anaconda didn't help. Later realized that i was working with IPython notebook instead Jupyter notebook. Switched to Jupyter notebook. Everything worked fine!

I am getting 'Local Search phase started with an uninitialized Solution' when I run on a larger dataset

I am developing a solver using Optaplanner 6.1.0, similar to the Vehicle Routing Problem. When I run my solver on 700 installers and 200 bookings, it will successfully solve the planning problem. But, when I used against a larger dataset (700 installers and 1220 bookings), I get
Caused by: java.lang.IllegalStateException: Local Search phase started with an uninitialized Solution. First initialize the Solution. For example, run a Construction Heuristic phase first.
but right before the exception,
16:10:40,378 INFO [DefaultConstructionHeuristicPhase] [http-listener-1(4)] Construction Heuristic phase (0) ended: step total (194), time spent (30693), best score (-1hard/-688803soft).
I am using <constructionHeuristicType>FIRST_FIT_DECREASING</constructionHeuristicType>
in my config.
Am I using it wrong?

Maybe the value range for a planning variable is empty. Especially with value range provider from entity, this is more likely. Feel free to file a jira that the error message should improve in such a case.
Diagnostic todo: Comment out the local solver phase, run the solver (so it only does the construction heuristic) and then iterate through the planning entities and print out the value for each planning value. Check if there are any nulls in there.
The fact that you have 194 steps, instead 200 steps in your CH indicates this. (If those other 6 planning entities are immovable, this won't trigger this exception (more info), so that's not the problem.)

How to dump Permgen?

I wanted to take the dump of the Permgen of a application server.
I do not want to use -XX:+TraceClassLoading -XX:+TraceClassUnloading as i do not want to restart the server, Neither i want to use jconsole.
I there any tool like jmap(used to heap dump didnt find any option for permgen) to get the permgen so that i can supply only the pid.

jmap -permstat <pid>
is going to produce an output like that :
30337 intern Strings occupying 2746200 bytes.
class_loader classes bytes parent_loader alive? type
<bootstrap> 2031 7253392 null live <internal>
0x517474f0 1 1760 null dead sun/reflect/DelegatingClassLoader#0x43f95d38
0x4f83f670 1 1744 0x4ebfb8e8 dead sun/reflect/DelegatingClassLoader#0x43f95d38
[...]
total = 287 10020 35889952 N/A alive=3, dead=284 N/A
This is not a full dump, but doing that is going to allow you to do some investigation.
I am still looking on how to find more information.

It is not possible to 'dump permgen' as it's done for the heap.
In addition to jmap -permstat as others have presented, you can analyze standard heap dump to shed some light on your permanent generation as described in this blog entry: 'The Unknown Generation: Perm'.
Because a heap dump does not really contain a lot of information about perm space, perm problems are difficult to tackle. Recently, I found this great article by Sporar, Sundararajan and Kieviet. The authors shed some light on the permanent generation. Of course, I had to check right away if and how I can use the Eclipse Memory Analyzer to analyze this “unknown” generation. This is what this blog is about.

jmap -permstat <pid>

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How to optimize golang program that spends most time in runtime.osyield and runtime.usleep - optimization

Related

GNURadio Companion and OFDM TX and RX in single Graph

How to interpret the RabbitMQ Message stats?

Memory error when running medium sized merge function ipython notebook jupyter

I am getting 'Local Search phase started with an uninitialized Solution' when I run on a larger dataset

How to dump Permgen?

Categories

Resources