Tensorflow slower on GPU than on CPU - tensorflow

Using Keras with the TensorFlow backend, I am trying to train an LSTM network, and it is taking much longer to run on a GPU than on a CPU.
I am training the LSTM network using the fit_generator function. It takes ~250 seconds per epoch on CPU, while it takes ~900 seconds per epoch on GPU. The packages in my GPU environment include
keras-applications 1.0.8 py_0 anaconda
keras-base 2.2.4 py36_0 anaconda
keras-gpu 2.2.4 0 anaconda
keras-preprocessing 1.1.0 py_1 anaconda
...
tensorflow 1.13.1 gpu_py36h3991807_0 anaconda
tensorflow-base 1.13.1 gpu_py36h8d69cac_0 anaconda
tensorflow-estimator 1.13.0 py_0 anaconda
tensorflow-gpu 1.13.1 pypi_0 pypi
My CUDA compilation tools are version 9.1.85, and my CUDA and driver versions are
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 2080 On | 00000000:0A:00.0 Off | N/A |
| 0% 39C P8 5W / 225W | 7740MiB / 7952MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 2080 On | 00000000:42:00.0 Off | N/A |
| 0% 33C P8 19W / 225W | 142MiB / 7951MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 49251 C .../whsu014/.conda/envs/whsuphd/bin/python 7729MiB |
| 1 1354 G /usr/lib/xorg/Xorg 16MiB |
| 1 49251 C .../whsu014/.conda/envs/whsuphd/bin/python 113MiB |
+-----------------------------------------------------------------------------+
When I insert this line of code
tf.Session(config=tf.ConfigProto(log_device_placement=True)):
I see the below in my terminal
...
ining_1/Adam/Const_10: (Const)/job:localhost/replica:0/task:0/device:GPU:0
training_1/Adam/Const_11: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-06-25 11:27:31.720653: I tensorflow/core/common_runtime/placer.cc:1059] training_1/Adam/Const_11: (Const)/job:localhost/replica:0/task:0/device:GPU:0
training_1/Adam/add_15/y: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-06-25 11:27:31.720666: I tensorflow/core/common_runtime/placer.cc:1059] training_1/Adam/add_15/y: (Const)/job:localhost/replica:0/task:0/device:GPU:0
...
So it seems that TensorFlow is using the GPU.
When I profile the code, these are the first 10 lines on GPU:
10852017 function calls (10524203 primitive calls) in 184.768 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
16200 173.827 0.011 173.827 0.011 {built-in method _pywrap_tensorflow_internal.TF_SessionRunCallable}
6 0.926 0.154 0.926 0.154 {built-in method _pywrap_tensorflow_internal.TF_SessionMakeCallable}
62 0.813 0.013 0.813 0.013 {built-in method _pywrap_tensorflow_internal.TF_SessionRun_wrapper}
156954 0.414 0.000 0.415 0.000 {built-in method numpy.array}
16200 0.379 0.000 1.042 0.000 training.py:643(_standardize_user_data)
24300 0.338 0.000 0.338 0.000 {method 'partition' of 'numpy.ndarray' objects}
68 0.301 0.004 0.301 0.004 {built-in method _pywrap_tensorflow_internal.ExtendSession}
32458 0.223 0.000 2.122 0.000 tensorflow_backend.py:156(get_session)
3206 0.212 0.000 0.238 0.000 tf_stack.py:31(extract_stack)
76024 0.210 0.000 0.702 0.000 ops.py:5246(get_controller)
...
and these are the first 10 lines on CPU:
22123473 function calls (21647174 primitive calls) in 60.173 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
16269 42.491 0.003 42.491 0.003 {built-in method tensorflow.python._pywrap_tensorflow_internal.TF_Run}
16269 0.568 0.000 48.964 0.003 session.py:1042(_run)
56 0.532 0.010 0.532 0.010 {built-in method time.sleep}
153641 0.458 0.000 0.460 0.000 {built-in method numpy.core.multiarray.array}
183148/125354 0.447 0.000 1.316 0.000 python_message.py:469(init)
1226659 0.362 0.000 0.364 0.000 {built-in method builtins.getattr}
2302110/2301986 0.339 0.000 0.358 0.000 {built-in method builtins.isinstance}
8 0.285 0.036 0.285 0.036 {built-in method tensorflow.python._pywrap_tensorflow_internal.TF_ExtendGraph}
12150 0.267 0.000 0.271 0.000 callbacks.py:211(on_batch_end)
147026/49078 0.264 0.000 1.429 0.000 python_message.py:1008(ByteSize)
...
This is my code.
def train_generator(x_list, y_list):
    # 0.1 validation split
    train_length = (len(x_list)//10)*9
    while True:
        for i in range(train_length):
            # each yield produces a single sample (batch size 1)
            train_x = np.array([x_list[i]])
            train_y = np.array([y_list[i]])
            yield train_x, train_y

def val_generator(x_list, y_list):
    # 0.1 validation split
    val_length = len(x_list)//10
    while True:
        for i in range(-val_length, 0, 1):
            val_x = np.array([x_list[i]])
            val_y = np.array([y_list[i]])
            yield val_x, val_y

with tf.Session(config=tf.ConfigProto(log_device_placement=True)):
    model = Sequential()
    model.add(LSTM(64, return_sequences=False,
                   input_shape=(None, 24)))
    model.add(Dense(1))
    model.compile(loss='mae', optimizer='adam')
    checkpointer = ModelCheckpoint(filepath="weights.hdf5",
                                   monitor='val_loss', verbose=1,
                                   save_best_only=True)
    history = model.fit_generator(generator=train_generator(train_x, train_y),
                                  steps_per_epoch=(len(train_x)//10)*9,
                                  epochs=5,
                                  validation_data=val_generator(train_x, train_y),
                                  validation_steps=len(train_x)//10,
                                  callbacks=[checkpointer],
                                  verbose=2, shuffle=False)

# plot history
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='validation')
pyplot.legend()
pyplot.show()
I expect a significant speed-up when using the GPU for training. How can I fix this? Can someone help me understand what is causing the slowdown? Thank you.

A couple of observations:
Use CuDNNLSTM instead of LSTM to train on the GPU; you will see a considerable increase in speed (see the sketch below).
Sometimes, for very small networks, the overhead of transferring data between CPU and GPU outweighs the parallel computation done on the GPU; in other words, more time is lost transferring the data than is gained by training on the GPU.
GPUs pay off for computationally intensive models (very big LSTMs, heavy CNNs). For very small MLPs and even small LSTMs you may find that the network trains equally fast on CPU and GPU, or, in particular cases with very small networks, even faster on the CPU.
UPDATE FOR TENSORFLOW >= 2.0
The standard LSTM/GRU layers default to the CuDNN implementation when a compatible GPU is detected, so you no longer need to import CuDNNLSTM/CuDNNGRU explicitly.
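For illustration, here is a minimal sketch of the swap, assuming the Keras 2.2.4 / TensorFlow 1.13 setup from the question (the layer sizes and input shape are copied from the posted model; note that CuDNNLSTM only runs on a CUDA GPU and supports fewer options than the plain LSTM layer, e.g. no custom activations or recurrent dropout):

from keras.models import Sequential
from keras.layers import Dense, CuDNNLSTM  # cuDNN-backed LSTM, GPU only

model = Sequential()
# Same layout as the question's model, but using the cuDNN kernel.
model.add(CuDNNLSTM(64, return_sequences=False, input_shape=(None, 24)))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')

Note also that the generators in the question yield one sample at a time (batch size 1); with batches that small, per-step overhead tends to dominate on the GPU regardless of the kernel, so yielding larger batches usually helps as well.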

Related

CPLEX OPL : Overflow occurred, please use oplrun -profile

I am running a fairly large model in OPL: it has 576,723 constraints, 1,132,515 variables (3,855 of them binary), and 27,150,711 nonzero coefficients.
At about 12 minutes the optimisation stops; it reports 1 solution but displays no solution. In the profiler tab I get the "Overflow occurred, please use oplrun -profile" message.
The engine log looks as below (updated on 24th Sep):
Found incumbent of value 0.000000 after 0.02 sec. (30.57 ticks)
Presolve has eliminated 65039 rows and 117138 columns...
Presolve has improved bounds 1277962 times...
Aggregator has done 20701 substitutions...
Aggregator has done 42701 substitutions...
Aggregator has done 65901 substitutions...
Aggregator has done 89601 substitutions...
Aggregator has done 114601 substitutions...
Aggregator has done 141901 substitutions...
Aggregator has done 172001 substitutions...
Aggregator has done 205101 substitutions...
Aggregator has done 242201 substitutions...
Aggregator has done 285501 substitutions...
Aggregator has done 339801 substitutions...
Aggregator has done 425001 substitutions...
Tried aggregator 2 times.
MIP Presolve eliminated 65049 rows and 119516 columns.
MIP Presolve modified 3304560 coefficients.
Aggregator did 505533 substitutions.
Reduced MIP has 6138 rows, 507466 columns, and 15507869 nonzeros.
Reduced MIP has 2761 binaries, 0 generals, 0 SOSs, and 0 indicators.
Presolve time = 52.98 sec. (140577.29 ticks)
Tried aggregator 1 time.
Reduced MIP has 6138 rows, 507466 columns, and 15507869 nonzeros.
Reduced MIP has 2761 binaries, 0 generals, 0 SOSs, and 0 indicators.
Presolve time = 4.59 sec. (4115.32 ticks)
Probing time = 0.33 sec. (193.08 ticks)
Clique table members: 674.
MIP emphasis: balance optimality and feasibility.
MIP search method: dynamic search.
Parallel mode: deterministic, using up to 16 threads.
Root relaxation solution time = 5983.52 sec. (4525135.08 ticks)
Nodes Cuts/
Node Left Objective IInf Best Integer Best Bound ItCnt Gap
* 0+ 0 0.0000 4585.0158 ---
0 0 1414.4727 839 0.0000 1414.4727 74713 ---
0 0 cutoff 0.0000 5409203 ---
Elapsed time = 19950.47 sec. (18809991.19 ticks, tree = 0.01 MB, solutions = 1)
Clique cuts applied: 2
Cover cuts applied: 57
Implied bound cuts applied: 91
Flow cuts applied: 121
Mixed integer rounding cuts applied: 236
Gomory fractional cuts applied: 4
Root node processing (before b&c):
Real time = 19950.63 sec. (18810086.10 ticks)
Parallel b&c, 16 threads:
Real time = 0.00 sec. (0.00 ticks)
Sync time (average) = 0.00 sec.
Wait time (average) = 0.00 sec.
------------
Total (root+branch&cut) = 19950.63 sec. (18810086.10 ticks)
<<< solve
OBJECTIVE: 0
<<< post process
<<< done
Profiler Report
Time PeakMemory SelfTime LocalMem Count Nodes Description
20,190.282 100% 9.902 G 100% 0.753 0% 879.507 M 9% 1 126 TOTAL
0.000 0% 0 B 0% 0.000 0% 256 B 0% 1 1 READING MODEL DEFINITION Ashes200_data
38.626 0% 840.113 M 8% 0.128 0% 721.418 M 7% 1 97 LOADING MODEL Ashes200_data-0000025C59804DD8
7.277 0% 103.191 M 1% 2.750 0% 84.547 M 1% 1 52 LOADING DATA D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data 3yr.dat
0.005 0% 28 K 0% 0.005 0% 400 B 0% 1 1 INIT TimePeriods at 13:1-24 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.003 0% 8 K 0% 0.003 0% 54.047 K 0% 1 1 INIT PitBlocks at 14:1-25 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.001 0% 16 K 0% 0.001 0% 35.641 K 0% 1 1 INIT DumpBlocks at 15:1-25 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.001 0% 0 B 0% 0.001 0% 576 B 0% 1 1 INIT Stockpiles at 17:1-25 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.001 0% 0 B 0% 0.001 0% 424 B 0% 1 1 INIT Plants at 19:1-21 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.329 0% 22.695 M 0% 0.329 0% 18.362 M 0% 1 1 INIT Pathid at 21:1-22 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.002 0% 0 B 0% 0.002 0% 904 B 0% 1 1 INIT AverageGrade at 48:1-37 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.001 0% 0 B 0% 0.001 0% 864 B 0% 1 1 INIT DensityGradeBins at 49:1-42 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.002 0% 8 K 0% 0.002 0% 5.531 K 0% 1 1 INIT grade at 26:1-30 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.001 0% 0 B 0% 0.001 0% 5.516 K 0% 1 1 INIT oreTons at 27:1-32 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.002 0% 0 B 0% 0.002 0% 5.562 K 0% 1 1 INIT density at 28:1-32 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.001 0% 0 B 0% 0.001 0% 5.523 K 0% 1 1 INIT wasteVolume at 29:1-36 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.068 0% 0 B 0% 0.068 0% 5.523 K 0% 1 1 INIT totalVolume at 30:1-36 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.002 0% 8 K 0% 0.002 0% 3.773 K 0% 1 1 INIT dumpVolume at 32:1-35 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 872 B 0% 1 1 INIT resourceMaxCap at 35:1-40 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.002 0% 0 B 0% 0.002 0% 840 B 0% 1 1 INIT resourceMinCap at 36:1-40 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.013 0% 0 B 0% 0.013 0% 1.484 K 0% 1 1 INIT processMinCap at 37:1-46 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.001 0% 0 B 0% 0.001 0% 1.516 K 0% 1 1 INIT processMaxCap at 38:1-46 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.001 0% 0 B 0% 0.001 0% 1.477 K 0% 1 1 INIT GradeMin at 39:1-42 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.002 0% 0 B 0% 0.002 0% 936 B 0% 1 1 INIT SellPrice at 41:1-35 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.001 0% 0 B 0% 0.001 0% 840 B 0% 1 1 INIT wasteMiningCost at 42:1-41 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.001 0% 0 B 0% 0.001 0% 840 B 0% 1 1 INIT coalMiningCost at 43:1-40 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.001 0% 0 B 0% 0.001 0% 840 B 0% 1 1 INIT washCost at 44:1-34 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.001 0% 0 B 0% 0.001 0% 840 B 0% 1 1 INIT HaulageCost at 45:1-37 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.001 0% 0 B 0% 0.001 0% 848 B 0% 1 1 INIT StockPileRehandlingCost at 46:1-49 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.001 0% 0 B 0% 0.001 0% 128 B 0% 1 1 INIT SwellFactor at 52:1-24 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.001 0% 56 K 0% 0.001 0% 2.031 K 0% 1 1 INIT StockPileMaxCap at 56:1-52 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.001 0% 0 B 0% 0.001 0% 2.094 K 0% 1 1 INIT StockPileMinCap at 55:1-52 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.001 0% 0 B 0% 0.001 0% 128 B 0% 1 1 INIT DisountRate at 58:1-24 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.002 0% 0 B 0% 0.002 0% 128 B 0% 1 1 INIT DumpCapacity at 60:1-25 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.323 0% 0 B 0% 0.323 0% 130.461 K 0% 1 2 INIT PitBlocksType at 287:1-27 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 976 B 0% 1 1 INIT ijk at 278:1-284:2 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.003 0% 0 B 0% 0.003 0% 57.203 K 0% 1 2 INIT DumpBlocksType at 273:1-34 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 872 B 0% 1 1 INIT blockType at 263:1-268:3 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.273 0% 48 K 0% 0.273 0% 90.812 K 0% 1 2 INIT PitLagInfoXYB at 79:1-25 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 872 B 0% 1 1 INIT xyz at 64:1-69:2 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.172 0% 0 B 0% 0.172 0% 55.266 K 0% 1 1 INIT DumpLagInfoXYB at 78:1-26 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.165 0% 0 B 0% 0.165 0% 20.453 K 0% 1 1 INIT DumpXYZ at 72:1-29 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.001 0% 0 B 0% 0.001 0% 2.555 K 0% 1 1 INIT PlantXYZ at 73:1-26 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.002 0% 0 B 0% 0.002 0% 2.742 K 0% 1 1 INIT StockpilesXYZ at 74:1-35 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.003 0% 40 K 0% 0.003 0% 30.953 K 0% 1 1 INIT PitXYZ at 71:1-27 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
2.463 0% 56.422 M 1% 2.463 0% 45.421 M 0% 1 2 INIT rawPbd at 131:1-20 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 1.117 K 0% 1 1 INIT Raw at 121:1-130:2 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.011 0% 188 K 0% 0.011 0% 174.133 K 0% 1 1 INIT rawPbm at 132:1-20 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.024 0% 652 K 0% 0.024 0% 414.375 K 0% 1 1 INIT rawPbs at 133:1-20 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.625 0% 21.031 M 0% 0.625 0% 19.388 M 0% 1 2 INIT sourceDestD at 108:1-37 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 744 B 0% 1 1 INIT sourceDestination at 103:1-106:2 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.005 0% 292 K 0% 0.005 0% 61.859 K 0% 1 1 INIT sourceDestM at 109:1-37 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.006 0% 240 K 0% 0.006 0% 177.469 K 0% 1 1 INIT sourceDestS at 110:1-37 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.002 0% 0 B 0% 0.002 0% 12.562 K 0% 1 2 INIT NullVariablesSet at 450:1-40 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 744 B 0% 1 1 INIT nullVariables at 445:1-448:2 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
28.810 0% 463.848 M 5% 0.000 0% 416.204 M 4% 1 29 PRE PROCESSING
0.410 0% 640 K 0% 0.345 0% 649.023 K 0% 1 4 EXECUTE anonymous#1 at 90:1-8 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.065 0% 640 K 0% 0.065 0% 647.672 K 0% 1 3 INIT OntopDumpLag at 85:6-87:52 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 8 K 0% 0.000 0% 280 B 0% 1 1 INIT D at 81:11-14 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 296 B 0% 1 1 INIT BottomPitBenNo at 82:24-25 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
5.935 0% 229.957 M 2% 5.935 0% 211.206 M 2% 1 8 EXECUTE anonymous#2 at 158:1-8 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 624 B 0% 1 1 INIT emptysetd at 153:22-24 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 5.641 K 0% 1 2 INIT Pbd at 148:12-14 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 1.117 K 0% 1 1 INIT Path at 136:1-145:2 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 840 B 0% 1 1 INIT emptysetm at 154:22-24 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 2.875 K 0% 1 1 INIT Pbm at 149:13-15 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 792 B 0% 1 1 INIT emptysets at 155:22-24 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 2.875 K 0% 1 1 INIT Pbs at 150:12-14 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.902 0% 51.145 M 1% 0.788 0% 47.271 M 0% 1 2 EXECUTE anonymous#3 at 237:1-8 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.114 0% 51.145 M 1% 0.114 0% 47.271 M 0% 1 1 INIT hc at 233:1-31 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
1.678 0% 2.129 M 0% 1.029 0% 1.958 M 0% 1 2 EXECUTE anonymous#4 at 303:1-8 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.649 0% 2.129 M 0% 0.649 0% 1.957 M 0% 1 1 INIT OntopPit at 290:7-299:28 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
5.335 0% 117.746 M 1% 5.163 0% 106.703 M 1% 1 6 EXECUTE anonymous#5 at 367:1-8 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 624 B 0% 1 1 INIT MaxS at 364:10-12 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.061 0% 42.777 M 0% 0.061 0% 39.647 M 0% 1 1 INIT splitPitBlocksPath at 353:1-34 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.001 0% 0 B 0% 0.001 0% 121.359 K 0% 1 1 INIT splitPitBlocksPathM at 354:1-35 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.001 0% 536 K 0% 0.001 0% 361.719 K 0% 1 1 INIT splitPitBlocksPathS at 355:1-35 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.109 0% 43.75 M 0% 0.109 0% 43.522 M 0% 1 1 INIT splitDumpBlocksPath at 356:1-35 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
14.550 0% 62.246 M 1% 14.436 0% 48.431 M 0% 1 6 EXECUTE anonymous#6 at 470:1-8 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 268 K 0% 0.000 0% 263.789 K 0% 1 1 INIT capBMT at 453:1-46 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.112 0% 50.91 M 1% 0.112 0% 47.161 M 0% 1 1 INIT capBDT at 455:1-50 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.002 0% 536 K 0% 0.002 0% 555.375 K 0% 1 1 INIT capBST at 457:1-50 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 268 K 0% 0.000 0% 143.125 K 0% 1 1 INIT capBT at 459:1-35 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 147.5 K 0% 1 1 INIT capschedulePit at 461:1-44 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
2.101 0% 256.062 M 3% 0.009 0% 218.874 M 2% 1 10 INIT npv at 699:19-703:108 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 5.156 K 0% 1 1 INIT Dfbmt at 684:1-103 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.006 0% 0 B 0% 0.006 0% 419.82 K 0% 1 1 INIT Xbmt at 672:1-89 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 3.914 K 0% 1 1 INIT Dfbdt at 687:1-59 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
1.350 0% 108.723 M 1% 1.350 0% 106.255 M 1% 1 1 INIT Xbdt at 673:1-91 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 3.914 K 0% 1 1 INIT Dfbst at 690:1-58 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.012 0% 1.051 M 0% 0.012 0%1,020.648 K 0% 1 1 INIT Xbst at 674:1-91 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 4.805 K 0% 1 1 INIT Dfsmt at 694:1-87 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 3.758 K 0% 1 1 INIT Xsmt at 663:1-51 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.724 0% 109.227 M 1% 0.724 0% 106.847 M 1% 1 1 INIT ypt at 677:1-47 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.018 0% 16.035 M 0% 0.018 0% 315.047 K 0% 1 1 INIT schedulePit at 676:1-87 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.287 0% 0 B 0% 0.287 0% 875.367 K 0% 1 1 INIT OnBelowDump at 313:6-323:47 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.002 0% 0 B 0% 0.002 0% 202.047 K 0% 1 1 INIT scheduleDump at 668:1-52 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 2.156 K 0% 1 1 INIT StockPileVol at 54:1-45 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.003 0% 272 K 0% 0.003 0% 280.82 K 0% 1 1 INIT zbt at 675:1-70 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
177.026 1% 9.063 G 92% 10.546 0% 158.001 M 2% 1 2 EXTRACTING Ashes200_data-0000025C59804DD8
166.480 1% 8.179 G 83% 166.480 1% 17.213 M 0% 1 1 OBJECTIVE at 714:1-716:4 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
19,951.910 99% 4.281 G 43%5,989.816 30% 319.668 M 3% 1 13 CPLEX MIP Optimization
52.990 0% 389.746 M 4% 52.990 0% 389.746 M 4% 1 1 CPLEX Pre Solve
4.589 0% 256.008 M 3% 4.589 0% 256.008 M 3% 1 1 CPLEX Pre Solve
0.000 0% 0 B 0% 0.000 0% 0 B 0% 1 1 CPLEX Solve LP Relaxation
13,904.515 69% 1.05 G 11% 23.882 0% 467.602 M 5% 1 9 CPLEX Generating Cuts for Root Node
13,714.446 68% 52 K 0% 2.292 0% 78.424 M 1% 7 3 CPLEX Solve LP Relaxation
13,711.520 68% 0 B 0%13,711.520 68% 110.169 M 1% 4 1 CPLEX Solve LP Relaxation
0.634 0% 52 K 0% 0.634 0% 225.727 M 2% 1 1 CPLEX Pre Solve
165.425 1% 604.797 M 6% 0.170 0% 604.797 M 6% 1 3 CPLEX Heuristics
165.255 1% 324.707 M 3% 0.289 0% 81.177 M 1% 4 2 CPLEX Solve LP Relaxation
164.966 1% 309.051 M 3% 164.966 1% 152.584 M 2% 2 1 CPLEX Solve LP Relaxation
0.130 0% 0 B 0% 0.130 0% 0 B 0% 1 1 CPLEX Probing
0.632 0% 225.676 M 2% 0.632 0% 225.676 M 2% 1 1 CPLEX Pre Solve
21.967 0% 8.656 M 0% 0.009 0% 35.43 K 0% 1 12 POST PROCESSING
21.958 0% 8.656 M 0% 17.082 0% 39.516 K 0% 1 11 EXECUTE anonymous#7 at 1300:1-1301:0 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.011 0% 8.656 M 0% 0.011 0% 9.328 K 0% 1 2 INIT solXbmt at 1252:21-112 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 1,000 B 0% 1 1 INIT SolXbmt at 1245:1-1250:2 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
2.327 0% 0 B 0% 2.327 0% 7.258 K 0% 1 2 INIT solXbdt at 1263:24-118 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 1,000 B 0% 1 1 INIT SolXbdt at 1255:1-1260:2 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.036 0% 0 B 0% 0.036 0% 7.352 K 0% 1 2 INIT solXbst at 1275:22-117 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 1,000 B 0% 1 1 INIT SolXbst at 1267:1-1272:2 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 7.258 K 0% 1 2 INIT solXsmt at 1284:21-111 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 1,000 B 0% 1 1 INIT SolXsmt at 1277:1-1282:2 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
2.502 0% 0 B 0% 2.502 0% 6.18 K 0% 1 2 INIT solPath at 1297:21-84 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
0.000 0% 0 B 0% 0.000 0% 944 B 0% 1 1 INIT SolPath at 1291:1-1295:2 D:\PhD\Minex_Data\FINAL_PAPER2022\AshesPit200\Ashes_Pit200\Ashes200_data.mod
<<< profile
Kindly suggest how to overcome this problem.
Use better units. An objective value of 8.95478e+11 indicates you are using cents instead of billions of dollars. Also, make sure any big-M constants are not larger than needed.

Tensorflow 2.0 utilize all CPU cores 100%

My Tensorflow model makes heavy use of data preprocessing that should be done on the CPU to leave the GPU open for training.
top - 09:57:54 up 16:23, 1 user, load average: 3,67, 1,57, 0,67
Tasks: 400 total, 1 running, 399 sleeping, 0 stopped, 0 zombie
%Cpu(s): 19,1 us, 2,8 sy, 0,0 ni, 78,1 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
MiB Mem : 32049,7 total, 314,6 free, 5162,9 used, 26572,2 buff/cache
MiB Swap: 6779,0 total, 6556,0 free, 223,0 used. 25716,1 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17604 joro 20 0 22,1g 2,3g 704896 S 331,2 7,2 4:39.33 python
This is what top shows me. I would like to make this python process use at least 90% of available CPU across all cores. How can this be achieved?
GPU utilization is better, around 90%, even though I don't know why it is not at 100%.
Mon Aug 10 10:00:13 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100 Driver Version: 440.100 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:01:00.0 On | N/A |
| 35% 41C P2 90W / 260W | 10515MiB / 11016MiB | 11% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1128 G /usr/lib/xorg/Xorg 102MiB |
| 0 1648 G /usr/lib/xorg/Xorg 380MiB |
| 0 1848 G /usr/bin/gnome-shell 279MiB |
| 0 10633 G ...uest-channel-token=1206236727 266MiB |
| 0 13794 G /usr/lib/firefox/firefox 6MiB |
| 0 17604 C python 9457MiB |
+-----------------------------------------------------------------------------+
All I found was a solution for TensorFlow 1.0:
sess = tf.Session(config=tf.ConfigProto(
    intra_op_parallelism_threads=NUM_THREADS))
I have an Intel 9900K and an RTX 2080 Ti, and I use Ubuntu 20.04.
Edit: When I add the following code at the top, it uses 1 core at 100%:
tf.config.threading.set_intra_op_parallelism_threads(1)
tf.config.threading.set_inter_op_parallelism_threads(1)
But increasing this number to 16 again only utilizes all cores at ~30%.
Just setting set_intra_op_parallelism_threads and set_inter_op_parallelism_threads wasn't working for me. In case someone else is in the same place: after a lot of struggle with the same issue, the piece of code below worked for me in limiting TensorFlow's CPU usage to below 500%:
import os
import tensorflow as tf

num_threads = 5
os.environ["OMP_NUM_THREADS"] = "5"
os.environ["TF_NUM_INTRAOP_THREADS"] = "5"
os.environ["TF_NUM_INTEROP_THREADS"] = "5"

tf.config.threading.set_inter_op_parallelism_threads(num_threads)
tf.config.threading.set_intra_op_parallelism_threads(num_threads)
tf.config.set_soft_device_placement(True)
There can be many causes for this; I solved it for myself the following way:
Set
tf.config.threading.set_intra_op_parallelism_threads(<Your_Physical_Core_Count>)
tf.config.threading.set_inter_op_parallelism_threads(<Your_Physical_Core_Count>)
both to your physical core count (see the sketch after this answer). You do not want Hyper-Threading for highly vectorized operations, as you cannot benefit from parallelized operations when there aren't any execution gaps.
"With a high level of vectorization, the number of execution gaps is
very small and there is possibly insufficient opportunity to make up
any penalty due to increased contention in HT."
From: Saini et al., published by the NASA Advanced Supercomputing Division, 2011: The Impact of Hyper-Threading on Processor Resource Utilization in Production Applications
EDIT: I am not sure anymore whether one of the two has to be 1, but one of them definitely needs to be set to the physical core count.
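A minimal sketch of that setup, assuming psutil is available to read the physical core count (the psutil call is my addition, not from the original answer):

import psutil
import tensorflow as tf

# Physical cores only; logical=False excludes Hyper-Threading siblings.
physical_cores = psutil.cpu_count(logical=False)

# Both settings must be applied before TensorFlow executes its first op.
tf.config.threading.set_intra_op_parallelism_threads(physical_cores)
tf.config.threading.set_inter_op_parallelism_threads(physical_cores)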

Tensorflow can't find the GPU

pip install tensorflow==1.14.0
pip install tensorflow-gpu==1.14
import tensorflow as tf
tf.test.is_gpu_available(
    cuda_only=False,
    min_cuda_compute_capability=None
)
False
import os, platform
import GPUtil as GPU  # pip install gputil
import psutil, humanize

GPUs = GPU.getGPUs()
gpu = GPUs[0]

def printm():
    process = psutil.Process(os.getpid())
    print("Gen RAM Free: " + humanize.naturalsize(psutil.virtual_memory().available),
          " | Proc size: " + humanize.naturalsize(process.memory_info().rss))
    print("# of CPU: {0}".format(psutil.cpu_count()))
    print("CPU type: {0}".format(platform.uname()))
    print("GPU Type: {0}".format(gpu.name))
    print("GPU RAM Free: {0:.0f}MB | Used: {1:.0f}MB | Util {2:3.0f}% | Total {3:.0f}MB".format(
        gpu.memoryFree, gpu.memoryUsed, gpu.memoryUtil*100, gpu.memoryTotal))

printm()
Gen RAM Free: 65.8 GB | Proc size: 294.4 MB
# of CPU: 8
CPU type: uname_result(system='Linux', node='lian-2', release='5.3.0-53-generic', version='#47~18.04.1-Ubuntu SMP Thu May 7 13:10:50 UTC 2020', machine='x86_64', processor='x86_64')
GPU Type: GeForce RTX 2080 Ti
GPU RAM Free: 10992MB | Used: 26MB | Util 0% | Total 11018MB
I reinstalled CUDA 10:
Mon Jun 8 15:48:44 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82 Driver Version: 440.82 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:01:00.0 Off | N/A |
| 24% 28C P8 2W / 260W | 26MiB / 11018MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1805 G /usr/lib/xorg/Xorg 9MiB |
| 0 2192 G /usr/bin/gnome-shell 14MiB |
+-----------------------------------------------------------------------------+
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 4
#define CUDNN_PATCHLEVEL 2
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
I'm trying to train this from a Jupyter notebook (Ubuntu 18, over SSH):
model = Sequential()
model.add(layers.Conv1D(16, 9, activation='relu', input_shape=(800, X_train.shape[-1])))
model.add(layers.MaxPooling1D(2))
model.add(layers.Bidirectional(layers.CuDNNGRU(8, return_sequences=True)))
model.add(layers.Bidirectional(layers.CuDNNGRU(8)))
model.add(layers.Dense(8, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer= 'Adam', loss='binary_crossentropy', metrics=['acc'])
model.summary()
train_to_epoch = 150
start_epoch = 3
t1 = datetime.datetime.now()
print('Training start time = %s' % t1)
history = model.fit(X_train, y_train,
                    batch_size=128, epochs=train_to_epoch, verbose=0,
                    validation_data=(X_val, y_val))
print('\nTraining Duration = %s' % (datetime.datetime.now()-t1))
InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNN' used by {{node bidirectional_1/CudnnRNN}}with these attrs: [input_mode="linear_input", T=DT_FLOAT, direction="unidirectional", rnn_mode="gru", seed2=0, is_training=true, seed=87654321, dropout=0]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
[[bidirectional_1/CudnnRNN]]
Maybe another version of CUDA is needed?
Yes, it is the CUDA 10.1 version.
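As a quick sanity check before or after changing CUDA versions, a minimal TF 1.x snippet (using only standard tf.test helpers; nothing here is specific to this machine) can confirm whether the imported wheel was built with CUDA support at all and whether a GPU device is registered:

import tensorflow as tf

# True only if the installed wheel was compiled with CUDA support.
print("Built with CUDA:", tf.test.is_built_with_cuda())

# An empty string means the runtime could not register any GPU device.
print("GPU device name:", tf.test.gpu_device_name())

Note that the question installs both tensorflow==1.14.0 and tensorflow-gpu==1.14; if the CPU-only wheel is the one that ends up being imported, the first check will print False even with a working CUDA installation.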

Actual and Percentage Difference on consecutive columns in a Pandas or Pyspark Dataframe

I would like to perform two different calculations across consecutive columns in a pandas or pyspark dataframe.
Columns are weeks and the metrics are displayed as rows.
I want to calculate the actual and percentage differences across the columns.
The input/output tables incl. the calculations used in Excel are displayed in the following image.
I want to replicate these calculations on a pandas or pyspark dataframe.
Raw Data Attached:
Metrics Week20 Week21 Week22 Week23 Week24 Week25 Week26 Week27
Sales 20301 21132 20059 23062 19610 22734 22140 20699
TRXs 739 729 690 779 701 736 762 655
Attachment Rate 4.47 4.44 4.28 4.56 4.41 4.58 4.55 4.96
AOV 27.47 28.99 29.07 29.6 27.97 30.89 29.06 31.6
Profit 5177 5389 5115 5881 5001 5797 5646 5278
Profit per TRX 7.01 7.39 7.41 7.55 7.13 7.88 7.41 8.06
In pandas you could use the pct_change(axis=1) and diff(axis=1) methods:
df = df.set_index('Metrics')
# list of metrics with "actual diff"
actual = ['AOV', 'Attachment Rate']
rep = (df[~df.index.isin(actual)].pct_change(axis=1).round(2)*100).fillna(0).astype(str).add('%')
rep = pd.concat([rep,
                 df[df.index.isin(actual)].diff(axis=1).fillna(0)
                ])
In [131]: rep
Out[131]:
Week20 Week21 Week22 Week23 Week24 Week25 Week26 Week27
Metrics
Sales 0.0% 4.0% -5.0% 15.0% -15.0% 16.0% -3.0% -7.0%
TRXs 0.0% -1.0% -5.0% 13.0% -10.0% 5.0% 4.0% -14.0%
Profit 0.0% 4.0% -5.0% 15.0% -15.0% 16.0% -3.0% -7.0%
Profit per TRX 0.0% 5.0% 0.0% 2.0% -6.0% 11.0% -6.0% 9.0%
Attachment Rate 0 -0.03 -0.16 0.28 -0.15 0.17 -0.03 0.41
AOV 0 1.52 0.08 0.53 -1.63 2.92 -1.83 2.54
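For completeness, here is a self-contained sketch that rebuilds part of the raw table from the question (only the first four weeks, to keep it short) and applies the same pct_change/diff approach:

import pandas as pd

# First four weeks of the raw data from the question (truncated for brevity).
df = pd.DataFrame({
    'Metrics': ['Sales', 'TRXs', 'Attachment Rate', 'AOV', 'Profit', 'Profit per TRX'],
    'Week20': [20301, 739, 4.47, 27.47, 5177, 7.01],
    'Week21': [21132, 729, 4.44, 28.99, 5389, 7.39],
    'Week22': [20059, 690, 4.28, 29.07, 5115, 7.41],
    'Week23': [23062, 779, 4.56, 29.60, 5881, 7.55],
}).set_index('Metrics')

actual = ['AOV', 'Attachment Rate']  # metrics reported as absolute differences
pct = (df[~df.index.isin(actual)].pct_change(axis=1).round(2) * 100
       ).fillna(0).astype(str).add('%')                      # week-over-week % change
abs_diff = df[df.index.isin(actual)].diff(axis=1).fillna(0)  # week-over-week actual change
rep = pd.concat([pct, abs_diff])
print(rep)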

Confusing DebugDiag report to identify High Memory usage

Recently I got a new batch of dumps to identify the high memory usage in 3 of our WCF services, which are hosted in a 64-bit AppPool on Windows Server 2012.
Application one:
Process up time: 22 days
GC heap usage: 2.69 GB
Loaded modules: 220 MB
Committed memory: 3.08 GB
Native memory: 2 GB
Issue identified: the large GC heap usage is due to unclosed WCF client proxy objects, which account for almost 2.26 GB; the rest is cache on the GC heap.
Application two:
Process up time: 9 hours
GC heap usage: 4.43 GB
Cache size: 2.45 GB
Loaded modules: 224 MB
Committed memory: 5.13 GB
Native memory heap: 2 GB
Issue identified: most of the objects are System.Web.CacheEntry instances, due to the large cache size. 2.2 GB of String objects on the GC heap are rooted in CacheRef root objects.
Application three:
Cache size: 950 MB
GC heap: 1.2 GB
Native heap: 2 GB
We recently upgraded to Windows Server 2012, and I had old dumps as well. Those dumps do not show the same native heap for this application; it was only around 90 MB.
I also used WinDbg to explore the native heap with the !heap -s command, which shows very small native heap sizes, as shown below.
I am just confused about why DebugDiag 2.0 is showing 2 GB of native heap in every WCF service. My understanding is that !heap -s should dump the same native heaps and should match the graphs in the DebugDiag report. The report also shows values in thousands of terabytes.
0:000> !heap -s
LFH Key : 0x53144a890e31e98b
Termination on corruption : ENABLED
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-------------------------------------------------------------------------------------
000000fc42c10000 00000002 32656 31260 32552 2885 497 6 2 f LFH
000000fc42a40000 00008000 64 4 64 2 1 1 0 0
000000fc42bf0000 00001002 3228 1612 3124 43 9 3 0 0 LFH
000000fc43400000 00001002 1184 76 1080 1 5 2 0 0 LFH
000000fc43390000 00001002 1184 148 1080 41 7 2 0 0 LFH
000000fc43d80000 00001002 60 8 60 5 1 1 0 0
000000fc433f0000 00001002 60 8 60 5 1 1 0 0
000000fc442a0000 00001002 1184 196 1080 1 6 2 0 0 LFH
000000fc44470000 00041002 60 8 60 5 1 1 0 0
000001008e9f0000 00041002 164 40 60 3 1 1 0 0 LFH
000001008f450000 00001002 3124 1076 3124 1073 3 3 0 0
External fragmentation 99 % (3 free blocks)
-------------------------------------------------------------------------------------
Can anybody explain why the WinDbg !heap -s command and the DebugDiag report differ, or whether I have incorrect knowledge of the above command?
I also used a Pykd script to dump native object stats, which does not show a very large number of objects.
Also, what is meant by "External fragmentation 99 % (3 free blocks)" in the above output? I understand that fragmented memory has fewer large blocks of contiguous memory, but I fail to relate that to the percentage.
Edit 1:
Application 2:
0:000> !address -summary
Mapping file section regions...
Mapping module regions...
Mapping PEB regions...
Mapping TEB and stack regions...
Mapping heap regions...
Mapping page heap regions...
Mapping other regions...
Mapping stack trace database regions...
Mapping activation context regions...
--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
Free 363 7ffb`04a14000 ( 127.981 Tb) 99.98%
<unknown> 952 4`e8c0c000 ( 19.637 Gb) 98.54% 0.01%
Image 2122 0`0e08d000 ( 224.551 Mb) 1.10% 0.00%
Heap 88 0`03372000 ( 51.445 Mb) 0.25% 0.00%
Stack 124 0`013c0000 ( 19.750 Mb) 0.10% 0.00%
Other 7 0`001be000 ( 1.742 Mb) 0.01% 0.00%
TEB 41 0`00052000 ( 328.000 kb) 0.00% 0.00%
PEB 1 0`00001000 ( 4.000 kb) 0.00% 0.00%
--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_PRIVATE 643 4`ebf44000 ( 19.687 Gb) 98.79% 0.02%
MEM_IMAGE 2655 0`0eb96000 ( 235.586 Mb) 1.15% 0.00%
MEM_MAPPED 37 0`00b02000 ( 11.008 Mb) 0.05% 0.00%
--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE 363 7ffb`04a14000 ( 127.981 Tb) 99.98%
MEM_RESERVE 725 3`b300d000 ( 14.797 Gb) 74.25% 0.01%
MEM_COMMIT 2610 1`485cf000 ( 5.131 Gb) 25.75% 0.00%
--- Protect Summary (for commit) - RgnCount ----------- Total Size -------- %ofBusy %ofTotal
PAGE_READWRITE 868 1`3939d000 ( 4.894 Gb) 24.56% 0.00%
PAGE_EXECUTE_READ 157 0`09f10000 ( 159.063 Mb) 0.78% 0.00%
PAGE_READONLY 890 0`035ed000 ( 53.926 Mb) 0.26% 0.00%
PAGE_WRITECOPY 433 0`0149c000 ( 20.609 Mb) 0.10% 0.00%
PAGE_EXECUTE_READWRITE 148 0`0065d000 ( 6.363 Mb) 0.03% 0.00%
PAGE_EXECUTE_WRITECOPY 67 0`0017c000 ( 1.484 Mb) 0.01% 0.00%
PAGE_READWRITE|PAGE_GUARD 41 0`000b9000 ( 740.000 kb) 0.00% 0.00%
PAGE_NOACCESS 4 0`00004000 ( 16.000 kb) 0.00% 0.00%
PAGE_EXECUTE 2 0`00003000 ( 12.000 kb) 0.00% 0.00%
--- Largest Region by Usage ----------- Base Address -------- Region Size ----------
Free 101`070a0000 7ef6`e77b2000 ( 126.964 Tb)
<unknown> fd`72f14000 0`d156c000 ( 3.271 Gb)
Image 7ff9`91344000 0`012e8000 ( 18.906 Mb)
Heap 100`928a0000 0`00544000 ( 5.266 Mb)
Stack fc`43240000 0`0007b000 ( 492.000 kb)
Other fc`42ea0000 0`00181000 ( 1.504 Mb)
TEB 7ff7`ee852000 0`00002000 ( 8.000 kb)
PEB 7ff7`eeaaf000 0`00001000 ( 4.000 kb)
Application Three:
0:000> !address -summary
--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
Free 323 7ffb`9f8ea000 ( 127.983 Tb) 99.99%
<unknown> 832 4`4bbb6000 ( 17.183 Gb) 98.15% 0.01%
Image 2057 0`0e5ab000 ( 229.668 Mb) 1.28% 0.00%
Heap 196 0`04f52000 ( 79.320 Mb) 0.44% 0.00%
Stack 127 0`01440000 ( 20.250 Mb) 0.11% 0.00%
Other 7 0`001be000 ( 1.742 Mb) 0.01% 0.00%
TEB 42 0`00054000 ( 336.000 kb) 0.00% 0.00%
PEB 1 0`00001000 ( 4.000 kb) 0.00% 0.00%
--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_PRIVATE 783 4`51099000 ( 17.266 Gb) 98.63% 0.01%
MEM_IMAGE 2444 0`0ec06000 ( 236.023 Mb) 1.32% 0.00%
MEM_MAPPED 35 0`00a67000 ( 10.402 Mb) 0.06% 0.00%
--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE 323 7ffb`9f8ea000 ( 127.983 Tb) 99.99%
MEM_RESERVE 621 3`e3504000 ( 15.552 Gb) 88.83% 0.01%
MEM_COMMIT 2641 0`7d202000 ( 1.955 Gb) 11.17% 0.00%
--- Protect Summary (for commit) - RgnCount ----------- Total Size -------- %ofBusy %ofTotal
PAGE_READWRITE 919 0`6dc07000 ( 1.715 Gb) 9.80% 0.00%
PAGE_EXECUTE_READ 153 0`0a545000 ( 165.270 Mb) 0.92% 0.00%
PAGE_READONLY 734 0`02cf5000 ( 44.957 Mb) 0.25% 0.00%
PAGE_WRITECOPY 470 0`01767000 ( 23.402 Mb) 0.13% 0.00%
PAGE_EXECUTE_READWRITE 240 0`009cf000 ( 9.809 Mb) 0.05% 0.00%
PAGE_EXECUTE_WRITECOPY 76 0`001c5000 ( 1.770 Mb) 0.01% 0.00%
PAGE_READWRITE|PAGE_GUARD 42 0`000be000 ( 760.000 kb) 0.00% 0.00%
PAGE_NOACCESS 5 0`00005000 ( 20.000 kb) 0.00% 0.00%
PAGE_EXECUTE 2 0`00003000 ( 12.000 kb) 0.00% 0.00%
--- Largest Region by Usage ----------- Base Address -------- Region Size ----------
Free 52`892e0000 7fa5`65548000 ( 127.646 Tb)
<unknown> 4f`4ec81000 0`e9c3f000 ( 3.653 Gb)
Image 7ff9`91344000 0`012e8000 ( 18.906 Mb)
Heap 52`8833b000 0`00fa4000 ( 15.641 Mb)
Stack 4e`37a70000 0`0007b000 ( 492.000 kb)
Other 4e`37720000 0`00181000 ( 1.504 Mb)
TEB 7ff7`ee828000 0`00002000 ( 8.000 kb)
PEB 7ff7`eea43000 0`00001000 ( 4.000 kb)