Kuwahara filter with performance issues - optimization

While implementing an edge-preserving filter similar to ImageJ's Kuwahara filter, which assigns each pixel the mean of the area with the smallest deviation around it, I'm running into performance issues.
Counterintuitively, computing the means and standard deviations into separate matrices is fast compared to the final step of re-sorting them into the output array. The ImageJ implementation mentioned above does seem to expect about 70% of the total processing time for this step, though.
Given two arrays, means and stds, which are larger than the output array 'res' by two kernel sizes p (i.e. 2p) along each axis, I want to assign each pixel the mean of the area with the smallest deviation:
import numpy as np
# vector to the middle of the surrounding area (approx.); integer division keeps the indices integral
p2 = p // 2
# x and y components of the vector to each quadrant
index2quadrant = np.array([[1, 1, -1, -1], [1, -1, 1, -1]]) * p2
Iterate over all pixels of the output array of shape (asize, asize):
for col in np.arange(asize) + p:
    for row in np.arange(asize) + p:
        # Search for the minimum std dev in the 4 quadrants around the current coordinate,
        # and use the corresponding index to assign the previously computed mean
        minidx = np.argmin(stds[index2quadrant[0] + col, index2quadrant[1] + row])
        # assign mean
        res[col - p, row - p] = means[index2quadrant[0, minidx] + col, index2quadrant[1, minidx] + row]
The Python profiler gives the following results when filtering a 1024x1024 array with an 8x8 pixel kernel:
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   30.024   30.024 <string>:1(<module>)
  1048576    2.459    0.000    4.832    0.000 fromnumeric.py:740(argmin)
        1   23.720   23.720   30.024   30.024 kuwahara.py:4(kuwahara)
        2    0.000    0.000    0.012    0.006 numeric.py:65(zeros_like)
        2    0.000    0.000    0.000    0.000 {math.log}
  1048576    2.373    0.000    2.373    0.000 {method 'argmin' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.012    0.006    0.012    0.006 {method 'fill' of 'numpy.ndarray' objects}
     8256    0.189    0.000    0.189    0.000 {method 'mean' of 'numpy.ndarray' objects}
    16512    0.524    0.000    0.524    0.000 {method 'reshape' of 'numpy.ndarray' objects}
     8256    0.730    0.000    0.730    0.000 {method 'std' of 'numpy.ndarray' objects}
     1042    0.012    0.000    0.012    0.000 {numpy.core.multiarray.arange}
        1    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
        2    0.000    0.000    0.000    0.000 {numpy.core.multiarray.empty_like}
        2    0.003    0.002    0.003    0.002 {numpy.core.multiarray.zeros}
        8    0.002    0.000    0.002    0.000 {zip}
To me, this gives little indication of where the time is lost (NumPy?), since apart from argmin, the time spent in the individual listed calls seems negligible.
Do you have any suggestions on how to improve performance?
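One possible direction, sketched here as an assumption rather than taken from the post: the profile shows about 23.7 s of tottime inside kuwahara itself, which most likely is the per-pixel Python loop and fancy indexing rather than any single NumPy call. The selection step can instead be done on whole arrays by stacking the four shifted quadrant views and calling np.argmin once along the stacking axis. The function name select_quadrant_means is hypothetical; it assumes means and stds have shape (asize + 2p, asize + 2p), that axis 0 plays the role of col and axis 1 the role of row as in the loop, and it uses p // 2 for the quadrant offset.

```python
import numpy as np


def select_quadrant_means(means, stds, p, asize):
    """Vectorized sketch of the per-pixel quadrant selection loop.

    Assumes means and stds have shape (asize + 2*p, asize + 2*p) and that
    axis 0 corresponds to `col` and axis 1 to `row`, as in the loop.
    """
    p2 = p // 2
    # Same quadrant offsets as index2quadrant: column k is the (axis-0, axis-1) offset
    offsets = np.array([[1, 1, -1, -1], [1, -1, 1, -1]]) * p2

    def window(arr, o0, o1):
        # Shifted (asize, asize) view: element [i, j] is arr[p + o0 + i, p + o1 + j]
        return arr[p + o0:p + o0 + asize, p + o1:p + o1 + asize]

    # Stack the four shifted views along a new leading axis -> shape (4, asize, asize)
    std_stack = np.stack([window(stds, o0, o1) for o0, o1 in offsets.T])
    mean_stack = np.stack([window(means, o0, o1) for o0, o1 in offsets.T])

    # Index of the quadrant with the smallest std dev, for every output pixel at once
    minidx = np.argmin(std_stack, axis=0)
    # Gather the matching mean for every pixel in a single call
    return np.take_along_axis(mean_stack, minidx[np.newaxis], axis=0)[0]
```

Usage would be res = select_quadrant_means(means, stds, p, asize). Ties are resolved the same way as in the loop, because np.argmin returns the first minimum and the views are stacked in the same quadrant order as index2quadrant.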

Related

Find first and last positive value of every season over 50 years

I've seen some similar questions but can't figure out how to handle my problem.
I have a dataset with daily total snow values from 1970 to 2015.
Now I want to find out the first and the last day with snow.
I want to do this for every season.
A season runs, for example, from 01.06.2000 to 30.05.2001; this season is then Season 2000/2001.
I have already set my date column as the index (format year-month-day, e.g. 2006-04-24).
When I select a specific range with
df_s = df["2006-04-04":"2006-04-15"]
I am able to find the first and last day with snow in this period with
firstsnow = df_s[df_s['Height'] > 0].head(1)
lastsnow = df_s[df_s['Height'] > 0].tail(1)
Now I want to do this for the whole dataset, so that I can compare the seasons and see how the date of the first snow has changed.
My DataFrame looks like this (a selected period is shown); Height is the snow height and Diff is the difference from the previous day. Both Height and Diff are float64.
            Height     Diff
Date
2006-04-04   0.000      NaN
2006-04-05   0.000    0.000
2006-04-06   0.000    0.000
2006-04-07  16.000   16.000
2006-04-08   6.000  -10.000
2006-04-09   0.001   -5.999
2006-04-10   0.000   -0.001
2006-04-11   0.000    0.000
2006-04-12   0.000    0.000
2006-04-13   0.000    0.000
2006-04-14   0.000    0.000
2006-04-15   0.000    0.000
(12, 2)
<class 'pandas.core.frame.DataFrame'>
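I think I have to work with the groupby function, but I don't know how to apply it in this case.
You can use a trick: create a new column that holds only the positive values and a missing value otherwise. Then use ffill and bfill within each group, so that the first row of each group carries the first positive value and the last row carries the last one (the head and tail).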
Sample data:
import numpy as np
import pandas as pd
df = pd.DataFrame({'name': ['a1', 'a2', 'a3', 'a4', 'a5', 'b1', 'b2', 'b3', 'b4', 'b5'],
                   'gr': [1]*5 + [2]*5,
                   'val1': [None, -1, 2, 1, None, -1, 4, 7, 3, -2]})
Input:
   name  gr  val1
0    a1   1   NaN
1    a2   1  -1.0
2    a3   1   2.0
3    a4   1   1.0
4    a5   1   NaN
5    b1   2  -1.0
6    b2   2   4.0
7    b3   2   7.0
8    b4   2   3.0
9    b5   2  -2.0
Keep only the positive values, then forward- and backward-fill them within each group:
df['positive'] = df['val1'].where(df['val1'] > 0)
df['positive'] = df.groupby('gr')['positive'].ffill()
df['positive'] = df.groupby('gr')['positive'].bfill()
Check the result:
df.groupby('gr').head(1)
df.groupby('gr').tail(1)
   name  gr  val1  positive
0    a1   1   NaN       2.0
5    b1   2  -1.0       4.0
   name  gr  val1  positive
4    a5   1   NaN       1.0
9    b5   2  -2.0       3.0
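Applied to the snow data from the question, the season split can also be expressed without the fill trick: label every date with the season it belongs to (June to May) and take the earliest and latest snowy date per label. This is a sketch under assumptions, not the answer above: first_last_snow is a hypothetical helper, and it assumes a DatetimeIndex and a Height column as shown in the question. Seasons without any snow simply do not appear in the result.

```python
import numpy as np
import pandas as pd


def first_last_snow(df, col="Height", start_month=6):
    """First and last snowy date per season (hypothetical helper).

    Assumes a DatetimeIndex and seasons running from the 1st of
    `start_month` to the end of the previous month one year later.
    """
    idx = df.index
    # Dates before start_month (June by default) belong to the season that started the year before
    start_year = np.where(idx.month >= start_month, idx.year, idx.year - 1)
    season = pd.Series([f"{y}/{y + 1}" for y in start_year], index=idx)

    snowy = df[col] > 0                       # boolean Series aligned on the dates
    dates = pd.Series(idx, index=idx)[snowy]  # keep only the days with snow
    grouped = dates.groupby(season[snowy])
    return pd.DataFrame({"first_snow": grouped.min(), "last_snow": grouped.max()})
```

Calling first_last_snow(df) returns one row per season label such as "2000/2001", which makes it straightforward to compare how the first_snow date shifts over the years.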

comparison between two dataframes and find highest difference

I have two DataFrames, df1 and df2. Both are indexed by the same [i_batch, i_example] MultiIndex.
The columns are different RMSE errors. I would like to find the [i_batch, i_example] pairs where df1 is a lot lower than df2, or simply the rows where df1 has a smaller error than df2 for the common [i_batch, i_example].
Note that a specific [i_batch, i_example] may occur in only one of df1 or df2, but I only need to consider the pairs that exist in both df1 and df2.
df1 =
                   rmse_ACCELERATION  rmse_CENTER_X  rmse_CENTER_Y  rmse_HEADING  rmse_LENGTH  rmse_TURN_RATE  rmse_VELOCITY  rmse_WIDTH
i_batch i_example
0       0.0                    1.064          1.018          0.995         0.991        1.190           0.967          1.029       1.532
1       0.0                    1.199          1.030          1.007         1.048        1.278           0.967          1.156       1.468
        1.0                    1.101          1.026          1.114         2.762        0.967           0.967          1.083       1.186
2       0.0                    1.681          1.113          1.090         1.001        1.670           0.967          1.205       1.160
        1.0                    1.637          1.122          1.183         0.987        1.521           0.967          1.191       1.278
        2.0                    1.252          1.035          1.035         2.507        1.108           0.967          1.210       1.595
3       0.0                    1.232          1.014          1.019         1.627        1.143           0.967          1.080       1.583
        1.0                    1.195          1.028          1.019         1.151        1.097           0.967          1.071       1.549
        2.0                    1.233          1.010          1.004         1.616        1.135           0.967          1.082       1.573
        3.0                    1.179          1.017          1.014         1.368        1.132           0.967          1.099       1.518
and
df2 =
                   rmse_ACCELERATION  rmse_CENTER_X  rmse_CENTER_Y  rmse_HEADING  rmse_LENGTH  rmse_TURN_RATE  rmse_VELOCITY  rmse_WIDTH
i_batch i_example
1       0.0                    0.071          0.034          0.048         0.114        0.006       1.309e-03          0.461       0.004
        1.0                    0.052          0.055          0.062         2.137        0.023       8.232e-04          0.357       0.011
2       0.0                    1.665          0.156          0.178         0.112        0.070       3.751e-03          2.326       0.016
        1.0                    0.880          0.210          0.088         0.055        0.202       1.449e-03          0.899       0.047
        2.0                    0.199          0.072          0.078         1.686        0.010       6.240e-04          0.239       0.008
3       0.0                    0.332          0.068          0.097         1.211        0.022       5.127e-04          0.167       0.016
        1.0                    0.252          0.075          0.070         0.368        0.013       5.295e-04          0.136       0.008
        2.0                    0.268          0.067          0.064         1.026        0.010       5.564e-04          0.175       0.010
        3.0                    0.171          0.051          0.054         0.473        0.011       4.150e-04          0.220       0.009
5       0.0                    0.014          0.099          0.119         0.389        0.123       3.846e-04          0.313       0.037
For instance, how can I get the [i_batch, i_example] pairs where `df1['rmse_ACCELERATION'] < df2['rmse_ACCELERATION']`?
Do a merge on the index and then filter according to your needs:
# the default inner join keeps only the [i_batch, i_example] pairs present in both
df_merge = df1.merge(df2,
                     left_index=True,
                     right_index=True,
                     suffixes=('_1', '_2'))
df_merge[
    df_merge['rmse_ACCELERATION_1'] < df_merge['rmse_ACCELERATION_2']
].index
However, I don't see any records with the same [i_batch, i_example] in both DataFrames that pass this condition.
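Use .sub(), which aligns both DataFrames on their index and subtracts the matching rows; rows that exist in only one of them become NaN and drop out of the filter.
df3 = df1.sub(df2)
df3[(df3 < 0).any(axis=1)]
Or go specific and select the matching rows of df1 itself:
df1[(df1.sub(df2) < 0).any(axis=1)]
For the sample data, the df3 filter returns a single row, (2, 0.0), where only rmse_VELOCITY is lower in df1 than in df2: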
                   rmse_ACCELERATION  rmse_CENTER_X  rmse_CENTER_Y  \
i_batch i_example
2       0.0                    0.016          0.957          0.912
                   rmse_HEADING  rmse_LENGTH  rmse_TURN_RATE  rmse_VELOCITY  \
i_batch i_example
2       0.0               0.889          1.6        0.963249         -1.121
                   rmse_WIDTH
i_batch i_example
2       0.0             1.144
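For the "a lot lower" part of the question, one option is to restrict both frames to their common index explicitly and compare the difference against a threshold. This is a minimal sketch; rows_where_lower and its margin parameter are hypothetical names, not part of either answer.

```python
import pandas as pd


def rows_where_lower(df1, df2, column, margin=0.0):
    """Index pairs present in both frames where df1[column] is lower than
    df2[column] by more than `margin` (hypothetical helper and threshold)."""
    common = df1.index.intersection(df2.index)   # only pairs present in both frames
    diff = df2.loc[common, column] - df1.loc[common, column]
    return diff[diff > margin].index
```

On the sample data, rows_where_lower(df1, df2, 'rmse_VELOCITY') should return only (2, 0.0), matching the .sub() result above. For a relative criterion, the absolute difference could be replaced by a ratio such as diff / df2.loc[common, column].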

resnet50 (bs 128) training steps are not showing up on Mac (Darwin Kernel Version 18.2.0), but show up when training on Linux (Ubuntu 18.04)

I used the command below on both Linux and Mac; there is no problem on Linux:
python tf_cnn_benchmarks.py --model=resnet50_v2 --num_inter_threads=2 --batch_size=128 --num_batches=3000 --data_format NCHW --train_dir /tmp/output_dir --data_name=imagenet --data_dir /xxx/xxxx/xxxx/ --datasets_use_prefetch=False --save_model_steps=10000 --print_training_accuracy=True --summary_verbosity=2 --eval_during_training_every_n_epochs=1 --num_learning_rate_warmup_epochs=5 --num_epochs_per_decay=30 --learning_rate_decay_factor=0.1 --variable_update=parameter_server --optimizer=momentum --init_learning_rate=0.05 --num_eval_epochs=1 --save_summaries_steps=50
On my Mac, the training log looks like this:
TensorFlow: 1.14
Model: resnet50_v2
Dataset: imagenet
Mode: training + evaluation
SingleSess: False
Batch size: 128 global
128 per device
Num batches: x000
Num epochs: 0.x0
Devices: ['/gpu:0']
NUMA bind: False
Data format: NCHW
Optimizer: momentum
Variables: parameter_server
==========
Generating training model
Generating evaluation model
Initializing graph
Running warm up
Done with training
Running final evaluation at global_step xx10
10 12.7 examples/sec
20 13.1 examples/sec
The issue is that before "Done with training", it is supposed to show the training steps like below, but it does not.
Running warm up
Done warm up
Step Img/sec total_loss top_1_accuracy top_5_accuracy
1 images/sec: 13.1 +/- 0.0 (jitter = 0.0) 7.493 0.000 0.000
10 images/sec: 31.6 +/- 4.5 (jitter = 14.2) 7.504 0.000 0.000
20 images/sec: 31.9 +/- 2.7 (jitter = 10.7) 7.490 0.000 0.008
30 images/sec: 32.9 +/- 2.2 (jitter = 10.1) 7.515 0.000 0.000
40 images/sec: 33.4 +/- 2.0 (jitter = 12.1) 7.495 0.016 0.016
....
On Linux, the logs are as expected:
TensorFlow: 1.14
Model: resnet50_v2
Dataset: imagenet
Mode: training + evaluation
SingleSess: False
Batch size: 128 global
128 per device
Num batches: x000
Num epochs: 0.x0
Devices: ['/gpu:0']
Data format: NCHW
Optimizer: momentum
Variables: parameter_server
==========
Generating training model
Generating evaluation model
Initializing graph
Running warm up
Done warm up
Step Img/sec total_loss top_1_accuracy top_5_accuracy
1 images/sec: 13.1 +/- 0.0 (jitter = 0.0) 7.493 0.000 0.000
10 images/sec: 31.6 +/- 4.5 (jitter = 14.2) 7.504 0.000 0.000
20 images/sec: 31.9 +/- 2.7 (jitter = 10.7) 7.490 0.000 0.008
30 images/sec: 32.9 +/- 2.2 (jitter = 10.1) 7.515 0.000 0.000
40 images/sec: 33.4 +/- 2.0 (jitter = 12.1) 7.495 0.016 0.016
50 images/sec: 32.7 +/- 1.8 (jitter = 9.3) 7.495 0.000 0.000
60 images/sec: 33.5 +/- 1.6 (jitter = 9.4) 7.536 0.000 0.000
70 images/sec: 35.3 +/- 1.4 (jitter = 9.7) 7.521 0.000 0.000
.....
Thanks a lot!

Restart cumulative sum in pandas dataframe

I am trying to compute a cumulative sum in a pandas DataFrame that restarts every time its absolute value exceeds 0.009. I could give you an excerpt of my attempts, but I assume they would just distract you. I have tried several things with np.where, but at a certain point they start to overlap and remove the wrong values.
Column b is the desired output.
import pandas as pd
df = pd.DataFrame({'values': (49.925, 49.928, 49.945, 49.928, 49.925, 49.935, 49.938, 49.942, 49.931, 49.952)})
df['a'] = df['values'].diff()
   values       a       b
0  49.925     NaN   0.000
1  49.928   0.003   0.003
2  49.945   0.017   0.020  (restart cumsum next row)
3  49.928  -0.017  -0.017  (restart cumsum next row)
4  49.925  -0.003  -0.003
5  49.935   0.010   0.007
6  49.938   0.003   0.010  (restart cumsum next row)
7  49.942   0.004   0.004
8  49.931  -0.011  -0.007
9  49.952   0.021   0.014  (restart cumsum next row)
So the actual goal is to restart the cumulative sum whenever its absolute value exceeds 0.009.
I couldn't solve this in a vectorized manner; however, applying a stateful function appears to work.
import pandas as pd
print(pd.__version__)
df = pd.DataFrame({'values': (49.925, 49.928, 49.945, 49.928, 49.925, 49.935, 49.938, 49.942, 49.931, 49.952)})
df['a'] = df['values'].diff()
# state shared between successive calls of myfunc
accumulator = 0.0
reset = False
def myfunc(x):
    global accumulator, reset
    # start a new running sum if the previous value crossed the threshold
    if reset:
        accumulator = 0.0
        reset = False
    accumulator += x
    if abs(accumulator) > .009:
        reset = True
    return accumulator
df['a'] = df['a'].fillna(0)
df['b'] = df['a'].apply(myfunc)
print(df)
Produces
0.24.2
   values       a       b
0  49.925   0.000   0.000
1  49.928   0.003   0.003
2  49.945   0.017   0.020
3  49.928  -0.017  -0.017
4  49.925  -0.003  -0.003
5  49.935   0.010   0.007
6  49.938   0.003   0.010
7  49.942   0.004   0.004
8  49.931  -0.011  -0.007
9  49.952   0.021   0.014
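The same stateful logic can also be written without global variables, as a plain loop that keeps the running total in a local variable. This is a sketch; restart_cumsum and its threshold parameter are hypothetical names. It still iterates element by element, so it is not vectorized either, but it is self-contained and can be reused on any column.

```python
import numpy as np
import pandas as pd


def restart_cumsum(values, threshold=0.009):
    """Running sum that starts over after its absolute value exceeds threshold."""
    out = np.empty(len(values))
    total = 0.0
    for i, x in enumerate(values):
        total += x
        out[i] = total
        if abs(total) > threshold:
            total = 0.0  # the next element starts a fresh sum
    return out


df = pd.DataFrame({'values': (49.925, 49.928, 49.945, 49.928, 49.925,
                              49.935, 49.938, 49.942, 49.931, 49.952)})
df['a'] = df['values'].diff().fillna(0)
df['b'] = restart_cumsum(df['a'].to_numpy())
print(df)
```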

Pandas DataFrame Transpose and Matrix Multiplication

I am looking for a way to perform a matrix multiplication on two sets of columns in a DataFrame. One set of columns needs to be transposed and then multiplied with the other set. Then I need to take the resulting matrix, compute its element-wise product with another matrix of scalars, and sum the result. Below is an example:
Data for testing:
import pandas as pd
import numpy as np
dftest = pd.DataFrame(data=[['A',0.18,0.25,0.36,0.21,0,0.16,0.16,0.64,0.04,0,0],['B',0,0,0.5,0.5,0,0,0,0.25,0.75,0,0]],columns = ['Ticker','f1','f2','f3','f4','f5','p1','p2','p3','p4','p5','multiplier'])
The starting DataFrame holds the data for each ticker. f1 through f5 represent one set of categories and p1 through p5 represent another.
dftest
Out[276]:
   Ticker    f1    f2    f3    f4  f5    p1    p2    p3    p4  p5  multiplier
0       A  0.18  0.25  0.36  0.21   0  0.16  0.16  0.64  0.04   0           0
1       B  0.00  0.00  0.50  0.50   0  0.00  0.00  0.25  0.75   0           0
For each row, I need to transpose columns p1 through p5 and then multiply them by columns f1 through f5. I think I have found the solution with the following:
dftest.groupby('Ticker')[['f1','f2','f3','f4','f5','p1','p2','p3','p4','p5']].apply(lambda x: x[['p1','p2','p3','p4','p5']].T.dot(x[['f1','f2','f3','f4','f5']]))
Out[408]:
               f1    f2      f3      f4   f5
Ticker
A      p1  0.0288  0.04  0.0576  0.0336  0.0
       p2  0.0288  0.04  0.0576  0.0336  0.0
       p3  0.1152  0.16  0.2304  0.1344  0.0
       p4  0.0072  0.01  0.0144  0.0084  0.0
       p5  0.0000  0.00  0.0000  0.0000  0.0
B      p1  0.0000  0.00  0.0000  0.0000  0.0
       p2  0.0000  0.00  0.0000  0.0000  0.0
       p3  0.0000  0.00  0.1250  0.1250  0.0
       p4  0.0000  0.00  0.3750  0.3750  0.0
       p5  0.0000  0.00  0.0000  0.0000  0.0
Next, I need to take the element-wise product of the above matrix with another 5x5 matrix (m, stored in another DataFrame) and then sum the columns or rows (the total is the same either way). If I extend the above statement as follows, I get the result I want:
dftest.groupby('Ticker')[['f1','f2','f3','f4','f5','p1','p2','p3','p4','p5']].apply(lambda x: pd.DataFrame(m.values * x[['p1','p2','p3','p4','p5']].T.dot(x[['f1','f2','f3','f4','f5']]).values, columns = m.columns, index = m.index).sum().sum())
Out[409]:
Ticker
A 2.7476
B 1.6250
dtype: float64
So far so good, I think, though I'd be happy to learn a better and faster way to do this. The next question is where I am stuck:
How do I take this and update the "multiplier" column in my original DataFrame?
If I try the following:
dftest['multiplier'] = dftest.groupby('Ticker')[['f1','f2','f3','f4','f5','p1','p2','p3','p4','p5']].apply(lambda x: pd.DataFrame(m.values * x[['p1','p2','p3','p4','p5']].T.dot(x[['f1','f2','f3','f4','f5']]).values, columns = m.columns, index = m.index).sum().sum())
I get NaNs in the multiplier column.
dftest
Out[407]:
   Ticker    f1    f2    f3    f4  f5    p1    p2    p3    p4  p5  multiplier
0       A  0.18  0.25  0.36  0.21   0  0.16  0.16  0.64  0.04   0         NaN
1       B  0.00  0.00  0.50  0.50   0  0.00  0.00  0.25  0.75   0         NaN
I suspect it has to do with indexing and whether the indices after grouping translate back to the original DataFrame. Second, do I need a groupby statement for this at all? Since it is a row-by-row calculation, can't I just do it without grouping, or group by the index? Any suggestions on that?
I need to do this without iterating row by row because the whole code will run repeatedly as part of an optimization I have to do. So I need to run this whole process, look at the results and, if they are outside some constraints, calculate new f1 through f5 and p1 through p5 and run the whole thing again.
I posted a question on this earlier, but it was confusing, so this is a second attempt. Hope it makes sense.
Thanks in advance for all your help.
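Because, for each row, the summed element-wise product of m with the outer product of p and f equals the bilinear form p^T m f, all rows can be computed at once without groupby, for example with np.einsum. The result is a plain NumPy array with one value per row, so assigning it to dftest['multiplier'] is positional. The groupby result, by contrast, is indexed by Ticker ('A', 'B') rather than by the row labels 0 and 1, so the assignment aligns on labels that do not match and yields NaN. This is a sketch: row_multipliers is a hypothetical helper, and the m used below is a made-up stand-in for the real 5x5 matrix, whose rows are assumed to correspond to p1..p5 and whose columns to f1..f5, as in the groupby expression.

```python
import numpy as np
import pandas as pd


def row_multipliers(df, m):
    """Per-row sum of m * outer(p, f), i.e. p^T m f, for every row at once."""
    P = df[['p1', 'p2', 'p3', 'p4', 'p5']].to_numpy(dtype=float)  # shape (n, 5)
    F = df[['f1', 'f2', 'f3', 'f4', 'f5']].to_numpy(dtype=float)  # shape (n, 5)
    M = np.asarray(m, dtype=float)                                 # shape (5, 5)
    # For each row n: sum over i, j of P[n, i] * M[i, j] * F[n, j]
    return np.einsum('ni,ij,nj->n', P, M, F)


dftest = pd.DataFrame(data=[['A', 0.18, 0.25, 0.36, 0.21, 0, 0.16, 0.16, 0.64, 0.04, 0, 0],
                            ['B', 0, 0, 0.5, 0.5, 0, 0, 0, 0.25, 0.75, 0, 0]],
                      columns=['Ticker', 'f1', 'f2', 'f3', 'f4', 'f5',
                               'p1', 'p2', 'p3', 'p4', 'p5', 'multiplier'])
m = np.arange(25, dtype=float).reshape(5, 5)  # hypothetical stand-in for the real matrix
dftest['multiplier'] = row_multipliers(dftest, m)
print(dftest)
```

Since the function only uses whole-column array operations, it can be rerun cheaply inside the optimization loop whenever new f1 through f5 and p1 through p5 values are computed.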