I have this dataset, where the first column is a name enclosed in quotes. Is it possible to capture the name as a single field?
"Mazda RX4" 21 6 160 110 3.9 2.62 16.46 0 1 4 4
"Mazda RX4 Wag" 21 6 160 110 3.9 2.875 17.02 0 1 4 4
"Datsun 710" 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
"Hornet 4 Drive" 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
"Hornet Sportabout" 18.7 8 360 175 3.15 3.44 17.02 0 0 3 2
"Valiant" 18.1 6 225 105 2.76 3.46 20.22 1 0 3 1
"Duster 360" 14.3 8 360 245 3.21 3.57 15.84 0 0 3 4
"Merc 240D" 24.4 4 146.7 62 3.69 3.19 20 1 0 4 2
"Merc 230" 22.8 4 140.8 95 3.92 3.15 22.9 1 0 4 2
"Merc 280" 19.2 6 167.6 123 3.92 3.44 18.3 1 0 4 4
Note that sometimes the name is a single field (like "Valiant"), sometimes two (like "Mazda RX4") or three ("Mazda RX4 Wag").
Based on the number of fields, I came up with this awk code that works as I wanted, but I wonder whether there is a more systematic way to do it?
awk '{name=$1; for (i=2; i<=NF-11; i++) name=name " " $i; printf "%s\n", name}' data/mtcars.dat | head
Mazda RX4
Mazda RX4 Wag
Datsun 710
Hornet 4 Drive
Hornet Sportabout
Valiant
Duster 360
Merc 240D
Merc 230
Merc 280
You could use " as the input field separator. That would assign an empty field to $1, the full name to $2, and the rest of the line to $3.
$ awk 'BEGIN{FS="\""}{print $2}' < test.dat
Mazda RX4
Mazda RX4 Wag
Datsun 710
Hornet 4 Drive
Hornet Sportabout
Valiant
Duster 360
Merc 240D
Merc 230
Merc 280
Just to make it as short as possible:
awk -F\" '$0=$2' file
Mazda RX4
Mazda RX4 Wag
Datsun 710
Hornet 4 Drive
Hornet Sportabout
Valiant
Duster 360
Merc 240D
Merc 230
Merc 280
Or, somewhat more robust (this form also prints lines where the name would evaluate as false, such as an empty string or 0):
awk -F\" '{$0=$2}1' file
Or, using GNU awk's FPAT to define fields as runs of non-quote characters:
awk NF=1 FPAT='[^"]+' file
Result
Mazda RX4
Mazda RX4 Wag
Datsun 710
Hornet 4 Drive
Hornet Sportabout
Valiant
Duster 360
Merc 240D
Merc 230
Merc 280
I have this data frame:
ID Date X 123_Var 456_Var 789_Var
A 16-07-19 3 777 250 810
A 17-07-19 9 637 121 529
A 20-07-19 2 295 272 490
A 21-07-19 3 778 600 544
A 22-07-19 6 741 792 907
B 01-07-19 4 509 690 406
B 03-07-19 2 413 725 414
B 04-07-19 2 170 702 912
B 09-08-19 3 851 616 477
B 10-08-19 9 475 447 555
B 11-08-19 1 412 403 708
B 12-08-19 2 299 537 321
B 13-08-19 4 310 119 125
C 14-08-19 4 912 755 657
C 15-08-19 4 586 771 394
C 17-08-19 2 500 528 764
C 18-08-19 1 982 383 654
C 20-08-19 3 336 691 496
C 21-08-19 3 206 433 263
C 22-08-19 2 373 319 111
D 10-12-18 2 170 702 912
E 10-12-18 2 912 755 657
E 14-12-18 2 373 319 111
I want to shift the values in each of the 123_Var, 456_Var and 789_Var columns.
A value should be shifted only if there is a one-day difference from the previous row; otherwise NaN should remain.
The shifting should be applied for each ID separately (with groupby).
Expected result:
ID Date X 123_Var 456_Var 789_Var 123_Var_S 456_Var_S 789_Var_S
A 16-07-19 3 777 250 810 NaN NaN NaN
A 17-07-19 9 637 121 529 777.0 250.0 810.0
A 20-07-19 2 295 272 490 NaN NaN NaN
A 21-07-19 3 778 600 544 295.0 272.0 490.0
A 22-07-19 6 741 792 907 778.0 600.0 544.0
B 01-07-19 4 509 690 406 NaN NaN NaN
B 03-07-19 2 413 725 414 NaN NaN NaN
B 04-07-19 2 170 702 912 413.0 725.0 414.0
B 09-08-19 3 851 616 477 NaN NaN NaN
B 10-08-19 9 475 447 555 851.0 616.0 477.0
B 11-08-19 1 412 403 708 475.0 447.0 555.0
B 12-08-19 2 299 537 321 412.0 403.0 708.0
B 13-08-19 4 310 119 125 299.0 537.0 321.0
C 14-08-19 4 912 755 657 NaN NaN NaN
C 15-08-19 4 586 771 394 912.0 755.0 657.0
C 17-08-19 2 500 528 764 NaN NaN NaN
C 18-08-19 1 982 383 654 500.0 528.0 764.0
C 20-08-19 3 336 691 496 NaN NaN NaN
C 21-08-19 3 206 433 263 336.0 691.0 496.0
C 22-08-19 2 373 319 111 206.0 433.0 263.0
D 10-12-18 2 170 702 912 NaN NaN NaN
E 10-12-18 2 912 755 657 NaN NaN NaN
E 14-12-18 2 373 319 111 NaN NaN NaN
IIUC, we can groupby, apply a filter, and use .loc along with shift to assign your values:
import pandas as pd
import numpy as np

# parse the dates, then flag rows that are exactly one day after the previous row within the same ID
df['Date'] = df['Date'].apply(pd.to_datetime, format='%d-%m-%y')
s = df.groupby('ID')['Date'].apply(lambda x: (x - x.shift()).eq('1 days'))
# build the *_S columns from the shifted Var columns, then blank out rows without a one-day gap
cols = df.filter(like='Var').columns.map(lambda x: x + '_S')
df[cols] = df.filter(like='Var').shift()
df.loc[~s, cols] = np.nan
print(df)
ID Date X 123_Var 456_Var 789_Var 123_Var_S 456_Var_S \
0 A 2019-07-16 3 777 250 810 NaN NaN
1 A 2019-07-17 9 637 121 529 777.0 250.0
2 A 2019-07-20 2 295 272 490 NaN NaN
3 A 2019-07-21 3 778 600 544 295.0 272.0
4 A 2019-07-22 6 741 792 907 778.0 600.0
5 B 2019-07-01 4 509 690 406 NaN NaN
6 B 2019-07-03 2 413 725 414 NaN NaN
7 B 2019-07-04 2 170 702 912 413.0 725.0
8 B 2019-08-09 3 851 616 477 NaN NaN
9 B 2019-08-10 9 475 447 555 851.0 616.0
10 B 2019-08-11 1 412 403 708 475.0 447.0
11 B 2019-08-12 2 299 537 321 412.0 403.0
12 B 2019-08-13 4 310 119 125 299.0 537.0
13 C 2019-08-14 4 912 755 657 NaN NaN
14 C 2019-08-15 4 586 771 394 912.0 755.0
15 C 2019-08-17 2 500 528 764 NaN NaN
16 C 2019-08-18 1 982 383 654 500.0 528.0
17 C 2019-08-20 3 336 691 496 NaN NaN
18 C 2019-08-21 3 206 433 263 336.0 691.0
19 C 2019-08-22 2 373 319 111 206.0 433.0
20 D 2018-12-10 2 170 702 912 NaN NaN
21 E 2018-12-10 2 912 755 657 NaN NaN
22 E 2018-12-14 2 373 319 111 NaN NaN
789_Var_S
0 NaN
1 810.0
2 NaN
3 490.0
4 544.0
5 NaN
6 NaN
7 414.0
8 NaN
9 477.0
10 555.0
11 708.0
12 321.0
13 NaN
14 657.0
15 NaN
16 764.0
17 NaN
18 496.0
19 263.0
20 NaN
21 NaN
22 NaN
You may want to consider this approach with iterrows() (it assumes Date has already been converted to datetime):
for index, row in df.iterrows():
    if index == 0:
        continue
    # shift only within the same ID, and only when the dates are exactly one day apart
    if (df.loc[index, 'ID'] == df.loc[index - 1, 'ID']
            and df.loc[index, 'Date'] == df.loc[index - 1, 'Date'] + pd.Timedelta(days=1)):
        df.loc[index, '123_Var_S'] = df.loc[index - 1, '123_Var']
        df.loc[index, '456_Var_S'] = df.loc[index - 1, '456_Var']
        df.loc[index, '789_Var_S'] = df.loc[index - 1, '789_Var']
I have this data frame:
ID Date X 123_Var 456_Var 789_Var
A 16-07-19 3 777 250 810
A 17-07-19 9 637 121 529
A 18-07-19 7 878 786 406
A 19-07-19 4 656 140 204
A 20-07-19 2 295 272 490
A 21-07-19 3 778 600 544
A 22-07-19 6 741 792 907
B 01-07-19 4 509 690 406
B 02-07-19 2 732 915 199
B 03-07-19 2 413 725 414
B 04-07-19 2 170 702 912
B 09-08-19 3 851 616 477
B 10-08-19 9 475 447 555
B 11-08-19 1 412 403 708
B 12-08-19 2 299 537 321
B 13-08-19 4 310 119 125
C 01-12-18 4 912 755 657
C 02-12-18 4 586 771 394
C 04-12-18 9 498 122 193
C 05-12-18 2 500 528 764
C 06-12-18 1 982 383 654
C 07-12-18 1 299 496 488
C 08-12-18 3 336 691 496
C 09-12-18 3 206 433 263
C 10-12-18 2 373 319 111
I want to show the minimum value between the current row and the previous row, for each column in the 123_Var 456_Var 789_Var set.
That should be applied separately for each ID (with groupby).
The first row of each ID will show its current value, since there is no "previous" value to compare against.
Expected result:
ID Date X 123_Var 456_Var 789_Var 123_Min2 456_Min2 789_Min2
A 16-07-19 3 777 250 810 777 250 810
A 17-07-19 9 637 121 529 637 121 529
A 18-07-19 7 878 786 406 637 121 406
A 19-07-19 4 656 140 204 656 140 204
A 20-07-19 2 295 272 490 295 140 204
A 21-07-19 3 778 600 544 295 272 490
A 22-07-19 6 741 792 907 741 600 544
B 01-07-19 4 509 690 406 509 690 406
B 02-07-19 2 732 915 199 509 690 199
B 03-07-19 2 413 725 414 413 725 199
B 04-07-19 2 170 702 912 170 702 414
B 09-08-19 3 851 616 477 170 616 477
B 10-08-19 9 475 447 555 475 447 477
B 11-08-19 1 412 403 708 412 403 555
B 12-08-19 2 299 537 321 299 403 321
B 13-08-19 4 310 119 125 299 119 125
C 01-12-18 4 912 755 657 912 755 657
C 02-12-18 4 586 771 394 586 755 394
C 04-12-18 9 498 122 193 498 122 193
C 05-12-18 2 500 528 764 498 122 193
C 06-12-18 1 982 383 654 500 383 654
C 07-12-18 1 299 496 488 299 383 488
C 08-12-18 3 336 691 496 299 496 488
C 09-12-18 3 206 433 263 206 433 263
C 10-12-18 2 373 319 111 206 319 111
IIUC, we use groupby.shift to select the previous value of each Var column within each ID, then DataFrame.where
to keep the shifted value only where it is lower than or equal to the current value, filling the rest with the current value. We use DataFrame.add_suffix to add _Min2 and join the result back with DataFrame.join:
df_vars = df[['123_Var', '456_Var', '789_Var']]
df = df.join(df.groupby('ID')[['123_Var', '456_Var', '789_Var']]
               .shift()
               .fillna(df_vars)
               .where(lambda x: x.le(df_vars), df_vars)
               .add_suffix('_Min2'))
print(df)
Output
ID Date X 123_Var 456_Var 789_Var 123_Var_Min2 456_Var_Min2 789_Var_Min2
0 A 16-07-19 3 777 250 810 777.0 250.0 810.0
1 A 17-07-19 9 637 121 529 637.0 121.0 529.0
2 A 18-07-19 7 878 786 406 637.0 121.0 406.0
3 A 19-07-19 4 656 140 204 656.0 140.0 204.0
4 A 20-07-19 2 295 272 490 295.0 140.0 204.0
5 A 21-07-19 3 778 600 544 295.0 272.0 490.0
6 A 22-07-19 6 741 792 907 741.0 600.0 544.0
7 B 01-07-19 4 509 690 406 509.0 690.0 406.0
8 B 02-07-19 2 732 915 199 509.0 690.0 199.0
9 B 03-07-19 2 413 725 414 413.0 725.0 199.0
10 B 04-07-19 2 170 702 912 170.0 702.0 414.0
11 B 09-08-19 3 851 616 477 170.0 616.0 477.0
12 B 10-08-19 9 475 447 555 475.0 447.0 477.0
13 B 11-08-19 1 412 403 708 412.0 403.0 555.0
14 B 12-08-19 2 299 537 321 299.0 403.0 321.0
15 B 13-08-19 4 310 119 125 299.0 119.0 125.0
16 C 01-12-18 4 912 755 657 912.0 755.0 657.0
17 C 02-12-18 4 586 771 394 586.0 755.0 394.0
18 C 04-12-18 9 498 122 193 498.0 122.0 193.0
19 C 05-12-18 2 500 528 764 498.0 122.0 193.0
20 C 06-12-18 1 982 383 654 500.0 383.0 654.0
21 C 07-12-18 1 299 496 488 299.0 383.0 488.0
22 C 08-12-18 3 336 691 496 299.0 496.0 488.0
23 C 09-12-18 3 206 433 263 206.0 433.0 263.0
24 C 10-12-18 2 373 319 111 206.0 319.0 111.0
Case 2: If you want to check the n previous rows, use groupby.rolling:
df_vars = df[['123_Var', '456_Var', '789_Var']]
n = 3
df = df.join(df.groupby('ID')[['123_Var', '456_Var', '789_Var']]
               .rolling(n, min_periods=1).min()
               .reset_index(drop=True)
               .add_suffix(f'_Min{n}'))
print(df)
ID Date X 123_Var 456_Var 789_Var 123_Var_Min3 456_Var_Min3 789_Var_Min3
0 A 16-07-19 3 777 250 810 777.0 250.0 810.0
1 A 17-07-19 9 637 121 529 637.0 121.0 529.0
2 A 18-07-19 7 878 786 406 637.0 121.0 406.0
3 A 19-07-19 4 656 140 204 637.0 121.0 204.0
4 A 20-07-19 2 295 272 490 295.0 121.0 204.0
5 A 21-07-19 3 778 600 544 295.0 140.0 204.0
6 A 22-07-19 6 741 792 907 295.0 140.0 204.0
7 B 01-07-19 4 509 690 406 509.0 690.0 406.0
8 B 02-07-19 2 732 915 199 509.0 690.0 199.0
9 B 03-07-19 2 413 725 414 413.0 690.0 199.0
10 B 04-07-19 2 170 702 912 170.0 690.0 199.0
11 B 09-08-19 3 851 616 477 170.0 616.0 199.0
12 B 10-08-19 9 475 447 555 170.0 447.0 414.0
13 B 11-08-19 1 412 403 708 170.0 403.0 477.0
14 B 12-08-19 2 299 537 321 299.0 403.0 321.0
15 B 13-08-19 4 310 119 125 299.0 119.0 125.0
16 C 01-12-18 4 912 755 657 912.0 755.0 657.0
17 C 02-12-18 4 586 771 394 586.0 755.0 394.0
18 C 04-12-18 9 498 122 193 498.0 122.0 193.0
19 C 05-12-18 2 500 528 764 498.0 122.0 193.0
20 C 06-12-18 1 982 383 654 498.0 122.0 193.0
21 C 07-12-18 1 299 496 488 299.0 122.0 193.0
22 C 08-12-18 3 336 691 496 299.0 383.0 488.0
23 C 09-12-18 3 206 433 263 206.0 383.0 263.0
24 C 10-12-18 2 373 319 111 206.0 319.0 111.0
A quite elegant solution is to apply rolling(2).min() to each group,
but to avoid the first row of NaN in each group, this first row
should be "replicated" from the source group.
To do your task, start from defining the following function:
def fnMin2(grp):
    rv = pd.concat([pd.DataFrame([grp.iloc[0, -3:]]),
                    grp[['123_Var', '456_Var', '789_Var']].rolling(2).min().iloc[1:]])\
        .astype('int')
    rv.columns = [it.replace('Var', 'Min2') for it in rv.columns]
    return grp.join(rv)
Then apply it to each group:
df.groupby('ID').apply(fnMin2)
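Depending on your pandas version, groupby.apply may prepend ID as an extra index level to the result; if that happens, passing group_keys=False keeps the original index:
df.groupby('ID', group_keys=False).apply(fnMin2)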
Note that the column names assigned to the new columns in my solution are exactly as you requested, contrary to the solution you accepted.
import numpy as np
import pandas as pd

# compare each row to the previous row (True where the current value is greater)
ext = df.iloc[:, 3:].gt(df.iloc[:, 3:].shift(1))
# simply rename the columns here
ext.columns = ['123_min', '456_min', '789_min']
# join the two dataframes by columns
M = pd.concat([df, ext], axis=1)
# based on the condition: if it is False, use the value from the current row,
# else use the value from the previous row
M['123_min'] = np.where(M['123_min'] == 0, M['123_Var'], M['123_Var'].shift(1))
M['456_min'] = np.where(M['456_min'] == 0, M['456_Var'], M['456_Var'].shift(1))
M['789_min'] = np.where(M['789_min'] == 0, M['789_Var'], M['789_Var'].shift(1))
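Note that this variant shifts across the whole frame, so values can leak between IDs at the group boundaries. A minimal per-ID sketch of the same idea (assuming numpy is imported as np and the three Var columns are named as above) could look like:
var_cols = ['123_Var', '456_Var', '789_Var']
# previous row within each ID; the first row of each ID falls back to its own value
prev = df.groupby('ID')[var_cols].shift().fillna(df[var_cols])
df[['123_Min2', '456_Min2', '789_Min2']] = np.minimum(prev, df[var_cols]).to_numpy().astype(int)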
Which is faster in GLSL:
pow(x, 3.0f);
or
x*x*x;
?
Does exponentiation performance depend on hardware vendor or exponent value?
I wrote a small benchmark, because I was interested in the results.
In my personal case, I was most interested in exponent = 5.
Benchmark code (running in Rem's Studio / LWJGL):
package me.anno.utils.bench
import me.anno.gpu.GFX
import me.anno.gpu.GFX.flat01
import me.anno.gpu.RenderState
import me.anno.gpu.RenderState.useFrame
import me.anno.gpu.framebuffer.Frame
import me.anno.gpu.framebuffer.Framebuffer
import me.anno.gpu.hidden.HiddenOpenGLContext
import me.anno.gpu.shader.Renderer
import me.anno.gpu.shader.Shader
import me.anno.utils.types.Floats.f2
import org.lwjgl.opengl.GL11.*
import java.nio.ByteBuffer
import kotlin.math.roundToInt
fun main() {
    fun createShader(code: String) = Shader(
        "", null, "" +
                "attribute vec2 attr0;\n" +
                "void main(){\n" +
                " gl_Position = vec4(attr0*2.0-1.0, 0.0, 1.0);\n" +
                " uv = attr0;\n" +
                "}", "varying vec2 uv;\n", "" +
                "void main(){" +
                code +
                "}"
    )
    fun repeat(code: String, times: Int): String {
        return Array(times) { code }.joinToString("\n")
    }
    val size = 512
    val warmup = 50
    val benchmark = 1000
    HiddenOpenGLContext.setSize(size, size)
    HiddenOpenGLContext.createOpenGL()
    val buffer = Framebuffer("", size, size, 1, 1, true, Framebuffer.DepthBufferType.NONE)
    println("Power,Multiplications,GFlops-multiplication,GFlops-floats,GFlops-ints,GFlops-power,Speedup")
    useFrame(buffer, Renderer.colorRenderer) {
        RenderState.blendMode.use(me.anno.gpu.blending.BlendMode.ADD) {
            for (power in 2 until 100) {
                // to reduce the overhead of other stuff
                val repeats = 100
                val init = "float x1 = dot(uv, vec2(1.0)),x2,x4,x8,x16,x32,x64;\n"
                val end = "gl_FragColor = vec4(x1,x1,x1,x1);\n"
                val manualCode = StringBuilder()
                for (bit in 1 until 32) {
                    val p = 1.shl(bit)
                    val h = 1.shl(bit - 1)
                    if (power == p) {
                        manualCode.append("x1=x$h*x$h;")
                        break
                    } else if (power > p) {
                        manualCode.append("x$p=x$h*x$h;")
                    } else break
                }
                if (power.and(power - 1) != 0) {
                    // not a power of two, so the result isn't finished yet
                    manualCode.append("x1=")
                    var first = true
                    for (bit in 0 until 32) {
                        val p = 1.shl(bit)
                        if (power.and(p) != 0) {
                            if (!first) {
                                manualCode.append('*')
                            } else first = false
                            manualCode.append("x$p")
                        }
                    }
                    manualCode.append(";\n")
                }
                val multiplications = manualCode.count { it == '*' }
                // println("$power: $manualCode")
                val shaders = listOf(
                    // manually optimized
                    createShader(init + repeat(manualCode.toString(), repeats) + end),
                    // can be optimized
                    createShader(init + repeat("x1=pow(x1,$power.0);", repeats) + end),
                    // can be optimized, int as power
                    createShader(init + repeat("x1=pow(x1,$power);", repeats) + end),
                    // slightly different, so it can't be optimized
                    createShader(init + repeat("x1=pow(x1,${power}.01);", repeats) + end),
                )
                for (shader in shaders) {
                    shader.use()
                }
                val pixels = ByteBuffer.allocateDirect(4)
                Frame.bind()
                glClearColor(0f, 0f, 0f, 1f)
                glClear(GL_COLOR_BUFFER_BIT or GL_DEPTH_BUFFER_BIT)
                for (i in 0 until warmup) {
                    for (shader in shaders) {
                        shader.use()
                        flat01.draw(shader)
                    }
                }
                val flops = DoubleArray(shaders.size)
                val avg = 10 // for more stability between runs
                for (j in 0 until avg) {
                    for (index in shaders.indices) {
                        val shader = shaders[index]
                        GFX.check()
                        val t0 = System.nanoTime()
                        for (i in 0 until benchmark) {
                            shader.use()
                            flat01.draw(shader)
                        }
                        // synchronize
                        glReadPixels(0, 0, 1, 1, GL_RGBA, GL_UNSIGNED_BYTE, pixels)
                        GFX.check()
                        val t1 = System.nanoTime()
                        // the first one may be an outlier
                        if (j > 0) flops[index] += multiplications * repeats.toDouble() * benchmark.toDouble() * size * size / (t1 - t0)
                        GFX.check()
                    }
                }
                for (i in flops.indices) {
                    flops[i] /= (avg - 1.0)
                }
                println(
                    "" +
                            "$power,$multiplications," +
                            "${flops[0].roundToInt()}," +
                            "${flops[1].roundToInt()}," +
                            "${flops[2].roundToInt()}," +
                            "${flops[3].roundToInt()}," +
                            (flops[0] / flops[3]).f2()
                )
            }
        }
    }
}
The fragment shader is run 9 × 512² pixels × 1000 times, and evaluates the function 100 times per invocation.
I ran this code on my RX 580, 8GB from Gigabyte, and collected the following results:
Power  #Mult  GFlops*  GFlopsFp  GFlopsInt  GFlopsPow  Speedup
2      1      1246     1429      1447       324        3.84
3      2      2663     2692      2708       651        4.09
4      2      2682     2679      2698       650        4.12
5      3      2766     972       974        973        2.84
6      3      2785     978       974        976        2.85
7      4      2830     1295      1303       1299       2.18
8      3      2783     2792      2809       960        2.90
9      4      2836     1298      1301       1302       2.18
10     4      2833     1291      1302       1298       2.18
11     5      2858     1623      1629       1623       1.76
12     4      2824     1302      1295       1303       2.17
13     5      2866     1628      1624       1626       1.76
14     5      2869     1614      1623       1611       1.78
15     6      2886     1945      1943       1953       1.48
16     4      2821     1305      1300       1305       2.16
17     5      2868     1615      1625       1619       1.77
18     5      2858     1620      1625       1624       1.76
19     6      2890     1949      1946       1949       1.48
20     5      2871     1618      1627       1625       1.77
21     6      2879     1945      1947       1943       1.48
22     6      2886     1944      1949       1952       1.48
23     7      2901     2271      2269       2268       1.28
24     5      2872     1621      1628       1624       1.77
25     6      2886     1942      1943       1942       1.49
26     6      2880     1949      1949       1953       1.47
27     7      2891     2273      2263       2266       1.28
28     6      2883     1949      1946       1953       1.48
29     7      2910     2279      2281       2279       1.28
30     7      2899     2272      2276       2277       1.27
31     8      2906     2598      2595       2596       1.12
32     5      2872     1621      1625       1622       1.77
33     6      2901     1953      1942       1949       1.49
34     6      2895     1948      1939       1944       1.49
35     7      2895     2274      2266       2268       1.28
36     6      2881     1937      1944       1948       1.48
37     7      2894     2277      2270       2280       1.27
38     7      2902     2275      2264       2273       1.28
39     8      2910     2602      2594       2603       1.12
40     6      2877     1945      1947       1945       1.48
41     7      2892     2276      2277       2282       1.27
42     7      2887     2271      2272       2273       1.27
43     8      2912     2599      2606       2599       1.12
44     7      2910     2278      2284       2276       1.28
45     8      2920     2597      2601       2600       1.12
46     8      2920     2600      2601       2590       1.13
47     9      2925     2921      2926       2927       1.00
48     6      2885     1935      1955       1956       1.47
49     7      2901     2271      2279       2288       1.27
50     7      2904     2281      2276       2278       1.27
51     8      2919     2608      2594       2607       1.12
52     7      2902     2282      2270       2273       1.28
53     8      2903     2598      2602       2598       1.12
54     8      2918     2602      2602       2604       1.12
55     9      2932     2927      2924       2936       1.00
56     7      2907     2284      2282       2281       1.27
57     8      2920     2606      2604       2610       1.12
58     8      2913     2593      2597       2587       1.13
59     9      2925     2923      2924       2920       1.00
60     8      2930     2614      2606       2613       1.12
61     9      2932     2946      2946       2947       1.00
62     9      2926     2935      2937       2947       0.99
63     10     2958     3258      3192       3266       0.91
64     6      2902     1957      1956       1959       1.48
65     7      2903     2274      2267       2273       1.28
66     7      2909     2277      2276       2286       1.27
67     8      2908     2602      2606       2599       1.12
68     7      2894     2272      2279       2276       1.27
69     8      2923     2597      2606       2606       1.12
70     8      2910     2596      2599       2600       1.12
71     9      2926     2921      2927       2924       1.00
72     7      2909     2283      2273       2273       1.28
73     8      2909     2602      2602       2599       1.12
74     8      2914     2602      2602       2603       1.12
75     9      2924     2925      2927       2933       1.00
76     8      2904     2608      2602       2601       1.12
77     9      2911     2919      2917       2909       1.00
78     9      2927     2921      2917       2935       1.00
79     10     2929     3241      3246       3246       0.90
80     7      2903     2273      2276       2275       1.28
81     8      2916     2596      2592       2589       1.13
82     8      2913     2600      2597       2598       1.12
83     9      2925     2931      2926       2913       1.00
84     8      2917     2598      2606       2597       1.12
85     9      2920     2916      2918       2927       1.00
86     9      2942     2922      2944       2936       1.00
87     10     2961     3254      3259       3268       0.91
88     8      2934     2607      2608       2612       1.12
89     9      2918     2939      2931       2916       1.00
90     9      2927     2928      2920       2924       1.00
91     10     2940     3253      3252       3246       0.91
92     9      2924     2933      2926       2928       1.00
93     10     2940     3259      3237       3251       0.90
94     10     2928     3247      3247       3264       0.90
95     11     2933     3599      3593       3594       0.82
96     7      2883     2282      2268       2269       1.27
97     8      2911     2602      2595       2600       1.12
98     8      2896     2588      2591       2587       1.12
99     9      2924     2939      2936       2938       1.00
As you can see, a pow() call takes exactly as long as 9 multiplication instructions. Therefore every manual rewriting of a power with fewer than 9 multiplications is faster.
Only the cases 2, 3, 4, and 8 are optimized by my driver. The optimization is independent of whether you use the .0 suffix for the exponent.
In the case of exponent = 2, my implementation seems to have lower performance than the driver's. I am not sure why.
The speedup is the manual implementation compared to pow(x,exponent+0.01), which cannot be optimized by the compiler.
Because the multiplications and the speedup align so perfectly, I created a graph to show the relationship. This relationship kind of shows that my benchmark is trustworthy :).
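For reference, the #Mult column is simply the square-and-multiply count used to generate the manual code: exponent p needs floor(log2(p)) + popcount(p) - 1 multiplications. For example, p = 5 factors as x^5 = (x^2)^2 * x, i.e. 3 multiplications, while p = 47 needs 9, which is exactly where the speedup in the table drops to 1.00.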
Operating System: Windows 10 Personal
GPU: RX 580 8GB from Gigabyte
Processor: Ryzen 5 2600
Memory: 16 GB DDR4 3200
GPU Driver: 21.6.1 from 17th June 2021
LWJGL: Version 3.2.3 build 13
While this can definitely be hardware/vendor/compiler dependent, advanced mathematical functions like pow() tend to be considerably more expensive than basic operations.
The best approach is of course to try both and benchmark. But if there is a simple replacement for an advanced mathematical function, I don't think you can go very wrong by using it.
If you write pow(x, 3.0), the best you can probably hope for is that the compiler will recognize the special case, and expand it. But why take the risk, if the replacement is just as short and easy to read? C/C++ compilers don't always replace pow(x, 2.0) by a simple multiplication, so I wouldn't necessarily count on all GLSL compilers to do that.
I have a dataframe named nf as below :
A B C D E A.1 B.1 C.1 D.1 E.1 A.2 B.2 C.2 D.2 E.2 F.2
122 434 345 435 566 657 466 762 123 645
434 453 786 654 980 424 786 897 564 243 345 455 432 435 432
234 553 588 899 533
123 875 789 456 876 667 988 887 234 342
and so on ....
where the values repeat every 5th column and the 3rd row has no values for the second half.
The values above are just a sample of my original data. In the original I have 50 columns, with values repeating columnwise every 10th column, and almost 120k rows. I want to reshape the values so that there are only 10 columns, with the repeated blocks appended at the bottom as below.
Desired output is :
A B C D E
122 434 345 435 566
434 453 786 654 980
234 553 588 899 533
123 875 789 456 876
657 466 762 123 645
424 786 897 564 243
667 988 887 234 342
345 455 432 435 432
All the values from the repeated column blocks should be appended at the bottom as additional rows.
You can use stack and groupby:
df.stack().groupby(level=1).apply(list).apply(pd.Series).T
Out[1178]:
A B C D E
0 122.0 434.0 345.0 435.0 566.0
1 657.0 466.0 762.0 123.0 645.0
2 434.0 453.0 786.0 654.0 980.0
3 424.0 786.0 897.0 564.0 243.0
4 345.0 455.0 432.0 435.0 432.0
5 234.0 553.0 588.0 899.0 533.0
6 123.0 875.0 789.0 456.0 876.0
7 667.0 988.0 887.0 234.0 342.0
Update
df.apply(lambda x : ','.join(x[x.notnull()].astype(str))).groupby(level=0).apply(','.join).str.split(',',expand=True).T
Out[1203]:
A B C D E F
0 122.0 434.0 345.0 435.0 566.0
1 434.0 453.0 786.0 654.0 980.0 None
2 234.0 553.0 588.0 899.0 533.0 None
3 123.0 875.0 789.0 456.0 876.0 None
4 657.0 466.0 762.0 123.0 645.0 None
5 424.0 786.0 897.0 564.0 243.0 None
6 667.0 988.0 887.0 234.0 342.0 None
7 345.0 455.0 432.0 435.0 432.0 None
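If the blocks are strictly positional (every n_cols columns form one block, as in your real 50-column data with a 10-column repeat), another option is to slice the frame into blocks and concatenate them vertically. This is only a sketch; it assumes every block has the same width and column order, so the extra F.2 column in the sample would need to be dropped or padded first:
import pandas as pd

n_cols = 5  # 10 in the real data
blocks = []
for start in range(0, df.shape[1], n_cols):
    block = df.iloc[:, start:start + n_cols].copy()
    block.columns = df.columns[:n_cols]   # rename A.1..E.1 etc. back to A..E
    blocks.append(block)

out = pd.concat(blocks, ignore_index=True).dropna(how='all')
print(out)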
I want to remove rows with "nan" or "-nan":
Reading:
excel_file = 'originale_ridotto.xlsx'
df = pd.read_excel(excel_file, na_values="NaN")
print(df)
print("I am here")
df.dropna(axis=0, how="any")
print(df)
Output of dataframe columns (Python 3.6.3):
Data e ora Potenza Teorica Totale CC [kW]
0 01/01/2017 00:05 0
1 01/01/2017 00:10 0
2 01/01/2017 00:15 0
3 01/01/2017 00:20 0
4 01/01/2017 00:25 0
5 01/01/2017 00:30 0
6 01/01/2017 00:35 0
7 01/01/2017 00:40 0
Potenza Attiva Totale AC [kW] Energia totale cumulata al contatore [kWh] \
0 0 7760812.5
1 0 7760812.5
2 0 7760812.5
3 0 7760812.5
4 0 7760812.5
5 0 7760812.5
6 0 7760812.5
7 0 7760812.5
Temperatura modulo [°C] Irraggiamento [W/m2]
0 0 5.0
1 0 6.0
2 0 NaN
3 0 2.0
4 0 3.0
5 0 NaN
6 0 7.0
7 0 9.0
Potenza Attiva Inv.1Blocco1 [kW]
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
Data e ora Potenza Teorica Totale CC [kW]
0 01/01/2017 00:05 0
1 01/01/2017 00:10 0
2 01/01/2017 00:15 0
3 01/01/2017 00:20 0
4 01/01/2017 00:25 0
5 01/01/2017 00:30 0
6 01/01/2017 00:35 0
7 01/01/2017 00:40 0
Potenza Attiva Totale AC [kW] Energia totale cumulata al contatore [kWh]
0 0 7760812.5
1 0 7760812.5
2 0 7760812.5
3 0 7760812.5
4 0 7760812.5
5 0 7760812.5
6 0 7760812.5
7 0 7760812.5
Temperatura modulo [°C] Irraggiamento [W/m2] \
0 0 5.0
1 0 6.0
2 0 NaN
3 0 2.0
4 0 3.0
5 0 NaN
6 0 7.0
7 0 9.0
Potenza Attiva Inv.1Blocco1 [kW]
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
df.dropna(axis=0, how="any") does not remove these rows. Why?
Could you help me?
You are creating a cleaned dataframe, but you are not "remembering" it. df.dropna(how='any') returns the cleaned df - you need to assign it and then use it:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,1000,size=(10, 10)), columns=list('ABCDEFGHIJ'))
# ignoring the warnings
df['A'][2] = np.NaN
df['C'][3] = np.NaN
df['I'][5] = np.NaN
df['E'][7] = np.NaN
print(df)
df = df.dropna(how='any') # this returns a NEW dataframe, it does not modify in place
print(df)
Output:
A B C D E F G H I J
0 314.0 664 855.0 101 764.0 251 503 783 153.0 474
1 903.0 77 546.0 205 113.0 519 115 45 988.0 964
2 NaN 155 481.0 243 165.0 696 255 123 802.0 228
3 406.0 603 NaN 84 390.0 545 651 549 440.0 982
4 796.0 626 139.0 810 474.0 257 407 264 680.0 164
5 443.0 132 545.0 380 420.0 885 704 596 NaN 778
6 285.0 317 238.0 437 508.0 189 501 738 605.0 290
7 144.0 426 220.0 573 NaN 758 581 420 544.0 173
8 864.0 369 541.0 405 863.0 45 522 178 705.0 419
9 936.0 664 547.0 793 68.0 77 364 633 547.0 790
A B C D E F G H I J
0 314.0 664 855.0 101 764.0 251 503 783 153.0 474
1 903.0 77 546.0 205 113.0 519 115 45 988.0 964
4 796.0 626 139.0 810 474.0 257 407 264 680.0 164
6 285.0 317 238.0 437 508.0 189 501 738 605.0 290
8 864.0 369 541.0 405 863.0 45 522 178 705.0 419
9 936.0 664 547.0 793 68.0 77 364 633 547.0 790
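As a side note, if you prefer to modify the frame in place instead of reassigning, dropna also accepts inplace=True:
df.dropna(how='any', inplace=True)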