I have two dfs. The tables look like:
df1
1 2 3 4
Avg 0.21 0.13 0.5 0.24
df2
1 2 3 4
2021 1.01 1.01 1.01 1.01
2022 1.02 1.01 1.01 1.02
2023 1.02 1.02 1.03 1.02
2024 1.01 1.01 1.01 1.01
I want to multiply row 'Avg' in df1 by the rows from 2021 to 2024 in df2, so the results should look like this:
results
1 2 3 4
2022 1.02*0.21 1.01*0.13 1.01*0.5 1.02*0.24
2023 1.02*0.21 1.02*0.13 1.03*0.5 1.02*0.24
2024 1.01*0.21 1.01*0.13 1.01*0.5 1.01*0.24
How can I do it?
Try:
df2.mul(df1.to_numpy(), axis=1)
Output:
1 2 3 4
2021 0.2121 0.1313 0.505 0.2424
2022 0.2142 0.1313 0.505 0.2448
2023 0.2142 0.1326 0.515 0.2448
2024 0.2121 0.1313 0.505 0.2424
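As a variation on the answer above, here is a minimal sketch that multiplies by the 'Avg' row as a Series, so pandas matches values on column labels rather than on position. The frames are rebuilt from the tables in the question; the final slice to 2022 onward is an assumption based on the expected result shown there.

```python
import pandas as pd

# Frames rebuilt from the question's tables
df1 = pd.DataFrame([[0.21, 0.13, 0.5, 0.24]], index=["Avg"], columns=[1, 2, 3, 4])
df2 = pd.DataFrame(
    [[1.01, 1.01, 1.01, 1.01],
     [1.02, 1.01, 1.01, 1.02],
     [1.02, 1.02, 1.03, 1.02],
     [1.01, 1.01, 1.01, 1.01]],
    index=[2021, 2022, 2023, 2024],
    columns=[1, 2, 3, 4],
)

# Multiply every row of df2 by the 'Avg' row, aligned on column labels
results = df2.mul(df1.loc["Avg"], axis=1)

# Assumption: keep only the years listed in the expected result
results_from_2022 = results.loc[2022:]
print(results_from_2022)
```

Because the alignment is by label, this still works if the two frames list their columns in a different order.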
I have data that is grouped by a column 'plant_name', and I need to write and apply a function that tests for a trend in one of the columns, for example the one named '10%' or '90%'.
My data looks like this -
plant_name year count mean std min 10% 50% 90% max
0 ARIZONA I 2005 8760.0 8.25 2.21 1.08 5.55 8.19 11.09 15.71
1 ARIZONA I 2006 8760.0 7.87 2.33 0.15 4.84 7.82 10.74 16.75
2 ARIZONA I 2007 8760.0 8.31 2.25 0.03 5.52 8.27 11.23 16.64
3 ARIZONA I 2008 8784.0 7.67 2.46 0.21 4.22 7.72 10.78 15.73
4 ARIZONA I 2009 8760.0 6.92 2.33 0.23 3.79 6.95 9.96 14.64
5 ARIZONA I 2010 8760.0 8.07 2.21 0.68 5.51 7.85 11.14 17.31
6 ARIZONA I 2011 8760.0 7.54 2.38 0.33 4.44 7.45 10.54 17.77
7 ARIZONA I 2012 8784.0 8.61 1.92 0.33 6.37 8.48 11.07 15.84
8 ARIZONA I 2015 8760.0 8.21 2.13 0.60 5.58 8.24 10.88 16.74
9 ARIZONA I 2016 8784.0 8.39 2.27 0.46 5.55 8.32 11.34 16.09
10 ARIZONA I 2017 8760.0 8.32 2.11 0.85 5.70 8.25 11.12 17.96
11 ARIZONA I 2018 8760.0 7.94 2.28 0.07 5.17 7.72 11.04 16.31
12 ARIZONA I 2019 8760.0 7.71 2.49 0.38 4.28 7.75 10.87 15.79
13 ARIZONA I 2020 8784.0 7.57 2.43 0.50 4.36 7.47 10.78 15.69
14 CAETITE I 2005 8760.0 8.11 3.15 0.45 3.76 8.38 12.08 18.89
15 CAETITE I 2006 8760.0 7.70 3.21 0.05 3.50 7.66 12.05 19.08
16 CAETITE I 2007 8760.0 8.64 3.18 0.01 4.05 8.83 12.63 18.57
17 CAETITE I 2008 8784.0 7.87 3.09 0.28 3.75 7.80 11.92 18.54
18 CAETITE I 2009 8760.0 7.31 3.02 0.17 3.46 7.21 11.40 19.46
19 CAETITE I 2010 8760.0 8.00 3.24 0.34 3.63 8.03 12.29 17.27
I'm using the test function from the pymannkendall package:
import pymannkendall as mk
and you apply the function like this:
mk.original_test(dataframe)
I need the final dataframe to look like this which is the result of the series columns returned by the function (mk.original_test):
trend, h, p, z, Tau, s, var_s, slope, intercept = mk.original_test(data)
plant_name trend h p z Tau s var_s slope intercept
0 ARIZONA I no trend False 0.416 0.812 xxx x x x x
1 CAETITE I increasing True 0.002 3.6 xxx x x x x
I'm just not sure how to use groupby to group by the plant_name column and then apply the mk function per plant_name to one of the columns in the data shown. Thank you.
For a given column, you can run the test in a GroupBy.apply() and return the result as a Series indexed by result._fields:
import pandas as pd

def mktest(x):
    result = mk.original_test(x)
    # original_test returns a named tuple, so its field names label the Series
    return pd.Series(result, index=result._fields)

column = '10%'
df.groupby('plant_name', as_index=False)[column].apply(mktest)
plant_name  trend     h      p         z          Tau        s     var_s       slope      intercept
ARIZONA I   no trend  False  0.956276  -0.054827  -0.021978  -2.0  332.666667  -0.003333  5.361667
CAETITE I   no trend  False  0.452370  -0.751469  -0.333333  -5.0  28.333333   -0.026000  3.755000
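The same pattern can be sketched without pymannkendall installed. Here `toy_trend_test` and `Toy_Test` are hypothetical stand-ins that return a named tuple (a simple least-squares slope, not a Mann-Kendall test), and the groups are collected with an explicit loop so the result does not depend on how a given pandas version shapes the output of GroupBy.apply.

```python
from collections import namedtuple

import numpy as np
import pandas as pd

# Hypothetical stand-in for mk.original_test: it returns a named tuple too
Toy_Test = namedtuple("Toy_Test", ["trend", "slope", "intercept"])

def toy_trend_test(values):
    # Least-squares line through the values (NOT a Mann-Kendall test)
    y = np.asarray(values, dtype=float)
    slope, intercept = np.polyfit(np.arange(len(y)), y, 1)
    trend = "increasing" if slope > 0 else "decreasing" if slope < 0 else "no trend"
    return Toy_Test(trend, slope, intercept)

# Made-up data with the same layout as the question (one row per plant and year)
df = pd.DataFrame({
    "plant_name": ["A", "A", "A", "B", "B", "B"],
    "10%": [1.0, 2.0, 3.0, 6.0, 4.0, 2.0],
})

column = "10%"
rows = []
for name, group in df.groupby("plant_name"):
    result = toy_trend_test(group[column])
    # _asdict() turns the named tuple's fields into column names
    rows.append({"plant_name": name, **result._asdict()})

summary = pd.DataFrame(rows)
print(summary)
```

Swapping `toy_trend_test` for `mk.original_test` should give one row per plant with the test's own fields as columns.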
Here is a sample of the original table.
# z speed dir U_geo V_geo U U[QCC] U[ign] U[siC] U[siD] V
0 40 2.83 181.0 0.05 2.83 -0.20 11 -0.20 2.24 0.95 2.83 11
1 50 2.41 184.8 0.20 2.40 -0.01 11 -0.01 2.47 0.94 2.41 11
2 60 1.92 192.4 0.41 1.88 0.25 11 0.25 2.46 0.94 1.91 11
3 70 1.75 201.7 0.65 1.63 0.50 11 0.50 2.47 0.94 1.68 11
I need to shift the entire table over by 1 column to produce this:
z speed dir U_geo V_geo U U[QCC] U[ign] U[siC] U[siD] V
0 40 2.83 181.0 0.05 2.83 -0.20 11 -0.20 2.24 0.95 2.83
1 50 2.41 184.8 0.20 2.40 -0.01 11 -0.01 2.47 0.94 2.41
2 60 1.92 192.4 0.41 1.88 0.25 11 0.25 2.46 0.94 1.91
3 70 1.75 201.7 0.65 1.63 0.50 11 0.50 2.47 0.94 1.68
Here is the code that ingests the data and tries to shift it over by one column:
wind_rass_table_df=pd.read_csv(file_path, header=j+3, engine='python', nrows=77,sep=r'\s{2,}',skip_blank_lines=False,index_col=False)
wind_rass_table_df=wind_rass_table_df.shift(periods=1,axis=1)
df.shift(axis=1) is supposed to shift the dataframe over by one column, but it does more than that. It produces this:
# z speed dir U_geo V_geo U U[QCC] U[ign] U[siC]
0 NaN NaN 2.83 181.0 0.05 2.83 40.0 -0.20 -0.20 2.24
1 NaN NaN 2.41 184.8 0.20 2.40 50.0 -0.01 -0.01 2.47
2 NaN NaN 1.92 192.4 0.41 1.88 60.0 0.25 0.25 2.46
3 NaN NaN 1.75 201.7 0.65 1.63 70.0 0.50 0.50 2.47
The shift function has taken the first column, inserted into the 7th column, shifted the 7th into the 8th and repeated the 8th, shifting the 9th over and so on.
What is the correct way of shifting a dataframe over by one column?
Many thanks!
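You can use iloc and build another dataframe from the underlying values (passing the values rather than the DataFrame itself keeps pandas from realigning on the old column labels):
df = pd.DataFrame(data=df.iloc[:, :-1].to_numpy(), columns=df.columns[1:], index=df.index)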
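A small worked example of the relabeling idea, on a made-up frame that mimics the problem: the extra '#' name in the header pushes every column name one position to the right, leaving the last column empty. This sketch drops the empty last column and assigns the names minus '#', which also keeps each column's dtype.

```python
import numpy as np
import pandas as pd

# Made-up frame: the values under '#' are really z, those under 'z' are really speed, etc.
df = pd.DataFrame({
    "#": [40, 50],
    "z": [2.83, 2.41],
    "speed": [181.0, 184.8],
    "dir": [np.nan, np.nan],
})

# Drop the empty last column, then relabel with the header names minus '#'
fixed = df.iloc[:, :-1].copy()
fixed.columns = df.columns[1:]
print(fixed)
```

No values move here; only the labels change, which is why this avoids the surprises seen with shift.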
Dear friends, I want to transpose the following dataframe into a single column. I can't figure out a way to transform it, so your help is welcome! I tried a pivot table, but so far no success.
X 0.00 1.25 1.75 2.25 2.99 3.25
X 3.99 4.50 4.75 5.25 5.50 6.00
X 6.25 6.50 6.75 7.50 8.24 9.00
X 9.50 9.75 10.25 10.50 10.75 11.25
X 11.50 11.75 12.00 12.25 12.49 12.75
X 13.25 13.99 14.25 14.49 14.99 15.50
and it should look like this
X
0.00
1.25
1.75
2.25
2.99
3.25
3.99
4.5
4.75
5.25
5.50
6.00
6.25
etc..
This will do it; df.columns[0] is used because I don't know what your headers are:
df = pd.DataFrame({'X': df.set_index(df.columns[0]).stack().reset_index(drop=True)})
df
X
0 0.00
1 1.25
2 1.75
3 2.25
4 2.99
5 3.25
6 3.99
7 4.50
8 4.75
9 5.25
10 5.50
11 6.00
12 6.25
13 6.50
14 6.75
15 7.50
16 8.24
17 9.00
18 9.50
19 9.75
20 10.25
21 10.50
22 10.75
23 11.25
24 11.50
25 11.75
26 12.00
27 12.25
28 12.49
29 12.75
30 13.25
31 13.99
32 14.25
33 14.49
34 14.99
35 15.50
Thank you so much! A follow-up question:
Is it also possible to stack the df into two columns, X and Y?
This is the data set:
1 2 3 4 5 6 7
X 0.00 1.25 1.75 2.25 2.99 3.25
Y -1.08 -1.07 -1.07 -1.00 -0.81 -0.73
X 3.99 4.50 4.75 5.25 5.50 6.00
Y -0.37 -0.20 -0.15 -0.17 -0.15 -0.16
X 6.25 6.50 6.75 7.50 8.24 9.00
Y -0.17 -0.18 -0.24 -0.58 -0.93 -1.24
X 9.50 9.75 10.25 10.50 10.75 11.25
Y -1.38 -1.42 -1.51 -1.57 -1.64 -1.75
X 11.50 11.75 12.00 12.25 12.49 12.75
Y -1.89 -2.00 -2.00 -2.04 -2.04 -2.10
X 13.25 13.99 14.25 14.49 14.99 15.50
Y -2.08 -2.13 -2.18 -2.18 -2.27 -2.46
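One possible approach to the follow-up, as a sketch only: it assumes the first column holds the 'X'/'Y' labels, that X and Y rows alternate, and that both have the same number of values. The frame below is a shortened, hand-built version of the data set above.

```python
import pandas as pd

# Shortened version of the data set; the first column holds the row labels
df = pd.DataFrame(
    [["X", 0.00, 1.25, 1.75],
     ["Y", -1.08, -1.07, -1.07],
     ["X", 3.99, 4.50, 4.75],
     ["Y", -0.37, -0.20, -0.15]],
    columns=[1, 2, 3, 4],
)

label_col = df.columns[0]
values = df.drop(columns=label_col)

# Flatten the X rows and the Y rows separately, row by row
out = pd.DataFrame({
    "X": values[df[label_col] == "X"].to_numpy().ravel(),
    "Y": values[df[label_col] == "Y"].to_numpy().ravel(),
})
print(out)
```

Each X value ends up next to the Y value that sat directly below it in the original layout.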
How can I separate a dataframe like the following into columns?
yr mon day Tmax Tmin pcp
2013 4 22 5.09-10.92 0.0
2013 4 23 2.77 -9.63 0.5
2013 4 24 0.28 -9.90 9.9
2013 4 25 0.76 -6.70 12.2
2013 4 26 -0.35 -9.48 0.0
2013 4 27 7.22-10.47 0.0
2013 4 28 4.19-10.78 0.0
As you can see, there is not always whitespace between Tmax and Tmin. Tmax and Tmin are each at most 6 characters wide; shorter values are padded with whitespace. I want to read this into a df and separate the columns.
How can I separate columns by a given character width?
try this:
df = pd.read_fwf(filename)
It seems you need str.extract to pull out the floats and ints. This solution works if all the data are in one column, which is selected here by iloc:
pat = r"(\d+)\s*(\d+)\s*(\d+)\s*([-+]?\d+\.\d+|\d+)\s*([-+]?\d+\.\d+|\d+)\s*([-+]?\d+\.\d+|\d+)"
df1 = df.iloc[:, 0].str.extract(pat, expand=True)
df1.columns = ['year', 'mont','day','Tmax','Tmin','pcp']
print (df1)
year mont day Tmax Tmin pcp
0 2013 4 22 5.09 -10.92 0.0
1 2013 4 23 2.77 -9.63 0.5
2 2013 4 24 0.28 -9.90 9.9
3 2013 4 25 0.76 -6.70 12.2
4 2013 4 26 -0.35 -9.48 0.0
5 2013 4 27 7.22 -10.47 0.0
6 2013 4 28 4.19 -10.78 0.0
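A self-contained sketch of the str.extract approach on two sample lines. The column name `line` is made up for the example, the pattern is a simplified variant that assumes every temperature and precipitation value has a decimal point, and the extracted strings are converted to numbers at the end.

```python
import pandas as pd

# Two sample lines; in the first one Tmax and Tmin touch with no space between them
raw = pd.DataFrame({"line": ["2013 4 22 5.09-10.92 0.0",
                             "2013 4 26 -0.35 -9.48 0.0"]})

# Raw string, so the backslashes reach the regex engine unchanged
pat = (r"(\d+)\s+(\d+)\s+(\d+)\s*"
       r"([-+]?\d+\.\d+)\s*([-+]?\d+\.\d+)\s*([-+]?\d+\.\d+)")

parts = raw["line"].str.extract(pat)
parts.columns = ["year", "mont", "day", "Tmax", "Tmin", "pcp"]

# str.extract returns strings; convert every column to a numeric dtype
parts = parts.apply(pd.to_numeric)
print(parts)
```

The optional sign in each float group is what lets the pattern split `5.09-10.92` correctly.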
Another solution is to use read_fwf and specify colspecs:
import pandas as pd
from io import StringIO
temp=u"""yr mon day Tmax Tmin pcp
2013 4 22 5.09-10.92 0.0
2013 4 23 2.77 -9.63 0.5
2013 4 24 0.28 -9.90 9.9
2013 4 25 0.76 -6.70 12.2
2013 4 26 -0.35 -9.48 0.0
2013 4 27 7.22-10.47 0.0
2013 4 28 4.19-10.78 0.0 """
# after testing, replace StringIO(temp) with 'filename.csv'
names=['year', 'mont','day','Tmax','Tmin','pcp']
colspecs = [(0, 6), (9, 10), (12, 14), (21, 26),(26,32),(34,38)]
df = pd.read_fwf(StringIO(temp),colspecs=colspecs, names=names, header=0)
print (df)
year mont day Tmax Tmin pcp
0 2013 4 22 5.09 -10.92 0.0
1 2013 4 23 2.77 -9.63 0.5
2 2013 4 24 0.28 -9.90 9.9
3 2013 4 25 0.76 -6.70 12.2
4 2013 4 26 -0.35 -9.48 0.0
5 2013 4 27 7.22 -10.47 0.0
6 2013 4 28 4.19 -10.78 0.0
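When every field has a known width, read_fwf also accepts `widths` instead of explicit `colspecs`. The sketch below builds its own fixed-width sample so the spacing is unambiguous; the widths 4, 3, 3, 6, 6, 6 are assumptions for this sample, not the layout of the original file.

```python
from io import StringIO

import pandas as pd

# Build fixed-width lines: 4 + 3 + 3 + 6 + 6 + 6 characters per row (assumed layout)
rows = [(2013, 4, 22, 5.09, -10.92, 0.0),
        (2013, 4, 23, 2.77, -9.63, 0.5),
        (2013, 4, 27, 7.22, -10.47, 0.0)]
text = "\n".join("{:4d}{:3d}{:3d}{:6.2f}{:6.2f}{:6.1f}".format(*r) for r in rows)

names = ["year", "mont", "day", "Tmax", "Tmin", "pcp"]
df = pd.read_fwf(StringIO(text), widths=[4, 3, 3, 6, 6, 6], names=names, header=None)
print(df)
```

Because the split is purely positional, values such as `5.09-10.92` that touch each other are still separated correctly.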
I would like to randomly insert the records from the initial table below into a new temp_table, grouping them under a new PO number (1234-1, 1234-2, etc.) so that in each group sum(T.KG) is < 20 and sum(T.VOL) is < 0.1.
INITIAL TABLE
lineID PO Item QTY Weight Volume T.KG T.VOL
1 1234 ABCD 12 0.40 0.0030 4.80 0.036
2 1234 EFGH 8 0.39 0.0050 3.12 0.040
3 1234 IJKL 5 0.48 0.0070 2.40 0.035
4 1234 MNOP 8 0.69 0.0040 5.53 0.032
5 1234 QRST 9 0.58 0.0025 5.22 0.023
6 1234 UVWX 7 0.87 0.0087 6.09 0.061
7 1234 YZAB 10 0.71 0.0064 7.10 0.064
8 1234 CDEF 6 0.69 0.0054 4.14 0.032
9 1234 GHIJ 7 0.65 0.0036 4.55 0.025
10 1234 KLMN 9 0.67 0.0040 6.03 0.036
NEW Temp_Table should look like:
LineID PO Item QTY Weight Volume T.KG T.VOL
1 1234-1 ABCD 12 0.40 0.0030 4.80 0.036
2 1234-1 EFGH 8 0.39 0.0050 3.12 0.040
5 1234-1 QRST 9 0.58 0.0025 5.22 0.023
3 1234-2 IJKL 5 0.48 0.0070 2.40 0.035
4 1234-2 MNOP 8 0.69 0.0040 5.53 0.032
8 1234-2 CDEF 6 0.69 0.0054 4.14 0.032
6 1234-3 UVWX 7 0.87 0.0087 6.09 0.061
10 1234-3 KLMN 9 0.67 0.0040 6.03 0.036
9 1234-4 GHIJ 7 0.65 0.0036 4.55 0.025
7 1234-4 YZAB 10 0.71 0.0064 7.10 0.064
I can't figure out how to code this...
It's probably a job for a cursor.
The algorithm could basically be like this:
1. Collect the rows from the initial table one by one, accumulating sum(TKG) and sum(TVOL):
   - pick out the rows into the temp while the conditions are still met (omit those that exceed either sum);
   - use lineID as the order;
   - iterate up to the end of the list.
2. Upon hitting the end of the table call it a group, then start all over again, omitting the rows that have already been collected into the temp.
3. Continue while there still are rows not collected.
But I'm too lazy at the moment to give out the actual code, besides it's a homework, and cursors hate me anyway.
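The steps above can be sketched in plain Python rather than with a SQL cursor, just to make the logic concrete. The function name `pack` is made up for the example, the rows are the (lineID, T.KG, T.VOL) values from the initial table, and Decimal is used so that sums landing exactly on a limit are compared exactly. The sketch walks the rows in lineID order, so it is deterministic rather than random.

```python
from decimal import Decimal

MAX_KG = Decimal("20")    # each group's sum(T.KG) must stay below this
MAX_VOL = Decimal("0.1")  # each group's sum(T.VOL) must stay below this

# (lineID, T.KG, T.VOL) from the initial table, kept as strings for exact Decimal parsing
rows = [
    (1, "4.80", "0.036"), (2, "3.12", "0.040"), (3, "2.40", "0.035"),
    (4, "5.53", "0.032"), (5, "5.22", "0.023"), (6, "6.09", "0.061"),
    (7, "7.10", "0.064"), (8, "4.14", "0.032"), (9, "4.55", "0.025"),
    (10, "6.03", "0.036"),
]

def pack(rows):
    """Greedy passes in lineID order; each pass over the leftovers builds one group."""
    remaining = [(line_id, Decimal(kg), Decimal(vol)) for line_id, kg, vol in rows]
    groups = []
    while remaining:
        total_kg = total_vol = Decimal("0")
        group, leftover = [], []
        for line_id, kg, vol in remaining:
            # Take the row only if both running sums stay strictly below the limits
            if total_kg + kg < MAX_KG and total_vol + vol < MAX_VOL:
                group.append(line_id)
                total_kg += kg
                total_vol += vol
            else:
                leftover.append((line_id, kg, vol))
        if not group:
            raise ValueError("a single row already reaches or exceeds a limit")
        groups.append(group)
        remaining = leftover
    return groups

groups = pack(rows)
for n, group in enumerate(groups, start=1):
    print(f"1234-{n}: lineIDs {group}")
```

With strict "less than" limits, this grouping differs from the sample temp table: lines 7 and 10 cannot share a group here, because their volumes add up to exactly 0.100.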
The logic of the 1234-1, 1234-2, etc is to break the records into groups that represent a carton. If the order has 100 line items, I may need n cartons (n groups) to pack all the items.