I need to change the level-0 index ('Product Group') of a pandas groupby result, based on a condition (the sum of the related values in the 'Sales' column).
Since the code is very long and needs some data files, I'll copy the output.
The last line of code is:
tdk_regions = tdk[['Region', 'Sales', 'Product Group']].groupby(['Product Group', 'Region']).sum()
The output looks like this:
Product Group Region Sales
ALUMINUM & FILM CAPACITORS BG America 7.425599e+07
China 2.249969e+08
Europe 2.404613e+08
India 6.034134e+07
Japan 7.667371e+06
... ... ...
TEMPERATURE&PRESSURE SENSORS BG Europe 1.308471e+08
India 3.077273e+06
Japan 2.851744e+07
Korea 1.309189e+06
OSEAN 1.258075e+07
Try MultiIndex.rename:
df.index.rename("New Name", level=0, inplace=True)
print(df)
Prints:
Sales
New Name Region
ALUMINUM & FILM CAPACITORS BG America 74255990.0
China 224996900.0
Europe 240461300.0
India 60341340.0
Japan 7667371.0
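If you prefer not to mutate the index in place, `rename_axis` returns a new object with the level name changed. A minimal sketch on toy data standing in for the original tdk dataframe (the values here are illustrative, not the real ones):

```python
import pandas as pd

# Toy stand-in for the original tdk dataframe.
df = pd.DataFrame({
    "Product Group": ["CAP", "CAP", "SENSOR"],
    "Region": ["America", "China", "Europe"],
    "Sales": [10.0, 20.0, 30.0],
})
grouped = df.groupby(["Product Group", "Region"]).sum()

# rename_axis returns a copy with the level-0 index name remapped.
renamed = grouped.rename_axis(index={"Product Group": "New Name"})
print(renamed.index.names)  # ['New Name', 'Region']
```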
I have a pandas column like below:
import pandas as pd
data = {'id': ['001', '002', '003'],
'address': [['William J. Clare', '290 Valley Dr.', 'Casper, WY 82604','USA, United States'],
['1180 Shelard Tower', 'Minneapolis, MN 55426', 'USA, United States'],
['William N. Barnard', '145 S. Durbin', 'Casper, WY 82601', 'USA, United States']]
}
df = pd.DataFrame(data)
I want to pop the first element of each list in the address column if it is a name, i.e. if it doesn't contain any number.
output:
[['290 Valley Dr.', 'Casper, WY 82604','USA, United States'], ['1180 Shelard Tower', 'Minneapolis, MN 55426', 'USA, United States'], ['145 S. Durbin', 'Casper, WY 82601', 'USA, United States']]
This is a continuation of my previous post. I am learning Python, this is my second project, and I have been struggling with this since morning. Please help.
Assuming you define an address as a string starting with a number (you can change the logic):
for l in df['address']:
    # The column holds references to the list objects, so popping mutates the dataframe in place.
    if not l[0][0].isdigit():
        l.pop(0)
print(df)
updated df:
id address
0 001 [290 Valley Dr., Casper, WY 82604, USA, United...
1 002 [1180 Shelard Tower, Minneapolis, MN 55426, US...
2 003 [145 S. Durbin, Casper, WY 82601, USA, United ...
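If you'd rather not mutate the lists in place, the same "starts with a digit" rule can be applied with a non-mutating `apply` that slices off the first element instead of popping it (a sketch, same assumption about what counts as an address):

```python
import pandas as pd

data = {'id': ['001', '002', '003'],
        'address': [['William J. Clare', '290 Valley Dr.', 'Casper, WY 82604', 'USA, United States'],
                    ['1180 Shelard Tower', 'Minneapolis, MN 55426', 'USA, United States'],
                    ['William N. Barnard', '145 S. Durbin', 'Casper, WY 82601', 'USA, United States']]}
df = pd.DataFrame(data)

# Keep the list as-is when it starts with a digit, otherwise drop the first element.
df['address'] = df['address'].apply(
    lambda l: l if l[0][0].isdigit() else l[1:])
print(df['address'].tolist())
```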
I am unable to compare the column values of two different dataframes.
The first dataset has 500 rows and the second has 128. I am showing a few rows of each.
First dataset:
Country_name Weather President
USA 16 Trump
China 19 Xi
Second dataset:
Country_name Weather Currency
North Korea 26 NKT
China 19 Yaun
I want to compare the Country_name column because I don't have a Currency column in dataset 1; if the Country_name matches, then I can append its value. My final dataframe should be like this:
Country_name Weather President Currency
USA 16 Trump Dollar
China 19 Xi Yaun
In the final dataframe above, we should include only those countries whose Country_name is present in both datasets, with the corresponding Currency value appended as shown.
If you just want to keep the records that match on Country_name, and exclude everything else, you can use the merge function, which finds the intersection of two dataframes based on a given column:
d1 = pd.DataFrame(data=[['USA', 16, 'Trump'], ['China', 19, 'Xi']],
                  columns=['Country_name', 'Weather', 'President'])
d2 = pd.DataFrame(data=[['North Korea', 26, 'NKT'], ['China', 19, 'Yun']],
                  columns=['Country_name', 'Weather', 'Currency'])
result = pd.merge(d1, d2, on=['Country_name'], how='inner')\
    .rename(columns={'Weather_x': 'Weather'}).drop(['Weather_y'], axis=1)
print(result)
Output
Country_name Weather President Currency
0 China 19 Xi Yun
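As a variant, since Weather is equal in both frames for the matching rows in this example, you could also merge on both shared columns, which avoids the `Weather_x`/`Weather_y` suffixes and the rename/drop step entirely (a sketch under that assumption):

```python
import pandas as pd

d1 = pd.DataFrame(data=[['USA', 16, 'Trump'], ['China', 19, 'Xi']],
                  columns=['Country_name', 'Weather', 'President'])
d2 = pd.DataFrame(data=[['North Korea', 26, 'NKT'], ['China', 19, 'Yun']],
                  columns=['Country_name', 'Weather', 'Currency'])

# Merging on both shared columns keeps a single Weather column in the result.
result = pd.merge(d1, d2, on=['Country_name', 'Weather'], how='inner')
print(result)
```

Note that this only works when the shared non-key columns really do agree; if Weather could differ between the two datasets, merge on Country_name alone as above.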
home_team_name away_team_name home_ppg_per_odds_pre_game away_ppg_per_odds_pre_game
0 Manchester United Tottenham Hotspur 3.310000 4.840000
1 AFC Bournemouth Aston Villa 0.666667 3.230000
2 Norwich City Crystal Palace 0.666667 13.820000
3 Leicester City Sunderland 4.733333 3.330000
4 Everton Watford 0.583333 2.386667
5 Chelsea Manchester United 1.890000 3.330000
The home_ppg_per_odds_pre_game and away_ppg_per_odds_pre_game columns are basically the same metric: the former represents its value for the home team, while the latter represents it for the away team. I want a mean of this metric for each team, regardless of whether the team is playing home or away. In the example df, Manchester United appears as home_team_name in row 0 and as away_team_name in row 5; I want a mean for Manchester United that includes all such rows.
df.groupby("home_team_name")["home_ppg_per_odds_pre_game"].mean()
This will only bring me the mean for the occasion when the team is playing home, but I want both home and away.
Since the two metrics are the same, you can append the home and away team metrics, like this:
data_df = pd.concat([
    df.loc[:, ('home_team_name', 'home_ppg_per_odds_pre_game')],
    df.loc[:, ('away_team_name', 'away_ppg_per_odds_pre_game')].rename(
        columns={'away_team_name': 'home_team_name',
                 'away_ppg_per_odds_pre_game': 'home_ppg_per_odds_pre_game'})
])
Then you can use groupby to get the means:
data_df.groupby('home_team_name')['home_ppg_per_odds_pre_game'].mean().reset_index()
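Put together, a self-contained sketch of this concat-then-groupby approach, using only the two rows from the example that involve Manchester United (I rename the stacked columns to neutral `team`/`metric` names for readability; the original answer keeps the home_* names):

```python
import pandas as pd

df = pd.DataFrame({
    'home_team_name': ['Manchester United', 'Chelsea'],
    'away_team_name': ['Tottenham Hotspur', 'Manchester United'],
    'home_ppg_per_odds_pre_game': [3.31, 1.89],
    'away_ppg_per_odds_pre_game': [4.84, 3.33],
})

# Stack home and away rows into one long (team, metric) frame.
home = df[['home_team_name', 'home_ppg_per_odds_pre_game']].rename(
    columns={'home_team_name': 'team', 'home_ppg_per_odds_pre_game': 'metric'})
away = df[['away_team_name', 'away_ppg_per_odds_pre_game']].rename(
    columns={'away_team_name': 'team', 'away_ppg_per_odds_pre_game': 'metric'})
data_df = pd.concat([home, away], ignore_index=True)

# Mean per team, regardless of home/away.
means = data_df.groupby('team')['metric'].mean()
print(means)
```

Manchester United's mean here is (3.31 + 3.33) / 2 = 3.32, combining its home row and its away row.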
Been trying to search for this but somehow can't seem to find the right answer.
Given the following simple dataframe:
country continent population
0 UK Europe 111111
1 Spain Europe 222222
2 Malaysia Asia 333333
3 USA America 444444
How can I retrieve the country value when I am given an index value? For example, given an index value of 2, I should get back Malaysia.
Edit: Forgot to mention that the input index value comes from a variable (think of a user selecting a particular row, with the selected row providing the index value).
Thank you.
df.iloc[2]['country']
iloc is used for selection by position; see the pandas.DataFrame.iloc documentation for further options.
index = 2
print(df.iloc[index]['country'])
Malaysia
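`df.iloc[index]['country']` is chained indexing (two lookups); for a single scalar read you can also index the column first, which does the same thing in one step. A runnable sketch on the example frame:

```python
import pandas as pd

df = pd.DataFrame({'country': ['UK', 'Spain', 'Malaysia', 'USA'],
                   'continent': ['Europe', 'Europe', 'Asia', 'America'],
                   'population': [111111, 222222, 333333, 444444]})

index = 2  # e.g. the row the user selected

# Positional scalar lookup on the 'country' column.
value = df['country'].iloc[index]
print(value)  # Malaysia
```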
I have a df like below:
df = pd.DataFrame({"location":["north", "south","north"], "store": ["a","b","c"], "date" : ["02112018","02152018","02182018"], "barcode1":["ok", "low","ok"], "barcode2": ["low","zero","zero"], "barcode3": ["ok","zero","low"]})
what I would like to have is like below:
What I have done is repeat each row as many times as there are barcode columns, with the code below:
import numpy as np

df_1 = pd.DataFrame(np.repeat(df.iloc[:, :3].values, len(df.iloc[0, :3]), axis=0))
df_1.columns = df.columns[:3]
and having the output below:
However, I do not know how to get to the desired dataframe.
Sorry that I could not find a more suitable title.
Any help would be appreciated.
You could use pd.melt to unpivot the dataframe; sorting with .sort_values by store gives you the desired order of rows.
pd.melt(
    df,
    id_vars=['location', 'store', 'date'],
    var_name='barcode',
    value_name='control').sort_values(by=['store'])
location store date barcode control
0 north a 02112018 barcode1 ok
3 north a 02112018 barcode2 low
6 north a 02112018 barcode3 ok
1 south b 02152018 barcode1 low
4 south b 02152018 barcode2 zero
7 south b 02152018 barcode3 zero
2 north c 02182018 barcode1 ok
5 north c 02182018 barcode2 zero
8 north c 02182018 barcode3 low
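If you also want a clean 0..n row index after sorting (rather than the interleaved 0, 3, 6, ... labels left over from the melt), `sort_values` accepts `ignore_index=True` (available in pandas 1.0 and later). A self-contained sketch:

```python
import pandas as pd

df = pd.DataFrame({"location": ["north", "south", "north"],
                   "store": ["a", "b", "c"],
                   "date": ["02112018", "02152018", "02182018"],
                   "barcode1": ["ok", "low", "ok"],
                   "barcode2": ["low", "zero", "zero"],
                   "barcode3": ["ok", "zero", "low"]})

# Unpivot the barcode columns, then sort by store and renumber the rows.
long_df = pd.melt(df,
                  id_vars=['location', 'store', 'date'],
                  var_name='barcode',
                  value_name='control').sort_values(by='store', ignore_index=True)
print(long_df)
```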