pandas DataFrame: how to shift rows based on date

I am trying to assess the impact of a promotional campaign on our customers. The goal is to measure revenue from the point the promotion was offered. However, the promotion was offered to different customers at different times. How do I rearrange the data into Month 0, Month 1, Month 2, Month 3, with Month 0 being the month the customer first got the promotion?

The code below produces the desired output:
# Create DataFrame
import pandas as pd
df = pd.DataFrame({"Account":[1,2,3,4,5,6],\
"May-18":[181,166,221,158,210,159],\
"Jun-18":[178,222,230,189,219,200],\
"Jul-18":[184,207,175,167,201,204],\
"Aug-18":[161,174,178,233,223,204],\
"Sep-18":[218,209,165,165,204,225],\
"Oct-18":[199,206,205,196,212,205],\
"Nov-18":[231,196,189,218,234,235],\
"Dec-18":[173,178,189,218,234,205],\
"Promotion Month":["Sep-18","Aug-18","Jul-18","May-18","Aug-18","Jun-18"]})
df = df.set_index("Account")
cols = ["May-18","Jun-18","Jul-18","Aug-18","Sep-18","Oct-18","Nov-18","Dec-18","Promotion Month"]
df = df[cols]
# Define function to select the four months after promotion
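def selectMonths(row):
    cols = df.columns.to_list()
    # Position of the promotion month among the columns
    colMonth0 = cols.index(row["Promotion Month"])
    # The promotion month plus the three columns that follow it
    colsOut = cols[colMonth0:colMonth0+4]
    out = pd.Series(row[colsOut].to_list())
    return out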
# Apply the function and set the index and columns of output DataFrame
out = df.apply(selectMonths, axis=1)
out.index = df.index
out.columns=["Month 0","Month 1","Month 2","Month 3"]
Then the output you get is:
>>> out
         Month 0  Month 1  Month 2  Month 3
Account
1            218      199      231      173
2            174      209      206      196
3            175      178      165      205
4            158      189      167      233
5            223      204      212      234
6            200      204      204      225
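One caveat with the slicing approach above: for a promotion in Oct-18 or later, fewer than four month columns remain, and the slice would run into the "Promotion Month" column. A minimal sketch of a vectorized variant that pads the missing months with NaN is below. The names `months`, `n_months` and `aligned` are illustrative, not from the original answer.

```python
import numpy as np
import pandas as pd

months = ["May-18", "Jun-18", "Jul-18", "Aug-18",
          "Sep-18", "Oct-18", "Nov-18", "Dec-18"]
df = pd.DataFrame({
    "Account": [1, 2, 3, 4, 5, 6],
    "May-18": [181, 166, 221, 158, 210, 159],
    "Jun-18": [178, 222, 230, 189, 219, 200],
    "Jul-18": [184, 207, 175, 167, 201, 204],
    "Aug-18": [161, 174, 178, 233, 223, 204],
    "Sep-18": [218, 209, 165, 165, 204, 225],
    "Oct-18": [199, 206, 205, 196, 212, 205],
    "Nov-18": [231, 196, 189, 218, 234, 235],
    "Dec-18": [173, 178, 189, 218, 234, 205],
    "Promotion Month": ["Sep-18", "Aug-18", "Jul-18", "May-18", "Aug-18", "Jun-18"],
}).set_index("Account")

n_months = 4  # Month 0 .. Month 3 (illustrative parameter)

# Revenue as a float array so that missing months can hold NaN
values = df[months].to_numpy(dtype=float)

# Column position of each account's promotion month
start = np.array([months.index(m) for m in df["Promotion Month"]])

# For every account, the column positions of Month 0 .. Month 3
col_idx = start[:, None] + np.arange(n_months)
valid = col_idx < len(months)

# Pick the values row by row; out-of-range positions are read from
# column 0 temporarily and then overwritten with NaN
row_idx = np.arange(len(df))[:, None]
picked = values[row_idx, np.where(valid, col_idx, 0)]
picked[~valid] = np.nan

aligned = pd.DataFrame(picked, index=df.index,
                       columns=[f"Month {i}" for i in range(n_months)])
print(aligned)
```

With the sample data every promotion leaves at least four months, so the values match the output above (as floats).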

Related

Combining pandas dataframes in this way

I have a dataframe df1:
REGION  DATE        Count  TIME PER ID
ABC     2021-03-22  2      44
I have another dataframe df2:
ID  REGION  DATE        TIME
11  ABC     2021-03-22  198
75  ABC     2021-03-22  250
I want to achieve this:
ID  REGION  DATE        TIME  TIME PER ID  TOTAL TIME
11  ABC     2021-03-22  198   44           242
75  ABC     2021-03-22  250   44           294
Essentially, I want to match on REGION and DATE and copy the TIME PER ID value from df1 into every row of df2 that has the same region and date. TOTAL TIME is then TIME plus TIME PER ID.
Merge both dataframes on the shared keys, then create the new column.
# Keep only the key columns and TIME PER ID from df1 so that Count is not carried over
output_df = df2.merge(df1[['REGION', 'DATE', 'TIME PER ID']], on=['REGION', 'DATE'], how='left')
output_df['TOTAL TIME'] = output_df['TIME'] + output_df['TIME PER ID']
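For reference, here is a self-contained sketch of that merge using the sample rows from the question. The `validate="many_to_one"` argument is an optional extra not in the original answer: it makes pandas raise an error if df1 ever contains more than one row per (REGION, DATE) pair, which would otherwise silently duplicate rows of df2.

```python
import pandas as pd

df1 = pd.DataFrame({
    "REGION": ["ABC"],
    "DATE": ["2021-03-22"],
    "Count": [2],
    "TIME PER ID": [44],
})
df2 = pd.DataFrame({
    "ID": [11, 75],
    "REGION": ["ABC", "ABC"],
    "DATE": ["2021-03-22", "2021-03-22"],
    "TIME": [198, 250],
})

# Only the key columns and the value to look up are taken from df1
lookup = df1[["REGION", "DATE", "TIME PER ID"]]

# Left merge keeps every row of df2, matched or not
output_df = df2.merge(lookup, on=["REGION", "DATE"], how="left",
                      validate="many_to_one")
output_df["TOTAL TIME"] = output_df["TIME"] + output_df["TIME PER ID"]
print(output_df)
```

Rows of df2 without a match in df1 would get NaN in both TIME PER ID and TOTAL TIME.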

Empty Dataframe after being populated from URL

import pandas as pd
import requests
from bs4 import BeautifulSoup

html_data = requests.get('https://www.macrotrends.net/stocks/charts/GME/gamestop/revenue')
soup = BeautifulSoup(html_data.text, 'lxml')
all_tables = soup.find_all('table', attrs={'class': 'historical_data_table table'})
gme_revenue = pd.DataFrame(columns=["Date", "Revenue"])
for table in all_tables:
    if table.find('th').getText().startswith("Gamestop Quarterly Revenue"):
        for row in table.find_all("tr"):
            col = row.find_all("td")
            if len(col) == 2:
                date = col[0].text
                revenue = col[1].text.replace('$', '').replace(',', '')
                # Note: DataFrame.append was removed in pandas 2.0
                gme_revenue = gme_revenue.append({"Date": date, "Revenue": revenue}, ignore_index=True)
However, when I print the table, it comes up empty:
Empty DataFrame
Columns: [Date, Revenue]
Index: []
and a quick test confirms it:
>>> gme_revenue.empty
True
I am unsure why my data frame is empty. I even copied the code from another data frame and it still doesn't work.
Help is appreciated.
Change
if table.find('th').getText().startswith("Gamestop Quarterly Revenue"):
to
if 'Quarterly' in table.find('th').text:
and it should work
Output:
           Date Revenue
0    2020-10-31    1005
1    2020-07-31     942
2    2020-04-30    1021
3    2020-01-31    2194
4    2019-10-31    1439
...         ...     ...
59   2006-01-31    1667
60   2005-10-31     534
61   2005-07-31     416
62   2005-04-30     475
63   2005-01-31     709
64 rows × 2 columns
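One plausible reason the original check fails is that `str.startswith` is case-sensitive, so "Gamestop" would not match a header spelled "GameStop". The exact header text on the page is an assumption here and has not been verified. Also, on pandas 2.0 and later `DataFrame.append` no longer exists, so the rows have to be collected differently. A minimal sketch of both points, using hypothetical pre-parsed tables (`parsed_tables`) in place of the scraped HTML:

```python
import pandas as pd

# Hypothetical stand-in for the scraped page: (header text, rows) pairs.
# The header spelling and the annual row are assumptions for illustration.
parsed_tables = [
    ("GameStop Annual Revenue", [("2020", "$1,000")]),
    ("GameStop Quarterly Revenue", [("2020-10-31", "$1,005"),
                                    ("2020-07-31", "$942")]),
]

# startswith is case-sensitive, so this check does not match
print("GameStop Quarterly Revenue".startswith("Gamestop Quarterly Revenue"))

rows = []
for header, table_rows in parsed_tables:
    # Case-insensitive substring check instead of an exact prefix
    if "quarterly" in header.lower():
        for date, revenue in table_rows:
            rows.append({
                "Date": date,
                "Revenue": revenue.replace("$", "").replace(",", ""),
            })

# Build the frame once from the collected rows instead of appending row by row
gme_revenue = pd.DataFrame(rows, columns=["Date", "Revenue"])
print(gme_revenue)
```

Building the DataFrame once from a list of dicts also avoids copying the whole frame on every row.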

Summing columns and rows

How do I add up rows and columns?
The last column, Sum, needs to be the sum of R0 + R1 + R2 for each row.
The last row needs to be the sum of each of these columns.
import pandas as pd
# initialize list of lists
data = [['AP',16,20,78], ['AP+', 10,14,55], ['SP',32,26,90],['Total',0, 0, 0]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Type', 'R0', 'R1', 'R2'])
The result:
    Type  R0  R1  R2  Sum
0     AP  16  20  78  NaN
1    AP+  10  14  55  NaN
2     SP  32  26  90  NaN
3  Total   0   0   0  NaN
Let us try .iloc position selection
df.iloc[-1,1:]=df.iloc[:-1,1:].sum()
df['Sum']=df.iloc[:,1:].sum(axis=1)
df
    Type  R0  R1   R2  Sum
0     AP  16  20   78  114
1    AP+  10  14   55   79
2     SP  32  26   90  148
3  Total  58  60  223  341
In general it may be better practice to specify column names:
import pandas as pd
# initialize list of lists
data = [['AP',16,20,78], ['AP+', 10,14,55], ['SP',32,26,90],['Total',0, 0, 0]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Type', 'R0', 'R1', 'R2'])
# List columns
cols_to_sum=['R0', 'R1', 'R2']
# Access last row and sum columns-wise
df.loc[df.index[-1], cols_to_sum] = df[cols_to_sum].sum(axis=0)
# Create 'Sum' column summing row-wise
df['Sum']=df[cols_to_sum].sum(axis=1)
df
    Type  R0  R1   R2  Sum
0     AP  16  20   78  114
1    AP+  10  14   55   79
2     SP  32  26   90  148
3  Total  58  60  223  341
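Both answers above rely on a placeholder 'Total' row already being present in the data. If it is not, one possible variant (a sketch, not part of the original answers) is to build the total row from the column sums and attach it with `pd.concat`. The names `total_row` and `out` are illustrative.

```python
import pandas as pd

# Same data as above, but without the placeholder 'Total' row
data = [['AP', 16, 20, 78], ['AP+', 10, 14, 55], ['SP', 32, 26, 90]]
df = pd.DataFrame(data, columns=['Type', 'R0', 'R1', 'R2'])

cols_to_sum = ['R0', 'R1', 'R2']

# Column-wise sums as a one-row DataFrame, labelled 'Total'
total_row = df[cols_to_sum].sum().to_frame().T
total_row.insert(0, 'Type', 'Total')

# Attach the total row, then add the row-wise 'Sum' column
out = pd.concat([df, total_row], ignore_index=True)
out['Sum'] = out[cols_to_sum].sum(axis=1)
print(out)
```

Because the 'Sum' column is computed after the total row is attached, the bottom-right cell holds the grand total.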

Column names after transposing a dataframe

I have a small dataframe with six rows (not counting the header) and 53 columns (a store name, and the rest weekly sales for the past year). Each row holds one store: its name followed by its sales for each week. I need to transpose the data so that the weeks appear as rows, the stores appear as columns, and the sales appear as the values.
To generate the input data:
import pandas as pd

df_store = pd.read_excel(SourcePath+SourceFile, sheet_name='StoreSales', header=0, usecols=['StoreName'])
# Number rows of all irrelevant stores.
row_numbers = [x+1 for x in df_store[(df_store['StoreName'] != 'Store1') & (df_store['StoreName'] != 'Store2')
                                     & (df_store['StoreName'] != 'Store3')].index]
# Read in entire Excel file, skipping the rows of irrelevant stores.
df_store = pd.read_excel(SourcePath+SourceFile, sheet_name='StoreSales', header=0, usecols="A:BE",
                         skiprows=row_numbers, converters={'StoreName': str})
# Transpose dataframe
df_store_t = df_store.transpose()
The output has index numbers (0 to 5) as column labels above each store name, and the first row is labeled StoreName (above the weeks) and holds the store names. As a result, I cannot refer to the columns by store name.
Is there a way to clear those index numbers so that I can work directly with the resulting column names (e.g., rename "StoreName" to "WeekEnding" and refer to each store column as "Store1", "Store2", etc.)?
If I understand correctly, you need to call set_index first and then transpose with .T.
See this example:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Store': [*'ABCDE'],
                   'Week 1': np.random.randint(50, 200, 5),
                   'Week 2': np.random.randint(50, 200, 5),
                   'Week 3': np.random.randint(50, 200, 5)})
Input Dataframe:
  Store  Week 1  Week 2  Week 3
0     A      99     163     148
1     B     119      86      92
2     C     145      98     162
3     D     144     143     199
4     E      50     181     177
Now, set_index and transpose:
df_out = df.set_index('Store').T
df_out
Output:
Store     A    B    C    D    E
Week 1   99  119  145  144   50
Week 2  163   86   98  143  181
Week 3  148   92  162  199  177
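To get the "WeekEnding" column the question asks for, the transposed frame can be relabeled and its index turned into a regular column. The sketch below uses the fixed numbers from the example tables instead of random values; whether the real week labels are dates or strings is not known, so they are left as plain strings here.

```python
import pandas as pd

df = pd.DataFrame({
    'Store': list('ABCDE'),
    'Week 1': [99, 119, 145, 144, 50],
    'Week 2': [163, 86, 98, 143, 181],
    'Week 3': [148, 92, 162, 199, 177],
})

# Store names become the column labels after transposing
df_out = df.set_index('Store').T

# Drop the leftover 'Store' label on the columns and name the index,
# then turn the index into an ordinary 'WeekEnding' column
df_out.columns.name = None
df_out.index.name = 'WeekEnding'
df_out = df_out.reset_index()

# Store columns can now be referenced by name
print(df_out[['WeekEnding', 'A', 'C']])
```

After this step, `df_out['A']` holds the weekly sales of store A in week order.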

Why am I not getting the subtotal of rows?

I am trying to get subtotals using pandas pivoting. I don't know why I am getting only the column subtotals.
import numpy as np
import pandas as pd

data = {'TypeOfInvestor': ['Stocks', 'Bonds', 'Real Estate'],
        'InvestorA': [96, 181, 88],
        'InvestorB': [185, 3, 152],
        'InvestorC': [39, 29, 142]}
df = pd.DataFrame(data)
pt = pd.pivot_table(df, values=['InvestorA', 'InvestorB', 'InvestorC'],
                    index=['TypeOfInvestor'],
                    aggfunc=np.sum, margins=True, margins_name='Total')
I expect to get subtotals of both columns and rows using pivot_table, but I am getting only the subtotals of the columns.
You can add this fairly easily by using .sum with axis=1:
pt['Total']= pt.sum(axis=1)
print(pt)
                InvestorA  InvestorB  InvestorC  Total
TypeOfInvestor
Bonds                 181          3         29    213
Real Estate            88        152        142    382
Stocks                 96        185         39    320
Total                 365        340        210    915
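Put together as a runnable sketch: as far as I can tell, `margins=True` only totals along the keys passed as `index` (and `columns`, if any). Here the three investors are separate value columns rather than levels of a `columns` key, so pivot_table only adds the 'Total' row, and the row totals have to be added by hand. The string `"sum"` is used in place of `np.sum`, which recent pandas versions warn about; the `investors` list is just a convenience name.

```python
import pandas as pd

data = {'TypeOfInvestor': ['Stocks', 'Bonds', 'Real Estate'],
        'InvestorA': [96, 181, 88],
        'InvestorB': [185, 3, 152],
        'InvestorC': [39, 29, 142]}
df = pd.DataFrame(data)

investors = ['InvestorA', 'InvestorB', 'InvestorC']

# margins=True adds the 'Total' row (sum over the index key)
pt = pd.pivot_table(df, values=investors, index='TypeOfInvestor',
                    aggfunc='sum', margins=True, margins_name='Total')

# Row totals across the investor columns, including the 'Total' row
pt['Total'] = pt[investors].sum(axis=1)
print(pt)
```

Summing only the `investors` columns keeps the result correct even if the line is run twice, since an existing 'Total' column is not included in the sum.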