Changing value of column in pandas chaining - pandas

I have a dataset like this:
year artist track time date.entered wk1 wk2
2000 Pac Baby 4:22 2000-02-26 87 82
2000 Geher The 3:15 2000-09-02 91 87
2000 three_DoorsDown Kryptonite 3:53 2000-04-08 81 70
2000 ATeens Dancing_Queen 3:44 2000-07-08 97 97
2000 Aaliyah I_Dont_Wanna 4:15 2000-01-29 84 62
2000 Aaliyah Try_Again 4:03 2000-03-18 59 53
2000 Yolanda Open_My_Heart 5:30 2000-08-26 76 76
My desired output is like this:
year artist track time date week rank
0 2000 Pac Baby 4:22 2000-02-26 1 87
1 2000 Pac Baby 4:22 2000-03-04 2 82
6 2000 ATeens Dancing_Queen 3:44 2000-07-08 1 97
7 2000 ATeens Dancing_Queen 3:44 2000-07-15 2 97
8 2000 Aaliyah I_Dont_Wanna 4:15 2000-01-29 1 84
Basically, I am tidying up the given billboard data.
Without method chaining, I can do this easily:
import pandas as pd

df = pd.read_clipboard()
df1 = (pd.wide_to_long(df, 'wk', i=df.columns.values[:5], j='week')
         .reset_index()
         .rename(columns={'date.entered': 'date', 'wk': 'rank'}))
df1['date'] = pd.to_datetime(df1['date']) + pd.to_timedelta((df1['week'] - 1) * 7, 'd')
df1 = df1.sort_values(by=['track', 'date'])
print(df1.head())
Question
Is there a way I can chain the df1['date'] = pd.to_datetime(...) part, so that the whole operation fits into a single chain?

Use assign:
df1 = (pd.wide_to_long(df, 'wk', i=df.columns.values[:5], j='week')
         .reset_index()
         .rename(columns={'date.entered': 'date', 'wk': 'rank'})
         .assign(date=lambda x: pd.to_datetime(x['date'])
                                + pd.to_timedelta((x['week'] - 1) * 7, 'd'))
         .sort_values(by=['track', 'date']))
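For context, assign evaluates its callable against the intermediate frame flowing through the chain, which is why the lambda is needed: the renamed frame is never bound to a variable. A minimal, self-contained sketch of that mechanism (column names borrowed from the question, data invented for illustration):

import pandas as pd

df = pd.DataFrame({'date': ['2000-02-26', '2000-09-02'], 'week': [1, 2]})

# assign passes the intermediate frame to the lambda as x, so the
# computation sees the columns as they exist at that point in the chain
out = (df.assign(date=lambda x: pd.to_datetime(x['date'])
                                + pd.to_timedelta((x['week'] - 1) * 7, 'd'))
         .sort_values('date'))
print(out)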

Related

Not sure of the order of melt/stack/unstack to morph my DataFrame

I have a DataFrame with MultiIndex columns. I want to preserve the existing index, but move one level of the column MultiIndex down to become a sublevel of the index instead.
I can't figure out the correct incantation of melt/stack/unstack/pivot to get from what I have to what I want. Unstacking turned things into a Series and lost the original date index.
import numpy as np
import pandas as pd

names = ['mike', 'matt', 'dave']
details = ['bla', 'foo']
columns = pd.MultiIndex.from_tuples((n, d) for n in names for d in details)
index = pd.date_range(start="2022-10-30", end="2022-11-3", freq="d")
have = pd.DataFrame(np.random.randint(0, 100, size=(5, 6)), index=index, columns=columns)
have
want_columns = details
want_index = pd.MultiIndex.from_product([index, names])
want = pd.DataFrame(np.random.randint(0, 100, size=(15, 2)), index=want_index, columns=want_columns)
want
Use DataFrame.stack with level=0:
print (have.stack(level=0))
bla foo
2022-10-30 dave 88 18
matt 49 55
mike 92 45
2022-10-31 dave 33 27
matt 53 41
mike 24 16
2022-11-01 dave 48 19
matt 94 75
mike 11 19
2022-11-02 dave 16 90
matt 14 93
mike 38 72
2022-11-03 dave 80 15
matt 97 2
mike 11 94
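If the reverse transformation is ever needed, unstack undoes the stack, but it appends the moved level as the innermost column level, so a swaplevel restores the original (name, detail) order. A self-contained sketch (values are random, so they will differ from the output above):

import numpy as np
import pandas as pd

names = ['mike', 'matt', 'dave']
details = ['bla', 'foo']
have = pd.DataFrame(np.random.randint(0, 100, size=(5, 6)),
                    index=pd.date_range("2022-10-30", periods=5, freq="d"),
                    columns=pd.MultiIndex.from_tuples((n, d) for n in names for d in details))

# stack moves the outer column level (the names) into the index;
# unstack reverses it but lands the level innermost in the columns
stacked = have.stack(level=0)
roundtrip = stacked.unstack(level=1).swaplevel(axis=1).sort_index(axis=1)
print(roundtrip.equals(have.sort_index(axis=1)))  # True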

Creating a year-week column from distinct year and week columns in pandas dataframe

There's a data frame on which I want to perform time series analysis. The data frame contains distinct columns of year and week:
Year Week stateID values
2018 2 231 55
2010 3 231 92
2000 5 231 56
2018 2 321 55
2010 3 321 45
For the analysis, I want to combine the year and week columns into a datetime-like object with the format Year-Week, i.e.:
Year-Week stateID values
2018-2 231 55
2010-3 231 92
2000-5 231 56
2018-2 321 55
2010-3 321 45
How can I achieve it in pandas?
IIUC,
df['Year-Week'] = df['Year'].astype(str) + '-' + df['Week'].astype(str)
df['Period'] = pd.to_datetime(df['Year'].astype(str) + '-' + df['Week'].astype(str) + '-' +'0', format='%Y-%W-%w').dt.to_period('W')
Output:
Year Week stateID values Year-Week Period
0 2018 2 231 55 2018-2 2018-01-08/2018-01-14
1 2010 3 231 92 2010-3 2010-01-18/2010-01-24
2 2000 5 231 56 2000-5 2000-01-31/2000-02-06
3 2018 2 321 55 2018-2 2018-01-08/2018-01-14
4 2010 3 321 45 2010-3 2010-01-18/2010-01-24
Something like this should work:
df["Year-Week"] = df.Year.astype(str).str.cat(df.Week.astype(str), sep="-")

Excluding IDs with some value day after day

I have this df:
ID Date X Y
A 16-07-19 123 56
A 17-07-19 456 84
A 18-07-19 0 58
A 19-07-19 123 81
B 19-07-19 456 70
B 21-07-19 789 46
B 22-07-19 0 19
B 23-07-19 0 91
C 14-07-19 0 86
C 16-07-19 456 91
C 17-07-19 456 86
C 18-07-19 0 41
C 19-07-19 456 26
C 20-07-19 456 17
D 06-07-19 789 98
D 08-07-19 789 90
D 09-07-19 0 94
I want to exclude IDs that have any nonzero value in the X column on consecutive days.
For example: A has the value 123 on 16-07-19 and 456 on 17-07-19, so all of A's observations should be excluded.
Expected result:
ID Date X Y
B 19-07-19 456 70
B 21-07-19 789 46
B 22-07-19 0 19
B 23-07-19 0 91
D 06-07-19 789 98
D 08-07-19 789 90
D 09-07-19 0 94
Let's do this in a vectorized manner, to keep our code as efficient as possible
(meaning: we avoid using GroupBy.apply)
First we check, within each ID, whether the difference in Date equals 1 day
Then we check whether the X column is not equal to 0
We create a temporary column m marking rows where both conditions are True
We group by ID and remove all groups where any row is True
# df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)  # if Date is not datetime yet; dates are DD-MM-YY
m1 = df.groupby('ID')['Date'].diff().eq(pd.Timedelta(1, unit='d'))  # diff per ID, never across ID boundaries
m2 = df['X'].ne(0)
df['m'] = m1 & m2
df = df[~df.groupby('ID')['m'].transform('any')].drop(columns='m')
ID Date X Y
4 B 2019-07-19 456 70
5 B 2019-07-21 789 46
6 B 2019-07-22 0 19
7 B 2019-07-23 0 91
14 D 2019-07-06 789 98
15 D 2019-07-08 789 90
16 D 2019-07-09 0 94
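The same logic as a compact, self-contained run, with the mask built inline instead of through a temporary column (sample rows abbreviated from the question):

import pandas as pd
from io import StringIO

data = """ID,Date,X,Y
A,16-07-19,123,56
A,17-07-19,456,84
B,19-07-19,456,70
B,21-07-19,789,46"""

df = pd.read_csv(StringIO(data))
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

# a row is "bad" when it is exactly one day after the previous row of the
# same ID and has a nonzero X; any bad row disqualifies the whole ID
consecutive = df.groupby('ID')['Date'].diff().eq(pd.Timedelta(days=1))
bad = (consecutive & df['X'].ne(0)).groupby(df['ID']).transform('any')
print(df[~bad])  # keeps only the B rows here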

Need assistance with the query below

I'm getting this error:
Error tokenizing data. C error: Expected 2 fields in line 11, saw 3
Code:
import webbrowser
website = 'https://en.wikipedia.org/wiki/Winning_percentage'
webbrowser.open(website)
league_frame = pd.read_clipboard()
The error mentioned above is raised when the last line runs.
I believe you need pd.read_html, which returns all the parsed tables; select the DataFrame by position:
website = 'https://en.wikipedia.org/wiki/Winning_percentage'
#select first parsed table
df1 = pd.read_html(website)[0]
print (df1.head())
Win % Wins Losses Year Team Comment
0 0.798 67 17 1882 Chicago White Stockings best pre-modern season
1 0.763 116 36 1906 Chicago Cubs best 154-game NL season
2 0.721 111 43 1954 Cleveland Indians best 154-game AL season
3 0.716 116 46 2001 Seattle Mariners best 162-game AL season
4 0.667 108 54 1975 Cincinnati Reds best 162-game NL season
#select second parsed table
df2 = pd.read_html(website)[1]
print (df2)
Win % Wins Losses Season Team \
0 0.890 73 9 2015–16 Golden State Warriors
1 0.110 9 73 1972–73 Philadelphia 76ers
2 0.106 7 59 2011–12 Charlotte Bobcats
Comment
0 best 82 game season
1 worst 82-game season
2 worst season statistically
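Two small practical notes: read_html needs an HTML parser installed (lxml, or BeautifulSoup with html5lib), and calling it twice downloads and parses the page twice. A sketch of parsing once and indexing the result (table positions depend on the current page layout):

import pandas as pd

website = 'https://en.wikipedia.org/wiki/Winning_percentage'

# parse the page once; read_html returns a list of every table it finds
tables = pd.read_html(website)
df1, df2 = tables[0], tables[1]

# the match parameter narrows the list to tables containing given text
nba = pd.read_html(website, match='Golden State')[0]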

How to concatenate corresponding row values to make column names in pandas?

I have the DataFrame below in a messy shape. I need to combine rows 0 and 1 to form the column names, and keep the remaining data rows as-is:
Start Date 2005-01-01 Unnamed: 3 Unnamed: 4 Unnamed: 5
Dat an_1 an_2 an_3 an_4 an_5
mt mt s t inch km
23 45 67 78 89 9000
change to below dataframe :
Dat_mt an_1_mt an_2_s an_3_t an_4_inch an_5_km
23 45 67 78 89 9000
IIUC
df.columns = df.loc[0] + '_' + df.loc[1]
df = df.loc[[2]]
df
Out[429]:
Dat_mt an_1_mt an_2_s an_3_t an_4_inch an_5_km
2 23 45 67 78 89 9000
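A self-contained sketch of the same idea, with the index reset so the surviving row starts at 0 (data reconstructed from the question):

import pandas as pd

df = pd.DataFrame([['Dat', 'an_1', 'an_2', 'an_3', 'an_4', 'an_5'],
                   ['mt', 'mt', 's', 't', 'inch', 'km'],
                   [23, 45, 67, 78, 89, 9000]])

# glue row 0 and row 1 together to form the new header,
# then keep only the data row(s)
df.columns = df.loc[0].astype(str) + '_' + df.loc[1].astype(str)
df = df.loc[[2]].reset_index(drop=True)
print(df)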