Changing value of column in pandas chaining - pandas

I have a dataset like this:
year artist track time date.entered wk1 wk2
2000 Pac Baby 4:22 2000-02-26 87 82
2000 Geher The 3:15 2000-09-02 91 87
2000 three_DoorsDown Kryptonite 3:53 2000-04-08 81 70
2000 ATeens Dancing_Queen 3:44 2000-07-08 97 97
2000 Aaliyah I_Dont_Wanna 4:15 2000-01-29 84 62
2000 Aaliyah Try_Again 4:03 2000-03-18 59 53
2000 Yolanda Open_My_Heart 5:30 2000-08-26 76 76
My desired output is like this:
year artist track time date week rank
0 2000 Pac Baby 4:22 2000-02-26 1 87
1 2000 Pac Baby 4:22 2000-03-04 2 82
6 2000 ATeens Dancing_Queen 3:44 2000-07-08 1 97
7 2000 ATeens Dancing_Queen 3:44 2000-07-15 2 97
8 2000 Aaliyah I_Dont_Wanna 4:15 2000-01-29 1 84
Basically, I am tidying up the given billboard data.
Without method chaining, I can do this easily:
import pandas as pd

df = pd.read_clipboard()
df1 = (pd.wide_to_long(df, 'wk', i=df.columns.values[:5], j='week')
         .reset_index()
         .rename(columns={'date.entered': 'date', 'wk': 'rank'}))
df1['date'] = pd.to_datetime(df1['date']) + pd.to_timedelta((df1['week'] - 1) * 7, 'd')
df1 = df1.sort_values(by=['track', 'date'])
print(df1.head())
Question
Is there a way I can chain the df1['date'] = pd.to_datetime(...) part, so that the whole operation fits into a single chain?

Use assign:
df1 = (pd.wide_to_long(df, 'wk', i=df.columns.values[:5], j='week')
         .reset_index()
         .rename(columns={'date.entered': 'date', 'wk': 'rank'})
         .assign(date=lambda x: pd.to_datetime(x['date'])
                                + pd.to_timedelta((x['week'] - 1) * 7, 'd'))
         .sort_values(by=['track', 'date']))
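For context, assign evaluates its callable against the intermediate frame flowing through the chain, which is why the lambda is needed: the renamed frame is never bound to a variable. A minimal, self-contained sketch of that mechanism (column names borrowed from the question, data invented for illustration):

import pandas as pd

df = pd.DataFrame({'date': ['2000-02-26', '2000-09-02'], 'week': [1, 2]})

# assign passes the intermediate frame to the lambda as x, so the
# computation sees the columns as they exist at that point in the chain
out = (df.assign(date=lambda x: pd.to_datetime(x['date'])
                                + pd.to_timedelta((x['week'] - 1) * 7, 'd'))
         .sort_values('date'))
print(out)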

Related

Not sure of the order of melt/stack/unstack to morph my DataFrame

I have a DataFrame with MultiIndex columns. I want to preserve the existing index, but move one level of the column MultiIndex down to become a sublevel of the index instead.
I can't figure out the correct incantation of melt/stack/unstack/pivot to get from what I have to what I want. Unstacking turned things into a Series and lost the original date index.
import numpy as np
import pandas as pd

names = ['mike', 'matt', 'dave']
details = ['bla', 'foo']
columns = pd.MultiIndex.from_tuples((n, d) for n in names for d in details)
index = pd.date_range(start="2022-10-30", end="2022-11-3", freq="d")
have = pd.DataFrame(np.random.randint(0, 100, size=(5, 6)), index=index, columns=columns)
have
want_columns = details
want_index = pd.MultiIndex.from_product([index, names])
want = pd.DataFrame(np.random.randint(0, 100, size=(15, 2)), index=want_index, columns=want_columns)
want
Use DataFrame.stack with level=0:
print (have.stack(level=0))
bla foo
2022-10-30 dave 88 18
matt 49 55
mike 92 45
2022-10-31 dave 33 27
matt 53 41
mike 24 16
2022-11-01 dave 48 19
matt 94 75
mike 11 19
2022-11-02 dave 16 90
matt 14 93
mike 38 72
2022-11-03 dave 80 15
matt 97 2
mike 11 94
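If the reverse transformation is ever needed, unstack undoes the stack, but it appends the moved level as the innermost column level, so a swaplevel restores the original (name, detail) order. A self-contained sketch (values are random, so they will differ from the output above):

import numpy as np
import pandas as pd

names = ['mike', 'matt', 'dave']
details = ['bla', 'foo']
have = pd.DataFrame(np.random.randint(0, 100, size=(5, 6)),
                    index=pd.date_range("2022-10-30", periods=5, freq="d"),
                    columns=pd.MultiIndex.from_tuples((n, d) for n in names for d in details))

# stack moves the outer column level (the names) into the index;
# unstack reverses it but lands the level innermost in the columns
stacked = have.stack(level=0)
roundtrip = stacked.unstack(level=1).swaplevel(axis=1).sort_index(axis=1)
print(roundtrip.equals(have.sort_index(axis=1)))  # True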

Creating a year-week column from distinct year and week columns in pandas dataframe

There's a data frame on which I want to perform time series analysis. The data frame contains distinct columns of year and week:
Year Week stateID values
2018 2 231 55
2010 3 231 92
2000 5 231 56
2018 2 321 55
2010 3 321 45
For the analysis, I want to combine the year and week columns into a datetime-like object with the format Year-Week, i.e.:
Year-Week stateID values
2018-2 231 55
2010-3 231 92
2000-5 231 56
2018-2 321 55
2010-3 321 45
How can I achieve it in pandas?
IIUC,
df['Year-Week'] = df['Year'].astype(str) + '-' + df['Week'].astype(str)
df['Period'] = pd.to_datetime(df['Year'].astype(str) + '-' + df['Week'].astype(str) + '-' +'0', format='%Y-%W-%w').dt.to_period('W')
Output:
Year Week stateID values Year-Week Period
0 2018 2 231 55 2018-2 2018-01-08/2018-01-14
1 2010 3 231 92 2010-3 2010-01-18/2010-01-24
2 2000 5 231 56 2000-5 2000-01-31/2000-02-06
3 2018 2 321 55 2018-2 2018-01-08/2018-01-14
4 2010 3 321 45 2010-3 2010-01-18/2010-01-24
Something like this should work:
df["Year-Week"] = df.Year.astype(str).str.cat(df.Week.astype(str), sep="-")

Excluding IDs with some value day after day

I have this df:
ID Date X Y
A 16-07-19 123 56
A 17-07-19 456 84
A 18-07-19 0 58
A 19-07-19 123 81
B 19-07-19 456 70
B 21-07-19 789 46
B 22-07-19 0 19
B 23-07-19 0 91
C 14-07-19 0 86
C 16-07-19 456 91
C 17-07-19 456 86
C 18-07-19 0 41
C 19-07-19 456 26
C 20-07-19 456 17
D 06-07-19 789 98
D 08-07-19 789 90
D 09-07-19 0 94
I want to exclude IDs that have any nonzero value in the X column on consecutive days.
For example: A has the value 123 on 16-07-19 and 456 on 17-07-19, so all of A's observations should be excluded.
Expected result:
ID Date X Y
B 19-07-19 456 70
B 21-07-19 789 46
B 22-07-19 0 19
B 23-07-19 0 91
D 06-07-19 789 98
D 08-07-19 789 90
D 09-07-19 0 94
Let's do this in a vectorized manner, to keep our code as efficient as possible
(meaning: we avoid using GroupBy.apply)
First we check, within each ID, whether the difference in Date equals 1 day
Then we check whether the X column is not equal to 0
We create a temporary column m marking rows where both conditions are True
We group by ID and remove all groups where any row is True
# df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)  # if Date is not datetime yet; dates are DD-MM-YY
m1 = df.groupby('ID')['Date'].diff().eq(pd.Timedelta(1, unit='d'))  # diff per ID, never across ID boundaries
m2 = df['X'].ne(0)
df['m'] = m1 & m2
df = df[~df.groupby('ID')['m'].transform('any')].drop(columns='m')
ID Date X Y
4 B 2019-07-19 456 70
5 B 2019-07-21 789 46
6 B 2019-07-22 0 19
7 B 2019-07-23 0 91
14 D 2019-07-06 789 98
15 D 2019-07-08 789 90
16 D 2019-07-09 0 94
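The same logic as a compact, self-contained run, with the mask built inline instead of through a temporary column (sample rows abbreviated from the question):

import pandas as pd
from io import StringIO

data = """ID,Date,X,Y
A,16-07-19,123,56
A,17-07-19,456,84
B,19-07-19,456,70
B,21-07-19,789,46"""

df = pd.read_csv(StringIO(data))
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

# a row is "bad" when it is exactly one day after the previous row of the
# same ID and has a nonzero X; any bad row disqualifies the whole ID
consecutive = df.groupby('ID')['Date'].diff().eq(pd.Timedelta(days=1))
bad = (consecutive & df['X'].ne(0)).groupby(df['ID']).transform('any')
print(df[~bad])  # keeps only the B rows here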

Need assistance with the query below

I'm getting this error:
Error tokenizing data. C error: Expected 2 fields in line 11, saw 3
Code:
import webbrowser
website = 'https://en.wikipedia.org/wiki/Winning_percentage'
webbrowser.open(website)
league_frame = pd.read_clipboard()
The error mentioned above is raised when the last line runs.
I believe you need pd.read_html, which returns all the parsed tables; select the DataFrame by position:
website = 'https://en.wikipedia.org/wiki/Winning_percentage'
#select first parsed table
df1 = pd.read_html(website)[0]
print (df1.head())
Win % Wins Losses Year Team Comment
0 0.798 67 17 1882 Chicago White Stockings best pre-modern season
1 0.763 116 36 1906 Chicago Cubs best 154-game NL season
2 0.721 111 43 1954 Cleveland Indians best 154-game AL season
3 0.716 116 46 2001 Seattle Mariners best 162-game AL season
4 0.667 108 54 1975 Cincinnati Reds best 162-game NL season
#select second parsed table
df2 = pd.read_html(website)[1]
print (df2)
Win % Wins Losses Season Team \
0 0.890 73 9 2015–16 Golden State Warriors
1 0.110 9 73 1972–73 Philadelphia 76ers
2 0.106 7 59 2011–12 Charlotte Bobcats
Comment
0 best 82 game season
1 worst 82-game season
2 worst season statistically
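Two small practical notes: read_html needs an HTML parser installed (lxml, or BeautifulSoup with html5lib), and calling it twice downloads and parses the page twice. A sketch of parsing once and indexing the result (table positions depend on the current page layout):

import pandas as pd

website = 'https://en.wikipedia.org/wiki/Winning_percentage'

# parse the page once; read_html returns a list of every table it finds
tables = pd.read_html(website)
df1, df2 = tables[0], tables[1]

# the match parameter narrows the list to tables containing given text
nba = pd.read_html(website, match='Golden State')[0]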

How to concatenate corresponding row values to make column names in pandas?

I have the DataFrame below in a messy shape. I need to combine rows 0 and 1 to form the column names, and keep the remaining data rows as-is:
Start Date 2005-01-01 Unnamed: 3 Unnamed: 4 Unnamed: 5
Dat an_1 an_2 an_3 an_4 an_5
mt mt s t inch km
23 45 67 78 89 9000
change to below dataframe :
Dat_mt an_1_mt an_2_s an_3_t an_4_inch an_5_km
23 45 67 78 89 9000
IIUC
df.columns = df.loc[0] + '_' + df.loc[1]
df = df.loc[[2]]
df
Out[429]:
Dat_mt an_1_mt an_2_s an_3_t an_4_inch an_5_km
2 23 45 67 78 89 9000
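A self-contained sketch of the same idea, with the index reset so the surviving row starts at 0 (data reconstructed from the question):

import pandas as pd

df = pd.DataFrame([['Dat', 'an_1', 'an_2', 'an_3', 'an_4', 'an_5'],
                   ['mt', 'mt', 's', 't', 'inch', 'km'],
                   [23, 45, 67, 78, 89, 9000]])

# glue row 0 and row 1 together to form the new header,
# then keep only the data row(s)
df.columns = df.loc[0].astype(str) + '_' + df.loc[1].astype(str)
df = df.loc[[2]].reset_index(drop=True)
print(df)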