I have a data set:
id Category Date
1 Sick 2016-10-10 12:10:21
2 Active 2017-09-08 11:09:06
3 Weak 2018-11-12 06:10:04
How can I add a new column that contains only the year, using pandas?
You could do:
import pandas as pd
data = [[1, 'Sick ', '2016-10-10 12:10:21'],
[2, 'Active', '2017-09-08 11:09:06'],
[3, 'Weak ', '2018-11-12 06:10:04']]
df = pd.DataFrame(data=data, columns=['id', 'category', 'date'])
df['year'] = pd.to_datetime(df['date']).dt.year
print(df)
Output
id category date year
0 1 Sick 2016-10-10 12:10:21 2016
1 2 Active 2017-09-08 11:09:06 2017
2 3 Weak 2018-11-12 06:10:04 2018
You can just do: df['year'] = pd.DatetimeIndex(df['Date']).year
Output:
id category Date year
0 1 Sick 2016-10-10 12:10:21 2016
1 2 Active 2017-09-08 11:09:06 2017
2 3 Weak 2018-11-12 06:10:04 2018
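As a quick check, the two approaches above can be run side by side. This is only a minimal sketch: the frame mirrors the question's sample data, and the names year_dt, year_idx and same are made up for illustration.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    'id': [1, 2, 3],
    'Category': ['Sick', 'Active', 'Weak'],
    'Date': ['2016-10-10 12:10:21', '2017-09-08 11:09:06', '2018-11-12 06:10:04'],
})

# Approach 1: parse to datetime, then use the .dt accessor (returns a Series)
year_dt = pd.to_datetime(df['Date']).dt.year

# Approach 2: build a DatetimeIndex and read its .year attribute (returns an Index)
year_idx = pd.DatetimeIndex(df['Date']).year

df['year'] = year_dt
same = list(year_dt) == list(year_idx)  # both approaches should agree
print(df)
print(same)
```

Either form should work for a column of parseable date strings; the .dt version keeps the original row index, which may matter if the frame does not use a default index.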
Example:
import pandas as pd
year_month = ['201801','201801','201801','201801','201801','201802','201802','201802','201802','201802']
Services = ['23','67','23','67','23','23','23','4','4','67']
df = list(zip(year_month, Services))
df = pd.DataFrame(df, columns=['Date', 'Services'])
Help me!
My date column is already in the right format, and I already have the YYYYMM column derived from it.
I tried something like:
df2 = df.loc[:, ['YYYYMM', 'Services']]
df2 = df.groupby(['YYYYMM']).count().reset_index()
EXPECTED OUTPUT
Quantity of services per month/year.
year_month 4 23 67
201801 0 3 2
201802 2 2 1
out = df.groupby('Date', as_index=False).count()
out
Date Services
0 201801 5
1 201802 5
Update
Now that I understand the desired output:
pd.crosstab(df['Date'], df['Services']).sort_index(axis=1, key=lambda x: x.astype('int'))
Services 4 23 67
Date
201801 0 3 2
201802 2 2 1
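The same table can probably also be built without crosstab, by counting each (Date, Services) pair and pivoting the service codes into columns. The following is a sketch under the assumption that the data matches the question's example; counts is a name chosen here for illustration.

```python
import pandas as pd

# Data from the question
year_month = ['201801'] * 5 + ['201802'] * 5
services = ['23', '67', '23', '67', '23', '23', '23', '4', '4', '67']
df = pd.DataFrame({'Date': year_month, 'Services': services})

# Count rows per (Date, Services) pair, then move Services into columns;
# combinations that never occur are filled with 0.
counts = (
    df.groupby(['Date', 'Services'])
      .size()
      .unstack(fill_value=0)
)

# The service codes are strings, so order the columns by their integer value.
counts = counts[sorted(counts.columns, key=int)]
print(counts)
```

Sorting with key=int here plays the same role as the sort_index(..., key=...) call above, and does not depend on the key argument being available in the installed pandas version.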
I have a dataframe 'df1' that has 2 columns. I need to shift the 2nd column down one row and then remove the entire top row of df1.
My data looks like this:
year ER12
0 2017 -2.05
1 2018 1.05
2 2019 -0.04
3 2020 -0.60
4 2021 -99.99
And, I need it to look like this:
year ER12
0 2018 -2.05
1 2019 1.05
2 2020 -0.04
3 2021 -0.60
We can try this:
df = df.assign(ER12=df.ER12.shift()).dropna().reset_index(drop=True)
print(df)
year ER12
0 2018 -2.05
1 2019 1.05
2 2020 -0.04
3 2021 -0.60
This works on your example:
import pandas as pd
df = pd.DataFrame({'year':[2017,2018,2019,2020,2021], 'ER12':[-2.05,1.05,-0.04,-0.6,-99.99]})
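df['year'] = df['year'].shift(-1)  # move each year up one row; the last row becomes NaN
df = df.dropna().astype({'year': int})  # drop that row and turn the years back into ints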
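One caveat with shift plus dropna is that dropna also removes any rows that already contained NaN in the original data. A possible alternative, sketched here under the assumption that the frame looks like the question's example, is to pair the columns by position instead; result is a name chosen for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'year': [2017, 2018, 2019, 2020, 2021],
    'ER12': [-2.05, 1.05, -0.04, -0.60, -99.99],
})

# Pair each year from the second row onward with the ER12 value of the row before it.
# to_numpy() drops the index so the two slices line up by position.
result = pd.DataFrame({
    'year': df1['year'].iloc[1:].to_numpy(),
    'ER12': df1['ER12'].iloc[:-1].to_numpy(),
})
print(result)
```

Because no NaN is ever introduced, the year column keeps its integer dtype and the result gets a fresh 0-based index.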
I am trying to subset a large dataframe (5000+ rows and 15 columns) based on unique values from two columns (both are dtype = object). I want to exclude rows of data that meet the following criteria:
A column called 'Record' equals "MO" AND a column called 'Year' equals "2017" or "2018".
Here is an example of the dataframe:
df = pd.DataFrame({'A': [1001,2002,3003,4004,5005,6006,7007,8008,9009], 'Record' : ['MO','MO','I','I','MO','I','MO','I','I'], 'Year':[2017,2019,2018,2020,2018,2018,2020,2019,2017]})
print(df)
A Record Year
0 1001 MO 2017
1 2002 MO 2019
2 3003 I 2018
3 4004 I 2020
4 5005 MO 2018
5 6006 I 2018
6 7007 MO 2020
7 8008 I 2019
8 9009 I 2017
I would like any row with both "MO" and "2017", as well as both "MO" and "2018" taken out of the dataframe.
Example where the right rows (0 and 4 in dataframe above) are deleted:
df = pd.DataFrame({'A': [2002,3003,4004,6006,7007,8008,9009], 'Record' : ['MO','I','I','I','MO','I','I'], 'Year':[2019,2018,2020,2018,2020,2019,2017]})
print(df)
A Record Year
0 2002 MO 2019
1 3003 I 2018
2 4004 I 2020
3 6006 I 2018
4 7007 MO 2020
5 8008 I 2019
6 9009 I 2017
I have tried the following code, but it does not work (I tried at first for just one year):
df = df[(df['Record'] != "MO" & df['Year'] != "2017")]
I believe you're just missing some parentheses. In Python, & binds more tightly than !=, so each comparison needs its own pair.
df = df[(df['Record'] != "MO") & (df['Year'] != "2017")]
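Edit:
After some clarification, the rows to drop are those where Record is 'MO' and Year is either '2017' or '2018'. The two Year tests need their own parentheses, because & binds more tightly than |:
df = df[~((df['Record']=='MO') &
((df['Year']=='2017') |
(df['Year']=='2018')))]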
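The exclusion rule from the question (drop rows where Record is 'MO' and Year is 2017 or 2018) can also be written with isin. This is a sketch, not the only way to do it. It assumes the example frame from the question, where Year holds integers, and compares the years as strings so the same mask also works if the real column stores "2017" as text. The names year_as_str, to_drop and filtered are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1001, 2002, 3003, 4004, 5005, 6006, 7007, 8008, 9009],
    'Record': ['MO', 'MO', 'I', 'I', 'MO', 'I', 'MO', 'I', 'I'],
    'Year': [2017, 2019, 2018, 2020, 2018, 2018, 2020, 2019, 2017],
})

# Assumption: normalise Year to strings so ints and object-dtype strings behave the same.
year_as_str = df['Year'].astype(str)

# True for the rows that should be removed: 'MO' records from 2017 or 2018.
to_drop = (df['Record'] == 'MO') & year_as_str.isin(['2017', '2018'])

# Keep everything else and renumber the rows.
filtered = df[~to_drop].reset_index(drop=True)
print(filtered)
```

Building the mask for the rows to drop and then negating it tends to be easier to read than rewriting the condition with != and De Morgan's laws.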
Suppose I have test data like below:
import pandas as pd
data_dic = {
"day": ['2019-01-18', '2019-01-18', '2019-01-18', '2019-01-19',
'2019-01-19','2019-01-25', '2019-02-19', '2019-02-24'],
"data": [0, 1,3,3, 0, 1,2 ,5],
"col2": [10, 11,1,1, 10, 1,2, 5],
"col3": [5, 6,7,8, 9, 1,2, 5]
}
df = pd.DataFrame(data_dic)
df.index = pd.to_datetime(df.day)
df = df.drop(['day'], axis=1)
df.insert(0, 'day_name', df.index.day_name())
Result:
day_name data col2 col3
day
2019-01-18 Friday 0 10 5
2019-01-18 Friday 1 11 6
2019-01-18 Friday 3 1 7
2019-01-19 Saturday 3 1 8
2019-01-19 Saturday 0 10 9
2019-01-25 Friday 1 1 1
2019-02-19 Tuesday 2 2 2
2019-02-24 Sunday 5 5 5
Now I need to group this data by week and take the max value of col2 in each week. I did this with:
df = df.groupby(df.index.to_period("w")).agg({'col2':'max'})
Result:
col2
day
2019-01-14/2019-01-20 11
2019-01-21/2019-01-27 1
2019-02-18/2019-02-24 5
Question:
How can I get the date on which each group's max value occurred?
Expected result:
col2 day
day
2019-01-14/2019-01-20 11 2019-01-18
2019-01-21/2019-01-27 1 2019-01-25
2019-02-18/2019-02-24 5 2019-02-24
Thanks for Your time and effort.
Use idxmax alongside max in GroupBy.agg: select the column after groupby and pass (name, function) tuples:
df1 = df.groupby(df.index.to_period("w"))['col2'].agg([('col2','max'), ('day','idxmax')])
print (df1)
col2 day
day
2019-01-14/2019-01-20 11 2019-01-18
2019-01-21/2019-01-27 1 2019-01-25
2019-02-18/2019-02-24 5 2019-02-24
Pandas 0.25+ solution:
df.groupby(df.index.to_period("w")).agg(col2=pd.NamedAgg(column='col2', aggfunc='max'),
day=pd.NamedAgg(column='col2', aggfunc='idxmax'))
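For reference, here is a self-contained sketch of the named-aggregation variant, using the (column, function) tuple shorthand that pandas 0.25+ accepts in place of pd.NamedAgg. It rebuilds only the col2 column from the question, and the names weeks and out are illustrative.

```python
import pandas as pd

# Only col2 from the question's data, indexed by day
df = pd.DataFrame(
    {'col2': [10, 11, 1, 1, 10, 1, 2, 5]},
    index=pd.to_datetime([
        '2019-01-18', '2019-01-18', '2019-01-18', '2019-01-19',
        '2019-01-19', '2019-01-25', '2019-02-19', '2019-02-24',
    ]),
)

# Weekly periods (Monday to Sunday) used as the grouping key
weeks = df.index.to_period('W')

# max gives the weekly maximum; idxmax gives the index label (the day) where it occurs
out = df.groupby(weeks).agg(
    col2=('col2', 'max'),
    day=('col2', 'idxmax'),
)
print(out)
```

If the maximum occurs more than once within a week, idxmax reports the first occurrence.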
I have two DataFrames in pandas. One of them has data every month, the other one has data every year. I need to do some computation where the yearly value is added to the monthly value.
Something like this:
df1, monthly:
2013-01-01 1
2013-02-01 1
...
2014-01-01 1
2014-02-01 1
...
2015-01-01 1
df2, yearly:
2013-01-01 1
2014-01-01 2
2015-01-01 3
And I want to produce something like this:
2013-01-01 (1+1) = 2
2013-02-01 (1+1) = 2
...
2014-01-01 (1+2) = 3
2014-02-01 (1+2) = 3
...
2015-01-01 (1+3) = 4
Where the value of the monthly data is added to the value of the yearly data depending on the year (first value in the parenthesis is the monthly data, second value is the yearly data).
Assuming your "month" column is called date in the Dataframe df, then you can obtain the year by using the dt member:
pd.to_datetime(df.date).dt.year
Add a column like that to your month DataFrame, and call it year.
Now do the same to the year DataFrame.
Do a merge on the month and year DataFrames, specifying how='left'.
In the resulting DataFrame, you will have both amount columns (suffixed _x and _y by default). Now just add them.
Example
import pandas as pd
month_df = pd.DataFrame({
'date': ['2013-01-01', '2013-02-01', '2014-02-01'],
'amount': [1, 2, 3]})
year_df = pd.DataFrame({
'date': ['2013-01-01', '2014-02-01', '2015-01-01'],
'amount': [7, 8, 9]})
month_df['year'] = pd.to_datetime(month_df.date).dt.year
year_df['year'] = pd.to_datetime(year_df.date).dt.year
>>> pd.merge(
month_df,
year_df,
left_on='year',
right_on='year',
how='left')
amount_x date_x year amount_y date_y
0 1 2013-01-01 2013 7 2013-01-01
1 2 2013-02-01 2013 7 2013-01-01
2 3 2014-02-01 2014 8 2014-02-01
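The final step (adding the two amount columns) is not shown above, so here is one way it might look. This is a sketch using the same sample frames; the suffixes '_month' and '_year' and the column name total are choices made here for readability, not part of the original answer.

```python
import pandas as pd

month_df = pd.DataFrame({
    'date': ['2013-01-01', '2013-02-01', '2014-02-01'],
    'amount': [1, 2, 3]})
year_df = pd.DataFrame({
    'date': ['2013-01-01', '2014-02-01', '2015-01-01'],
    'amount': [7, 8, 9]})

# Derive the join key from each date column
month_df['year'] = pd.to_datetime(month_df['date']).dt.year
year_df['year'] = pd.to_datetime(year_df['date']).dt.year

# Left join keeps every monthly row; only the yearly amount is brought across
merged = pd.merge(
    month_df,
    year_df[['year', 'amount']],
    on='year',
    how='left',
    suffixes=('_month', '_year'),
)

# Monthly value plus the value of the matching year
merged['total'] = merged['amount_month'] + merged['amount_year']
print(merged[['date', 'amount_month', 'amount_year', 'total']])
```

With how='left', a month whose year is missing from year_df would get NaN in amount_year and therefore in total, so that case may need a fillna depending on what the computation should do.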