Week number from a given date in pandas

I have a data frame with two columns, Date and value.
I want to add a new column named week_number that gives how many weeks back each date is from a given date.
import pandas as pd
df = pd.DataFrame(columns=['Date','value'])
df['Date'] = [ '04-02-2019','03-02-2019','28-01-2019','20-01-2019']
df['value'] = [10,20,30,40]
df
Date value
0 04-02-2019 10
1 03-02-2019 20
2 28-01-2019 30
3 20-01-2019 40
Suppose the given date is 05-02-2019.
Then week_number should say how many weeks before the given date each value in the Date column falls.
The output should be
Date value week_number
0 04-02-2019 10 1
1 03-02-2019 20 1
2 28-01-2019 30 2
3 20-01-2019 40 3
How can I do this in pandas?

First convert the column to datetimes with to_datetime and dayfirst=True, then subtract it from the given date with rsub (a right-side subtraction), convert the timedeltas to days, floor-divide by 7 and add 1:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df['week_number'] = df['Date'].rsub(pd.Timestamp('2019-02-05')).dt.days // 7 + 1
#alternative
#df['week_number'] = (pd.Timestamp('2019-02-05') - df['Date']).dt.days // 7 + 1
print (df)
Date value week_number
0 2019-02-04 10 1
1 2019-02-03 20 1
2 2019-01-28 30 2
3 2019-01-20 40 3
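To make the arithmetic explicit, here is a minimal self-contained sketch of the same floor-division idea. The helper name weeks_back is only for illustration and is not part of pandas; it assumes every date is on or before the reference date.

```python
import pandas as pd


def weeks_back(dates: pd.Series, reference: pd.Timestamp) -> pd.Series:
    """Return 1 for dates 0-6 days before `reference`, 2 for 7-13 days, and so on."""
    elapsed_days = (reference - dates).dt.days
    # Floor division turns elapsed days into whole elapsed weeks; +1 makes it 1-based.
    return elapsed_days // 7 + 1


dates = pd.to_datetime(
    pd.Series(["04-02-2019", "03-02-2019", "28-01-2019", "20-01-2019"]),
    format="%d-%m-%Y",
)
week_number = weeks_back(dates, pd.Timestamp("2019-02-05"))
print(week_number.tolist())  # [1, 1, 2, 3]
```

Passing an explicit format instead of dayfirst=True is a design choice here: it fails loudly if a date does not match day-month-year.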

Related

Write text in a column based on ascending dates. Pandas Python

There are three dates in a df Date column sorted in ascending order. How to write text 'Short' for nearest date, 'Mid' for next date, 'Long' for the farthest date in a new column adjacent to the Date column ? i.e. 2021-04-23 = Short, 2021-05-11 = Mid and 2021-10-08 = Long.
data = {"product_name":["Keyboard","Mouse", "Monitor", "CPU","CPU", "Speakers"],
"Unit_Price":[500,200, 5000.235, 10000.550, 10000.550, 250.50],
"No_Of_Units":[5,5, 10, 20, 20, 8],
"Available_Quantity":[5,6,10,1,3,2],
"Date":['11-05-2021', '23-04-2021', '08-10-2021','23-04-2021', '08-10-2021','11-05-2021']
}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'], format = '%d-%m-%Y')
df = df.sort_values(by='Date')
Convert the column with to_datetime, rank the dates with method='dense' so that equal dates share a rank, then map the ranks to your labels in the desired order:
df['New'] = (pd.to_datetime(df['Date']).rank(method='dense')
.map(dict(enumerate(['Short', 'Mid', 'Long'], start=1)))
)
Output:
product_name Unit_Price No_Of_Units Available_Quantity Date New
1 Mouse 200.000 5 6 2021-04-23 Short
3 CPU 10000.550 20 1 2021-04-23 Short
0 Keyboard 500.000 5 5 2021-05-11 Mid
5 Speakers 250.500 8 2 2021-05-11 Mid
2 Monitor 5000.235 10 10 2021-10-08 Long
4 CPU 10000.550 20 3 2021-10-08 Long
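A smaller sketch of what the dense rank does, on a shortened copy of the Date column (the variable names are mine). It assumes there are exactly three distinct dates; a fourth distinct date would get rank 4 and map to NaN.

```python
import pandas as pd

dates = pd.to_datetime(
    pd.Series(["11-05-2021", "23-04-2021", "08-10-2021", "23-04-2021"]),
    format="%d-%m-%Y",
)

# Dense ranking gives 1 to the earliest date, 2 to the next distinct date, ...
# with no gaps, and repeated dates share the same rank.
ranks = dates.rank(method="dense").astype(int)

# Translate each rank into its label.
labels = ranks.map({1: "Short", 2: "Mid", 3: "Long"})

print(ranks.tolist())   # [2, 1, 3, 1]
print(labels.tolist())  # ['Mid', 'Short', 'Long', 'Short']
```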

Leave the first TWO dates for each id

I have a dataframe of id number and dates:
import pandas as pd
df = pd.DataFrame([['1','01/01/2000'], ['1','01/07/2002'],['1', '04/05/2003'],
['2','01/05/2010'], ['2','08/08/2009'],
['3','12/11/2008']], columns=['id','start_date'])
df
id start_date
0 1 01/01/2000
1 1 01/07/2002
2 1 04/05/2003
3 2 01/05/2010
4 2 08/08/2009
5 3 12/11/2008
I am looking for a way to keep only the first TWO dates for each id (i.e. the two earliest dates).
For the example above the output would be:
id start_date
0 1 01/01/2000
1 1 01/07/2002
2 2 08/08/2009
3 2 01/05/2010
4 3 12/11/2008
Thanks!
Ensure the column holds timestamps:
df['start_date'] = pd.to_datetime(df['start_date'])
Sort the values:
df = df.sort_values(by=['id','start_date'])
Group and keep only the first 2 rows per id:
df_ = df.groupby('id').head(2)
Just group by id and then you can call head. Be sure to sort your values first.
df = df.sort_values(['id', 'start_date'])
df.groupby('id').head(2)
full code:
df = pd.DataFrame([['1','01/01/2000'], ['1','01/07/2002'],['1', '04/05/2003'],
['2','01/05/2010'], ['2','08/08/2009'],
['3','12/11/2008']], columns=['id','start_date'])
# 1. convert the 'start_date' column to datetime
df['start_date'] = pd.to_datetime(df['start_date'])
# 2. sort the dataframe ascending by 'start_date'
df.sort_values(by='start_date', ascending=True, inplace=True)
# 3. select only the first two occurrences of each id
df.groupby('id').head(2)
output:
id start_date
0 1 2000-01-01
1 1 2002-01-07
5 3 2008-12-11
4 2 2009-08-08
3 2 2010-01-05
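If you want the rows grouped by id as in the question's expected output, a sketch along these lines should do it. It assumes the strings are month/day/year (which matches the parsed output above) and the name earliest_two is only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [["1", "01/01/2000"], ["1", "01/07/2002"], ["1", "04/05/2003"],
     ["2", "01/05/2010"], ["2", "08/08/2009"], ["3", "12/11/2008"]],
    columns=["id", "start_date"],
)

# Assumption: month/day/year. Use "%d/%m/%Y" instead if the data is day-first.
df["start_date"] = pd.to_datetime(df["start_date"], format="%m/%d/%Y")

earliest_two = (
    df.sort_values(["id", "start_date"])  # earliest dates first within each id
      .groupby("id")
      .head(2)                            # keep at most two rows per id
      .reset_index(drop=True)
)
print(earliest_two)
```

Sorting by both id and start_date before head(2) keeps each id's rows together; sorting by start_date alone selects the same rows but orders them by date.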

pandas add datetime column [duplicate]

I have a dataframe where year, month, day, hour is split into separate columns:
df.head()
Year Month Day Hour
0 2020 1 1 0
1 2020 1 1 1
2 2020 1 1 2
...
I'd like to add a proper datetime column to the dataframe so I end up with something along these lines:
df.head()
Year Month Day Hour datetime
0 2020 1 1 0 2020-01-01T00:00
1 2020 1 1 1 2020-01-01T01:00
2 2020 1 1 2 2020-01-01T02:00
...
I could add a loop that processes one row at a time, but that's not panda-esque.
Here are three things that don't work (not that I expected any of them to do so):
df['datetime'] = pd.to_datetime(datetime.datetime(df['Year'], df['Month'], df['Day'], df['Hour']))
df['datetime'] = pd.to_datetime(df['Year'], df['Month'], df['Day'], df['Hour'])
df['datetime'] = pd.datetime(df['Year'], df['Month'], df['Day'], df['Hour'])
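One approach that should work without a loop: pd.to_datetime can assemble datetimes from a DataFrame whose columns are named year, month, day (and optionally hour, minute, ...). A minimal sketch, rebuilding the three sample rows from the question; the lowercase rename is there only to make the column names match explicitly.

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [2020, 2020, 2020],
    "Month": [1, 1, 1],
    "Day": [1, 1, 1],
    "Hour": [0, 1, 2],
})

# to_datetime accepts a DataFrame of date components and builds one
# timestamp per row from the year/month/day/hour columns.
parts = df[["Year", "Month", "Day", "Hour"]].rename(columns=str.lower)
df["datetime"] = pd.to_datetime(parts)

print(df)
```

The result is a regular datetime64 column, so the .dt accessor and resampling work on it afterwards.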

Converting daily data to monthly and get months last value in pandas

I have daily data, where each row belongs to a specific month and year.
I want to convert the daily data to monthly data and use the last value of each month as that month's return value.
For example:
AccountId, Date, Return
1 2016-01 -4.1999 (Because this return value is last value of january 1/29/16)
1 2016-02 0.19 (Same here last value of february 2/29/16)
and so on
I've looked at some topics about converting daily data to monthly data, but they all take the mean() or sum() of the month as the return value. I want the last return value of the month instead.
You can groupby AccountId and the Year-Month. Convert to datetime first and then format as Year-Month as follows: df['Date'].dt.strftime('%Y-%m'). Then just use last():
df['Date'] = pd.to_datetime(df['Date'])
df = df.groupby(['AccountId', df['Date'].dt.strftime('%Y-%m')])['Return'].last().reset_index()
df
Sample data:
In[1]:
AccountId Date Return
0 1 1/7/16 15
1 1 1/29/16 10
2 1 2/1/16 25
3 1 2/15/16 20
4 1 2/28/16 30
df['Date'] = pd.to_datetime(df['Date'])
df = df.groupby(['AccountId', df['Date'].dt.strftime('%Y-%m')])['Return'].last().reset_index()
df
Out[1]:
AccountId Date Return
0 1 2016-01 10
1 1 2016-02 30
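Note that last() returns the last row in the current row order, not the latest date, so this only works as intended when the rows are in date order. A sketch that sorts first, using the same sample values deliberately shuffled; the helper column name Month is my own choice.

```python
import pandas as pd

df = pd.DataFrame({
    "AccountId": [1, 1, 1, 1, 1],
    "Date": ["2/28/16", "1/7/16", "2/1/16", "1/29/16", "2/15/16"],  # shuffled on purpose
    "Return": [30, 15, 25, 10, 20],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")
df["Month"] = df["Date"].dt.strftime("%Y-%m")

monthly = (
    df.sort_values(["AccountId", "Date"])   # make "last" mean "latest date"
      .groupby(["AccountId", "Month"])["Return"]
      .last()
      .reset_index()
)
print(monthly)
```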

Selecting all the previous 6 months data records from occurrence of a particular value in a column in pandas

I want to select all the previous 6 months records for a customer whenever a particular transaction is done by the customer.
Data looks like:
Cust_ID Transaction_Date Amount Description
1 08/01/2017 12 Moved
1 03/01/2017 15 X
1 01/01/2017 8 Y
2 10/01/2018 6 Moved
2 02/01/2018 12 Z
Here, I want to find the rows where Description is "Moved" and then select all records from the 6 months before that transaction, for every Cust_ID.
Output should look like:
Cust_ID Transaction_Date Amount Description
1 08/01/2017 12 Moved
1 03/01/2017 15 X
2 10/01/2018 6 Moved
I want to do this in python. Please help.
The idea is to build a Series of the Moved dates shifted back by a 6-month offset, then keep the rows whose Transaction_Date is later than the shifted date for their customer.
EDIT: Handle every Moved row of each customer:
df['Transaction_Date'] = pd.to_datetime(df['Transaction_Date'])
df = df.sort_values(['Cust_ID','Transaction_Date'])
df['g'] = df['Description'].iloc[::-1].eq('Moved').cumsum()
s = (df[df['Description'].eq('Moved')]
.set_index(['Cust_ID','g'])['Transaction_Date'] - pd.DateOffset(months=6))
mask = df.join(s.rename('a'), on=['Cust_ID','g'])['a'] < df['Transaction_Date']
df1 = df[mask].drop('g', axis=1)
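For comparison, here is a self-contained sketch of the same filter written with a merge, run on the sample data from the question. It assumes month/day/year dates and, unlike the code above, also adds an upper bound so that rows after the Moved date are excluded; the names moves, pairs and selected are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Cust_ID": [1, 1, 1, 2, 2],
    "Transaction_Date": ["08/01/2017", "03/01/2017", "01/01/2017",
                         "10/01/2018", "02/01/2018"],
    "Amount": [12, 15, 8, 6, 12],
    "Description": ["Moved", "X", "Y", "Moved", "Z"],
})
df["Transaction_Date"] = pd.to_datetime(df["Transaction_Date"], format="%m/%d/%Y")

# One row per "Moved" transaction: the customer and the date of the move.
moves = (
    df.loc[df["Description"].eq("Moved"), ["Cust_ID", "Transaction_Date"]]
      .rename(columns={"Transaction_Date": "Moved_Date"})
)

# Pair every transaction with every move of the same customer,
# keeping the original row label in the "index" column.
pairs = df.reset_index().merge(moves, on="Cust_ID")

# Assumption: the window is (Moved_Date - 6 months, Moved_Date].
in_window = (
    (pairs["Transaction_Date"] > pairs["Moved_Date"] - pd.DateOffset(months=6))
    & (pairs["Transaction_Date"] <= pairs["Moved_Date"])
)

selected = df.loc[sorted(pairs.loc[in_window, "index"].unique())]
print(selected)
```

The merge creates one row per transaction and move pair, so it can get large for customers with many Moved rows; the join on a group key shown above avoids that.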
EDIT1: Use only the earliest Moved date of each customer; any later Moved rows of that customer are removed:
print (df)
Cust_ID Transaction_Date Amount Description
0 1 10/01/2017 12 X
1 1 01/23/2017 15 Moved
2 1 03/01/2017 8 Y
3 1 08/08/2017 12 Moved
4 2 10/01/2018 6 Moved
5 2 02/01/2018 12 Z
#convert to datetimes
df['Transaction_Date'] = pd.to_datetime(df['Transaction_Date'])
#mask that selects the Moved rows
mask = df['Description'].eq('Moved')
#filter these rows and sort them
df1 = df[mask].sort_values(['Cust_ID','Transaction_Date'])
print (df1)
Cust_ID Transaction_Date Amount Description
1 1 2017-01-23 15 Moved
3 1 2017-08-08 12 Moved
4 2 2018-10-01 6 Moved
#flag every Moved row except the first one per customer
mask = df1.duplicated('Cust_ID')
#create Series for map: first Moved date per customer minus 6 months
s = df1[~mask].set_index('Cust_ID')['Transaction_Date'] - pd.DateOffset(months=6)
print (s)
Cust_ID
1 2016-07-23
2 2018-04-01
Name: Transaction_Date, dtype: datetime64[ns]
#mask that drops the other Moved rows (keeps only the first per customer)
m2 = ~mask.reindex(df.index, fill_value=False)
df1 = df[(df['Cust_ID'].map(s) < df['Transaction_Date']) & m2]
print (df1)
Cust_ID Transaction_Date Amount Description
0 1 2017-10-01 12 X
1 1 2017-01-23 15 Moved
2 1 2017-03-01 8 Y
4 2 2018-10-01 6 Moved
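EDIT2: Keep the rows from the last Moved of each customer through the following 6 months:
#df1 was overwritten above, so rebuild the sorted Moved rows first
df1 = df[df['Description'].eq('Moved')].sort_values(['Cust_ID','Transaction_Date'])
#flag every Moved row except the last one per customer
mask = df1.duplicated('Cust_ID', keep='last')
#create Series for map
s = df1[~mask].set_index('Cust_ID')['Transaction_Date']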
print (s)
Cust_ID
1 2017-08-08
2 2018-10-01
Name: Transaction_Date, dtype: datetime64[ns]
m2 = ~mask.reindex(df.index, fill_value=False)
#keep rows between the last Moved date and 6 months after it
df3 = df[df['Transaction_Date'].between(df['Cust_ID'].map(s), df['Cust_ID'].map(s + pd.DateOffset(months=6))) & m2]
print (df3)
Cust_ID Transaction_Date Amount Description
3 1 2017-08-08 12 Moved
0 1 2017-10-01 12 X
4 2 2018-10-01 6 Moved