Quickly fill cells with datetime based on column name in pandas? - pandas

I need to convert my cumbersome column headers into a datetime for every cell in that column. For example, I need the datetime "2001-10-06 6:00" from the column header 20011006_6_blah_blah_blah. I have a column of other datetimes that I will eventually be using to do some calculations.
Construction of an example df:
import datetime
import numpy as np
import pandas as pd
date_rng0=pd.date_range(start=datetime.date(2001,10,1),end=datetime.date(2001,10,7),freq='D')
date_rng1=pd.date_range(start=datetime.date(2001,10,5),end=datetime.date(2001,10,8),freq='D')
drstr0=[str(i.year)+str(i.month)+str(i.day)+'_blah' for i in date_rng0]
drstr1=[str(i.year)+str(i.month)+str(i.day)+'_blah' for i in date_rng1]
#make zero df
arr=np.zeros((len(date_rng0),len(date_rng1))) # placeholder zeros; every cell is overwritten below
df=pd.DataFrame(arr,index=drstr0,columns=drstr1)
First I copy all the column names into the cells, column by column. This is very slow with my data:
for c in df.columns:
    df[c]=c
Then I convert them to datetime using an atrocious looking lambda mess:
for c in df.columns:
    df.loc[:,c]=df.loc[:,c].apply(lambda x: datetime.date(int(x.split('_')[0][:4]),int(x.split('_')[0][4:6]),int(x.split('_')[0][6:])))
Then I make a datetime column using a similar lambda function:
df['date_time']=df.index
df['date_time']=df.loc[:,'date_time'].apply(lambda x: datetime.date(int(x.split('_')[0][:4]),int(x.split('_')[0][4:6]),int(x.split('_')[0][6:])))
df.head()
gives:
              2001105_blah  2001106_blah  2001107_blah  2001108_blah   date_time
2001101_blah    2001-10-05    2001-10-06    2001-10-07    2001-10-08  2001-10-01
2001102_blah    2001-10-05    2001-10-06    2001-10-07    2001-10-08  2001-10-02
2001103_blah    2001-10-05    2001-10-06    2001-10-07    2001-10-08  2001-10-03
2001104_blah    2001-10-05    2001-10-06    2001-10-07    2001-10-08  2001-10-04
2001105_blah    2001-10-05    2001-10-06    2001-10-07    2001-10-08  2001-10-05
Then I can do a little math:
ndf=df.copy()
for c in df.columns:
    ndf.loc[:,c]=df.loc[:,c]-df.loc[:,'date_time']
Which gives what I am ultimately after:
              2001105_blah  2001106_blah  2001107_blah  2001108_blah  date_time
2001101_blah        4 days        5 days        6 days        7 days     0 days
2001102_blah        3 days        4 days        5 days        6 days     0 days
2001103_blah        2 days        3 days        4 days        5 days     0 days
2001104_blah        1 days        2 days        3 days        4 days     0 days
2001105_blah        0 days        1 days        2 days        3 days     0 days
The problem is, this process has never completed using my 2,000 x 30,000 dataframe despite walking away for 30 min. I feel like I am doing something wrong. Any suggestions to improve the efficiency?

You can try a combination of str.split, ' '.join, and pd.to_datetime:
#add new column with values as the column names joined into a string
df['temp']=(' '.join(df.columns.astype(str)))
#expand the dataframe
temp=df['temp'].str.split(expand=True)
#rename the columns with original names
temp.columns=df.columns[:-1]
#parse the index to datetime
index=pd.to_datetime(df.index.str.split('_').str[0],format='%Y%m%d').to_numpy()
#subtract the index from each column
newdf=temp.apply(lambda x: pd.to_datetime(x.str.split('_').str[0],format='%Y%m%d')-index)
#keep only the rows where all values are non-negative
newdf=newdf[newdf.apply(lambda x: x >= pd.Timedelta(0)).all(1)]
Output:
print(newdf)
              2001105_blah  2001106_blah  2001107_blah  2001108_blah
2001101_blah        4 days        5 days        6 days        7 days
2001102_blah        3 days        4 days        5 days        6 days
2001103_blah        2 days        3 days        4 days        5 days
2001104_blah        1 days        2 days        3 days        4 days
2001105_blah        0 days        1 days        2 days        3 days
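If building and splitting the joined string turns out to be a bottleneck on a 2,000 x 30,000 frame, an alternative sketch (not the method above) is to parse the row labels and the column labels once each and let NumPy broadcasting do the subtraction. It assumes zero-padded labels such as 20011005_blah, in line with the 20011006_6_blah_blah_blah header mentioned in the question; the names rows, cols and result are illustrative only.

```python
import pandas as pd

# Illustrative labels; zero-padded dates are an assumption of this sketch.
rows = [d.strftime("%Y%m%d") + "_blah" for d in pd.date_range("2001-10-01", "2001-10-07")]
cols = [d.strftime("%Y%m%d") + "_blah" for d in pd.date_range("2001-10-05", "2001-10-08")]

# Parse each label only once: take the part before the first underscore.
row_dates = pd.to_datetime(pd.Index(rows).str.split("_").str[0], format="%Y%m%d")
col_dates = pd.to_datetime(pd.Index(cols).str.split("_").str[0], format="%Y%m%d")

# Broadcast a (1, n_cols) array against an (n_rows, 1) array:
# every cell becomes column date minus row date.
diff = col_dates.to_numpy()[None, :] - row_dates.to_numpy()[:, None]
result = pd.DataFrame(diff, index=rows, columns=cols)
print(result.head())
```

This only parses len(rows) + len(cols) strings instead of one string per cell, which is where most of the time in the loop-based version goes.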

Related

How to calculate time difference in minutes from 2 data frame columns in Pandas

I have two pandas DataFrame columns in datetime format and want to get the time difference between them in minutes. With my code, the output (of type "timedelta64[ns]") is shown in days, hours, etc. How can I get it in minutes? Thank you.
df['TIME_TO_REPORT']= df['TEST_TIME'] - df['RECEIPT_DATE_TIME']
Output
0 0 days 05:58:00
1 0 days 03:46:00
2 0 days 05:25:00
3 0 days 05:24:00
4 0 days 05:24:00
Use total_seconds to get the duration in seconds, then divide by 60 to convert it to minutes:
df['TIME_TO_REPORT']= (df['TEST_TIME'] - df['RECEIPT_DATE_TIME']).dt.total_seconds() / 60
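A self-contained sketch of the same idea. The timestamps are made up for illustration and chosen so that the differences match the first two rows of the output above (05:58 and 03:46):

```python
import pandas as pd

# Illustrative data; only the column names come from the question.
df = pd.DataFrame({
    "RECEIPT_DATE_TIME": pd.to_datetime(["2021-01-01 05:00", "2021-01-01 05:00"]),
    "TEST_TIME": pd.to_datetime(["2021-01-01 10:58", "2021-01-01 08:46"]),
})

# Subtracting two datetime columns gives timedeltas;
# .dt.total_seconds() turns them into floats, and / 60 gives minutes.
df["TIME_TO_REPORT"] = (df["TEST_TIME"] - df["RECEIPT_DATE_TIME"]).dt.total_seconds() / 60
print(df["TIME_TO_REPORT"])
```

The result is a float column, so fractional minutes are preserved.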

pandas add datetime column [duplicate]

I have a dataframe where year, month, day, hour is split into separate columns:
df.head()
Year Month Day Hour
0 2020 1 1 0
1 2020 1 1 1
2 2020 1 1 2
...
I'd like to add a proper datetime column to the dataframe so I end up with something along these lines:
df.head()
Year Month Day Hour datetime
0 2020 1 1 0 2020-01-01T00:00
1 2020 1 1 1 2020-01-01T01:00
2 2020 1 1 2 2020-01-01T02:00
...
I could add a loop that processes one row at a time, but that's not panda-esque.
Here are three things that don't work (not that I expected any of them to do so):
df['datetime'] = pd.to_datetime(datetime.datetime(df['Year'], df['Month'], df['Day'], df['Hour']))
df['datetime'] = pd.to_datetime(df['Year'], df['Month'], df['Day'], df['Hour'])
df['datetime'] = pd.datetime(df['Year'], df['Month'], df['Day'], df['Hour'])
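One possible approach, sketched under the assumption that the four columns hold integers: pd.to_datetime accepts a DataFrame whose columns are named year, month, day (and optionally hour), so lowercasing the column names and passing that frame assembles the timestamps without a Python loop. The name parts is illustrative.

```python
import pandas as pd

# Sample frame mirroring the layout in the question.
df = pd.DataFrame({
    "Year": [2020, 2020, 2020],
    "Month": [1, 1, 1],
    "Day": [1, 1, 1],
    "Hour": [0, 1, 2],
})

# to_datetime assembles timestamps from columns named year/month/day/hour,
# so rename to lowercase first to match those names.
parts = df[["Year", "Month", "Day", "Hour"]].rename(columns=str.lower)
df["datetime"] = pd.to_datetime(parts)
print(df)
```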

week number from given date in pandas

I have a data frame with two columns, Date and value.
I want to add a new column named week_number that gives how many weeks back each date is from a given date.
import pandas as pd
df = pd.DataFrame(columns=['Date','value'])
df['Date'] = [ '04-02-2019','03-02-2019','28-01-2019','20-01-2019']
df['value'] = [10,20,30,40]
df
Date value
0 04-02-2019 10
1 03-02-2019 20
2 28-01-2019 30
3 20-01-2019 40
Suppose the given date is 05-02-2019.
Then I need to add a column week_number that says how many weeks back each value in the Date column is from the given date.
The output should be:
Date value week_number
0 04-02-2019 10 1
1 03-02-2019 20 1
2 28-01-2019 30 2
3 20-01-2019 40 3
How can I do this in pandas?
First convert the column to datetimes with to_datetime and dayfirst=True, then subtract it from the given date with rsub, convert the timedeltas to days, floor-divide by 7 and add 1:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df['week_number'] = df['Date'].rsub(pd.Timestamp('2019-02-05')).dt.days // 7 + 1
#alternative
#df['week_number'] = (pd.Timestamp('2019-02-05') - df['Date']).dt.days // 7 + 1
print (df)
Date value week_number
0 2019-02-04 10 1
1 2019-02-03 20 1
2 2019-01-28 30 2
3 2019-01-20 40 3

pandas: get third business day of month

I have a date range
pd.bdate_range("2001-01-01", "2018-01-01")
and want to find the third business day of each month (ignoring holidays for now). How do I do that?
As you are already working with business dates, you could resample to the start of each business month ('BMS') and add an offset of 3 business days:
>>> pd.Series(index=pd.bdate_range("2001-01-01", "2018-01-01"),
...           dtype="float64").resample('BMS').asfreq().index + pd.offsets.BDay(3)
DatetimeIndex(['2001-01-04', '2001-02-06', '2001-03-06', '2001-04-05',
'2001-05-04', '2001-06-06', '2001-07-05', '2001-08-06',
'2001-09-06', '2001-10-04',
...
'2017-04-06', '2017-05-04', '2017-06-06', '2017-07-06',
'2017-08-04', '2017-09-06', '2017-10-05', '2017-11-06',
'2017-12-06', '2018-01-04'],
dtype='datetime64[ns]', length=205, freq=None)
You'll find further details on how to work with dates in pandas in the documentation.
Plan: groupby year, month. Choose third with nth().
This example will be easier with a series:
dates = pd.Series(pd.bdate_range("2001-01-01", "2018-01-01"))
dates.groupby([dates.dt.year, dates.dt.month]).nth(3)
Partial output:
2001  1    2001-01-04
      2    2001-02-06
      3    2001-03-06
      4    2001-04-05
      5    2001-05-04
      6    2001-06-06
      7    2001-07-05
      8    2001-08-06
      9    2001-09-06
      10   2001-10-04
      11   2001-11-06
      12   2001-12-06
2002  1    2002-01-04
      2    2002-02-06
      3    2002-03-06
      4    2002-04-04
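One detail worth checking: nth() counts positions from zero, and 2001-01-01 was a Monday, so the 2001-01-04 shown in both outputs above is the fourth business day of that month rather than the third. A minimal sketch that picks the third instead, assuming a zero-based position of 2 is what is wanted. The names pos and third are illustrative, and cumcount() is swapped in for nth() so the result does not depend on how a given pandas version indexes the output of nth():

```python
import pandas as pd

dates = pd.Series(pd.bdate_range("2001-01-01", "2018-01-01"))

# Zero-based position of each business day within its (year, month) group.
pos = dates.groupby([dates.dt.year, dates.dt.month]).cumcount()

# Position 2 is the third business day of the month.
third = dates[pos == 2].reset_index(drop=True)
print(third.head())
```

Months that contain fewer than three dates in the range (here January 2018, which only contributes 2018-01-01) simply produce no row.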

Tableau: How to get moving average with respect to day of week in last 4 weeks?

e.g: If I have the data as below:
Week 1         Week 2         Week 3
S M T W T F S  S M T W T F S  S M T W T F S
2 5 6 7 5 5 3  4 5 7 2 4 3 2  4 5 2 1 2 7 8
If today is Monday, my average will be (5+5+5)/3 which is 5. Tomorrow it will be (6+7+2)/3 which will be 5 again and day after it will be (7+2+1)/3 which will be 3.33
How to get this in Tableau?
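For reference, the arithmetic being asked for can be checked outside Tableau. This is only a sketch in pandas of the expected numbers, not a Tableau solution; the three-letter weekday labels and the names df and avg are illustrative:

```python
import pandas as pd

# The three weeks from the example, Sunday through Saturday.
values = [2, 5, 6, 7, 5, 5, 3,
          4, 5, 7, 2, 4, 3, 2,
          4, 5, 2, 1, 2, 7, 8]
weekdays = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"] * 3

df = pd.DataFrame({"weekday": weekdays, "value": values})

# Average of each weekday across the available weeks.
avg = df.groupby("weekday")["value"].mean()
print(avg[["Mon", "Tue", "Wed"]])
```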
First, you can use "Weekday" as a column or row (by right-clicking on the date).
Then you can simply add a "Moving Average" table calculation with the specific computing dimension "Week of [Date]".
=> Table Calculation Specifics <=
=> Result <=
Data source used: Tableau Sample Superstore.
You can do the following:
Columns: Week(Order Date)
Rows: Weekday(Order Date)
Put Sales in Text.
Right-click Sales > Quick Table Calculation > Moving Average
Right-click Sales > Edit Quick Table Calculation
Set the following:
Moving along: "Table across"
Previous values: 4