Date serial number and date need to convert in date format - pandas

When I read a Google spreadsheet into a DataFrame, I get the data in the format below:
42836
42837
42838
42844
42845
42846
42849
42850
42851
2/1/2018
2/2/2018
But I need to convert all of them to dates.

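IIUC, set up the origin date and use np.where. Based on my experience, the origin in Excel is December 30, 1899 (Google Sheets uses the same origin). astype(str) keeps .str working even if some serials were read in as numbers.
import numpy as np
import pandas as pd
# serial numbers -> dates counted from the Excel origin; text dates become NaT
s1=pd.to_datetime(pd.to_numeric(df.date,errors='coerce'),errors='coerce',origin='1899-12-30',unit='D')
s2=pd.to_datetime(df.date,errors='coerce')
df['new']=np.where(df.date.astype(str).str.contains('/'),s2,s1)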
df
Out[282]:
date new
0 42837 2017-04-12
1 42838 2017-04-13
2 42844 2017-04-19
3 42845 2017-04-20
4 42846 2017-04-21
5 42849 2017-04-24
6 42850 2017-04-25
7 42851 2017-04-26
8 2/1/2018 2018-02-01
9 2/2/2018 2018-02-02
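A self-contained sketch of the same idea, assuming the column arrives as strings and that the text dates are month/day/year (as 2/1/2018 -> 2018-02-01 above suggests). Instead of np.where it parses each kind separately and merges them with fillna; the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: Sheets serial numbers mixed with m/d/Y text dates
df = pd.DataFrame({"date": ["42836", "42837", "2/1/2018", "2/2/2018"]})

# Serial numbers: non-numeric text becomes NaN, then count days from 1899-12-30
serial = pd.to_numeric(df["date"], errors="coerce")
from_serial = pd.to_datetime(serial, unit="D", origin="1899-12-30")

# Text dates: parse only the rows containing '/', assuming month/day/year order
is_text = df["date"].astype(str).str.contains("/", regex=False)
from_text = pd.to_datetime(df["date"].where(is_text), format="%m/%d/%Y")

# Every row is filled by exactly one of the two parses
df["new"] = from_serial.fillna(from_text)
print(df)
```

Passing an explicit format avoids pandas guessing day-first versus month-first for the text dates.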

Use datetime with timedelta.
The base date for Excel/Google Sheets serial numbers is 30.12.1899 (with 1.1.1900 every date comes out two days late); add the serial number to it as a timedelta.
The loop below just converts the first three of your dates.
If you need a different format, call strftime on the result, e.g. .strftime("%Y-%m-%d %H:%M:%S").
import datetime as dt
date = dt.datetime(1899,12,30)
dates = [42836, 42837, 42838]
for aDay in dates:
    print(date+dt.timedelta(days=aDay))
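A pure-Python sketch that handles both kinds of values in the list, counting serials from 1899-12-30 (the same origin the first answer uses). The helper name `serial_to_datetime` and the constant `SHEETS_EPOCH` are hypothetical, and it assumes any non-numeric text is a month/day/year date.

```python
import datetime as dt

# Hypothetical constant: day 0 of Excel/Google Sheets serial numbers
SHEETS_EPOCH = dt.datetime(1899, 12, 30)

def serial_to_datetime(value):
    """Hypothetical helper: serial number (int, float or digit string) or m/d/Y text -> datetime."""
    try:
        days = float(value)
    except (TypeError, ValueError):
        # Not a number, so assume text like '2/1/2018' (month/day/year)
        return dt.datetime.strptime(str(value), "%m/%d/%Y")
    return SHEETS_EPOCH + dt.timedelta(days=days)

values = [42836, "42837", "2/1/2018"]
converted = [serial_to_datetime(v) for v in values]
for d in converted:
    print(d.strftime("%Y-%m-%d"))
```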

Related

Python: Convert string to datetime, calculate time difference, and select rows with time difference more than 3 days

I have a dataframe that contains two string date columns. First I would like to convert the two columns into datetime and calculate the time difference. Then I would like to select rows with a time difference of more than 3 days.
Sample df:
ID Start End
234 2020-11-16 20:25 2020-11-18 00:10
62 2020-11-02 02:50 2020-11-15 21:56
771 2020-11-17 03:03 2020-11-18 00:10
Desired df:
ID Start End Time difference
62 2020-11-02 02:50:00 2020-11-15 21:56:00 13 days 19:06:00
Current code:
df['End'] = pd.to_datetime(df['End'])
df['Start'] = pd.to_datetime(df['Start'])
df['Time difference'] = df['End'] - df['Start']
How can I select rows that have a time difference of more than 3 days?
Thanks in advance! I appreciate any help on this!!
You're just missing one line: convert to days, then filter:
df[df['Time difference'].dt.days > 3]
ID Start End Time difference
62 2020-11-02 02:50:00 2020-11-15 21:56:00 13 days 19:06:00
# Set ID as the index so that apply converts only the date columns to datetime
df=df.set_index('ID').apply(lambda x: pd.to_datetime(x))
# Calculate the time difference and restore ID as a column
df=df.assign(Timedifference=df['End'].sub(df['Start'])).reset_index()
# Boolean mask selecting the rows you want
df[df['Timedifference'].dt.days.gt(3)]
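A self-contained sketch using the sample data from the question. It also shows one subtlety: `.dt.days` keeps only whole days, so a gap of 3 days 23 hours counts as 3 and is excluded, whereas comparing against `pd.Timedelta(days=3)` keeps anything longer than 72 hours. Both give the same result for this sample.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [234, 62, 771],
    "Start": ["2020-11-16 20:25", "2020-11-02 02:50", "2020-11-17 03:03"],
    "End": ["2020-11-18 00:10", "2020-11-15 21:56", "2020-11-18 00:10"],
})

# Parse both string columns and compute the duration per row
df["Start"] = pd.to_datetime(df["Start"])
df["End"] = pd.to_datetime(df["End"])
df["Time difference"] = df["End"] - df["Start"]

# Whole days only: the fractional part of each duration is dropped
by_whole_days = df[df["Time difference"].dt.days > 3]

# Exact duration: anything strictly longer than 72 hours
by_duration = df[df["Time difference"] > pd.Timedelta(days=3)]
print(by_duration)
```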

Pandas - Find difference based on two subsequent rows of Dataframe

I have a DataFrame that records the date when a customer raised a ticket, in a column labelled date. If the ref_column of the current row is the same as that of the following row for the same cust_id, I need the aging as the difference between the date of the current row and the date of the following row. If the ref_column is not the same, I need the difference between date and ref_date of the same row.
Here is my data:
cust_id,date,ref_column,ref_date
101,15/01/19,abc,31/01/19
101,17/01/19,abc,31/01/19
101,19/01/19,xyz,31/01/19
102,15/01/19,abc,31/01/19
102,21/01/19,klm,31/01/19
102,25/01/19,xyz,31/01/19
103,15/01/19,xyz,31/01/19
Expected output:
cust_id,date,ref_column,ref_date,aging(in days)
101,15/01/19,abc,31/01/19,2
101,17/01/19,abc,31/01/19,14
101,19/01/19,xyz,31/01/19,0
102,15/01/19,abc,31/01/19,16
102,21/01/19,klm,31/01/19,10
102,25/01/19,xyz,31/01/19,0
103,15/01/19,xyz,31/01/19,0
Aging (in days) is 0 for the last entry of a given cust_id.
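Here's my approach:
import numpy as np
import pandas as pd
# convert dates to datetime type (day/month/two-digit year)
# skip this if they already are
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%y')
df['ref_date'] = pd.to_datetime(df['ref_date'], format='%d/%m/%y')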
# customer group
groups = df.groupby('cust_id')
# where ref_column is the same with the next:
same_ = df['ref_column'].eq(groups['ref_column'].shift(-1))
# update these ones
df['aging'] = np.where(same_,
-groups['date'].diff(-1).dt.days, # same ref as next row
df['ref_date'].sub(df['date']).dt.days) # diff ref than next row
# update last elements in groups:
last_idx = groups['date'].idxmax()
df.loc[last_idx, 'aging'] = 0
Output:
cust_id date ref_column ref_date aging
0 101 2019-01-15 abc 2019-01-31 2.0
1 101 2019-01-17 abc 2019-01-31 14.0
2 101 2019-01-19 xyz 2019-01-31 0.0
3 102 2019-01-15 abc 2019-01-31 16.0
4 102 2019-01-21 klm 2019-01-31 10.0
5 102 2019-01-25 xyz 2019-01-31 0.0
6 103 2019-01-15 xyz 2019-01-31 0.0
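A self-contained variant of the approach above that reads the sample data from the question. It differs in one detail: instead of `idxmax`, it marks the last row of each customer as the row with no following date (`shift(-1)` is NaT), which likewise assumes the rows are sorted by date within each cust_id.

```python
import io

import numpy as np
import pandas as pd

csv = """cust_id,date,ref_column,ref_date
101,15/01/19,abc,31/01/19
101,17/01/19,abc,31/01/19
101,19/01/19,xyz,31/01/19
102,15/01/19,abc,31/01/19
102,21/01/19,klm,31/01/19
102,25/01/19,xyz,31/01/19
103,15/01/19,xyz,31/01/19
"""
df = pd.read_csv(io.StringIO(csv))

# Day/month/two-digit-year, stated explicitly so 02/01/19 cannot be read as Feb 1
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%y")
df["ref_date"] = pd.to_datetime(df["ref_date"], format="%d/%m/%y")

groups = df.groupby("cust_id")
next_ref = groups["ref_column"].shift(-1)   # NaN on each customer's last row
next_date = groups["date"].shift(-1)        # NaT on each customer's last row

# Same ref as the next row: days to the next ticket; otherwise days to ref_date
df["aging"] = np.where(
    df["ref_column"].eq(next_ref),
    (next_date - df["date"]).dt.days,
    (df["ref_date"] - df["date"]).dt.days,
)

# Last row of each customer gets 0, then store whole days as integers
df.loc[next_date.isna(), "aging"] = 0
df["aging"] = df["aging"].astype(int)
print(df)
```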

convert to date time format

I have a column MONTHYEAR (dtype object) that I want to convert to datetime format.
I tried the code below, but it is not working:
AGENT MONTHYEAR
45 SEP-2018
567 AUG-2017
432 APR-2018
Reatiers_Sales_Monthlywises_above_13['MONTHYEARS'] = Reatiers_Sales_Monthlywises_above_13['MONTHYEAR'].apply(lambda x: x.strftime('%B-%Y'))
Reatiers_Sales_Monthlywises_above_13
Please help me convert this object dtype to datetime.
If you want to keep it in year-month format, convert it to a period dtype:
pd.to_datetime(df.MONTHYEAR).dt.to_period('M')
Out[206]:
0 2018-09
1 2017-08
2 2018-04
Name: MONTHYEAR, dtype: period[M]
If you want it as datetime dtype, it will be in year-month-day format, with the day set to 01:
pd.to_datetime(df.MONTHYEAR)
Out[207]:
0 2018-09-01
1 2017-08-01
2 2018-04-01
Name: MONTHYEAR, dtype: datetime64[ns]
Note: the strftime in your apply converts the values to strings (object dtype), so I'm not sure that is what you intended. It also fails here, because the values are already strings and str has no strftime method.
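If you prefer not to rely on pandas guessing the layout, a sketch with an explicit format: `%b` matches abbreviated month names (case-insensitively, so `SEP` works) and `%Y` the four-digit year. This assumes an English month-name locale; the new column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"AGENT": [45, 567, 432],
                   "MONTHYEAR": ["SEP-2018", "AUG-2017", "APR-2018"]})

# Explicit format; the missing day is set to the 1st
df["MONTHYEAR_DT"] = pd.to_datetime(df["MONTHYEAR"], format="%b-%Y")

# Period dtype if only year-month matters
df["MONTHYEAR_P"] = df["MONTHYEAR_DT"].dt.to_period("M")
print(df)
```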
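Try using the dateutil parser, which converts the strings into dates.
NOTE: it fills in the missing day with the current day of the month, which is why 03 appears below (the answer was written on the 3rd).
import pandas as pd
from dateutil import parser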
df = pd.DataFrame(data={"AGENT":[45,567,432],
"MONTHYEAR":['SEP-2018','AUG-2017','APR-2018']})
df['MONTHYEAR'] = df['MONTHYEAR'].apply(lambda x :parser.parse(str(x)))
AGENT MONTHYEAR
0 45 2018-09-03
1 567 2017-08-03
2 432 2018-04-03
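A sketch of how to avoid the current-day fill-in: `dateutil.parser.parse` takes missing fields from its `default` argument, so passing a datetime on the 1st pins every result to the first of the month. The year and month of the default are irrelevant here because the strings supply them.

```python
from datetime import datetime

import pandas as pd
from dateutil import parser

df = pd.DataFrame(data={"AGENT": [45, 567, 432],
                        "MONTHYEAR": ["SEP-2018", "AUG-2017", "APR-2018"]})

# Fields missing from the string (here the day) are copied from `default`
first_of_month = datetime(2000, 1, 1)
df["MONTHYEAR"] = df["MONTHYEAR"].apply(lambda x: parser.parse(str(x), default=first_of_month))
print(df)
```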

Filtering Pandas column with specific conditions?

I have a pandas dataframe that looks like this:
Start Time
0 2017-06-23 15:09:32
1 2017-05-25 18:19:03
2 2017-01-04 08:27:49
3 2017-03-06 13:49:38
4 2017-01-17 14:53:07
5 2017-06-26 09:01:20
6 2017-05-26 09:41:44
7 2017-01-21 14:28:38
8 2017-04-20 16:08:51
I want to select the rows with month == 06, which would be rows 0 and 5.
I know how to filter a column that has only a few categories, but since this is a date, I need to parse it and check the month. I am not sure how to do that with pandas. Please help.
Use the .dt.month accessor (convert the column with pd.to_datetime first if it still holds strings):
df['Start Time']=pd.to_datetime(df['Start Time'])
df1=df[df['Start Time'].dt.month==6].copy()
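A self-contained sketch with the data from the question, showing both directions: keeping the June rows (rows 0 and 5) and, if "filter out" was meant literally, dropping them instead.

```python
import pandas as pd

df = pd.DataFrame({"Start Time": [
    "2017-06-23 15:09:32", "2017-05-25 18:19:03", "2017-01-04 08:27:49",
    "2017-03-06 13:49:38", "2017-01-17 14:53:07", "2017-06-26 09:01:20",
    "2017-05-26 09:41:44", "2017-01-21 14:28:38", "2017-04-20 16:08:51",
]})

# The .dt accessor only works on datetime64 columns, so parse the strings first
df["Start Time"] = pd.to_datetime(df["Start Time"])

june = df[df["Start Time"].dt.month == 6]       # keep only June
not_june = df[df["Start Time"].dt.month != 6]   # drop June instead
print(june)
```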

Filtering and comparing dates with Pandas

I would like to know how to filter dates at all the different time levels, i.e. find dates by year, month, day, hour, minute and/or second. For example, how do I find all dates in 2014, or in January 2014, or only on 2 January 2014, and so on down to the second?
So I have my date and time dataframe generated from pd.to_datetime:
df
timeStamp
0 2014-01-02 21:03:04
1 2014-02-02 21:03:05
2 2016-02-04 18:03:10
So if I filter by the year 2014 then I would have as output:
timeStamp
0 2014-01-02 21:03:04
1 2014-02-02 21:03:05
Or, as a different example, I want to know the dates that happened in 2014 on the 2nd of each month. This would also result in:
timeStamp
0 2014-01-02 21:03:04
1 2014-02-02 21:03:05
But if I asked for the dates on 2 January 2014, the output would be:
timeStamp
0 2014-01-02 21:03:04
How can I achieve this at all the different levels?
Also how do you compare dates at these different levels to create an array of boolean indices?
You can filter your dataframe via boolean indexing like so:
df.loc[df['timeStamp'].dt.year == 2014]
df.loc[df['timeStamp'].dt.month == 5]
df.loc[df['timeStamp'].dt.second == 4]
df.loc[df['timeStamp'] == '2014-01-02']  # exact match: only rows at 2014-01-02 00:00:00
df.loc[pd.to_datetime(df['timeStamp'].dt.date) == '2014-01-02']  # any time on 2014-01-02
... and so on and so forth.
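To answer the second part (boolean arrays at different levels), a sketch: each `.dt` comparison yields a boolean Series, and combining them with `&` (and) or `|` (or) reproduces the three example queries from the question. `dt.normalize()` is used here as an alternative to the `dt.date` round trip; it sets the time of day to midnight while keeping datetime dtype.

```python
import pandas as pd

df = pd.DataFrame({"timeStamp": pd.to_datetime([
    "2014-01-02 21:03:04", "2014-02-02 21:03:05", "2016-02-04 18:03:10",
])})
ts = df["timeStamp"]

# One boolean Series per level
in_2014 = ts.dt.year == 2014
in_january = ts.dt.month == 1
on_day_2 = ts.dt.day == 2

print(df[in_2014])                           # all of 2014
print(df[in_2014 & on_day_2])                # the 2nd of any month in 2014
print(df[in_2014 & in_january & on_day_2])   # 2 January 2014 only

# Whole calendar day, without leaving datetime dtype
same_day = ts.dt.normalize() == "2014-01-02"
print(df[same_day])
```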
If you set timeStamp as the index with datetime dtype, you get a DatetimeIndex and can use partial string indexing with .loc:
df = df.set_index('timeStamp')
df.loc['2014'] # gets all 2014
df.loc['2014-01'] # gets all Jan 2014
df.loc['2014-01-02'] # gets all Jan 2, 2014
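A runnable sketch of partial string indexing on the sample data. The `value` column is hypothetical, added only so the selected rows are easy to check. Sorting the index lets label slices such as `'2014-01':'2014-02'` work; both slice ends are inclusive.

```python
import pandas as pd

df = pd.DataFrame({
    "timeStamp": pd.to_datetime([
        "2014-01-02 21:03:04", "2014-02-02 21:03:05", "2016-02-04 18:03:10",
    ]),
    "value": [1, 2, 3],  # hypothetical payload column
})

# DatetimeIndex, sorted so range slicing is well defined
indexed = df.set_index("timeStamp").sort_index()

by_year = indexed.loc["2014"]              # all of 2014
by_month = indexed.loc["2014-01"]          # January 2014
by_day = indexed.loc["2014-01-02"]         # 2 January 2014
span = indexed.loc["2014-01":"2014-02"]    # January through February 2014
print(span)
```

Using `.loc` rather than `df['2014']` matters in recent pandas, where plain `[]` with a string looks up a column.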
I would just create a string series, then use str.contains() with regex wildcards (. matches any single character). That gives you whatever granularity you're looking for.
s = df['timeStamp'].map(lambda x: x.strftime('%Y-%m-%d %H:%M:%S'))
print(df[s.str.contains('2014-..-.. ..:..:..')])
print(df[s.str.contains('2014-..-02 ..:..:..')])
print(df[s.str.contains('....-02-.. ..:..:..')])
print(df[s.str.contains('....-..-.. 18:03:10')])
Output:
timeStamp
0 2014-01-02 21:03:04
1 2014-02-02 21:03:05
timeStamp
0 2014-01-02 21:03:04
1 2014-02-02 21:03:05
timeStamp
1 2014-02-02 21:03:05
2 2016-02-04 18:03:10
timeStamp
2 2016-02-04 18:03:10
I think this also solves your question about boolean indices:
print(s.str.contains('....-..-.. 18:03:10'))
Output:
0 False
1 False
2 True
Name: timeStamp, dtype: bool
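The same string-matching idea can be written without a Python lambda: `Series.dt.strftime` formats the whole column at once. A minimal sketch on the sample data; `str.contains` treats the pattern as a regular expression by default, so each `.` matches any single character.

```python
import pandas as pd

df = pd.DataFrame({"timeStamp": pd.to_datetime([
    "2014-01-02 21:03:04", "2014-02-02 21:03:05", "2016-02-04 18:03:10",
])})

# Vectorized formatting of the whole column
s = df["timeStamp"].dt.strftime("%Y-%m-%d %H:%M:%S")

day_2_in_2014 = s.str.contains(r"2014-..-02 ..:..:..")   # boolean Series
at_18_03_10 = s.str.contains(r"....-..-.. 18:03:10")
print(df[day_2_in_2014])
print(at_18_03_10)
```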