Converting to correct data format - pandas

I need your help, guys.
I have data with times in the wrong format.
For example, it shows 1245 or 1837. I want them in the correct format:
12:45 PM or 6:37 PM.
How can I convert them?
Thanks!

I think you need to convert with to_datetime and then use strftime (for strings) or dt.time (for time objects).
See also http://strftime.org/.
df = pd.DataFrame({'date':[1245, 1837]})
print (df)
date
0 1245
1 1837
print (pd.to_datetime(df['date'], format='%H%M'))
0 1900-01-01 12:45:00
1 1900-01-01 18:37:00
Name: date, dtype: datetime64[ns]
#for string output
print (pd.to_datetime(df['date'], format='%H%M').dt.strftime('%I:%M %p'))
0 12:45 PM
1 06:37 PM
Name: date, dtype: object
#for time output
print (pd.to_datetime(df['date'], format='%H%M').dt.time)
0 12:45:00
1 18:37:00
Name: date, dtype: object
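One case the sample data does not cover is a time before 10:00, which would be stored as an integer with fewer than four digits (for example 5 for 00:05). The sketch below, using hypothetical extra values, zero-pads each value to four digits before parsing so the format '%H%M' always sees HHMM. It also strips the leading zero from the hour to match the "6:37 PM" style asked for in the question. The column name time_12h is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical sample data: 5 stands for 00:05 and 945 for 09:45.
df = pd.DataFrame({'date': [5, 945, 1245, 1837]})

# Left-pad to four digits so every value has the shape HHMM.
padded = df['date'].astype(str).str.zfill(4)

# Parse as hour/minute; the date part defaults to 1900-01-01 and is ignored here.
parsed = pd.to_datetime(padded, format='%H%M')

# 12-hour clock string; lstrip('0') turns "06:37 PM" into "6:37 PM".
df['time_12h'] = parsed.dt.strftime('%I:%M %p').str.lstrip('0')
print(df)
```

The AM/PM text produced by %p depends on the active locale, so the strings may differ on systems that are not set to an English locale.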

Related

Python: Mixed date format in data frame column

I have a dataframe with mixed date formats across and within columns. When trying to convert them from object to datetime type, I get an error due to column date1 having a mixed format. I can't see how to fix it in this case. Also, how could I remove the seconds from both columns (date1 and date2)?
Here's the code I attempted:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[10, "2021-06-13 12:08:52.311 UTC", "2021-03-29 12:44:33.468"],
                            [36, "2019-12-07 12:18:02 UTC", "2011-10-15 10:14:32.118"]]),
                  columns=['col1', 'date1', 'date2'])
df
>>
col1 date1 date2
0 10 2021-06-13 12:08:52.311 UTC 2021-03-29 12:44:33.468
1 36 2019-12-07 12:18:02 UTC 2011-10-15 10:14:32.118
# Converting from object to datetime
df["date1"]= pd.to_datetime(df["date1"], format="%Y-%m-%d %H:%M:%S.%f UTC")
df["date2"]= pd.to_datetime(df["date2"], format="%Y-%m-%d %H:%M:%S.%f")
>>
ValueError: time data '2019-12-07 12:18:02 UTC' does not match format '%Y-%m-%d %H:%M:%S.%f UTC' (match)
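For the conversion to datetime, I found infer_datetime_format helpful.
I could not get it to work on the complete DataFrame at once, but it can convert one column at a time. (Note that in pandas 2.0 and later this argument is deprecated and has no effect, so the results below may differ on newer versions.)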
In [19]: pd.to_datetime(df["date1"], infer_datetime_format=True)
Out[19]:
0 2021-06-13 12:08:52.311000+00:00
1 2019-12-07 12:18:02+00:00
Name: date1, dtype: datetime64[ns, UTC]
In [20]: pd.to_datetime(df["date2"], infer_datetime_format=True)
Out[20]:
0 2021-03-29 12:44:33.468
1 2011-10-15 10:14:32.118
Name: date2, dtype: datetime64[ns]
If all the strings at least start with the format "%Y-%m-%d %H:%M", you can slice every string up to that point and parse the result:
In [32]: df['date1'].str.slice(stop=16)
Out[32]:
0 2021-06-13 12:08
1 2019-12-07 12:18
Name: date1, dtype: object
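To complete the slicing idea, here is a minimal sketch that parses the trimmed strings with one explicit format. Passing utc=True is an assumption based on the " UTC" suffix in the sample data. Slicing truncates the seconds rather than rounding them.

```python
import pandas as pd

date1 = pd.Series(["2021-06-13 12:08:52.311 UTC", "2019-12-07 12:18:02 UTC"])

# Keep only "YYYY-MM-DD HH:MM" (16 characters), dropping seconds and the suffix.
trimmed = date1.str.slice(stop=16)

# Every string now has the same shape, so one explicit format fits all rows.
# utc=True is an assumption: the original strings were labelled "UTC".
parsed = pd.to_datetime(trimmed, format="%Y-%m-%d %H:%M", utc=True)
print(parsed)
```

Because the format is explicit, this does not rely on format inference, which behaves differently across pandas versions.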
To get rid of the seconds in your datetime values without simply cutting them off, you can use round. You can also check floor and ceil and pick whichever suits your use case better. (The 'min' alias is used here because 'T' is deprecated in recent pandas versions.)
In [28]: pd.to_datetime(df["date1"], infer_datetime_format=True).dt.round('min')
Out[28]:
0 2021-06-13 12:09:00+00:00
1 2019-12-07 12:18:00+00:00
Name: date1, dtype: datetime64[ns, UTC]
In [29]: pd.to_datetime(df["date2"], infer_datetime_format=True).dt.round('min')
Out[29]:
0 2021-03-29 12:45:00
1 2011-10-15 10:15:00
Name: date2, dtype: datetime64[ns]
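The difference between round, floor, and ceil can be seen on the date2 values from the question. This is a small sketch: floor drops the seconds (truncation), round goes to the nearest minute, and ceil always moves up to the next minute.

```python
import pandas as pd

s = pd.to_datetime(
    pd.Series(["2021-03-29 12:44:33.468", "2011-10-15 10:14:32.118"]),
    format="%Y-%m-%d %H:%M:%S.%f",
)

rounded = s.dt.round("min")  # nearest minute
floored = s.dt.floor("min")  # truncate seconds
ceiled = s.dt.ceil("min")    # next full minute

print(pd.DataFrame({"round": rounded, "floor": floored, "ceil": ceiled}))
```

If the goal is only to remove the seconds, floor is probably the closest match. round changes the minute whenever the seconds are 30 or more.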

change multiple date time formats to single format in pandas dataframe

I have a DataFrame column with dates in multiple formats, as shown below:
0 07-04-2021
1 06-03-1991
2 12-10-2020
3 07/04/2021
4 05/12/1996
What I want is a single format for the entire column after applying a pandas function, so that all the dates are in the format
day/month/year
This is what I tried:
date1 = pd.to_datetime(df['Date_Reported'], errors='coerce', format='%d/%m/%Y')
But it is not working. Can this be done? Thank you.
Try with dayfirst=True:
date1 = pd.to_datetime(df['Date_Reported'], errors='coerce', dayfirst=True)
Output of date1:
0 2021-04-07
1 1991-03-06
2 2020-10-12
3 2021-04-07
4 1996-12-05
Name: Date_Reported, dtype: datetime64[ns]
If you need day/month/year strings:
date1 = date1.dt.strftime('%d/%m/%Y')
Output of date1:
0 07/04/2021
1 06/03/1991
2 12/10/2020
3 07/04/2021
4 05/12/1996
Name: Date_Reported, dtype: object
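An alternative that does not depend on how a given pandas version infers formats is to normalise the separator first and then parse with one explicit format. This is a sketch under the assumption that the only difference between the rows is "-" versus "/", as in the sample. Depending on the pandas version, leaving mixed separators in place together with errors='coerce' may turn some rows into NaT.

```python
import pandas as pd

df = pd.DataFrame({'Date_Reported': ['07-04-2021', '06-03-1991', '12-10-2020',
                                     '07/04/2021', '05/12/1996']})

# Make every row use "/" so a single format string fits the whole column.
normalized = df['Date_Reported'].str.replace('-', '/', regex=False)

# Explicit day/month/year format; rows that still do not match become NaT.
date1 = pd.to_datetime(normalized, format='%d/%m/%Y', errors='coerce')

# Back to strings in day/month/year form.
df['Date_Reported'] = date1.dt.strftime('%d/%m/%Y')
print(df)
```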

Date object and time integer to datetime

All, I have a dataframe with a date column and an hour column, and I am trying to combine them into a single timestamp. I tried many of the available solutions, using datetime.datetime.combine and extracting the month, day, and year to build a datetime stamp from them, but all led to some error.
idOnController date eventTime Energy hour
0 5014 2018-05-31 2018-05-31 01:00:00 26.619 0
2 5014 2018-06-02 2018-06-02 02:00:00 29.251 0
3 5014 2018-06-03 2018-06-03 03:00:00 30.635 0
The datatypes are as follows
idOnController int64
date object
eventTime datetime64[ns]
Energy float64
hour int64
dtype: object
I am looking to combine date and hour into a timestamp that looks like eventTime and then replace eventTime with that value.
You can do:
df['new_date'] = pd.to_datetime(df['date']) + df['hour'] * pd.to_timedelta('1h')
Output of df.dtypes:
idOnController int64
date object
eventTime datetime64[ns]
Energy float64
hour int64
new_date datetime64[ns]
dtype: object
If you want string timestamps, you can do:
df['new_date'] = df['new_date'].dt.strftime('%Y-%m-%d %H:%M:%S')
Another way of doing this would be (a bit more verbose though!):
df['date'] = pd.to_datetime(df['date'])
df['year'] = df.date.dt.year
df['month'] = df.date.dt.month
df['day'] = df.date.dt.day
df['date'] = pd.to_datetime(df[['year','month','day','hour']])
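A third variant passes the hour column directly to pd.to_timedelta with unit='h'. The sketch below uses hypothetical hour values (1, 2, 3) rather than the zeros shown in the question, so that the effect is visible.

```python
import pandas as pd

# Hypothetical sample: the date is stored as text, the hour as an integer.
df = pd.DataFrame({'date': ['2018-05-31', '2018-06-02', '2018-06-03'],
                   'hour': [1, 2, 3]})

# Convert the date to midnight timestamps, then add the hours as a timedelta.
df['eventTime'] = pd.to_datetime(df['date']) + pd.to_timedelta(df['hour'], unit='h')
print(df)
print(df.dtypes)
```

Assigning to eventTime replaces that column in place, which is what the question asks for.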

Add random datetimes to timestamps

I have a column of timestamps that span more than 24 hours. I want to convert these so that different days can be told apart. I've done this by converting to timedelta. The result is displayed below.
My question is: can these be converted or re-arranged again to give datetimes with a random (arbitrary) date, e.g. dd:mm:yyyy hh:mm:ss?
import pandas as pd
df = pd.DataFrame({
'Time' : ['8:00','18:00','28:00'],
})
df['Time'] = [x + ':00' for x in df['Time']]
df['Time'] = pd.to_timedelta(df['Time'])
Out:
Time
0 0 days 08:00:00
1 0 days 18:00:00
2 1 days 04:00:00
Intended Output:
Time
0 1/01/1904 08:00:00 AM
1 1/01/1904 18:00:00 PM
2 2/01/1904 04:00:00 AM
The input timestamps will never span more than 2 days. Is there a package that can achieve this, or would dummy start and end dates work?
After you convert Time to timedelta, just add the date part:
df.Time+pd.to_datetime('1904-01-01')
0 1904-01-01 08:00:00
1 1904-01-01 18:00:00
2 1904-01-02 04:00:00
Name: Time, dtype: datetime64[ns]
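Putting the steps together, here is a minimal sketch that goes from the raw strings to a formatted string column. The base date 1904-01-01 comes from the question. The column names Datetime and Formatted and the exact output format are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Time': ['8:00', '18:00', '28:00']})

# Append seconds so the strings read as HH:MM:SS, then parse as durations.
delta = pd.to_timedelta(df['Time'] + ':00')

# Arbitrary base date; durations past 24 hours roll over to the next day.
base = pd.Timestamp('1904-01-01')
df['Datetime'] = base + delta

# Day/month/year with a 12-hour clock (%I), so "18:00" becomes "06:00:00 PM".
df['Formatted'] = df['Datetime'].dt.strftime('%d/%m/%Y %I:%M:%S %p')
print(df)
```

The intended output in the question mixes a 24-hour value with a PM marker ("18:00:00 PM"). The sketch uses %I so that the hour and the AM/PM marker agree. Swap in %H and drop %p for a 24-hour clock.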

Combine date column and time column into datetime

I have two columns (both text objects), one date, the other hour-ending.
df = pd.DataFrame({'Date' : ['2018-10-01', '2018-10-01', '2018-10-01'],
'Hour_Ending': ['1.0', '2.0', '3.0']})
How do I add the two columns together to get a datetime object that looks like this?
2018-10-01 01:00
As a bonus, how do I change Hour_Ending to Hour_Starting?
Use to_datetime and to_timedelta:
pd.to_datetime(df.Date)+pd.to_timedelta(df.Hour_Ending.astype('float'), unit='h')
Out[122]:
0 2018-10-01 01:00:00
1 2018-10-01 02:00:00
2 2018-10-01 03:00:00
dtype: datetime64[ns]
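For the bonus part, one possible reading is that the hour starting is simply the hour ending minus one hour. The sketch below works under that assumption. The column names Ending, Starting, and Hour_Starting are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2018-10-01', '2018-10-01', '2018-10-01'],
                   'Hour_Ending': ['1.0', '2.0', '3.0']})

# Text -> float -> timedelta in hours, added to the midnight timestamp.
hours = pd.to_timedelta(df['Hour_Ending'].astype(float), unit='h')
df['Ending'] = pd.to_datetime(df['Date']) + hours

# Assumption: an interval that ends at hour N starts at hour N - 1.
df['Starting'] = df['Ending'] - pd.Timedelta(hours=1)
df['Hour_Starting'] = df['Starting'].dt.hour

# String form without seconds, as shown in the question.
print(df['Ending'].dt.strftime('%Y-%m-%d %H:%M'))
print(df[['Starting', 'Hour_Starting']])
```

Working on full timestamps rather than on the hour number keeps the date correct when an hour ending of 24 rolls over to the next day.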