Pandas DataFrame: convert all dates to YYYY-MM-DD format - pandas

I have data that looks like this stored in a DataFrame, and I'm trying to convert the "DATE" column so that all the dates are in yyyy-mm-dd format instead of yyyy-dd-mm. You can see the yyyy-dd-mm format where the date rolls over to a new day according to the "TIME" column. (Some of the dates not shown are already in YYYY-MM-DD format, but I'm trying to change all of them to YYYY-MM-DD.)
DATE TIME BAFFIN BAY GATUN II GATUN I KLONDIKE IIIG \
8778 2016-01-01 1900 8.926278 8.046583 7.649784 7.333993
8779 2016-01-01 2000 8.817666 4.395097 4.748931 6.672631
8780 2016-01-01 2100 8.704014 6.384826 7.128692 6.115349
8781 2016-01-01 2200 8.496358 8.261933 8.166153 6.242737
8782 2016-01-01 2300 8.434297 4.656991 5.894877 5.781445
8783 2016-02-01 0000 8.528372 3.056838 3.086056 5.023564
8784 2016-02-01 0100 8.783731 4.614589 4.894076 5.042875
8785 2016-02-01 0200 8.572500 3.860174 4.641366 5.174426
8786 2016-02-01 0300 8.279557 2.076971 2.644479 5.492729
8787 2016-02-01 0400 8.378920 3.562210 2.806703 5.356025
I'm trying to convert the "DATE" column to a datetime column by specifying the format, but it does nothing:
df2['DATE'] = pd.to_datetime(df2['DATE'],format='%Y-%m-%d')
thank you in advance for your help!

Can you try this?
pd.to_datetime(df['DATE'], dayfirst=True)
0 2016-01-01
1 2016-01-01
2 2016-01-01
3 2016-01-01
4 2016-01-01
5 2016-01-02
6 2016-01-02
7 2016-01-02
8 2016-01-02
9 2016-01-02

Consider joining 'DATE' and 'TIME' to get a complete datetime column. Assuming both columns are of dtype object (string), you can concatenate them with the + operator and then call pd.to_datetime with an explicit format. Example:
import pandas as pd
df = pd.DataFrame({'DATE': ['2016-01-01', '2016-02-01'],
'TIME': ['1900', '0000']})
df['DateTime'] = pd.to_datetime(df['DATE']+df['TIME'], format='%Y-%d-%m%H%M')
# df['DateTime']
# 0 2016-01-01 19:00:00
# 1 2016-01-02 00:00:00
# Name: DateTime, dtype: datetime64[ns]
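If only the "DATE" column needs fixing, a minimal sketch along the same lines is shown below. It assumes every string in the column is in year-day-month order (the sample values, including the hypothetical '2016-13-01', are illustrative and not from the original data). The format passed to pd.to_datetime describes how the input is written, which is why format='%Y-%m-%d' appeared to do nothing: it read the strings as they were.

```python
import pandas as pd

# Illustrative sample; assumes ALL values are year-day-month strings.
df2 = pd.DataFrame({"DATE": ["2016-01-01", "2016-02-01", "2016-13-01"]})

# Describe the input layout (year-day-month) so day and month are read correctly.
parsed = pd.to_datetime(df2["DATE"], format="%Y-%d-%m")

# Write the dates back out as year-month-day strings.
df2["DATE"] = parsed.dt.strftime("%Y-%m-%d")
print(df2)
```

If the column truly mixes both layouts, the strings alone cannot tell '2016-02-01' (1 February) from '2016-02-01' (2 January), so some other signal, such as row order, would be needed to decide.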

Related

How to convert numbers in an hour column to actual hours

I have an 'hour' column in a pandas dataframe that is simply a list of numbers from 0 to 23 representing hours. How can I convert them to an hour format such as 01:00 when the numbers are single digit ( like 1 ) and double digit (like 18)? The single digit numbers need to have a leading zero, a colon and two trailing zeros. The double digit numbers need only a colon and two trailing zeros. How can this be accomplished in a dataframe? Also, I have a 'date' column that needs to merge with the hour column after the hour column is converted.
e.g. date hour
2018-07-01 0
2018-07-01 1
2018-07-01 3
...
2018-07-01 21
2018-07-01 22
2018-07-01 23
Needs to look like:
date
2018-07-01 01:00
...
2018-07-01 23:00
The source of the data is a .csv file.
Thanks for your consideration. I'm new to pandas and I can't find in their documentation how to do this considering the single and double digit numbers.
Convert the hours to timedeltas with to_timedelta and add them to the dates, converting those with to_datetime if necessary:
df['date'] = pd.to_datetime(df['date']) + pd.to_timedelta(df['hour'], unit='h')
print (df)
date hour
0 2018-07-01 00:00:00 0
1 2018-07-01 01:00:00 1
2 2018-07-01 03:00:00 3
3 2018-07-01 21:00:00 21
4 2018-07-01 22:00:00 22
5 2018-07-01 23:00:00 23
If you also need to remove the hour column, use DataFrame.pop:
df['date'] = pd.to_datetime(df['date']) + pd.to_timedelta(df.pop('hour'), unit='h')
print (df)
date
0 2018-07-01 00:00:00
1 2018-07-01 01:00:00
2 2018-07-01 03:00:00
3 2018-07-01 21:00:00
4 2018-07-01 22:00:00
5 2018-07-01 23:00:00
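If the zero-padded text form (such as 01:00) is wanted as well, one possible sketch is to pad the hour with str.zfill. The column names hour_str and stamp are made up for this illustration, and the sample rows are a shortened version of the data in the question.

```python
import pandas as pd

# Shortened sample of the question's data.
df = pd.DataFrame({"date": ["2018-07-01"] * 3, "hour": [0, 1, 23]})

# Pad single-digit hours with a leading zero and append ":00".
df["hour_str"] = df["hour"].astype(str).str.zfill(2) + ":00"

# Join the date and hour strings and parse them into real datetimes.
df["stamp"] = pd.to_datetime(df["date"] + " " + df["hour_str"],
                             format="%Y-%m-%d %H:%M")
print(df)
```

The timedelta approach above is usually simpler when only the datetime column is needed; the string form is mainly useful for display or export.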

PANDAS: Converting all datetimes in column to another format

I have a Pandas dataframe containing a datetime column, in which all the values are formatted like this:
25/09/15 12:00:00. I'd like to reformat this field in all the rows, in order to match this format: 25.09.15 12:00.
Here some sample data:
Date | Value
25/08/15 12:00:00 | 49.0
25/08/15 13:00:00 | 49.5
The date column datatype is string.
Thank you in advance
Use Series.dt.strftime to format the datetimes:
df
Date Value
0 2015-08-25 12:00:00 49.0
1 2015-08-25 13:00:00 49.5
df['Date'] = df['Date'].dt.strftime('%d.%m.%y %H:%M')
df
Date Value
0 25.08.15 12:00 49.0
1 25.08.15 13:00 49.5
If the column type is str, then you need to convert it to datetime first, stating the day-first input format explicitly:
df.Date = pd.to_datetime(df.Date, format='%d/%m/%y %H:%M:%S')
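Putting the two steps together in the order they have to run, a minimal sketch using the sample rows from the question might look like this. The input format string is an assumption based on the 25/08/15 12:00:00 sample.

```python
import pandas as pd

# Sample rows from the question; the Date column starts out as strings.
df = pd.DataFrame({"Date": ["25/08/15 12:00:00", "25/08/15 13:00:00"],
                   "Value": [49.0, 49.5]})

# Step 1: parse the strings, stating the day/month/two-digit-year layout.
parsed = pd.to_datetime(df["Date"], format="%d/%m/%y %H:%M:%S")

# Step 2: write them back as text in the requested dd.mm.yy HH:MM layout.
df["Date"] = parsed.dt.strftime("%d.%m.%y %H:%M")
print(df)
```

Note that after strftime the column holds strings again, so keep the parsed Series around if date arithmetic is still needed.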

Pandas: How to convert datetime to %H:%M and keep it as datetime format?

I have a dataframe with one column of different times.
Time
-----
10:00
11:30
12:30
14:10
...
I need to compute quantiles on this dataframe with the code below:
df.quantile([0,0.5,1],numeric_only=False)
According to the documentation linked below, quantile should work.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.quantile.html
As my column is of object dtype, I need to convert it to datetime or pd.Timestamp.
When I convert it with pd.to_datetime, all my times get dates attached too.
If I format it to %H:%M, the column turns back into object dtype, which does not work with quantile even with numeric_only=False.
How can I convert to datetime format in %H:%M and still stick to datetime format?
Below was the code I used:
df = pd.DataFrame({"Time":["10:10","09:10","12:00","13:23","15:23","17:00","17:30"]})
df['Time2'] = pd.to_datetime(df['Time']).dt.strftime('%H:%M')
df['Time2'] = df['Time2'].astype('datetime64[ns]')
How can I convert to datetime format in %H:%M and still stick to datetime format?
That is not possible in pandas; the closest alternative is to use timedeltas:
df = pd.DataFrame({"Time":["10:10","09:10","12:00","13:23","15:23","17:00","17:30"]})
df['Time2'] = pd.to_timedelta(df['Time'].add(':00'))
print (df)
Time Time2
0 10:10 10:10:00
1 09:10 09:10:00
2 12:00 12:00:00
3 13:23 13:23:00
4 15:23 15:23:00
5 17:00 17:00:00
6 17:30 17:30:00
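With the timedelta column in place, the quantiles from the question can be computed directly. The sketch below is one way to do it; the helper to_hhmm is a hypothetical name introduced here only to print the results as HH:MM text.

```python
import pandas as pd

df = pd.DataFrame({"Time": ["10:10", "09:10", "12:00", "13:23",
                            "15:23", "17:00", "17:30"]})
df["Time2"] = pd.to_timedelta(df["Time"] + ":00")

# Quantiles work on timedelta columns, so no date part is needed.
q = df["Time2"].quantile([0, 0.5, 1])

def to_hhmm(td):
    """Hypothetical helper: render a Timedelta as zero-padded HH:MM text."""
    total_minutes = int(td.total_seconds() // 60)
    return f"{total_minutes // 60:02d}:{total_minutes % 60:02d}"

q_text = [to_hhmm(td) for td in q]
print(q_text)
```

Formatting is left to the very end on purpose: the column stays numeric-like for the calculation and only the final result is turned into text.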

Pandas Compare - How to compare 2 date columns in 2 separate dataframes

I have one CSV with missing dates, and I have created a new df covering that same date range without gaps. I want to compare the two and place a NaN wherever a date is missing from the original CSV:
Example:
DateTime Measurement Dates
0 2016-10-09 00:00:00 1021.9 2016-10-09
1 2016-10-11 00:00:00 1019.9 2016-10-10
2 2016-10-12 00:00:00 1015.8 2016-10-11
3 2016-10-13 00:00:00 1013.2 2016-10-12
4 2016-10-14 00:00:00 1005.9 2016-10-13
so I want the new table to be:
DateTime Measurement Dates
0 2016-10-09 00:00:00 1021.9 2016-10-09
1 Nan 00:00:00 Nan 2016-10-10
2 2016-10-11 00:00:00 1015.8 2016-10-11
3 2016-10-12 00:00:00 1013.2 2016-10-12
4 2016-10-13 00:00:00 1005.9 2016-10-13
And then I will remove the DateTime column so the final df is a complete list of dates with the missing measurements.
The code I have used thus far:
new_dates = pandas.date_range(start = '2016-10-09 00:00:00', end = '2017-10-09 00:00:00')
merged = pandas.merge(measurements, updated_dates,left_index=True, right_index=True)
If I understand you correctly, you want to resample your DateTime column to a daily frequency and fill the gaps with NaN:
# Use this line if your DateTime column is not datetime type yet
# df['DateTime'] = pd.to_datetime(df['DateTime'])
dates = pd.date_range(df['DateTime'].min(), df['DateTime'].max(), freq='D')
df = df.set_index('DateTime').reindex(dates).reset_index()
Output
index Measurement
0 2016-10-09 1021.9
1 2016-10-10 NaN
2 2016-10-11 1019.9
3 2016-10-12 1015.8
4 2016-10-13 1013.2
5 2016-10-14 1005.9
You can use resample as well. If your dates are not unique, it aggregates the duplicates and takes the mean of their measurements:
df.set_index('DateTime').resample('D').mean().reset_index()
Output
DateTime Measurement
0 2016-10-09 1021.9
1 2016-10-10 NaN
2 2016-10-11 1019.9
3 2016-10-12 1015.8
4 2016-10-13 1013.2
5 2016-10-14 1005.9
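For reference, here is a self-contained sketch of the reindex approach using the sample values from the question. The rename_axis call is an addition for this illustration so that the resulting column is called Dates, as in the desired table.

```python
import pandas as pd

# Sample measurements from the question; 2016-10-10 is missing.
df = pd.DataFrame({
    "DateTime": pd.to_datetime(["2016-10-09", "2016-10-11", "2016-10-12",
                                "2016-10-13", "2016-10-14"]),
    "Measurement": [1021.9, 1019.9, 1015.8, 1013.2, 1005.9],
})

# Build the complete daily range and align the data to it.
dates = pd.date_range(df["DateTime"].min(), df["DateTime"].max(), freq="D")
filled = (df.set_index("DateTime")
            .reindex(dates)          # missing days become NaN rows
            .rename_axis("Dates")
            .reset_index())
print(filled)
```

Because the alignment is done on the date values themselves rather than on row position, each measurement stays attached to its own date, which a positional merge on the index would not guarantee.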

pandas reindex fill in missing dates

I have a dataframe with an index of dates. Each date is the first of the month. I want to fill in all missing dates in the index at a daily level.
I thought this should work:
daily=pd.date_range('2016-01-01', '2018-01-01', freq='D')
df=df.reindex(daily)
But it's returning NA in rows that should have data (the 1st-of-the-month dates). Can anyone see the issue?
Use reindex with the parameter method='ffill', or resample with ffill as a more general solution, because then it is not necessary to create a new index with date_range:
df = pd.DataFrame({'a': range(13)},
index=pd.date_range('2016-01-01', '2017-01-01', freq='MS'))
print (df)
a
2016-01-01 0
2016-02-01 1
2016-03-01 2
2016-04-01 3
2016-05-01 4
2016-06-01 5
2016-07-01 6
2016-08-01 7
2016-09-01 8
2016-10-01 9
2016-11-01 10
2016-12-01 11
2017-01-01 12
daily=pd.date_range('2016-01-01', '2018-01-01', freq='D')
df1 = df.reindex(daily, method='ffill')
Another solution:
df1 = df.resample('D').ffill()
print (df1.head())
a
2016-01-01 0
2016-01-02 0
2016-01-03 0
2016-01-04 0
2016-01-05 0
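As for why the first-of-the-month rows came back as NA, the question does not show the index dtype, so the following is only a guess: if the index holds date strings rather than a DatetimeIndex, none of its labels match the Timestamps produced by date_range. A small sketch of that situation, with made-up data:

```python
import pandas as pd

# Assumption for illustration: the index holds strings, not datetimes.
df = pd.DataFrame({"a": [0, 1]}, index=["2016-01-01", "2016-02-01"])
daily = pd.date_range("2016-01-01", "2016-02-01", freq="D")

# String labels never equal Timestamp labels, so every row is NaN.
broken = df.reindex(daily)

# Converting the index first lets the existing rows line up.
df_dt = df.copy()
df_dt.index = pd.to_datetime(df_dt.index)
fixed = df_dt.reindex(daily)                   # data only on the 1st of each month
filled = df_dt.reindex(daily, method="ffill")  # carried forward to every day
print(filled.head())
```

Checking type(df.index) before reindexing is a quick way to confirm or rule out this cause.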