Reading ISO 8601 in CSV using Pandas - pandas

This CSV has dates in ISO 8601 time.
0 2014-01-01T00:00:00.000
1 2014-01-01T00:46:43.000
2 2014-01-01T01:33:26.001
I want to select the rows up until January 2. I'm not sure how to do this. I thought including
parse_dates=True
would allow me to refer to the date/time values directly like this:
sat0 = sat0[sat0['epoch']<2014-01-02T00:00:00.000]
but it's not working.

You can use pd.to_datetime() to convert the date string column, passing it the matching format. In this case, the format is '%Y-%m-%dT%H:%M:%S.%f'.
Here's the implementation in your case:
import pandas as pd
sat0 = pd.DataFrame({
'epoch': [
'2014-01-01T00:00:00.000',
'2014-01-01T00:46:43.000',
'2014-03-01T01:33:26.001',
]})
date_fmt = '%Y-%m-%dT%H:%M:%S.%f'
sat0['epoch'] = pd.to_datetime(sat0['epoch'], format=date_fmt)
sat0 = sat0[sat0['epoch'] < '2014-01-02T00:00:00.000']
Which outputs
epoch
0 2014-01-01 00:00:00
1 2014-01-01 00:46:43
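As for why parse_dates=True on its own did not help: according to the read_csv documentation, True only tries to parse the index, so an ordinary column such as epoch stays as strings unless you name it. Below is a minimal sketch of naming the column at read time. It assumes the column is called epoch as in the question, and csv_text is made-up sample data standing in for the real file.

```python
import io

import pandas as pd

# Made-up sample data standing in for the real CSV file.
csv_text = """epoch
2014-01-01T00:00:00.000
2014-01-01T00:46:43.000
2014-03-01T01:33:26.001
"""

# Name the column explicitly; parse_dates=True would only try the index.
sat0 = pd.read_csv(io.StringIO(csv_text), parse_dates=["epoch"])

# Compare against a Timestamp (a quoted string also works, as shown above).
cutoff = pd.Timestamp("2014-01-02")
before = sat0[sat0["epoch"] < cutoff]
print(before)
```

The unquoted 2014-01-02T00:00:00.000 in the question is not valid Python, which is why the comparison value has to be a string or a Timestamp.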

Related

pandas date conversion with same format of dd/mm/yyyy [duplicate]

I have data in a csv file with dates stored as strings in a standard UK format - %d/%m/%Y - meaning they look like:
12/01/2012
30/01/2012
The examples above represent 12 January 2012 and 30 January 2012.
When I import this data with pandas version 0.11.0 I applied the following transformation:
import pandas as pd
...
cpts.Date = cpts.Date.apply(pd.to_datetime)
but it converted dates inconsistently. To use my existing example, 12/01/2012 would convert as a datetime object representing 1 December 2012 but 30/01/2012 converts as 30 January 2012, which is what I want.
After looking at this question I tried:
cpts.Date = cpts.Date.apply(pd.to_datetime, format='%d/%m/%Y')
but the results are exactly the same. The source code suggests I'm doing things right so I'm at a loss. Does anyone know what I'm doing wrong?
You can use the parse_dates option of read_csv to do the conversion directly while reading your data.
The trick here is to use dayfirst=True to indicate your dates start with the day and not with the month. See here for more information: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.io.parsers.read_csv.html
When your dates have to be the index:
>>> import pandas as pd
>>> from io import StringIO
>>> s = StringIO("""date,value
... 12/01/2012,1
... 12/01/2012,2
... 30/01/2012,3""")
>>>
>>> pd.read_csv(s, index_col=0, parse_dates=True, dayfirst=True)
value
date
2012-01-12 1
2012-01-12 2
2012-01-30 3
Or when your dates are just in a certain column:
>>> s = StringIO("""date
... 12/01/2012
... 12/01/2012
... 30/01/2012""")
>>>
>>> pd.read_csv(s, parse_dates=[0], dayfirst=True)
date
0 2012-01-12 00:00:00
1 2012-01-12 00:00:00
2 2012-01-30 00:00:00
I think you are calling it correctly, and I posted this as an issue on github.
You can just specify the format to to_datetime directly, for example:
In [1]: s = pd.Series(['12/1/2012', '30/01/2012'])
In [2]: pd.to_datetime(s, format='%d/%m/%Y')
Out[2]:
0 2012-01-12 00:00:00
1 2012-01-30 00:00:00
dtype: datetime64[ns]
Update: As the OP correctly points out, this doesn't work with NaN. If you are happy with dayfirst=True (which works with NaN too), use:
s.apply(pd.to_datetime, dayfirst=True)
Note that you have to be careful when using dayfirst (which is easier than specifying the exact format), since dayfirst isn't strict: it only prefers day-first parsing.
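To illustrate the strict route, here is a small sketch that combines an explicit format with errors="coerce". It assumes a reasonably recent pandas, where missing values pass through to_datetime as NaT; the sample series is invented for illustration.

```python
import pandas as pd

# Invented sample: two valid UK dates, one missing value,
# and one US-style date that does not fit day/month/year.
s = pd.Series(["12/01/2012", "30/01/2012", None, "01/30/2012"])

# With an explicit format, anything that does not match is rejected.
# errors="coerce" turns those rejects into NaT instead of raising.
parsed = pd.to_datetime(s, format="%d/%m/%Y", errors="coerce")
print(parsed)

# Rows that were missing or did not match the format.
print(s[parsed.isna()])
```

Without errors="coerce" the last value raises a ValueError, which is often what you want when a mismatched date signals bad input.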

How can I convert a string to a date that contains only the year?

Create a dataframe whose first column is text.
import pandas as pd
values = {'dates': ['2019','2020','2021'],
'price': [11,12,13]
}
df = pd.DataFrame(values, columns = ['dates','price'])
Check the dtypes:
df.dtypes
dates object
price int64
dtype: object
Convert the dates column to datetime:
df['dates'] = pd.to_datetime(df['dates'], format='%Y')
df
dates price
0 2019-01-01 11
1 2020-01-01 12
2 2021-01-01 13
I want the dates column to have a date type but show only the year, like this:
dates price
0 2019 11
1 2020 12
2 2021 13
How can I achieve this?
If you keep the datetime type for your column, you will likely benefit from it. What you see in the column ("2019-01-01") is just a representation of the datetime object. The real question here is: why do you need a datetime object?
Actually, I don't care about datetime type:
Use a string ('2019'), or preferentially an integer (2019) which will enable you to perform sorting, calculations, etc.
I need the datetime type but I really want to see only the year:
Use style to format your column while retaining the underlying type:
df.style.format({'dates': lambda t: t.strftime('%Y')})
This lets you keep the underlying type while having a clean visual format.
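Two further options, sketched here as alternatives rather than as part of the answer above: extract the year as an integer with .dt.year, or convert to an annual period with .dt.to_period, which stays time-aware but displays as just the year. The column names year and year_period are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"dates": ["2019", "2020", "2021"], "price": [11, 12, 13]})
df["dates"] = pd.to_datetime(df["dates"], format="%Y")

# Integer year: plain numbers, convenient for sorting and arithmetic.
df["year"] = df["dates"].dt.year

# Annual period: still a time-aware type, but it prints as the year only.
df["year_period"] = df["dates"].dt.to_period("Y")

print(df[["year", "year_period", "price"]])
```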

convert pandas datetime64[ns] to julian day

I am confused by the number of data type conversions and seemingly very different solutions to this, none of which I can get to work.
What is the best way to convert a pandas datetime column (datetime64[ns], e.g. 2017-01-01 03:15:00) into another column in the same dataframe that holds the Julian day (e.g. 2458971.8234259)?
Many thanks
Create DatetimeIndex and convert to julian dates:
df = pd.DataFrame({'dates':['2017-01-01 03:15:00','2017-01-01 03:15:00']})
df['dates'] = pd.to_datetime(df['dates'])
df['jul1'] = pd.DatetimeIndex(df['dates']).to_julian_date()
# if you need to drop the time of day, floor to the day first
df['jul2'] = pd.DatetimeIndex(df['dates']).floor('d').to_julian_date()
print (df)
dates jul1 jul2
0 2017-01-01 03:15:00 2.457755e+06 2457754.5
1 2017-01-01 03:15:00 2.457755e+06 2457754.5
A DatetimeIndex is used because calling to_julian_date through the .dt accessor fails:
df['jul'] = df['dates'].dt.to_julian_date()
AttributeError: 'DatetimeProperties' object has no attribute 'to_julian_date'
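If you would rather stay with Series arithmetic, the Julian date can also be computed by hand, since the Unix epoch (1970-01-01 00:00) corresponds to Julian date 2440587.5. The sketch below assumes naive timestamps that are already in UTC; the constant name UNIX_EPOCH_JD is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"dates": pd.to_datetime(["2017-01-01 03:15:00"])})

# Julian date of 1970-01-01 00:00 (the Unix epoch).
UNIX_EPOCH_JD = 2440587.5

# Days elapsed since the Unix epoch as a float, shifted to the Julian scale.
days_since_epoch = (df["dates"] - pd.Timestamp("1970-01-01")) / pd.Timedelta(days=1)
df["jul"] = days_since_epoch + UNIX_EPOCH_JD

print(df["jul"].iloc[0])
```

The values in the jul1 column above only look rounded (2.457755e+06) because of the display precision; the stored floats keep the fractional day.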

Format time data pandas

I have dates in this format: 2015-02-02 14:19:00.
I use this code:
dateparse = lambda dates: pd.datetime.strptime(dates, '%Y/%m/%d %H:%M:%S')
df = pd.read_csv('3df_uniti.csv', parse_dates=True, index_col='date', date_parser=dateparse)
df.head()
but it doesn't work; it gives me the following error:
time data does not match format
Can you help me to set the right format?
Your format uses / instead of -. Try changing it to %Y-%m-%d %H:%M:%S.
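With the format corrected, the same result can be reached without a custom parser, which matters on newer pandas: pd.datetime was removed in pandas 2.0 and date_parser was deprecated there. A minimal sketch follows, with made-up inline data standing in for 3df_uniti.csv and a hypothetical value column.

```python
import io

import pandas as pd

# Made-up data standing in for 3df_uniti.csv; "value" is a hypothetical column.
csv_text = """date,value
2015-02-02 14:19:00,1
2015-02-02 14:20:00,2
"""

df = pd.read_csv(io.StringIO(csv_text))

# Hyphens in the format string, matching the data.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d %H:%M:%S")
df = df.set_index("date")

print(df.head())
```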

pandas string to date type conversion in proper format

I am getting dates as strings in pandas, like 10-Oct, 11-Oct, but I want to convert them to a date type in this format: 2019-10-10, 2019-10-11.
Is there an easy way to do this in pandas?
Use to_datetime with the year appended to each string and the format parameter:
df = pd.DataFrame({'date':['10-Oct', '11-Oct']})
df['date'] = pd.to_datetime(df['date'] + '-2019', format='%d-%b-%Y')
print (df)
date
0 2019-10-10
1 2019-10-11