Xarray datetime to ordinal - pandas

In pandas there is a toordinal function to convert a datetime to an ordinal, as in Convert date to ordinal python? or Pandas datetime column to ordinal. I have an xarray DataArray with a time coordinate that I want to convert to ordinals. Is there an equivalent of pandas' toordinal in xarray?
sample:
Coordinates:
time
array(['2019-07-31T10:00:00.000000000', '2019-07-31T10:15:00.000000000',
'2019-07-31T10:30:00.000000000', '2019-07-31T10:45:00.000000000',
'2019-07-31T11:00:00.000000000', '2019-07-31T11:15:00.000000000',
'2019-07-31T11:30:00.000000000', '2019-07-31T11:45:00.000000000',
'2019-07-31T12:00:00.000000000'], dtype='datetime64[ns]')

I didn't find an xarray-native way to do it.
But you can work around it by converting the time values to datetime objects, on which you can then call toordinal:
import pandas as pd
import xarray as xr
ds = xr.tutorial.open_dataset("air_temperature")
# convert each datetime64 value to a Timestamp and take its ordinal
time_ordinal = [pd.to_datetime(x).toordinal() for x in ds.time.values]
print(time_ordinal[:5])
# [734869, 734869, 734869, 734869, 734870]
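If you want to keep the result attached to the xarray object, the same conversion can be stored as an extra coordinate on the time dimension. A minimal sketch with a synthetic time axis (so it runs without downloading the tutorial dataset):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic DataArray with a datetime64 time coordinate,
# standing in for the real dataset.
times = pd.date_range("2019-07-31T10:00", periods=9, freq="15min")
da = xr.DataArray(np.arange(9), coords={"time": times}, dims="time")

# Convert each timestamp to its proleptic-Gregorian ordinal and
# attach the result as an extra coordinate on the same dimension.
ordinals = [pd.Timestamp(t).toordinal() for t in da.time.values]
da = da.assign_coords(ordinal=("time", ordinals))

print(da.ordinal.values[:3])
```

This keeps the ordinals aligned with the original time axis, so later selections on the DataArray carry them along automatically.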

Related

How can I get the mean value of a str-type column in a pandas DataFrame

I have a pandas DataFrame:
I want to get the mean value of "stop_duration" for each "violation_raw".
How can I do it if the column "stop_duration" is of object type?
df = pd.read_csv('police.csv', parse_dates=['stop_date'])
df[['stop_date', 'violation_raw', 'stop_duration']]
My table: (screenshot of the table omitted)
Use the to_datetime function to convert the column from object to datetime, specifying a format that matches your data:
import pandas as pd
df["column"] = pd.to_datetime(df["column"], format="%M-%S Min")
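To then get the per-violation mean the question actually asks for, one option is to map each duration bucket to a numeric midpoint and group by violation. A sketch with toy data; the "0-15 Min" style values and the midpoint choices are assumptions about what the column holds:

```python
import pandas as pd

# Toy stand-in for the police data.
df = pd.DataFrame({
    "violation_raw": ["Speeding", "Speeding", "Equipment"],
    "stop_duration": ["0-15 Min", "16-30 Min", "0-15 Min"],
})

# Map each duration bucket to a numeric midpoint in minutes
# (assumed values), then a plain groupby mean works.
midpoints = {"0-15 Min": 7.5, "16-30 Min": 23.0, "30+ Min": 45.0}
df["duration_min"] = df["stop_duration"].map(midpoints)
print(df.groupby("violation_raw")["duration_min"].mean())
```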

Converting year, month, day columns to a weekday in a pandas DataFrame

I am trying to add a new DataFrame column by manipulating other columns.
import pandas as pd
import numpy as np
import datetime
df = pd.read_csv('PRSA_data_2010.1.1-2014.12.31.csv')
df.head()
When I try
df['weekday'] = np.int(datetime.datetime(df.year, df.month, df.day).weekday())
I keep getting the error cannot convert the series to <class 'int'>.
Can anyone tell me the reason behind this and how I can fix it?
Thanks in advance!
The error occurs because datetime.datetime() expects scalar integers, while df.year, df.month and df.day are whole Series. Convert the columns to datetimes and then to weekdays with Series.dt.weekday:
df['weekday'] = pd.to_datetime(df[['year', 'month', 'day']]).dt.weekday
Or convert columns to datetime column in read_csv:
df = pd.read_csv('PRSA_data_2010.1.1-2014.12.31.csv',
date_parser=lambda y,m,d: y + '-' + m + '-' + d,
parse_dates={'datetimes':['year','month','day']})
df['weekday'] = df['datetimes'].dt.weekday
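A self-contained sketch of the first approach, using a toy frame in place of the PRSA csv:

```python
import pandas as pd

# Small stand-in for the csv: separate year/month/day columns.
df = pd.DataFrame({"year": [2010, 2010], "month": [1, 1], "day": [1, 2]})

# to_datetime accepts a frame with year/month/day columns directly;
# .dt.weekday then gives Monday=0 .. Sunday=6 for the whole Series.
df["weekday"] = pd.to_datetime(df[["year", "month", "day"]]).dt.weekday
print(df["weekday"].tolist())
```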

How do you append a column and drop a column with pandas dataframes? Can't figure out why it won't print the dataframe afterwards

The DataFrame that I am working with has a datetime column that I changed to a date column. I attempted to append the date column as the last column of the DataFrame, and I also wanted to drop the original datetime column.
Neither the append nor the drop works as expected, and nothing prints out afterwards. It should print the entire DataFrame (output shortened here since it is long).
My code:
import pandas as pd
import numpy as np
df7=pd.read_csv('kc_house_data.csv')
print(df7)
mydates = pd.to_datetime(df7['date']).dt.date
print(mydates)
df7.append(mydates)
df7.drop(['date'], axis=1)
print(df7)
Why drop/append at all? Both return new DataFrames rather than modifying df7 in place, which is why you see no change. You can simply overwrite the column:
df7['date'] = pd.to_datetime(df7['date']).dt.date
import pandas as pd
import numpy as np
# read csv, convert column type
df7=pd.read_csv('kc_house_data.csv')
df7['date'] = pd.to_datetime(df7['date']).dt.date
print(df7)
Drop a column using df7.drop('date', axis=1, inplace=True).
Append a column using df7['date'] = mydates.
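A sketch illustrating why the original code appeared to do nothing: drop() returns a new DataFrame, so without reassignment (or inplace=True) df7 is unchanged. The toy data stands in for kc_house_data.csv:

```python
import pandas as pd

# Minimal stand-in for the housing data: a date string column plus a value.
df7 = pd.DataFrame({"date": ["2014-10-13", "2014-12-09"],
                    "price": [221900.0, 538000.0]})

# drop() returns a *new* frame; the original keeps its 'date' column.
dropped = df7.drop("date", axis=1)
print(list(df7.columns))      # 'date' is still there
print(list(dropped.columns))  # only 'price'

# Overwriting the column converts it without any drop/append:
df7["date"] = pd.to_datetime(df7["date"]).dt.date
```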

convert numpy.datetime64 into epoch time

I am trying to convert my numpy array new_feat_dt, which contains numpy.datetime64 values, into epoch time. I want to make sure that when the conversion happens the dates stay in UTC.
I am using numpy 1.16.4 and Python 3.6.
I have tried two ways of conversion as shown in code below.
import numpy as np
new_feat_dt = [np.datetime64('2019-07-25T14:23:01'), np.datetime64('2019-07-25T14:25:01'), np.datetime64('2019-07-25T14:27:01')]
final= [(x - np.datetime64('1970-01-01T00:00:00Z')) / np.timedelta64(1, 's') for x in new_feat_dt]
print (final)
print(type(final[0]))
final2= [np.datetime64(x,'s').astype(int) for x in new_feat_dt]
print (final2)
print(type(final2[0]))
Output of the above code:
[1564064581.0, 1564064701.0, 1564064821.0]
<class 'numpy.float64'>
[1564064581, 1564064701, 1564064821]
<class 'numpy.int32'>
The above happens because the times in the new_feat_dt array are treated as GMT. I want them to be treated as my local time, which is 'US/Eastern'.
The correct conversion should be this:
[1564078981, 1564079101, 1564079221]
numpy.datetime64 is a timezone-naive datetime type. To attach timezone information, use Python's datetime together with the pytz module:
import numpy as np
import pytz
from datetime import datetime
new_feat_dt = [np.datetime64('2019-07-25T14:23:01'), np.datetime64('2019-07-25T14:25:01'), np.datetime64('2019-07-25T14:27:01')]
eastern = pytz.timezone('US/Eastern')
final = [int(eastern.localize(dt.astype(datetime)).timestamp()) for dt in new_feat_dt]
print(final)
The output:
[1564078981, 1564079101, 1564079221]
It's probably better to initialize all your new_feat_dt using datetime.datetime.
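On Python 3.9+ the standard-library zoneinfo module can stand in for pytz; with zoneinfo timezones, attaching the zone via replace() is safe (unlike with pytz, where localize() is needed). A sketch:

```python
import numpy as np
from datetime import datetime
from zoneinfo import ZoneInfo  # stdlib on Python 3.9+

new_feat_dt = [np.datetime64('2019-07-25T14:23:01'),
               np.datetime64('2019-07-25T14:25:01'),
               np.datetime64('2019-07-25T14:27:01')]

eastern = ZoneInfo('US/Eastern')
# astype(datetime) yields a naive datetime for second-precision values;
# replace() attaches the zone, timestamp() gives epoch seconds.
final = [int(dt.astype(datetime).replace(tzinfo=eastern).timestamp())
         for dt in new_feat_dt]
print(final)
```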

Getting usable dates from Axes.get_xlim() in a pandas time series plot

I'm trying to get the x-limits of a plot as Python datetime objects from a time series plot created with pandas. ax.get_xlim() returns the axis limits as numpy.float64, and I can't figure out how to convert those numbers to a usable datetime.
import pandas
from matplotlib import dates
import matplotlib.pyplot as plt
from datetime import datetime
from numpy.random import randn
ts = pandas.Series(randn(10000), index=pandas.date_range('1/1/2000',
periods=10000, freq='H'))
ts.plot()
ax = plt.gca()
ax.set_xlim(datetime(2000,1,1))
d1, d2 = ax.get_xlim()
print "%s(%s) to %s(%s)" % (d1, type(d1), d2, type(d2))
print "Using matplotlib: %s" % dates.num2date(d1)
print "Using datetime: %s" % datetime.fromtimestamp(d1)
which returns:
262968.0 (<type 'numpy.float64'>) to 272967.0 (<type 'numpy.float64'>)
Using matplotlib: 0720-12-25 00:00:00+00:00
Using datetime: 1970-01-03 19:02:48
According to the pandas timeseries docs, pandas uses the numpy.datetime64 dtype. I'm using pandas version '0.9.0'.
I am using get_xlim() instead of directly accessing the pandas series because I am using the xlim_changed callback to do other things when the user moves around in the plot area.
Hack to get usable values
For the above example, the limits are returned in hours since the epoch. So I can convert to seconds since the epoch and use time.gmtime() to get something usable, but this still doesn't feel right.
In [66]: d1, d2 = ax.get_xlim()
In [67]: time.gmtime(d1*60*60)
Out[67]: time.struct_time(tm_year=2000, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=5, tm_yday=1, tm_isdst=0)
The current behavior of matplotlib.dates:
datetime objects are converted to floating point numbers which represent time in days since 0001-01-01 UTC, plus 1. For example, 0001-01-01, 06:00 is 1.25, not 0.25. The helper functions date2num(), num2date() and drange() are used to facilitate easy conversion to and from datetime and numeric ranges.
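Caveat for newer versions: since Matplotlib 3.3 the default numeric epoch is 1970-01-01 (adjustable with matplotlib.dates.set_epoch), so the absolute float values differ from the "days since 0001" scheme quoted above. The date2num/num2date round trip is stable either way:

```python
from datetime import datetime, timezone
from matplotlib import dates

d = datetime(2000, 1, 1, 6, tzinfo=timezone.utc)
num = dates.date2num(d)     # float days since the current epoch
back = dates.num2date(num)  # back to a tz-aware datetime in UTC
print(num, back)
```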
pandas.tseries.converter.PandasAutoDateFormatter() seems to build on this, so:
x = pandas.date_range(start='01/01/2000', end='01/02/2000')
plt.plot(x, x)
matplotlib.dates.num2date(plt.gca().get_xlim()[0])
gives:
datetime.datetime(2000, 1, 1, 0, 0, tzinfo=<matplotlib.dates._UTC object at 0x7ff73a60f290>)
Since pandas plots period ordinals on the x-axis here, the limit can also be converted through a pandas Period:
# First convert to a pandas Period
period = pandas.Period(ordinal=int(d1), freq=ax.freq)
# Then convert to a pandas Timestamp
ts = period.to_timestamp()
# Then convert to a datetime object
dt = ts.to_pydatetime()
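Standalone, that period-ordinal conversion can be checked without building a plot; 262968 is the left limit printed in the question, and 'H' stands in for the hourly ax.freq:

```python
import pandas as pd

# Hourly period ordinals count hours since 1970-01-01,
# so ordinal 262968 should land on 2000-01-01 00:00.
period = pd.Period(ordinal=262968, freq='H')
ts = period.to_timestamp()
print(ts)
```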