How to plot unix timestamp - numpy

I have a time series indexed by sec.nsec (unix time?) where a signal is either 0 or 1, and I want to plot it as a square signal. Currently I have the following code:
from matplotlib.pyplot import *
time = ['1633093403.754783918', '1633093403.755350983', '1633093403.760918965', '1633093403.761298577', '1633093403.761340378', '1633093403.761907443']
data = [1, 0, 1, 0, 1, 0]
plot(time, data)
show()
This plots the values against the raw strings, so the x-axis treats them as evenly spaced labels rather than times.
Is there any conversion needed for the time before plotting? I cannot use date:time labels, as these points might be only nanoseconds to milliseconds apart.
Thank you.
EDIT: The values in the time list are strings

To convert unix timestamp strings to datetime64 you need to first convert to float, and then convert to datetime64 with the correct units:
import numpy as np

time = ['1633093403.754783918', '1633093403.755350983', '1633093403.760918965', '1633093403.761298577', '1633093403.761340378', '1633093403.761907443']
time = np.asarray(time).astype(float).astype('datetime64[s]')
print(time.dtype)
print(time)
yields:
datetime64[s]
['2021-10-01T13:03:23' '2021-10-01T13:03:23' '2021-10-01T13:03:23'
 '2021-10-01T13:03:23' '2021-10-01T13:03:23' '2021-10-01T13:03:23']
Note the nanoseconds have been stripped. If you want to keep those...
time = (np.asarray(time).astype(float)*1e9).astype('datetime64[ns]')
yields:
datetime64[ns]
['2021-10-01T13:03:23.754783744' '2021-10-01T13:03:23.755351040'
'2021-10-01T13:03:23.760918784' '2021-10-01T13:03:23.761298688'
'2021-10-01T13:03:23.761340416' '2021-10-01T13:03:23.761907456']
This all works because datetime64 has the same "epoch" (zero point) as unix timestamps: 1970-01-01T00:00:00.000000.
Once you do this conversion, plotting should work fine.
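Note that the float round trip costs a little precision in the last nanosecond digits (float64 carries only about 16 significant digits), which is why the printed values differ slightly from the input strings. If the exact nanoseconds matter, a sketch of one way around it is to split each string at the decimal point and combine integer seconds and nanoseconds; the steps-post drawstyle is my assumption for the "square signal" look:
import numpy as np
import matplotlib.pyplot as plt

time_str = ['1633093403.754783918', '1633093403.755350983', '1633093403.760918965',
            '1633093403.761298577', '1633093403.761340378', '1633093403.761907443']
data = [1, 0, 1, 0, 1, 0]

# split "sec.nsec" into integer seconds and integer nanoseconds to avoid float rounding
# (assumes the fractional part always has exactly 9 digits, as in the question)
sec, nsec = np.array([s.split('.') for s in time_str]).T
time = sec.astype('int64').astype('datetime64[s]') + nsec.astype('int64').astype('timedelta64[ns]')

plt.plot(time, data, drawstyle='steps-post')  # steps give the square-wave shape
plt.show()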


How to set xticks for the index of string with hvplot

I have a dataframe region_cumulative_df_sel as below:
Month-Day regions RAIN_PERCENTILE_25 RAIN_PERCENTILE_50 RAIN_PERCENTILE_75 RAIN_MEAN RAIN_MEDIAN
07-01 1 0.0611691028 0.2811064720 1.9487996101 1.4330813885 0.2873695195
07-02 1 0.0945720226 0.8130480051 4.5959815979 2.9420840740 1.0614821911
07-03 1 0.2845511734 1.1912839413 5.5803232193 3.7756001949 1.1988518238
07-04 1 0.3402922750 3.2274529934 7.4262523651 5.2195668221 3.2781836987
07-05 1 0.4680584669 5.2418060303 8.6639881134 6.9092760086 5.3968687057
07-06 1 2.4329853058 7.3453550339 10.8091869354 8.7898645401 7.5020875931
... ...
... ...
... ...
06-27 1 382.7809448242 440.1162109375 512.6233520508 466.4956665039 445.0971069336
06-28 1 383.8329162598 446.2222900391 513.2116699219 467.9851379395 451.1973266602
06-29 1 385.7786254883 449.5384826660 513.4027099609 469.5671691895 451.2281188965
06-30 1 386.7952270508 450.6524658203 514.0201416016 471.2863159180 451.2484741211
The index "Month-Day" is a type of String indicating the first day and the last day of a calendar year instead of type of datetime.
I need to use hvplot to develop an interactive plot.
region_cumulative_df_sel.hvplot(width=900)
It is hard to view the labels on the x-axis. How can I change the xticks to show only the 1st of each month, e.g. "07-01", "08-01", "09-01", ..., "06-01"?
I tried @Redox's code as below:
region_cumulative_df_sel['Month-Day'] = pd.to_datetime(region_cumulative_df_sel['Month-Day'],format="%m-%d") ##Convert to datetime
from bokeh.models.formatters import DatetimeTickFormatter
## Set format for showing x-axis ... you only need days, but in case counts change
formatter = DatetimeTickFormatter(days=["%m-%d"], months=["%m-%d"], years=["%m-%d"])
region_cumulative_df_sel.plot(x='Month-Day', xformatter=formatter,
                              y=['RAIN_PERCENTILE_25', 'RAIN_PERCENTILE_50', 'RAIN_PERCENTILE_75', 'RAIN_MEAN', 'RAIN_MEDIAN'],
                              width=900, ylabel="Rainfall (mm)", rot=90, title="Cumulative Rainfall")
This is what I have generated.
How can I shift the xticks on the x-axis to align with the Month-Day values? Also, the popup window shows "1900" as the year for the Month-Day column. Can the year segment be removed?
The x-axis data is in string format, so holoviews treats it as categorical and plots a tick for every row. You need to convert it to datetime, which allows the plot to be formatted the way you need. I am taking a simple example and showing how to do this... it should work in your case as well...
import pandas as pd
import numpy as np
import hvplot.pandas  ## registers .hvplot on DataFrames

## My month-day column is string - 07-01 07-02 07-03 07-04 ... 12-31
df = pd.DataFrame({'Month-Day': pd.date_range('2021-07-01', '2021-12-31', freq='D').strftime('%m-%d')})
df['Month-Day'] = pd.to_datetime(df['Month-Day'], format="%m-%d")  ## Convert to datetime
df['myY'] = np.random.randint(100, size=len(df))  ## Random Y data

from bokeh.models.formatters import DatetimeTickFormatter
## Set format for showing x-axis ... you only need days, but in case counts change
formatter = DatetimeTickFormatter(days=["%m-%d"], months=["%m-%d"], years=["%m-%d"])

## Plot graph
df.hvplot(x='Month-Day', xformatter=formatter)  # .opts(xticks=4, xrotation=90)
@Redox is on the right track here. The issue is with the way the Month-Day column is converted to a datetime; pandas assumes the year is 1900 for every row.
Essentially you need to attach a year to the Month-Day in some way.
See the example below; it takes the first month-day string, prepends a year ("2021-" here) and generates sequential daily values for every row (but there are a few ways of doing this).
code:
import pandas as pd
import numpy as np
import hvplot.pandas
from bokeh.models.formatters import DatetimeTickFormatter

dates = pd.date_range("2021-07-01", "2022-06-30", freq="D")
df = pd.DataFrame({
    "md": dates.strftime("%m-%d"),
    "ign": np.cumsum(np.random.normal(10, 5, len(dates))),
    "sup": np.cumsum(np.random.normal(20, 10, len(dates))),
    "imp": np.cumsum(np.random.normal(30, 15, len(dates))),
})
df["time"] = pd.date_range("2021-" + df.md[0], periods=len(df.index), freq="D")

formatter = DatetimeTickFormatter(days=["%m-%d"], months=["%m-%d"], years=["%m-%d"])

df.hvplot(x='time', xformatter=formatter, y=['ign', 'sup', 'imp'],
          width=900, ylabel="Index", rot=90, title="Cumulative ISI")

numpy equivalent of pandas.Timestamp.floor('15min')

I am trying to calculate the floor of a datetime64-type pandas series, i.e. the equivalent of pandas.Timestamp.floor('15min'), for '1D', '1H', '15min', '5min' and '1min' intervals.
I can do it if I convert datetime64 to pandas Timestamp directly:
pd.to_datetime(df.DATA_CZAS.to_numpy()).floor('15min')
But how to do that without conversion to pandas (which is quite slow) ?
Note: I can't convert datetime64[ns] to int, as:
df.time_variable.astype(int)
>>> cannot astype a datetimelike from [datetime64[ns]] to [int32]
type(df.time_variable)
>>> pandas.core.series.Series
df.time_variable.dtypes
>>> dtype('<M8[ns]')
Fortunately, NumPy allows converting between datetimes of different resolutions, and also to and from integers.
So you can use the following code:
result = (a.astype('datetime64[m]').astype(int) // 15 * 15)\
    .astype('datetime64[m]').astype('datetime64[s]')
Read the above code in the following sequence:
a.astype('datetime64[m]') - convert to minute resolution (the number of minutes since the Unix epoch).
.astype(int) - convert to int (the same number of minutes, but as int).
(... // 15 * 15) - floor-divide by 15 and multiply by 15 again; this is where the rounding down happens.
.astype('datetime64[m]') - convert back to datetime (minute precision).
.astype('datetime64[s]') - convert to the original (second) precision (optional).
To test the code I created the following array:
a = np.array(['2007-07-12 01:12:10', '2007-08-13 01:15:12',
              '2007-09-14 01:17:16', '2007-10-15 01:30:00'], dtype='datetime64')
The result of my rounding down is:
array(['2007-07-12T01:00:00', '2007-08-13T01:15:00',
       '2007-09-14T01:15:00', '2007-10-15T01:30:00'], dtype='datetime64[s]')
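The same integer arithmetic generalizes to the other intervals from the question ('1D', '1H', '5min', '1min') by working in one fixed resolution. A minimal sketch, assuming nanosecond resolution is acceptable; the helper name floor_datetime64 is mine, not a NumPy function:
import numpy as np

def floor_datetime64(a, step):
    # floor a datetime64 array to a multiple of `step` (a np.timedelta64)
    ns = a.astype('datetime64[ns]').astype(np.int64)             # nanoseconds since the epoch
    step_ns = step.astype('timedelta64[ns]').astype(np.int64)    # step size in nanoseconds
    return (ns // step_ns * step_ns).astype('datetime64[ns]')

floor_datetime64(a, np.timedelta64(15, 'm'))  # same result as above, at [ns] resolution
floor_datetime64(a, np.timedelta64(1, 'D'))   # floor to whole days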

multiplying difference between two dates in days by float vectorized form

I have a function which calculates the difference between two dates and then multiplies that by a rate. I would like to use this in a one-off example, but also apply it to a pd.Series in vectorized form for large-scale calculations. Currently it is getting hung up at
(start_date - end_date).days
AttributeError: 'Series' object has no attribute 'days'
import pandas as pd

pddt = lambda x: pd.to_datetime(x)

def cost(start_date, end_date, cost_per_day):
    start_date = pddt(start_date)
    end_date = pddt(end_date)
    total_days = (end_date - start_date).days
    cost = total_days * cost_per_day
    return cost

a = {'start_date': ['2020-07-01', '2020-07-02'], 'end_date': ['2020-07-04', '2020-07-10'], 'cost_per_day': [2, 1.5]}
df = pd.DataFrame.from_dict(a)
costs = cost(df.start_date, df.end_date, df.cost_per_day)
cost_adhoc = cost('2020-07-15', '2020-07-22', 3)
If I run it with the series I get the following error:
AttributeError: 'Series' object has no attribute 'days'
If I try to correct that by adding .dt.days, then when I only use a single input I get the following error:
AttributeError: 'Timestamp' object has no attribute 'dt'
You can change the function to use:
total_days = (end_date-start_date) / np.timedelta64(1, 'D')
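Putting that change into the original function gives a sketch that covers both cases, since dividing by one day yields a float for a scalar Timestamp difference as well as for a timedelta Series:
import pandas as pd
import numpy as np

def cost(start_date, end_date, cost_per_day):
    # pd.to_datetime returns a Timestamp for a scalar and a datetime Series for a Series
    start_date = pd.to_datetime(start_date)
    end_date = pd.to_datetime(end_date)
    total_days = (end_date - start_date) / np.timedelta64(1, 'D')
    return total_days * cost_per_day

df = pd.DataFrame({'start_date': ['2020-07-01', '2020-07-02'],
                   'end_date': ['2020-07-04', '2020-07-10'],
                   'cost_per_day': [2, 1.5]})
costs = cost(df.start_date, df.end_date, df.cost_per_day)  # Series: 6.0, 12.0
cost_adhoc = cost('2020-07-15', '2020-07-22', 3)           # scalar: 21.0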
Assuming both variables are datetime objects, the expression (end_date-start_date) gives you a timedelta object [docs]. It holds time difference as days, seconds, and microseconds. To convert that to days for example, you would use (end_date-start_date).total_seconds()/(24*60*60).
For the given question, the goal is to multiply daily costs with the total number of days. pandas uses a subclass of timedelta (timedelta64[ns] by default) which facilitates getting the total days (no total_seconds() needed), see frequency conversion. All you need to do is change the timedelta to dtype timedelta64[D] (D for daily frequency):
import pandas as pd

df = pd.DataFrame({'start_date': ['2020-07-01', '2020-07-02'],
                   'end_date': ['2020-07-04', '2020-07-10'],
                   'cost_per_day': [2, 1.5]})

# make sure dtype is datetime:
df['start_date'] = pd.to_datetime(df['start_date'])
df['end_date'] = pd.to_datetime(df['end_date'])

# multiply cost/d with total days: end_date-start_date converted to days
df['total_cost'] = df['cost_per_day'] * (df['end_date']-df['start_date']).astype('timedelta64[D]')

# df['total_cost']
# 0     6.0
# 1    12.0
# Name: total_cost, dtype: float64
Note: you don't need to use a pandas.DataFrame here; working with pandas.Series also does the trick. However, since pandas was created for this kind of operation, it brings a lot of convenience. In particular, you don't need to do any iteration in Python; it's done for you in fast C code.

Future Warning: Passing datetime64-dtype data to TimedeltaIndex is deprecated

I have a dataset of measured values and their corresponding timestamps in the format hh:mm:ss, where hh can be > 24 h.
For machine learning tasks, the data need to be interpolated, since there are multiple measured values, each with its own timestamp.
For resampling and interpolation, I figured out that the dtype of the index should be a datetime format.
For further data-processing and machine learning tasks, I would need the timedelta format again.
Here is some code:
Res_cont = Res_cont.set_index('t_a')  # t_a is the column of timestamps for the measured variable a in a dataframe

# Then I need to change to datetime format for resampling and interpolation,
# otherwise the resampled times are not like 00:15:00 but like 00:15:16, for example
Res_cont.index = pd.to_datetime(Res_cont.index)

# First upsample to seconds, then interpolate linearly, lastly downsample to 15 min steps
Res_cont = Res_cont.resample('s').interpolate(method='linear').resample('15T').asfreq().dropna()

Res_cont.index = pd.to_timedelta(Res_cont.index)  # Here is where the warning occurred
Unfortunately, I get the following warning:
FutureWarning: Passing datetime64-dtype data to TimedeltaIndex is deprecated, will raise a TypeError in a future version
  Res_cont = pd.to_timedelta(Res_cont.index)
So obviously there is a problem with the last row of my provided code. I would like to know how to change it to prevent a TypeError in a future version. Unfortunately, I don't have any idea how to fix it.
Maybe you can help?
EDIT: Here is some arbitrary sample data:
t_a = ['00:00:26', '00:16:16', '00:25:31', '00:36:14', '25:45:44']
a = [0, 1.3, 2.4, 3.8, 4.9]
Res_cont = pd.Series(data = a, index = t_a)
You can use DatetimeIndex.strftime to convert the output datetimes to HH:MM:SS format:
t_a = ['00:00:26', '00:16:16', '00:25:31', '00:36:14', '00:45:44']
a = [0, 1, 2, 3, 4]
Res_cont = pd.DataFrame({'t_a':t_a,'a':a})
print (Res_cont)
        t_a  a
0  00:00:26  0
1  00:16:16  1
2  00:25:31  2
3  00:36:14  3
4  00:45:44  4
Res_cont = Res_cont.set_index('t_a')
Res_cont.index = pd.to_datetime(Res_cont.index)
Res_cont=Res_cont.resample('s').interpolate(method='linear').resample('15T').asfreq().dropna()
Res_cont.index = pd.to_timedelta(Res_cont.index.strftime('%H:%M:%S'))
print (Res_cont)
                 a
00:15:00  0.920000
00:30:00  2.418351
00:45:00  3.922807
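As an aside, if the original strings (with hours above 24, like '25:45:44') should end up as timedeltas anyway, it may be simpler to parse them as timedeltas from the start; a sketch, assuming resampling on a TimedeltaIndex fits the rest of your workflow:
import pandas as pd

t_a = ['00:00:26', '00:16:16', '00:25:31', '00:36:14', '25:45:44']
a = [0, 1.3, 2.4, 3.8, 4.9]

# pd.to_timedelta accepts hh:mm:ss strings with hh > 24 ('25:45:44' -> 1 days 01:45:44)
s = pd.Series(a, index=pd.to_timedelta(t_a))

# resample works on a TimedeltaIndex, so no detour through datetime (and no year) is needed
out = s.resample('s').interpolate(method='linear').resample('15T').asfreq().dropna()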

Getting usable dates from Axes.get_xlim() in a pandas time series plot

I'm trying to get the xlimits of a plot as a python datetime object from a time series plot created with pandas. Using ax.get_xlim() returns the axis limits as a numpy.float64, and I can't figure out how to convert the numbers to a usable datetime.
import pandas
from matplotlib import dates
import matplotlib.pyplot as plt
from datetime import datetime
from numpy.random import randn
ts = pandas.Series(randn(10000),
                   index=pandas.date_range('1/1/2000', periods=10000, freq='H'))
ts.plot()
ax = plt.gca()
ax.set_xlim(datetime(2000,1,1))
d1, d2 = ax.get_xlim()
print "%s(%s) to %s(%s)" % (d1, type(d1), d2, type(d2))
print "Using matplotlib: %s" % dates.num2date(d1)
print "Using datetime: %s" % datetime.fromtimestamp(d1)
which returns:
262968.0 (<type 'numpy.float64'>) to 272967.0 (<type 'numpy.float64'>)
Using matplotlib: 0720-12-25 00:00:00+00:00
Using datetime: 1970-01-03 19:02:48
According to the pandas timeseries docs, pandas uses the numpy.datetime64 dtype. I'm using pandas version '0.9.0'.
I am using get_xlim() instead of directly accessing the pandas series because I am using the xlim_changed callback to do other things when the user moves around in the plot area.
Hack to get usable values
For the above example, the limits are returned in hours since the Epoch. So I can convert to seconds since the Epoch and use time.gmtime() to get something usable, but this still doesn't feel right.
In [66]: d1, d2 = ax.get_xlim()
In [67]: time.gmtime(d1*60*60)
Out[67]: time.struct_time(tm_year=2000, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=5, tm_yday=1, tm_isdst=0)
The current behavior of matplotlib.dates:
datetime objects are converted to floating point numbers which represent time in days since 0001-01-01 UTC, plus 1. For example, 0001-01-01, 06:00 is 1.25, not 0.25. The helper functions date2num(), num2date() and drange() are used to facilitate easy conversion to and from datetime and numeric ranges.
pandas.tseries.converter.PandasAutoDateFormatter() seems to build on this, so:
x = pandas.date_range(start='01/01/2000', end='01/02/2000')
plt.plot(x, x)
matplotlib.dates.num2date(plt.gca().get_xlim()[0])
gives:
datetime.datetime(2000, 1, 1, 0, 0, tzinfo=<matplotlib.dates._UTC object at 0x7ff73a60f290>)
Since pandas draws the series against period ordinals (hours since the epoch here, because freq='H'), another option is to rebuild a Period from the limit using the frequency pandas stored on the axes:
# First convert to pandas Period
period = pandas.tseries.period.Period(ordinal=int(d1), freq=ax.freq)
# Then convert to pandas timestamp
ts = period.to_timestamp()
# Then convert to date object
dt = ts.to_datetime()
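Some of the names in that snippet come from older pandas (pandas.tseries.period.Period, Timestamp.to_datetime()). A sketch of the same idea with the current public API, still assuming the plot was drawn by pandas so that ax.freq is set:
import pandas as pd

d1, d2 = ax.get_xlim()  # ax is the Axes from the pandas plot above
start = pd.Period(ordinal=int(d1), freq=ax.freq).to_timestamp().to_pydatetime()
end = pd.Period(ordinal=int(d2), freq=ax.freq).to_timestamp().to_pydatetime()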