The best approach to log total values with RRDtool? - graphing

I am developing a graphing system for energy consumption. The script pulls the current kWh reading from an energy meter and pushes it to an RRD database with a COUNTER DS. I have tried using a VDEF to total a DEF with AVERAGE, but I can't seem to find a good way to make "steps" with this approach.
My goal is to be able to fetch one week of data from the database and then calculate the total for each day. I can't seem to find the best way to do this. Is RRDtool the proper tool for this? If so, what is the correct approach?

RRDTool works best when recording a rate. By using a COUNTER DS type, you can have your ever-increasing kWh meter total converted into a point-in-time power value (watts) in the database.
For example, if you are sampling every minute, then you can multiply your kWh by 3600000 (to get Watt-seconds, or Joules) and have RRDTool use a COUNTER type to convert this to a per-second rate, or Watts (it is always better to store the data in SI units for clarity and simplicity).
You can then display or calculate this by multiplying by the step size.
For example, if you wish to see a graph of kWh used per day, you need to have a 1-day granularity RRA defined. Then, you can fetch the data from this (either using rrdtool fetch, or rrdtool graph with a specific data granularity) and have a calculated value obtained by multiplying the average Watts by the Interval, and then dividing by 3600000 to get kWh.
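As a sanity check on that arithmetic, here is a minimal Python sketch (the function name and the 500 W example are illustrative, not part of RRDtool):

```python
# Convert an average power over an interval into energy in kWh.
def interval_kwh(avg_watts, interval_seconds):
    joules = avg_watts * interval_seconds  # W x s = J (watt-seconds)
    return joules / 3600000.0              # 1 kWh = 3,600,000 J

# A constant 500 W load over one day (86400 s):
print(interval_kwh(500, 86400))  # 12.0 kWh
```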
This creates an RRD file with a 1-minute sample rate, to store watts.
rrdtool create power.rrd --step 60 \
DS:watts:COUNTER:120:0:U \
RRA:AVERAGE:0.5:1:86400 \
RRA:AVERAGE:0.5:1440:400
This will update the RRD, given your latest sample of kWh
JOULES=`expr $KWH \* 3600000`
rrdtool update power.rrd N:$JOULES
This will graph the daily usage, in kWh, over a month
rrdtool graph graph.png \
--end now --start "end-1month" --step 86400 \
--title "Power usage per day" --vertical-label "kWh" \
--units-exponent 0 \
DEF:watts=power.rrd:watts:AVERAGE:step=86400 \
CDEF:kwh=watts,STEPWIDTH,*,3600000,/ \
CDEF:kwhrate=watts,3600000,/ \
LINE1:kwh#0000ff:"kWh per day" \
VDEF:totkwh=kwhrate,TOTAL \
GPRINT:totkwh:"Total kWh this month: %.0lf kWh"
You can, of course, modify this to your needs. You may want different RRAs, or graphing steps, or summarisation.

Related

Difficulty With Forecasting Using Weekly Data With fbprophet

I have a CSV with time-series datapoints at 10-minute intervals for the entire month of November that I am looking to train a FB Prophet model with. The data is very seasonal, on a weekly basis, and Saturday and Sunday consistently have lower values than their weekday counterparts. This is what the data (always non-negative) looks like with the fbprophet y_hat values:
This is what I am doing to get the model:
import pandas as pd
from fbprophet import Prophet

df = pd.read_csv('./input.csv')
m = Prophet(weekly_seasonality=False)
m.add_seasonality('weekly', period=7, fourier_order=3)
m.fit(df)
future = m.make_future_dataframe(periods=10)
forecast = m.predict(future)
fig = m.plot(forecast)
fig.savefig('out.png')
However, the 10-period forecast that is generated looks terrible, especially considering that it contains negative values even though the training data does not:
I have tried adjusting the period= and fourier_order= values, as well as various changepoint_prior_scale= values, but the forecasts are nowhere near the training data.
With Prophet(changepoint_prior_scale=0.50):
With m.add_seasonality('weekly', period=5, fourier_order=5):
What should I try next in order to get predictions that are not wildly different from the training data? I believe the issue stems from the fact that my data points are at 10-minute intervals. When I try to increase the period to the number of 10-minute periods in a 7-day week (1,008), I get the following:
I am looking for more information related to how I can use Prophet to more-accurately forecast this data, but I believe I am missing some information when it comes to how to train the model. Any help would be greatly appreciated!
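For reference, 1,008 is just the number of 10-minute steps in one week. Prophet's add_seasonality period is measured in days, so weekly seasonality stays period=7 regardless of the sampling interval; the 10-minute spacing instead belongs in make_future_dataframe(freq='10min') (the default freq is daily). A pandas-only check of the step arithmetic, assuming nothing about the actual CSV:

```python
import pandas as pd

# Number of 10-minute steps in one 7-day week: 7 * 24 * 6 = 1008
week = pd.date_range("2023-11-01", periods=7 * 24 * 6, freq="10min")
print(len(week))  # 1008
```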

How can I combine two time-series datasets with different time-steps?

I want to train a multivariate LSTM model using data from 2 datasets, MIMIC-1.0 and MIMIC-III. The problem is that the vital signs in the first dataset are recorded minute by minute, while in MIMIC-III the data is recorded hourly, so there is an interval difference between the two datasets.
I want to predict diagnoses from the vital signs by giving streams/sequences of vital signs to my model every 5 minutes. How can I merge both datasets for my model?
You need to find a common field on which you can do a merge, e.g. patient_ids or the like. You could do the same with ICU episode identifiers. It's been a while since I've worked on the MIMIC dataset, so I don't recall exactly what those fields were.
Dataset   | Granularity | Subsampling for 5-minutely
----------|-------------|---------------------------
MIMIC-I   | Minutely    | Subsample every 5th reading
MIMIC-III | Hourly      | Interpolate the 11 5-minutely readings between each pair of consecutive hourly readings
The interpolation method you choose to get the between hour readings could be as simple as forward-filling the last value. If the readings are more volatile, a more complex method may be appropriate.
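A minimal pandas sketch of the table above, using made-up series rather than the real MIMIC tables (forward-fill shown as the simple interpolation choice):

```python
import pandas as pd

# Minutely vitals (MIMIC-I style): keep every 5th reading.
minutely = pd.Series(range(60),
                     index=pd.date_range("2021-01-01", periods=60, freq="min"))
five_min_a = minutely.resample("5min").first()

# Hourly vitals (MIMIC-III style): forward-fill down to 5-minute steps.
hourly = pd.Series([36.6, 37.1],
                   index=pd.date_range("2021-01-01", periods=2, freq="60min"))
five_min_b = hourly.resample("5min").ffill()

# Both series now share a 5-minute grid and can be joined on the index.
merged = pd.concat({"mimic1": five_min_a, "mimic3": five_min_b}, axis=1)
```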

Preparing time series data for building an RNN

I am preparing time series data to build an RNN model (LSTM). The data is collected from sensors installed in a mechanical plant. Consider I have data for input and output temperature of a compressor along with the time stamps.
There is similar data for around 20 parameters, each recorded with its own time stamps. The problem is that the time stamps at which the different parameters are collected do not match.
So how do I ideally match the time stamps to create a single dataframe with all the parameters and a single time stamp?
Since an RNN doesn't know anything about time deltas, only about time steps, you will need to quantize / interpolate your data:
Find the smallest time delta Δt across all of your series
Resample all 20 of your series to Δt/2* or smaller (Nyquist theorem)
* Strictly, you'd need to do a Fourier transform and then use twice the cutoff frequency as the sampling rate; Δt/2 is IMHO a good approximation.
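A small pandas sketch of that recipe, with two made-up sensor series (names and values are illustrative): the smallest delta here is 10 minutes, so everything is resampled to 5 minutes and interpolated in time.

```python
import pandas as pd

temp_in = pd.Series([20.0, 21.0, 22.5],
                    index=pd.to_datetime(["2021-01-01 00:00",
                                          "2021-01-01 00:10",
                                          "2021-01-01 00:20"]))
temp_out = pd.Series([30.0, 31.5],
                     index=pd.to_datetime(["2021-01-01 00:00",
                                           "2021-01-01 00:15"]))

# Put all series on a union index, resample to the common step (Δt/2 = 5 min),
# and interpolate linearly in time to fill the gaps.
df = pd.concat({"temp_in": temp_in, "temp_out": temp_out}, axis=1)
aligned = df.resample("5min").mean().interpolate(method="time")
```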

Correct way to feed timestamps into RNN?

I built an RNN that predicts query execution time for an unseen query. I want to add a timestamp as a feature, as it probably helps to estimate whether the server is busy or not. How can I combine a date/time variable with my query vector and feed it into my RNN model?
Yes, I could calculate the time delta by hand and feed it as a float, but that feels like cheating.
Regardless of the model you are using, your goal is to translate date-time stamps into numerical features that can give some insight into when the server is busy.
If you have periodic server usage, then you might want to create a periodic numerical feature. E.g. Hour # (0-23), or minutes, or maybe even week day # (0-6). If you have a linear trend over time (think server usage is slowly going up on average), then you might want to also translate the date-time stamps into a correctly scaled feature of "time since ...". E.g. number of days since first observation, or # of weeks, etc...
I hope that helps.
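A sketch of those features (the function name is made up; the sine/cosine pair is an optional extra so hour 23 sits numerically next to hour 0):

```python
import math
from datetime import datetime

def time_features(ts, t0):
    hour = ts.hour          # periodic, 0-23
    weekday = ts.weekday()  # periodic, 0-6 (Monday = 0)
    days_since = (ts - t0).total_seconds() / 86400.0  # linear "time since" trend
    # Cyclical encoding keeps midnight adjacent to 23:00 for the model.
    hour_sin = math.sin(2 * math.pi * hour / 24)
    hour_cos = math.cos(2 * math.pi * hour / 24)
    return [hour, weekday, days_since, hour_sin, hour_cos]

features = time_features(datetime(2021, 1, 5, 12), datetime(2021, 1, 1))
```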

Using Torch for Time Series prediction using LSTMs

My main problem is how I should pre-process my dataset, which is basically 60 minutely-sequenced input vectors that result in a single hourly output. Each input vector every minute produces some output, but unfortunately that output can't be observed until 1 hour has passed.
I thought about concatenating the 60 inputs into one big input vector that corresponds to the single hourly output and feeding that to a normal ML classifier, hence having 1 sample at a time. But I don't think it would be a time series anymore.
How can I represent this so that it works in an LSTM setting?
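One common representation (sketched here with placeholder data, not a definitive answer) is to keep the 60 minutely vectors as 60 time steps of a single sequence, so each training sample is a (60, n_features) window paired with its hourly label; it remains a time series from the LSTM's point of view:

```python
import numpy as np

n_hours, n_features = 24, 3
minutely = np.arange(n_hours * 60 * n_features, dtype=float).reshape(-1, n_features)
hourly_targets = np.arange(n_hours, dtype=float)  # one label per hour

# LSTM input shape: (samples, time_steps, features) = (24, 60, 3),
# i.e. one 60-minute sequence per hourly label.
X = minutely.reshape(n_hours, 60, n_features)
y = hourly_targets
```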