How to force a time series to plot on the whole year, when data is only a few months - pandas

I have multiple time series, each with a different beginning and end time. When I plot them using pandas and matplotlib, I get nice graphs beginning at t0 and ending at tx for each individual series. I know that I cannot plot series of different lengths in one plot, but I would like to at least view them with the months lining up.
For example, say I have two series: series 1 begins in April and ends in September, and series 2 begins in February and ends in December.
How do I visualize them so that each series is plotted on a yearly graph (Jan to Dec) even though the data does not span those dates? I want to see them one above the other, lined up by month.
I have it like this so far, with xlim=('Jan', 'Dec'), but I just get blank plots:
for dfl in dfl_list[0:2]:
    dfl.plot(x='DateTime', y=['VWCmax', 'VWCmin'],
             ax=p1, fontsize=15, xlim=('Jan', 'Dec'))
p1.set_title('Time vs VWC', fontsize=15)
p1.set_ylabel('VWC (%) ' + '{}'.format(imei), fontsize=15)
p1.set_xlabel('Time Stamp', fontsize=15)
I've also tried xticks instead of xlim, but I also get blank plots.

The problem was that I assumed xlim would accept the strings 'Jan' and 'Dec'. This returned blank graphs because pyplot did not know how to fit the data to string limits. The solution is to pass datetime objects to xlim:
from datetime import datetime

for dfl in dfl_list[0:2]:
    dfl.plot(x='DateTime', y=['VWCmax', 'VWCmin'],
             ax=p1, fontsize=15, xlim=(datetime(2017,1,1), datetime(2017,12,31)))
p1.set_title('Time vs VWC', fontsize=15)
p1.set_ylabel('VWC (%) ' + '{}'.format(imei), fontsize=15)
p1.set_xlabel('Time Stamp', fontsize=15)
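To get the series stacked one above the other with the months aligned, one option is a column of subplots that share an x-axis. The following is only a sketch under assumptions: the data is synthetic, `make_frame` is a hypothetical helper, the year 2017 is taken from the snippet above, and it calls matplotlib's `Axes.plot` directly instead of `DataFrame.plot`.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def make_frame(start, end):
    """Hypothetical helper: build a daily frame with synthetic VWC values."""
    idx = pd.date_range(start, end, freq="D")
    return pd.DataFrame({
        "DateTime": idx,
        "VWCmax": np.random.rand(len(idx)) + 1.0,
        "VWCmin": np.random.rand(len(idx)),
    })


# Two series with different spans: April-September and February-December.
dfl_list = [make_frame("2017-04-01", "2017-09-30"),
            make_frame("2017-02-01", "2017-12-31")]

year_limits = (datetime(2017, 1, 1), datetime(2017, 12, 31))

# One subplot per series, stacked vertically, all sharing the same x-axis.
fig, axes = plt.subplots(nrows=len(dfl_list), sharex=True, figsize=(10, 6))
for ax, dfl in zip(axes, dfl_list):
    ax.plot(dfl["DateTime"], dfl["VWCmax"], label="VWCmax")
    ax.plot(dfl["DateTime"], dfl["VWCmin"], label="VWCmin")
    ax.set_ylabel("VWC (%)")
    ax.legend()

# The x-axis is shared, so setting the limits once applies to every subplot.
axes[0].set_xlim(*year_limits)
axes[-1].xaxis.set_major_locator(mdates.MonthLocator())
axes[-1].xaxis.set_major_formatter(mdates.DateFormatter("%b"))
axes[-1].set_xlabel("Time Stamp")
fig.savefig("vwc_by_month.png")
```

Each subplot then spans January to December, and months without data simply stay empty.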

Related

Use Row of Data Frame as X-Axis in Plotly Line Chart

I am trying to make a plotly line chart that shows team member progression with the following excel data:
For the life of me I cannot figure out how to set the team member names as the color for the lines, the x-axis as the months, and the y-axis as the numeric values. Closest I've gotten is a blank graph, and I've tried about 400 combinations of parameters for
px.line(
df,
color="Team_Member",
x = "the row of months... something with iloc maybe?",
y = df.columns,
title="Average Estimated Daily Working Hours by Team Member"
)

How can I draw Yearly series using monthly data from a DateTimeIndex in Matplotlib?

I have monthly data of 6 variables from 2014 until 2018 in one dataset.
I'm trying to draw 6 subplots (one for each variable) with monthly X axis (Jan, Feb....) and 5 series (one for each year) with their legend.
This is part of the data:
I created 5 series (one for each year) per variable (30 in total) and I'm getting the expected output but using MANY lines of code.
What is the best way to achieve this using fewer lines of code?
This is an example of how I created the series:
CL2014 = data_total['Charity Lottery'].where(data_total['Date'].dt.year == 2014)[0:12]
CL2015 = data_total['Charity Lottery'].where(data_total['Date'].dt.year == 2015)[12:24]
This is an example of how I'm plotting the series:
axCL.plot(xvals, CL2014)
axCL.plot(xvals, CL2015)
axCL.plot(xvals, CL2016)
axCL.plot(xvals, CL2017)
axCL.plot(xvals, CL2018)
There's no need to litter your namespace with 30 variables. Seaborn makes the job very easy but you need to normalize your dataframe first. This is what "normalized" or "unpivoted" looks like (Seaborn calls this "long form"):
Date        variable         value
2014-01-01  Charity Lottery  ...
2014-01-01  Racecourse       ...
2014-04-01  Bingo Halls      ...
2014-04-01  Casino           ...
Your screenshot is a "pivoted" or "wide form" dataframe.
import seaborn as sns

# Unpivot to long form, then derive the year and month of each row
df_plot = pd.melt(df, id_vars='Date')
df_plot['Year'] = df_plot['Date'].dt.year
df_plot['Month'] = df_plot['Date'].dt.strftime('%b')

plot = sns.catplot(data=df_plot, x='Month', y='value',
                   row='Year', col='variable', kind='bar',
                   sharex=False)
plot.savefig('figure.png', dpi=300)
Result (all numbers are randomly generated):
I would try using .groupby(); it is really powerful for paring down things like this:
for _, group in data_total.groupby([year, month])[[x_variable, y_variable]]:
    plt.plot(group[x_variable], group[y_variable])
So here the groupby will separate your data_total DataFrame into year/month subsets, with the [[]] on the end to pare it down to the x_variable (assuming it is in your data_total DataFrame) and your y_variable, which can be any of the features you are interested in.
I would decompose your datetime column into separate year and month columns, then use those new columns inside that groupby as the [year, month]. You might be able to pass in the dt.year and dt.month like you had before... not sure, try it both ways!
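As a concrete variation on the groupby idea, the sketch below groups by year only, so that each group becomes one line across the twelve months. It is an illustration under assumptions: the values are random, and only the `data_total`, `Date`, `Charity Lottery` and `axCL` names come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data: monthly rows from 2014 to 2018.
dates = pd.date_range("2014-01-01", "2018-12-01", freq="MS")
rng = np.random.default_rng(0)
data_total = pd.DataFrame({
    "Date": dates,
    "Charity Lottery": rng.integers(100, 200, len(dates)),
})

fig, axCL = plt.subplots()
# One group per calendar year; each group is drawn as its own line.
for year, group in data_total.groupby(data_total["Date"].dt.year):
    axCL.plot(group["Date"].dt.month, group["Charity Lottery"], label=str(year))

axCL.set_xticks(range(1, 13))
axCL.set_xticklabels(["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                      "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"])
axCL.legend(title="Year")
fig.savefig("charity_lottery_by_year.png")
```

Repeating the loop body for each of the six columns, one subplot per column, would avoid creating the 30 separate variables.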

How to avoid initial data changing when plotting additional data in plot

I want to plot two data series in one plot, but when I plot both, one of the series changes: Matplotlib draws lines between the wrong data points.
firsty_values and secondy_values are sorted lists of timestamps spanning one 24-hour interval.
firstx_values and secondx_values are values in the range 18-21.
The first plot shows the two series together while the last plot shows one of the series alone.
#Firsty_values and secondy_values looks like this:
#['2019-05-04 00:00:03',
# '2019-05-04 00:02:03',
# ...
# '2019-05-04 23:56:03',
# '2019-05-04 23:58:02']
#Firstx_values and secondx_values looks like this:
#[18.32,18.34 ..... 19.32,19.31]
plt.plot(firsty_values,firstx_values,'b')
plt.plot(secondy_values, secondx_values, 'g')
plt.ylabel('Temperature [C]')
plt.xlabel('Time')
plt.legend(['SA1_563_04_RT601A', 'SA1_563_04_RT601B'])
plt.xticks([100,604,1053]) #length more than 1053
plt.show()
#plt.plot(firsty_values,firstx_values,'b')
plt.plot(secondy_values, secondx_values, 'g')
plt.ylabel('Temperature [C]')
plt.xlabel('Time')
plt.legend(['SA1_563_04_RT601A', 'SA1_563_04_RT601B'])
plt.xticks([100,604,1053]) #length less than 1053
plt.show()
Output:
[Figure: output with both data series]
[Figure: output with one data series]
The first plot draws lines between data points that are not adjacent. The problem seems to be that some of the data points from the second series are placed out of order, after the points from the first series. This is reflected by the xticks showing three labels when plotting both series and two labels when plotting one.
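A likely cause is that the timestamps are plain strings. Matplotlib treats string x-values as categories and assigns each new string the next position in order of first appearance, so timestamps that occur only in the second series end up after all of the first series' points. The sketch below, with made-up sample values, parses the strings into datetime objects before plotting so that both series share a real time axis.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Made-up samples in the same format as the question's lists.
firsty_values = ["2019-05-04 00:00:03", "2019-05-04 00:02:03", "2019-05-04 23:58:02"]
secondy_values = ["2019-05-04 00:01:03", "2019-05-04 00:03:03", "2019-05-04 23:57:02"]
firstx_values = [18.32, 18.34, 19.31]
secondx_values = [18.40, 18.41, 19.20]


def parse(stamps):
    """Convert 'YYYY-MM-DD HH:MM:SS' strings into datetime objects."""
    return [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps]


fig, ax = plt.subplots()
ax.plot(parse(firsty_values), firstx_values, "b", label="SA1_563_04_RT601A")
ax.plot(parse(secondy_values), secondx_values, "g", label="SA1_563_04_RT601B")
ax.set_ylabel("Temperature [C]")
ax.set_xlabel("Time")
ax.legend()
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
fig.savefig("temperatures.png")
```

With datetime x-values, each point is positioned by its actual time, regardless of which series it belongs to.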

matplotlib candlestick chart bar output error - seems to be plotting more than one timeframe on single bar

I am attempting to plot a candlestick chart using matplotlib, with hourly candles. However my output looks strange and it seems to be plotting multiple "hours" on one candle.
My code is as follows:
cursor = conx.cursor()
query= 'SELECT ticker,date,time,open,low,high,close FROM eurusd WHERE date > "2014-01-28"'
cursor.execute(query)
for line in cursor:
    #appendLine in correct format for candlesticks - date,open,close,high,low
    date=date2num(line[1])
    open=(line[3])
    high=(line[5])
    low=(line[4])
    close=(line[6])
    appendLine = date,open,close,high,low
    candleAr.append(appendLine)
fig = plt.figure()
ax1 = plt.subplot(1,1,1)
candlestick(ax1, candleAr, width=0.6, colorup='g', colordown='r')
ax1.grid(True)
plt.xlabel('Date')
plt.ylabel('Price')
plt.show()
And my output looks like the following:
Do I have to manipulate the "date2num" function to account for the fact that my data is hourly and not daily?
Managed to answer my own question. date2num was returning repeated values because it was given only the date, so all of a day's hourly bars were crammed into one candle. I had to add my date and time together to get a datetime, and then call date2num on that datetime (rather than on the date):
date=[]
open=[]
low=[]
high=[]
close=[]
candleAr=[]
for line in cursor:
    # Combine the date with midnight, then add the row's time offset
    time1=datetime.time(0,0)
    time=datetime.datetime.combine(line[1],time1)
    time=time+line[2]
    appendLine = date2num(time),line[3],line[6],line[5],line[4]
    candleAr.append(appendLine)
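The effect of that fix can be checked with the standard library alone. The sketch below assumes, as the `time+line[2]` step suggests, that the time column arrives as a `timedelta`. The rows are made up for illustration.

```python
from datetime import date, datetime, time, timedelta

# Made-up rows: (date, time-of-day offset) for three hourly bars on one day.
rows = [(date(2014, 1, 29), timedelta(hours=h)) for h in range(3)]

# Using only the date gives every hourly bar the same x position.
dates_only = [d for d, _ in rows]

# Combining the date with midnight and adding the offset gives distinct timestamps.
stamps = [datetime.combine(d, time(0, 0)) + offset for d, offset in rows]

print(len(set(dates_only)))  # number of distinct x positions when using dates
print(len(set(stamps)))      # number of distinct x positions when using datetimes
```

Passing the combined datetimes to date2num then yields one distinct x value per hourly candle.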

horizontally centered xlabels for pandas timeseries plotting

When plotting a Series with a PeriodIndex, pandas always locates the xlabels on the beginning of a Period:
DATA = pd.Series(np.random.randn(120), index=pd.period_range("2003-01", "2012-12", freq="M"))
DATA.plot(ax=plt.gca())
So in this case, the annual labels (2003, ... 2012) are located at the first of January of each year. How can I have the annual labels centered horizontally, while keeping the xticks in their places?
So in my example, I want the major_xticks located on each Jan 1st, but the label "2012" be centered between 2012-01-01 and 2013-01-01.
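One possible approach, sketched here under assumptions, is to leave the major ticks on each January 1st but blank their labels, and then put the year labels on invisible minor ticks in mid-year. The sketch converts the PeriodIndex to timestamps and plots with matplotlib's `Axes.plot` rather than `Series.plot`, so it does not go through pandas' own period tick formatting. The July 1st position is an approximation of the middle of the year.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np
import pandas as pd

DATA = pd.Series(np.random.randn(120),
                 index=pd.period_range("2003-01", "2012-12", freq="M"))

# Convert periods to timestamps (start of each month) for a plain date axis.
ts = DATA.to_timestamp()

fig, ax = plt.subplots()
ax.plot(ts.index, ts.values)

# Major ticks stay on January 1st of each year, but carry no label.
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mticker.NullFormatter())

# Minor ticks sit on July 1st and carry the year label.
ax.xaxis.set_minor_locator(mdates.MonthLocator(bymonth=7))
ax.xaxis.set_minor_formatter(mdates.DateFormatter("%Y"))

# Hide the minor tick marks so only their labels remain visible.
ax.tick_params(axis="x", which="minor", length=0)
fig.savefig("centered_year_labels.png")
```

The tick marks stay at the year boundaries while each year label appears roughly halfway between two of them.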