Calculate the min, max and mean windspeeds and standard deviations - pandas

Calculate the min, max and mean windspeeds and the standard deviations of the windspeeds across all locations for each week (assume that the first week starts on January 2, 1961) for the first 52 weeks.
Get the data:
https://github.com/prataplyf/Wind-DateTime/blob/master/wind_data.csv
I do not understand how to solve this.
The expected output is the weekly average for each location:
            RTP   VAL   ...   BEL   MAL
1961-1-1
1961-1-8
1961-1-15

Load the data:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('wind_data.csv')
Convert date to datetime and set as the index
df.date = pd.to_datetime(df.date)
df.set_index('date', drop=True, inplace=True)
Create a DataFrame for 1961
df_1961 = df[df.index < pd.to_datetime('1962-01-01')]
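Resample for statistical calculations (keep each result so it can be plotted later)
df_1961_mean = df_1961.resample('W').mean()
df_1961_min = df_1961.resample('W').min()
df_1961_max = df_1961.resample('W').max()
df_1961_std = df_1961.resample('W').std()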
Plot the data for 1961:
fig, axes = plt.subplots(12, 1, figsize=(15, 60), sharex=True)
for name, ax in zip(df_1961.columns, axes):
    ax.plot(df_1961[name], label='Daily')
    ax.plot(df_1961_mean[name], label='Weekly Mean Resample')
    ax.plot(df_1961_min[name], label='Weekly Min')
    ax.plot(df_1961_max[name], label='Weekly Max')
    ax.set_title(name)
    ax.legend()
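The answer above resamples the whole of 1961, so the first weekly bin holds only January 1. A minimal sketch of one way to honour the "first week starts on January 2" and "first 52 weeks" conditions follows. It assumes a daily DatetimeIndex, uses random numbers as a stand-in for wind_data.csv, and borrows the column names RTP, VAL, BEL and MAL from the question as placeholders. It also reads the task as asking for statistics per location; that reading is an assumption.

```python
import numpy as np
import pandas as pd

# Stand-in for wind_data.csv: random daily values, NOT the real measurements.
index = pd.date_range("1961-01-01", "1962-12-31", freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.uniform(0, 30, size=(len(index), 4)),
    index=index,
    columns=["RTP", "VAL", "BEL", "MAL"],  # placeholder location names
)

# January 2, 1961 is a Monday, and 'W' bins end on Sunday, so dropping
# January 1 makes every bin a full Monday-to-Sunday week.
weekly = (
    df.loc["1961-01-02":]
      .resample("W")
      .agg(["min", "max", "mean", "std"])
      .iloc[:52]  # keep only the first 52 weeks
)

print(weekly.head())
```

The result has one row per week, labelled by the Sunday that closes it, and a two-level column index of (location, statistic), so `weekly[("RTP", "mean")]` selects a single series.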

Related

How can we limit the `ax.axhline` using time when the x-axis is indexed by `DateTimeIndex`?

Suppose we have the following dataframe, which has a local maximum after 2 pm. How can we draw a horizontal line segment that extends only 15 minutes to either side of that point?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(20, 4))
index = pd.date_range('7-20-2022', '7-21-2022', freq='min')
np.random.seed(0)
df = pd.DataFrame(np.cumsum(np.random.randn(len(index))), index=index)
df_limited = df.loc['7-20-2022'].between_time('14:00', '15:00')
idx = df_limited.idxmax()
max = df_limited.loc[idx]
df.plot(ax=ax)
ax.axhline(max.values, color='r', xmin=0.57, xmax=0.6)
plt.show()
Currently, the local maximizer is at 2022-07-20 14:06:00 and the max is -46.80. I want to plot a horizontal line from 13:51 to 14:21, with 14:06 at the center. My solution is hard-coded: I do not know how to get xmin or xmax if I change 15 minutes to 20 minutes, or how to get the associated xmin and xmax if the position of the max changes.
Question
Write a function that takes idx and a time delta dt (e.g., 15 minutes) and returns an xmin and xmax suited to an x-axis that is indexed by a DatetimeIndex.
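You can use hlines instead of axhline. axhline expects xmin and xmax as fractions of the axes width, whereas hlines takes them in data coordinates, so datetime values work directly:
offset = pd.Timedelta(minutes=15)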
ax.hlines(max.values, xmin=max.index[0]-offset, xmax=max.index[0]+offset, color='r')
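The function the question asks for can be sketched as follows. Since hlines accepts data coordinates, it only needs to return two timestamps. The name `hline_limits` and the use of a Series instead of a one-column DataFrame are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def hline_limits(idx, dt):
    """Return (xmin, xmax) timestamps centred on idx, each dt away from it."""
    idx = pd.Timestamp(idx)
    dt = pd.Timedelta(dt)
    return idx - dt, idx + dt


# Same construction as in the question, stored as a Series for simplicity.
index = pd.date_range("2022-07-20", "2022-07-21", freq="min")
np.random.seed(0)
series = pd.Series(np.cumsum(np.random.randn(len(index))), index=index)

# Locate the maximum between 14:00 and 15:00 on July 20.
window = series.loc["2022-07-20"].between_time("14:00", "15:00")
peak_time = window.idxmax()
peak_value = window.max()

xmin, xmax = hline_limits(peak_time, pd.Timedelta(minutes=15))
print(peak_time, peak_value, xmin, xmax)
```

The returned values can be passed straight to `ax.hlines(peak_value, xmin=xmin, xmax=xmax, color='r')`. Changing the half-width to 20 minutes only requires a different `dt`.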

Plotting dates by weekdays and groups

I would like to compare the values from different weeks in different groups. Something like daily sales for two team members by week to demonstrate the effect of one person being off/a holiday etc. The time of the sale within each day needs to be ordered within the day but the x axis should be labeled by day.
Example is arbitrary.
Example data and output
options(stringsAsFactors = FALSE)
library(lubridate)
library(tidyverse)
library(magrittr)
#=======================
# Week on week comparison of days by a group
#=======================
# Generate DF
Date <- data.frame(Date = rep(seq(as.Date("2020-04-01"),as.Date("2020-04-14"),by="days"),4))
Time <- data.frame(Time = c(rep("00:00:01",nrow(Date)/2),rep("00:00:02",nrow(Date)/2)))
Type <- data.frame(Type = rep(c(rep("a",nrow(Date)/4),rep("b",nrow(Date)/4)),2))
df <- cbind(Date,Time,Type)
# Add random values to plot
df %<>% mutate(values = runif(nrow(.),1,10))
# Create groups for weeks, orders for days and labels as weekdays (char strings).
df %<>% mutate(weekLevel = week(Date),
               dayLevel = wday(Date),
               Day = as.character(weekdays(Date)),
               orderVar = paste0(dayLevel, Time))
ggplot(df %>% arrange(orderVar), aes(x = orderVar, y = values, group = interaction(Type, weekLevel), colour = Type)) +
  geom_line() +
  scale_x_discrete(breaks = df$orderVar, labels = df$Day) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
This works but the day is repeated because the breaks are set to a more granular level than the labels. It also feels a bit hacky.
Any and all feedback is appreciated :)

How to plot only business hours and weekdays in pandas

I have hourly stock data.
I need a) to format it so that matplotlib ignores weekends and non-business hours and b) an hourly frequency.
The problem:
Currently, the graph looks crammed and I suspect it is because matplotlib is taking into account 24 hours instead of 8, and 7 days a week instead of business days.
How do I tell pandas to only take into account business hours, Monday to Friday?
How I am graphing the data:
I am looping through a list of price data dataframes, graphing each data frame:
mm = 0
for ii in df:
    Ddate = ii['Date']
    Pprice = ii['Price']
    d = Ddate.to_list()
    p = Pprice.to_list()
    dates = make_dt(d)
    prices = unstring(p)
    plt.figure()
    plt.plot(dates, prices)
    plt.title(stocks[mm])
    plt.grid(True)
    plt.xlabel('Dates')
    plt.ylabel('Prices')
    mm += 1
To flag business days, you can check whether pd.bdate_range returns a non-empty range for each date. bdate_range takes a single start and end date, not a whole column, so apply it row by row:
# Adds a new boolean column IsBDay to the DF.
df['IsBDay'] = df['date'].apply(lambda x: len(pd.bdate_range(x, x)) > 0)
Now create a new DF that keeps only the rows where IsBDay is True, along with the other columns.
df = df[df['IsBDay']]
Now your DF is ready for plotting.
Hope this helps.
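The answer above covers weekdays only. A minimal sketch that also drops non-business hours is shown below. It assumes the prices sit in a DataFrame with an hourly DatetimeIndex, that trading hours are 09:00 to 16:00 (an assumed window, not taken from the question), and it uses made-up prices.

```python
import numpy as np
import pandas as pd

# Two weeks of made-up hourly prices starting Wednesday, April 1, 2020.
index = pd.date_range("2020-04-01", periods=14 * 24, freq=pd.Timedelta(hours=1))
prices = pd.DataFrame({"Price": np.linspace(100.0, 110.0, len(index))}, index=index)

# Keep Monday to Friday (dayofweek 0-4), then the assumed trading hours.
weekdays = prices[prices.index.dayofweek < 5]
business = weekdays.between_time("09:00", "16:00")

# Plotting against row positions instead of timestamps removes the empty
# stretches that nights and weekends would otherwise occupy on the x-axis.
positions = np.arange(len(business))
labels = business.index.strftime("%a %H:%M")

print(business.head())
```

Plotting `business["Price"]` against `positions` and using a subset of `labels` as tick labels gives a line with no gaps for closed periods. Filtering alone is not enough, because a datetime x-axis still reserves space for the missing hours.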

Pandas df histo, format my x ticker and include empty

I got this pandas df:
index TIME
12:07 2019-06-03 12:07:28
10:04 2019-06-04 10:04:25
11:14 2019-06-09 11:14:25
...
I use this command to plot a histogram of the number of occurrences in each 15-minute period:
df['TIME'].groupby([df["TIME"].dt.hour, df["TIME"].dt.minute]).count().plot(kind="bar")
How can I get x ticks like 10:15 instead of (10, 15), and how can I add the missing x ticks (9:15, 9:30, ...) to get a complete timeline?
You can resample your TIME column to 15-minute intervals and count the number of rows in each. Then plot a regular bar chart.
import numpy as np
import pandas as pd
import matplotlib.ticker
df = pd.DataFrame({'TIME': pd.to_datetime('2019-01-01') + pd.to_timedelta(np.random.rand(100) * 3, unit='h')})
df = df[df.TIME.dt.minute > 15] # make gap
# size() counts the rows in each 15-minute bin; empty bins are kept with a count of 0
ax = df.resample('15min', on='TIME').size().plot.bar(rot=0)
ticklabels = [x.get_text()[-8:-3] for x in ax.get_xticklabels()]
ax.xaxis.set_major_formatter(matplotlib.ticker.FixedFormatter(ticklabels))
(for details about formatting datetime ticklabels of pandas bar plots see this SO question)
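The sample data in the question spans several dates, so resampling the raw timestamps would produce one bar per 15 minutes across whole days. A sketch of an alternative that counts by time of day is given below: floor each timestamp to its 15-minute slot, format it as HH:MM, and reindex against a full grid so empty slots show up as zero. The fourth timestamp and the 09:00 to 13:00 range are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame({"TIME": pd.to_datetime([
    "2019-06-03 12:07:28",
    "2019-06-04 10:04:25",
    "2019-06-09 11:14:25",
    "2019-06-10 12:10:00",  # hypothetical extra row, not from the question
])})

# Floor to the start of the 15-minute slot and keep only the time of day.
slots = df["TIME"].dt.floor("15min").dt.strftime("%H:%M")
counts = slots.value_counts()

# Full grid of slots for the assumed range, so missing slots appear as 0.
full = pd.date_range("2019-01-01 09:00", "2019-01-01 13:00", freq="15min").strftime("%H:%M")
counts = counts.reindex(full, fill_value=0)

print(counts)
```

Calling `counts.plot.bar(rot=90)` on the result then gives ticks such as 10:15 directly, because the index already holds the formatted labels.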

plot score against timestamp in pandas

I have a dataframe in pandas:
date_hour score
2019041822 -5
2019041823 0
2019041900 6
2019041901 -5
where date_hour is in YYYYMMDDHH format, and score is an int.
When I plot, there is a long line connecting 2019041823 to 2019041900, treating all the values in between as absent (i.e. there is no score for 2019041824-2019041899, because no time corresponds to those values).
Is there a way for these gaps/absent values to be ignored, so that the line is continuous? (Some of my data misses 2 days, so I get a long line, which is misleading.)
The red circles show the gap between nights (ie. between Apr 18 2300 and Apr 19 0000).
I used:
fig, ax = plt.subplots()
x=gpb['date_hour']
y=gpb['score']
ax.plot(x,y, '.-')
display(fig)
I believe it is because date_hour is an int. I tried converting it to str, but was met with an error: ValueError: x and y must have same first dimension
Is there a way to plot so there are no gaps?
Try converting date_hour to a timestamp before plotting: df.date_hour = pd.to_datetime(df.date_hour, format='%Y%m%d%H')
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'date_hour': [2019041822, 2019041823, 2019041900, 2019041901],
                   'score': [-5, 0, 6, -5]})
df.date_hour = pd.to_datetime(df.date_hour, format='%Y%m%d%H')
df.plot(x='date_hour', y='score')
plt.show()
If you don't want to change your data, you can do
df = pd.DataFrame({'date_hour': [2019041822, 2019041823, 2019041900, 2019041901],
                   'score': [-5, 0, 6, -5]})
plt.plot(pd.to_datetime(df.date_hour, format='%Y%m%d%H'), df.score)
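To see why the conversion removes the gap, it can help to compare the spacing between consecutive x values before and after parsing. The short sketch below uses the four rows from the question and casts to str before parsing, which is a defensive choice and not something the answer requires.

```python
import pandas as pd

df = pd.DataFrame({"date_hour": [2019041822, 2019041823, 2019041900, 2019041901],
                   "score": [-5, 0, 6, -5]})

# As plain integers, the step across midnight is 77 units, not 1.
int_steps = df["date_hour"].diff().dropna().tolist()

# Parsed as timestamps, every step is exactly one hour.
timestamps = pd.to_datetime(df["date_hour"].astype(str), format="%Y%m%d%H")
time_steps = timestamps.diff().dropna().tolist()

print(int_steps)
print(time_steps)
```

On an integer axis the jump from 2019041823 to 2019041900 is drawn 77 times wider than a normal hour, which is the long connecting line described in the question. Genuinely missing hours or days will still appear as gaps on a datetime axis, since they are real gaps in time.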