How to average data for a variable over a number of timesteps - dataframe

I was wondering if anyone could shed some light on how I can average this data:
I have a .nc file with data (dimensions: 2029,64,32) which relates to time, latitude and longitude. Using these commands I can plot individual timesteps:
timestep = data.variables['precip'][0]
plt.imshow(timestep)
plt.colorbar()
plt.show()
Giving a graph in this format for the 0th timestep:
I was wondering if there was any way to average this first dimension (the snapshots in time).

If you are looking to take a mean over all times, try using np.mean where you use the axis keyword to say which axis you want to average.
time_averaged = np.mean(data.variables['precip'], axis = 0)
If you have NaN values then np.mean will give NaN for that lon/lat point. If you'd rather ignore them then use np.nanmean.
If you want to do specific times only, e.g. the first 1000 time steps, then you could do
time_averaged = np.mean(data.variables['precip'][:1000,:,:], axis = 0)
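To make the difference between the two functions concrete, here is a minimal sketch on a small synthetic array. The array `precip` and its (5, 4, 3) shape are made up for illustration and stand in for `data.variables['precip']`; only `np.mean` and `np.nanmean` come from the answer above.

```python
import numpy as np

# Synthetic stand-in for the (time, lat, lon) variable; shape and values are assumptions
rng = np.random.default_rng(0)
precip = rng.random((5, 4, 3))
precip[2, 1, 1] = np.nan  # one missing value at a single time step

# Averaging over axis 0 collapses the time dimension, leaving a (lat, lon) map
mean_all = np.mean(precip, axis=0)               # NaN propagates to grid point [1, 1]
mean_ignoring_nan = np.nanmean(precip, axis=0)   # NaN is skipped at that grid point
mean_first3 = np.mean(precip[:3], axis=0)        # only the first three time steps

print(mean_all.shape)
print(np.isnan(mean_all).sum(), np.isnan(mean_ignoring_nan).sum())
```

The result has shape (lat, lon), so it can go straight into plt.imshow just like a single timestep.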

If you're using pandas and numpy, a rolling (moving) average may help. See the pandas documentation on rolling for more details.
import pandas as pd
import numpy as np
data = np.array([10,5,8,9,15,22,26,11,15,16,18,7])
d = pd.Series(data)
print(d.rolling(4).mean())
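If a moving average is what you want, the same idea can probably be applied along the time axis of a 3D array without going through pandas. The sketch below is only an illustration: `precip` is a synthetic (time, lat, lon) array and the window of 4 is arbitrary. It uses `numpy.lib.stride_tricks.sliding_window_view`, which needs NumPy 1.20 or newer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Synthetic (time, lat, lon) data; shape and values are assumptions
rng = np.random.default_rng(1)
precip = rng.random((10, 4, 3))

window = 4  # arbitrary number of time steps per window

# Sliding along axis 0 appends the window as a new last axis: shape (7, 4, 3, 4)
windows = sliding_window_view(precip, window_shape=window, axis=0)

# Averaging over the window axis gives one (lat, lon) map per window position
rolling_mean = windows.mean(axis=-1)

print(rolling_mean.shape)
```

Unlike `Series.rolling(4).mean()`, this returns only the complete windows, so there are no leading NaN entries and the time axis is shorter by `window - 1`.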

Related

Enforcing Incoming X-Axis Data to map with Static X-Axis - Plotly

I am trying to plot a multi-axis line graph in Plotly; my data is percentage (y-axis) versus date (x-axis).
The x and y values come from a database via pandas.
Since Plotly doesn't understand the order of string dates on the x-axis, it arranged them automatically.
I am looking for a way to keep the x-axis static, with the dates in order, and have each trace plotted on top of it by matching on date.
static_x_axis = ['02-11-2021', '03-11-2021', '04-11-2021', '05-11-2021', '06-11-2021', '07-11-2021', '08-11-2021', '09-11-2021', '10-11-2021', '11-11-2021', '12-11-2021', '13-11-2021', '14-11-2021', '15-11-2021', '16-11-2021', '17-11-2021', '18-11-2021', '19-11-2021', '20-11-2021', '21-11-2021', '22-11-2021', '23-11-2021']
and the above list determines the x-axis mapping.
I tried using range, but it seems that it either does not support static mapping or maps all traces from the 0th point.
Overall, I am looking for a way that either follows a static date range or does not break the current order of dates, as happened in the graph above.
Thanks in advance for your help.
From your question, your data is:
x: a date as a string representation (i.e. categorical)
y: a number between 0 and 1 (a percentage)
three traces
You describe x as unordered at the source and require it to be sorted on the x-axis.
The code below simulates a figure in this way,
then applies categorical axis sorting.
import pandas as pd
import numpy as np
import plotly.graph_objects as go
s = pd.Series(pd.date_range("2-nov-2021", periods=40).strftime("%d-%m-%Y"))
fig = go.Figure(
    [
        go.Scatter(
            x=s.sample(10).sort_index().values,
            y=np.linspace(n/4, n/3, 10),
            mode="lines+markers+text",
        )
        for n in range(1,4)
    ]
).update_traces(texttemplate="%{y:.2f}", textposition="top center")
fig.show()
fig.update_layout(xaxis={"categoryorder": "array", "categoryarray": s.values})
fig.show()
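The categoryarray approach relies on having the date strings in chronological order. If you only have the unordered strings from the database, one way to build that list is to parse and sort them with pandas. This is a sketch under the assumption that all dates use the day-month-year format from the question; the `labels` list is made up.

```python
import pandas as pd

# Date strings in day-month-year form, deliberately out of order (made-up sample)
labels = ["10-11-2021", "02-11-2021", "23-11-2021", "03-11-2021"]

# Parse with an explicit format so "02-11-2021" is read as 2 November, not 11 February
parsed = pd.to_datetime(pd.Series(labels), format="%d-%m-%Y")

# Sort chronologically, then format back to the original string representation
ordered = parsed.sort_values().dt.strftime("%d-%m-%Y").tolist()

print(ordered)
```

The `ordered` list could then be passed as categoryarray. Alternatively, passing the parsed datetimes themselves as x should make Plotly treat the axis as a date axis, which orders the points without any categorical settings.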

How to show min and max values at the end of the axes

I generate plots like below:
from pylab import *
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker
import matplotlib.ticker as ticker
rcParams['axes.linewidth'] = 2 # set the value globally
rcParams['font.size'] = 16# set the value globally
rcParams['font.family'] = ['DejaVu Sans']
rcParams['mathtext.fontset'] = 'stix'
rcParams['legend.fontsize'] = 24
rcParams['axes.prop_cycle'] = cycler(color=['grey','b','g','r','orange'])
rc('lines', linewidth=2, linestyle='-',marker='o')
rcParams['axes.xmargin'] = 0
rcParams['axes.ymargin'] = 0
t = arange(0,21,1)
v = 2.0
s = v*t
plt.figure(figsize=(12, 4))
plt.plot(t,s,label=r'$s=%1.1f\cdot t$'%v)  # raw string so \c is not treated as an escape
plt.title(r'Wykres drogi w czasie $s=v\cdot t$')
plt.xlabel('Czas $t$, s')
plt.ylabel('Droga $s$, m')
plt.autoscale(enable=True, axis='both', tight=None)
legend(loc='best')
plt.xlim(min(t),max(t))
plt.ylim(min(s),max(s))
plt.grid()
plt.show()
When I change t = arange(0,21,1) to, for example, t = arange(0,20,1), the maximum x value becomes 19.0 and it disappears from the x axis. The same happens, of course, on the y axis.
My question is how to force matplotlib to always produce plots where the max values appear right at the ends of the axes, as I always need for my purposes, or whether this can be chosen as an option.
Image from a program I wrote in Fortran some years ago
Matplotlib is more efficient, which is why I use it, but there should be an option like that (the picture above).
As it is, I have to check max and min in a text window or take additional steps to be sure of the max and min values. I would like to read them from the axes, so the question is: is there such a possibility in matplotlib? If not, I will close the post.
Axes I am thinking about more or less
I see two ways to solve the problem.
Set the axes automatic limit mode to round numbers
In the rcParams you can do this with
rcParams['axes.autolimit_mode'] = 'round_numbers'
Then remove the manual axes limits, i.e. delete these two lines:
plt.xlim(min(t),max(t))
plt.ylim(min(s),max(s))
This will produce the image below. Still, the extreme values of the axes are shown at the nearest "round numbers", but the user can approximately catch the data range limits. If you need the exact value to be displayed, you can see the second solution which cannot be directly used from the rcParams.
or – Manually generate axes ticks
This solution means explicitly asking for a given number of ticks. I guess there is a way to automate it depending on the axes size etc. But if the graph size is more or less the same every time, you can decide on a fixed number of ticks manually. This can be done with
plt.xlim(min(t),max(t))
plt.ylim(min(s),max(s))
plt.xticks(np.linspace(t.min(), t.max(), 7)) # arbitrarily chosen
plt.yticks(np.linspace(s.min(), s.max(), 5)) # arbitrarily chosen
This generates the image below, quite similar to your example image.
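With np.linspace the intermediate ticks can land on awkward values (for t up to 19 and 7 ticks, the spacing is 19/6). A possible variation, sketched below, keeps round intermediate ticks and adds the exact data min and max at the ends. The helper name `ticks_with_ends` and the step values are my own, not part of matplotlib.

```python
import numpy as np

def ticks_with_ends(values, step):
    """Round ticks every `step`, plus the exact min and max of `values` (hypothetical helper)."""
    lo, hi = float(np.min(values)), float(np.max(values))
    # Round ticks strictly below the maximum, starting at the first multiple of step >= lo
    inner = np.arange(np.ceil(lo / step) * step, hi, step)
    # np.unique sorts and drops a duplicate if lo itself is already a round tick
    return np.unique(np.concatenate(([lo], inner, [hi])))

t = np.arange(0, 20, 1)
s = 2.0 * t

x_ticks = ticks_with_ends(t, 5)
y_ticks = ticks_with_ends(s, 10)
print(x_ticks, y_ticks)
```

The arrays would be passed as plt.xticks(x_ticks) and plt.yticks(y_ticks) after setting the limits. Note that the last round tick can end up close to the end tick, so the labels may overlap for some ranges.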

Changed frequency of ticks in Pandas '.bar' plot, but messed up the actual bars

how's your self-isolation going on?
Mine rocks, as I'm drilling through visualization in Python. Recently, however, I've run into an issue.
I figured that .plot.bar() in Pandas has an uncommon formatting of x-axis (which kinda confirms that I read before I ask). I had price data with monthly frequency, so I applied a fix to display only yearly ticks in a bar chart:
fig, ax = plt.subplots()
ax.bar(btc_returns.index, btc_returns)
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
Where btc_returns is a Series object with datetime in index.
The output I got was weird. Here are the screenshots of what I expected vs the end result.
I tried to find a solution to this, but no luck. Can you guys please give me a hand? Thanks! Criticism is welcome as always :)
And my solution is like this:
fig, ax = plt.subplots(figsize=(15,7))
ax.bar(btc_returns.index, btc_returns.returns.values, width = 1)
Where btc_returns is a DataFrame with the returns of BTC. I figured that .values makes the bar plot read the datetime input correctly. For the 'missing' bars - their resolution was just way too small, so I set the width to '1'.
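For context on the width: on a matplotlib date axis, one unit corresponds to one day, so the default bar width of 0.8 is less than a day wide. The sketch below uses synthetic monthly data in place of btc_returns (the series, its values and the width of 20 are assumptions for illustration) and forces the non-interactive Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen plots
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic monthly "returns" standing in for btc_returns (made-up values)
idx = pd.date_range("2015-01-01", periods=48, freq="MS")
returns = pd.Series(np.sin(np.arange(48) / 3.0) * 10, index=idx)

fig, ax = plt.subplots(figsize=(15, 7))
# Bar width is in days on a date axis; about 20 days suits monthly spacing
bars = ax.bar(returns.index, returns.values, width=20)
# One major tick per year, labelled with the year only
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
fig.canvas.draw()
```

For monthly data a width of 1 would still be fairly thin, so a value in the 20 to 25 range is probably closer to what a pandas bar chart looks like.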
Using the stock value data from Yahoo Finance: Bitcoin USD
Technically, you can do pd.to_datetime(btc.Date).dt.date at the beginning, but resample won't work, which is why btc_monthly.index.date is done as a second step.
resample can happen over different periods (e.g. 2M = every two months)
Load and transform the data
import pandas as pd
import matplotlib.pyplot as plt
# load data
btc = pd.read_csv('data/BTC-USD.csv')
# Date to datetime
btc.Date = pd.to_datetime(btc.Date)
# calculate daily return %
btc['return'] = ((btc.Close - btc.Close.shift(1))/btc.Close.shift(1))*100
# resample to monthly and aggregate by sum
btc_monthly = btc.resample('M', on='Date').sum()
# set the index to be date only (no time)
btc_monthly.index = btc_monthly.index.date
Plot
btc_monthly.plot(y='return', kind='bar', figsize=(15, 8))
plt.show()
Plot Bimonthly
btc_monthly = btc.resample('2M', on='Date').sum() # instead of 'M'
btc_monthly.index = btc_monthly.index.date
btc_monthly.plot(y='return', kind='bar', figsize=(15, 8), legend=False)
plt.title('Bitcoin USD: Bimonthly % Return')
plt.ylabel('% return')
plt.xlabel('Date')
plt.show()
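Since the CSV is not included here, the transformation can be tried on synthetic data. The sketch below is not the exact code above: it uses `pct_change`, which computes the same (Close - previous Close) / previous Close formula, and it groups by calendar month with `dt.to_period("M")` as a swapped-in alternative to resample('M', on='Date'). The prices are made up.

```python
import numpy as np
import pandas as pd

# Made-up daily closing prices for three months, standing in for BTC-USD.csv
dates = pd.date_range("2020-01-01", "2020-03-31", freq="D")
btc = pd.DataFrame({"Date": dates, "Close": 100.0 + np.arange(len(dates))})

# Daily return in percent; the first row is NaN because it has no previous close
btc["return"] = btc["Close"].pct_change() * 100

# Sum the daily returns within each calendar month (NaN is skipped by sum)
monthly = btc.groupby(btc["Date"].dt.to_period("M"))["return"].sum()

print(monthly)
```

Keep in mind that summing daily percentage returns only approximates the true monthly return; compounding the daily returns would give the exact figure.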

Graph csv data that is represented horizontally rather than vertical - Python Pandas CSV

Context: I have combined numerous CSV's into one representing use case vs usage over a period of time.
The way the data is represented currently is attached.
What I am trying to do is, for each usecase, graph across row A(1, 1.1, 1.9, 4.0.11435, 4.1.11436 and so on...) - creating a linear plot to show progression over time
What I have so far:
import pandas as pd
import matplotlib.pyplot as plt
plot_df = pd.read_csv("results.csv")
milestones = plot_df.columns[1:]
row = plot_df.iloc[0]
row.plot(kind='line')
plt.show()
Any help is appreciated.
Thank you
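One possible approach, sketched here without having seen the actual file: use the use-case column as the index and transpose, so the milestones run down the rows and each use case becomes a column. The CSV text, the column name `usecase` and the values below are all made up for illustration.

```python
import io
import pandas as pd

# Made-up stand-in for results.csv: first column is the use case, the rest are milestones
csv_text = """usecase,1,1.1,1.9
login,10,12,15
search,3,8,9
"""
plot_df = pd.read_csv(io.StringIO(csv_text))

# Index by the first column, then transpose: rows become milestones, columns become use cases
by_milestone = plot_df.set_index(plot_df.columns[0]).T

print(by_milestone)
```

Calling by_milestone.plot(kind='line') followed by plt.show() should then draw one line per use case, with the milestones along the x-axis in column order.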

Seaborn time series plotting: a different problem for each function

I'm trying to use seaborn dataframe functionality (e.g. passing column names to x, y and hue plot parameters) for my timeseries (in pandas datetime format) plots.
x should come from a timeseries column(converted from a pd.Series of strings with pd.to_datetime)
y should come from a float column
hue comes from a categorical column that I calculated.
There are multiple streams in the same series that I am trying to separate (and use the hue for separating them visually), and therefore they should not be connected by a line (like in a scatterplot)
I have tried the following plot types, each with a different problem:
sns.scatterplot: gets the plotting right and the labels right but has problems with the x limits, and I could not set them right with plt.xlim() using data.Dates.min and data.Dates.max
sns.lineplot: gets the limits and the labels right, but I could not find a setting to disable the lines between the individual datapoints like in matplotlib. I tried setting the markers and the dashes parameters to no avail.
sns.stripplot: my last try; it plotted the datapoints correctly and got the x limits right but messed up the tick labels
Example input data for easy reproduction:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
dates = pd.to_datetime(('2017-11-15',
                        '2017-11-29',
                        '2017-12-15',
                        '2017-12-28',
                        '2018-01-15',
                        '2018-01-30',
                        '2018-02-15',
                        '2018-02-27',
                        '2018-03-15',
                        '2018-03-27',
                        '2018-04-13',
                        '2018-04-27',
                        '2018-05-15',
                        '2018-05-28',
                        '2018-06-15',
                        '2018-06-28',
                        '2018-07-13',
                        '2018-07-27'))
values = np.random.randn(len(dates))
clusters = np.random.randint(1, size=len(dates))
D = {'Dates': dates, 'Values': values, 'Clusters': clusters}
data = pd.DataFrame(D)
To each of the functions I am passing the same arguments:
sns.OneOfThePlottingFunctions(x='Dates',
y='Values',
hue='Clusters',
data=data)
plt.show()
So to recap, what I want is a plot that uses seaborn's pandas functionality, and plots points(not lines) with correct x limits and readable x labels :)
Any help would be greatly appreciated.
ax = sns.scatterplot(x='Dates', y='Values', hue='Clusters', data=data)
ax.set_xlim(data['Dates'].min(), data['Dates'].max())
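Note that min and max have to be called (with parentheses); data.Dates.min on its own is just the method object. If the first and last points end up sitting on the edge of the plot, a small margin can be added to the limits. This is a sketch with a few made-up dates; the 7-day padding is an arbitrary choice.

```python
import pandas as pd

# A few sample dates standing in for the 'Dates' column (made-up subset)
dates = pd.to_datetime(["2017-11-15", "2018-03-15", "2018-07-27"])
data = pd.DataFrame({"Dates": dates})

pad = pd.Timedelta(days=7)  # arbitrary margin on each side

# Call .min() and .max() to get Timestamps, then widen the range by the margin
x_min = data["Dates"].min() - pad
x_max = data["Dates"].max() + pad

print(x_min, x_max)
```

These values could then be passed as ax.set_xlim(x_min, x_max) on the axes returned by sns.scatterplot.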