Dataframe to Plotly graph with slider without reloading dataframe - pandas

I am plotting a 2D scatter plot from a dataframe in a Dash app.
The X axis is my index (price points).
Then I have a number of columns holding the Y values for different years (columns 2020, 2021, etc.).
I'd like to add a slider so that only the Y values for one selected year are shown.
The catch is that I don't want to reload the dataframe each time. It takes a long time to generate, and I can't store it in a global variable because the app will be used by several people at the same time.
Is there a way to keep all of that in memory, or otherwise avoid reloading the dataframe?
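One way to avoid regenerating the dataframe when the year changes is to put every year into the figure once and let Plotly's built-in slider toggle trace visibility in the browser. Moving the slider then never calls back to the server. The following is a minimal sketch, assuming the layout described above (price index, one column per year). The function name build_year_slider_figure is hypothetical. The figure is built as a plain dict, which dcc.Graph(figure=...) accepts.

```python
import pandas as pd


def build_year_slider_figure(df: pd.DataFrame) -> dict:
    """Build a Plotly figure (as a plain dict) with one scatter trace per
    year column and a slider that switches which trace is visible.

    All traces are sent to the browser once; moving the slider only changes
    trace visibility client-side, so no Dash callback runs and the dataframe
    is not regenerated.
    """
    columns = list(df.columns)
    x = df.index.tolist()  # price points on the x axis
    traces = [
        {
            "type": "scatter",
            "mode": "markers",
            "x": x,
            "y": df[col].tolist(),
            "name": str(col),
            "visible": i == 0,  # only the first year is shown initially
        }
        for i, col in enumerate(columns)
    ]
    steps = [
        {
            "label": str(col),
            "method": "update",
            # visibility mask: True only for the trace of the selected year
            "args": [{"visible": [j == i for j in range(len(columns))]}],
        }
        for i, col in enumerate(columns)
    ]
    return {
        "data": traces,
        "layout": {
            "xaxis": {"title": {"text": "price"}},
            "sliders": [
                {"active": 0, "currentvalue": {"prefix": "Year: "}, "steps": steps}
            ],
        },
    }
```

In Dash you could compute the dataframe once per page load and pass dcc.Graph(figure=build_year_slider_figure(df)). One way to do this is in a function assigned to app.layout, since Dash calls a layout function on every page load, so each user gets their own copy rather than a shared global. Because all years are sent to the browser up front, this trades a larger initial payload for a slider that needs no callback.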

Related

why is ggplot2 geom_col misreading discrete x axis labels as continuous?

Aim: plot a column chart representing concentration values at discrete sites
Problem: the 14 site labels are numeric, so I think ggplot2 is assuming continuous data and adding spaces for what it sees as 'missing numbers'. I only want 14 columns with 14 marks/labels, corresponding to the 14 values in the dataframe. I've tried converting the sites to factors and to characters, but neither worked.
Also, how do you ensure the y-axis ends at '0', so the bottom of the columns meet the x-axis?
Thanks
Data:
Sites: 2,4,6,7,8,9,10,11,12,13,14,15,16,17
Concentration: 10,16,3,15,17,10,11,19,14,12,14,13,18,16
You have two questions in one with two pretty straightforward answers:
1. How to force a discrete axis when your column is a continuous one? To make ggplot2 draw a discrete axis, the data must be discrete. You can force your numeric data to be discrete by converting to a factor. So, instead of x=Sites in your plot code, use x=as.factor(Sites).
2. How to eliminate the white space below the columns in a column plot? You can control the limits of the y axis via the scale_y_continuous() function. By default, ggplot2 expands a continuous axis a little past the range of the data, which leaves a gap below the columns. Here the data range runs from 0 to the max Concentration, because the columns start at 0. You can override that behavior via the expand= argument. Check the documentation for expansion() for more details, but here I'm going to use mult=, which expands the limits by a fraction of the data range. I'm using 0 for the lower limit so the axis starts exactly at the minimum of the data (0), and 0.05 for the upper limit so the axis extends about 5% past the max value. A 5% expansion is the default for continuous scales.
Here's the code and resulting plot.
library(ggplot2)
df <- data.frame(
  Sites = c(2,4,6,7,8,9,10,11,12,13,14,15,16,17),
  Concentration = c(10,16,3,15,17,10,11,19,14,12,14,13,18,16)
)
ggplot(df, aes(x=as.factor(Sites), y=Concentration)) +
  geom_col(color="black", fill="lightblue") +
  scale_y_continuous(expand=expansion(mult=c(0, 0.05))) +
  theme_bw()

How to plot timeseries with many NaNs?

Originally I had a dataframe containing power consumption of some devices like this:
and I wanted to plot power consumption against time for the different devices, with one plot for each of the 6 possible dates. After grouping by date I got plots like this one (one per group, i.e. per date):
Then I tried to create a similar plot with the roles of date and device swapped, so that it is grouped by device and colored by date. To do that I prepared this dataframe:
It is similar to the previous one, but it has many NaN values because the measurement times differ between dates. I thought this wouldn't be a problem, but after grouping by device the subplots look like this one (ex is just the name of the sub-dataframe taken from the loop over the groups, i.e. devices):
This is the ex dataframe (the mean lag between observations is around 20 seconds):
Question: What should I do to make the plot grouped by device look like the ones grouped by date? (I'd like to use the ex dataframe but handle the NaNs somehow.)
I found a solution in an answer to a similar question: ex.interpolate(method='linear').plot(). Matplotlib leaves a break in the line at every NaN, so a column whose values are separated by NaNs shows up only as fragments. This line fills those gaps by linear interpolation before plotting. This is the result:
Another thing that can help is adding .plot(marker='o', ms=3). It won't fill the gaps between points, but it at least makes the points visible. Previously some points, mainly the peaks in energy consumption, were too small to see at the scale of the whole plot. This is the result:
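The gaps can also be filled by interpolation weighted by elapsed time rather than by row position. ex.interpolate(method='linear') treats the rows as equally spaced, whereas method='time' uses the actual spacing of a DatetimeIndex. Adding limit_area='inside' stops it from extending a column past its first and last real measurement. The following is a minimal sketch, assuming ex has a DatetimeIndex and one column per date. fill_gaps is a hypothetical helper name, and the demo data is made up for illustration.

```python
import numpy as np
import pandas as pd


def fill_gaps(ex: pd.DataFrame) -> pd.DataFrame:
    """Fill interior NaNs in each column by time-weighted interpolation.

    Assumes ex has a DatetimeIndex. NaNs before a column's first value or
    after its last value are left as NaN (limit_area="inside").
    """
    return ex.interpolate(method="time", limit_area="inside")


# Small demo with uneven spacing: readings at 0 s, 10 s, 40 s and 50 s.
idx = pd.to_datetime(["2021-01-01 00:00:00", "2021-01-01 00:00:10",
                      "2021-01-01 00:00:40", "2021-01-01 00:00:50"])
demo = pd.DataFrame({"day1": [0.0, np.nan, 40.0, np.nan]}, index=idx)

by_row = demo.interpolate(method="linear")  # row 1 -> 20.0 (rows treated as equally spaced)
by_time = fill_gaps(demo)                   # row 1 -> 10.0 (10 s into a 40 s gap)
```

The result can then be drawn as before, for example fill_gaps(ex).plot(marker='o', ms=3). If the index is numeric (such as seconds since midnight) rather than datetime, method='index' interpolates by the index values in the same way.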

Plot different time series data in one chart with a shared x-axis (pandas)

I want to plot 5 different data frames in one plot. They contain the same measurement, recorded at different times. The plot should share the x-axis across all measurements.
The first thing I did was calculate the time between consecutive measurement points. It varies between 5 and 10 ms, but there are occasional large gaps of 200 ms.
Then I calculated the running sum over this column and set it as the index (dtype "timedelta64[ns]").
Now I want to plot those 5 time series in one plot with a shared x-axis (time in ms).
But I don't know how, because they have almost no index values in common. The plot should have one common x-axis from 0 to 3 seconds containing the 5 measurements.
Thank you!
Two example DataFrames:
example for measuremt01
example for measuremt02
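A shared index may not be needed for this. If each measurement is plotted on the same matplotlib Axes with its elapsed time in milliseconds as the x values, each point is placed by its own x value. The following is a minimal sketch, assuming each DataFrame has the timedelta64[ns] index described above and a value column whose name the question does not give (it is a parameter here). to_ms_series and plot_on_shared_axis are hypothetical helper names.

```python
import pandas as pd


def to_ms_series(df: pd.DataFrame, value_col: str) -> pd.Series:
    """Return df[value_col] re-indexed by elapsed time in milliseconds.

    Assumes df.index is a TimedeltaIndex (running sum of the time deltas).
    Dividing by a 1 ms Timedelta gives a float index in milliseconds.
    """
    ms = df.index / pd.Timedelta(milliseconds=1)
    return pd.Series(df[value_col].to_numpy(), index=ms, name=value_col)


def plot_on_shared_axis(frames: dict, value_col: str, ax=None):
    """Draw every measurement on one Axes. Each series keeps its own x
    values, so the frames do not need a common index."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    if ax is None:
        _, ax = plt.subplots()
    for name, df in frames.items():
        to_ms_series(df, value_col).plot(ax=ax, label=name)
    ax.set_xlim(0, 3000)  # common 0 to 3 s window, expressed in ms
    ax.set_xlabel("time [ms]")
    ax.legend()
    return ax
```

For example, plot_on_shared_axis({"measurement01": df1, "measurement02": df2}, "value") would draw both measurements on one axis running from 0 to 3000 ms. Here df1, df2 and "value" are placeholders. Aligning the frames with pd.concat(axis=1) instead would produce a union index full of NaNs, which breaks the lines at every missing value.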

Grouped errorbar with array of strings as data points

I have different measurements from two or more sensors. I want to compare the performance of each sensor for each measurement with errorbars (mean and std). I have no problem creating and formatting a standard errorbar plot for one y (sensor) and one yerr per data point (measurement). But I'm trying to create a plot like this:
I couldn't find an option for this in the matplotlib documentation, through Google, or by searching this site. The closest I found was this thread:
matplotlib: grouping error bars for each x-axes tick
But that solution doesn't work for me, since my data points aren't numbers but a pandas DataFrame index of strings.
I did find the solution in the Matplotlib documentation after all. Here's the link for anyone who has the same question:
A bar plot with errorbars and height labels on individual bars.
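A common way around string x values is to plot against integer positions (np.arange) and only afterwards label the ticks with the strings. For grouped errorbars that means one ax.errorbar call per sensor, each shifted by a small offset within its group. The following is a minimal sketch, assuming two DataFrames, means and stds, indexed by measurement name (strings) with one column per sensor. This data layout and the helper names sensor_offsets and grouped_errorbar are hypothetical.

```python
import numpy as np
import pandas as pd


def sensor_offsets(n_sensors: int, group_width: float = 0.8) -> np.ndarray:
    """Horizontal offset of each sensor inside a group, centred on 0."""
    step = group_width / n_sensors
    return (np.arange(n_sensors) - (n_sensors - 1) / 2) * step


def grouped_errorbar(means: pd.DataFrame, stds: pd.DataFrame, ax=None):
    """One errorbar series per sensor column, grouped by the string index."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    if ax is None:
        _, ax = plt.subplots()
    x = np.arange(len(means.index))  # one integer slot per string label
    for offset, sensor in zip(sensor_offsets(len(means.columns)), means.columns):
        ax.errorbar(x + offset, means[sensor], yerr=stds[sensor],
                    fmt="o", capsize=3, label=str(sensor))
    ax.set_xticks(x)
    ax.set_xticklabels(means.index)  # put the string labels back on the ticks
    ax.legend()
    return ax
```

With the default group width of 0.8, two sensors sit at -0.2 and +0.2 around each tick, so the groups stay visually separate.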

Setting custom axis limits in matplotlib from a dataframe

Across a list of dataframes (dflist), each holding sensor readings over a 24-hour window, I am setting the y-axis limits for these readings in matplotlib:
axes[3].set_ylim(dflist[day]['AS_%s_WE_%d(mv)' %(gas,sensor)].min(),dflist[day]['AS_%s_WE_%d(mv)' %(gas,sensor)].max())
So for each df in my list, a graph is produced. Unfortunately, the first 10 minutes of readings throw off the scale dramatically, and I can't interpret the readings.
Now, for each df, instead of using the minimum sensor reading as ymin, could I ignore the first 10 minutes (the first 10 readings, since I take one reading per minute) and take the minimum over the rest of the data?
You can use a boolean mask in pandas that filters out undesired values.
You didn't provide the structure of your dataframe, so I'm just writing something that gives you the right idea:
df = dflist[day]  # the dataframe for this day
df.loc[df['minute'] > 10, 'AS_%s_WE_%d(mv)' % (gas, sensor)].min()
Essentially you are selecting rows with a boolean mask: the conditional expression on the 'minute' column yields True for each row to keep, and .min() is then taken over those rows only.
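Since there is one reading per minute, the first 10 minutes are simply the first 10 rows. Positional slicing with .iloc therefore avoids needing a 'minute' column at all. The following is a minimal sketch; ylim_excluding_warmup is a hypothetical helper name.

```python
import pandas as pd


def ylim_excluding_warmup(df: pd.DataFrame, col: str, skip: int = 10) -> tuple:
    """Return (min, max) of df[col], ignoring the first `skip` rows.

    With one reading per minute, skip=10 drops the first 10 minutes.
    Positional slicing (.iloc) is used, so no 'minute' column is required.
    """
    rest = df[col].iloc[skip:]
    return rest.min(), rest.max()
```

In the original loop this would be used as axes[3].set_ylim(*ylim_excluding_warmup(dflist[day], 'AS_%s_WE_%d(mv)' % (gas, sensor))).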