I have a question regarding matplotlib. I already know that what I am doing is not strictly statistically or mathematically correct, but I want to visualize the data anyway using stacked line / area graphs.
The measurements I have do not share the same x axis as a basis; the different lines do not have the same number of data points. I want to use time as the x axis, but the measurements were not taken at exactly the same timestamps (think distributed systems).
I guess my question is: "can I do that in matplotlib without doing the interpolation myself?"
Here is some in-depth elaboration on what a stacked graph is:
http://www.leebyron.com/else/streamgraph/download.php?file=stackedgraphs_byron_wattenberg.pdf
Cheers,
Mark
I am probably not quite understanding exactly what you want to do, but what about switching the axes so that your x-axis becomes the y-axis? Then you could use something like what is suggested in "Multiple overlapping plots with independent scaling in Matplotlib" for multiple y-axes.
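As far as I know, matplotlib's stackplot expects every series on a common x basis, so the resampling has to happen somewhere; with numpy it is only a couple of lines. A minimal sketch with made-up numbers (t1/y1 and t2/y2 stand in for the real measurements):

import numpy as np
import matplotlib.pyplot as plt

# Two series with their own timestamps and their own number of points.
t1, y1 = np.array([0.0, 1.0, 2.5, 4.0]), np.array([1.0, 2.0, 1.5, 3.0])
t2, y2 = np.array([0.5, 2.0, 3.5]), np.array([0.5, 1.0, 2.0])

# Common time base covering both series; np.interp does linear interpolation.
t = np.linspace(min(t1.min(), t2.min()), max(t1.max(), t2.max()), 200)
y1i = np.interp(t, t1, y1)
y2i = np.interp(t, t2, y2)

# Draw the stacked area graph from the aligned series.
plt.stackplot(t, y1i, y2i, labels=["series 1", "series 2"])
plt.legend(loc="upper left")
plt.xlabel("time")
plt.show()

np.interp is plain linear interpolation; anything smarter (scipy.interpolate, pandas resampling) drops in at the same spot.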
TL;DR: can anyone help with geospatial plotting in R using geom_raster() in ggplot?
Basically, my issue seems to stem from the fact that I don't have perfectly gridded values (i.e. I get this message: "Warning: Raster pixels are placed at uneven vertical/horizontal intervals and will be shifted. Consider using geom_tile() instead."). If I switch to geom_tile(), I can control the size of the tiles, but then they look chunky and awful, since there is no interpolation feature in geom_tile(). If I use geom_raster(), I think it gets confused about what size the pixels should be, since the data is not perfectly gridded; so when I facet my ggplot, the output sometimes has teeny tiny dots (or even no dots at all) on some facets and pretty maps on the others. When I round the lat/lon coordinates to 0 decimal places, that fixes the teeny-tiny / no-dots problem, but the result then looks just like geom_tile() (chunky, with no blending between cells).
Any ideas on how to fix this? I think that if there were a way to manually interpolate to fill in the NA values, so that I had nicely, symmetrically gridded data, then geom_raster() should work fine. Essentially, wherever a value is missing at a lat/lon point on the grid, I need to take the mean of the closest neighbouring points to fill it in. But I'm not sure how to do this (i.e. how to convert from my dataframe into all the different spatial classes and back again). Then again, this manual approach might be overcomplicating things, and I'd love a simpler solution (fingers crossed).
I'm building this in a shiny app with crazy long code, and a very large dataset, but happy to share additional info as needed!
[Screenshots: example plot, example plot 2, and the warning message]
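Just to illustrate the regridding step the question describes (interpolating the scattered values onto a regular grid so every lat/lon cell has a value): a rough sketch of the idea, shown here in Python with scipy's griddata only because that keeps the example short; the column names and the 0.5-degree resolution are invented, and R's scattered-data interpolation packages do the same thing.

import numpy as np
import pandas as pd
from scipy.interpolate import griddata

# Hypothetical stand-in for the real dataframe: scattered, not-quite-gridded points.
df = pd.DataFrame({
    "lon": np.random.uniform(-10, 10, 500),
    "lat": np.random.uniform(40, 50, 500),
    "value": np.random.rand(500),
})

# Regular target grid at the resolution the raster should have.
lon_grid = np.arange(df["lon"].min(), df["lon"].max(), 0.5)
lat_grid = np.arange(df["lat"].min(), df["lat"].max(), 0.5)
lon2d, lat2d = np.meshgrid(lon_grid, lat_grid)

# Interpolate the scattered values onto the grid (linear between neighbours,
# NaN outside the convex hull of the data).
value2d = griddata(
    (df["lon"].to_numpy(), df["lat"].to_numpy()),
    df["value"].to_numpy(),
    (lon2d, lat2d),
    method="linear",
)

# Back to a tidy table that a raster layer can consume directly.
gridded = pd.DataFrame({
    "lon": lon2d.ravel(),
    "lat": lat2d.ravel(),
    "value": value2d.ravel(),
})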
I am new to this, so please be tolerant of the mistakes I am making. I'll try to describe my problem as well as I can; feel free to give me advice on how to improve my description of it.
My goal: I have a time-series plot of different kinds of air quality values for 3 different locations, with one graph for each pollutant.
In addition, I have an xarray DataArray with different variables, of which I only use the one called 'kind'.
I was thinking of a horizontal bar attached in parallel to each graph inside the same plot, simply showing 3 different colors depending on the value of 'kind' ('A', 'B', 'NaN') for every single timestamp.
For the time series I use matplotlib.
I am grateful for any help provided!
Edit: this is what I'm looking for: a horizontal bar linked to the same x axis, so that it gives me additional information about my time-series values. The additional information comes from another file (an xarray.DataArray).
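One way to get that kind of bar in matplotlib is a second, very flat axes that shares the x axis with the time series and is filled with one colored span per timestamp interval. A rough sketch with invented data (in the real code the 'kind' values would come from the DataArray, e.g. da["kind"].values):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-ins for one pollutant series and the 'kind' variable on the same timestamps.
times = pd.date_range("2021-01-01", periods=200, freq="h")
values = np.random.rand(200)
kind = np.random.choice(["A", "B", None], size=200)  # None stands in for NaN

fig, (ax_ts, ax_bar) = plt.subplots(
    2, 1, sharex=True, gridspec_kw={"height_ratios": [6, 1]}, figsize=(10, 4)
)
ax_ts.plot(times, values)
ax_ts.set_ylabel("concentration")

# Narrow axis below the series: one colored span per interval, colored by 'kind'
# (grey where the value is missing).
colors = {"A": "tab:blue", "B": "tab:orange"}
for t0, t1, k in zip(times[:-1], times[1:], kind[:-1]):
    ax_bar.axvspan(t0, t1, color=colors.get(k, "lightgrey"))
ax_bar.set_yticks([])
ax_bar.set_ylabel("kind")

plt.tight_layout()
plt.show()

Because the two axes share the x axis, zooming or panning the time series keeps the bar aligned with it.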
Suppose we have two similar plots.
Plot1 (already published in a paper)
Plot2 (calculated by using any software)
My question is: how can I compare my calculated plot (PDF, PNG, JPEG, etc.) with the plot in the paper?
Thank You
To the best of my knowledge, there is currently no software that would enable you to re-convert images into their nominal data.
However, it's not that hard to write a piece of code that does it.
Here are the steps (at a high level):
Extract the images from the PDF document (e.g. using iText).
Separate out those images that look like a plot. You can train a neural network to do this, or simply look for images that contain a lot of white (assuming that's the background) and some straight black lines (assuming that's the foreground). Or do this step manually.
Once you have the image(s), extract the axis information. I'll assume a simple line plot: read off the minimum x and y values and the maximum x and y values.
Separate out the colors of the lines in your line plot and get their exact pixel coordinates. Then, using the axis information, scale them back to their original data points (a small numeric sketch of this step follows the reading material below).
Apply some kind of smoothing algorithm, e.g. Savitzky-Golay.
If you ever use this data in another paper, please mention that you gathered it by approximating their graph. Make it clear you did not use the original source data.
Reading material:
https://developers.itextpdf.com/examples
https://en.wikipedia.org/wiki/Savitzky%E2%80%93Golay_filter
https://docs.oracle.com/javase/tutorial/2d/images/index.html
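To make the calibration and smoothing steps concrete, here is a small numeric sketch in Python (every number below, from the pixel coordinates to the axis calibration, is invented for illustration):

import numpy as np
from scipy.signal import savgol_filter

# Pixel coordinates of one extracted curve (image origin is at the top-left).
px = np.array([102, 130, 161, 198, 240, 275])
py = np.array([410, 388, 351, 330, 342, 301])

# Axis calibration: pixel positions of the axis extremes and their data values.
x_px_min, x_px_max, x_min, x_max = 80, 620, 0.0, 10.0
y_px_min, y_px_max, y_min, y_max = 430, 60, 0.0, 5.0

# Linear map from pixel space to data space (the y direction is flipped in images,
# which the calibration values above already account for).
x = x_min + (px - x_px_min) / (x_px_max - x_px_min) * (x_max - x_min)
y = y_min + (py - y_px_min) / (y_px_max - y_px_min) * (y_max - y_min)

# Optional smoothing of the recovered curve; window and order are arbitrary here.
y_smooth = savgol_filter(y, window_length=5, polyorder=2)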
I have a seemingly simple problem, but an easy solution is eluding me. I have a very large series (tens or hundreds of thousands of points), and I just need to visualize it at different zoom levels, but generally zoomed well out. Basically, I want to plot it in a tool like MATLAB or pyplot, but knowing that each pixel can't represent the potentially many hundreds of points that map to it, I'd like to see both the min and the max of all the array entries that map to a pixel, so that I can generally understand what's going on. Is there a simple way of doing this?
Try hexbin. By setting the reduce_C_function I think you can get what you want. Ex:
import matplotlib.pyplot as plt
import numpy as np

# x and y are the coordinates of your points; C holds the value at each point, i.e. C = f(x, y)
plt.hexbin(x, y, C=C, reduce_C_function=np.max)
would give you a hexagonal heatmap where the color in the pixel is the maximum value in the bin.
If you only want to bin in one direction, see this method.
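If one-directional binning is all you need, something like scipy's binned_statistic can play the role of reduce_C_function; a short sketch with made-up data:

import numpy as np
from scipy.stats import binned_statistic

# Made-up data: x positions and the values to reduce per bin.
x = np.random.rand(100_000)
C = np.random.rand(100_000)

# One value per x bin, here the maximum, i.e. binning in one direction only.
maxima, bin_edges, _ = binned_statistic(x, C, statistic="max", bins=200)
bin_centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])

Plotting bin_centers against maxima (and against the result of a second call with statistic="min") gives a max/min profile along x.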
The first option you may want to try is Gephi: https://gephi.org/
Here is another option, though I'm not quite sure it will work. It's hard to say without seeing the data.
Try going to this link: http://bl.ocks.org/3887118. Do you see, toward the bottom of the page, the data.tsv file with all of the values? If you can save your data to resemble this, then the HTML code on that page should be able to render your data in the scatter plot example shown at that link.
Otherwise, try visiting this link to fashion your data to a more appropriate web page.
There is a set of research tools called TimeSearcher 1-3 that provides some examples of how to deal with large time-series datasets. Below are some example images from TimeSearcher 2 and 3.
I realized that a simple plot() in MATLAB actually gives me more or less what I want. When zoomed out, it renders all of the data points that map to a pixel column as a vertical line segment from the minimum to the maximum within that set, so as not to obscure the function's actual behavior. I used area() to increase the contrast.
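For anyone wanting the same behaviour explicitly in matplotlib, the min/max envelope per on-screen column is easy to compute by hand; a rough sketch (the series and the column count are made up):

import numpy as np
import matplotlib.pyplot as plt

# Made-up large series standing in for the real data.
y = np.cumsum(np.random.randn(200_000))

# One (min, max) pair per on-screen column; n_cols is arbitrary here.
n_cols = 1000
chunks = y[: len(y) // n_cols * n_cols].reshape(n_cols, -1)
lo, hi = chunks.min(axis=1), chunks.max(axis=1)
x = np.arange(n_cols)

# Shade between the per-column minimum and maximum, similar to what MATLAB's
# plot() does implicitly when zoomed far out.
plt.fill_between(x, lo, hi)
plt.show()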
Maybe someone knows how to solve this problem:
I have an XY plot with points on it (multiple series). Now I would like to add a couple of rectangles to this plot to mark groups of points.
Is it possible?
Thanks for the help.
OK, it was easy:
Assuming someone has already added an XYSimleDataSet to the plot, one can add another dataset with different options (such as drawing lines set to true).
This library is quite good in that I can add a lot of datasets to one plot (there is a difference between a series and a dataset, because a dataset can contain many series).
That makes the library very flexible.