What parameter needs to be set in the pyplot.plot function to ensure that graphs from big data will look smooth - matplotlib

I am trying to plot data in matplotlib, but the plot function creates a graph with sudden changes even though the same data looks smoother in Excel. I don't know the attribute, so I will just call it 'squary'. The differences are shown in the picture (top: matplotlib, bottom: Excel).
import matplotlib.pyplot as pyplot

figure = pyplot.figure()
for channel in channels:
    pyplot.plot(time[:len(channel)], channel)
pyplot.show()
The original data is not 'squary'. It is high-density data collected every 10 minutes for 67 days. The Excel plot was not done with a smoothing option; it was drawn with straight lines between successive data points, confirming that the data is not 'squary' at all. I assume the problem is some parameter of the pyplot function that I don't know of.

I finally found the solution, and it is not related to matplotlib but to Excel. I realized that the problem was rounding to 2 decimals in Excel. Although the plot looks smooth in Excel, the numbers were trimmed when copying the data from the table. Basically, just set the rounding to more decimals and the curve will look smooth in matplotlib as well.
So one should be careful about how the data is affected after copying from Excel.
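To see the effect described above, here is a minimal sketch with synthetic data (not the original channels): rounding a slowly varying signal to 2 decimals turns it into a staircase, which is exactly the 'squary' look.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted use
import matplotlib.pyplot as plt

# Hypothetical slowly varying signal, sampled every 10 minutes for 67 days
t = np.arange(0, 67 * 24 * 60, 10)   # time in minutes
y = 0.001 * t / 60.0                  # drifts by only ~0.001 per hour

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(t, y)                        # full precision: smooth line
ax2.plot(t, np.round(y, 2))           # rounded to 2 decimals: visible steps
ax1.set_title("full precision")
ax2.set_title("rounded to 2 decimals")
fig.savefig("rounding_comparison.png")
```

The rounded series collapses many distinct values onto a few levels, which is what produces the flat runs and sudden jumps.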

Related

Gauge/Scatter charts showing horizontal lines for large data at one point (Qlik Sense limitation)

While working in Qlik Sense, a Gauge/Scatter chart shows a horizontal line for large data at one point. The current limitation is that if there are more than 1,000 data values at any given point, the scatter chart will show them as a horizontal line at that point. Is there any alternative way to resolve this issue for large data in Qlik Sense?

"Zoom in" on a violinplot whilst keeping accurate quartile lines (matplotlib/seaborn)

TL;DR: How can I get a subrange of a violinplot whilst keeping accurate quartile lines?
I am using seaborn violinplots to make static charts for a report, but as far as I can tell, there's no way to redraw a particular area between limits whilst retaining the 25/median/75 quartile lines of the original dataset.
Here's my example dataset as a violin. The 25/median/75 values are left side: 1.0/5.0/9.0; right side: 2.0/5.0/9.0
My data has such a long tail that all the useful info is scrunched up into a tiny area. I want to ignore (but not throw away) the tail and show a closer look at the interesting bit.
I tried to reset the ylim using ax.set(ylim=(0, upp)), but the resultant graph is not great: it's jaggy and the inner lines don't meet the violin edge.
Is there a way to reset the y-axis limits but get a better quality result?
Next I tried to cut off the tail by dropping values from the dataset. I dropped anything over the 97th centile. The violin looks way better, but the quartile lines have been recalculated for this new dataset. They're showing a median of about 4, not 5 as per the original dataset.
I'm using inner="quartile", so the code that gets called in Seaborn is _ViolinPlotter::draw_quartiles
def draw_quartiles(self, ax, data, support, density, center, split=False):
    """Draw the quartiles as lines at width of density."""
    q25, q50, q75 = np.percentile(data, [25, 50, 75])
    self.draw_to_density(ax, center, q25, support, density, split,
                         linewidth=self.linewidth,
                         dashes=[self.linewidth * 1.5] * 2)
    # ... (the same call is repeated for q50 and q75)
As you can see, it assumes (understandably) that one wants to draw the quartile lines at percentiles 25, 50 and 75. It'd be amazeballs if there was a way I could call draw_to_density with my own values (is there?).
At the moment, I am attempting to manually adjust the position of the lines. It's trivial to figure out & set the y-values:
for l in ax.lines:
    l.set_ydata(<get correct quartile value from original dataset>)
but I'm finding it hard to figure out the limits for x, i.e. the density of the distribution at the quartiles. It seems to involve gaussian kde, and tbh it's getting hacky and inelegant at this point. Is there an easy way to calculate how long each line should be?
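For reference, the half-width of each inner line is proportional to the KDE density at that y-value, so it can be estimated directly with scipy.stats.gaussian_kde. A sketch on a synthetic long-tailed sample (the data and variable names are illustrative, not the original dataset):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=2.0, size=1000)  # hypothetical long-tailed sample

# Quartiles of the ORIGINAL (untrimmed) dataset
q25, q50, q75 = np.percentile(data, [25, 50, 75])

# Density of the distribution at each quartile; the violin half-width is
# scaled by this density, so it determines how long each line should be
kde = gaussian_kde(data)
half_widths = kde([q25, q50, q75])
```

These densities are in the KDE's own units; to match a drawn violin they still need to be rescaled by the same factor seaborn used to normalise the violin width.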
What do you suggest?
Thanks for your help
Lnr
With thanks to @JohanC.
I added gridsize=1000 to the parameters of the violinplot and used ax.set(ylim=(0, upp)) to resize the y-axis to show the range from 0 to upp, where upp is the upper limit. A much prettier-looking graph:
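A minimal sketch of that approach; the dataset here is synthetic and upp is taken as the 97th percentile, standing in for whatever upper limit fits your data:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted use
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(1)
data = rng.gamma(2.0, 2.0, size=500)  # hypothetical long-tailed data

# gridsize=1000 gives the KDE a fine enough grid that the zoomed
# region is not jaggy; the quartile lines keep their original values
ax = sns.violinplot(y=data, inner="quartile", gridsize=1000)
upp = np.percentile(data, 97)         # zoom in on the interesting part
ax.set(ylim=(0, upp))
plt.savefig("violin_zoom.png")
```

The key point is that the tail is only clipped from view, not dropped from the data, so the quartiles are still computed from the full dataset.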

Halcon - Extract straight edge from XLD

I have an XLD edge, like the one in red in the sample picture below.
I need to extract the start/end points of the straight lines that represent it. Hough lines sort of work for this, but the results are not really reproducible: minor changes in the contour produce unexpected results.
How can the contours be extracted as straight lines (blue), with start and finish coordinates?
Lines shorter than a specified length should not be counted as separate lines.
The contour needs to be converted to a polygon using the following function:
gen_polygons_xld (Object, Polygons, 'ramer', 25.0)
The only adjustable parameter is alpha (25.0), which sets the approximation threshold.

How do I increase the size of subplots in a pair plot?

I have a dataset with 15 numeric columns, and I would like to plot a pair plot using seaborn. However, the size of the subplots is too small to make any inference from them.
I've tried using height and aspect with pairplot, but that doesn't seem to be working for me; the plot size keeps reducing. The same goes for figsize:
plt.figure(figsize=(40,40))
sns.pairplot(df)
plt.show()
I'm expecting a good enough size for all the pairs so that some inference can be made. However, the plots I get are too small to even recognise the column names.
The command works for me.
I was not aware that in a Jupyter notebook you can maximise the output to its actual size.
So essentially, the below works just fine.
plt.figure(figsize=(100,100))
sns.pairplot(df)
plt.show()

Layered, not stacked column graph in Excel

I want to layer (superimpose) one column graph on another in Excel. So it would be like a stacked column graph, except that each column for a given category on the x-axis would have its origin at 0 on the y-axis. My data are before and after scores. By layering the columns instead of putting them side-by-side, it would be easier to visualize the magnitude and direction of the difference between the two scores. I've seen this done with R, but can't find examples in Excel. Anyone ever attempted this?
I tried the 3D suggestion and it worked. But the other answer I discovered was to choose a Clustered Column graph and click 'Format Data Series' and change the 'overlap' percentage to 100%. I'm using a Mac so it's slightly different, but the person who helped me with this was on a PC and I've used PC's mainly. What I ended up discovering is that using 90% looked quite nice, but 100% will achieve what you're looking for.
I did the same thing for my thesis presentation. It's a little tricky, and I worked it out myself. To do it, you have to create a 3D bar graph (not a stacked one), in which your columns are placed in front of each other. You have to make sure that, in each X cell, all the taller columns are behind the shorter columns.
Once you've created that graph, you can rotate the 3D graph so that it looks like a 2D graph (by resetting the axis rotation values to zero). Now you have a bar graph in which every bar has different columns and all of the columns start at zero. ;)
Short answer: Change the post score to (post - pre), then you can proceed with making the stacked bar chart.
Long and correct answer: DO NOT DO THIS. A clustered bar chart is much better because:
The visual line for comparison is the same line anyway; you're not facilitating understanding in any way.
Any kind of overlapping of the bars conceals the area of the post-score, which induces visual distortion. A pre-score of 10 and a post-score of 20 should have a column area ratio of 1:2, but if you completely overlap them, it is reduced to 1:1. Partial overlapping is equally problematic.
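For comparison, the same layered effect (both series superimposed, both starting at 0, the equivalent of Excel's 100% overlap) can be sketched in matplotlib with made-up pre/post scores; drawing the taller bar of each pair first keeps the shorter one visible:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for scripted use
import matplotlib.pyplot as plt

categories = ["A", "B", "C"]
pre = np.array([10, 15, 8])    # hypothetical before scores
post = np.array([20, 12, 14])  # hypothetical after scores
x = np.arange(len(categories))

fig, ax = plt.subplots()
# Taller bar drawn first (behind), shorter bar drawn second (in front)
ax.bar(x, np.maximum(pre, post), color="lightsteelblue", label="higher score")
ax.bar(x, np.minimum(pre, post), color="steelblue", label="lower score")
ax.set_xticks(x)
ax.set_xticklabels(categories)
ax.legend()
fig.savefig("layered_bars.png")
```

Note this inherits the area-distortion caveat discussed above: the front bar hides part of the back bar, so the visible areas no longer reflect the true ratio of the two scores.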