Is it possible to rasterize sns.distplot(..., rug=True)? - pdf

In Matplotlib it is possible to plot a very long array A with rasterized=True, as in the following:
plt.plot(A, rasterized=True)
This typically lowers the memory usage.
Is it possible to do the same when drawing a rugplot on the support axis in Seaborn's sns.distplot (see http://seaborn.pydata.org/generated/seaborn.distplot.html)? After all, such a rugplot can consist of many points and consume a lot of memory, too.
EDIT:
As noted in the answer below, this does not lower RAM consumption, but when saving the plot to a file in PDF format it can alter (i.e., decrease or, under certain circumstances, even increase) the size of the file on disk.

Seaborn's distplot, like many other seaborn plots, allows you to pass keyword arguments to the underlying matplotlib functions.
In this case, distplot has a keyword argument rug_kws, which accepts a dictionary of keyword arguments to be passed to the rugplot. These are in turn passed on to the underlying matplotlib axvline function.
As such, you can easily provide rasterized=True to axvline via
ax = sns.distplot(x, rug=True, hist=False, rug_kws=dict(rasterized=True))
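For completeness, here is a minimal self-contained sketch (the data and the output file name are placeholders made up for this example); note that the dpi argument of savefig controls the quality of the rasterized part:

    import numpy as np
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Placeholder data: a long array, as in the question.
    x = np.random.randn(100000)

    # Rasterize only the rug marks; the KDE curve remains vector graphics.
    ax = sns.distplot(x, rug=True, hist=False, rug_kws=dict(rasterized=True))

    # Rasterization only takes effect when saving to a vector format.
    ax.figure.savefig("distplot_rug.pdf", dpi=200)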
However, I'm not sure if this has the desired effect of lowering memory consumption. In general, rasterization is applied when saving the figure, so the plot shown on the screen will not be affected at all.
During the process of saving, the rasterization has to be applied, which takes more time and memory than without rasterization.
While bitmap files like png are completely rasterized anyway and will not be affected, the generated vector files (like pdf, eps or svg) may even have a larger file size than their unrasterized counterparts.
Rasterization then only pays off when actually opening such a file (e.g. a pdf in a viewer) or processing it (e.g. in latex), where the rasterized part consumes much less memory and allows for faster rendering on screen or in print.
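To check the effect on disk for a concrete case, one can simply save the same figure with and without rasterization and compare file sizes. A small sketch, assuming that a rug is essentially one short vertical marker per data point:

    import os
    import numpy as np
    import matplotlib.pyplot as plt

    x = np.random.randn(200000)

    for rasterized in (False, True):
        fig, ax = plt.subplots()
        # The "|" marker draws one short vertical tick per point, much like a rug.
        ax.plot(x, np.zeros_like(x), "|", rasterized=rasterized)
        fname = "rug_rasterized_%s.pdf" % rasterized
        fig.savefig(fname, dpi=150)
        plt.close(fig)
        print(fname, os.path.getsize(fname), "bytes")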

Related

How to redraw only the regions of an image that have changed using matplotlib

I have an application where only one row of a 1024x2048 pixel image changes at a rate of 100 times per second. I would like to display and update this image in real time using minimal resources.
However, matplotlib redraws the entire image every time I call the plt.draw() function. This is slow and processor intensive.
Is there a way to redraw only one line at a time?
I am not an expert on matplotlib internals, but I think it cannot be done that way. Matplotlib was not designed for displaying large, changing textures at a high frame rate; it is designed to offer a high-level, very easy-to-use API for displaying interactive plots.
Internally it is implemented in both Python and C++ (for low-level, high-performance operations), and by default it uses Tcl/Tk as its graphical user interface and widget toolkit (which allows for great cross-platform portability among OSs).
So your 1024x2048 matrix has to be transformed several times before it is displayed.
If you do not need the extra features matplotlib offers (such as autoscaling, axes, interactive zoom, ...) and your main goal is speed, I recommend using a more performance-focused Python library/module for the display.
There are lots of options: pyopencv, pySDL2...
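That said, if you do stay within matplotlib, a common partial mitigation is to update the image data in place and use blitting, so the axes and decorations are not re-rendered on every frame. This is not a true single-row redraw (the whole image artist is still re-rasterized), but it avoids redrawing the entire figure. A minimal sketch, assuming an interactive backend:

    import numpy as np
    import matplotlib.pyplot as plt

    plt.ion()                                # interactive mode assumed
    fig, ax = plt.subplots()
    data = np.zeros((1024, 2048))
    im = ax.imshow(data, cmap="gray", vmin=0.0, vmax=1.0)

    fig.canvas.draw()                        # one full draw up front
    background = fig.canvas.copy_from_bbox(ax.bbox)

    for i in range(200):
        data[i % 1024, :] = np.random.rand(2048)  # only one row changes
        im.set_data(data)
        fig.canvas.restore_region(background)     # restore the cached scene
        ax.draw_artist(im)                        # redraw only the image artist
        fig.canvas.blit(ax.bbox)
        fig.canvas.flush_events()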

What is the best method to optimize a PDF with cropped, rotated pictures?

I have noticed that with cropped and rotated pictures, which come up a lot with paperless-office techniques, no tool such as Adobe Acrobat or Nitro Pro can optimize away the cropped-out parts of the photos. Attempting to use libraries to do this in C# is an interesting challenge, but it could potentially save many megabytes of blank image space, or cases where, for example, multiple receipts were scanned and then cropped into separate pages while the original images are all still stored in full.
So, to the point: there is potentially a solution in rotating and cropping and permanently saving again (while adjusting offsets so as not to ruin the alignment of OCR or other relative data). This can be quite good in some cases, but rotation represents a loss of data.
So why would it look better in a PDF reader? Subpixels, e.g. ClearType. Rotating the original document at display time actually increases the effective color resolution at monitor scale. So it is practically cheaper, disk-space-wise, to store it unrotated and then spend more processing power at display time to adjust it using subpixel approximations.
So, 2 proposals: 1) crop it without rotating it and adjust offsets - this is slightly wasteful (worst case at 45 degrees of rotation), since it stores a rotated rectangle inside an axis-aligned rectangle (see the sketch after these proposals for how much area that wastes).
2) rotate and crop it while raising it to an appropriately higher spatial and color resolution, using ClearType-style subpixel enhancement. This seems rather involved, and it would increase the resolution while decreasing the retained portion of the picture, which could defeat the purpose.
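To quantify the waste in proposal 1: for a w x h rectangle rotated by angle theta, the axis-aligned bounding box measures (w|cos theta| + h|sin theta|) x (w|sin theta| + h|cos theta|). A small sketch (the helper name is made up for illustration):

    import math

    def bbox_overhead(w, h, angle_deg):
        # Area of the axis-aligned bounding box of a w x h rectangle
        # rotated by angle_deg, relative to the rectangle's own area.
        t = math.radians(angle_deg)
        bw = w * abs(math.cos(t)) + h * abs(math.sin(t))
        bh = w * abs(math.sin(t)) + h * abs(math.cos(t))
        return (bw * bh) / (w * h)

    print(bbox_overhead(1000, 1000, 45))  # 2.0   -> stored area doubles
    print(bbox_overhead(1000, 1000, 3))   # ~1.10 -> ~10% extra for a slight deskew

So for a square crop the worst case at 45 degrees doubles the stored area, while typical small deskew angles cost only a few percent.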
Does anyone have a better strategy or any suggestions for this interesting, yet seemingly common, problem? A very nice utility tool could perhaps be written implementing the naïve approach and the 2 proposals mentioned, but perhaps I am missing an easier and better idea? Cutting this waste is potentially beneficial and cost-saving, with minimal downside, provided one has properly finalized one's cropping decisions in the PDFs.

How to determine maximum scale for cytoscape.js png output?

I want users to be able to generate high quality images for publication from their cytoscape.js graphs (in fact, this is the major use case of my application). Since there's no export from cytoscape.js to a vector format, my documentation tells users to just zoom to a high level of resolution before exporting to png, and then my code passes in the current zoom level as the scale parameter to cy.png(), along with full:true. However, at sufficiently high values for scale, no image gets generated. This maximum scale value seems to be different for different graphs, so I assume it's based on some maximum physical dimensions. How can I determine what the maximum scale value should be for any given graph, so my code won't exceed it?
Also, if I have deleted or moved elements such that the graph takes up less physical space than it used to, cy.png() seems to include that blank area in the resulting image even though it no longer contains any elements -- is there any way to avoid that?
By default, cy.png() and cy.jpg() will export the current viewport to an image. To export the fitted graph, use cy.png({ full: true }) or cy.jpg({ full: true }).
The browser enforces a limit on the overall size of exported files. This depends on the OS and the browser vendor; Cytoscape.js can neither influence nor calculate this limit. For large resolutions/scales, the quality difference between PNG and JPG is negligible, so JPG tends to be the better option for large exports.
Aside: While bitmap formats scale with resolution, vector formats (like SVG) scale with the number of graph elements. For large graphs, it may be impossible to export SVG (even if such a feature were implemented).

How to create a 'vector movie' out of data?

What file formats and software could I use to represent vector images over time as an animation, without compromising the advantages of the vector format?
Say I generate data that is best represented as a single point in the plane, moving over time. I would like to make an animation showing the motion of this point. One way to do this is to make a sequence of 2D bitmap images and string these together into an AVI file. But this produces either huge files (orders of magnitude larger than the underlying dataset) or very low quality animations. A stack of raster images is a very inefficient representation of the data.
A much better representation would be a sequence of 2D vector images. Vector images combine very high fidelity with small file size. But is it possible to string such images into an animation? What kind of software could be used to do so, starting from the underlying dataset?
I imagine a tool such as Adobe Flash could be used here, but this seems akin to making scatterplots from scratch in Illustrator: sure, it can be done and will look nice, but this is not how you make scatterplots. You use R, Excel or MATLAB, and then perhaps retouch the plot in a graphics program. I'm looking for a similarly efficient solution, but for making dynamic visualizations rather than plots.
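As a concrete starting point for the "sequence of 2D vector images" idea, here is a minimal matplotlib sketch that writes one SVG per timestep (the moving-point data is made up for illustration); stringing the frames into a single animated container is exactly the open part of the question:

    import numpy as np
    import matplotlib.pyplot as plt

    # Made-up data: a single point moving around a circle over 50 timesteps.
    t = np.linspace(0.0, 2.0 * np.pi, 50)
    xs, ys = np.cos(t), np.sin(t)

    for i, (x, y) in enumerate(zip(xs, ys)):
        fig, ax = plt.subplots()
        ax.set_xlim(-1.5, 1.5)
        ax.set_ylim(-1.5, 1.5)
        ax.plot(x, y, "o")
        fig.savefig("frame_%03d.svg" % i)  # each frame stays fully vector
        plt.close(fig)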

Improving Speed of Histogram Back Projection

I am currently using OpenCV's built-in patch-based histogram back projection (cv::calcBackProjectPatch()) to identify regions of a target material in an image. With an image resolution of 640 x 480 and a window size of 10 x 10, processing a single image requires ~1200 ms. While the results are great, this is far too slow for a real-time application (which should have a processing time of no more than ~100 ms).
I have already tried reducing the window size and switching from CV_COMP_CORREL to CV_COMP_INTERSECT to speed up the processing, but have not seen any appreciable speed up. This may be explained by the OpenCV documentation (emphasis mine):
Each new image is measured and then converted into an image array over a chosen ROI. Histograms are taken from this image in an area covered by a "patch" with an anchor at its center, as shown in the picture below. The histogram is normalized using the parameter norm_factor so that it may be compared with hist. The calculated histogram is compared to the model histogram hist (the function uses cvCompareHist() with the comparison method=method). The resulting output is placed at the location corresponding to the patch anchor in the probability image dst. This process is repeated as the patch is slid over the ROI. Iterative histogram update by subtracting trailing pixels covered by the patch and adding newly covered pixels to the histogram can save a lot of operations, though it is not implemented yet.
This leaves me with a few questions:
Is there another library that supports iterative histogram updates?
How significant of a speed-up should I expect from using an iterative update?
Are there any other techniques for speeding up this type of operation?
As mentioned in the OpenCV documentation, integral histograms will definitely improve speed. Please take a look at a sample implementation at the following link:
http://smsoftdev-solutions.blogspot.com/2009/08/integral-histogram-for-fast-calculation.html
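For illustration, here is a minimal NumPy sketch of the integral-histogram idea (the function names and the bin count are made up for this example, and an 8-bit grayscale image is assumed): per-bin 2-D cumulative sums let you read off the histogram of any window in O(bins) rather than O(window area).

    import numpy as np

    def integral_histogram(gray, n_bins=16):
        # Bin index for each pixel of an 8-bit grayscale image.
        bins = (gray.astype(np.int32) * n_bins) // 256
        # One binary plane per bin, then cumulative sums over rows and columns.
        planes = (bins[None, :, :] == np.arange(n_bins)[:, None, None]).astype(np.int64)
        ih = planes.cumsum(axis=1).cumsum(axis=2)
        # Zero-pad so window lookups need no bounds checks.
        return np.pad(ih, ((0, 0), (1, 0), (1, 0)))

    def window_histogram(ih, y, x, h, w):
        # Histogram of gray[y:y+h, x:x+w] in O(n_bins), by inclusion-exclusion.
        return (ih[:, y + h, x + w] - ih[:, y, x + w]
                - ih[:, y + h, x] + ih[:, y, x])

    # Precompute once per frame, then query any 10 x 10 window cheaply.
    img = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
    ih = integral_histogram(img)
    print(window_histogram(ih, 100, 200, 10, 10).sum())  # 100 pixels total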