Visualizing a large data series - matplotlib

I have a seemingly simple problem, but an easy solution is eluding me. I have a very large series (tens or hundreds of thousands of points), and I just need to visualize it at different zoom levels, but generally zoomed well out. Basically, I want to plot it in a tool like MATLAB or Pyplot, but knowing that each pixel can't represent the potentially many hundreds of points that map to it, I'd like to see both the min and the max of all the array entries that map to a pixel, so that I can generally understand what's going on. Is there a simple way of doing this?

Try hexbin. By setting reduce_C_function I think you can get what you want. For example:

import matplotlib.pyplot as plt
import numpy as np

# x, y are the point coordinates and C = f(x, y) is the value at each point
plt.hexbin(x, y, C=C, reduce_C_function=np.max)
plt.show()

This would give you a hexagonal heatmap where the color of each hexagon is the maximum value among the points that fall in its bin.
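A self-contained version, with synthetic data purely for illustration (the gridsize value is an arbitrary choice):

import matplotlib.pyplot as plt
import numpy as np

# Synthetic data for illustration: 100k noisy samples of a curve
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100_000)
y = np.sin(x) + rng.normal(0, 0.3, x.size)
C = y  # color by the value itself

plt.hexbin(x, y, C=C, reduce_C_function=np.max, gridsize=80)
plt.colorbar(label="max of C per bin")
plt.show()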
If you only want to bin in one direction, see this method.

The first option you may want to try is Gephi: https://gephi.org/
Here is another option, though I'm not quite sure it will work. It's hard to say without seeing the data.
Try going to this link: http://bl.ocks.org/3887118. Do you see, toward the bottom of the page, data.tsv with all of the values? If you can save your data to resemble this, then the HTML code there should be able to render your data as the scatter plot shown at that link.
Otherwise, try visiting this link to fashion your data into a more appropriate web page.

There is a set of research tools called TimeSearcher 1–3 that provide some examples of how to deal with large time-series datasets. [Example images from TimeSearcher 2 and 3 were attached to the original answer.]

I realized that a simple plot() in MATLAB actually gives me more or less what I want. When zoomed out, it renders all of the data points that map to a pixel column as a vertical line segment from the minimum to the maximum within that set, so as not to obscure the function's actual behavior. I used area() to increase the contrast.
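For matplotlib, a rough equivalent is to decimate manually: split the series into one bucket per pixel column and plot the per-bucket min and max as an envelope. A minimal sketch (the bucket count n_px, standing in for the plot width in pixels, is an assumption, and len(y) is taken to be much larger than n_px):

import numpy as np
import matplotlib.pyplot as plt

def minmax_envelope(y, n_px=800):
    # One bucket per pixel column; each column then spans the full
    # vertical extent (min..max) of the points that map to it.
    edges = np.linspace(0, len(y), n_px + 1, dtype=int)
    mins = np.array([y[a:b].min() for a, b in zip(edges[:-1], edges[1:])])
    maxs = np.array([y[a:b].max() for a, b in zip(edges[:-1], edges[1:])])
    return mins, maxs

y = np.cumsum(np.random.randn(500_000))  # synthetic series, for illustration
mins, maxs = minmax_envelope(y)
plt.fill_between(np.arange(len(mins)), mins, maxs)
plt.show()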

Related

creating multi-faceted plot of large geospatial data using geom_raster()

TL;DR: can anyone help w/ geospatial things in R using geom_raster() in ggplot?
Basically it seems that my issue stems from the fact that I don't have perfectly gridded values (aka I get this message: "Warning: Raster pixels are placed at uneven vertical/horizontal intervals and will be shifted. Consider using geom_tile() instead."). So, if I switch to geom_tile(), I can control the size of the tiles, but then they look chunky and awful since there's no interpolation feature in geom_tile(). If I use geom_raster(), I think it gets confused about what size it should plot the pixels since the data isn't perfectly gridded, so when I facet my ggplot, the output sometimes has teeny tiny dots (or even no dots at all) on some facets, and pretty maps on the others. When I round the lat/lon coordinates to 0 decimal places, this fixes the teeny-tiny/no-dots problem, but then everything ends up looking just like geom_tile() (chunky, with no blending between tiles).

Any ideas on how to fix this? I think if there's a way to manually interpolate to fill in NA values so that I do have nice, symmetrically gridded data, then geom_raster() should work fine. What needs to happen is: for each lat/lon point on the grid where a value is missing, take the mean of the two closest neighboring points and use it to fill in that point (a sketch of this idea follows below). But I'm not sure how to do this (aka how to convert from my dataframe into all the different spatial classes and back again). Then again, this manual approach might be overcomplicating things and I'd love a simpler solution. (fingers crossed)
I'm building this in a shiny app with crazy long code, and a very large dataset, but happy to share additional info as needed!
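A hedged sketch of the regridding idea in Python (scipy's griddata and all names here are illustrative assumptions, not the asker's code; similar interpolation exists in R via e.g. the akima package): interpolate the scattered lat/lon values onto a perfectly regular grid, which is what geom_raster() expects.

import numpy as np
from scipy.interpolate import griddata

# Hypothetical scattered observations: (lon, lat) points with one value each
rng = np.random.default_rng(0)
lon = rng.uniform(-100, -90, 500)
lat = rng.uniform(40, 50, 500)
val = np.sin(lon) + np.cos(lat)

# Build a perfectly regular grid and interpolate the scattered values onto it
grid_lon, grid_lat = np.meshgrid(np.linspace(-100, -90, 100),
                                 np.linspace(40, 50, 100))
grid_val = griddata((lon, lat), val, (grid_lon, grid_lat), method="linear")
# grid_val is evenly spaced; points outside the convex hull come back as NaN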

How to have continuous features for a shapefile and get rid of the wrapdateline?

How can I get continuous features for a shapefile?
I mean features that are NOT cut at the dateline to stay within the [-180, 180] longitude range, a constraint I do not want to respect.
Here is an example where I display the Russia shapefile in a leaflet map.
In fact, I would like to have a continuous continent.
The shapefile comes from https://gadm.org/about.html
Is there any command from gdal or ogr2ogr to merge the separated features?
Thanks
If you load the GADM level-0 layer into QGIS and toggle Show Feature Count, you'll see that, even though the shape seems split, the actual layer only has a single feature:
Your shape gets cut off because the polygon crosses the boundary in the projection you are using and gets wrapped around. This doesn't mean the features actually get split.
If you want to display it as a continuous feature, you need to specify an appropriate projection. For instance, using the example here gives me this:
This is just one way; there might be other projections that fit your purpose better. Also, getting this done in leaflet is a different question.
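As a hedged sketch of that reprojection idea (GeoPandas, the file name, and the specific PROJ string are my assumptions, not part of the original answer):

import geopandas as gpd

# Hypothetical file name; substitute your GADM level-0 shapefile
russia = gpd.read_file("gadm36_RUS_0.shp")

# Re-project to a Lambert azimuthal equal-area projection centered on Russia,
# so the polygon no longer wraps around at the +/-180 degree meridian
russia_laea = russia.to_crs("+proj=laea +lat_0=60 +lon_0=100")
russia_laea.plot()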

Restore full dynamic range to a subsection of a colormap

I wish to simultaneously scatterplot two distributions on the same plot, so that I can see at a glance each distribution, as well as the relationship between them.
https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html shows the available colormaps, so if I take a chunk out of Viridis and a chunk out of Plasma (e.g. using "How to extract a subset of a colormap as a new colormap in matplotlib?"), I should be good to go.
But then I'm losing the full dynamic range.
Is there any "hack" to restore this dynamic range?
The full "mathematically aesthetic" solution may be to dig into the generating code for the colormaps and regenerate from scratch, but I suspect this is a deep dive.
How do you expect to take a subset of a colormap but still have the full dynamic range of the colormap?
That is not how colormaps work.
One way you can solve this is by simply using two colormaps that look vastly different.
My CMasher package provides a large set of scientific colormaps, which were all designed to be perceptually uniform sequential and unique in appearance.
You can easily find two colormaps that are very different there.
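A minimal sketch of that suggestion with two stock matplotlib colormaps (viridis and plasma are stand-ins; a CMasher map name could be passed to cmap just the same):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x1, y1, c1 = rng.normal(0, 1, (3, 300))  # first distribution
x2, y2, c2 = rng.normal(2, 1, (3, 300))  # second distribution

# Each distribution keeps the full dynamic range of its own colormap
plt.scatter(x1, y1, c=c1, cmap="viridis", label="dist 1")
plt.scatter(x2, y2, c=c2, cmap="plasma", label="dist 2")
plt.legend()
plt.show()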

Tiled Instance Normalization

I am currently implementing a few image style transfer algorithms in TensorFlow, but I would like to do it in tiles, so I don't have to run the entire image through the network. Everything works fine; however, each tile is normalized differently, according to its own statistics, which results in tiles with slightly different characteristics.
I am certain that the only issue is instance normalization, since if I feed the true values (obtained from the entire image) to each tile calculation, the result is perfect; however, I then still have to run the entire image through the network to calculate those values. I also tried calculating them using a downsampled version of the image, but resolution suffers a lot.
So my question is: is it possible to estimate mean and variance values for instance normalization without feeding the entire image through the network?
You can take a random sample of the pixels of the image, and use the sample mean and sample variance to normalize the whole image. It will not be perfect, but the larger the sample, the better. A few hundred pixels will probably suffice, maybe even less, but you need to experiment.
Use tf.random_uniform() to get random X and Y coordinates, and then use tf.gather_nd() to get the pixel values at the given coordinates.
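A minimal sketch of that sampling approach (the function name and sample count are assumptions; written against the TF1-style API the answer mentions):

import tensorflow as tf

def sampled_moments(image, n_samples=512):
    # image: a [height, width, channels] float tensor
    h = tf.shape(image)[0]
    w = tf.shape(image)[1]
    # Random pixel coordinates, as suggested above
    ys = tf.random_uniform([n_samples], maxval=h, dtype=tf.int32)
    xs = tf.random_uniform([n_samples], maxval=w, dtype=tf.int32)
    # Gather the sampled pixel values: shape [n_samples, channels]
    samples = tf.gather_nd(image, tf.stack([ys, xs], axis=1))
    # Per-channel sample mean and variance for instance normalization
    mean, variance = tf.nn.moments(samples, axes=[0])
    return mean, variance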

transform a path along an arc

I'm trying to transform a path along an arc.
My project is running on OS X 10.8.2 and the painting is done via Core Animation in CALayers.
There is a waveform in my project which is painted by a path. There are about 200 sample points, which are mirrored to the bottom side. These are painted 60 times per second and updated to a song position.
Please ignore the white line; it is just a rotation indicator.
What I am trying to achieve is drawing a waveform along an arc. "Up" should point to the middle. It does not need to go all the way around. The waveform should be painted along the green circle. Please take a look at the sketch provided below.
I'm not sure how to achieve this in a performant manner, since there are many points per second that need coordinate correction.
I tried coming up with some ideas of my own:
1) It is possible to add linear transformations to paths, which, I think, will not help me here. The only thing I can think of is adding a point, rotating the path with a transformation, adding another point, rotating, and so on. But this would be very slow, I think.
2) Drawing the path into an image and bending it would surely lead to image artifacts.
3) Maybe the best idea would be to precompute sample points on an arc, then save a vector to the center. Taking the y-coordinates of the waveform, placing them on the sample points and moving them along the vector to the center.
But maybe I am just not seeing some kind of easy solution to this problem. Help is really appreciated and fresh ideas very welcome. Thank you in advance!
IMHO, the most efficient way to go (in terms of CPU usage) would be to use some form of pre-computed approach that would take into account the resolution of the display.
Cleverly precomputed values
I would go for the mathematical transformation (from linear to polar) and combine two facts:
There is no need to perform expensive mathematical computation
There is no need to render two points that are too close to each other
I have no ready-made algorithm for you, but you could use a pre-computed sin or cos table, and match the data range to the display size in order to work with integers.
For instance, imagine we have some data ranging from 0 to 1E6 and we need to display the sin value of each point in a rectangle 100 px high. We can use a pre-computed sin table and work with integers. This way, displaying the sin value of a point would be much quicker. This concept can be refined to get a nicer result.
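A rough numpy sketch of the lookup-table idea (arc span, table size, and names are all illustrative assumptions):

import numpy as np

TABLE = 4096  # resolution of the precomputed table
angles = np.linspace(-np.pi / 2, np.pi / 2, TABLE)  # arc spanning half a circle
COS, SIN = np.cos(angles), np.sin(angles)

def arc_points(amplitudes, cx=0.0, cy=0.0, R=100.0):
    # Index into the precomputed table instead of calling cos/sin per point
    idx = (np.arange(len(amplitudes)) * (TABLE - 1)) // (len(amplitudes) - 1)
    # Subtracting the amplitude makes "up" point toward the center of the arc
    r = R - np.asarray(amplitudes)
    return cx + r * COS[idx], cy + r * SIN[idx]

xs, ys = arc_points(np.abs(np.random.randn(200)) * 10)  # 200 waveform samples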
Also, there are some ways to retain only the significant points of a curve so that the displayed curve still looks like the original (see the Ramer–Douglas–Peucker algorithm on Wikipedia). But I found it to be inefficient for quickly displaying ever-changing data.
Using multicore rendering
You could compute different areas of the curve using multiple cores (this can be tricky).
Or you could do the pre-computing on several cores, and use one core to finish the job.