how to extract data from plot produced by easy.py in libsvm-3.17 - libsvm

I downloaded libsvm-3.17 about two weeks ago. I tried heart_scale (a dataset provided in the libsvm-3.17 package) with easy.py. It produces a plot (via gnuplot) to illustrate the best c and best gamma. I cannot post the image here because I am new and do not have enough reputation.
My question: given the many colored curves in the plot, how can I read off that the best log2(c) = 11 (which gives c = 2048) and the best log2(gamma) = -13 (which gives gamma = 0.0001220703125)?
Thank you very much.

The chosen parameters are reported by easy.py itself (I cannot run it right now, but you will find them in its output). The plot is just a visual aid for manually checking the neighborhood of the chosen parameters. With some experience you can interpret the diagram; without it, simply trust easy.py.
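As a rough illustration of what the plot encodes: the colored curves are contour lines of cross-validation accuracy over the (log2(c), log2(gamma)) grid, and the reported optimum is essentially the grid point with the highest accuracy. The sketch below assumes the grid results are available as (log2(c), log2(gamma), accuracy) triples. The numbers are made up for illustration and are not real heart_scale results.

```python
# Hypothetical grid-search results: (log2(c), log2(gamma), cross-validation accuracy in %).
# These values are invented for illustration only.
results = [
    (9, -13, 83.0),
    (11, -13, 84.4),
    (11, -11, 83.7),
    (13, -15, 84.1),
]

# The best parameter pair is the one with the highest accuracy.
best_log2c, best_log2g, best_rate = max(results, key=lambda r: r[2])

# Convert from the log2 scale used on the plot axes back to the actual parameters.
best_c = 2.0 ** best_log2c
best_gamma = 2.0 ** best_log2g

print(best_log2c, best_log2g, best_c, best_gamma)
```

With log2(c) = 11 and log2(gamma) = -13 this gives c = 2048 and gamma = 0.0001220703125, the values quoted in the question.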

Related

Create subplots from interactive python plot

I have some LVIS Lidar data in hdf5 format.
The data has Lat and Long co-ordinates, so I have been able to visualise them on a map using Basemap:
import h5py
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

# Read the longitude and latitude arrays from the LVIS file
f = h5py.File('ILVIS1B_GA2016_0304_R1701_043591.h5','r')
LONG = f['/LON0/']
LAT = f['/LAT0/']
X = LONG[...]
Y = LAT[...]
# Mercator map covering the survey area
m = Basemap(projection='merc',llcrnrlat=-0.5,urcrnrlat=0.5,\
llcrnrlon=9,urcrnrlon=10,lat_ts=0.25,resolution='i')
m.drawcoastlines()
m.drawcountries()
parallels = np.arange(-9.,10.,0.5)
m.drawparallels(parallels,labels=[False,True,True,False])
meridians = np.arange(-1.,1.,0.5)
m.drawmeridians(meridians,labels=[True,False,False,True])
m.drawmapboundary(fill_color='white')
# Project lon/lat to map coordinates and plot the points
x,y = m(X, Y)
scatter = plt.scatter(x,y)
m.scatter(x,y)
plt.show()
This gets me this, where the orange bands are very dense points:
The hdf5 file also has the full waveform data for each mapped point (each datapoint is a reflection detected at the sensor, as a function of time) so that each of the orange points has data associated with it like:
Ultimately, I would like to be able to click on any of the orange points and for the subsequent waveform to be displayed. I have looked into interactive plots for this and have come across a number of libraries (mpl3d, plotly etc).
I'm having some trouble getting my head around some of these and how I can get my data into the examples - my python isn't up to this level. Does anyone have any advice on where to start? Which libraries would be best suited to this? A little help to understand the basics would be appreciated.
Apologies that there is no direct question here; I'm just after some information from the knowledgeable community.
The question seems to be: How do I tackle a task I have no clue how to solve?
Step 1: Search for a possible solution. It may happen that someone else has already solved your problem. This will mostly not be the case, but you may be lucky.
Step 2: Abstract the task. What would be the general problem that a lot of people might have and for which there might be a solution? Does it need to be hdf5 files? No. Is georeferencing important? Maybe, but one could neglect it for the moment. Which requirements are really important, and which are not?
Step 3: Search again. You will have more success now for finding similar or related problems.
Step 4: Look at the tools in use. Make a list of possible tools and check against your requirements. Interactivity, Application or web-based, accuracy etc.
Step 5: Decide for one tool and go for it. Start with a general case study. Can I plot a map on the left and a graph on the right side using this tool? If not, find out why - maybe there is a general problem with this, maybe there is just an implementation detail missing. At this point you may ask a question about the case study problem, specifying the tool in use and providing the code that gives the problem. Do not think about your actual problem until this is solved.
Step 6: Proceed and try to add interactivity. Can I get something to happen when clicking? Again, treat this independently of the actual problem. Search for solutions and, if there are none, ask a question about it.
Step 7: Proceed further up to the point where you're truly stuck. Now is the time to finally ask a question here, but with all the details that have brought you to step 7 included in the question.
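As one possible outcome of steps 5 and 6, here is a minimal sketch using matplotlib's pick events: a scatter plot on the left and, on the right, the data belonging to the clicked point. The data is synthetic (no hdf5 file, no Basemap), and the names build_figure and show_waveform are made up for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

def build_figure(x, y, waveforms):
    """Scatter plot on the left, waveform of the clicked point on the right."""
    fig, (ax_map, ax_wave) = plt.subplots(1, 2, figsize=(10, 4))
    ax_map.scatter(x, y, picker=True)  # picker=True makes the points clickable
    (line,) = ax_wave.plot([], [])     # empty line, filled in on click

    def show_waveform(i):
        w = waveforms[i]
        line.set_data(np.arange(len(w)), w)
        ax_wave.relim()                # recompute data limits for the new line
        ax_wave.autoscale_view()
        ax_wave.set_title(f"waveform of point {i}")
        fig.canvas.draw_idle()

    def on_pick(event):
        # event.ind holds the indices of all points under the cursor; take the first
        show_waveform(int(event.ind[0]))

    fig.canvas.mpl_connect("pick_event", on_pick)
    return fig, line, show_waveform

# Synthetic stand-in data (not LVIS data): 50 points, each with a 100-sample waveform
rng = np.random.default_rng(0)
x = rng.uniform(9.0, 10.0, 50)
y = rng.uniform(-0.5, 0.5, 50)
waveforms = rng.normal(size=(50, 100))

fig, line, show_waveform = build_figure(x, y, waveforms)
# plt.show()  # uncomment when running interactively, then click a point
```

Once this works on toy data, the synthetic arrays can be swapped for the projected map coordinates and the waveform array from the file.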

How can we compare two plots?

Suppose we have two similar plots.
Plot1 (already published in a paper)
Plot2 (calculated by using any software)
My question is: How can I compare my calculated plot (pdf, png, jpeg, etc.) with the plot in the paper?
Thank You
To the best of my knowledge, there is currently no software that would enable you to re-convert images into their nominal data.
However, it's not that hard to write a piece of code that does it.
Here are the steps (at a high level):
1. Extract the images from the PDF document (use iText).
2. Separate out those images that look like a plot. You can train a neural network to do this, or you can simply look for images that contain a lot of white (assuming that's the background) and have some straight lines in black (assuming that's the foreground). Or do this step manually.
3. Once you have the image(s), extract the axis information. I'll assume a simple line plot. Extract the minimum x and y values and the maximum x and y values.
4. Separate out the colors of the lines in your line plot and get their exact pixel coordinates. Then, using the axis information, scale them back to their original data points.
5. Apply some kind of smoothing algorithm, e.g. Savitzky-Golay.
If you ever use this data in another paper, please mention that you gathered this data by approximation of their graph. Make it clear you did not use the original source data.
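The rescaling step, from pixel coordinates back to data values, can be sketched as below. This assumes linear axes and that you have already read off which pixel positions correspond to the axis minima and maxima. The function name and the calibration numbers are hypothetical.

```python
def pixel_to_data(px, py, px_range, py_range, x_range, y_range):
    """Linearly map a pixel position to data coordinates.

    px_range = (pixel column of x_min, pixel column of x_max)
    py_range = (pixel row of y_min, pixel row of y_max); image rows grow
               downward, so the row of y_min is usually the larger number.
    x_range, y_range = (min, max) axis values read off the plot.
    """
    (px0, px1), (py0, py1) = px_range, py_range
    (x0, x1), (y0, y1) = x_range, y_range
    x = x0 + (px - px0) * (x1 - x0) / (px1 - px0)
    y = y0 + (py - py0) * (y1 - y0) / (py1 - py0)
    return x, y

# Hypothetical calibration: the x axis runs 0..10 between pixel columns 100 and 600,
# the y axis runs 0..1 between pixel rows 400 (bottom) and 50 (top).
curve_pixels = [(100, 400), (350, 225), (600, 50)]
curve_data = [
    pixel_to_data(px, py, (100, 600), (400, 50), (0.0, 10.0), (0.0, 1.0))
    for px, py in curve_pixels
]
print(curve_data)
```

For a logarithmic axis, the same mapping would be applied to the logarithm of the axis values instead.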
Reading material:
https://developers.itextpdf.com/examples
https://en.wikipedia.org/wiki/Savitzky%E2%80%93Golay_filter
https://docs.oracle.com/javase/tutorial/2d/images/index.html

Getting started with data visualization. What is a good 'hello world' type of project?

I have been gaining interest in data visualization lately. I especially enjoy articles with narrative driven data-viz like the ones in http://polygraph.cool/ for example.
What would be a great 'hello world' project to learn about conveying information effectively through data viz? I'm not sure where to start.
Thanks!
Two subreddits come to mind. Here you can find some nice applications of data visualization, and here you can keep up to date with datasets that get published. Put those two together and you can come up with some novel ideas. Looking forward to seeing your stuff in /r/dataisbeautiful!
How about starting with a data density app?
If you search on my name and "data density" you'll find some routines on the web, but that would be cheating. The way the system works is to take the reciprocal of the squared distance from each sample pixel to each data point, plus a fudge factor that keeps the value from blowing up when the pixel is very close to a data point, and sum these contributions. That gives you the density of a 2D scatterplot.
You then need a nice visual representation of a linear scale, using colours to represent value changes. I'll give you those, I have several colour palettes at
http://www.malcolmmclean.site11.com/www/datadensity/colourschemes.c
http://www.malcolmmclean.site11.com/www/datadensity/colourschemes.h
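The density computation described above (reciprocal of squared distance plus a fudge factor) could look roughly like this in NumPy. This is my own minimal reading of the description, not the routines referred to in the answer, and the fudge value of 1.0 is an arbitrary choice.

```python
import numpy as np

def data_density(points, width, height, fudge=1.0):
    """Density image: for each pixel, sum 1 / (d^2 + fudge) over all data points."""
    ys, xs = np.mgrid[0:height, 0:width]        # pixel row and column coordinates
    density = np.zeros((height, width))
    for px, py in points:
        d2 = (xs - px) ** 2 + (ys - py) ** 2    # squared distance to this data point
        density += 1.0 / (d2 + fudge)           # fudge keeps the value finite at d = 0
    return density

# Three sample points in pixel coordinates (x, y); two of them are close together
points = [(2, 2), (3, 2), (7, 6)]
density = data_density(points, width=10, height=8)
print(density.shape)
```

The resulting array can then be mapped through a colour palette, for example with matplotlib's imshow.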

Visualizing a large data series

I have a seemingly simple problem, but an easy solution is eluding me. I have a very large series (tens or hundreds of thousands of points), and I just need to visualize it at different zoom levels, but generally zoomed well out. Basically, I want to plot it in a tool like Matlab or Pyplot, but knowing that each pixel can't represent the potentially many hundreds of points that map to it, I'd like to see both the min and the max of all the array entries that map to a pixel, so that I can generally understand what's going on. Is there a simple way of doing this?
Try hexbin. By setting the reduce_C_function I think you can get what you want. Ex:
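import matplotlib.pyplot as plt
import numpy as np
x, y = np.random.rand(2, 10000)  # sample points; replace with your data
C = np.sin(6 * x) * y            # C = f(x,y), the value attached to each point
plt.hexbin(x, y, C=C, reduce_C_function=np.max)
plt.show()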
would give you a hexagonal heatmap where the color in the pixel is the maximum value in the bin.
If you only want to bin in one direction, see this method.
The first option you may want to try is Gephi: https://gephi.org/
Here is another option, though I'm not quite sure it will work. It's hard to say without seeing the data.
Try going to this link: http://bl.ocks.org/3887118. Do you see data.tsv with all of the values toward the bottom of the page? If you can save your data in a similar format, the HTML code above it should be able to render your data as in the scatter plot example shown at that link.
Otherwise, try visiting this link to fashion your data to a more appropriate web page.
There are a set of research tools called TimeSearcher 1--3 that provide some examples of how to deal with large time-series datasets. Below are some example images from TimeSearcher 2 and 3.
I realized that simple plot() in MATLAB actually gives me more or less what I want. When zoomed out, it renders all of the datapoints that map to a pixel column as vertical line segments from the minimum to the maximum within the set, so as not to obscure the function's actual behavior. I used area() to increase the contrast.
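For tools that do not do this automatically, the same min/max-per-pixel reduction can be computed by hand. A minimal NumPy sketch, assuming the number of bins does not exceed the number of samples (the function name is made up):

```python
import numpy as np

def minmax_envelope(values, n_bins):
    """Split the series into n_bins consecutive chunks; return each chunk's min and max."""
    values = np.asarray(values)
    chunks = np.array_split(values, n_bins)
    lo = np.array([c.min() for c in chunks])
    hi = np.array([c.max() for c in chunks])
    return lo, hi

# Synthetic series standing in for the real data
rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20, 100_000)) + 0.1 * rng.normal(size=100_000)

# One bin per pixel column, e.g. for a plot area that is 800 pixels wide
lo, hi = minmax_envelope(series, n_bins=800)
```

The envelope can then be drawn with something like plt.fill_between(np.arange(len(lo)), lo, hi), which shades the region between the per-bin minimum and maximum.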

Drawing cartograms with Matplotlib?

In case somebody doesn't know: A cartogram is a type of map where some country/region-dependent numeric property scales the respective regions so that that property's density is (close to) constant. An example is
from worldmapper.org. In this example, countries are scaled according to their population, resulting in near-constant population density.
Needless to say, this is really cool. Does anyone know of a Matplotlib-based library for drawing such maps? The method used at worldmapper.org is described in (1), so it would surprise me if no one has implemented this yet...
I'm also interested in hearing about other cartogram libraries, even if they're not made for Matplotlib.
(1) Michael T. Gastner and M. E. J. Newman,
Diffusion-based method for producing density-equalizing maps,
Proc. Nat. Acad. Sci. USA, 101, 7499-7504 (2004). Available at arXiv.
There's this, though it's based on a different algorithm (and though it's on the ESRI site, it doesn't require ArcGIS). Of course, once you have the cartogram you can plot it in matplotlib.
Here is a Javascript plugin to make cartograms using D3. It is a good, simple solution if you are not too concerned about the regions being sized accurately. If accuracy is important, there are other options available that give you more freedom to play with the algorithm's parameters to get to a more accurate result.
Here are two great standalone programs I know of:
Scapetoad
Carto3F
Scapetoad is very easy to use. Just give it a shapefile, tell it which attribute to use for the scaling, and set a few accuracy parameters. If there is any doubt, this post describes the process.
Carto3F is more complex and allows for greater accuracy, though it is a bit trickier to figure out - lots of parameter settings without much documentation explaining them.
There is also a QGIS cartogram plugin, written in Python. I have not been able to get it to work, though, so I cannot comment on that one.
In short, no. But Newman has an excellent little implementation of his and Gastner's method on his website. Installing it is easy and it works from the command line. Here's an example of a workflow using this software that worked for me.
Compute a grid of density estimates over some region, e.g. in Python. Store it as a matrix of numbers.
Run the cart program with your density matrix as input, either from the command line or as a subprocess in Python.
The program returns a list of new coordinates for each grid point.
Pipe your shapefile points through the interp program and into a new shapefile to get the transformed map.
There are nice instructions on the main page.
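The first step of this workflow, computing the density grid and storing it as a matrix, might look like the sketch below. The point data is synthetic, the small offset that avoids empty cells is a choice made for this example, and the command line in the last comment reflects my recollection of the program's usage, so check it against the instructions on the main page before relying on it.

```python
import os
import tempfile
import numpy as np

def density_grid(x, y, weights, xsize, ysize, extent, offset=1e-3):
    """Bin weighted points onto a grid with ysize rows and xsize columns."""
    (xmin, xmax), (ymin, ymax) = extent
    grid, _, _ = np.histogram2d(
        y, x, bins=(ysize, xsize),
        range=((ymin, ymax), (xmin, xmax)), weights=weights,
    )
    # Small positive offset so that no cell has exactly zero density
    # (an assumption of this sketch, not a documented requirement).
    return grid + offset

# Synthetic points with a weight each (e.g. population), inside the unit square
rng = np.random.default_rng(0)
x, y = rng.uniform(0.0, 1.0, size=(2, 1000))
weights = rng.uniform(1.0, 10.0, size=1000)

xsize, ysize = 64, 32
grid = density_grid(x, y, weights, xsize, ysize, extent=((0.0, 1.0), (0.0, 1.0)))

# Store the matrix as plain text, one grid row per line
path = os.path.join(tempfile.mkdtemp(), "density.dat")
np.savetxt(path, grid)

# Assumed invocation (verify against the program's own instructions):
# subprocess.run(["./cart", str(xsize), str(ysize), path, "cart_output.dat"], check=True)
```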
There is the geoplot.cartogram function in geoplot ("Geoplot: geospatial data visualization", geoplot 0.2.0), which describes itself as a high-level Python geospatial plotting library and an extension to cartopy and matplotlib.
Try this library if you are using geopandas. It is quick and doesn't require much customization: https://github.com/mthh/cartogram_geopandas