mplcursors on multiaxis graph - matplotlib

In my program, im using mplcursors on a matplotlib graph so I can identify certain points precisely.
mplcursors.cursor(multiple=True).connect("add", lambda sel: sel.annotation.draggable(False))
Now I made a complex graph with multiple axis:
first = 1
offset = 60
for x in range(len(cat_list)):
if "Time" not in cat_list[x]:
if first and not cat_list[x].startswith("EngineSpeed"):
parasites[x] = ParasiteAxes(host, sharex = host)
host.parasites.append(parasites[x])
parasites[x].axis["right"].set_visible(True)
parasites[x].set_ylabel(cat_list[x])
parasites[x].axis["right"].major_ticklabels.set_visible(True)
parasites[x].axis["right"].label.set_visible(True)
p_plot, = parasites[x].plot(t, t_num_list[x], label = cat_list[x])
#parasites[x].axis["right"+str(x+1)].label.set_color(p_plot.get_color())
parasites[x].axis["right"].label.set_color(p_plot.get_color())
first = 0
elif not cat_list[x].startswith("EngineSpeed"):
parasites[x] = ParasiteAxes(host, sharex = host)
host.parasites.append(parasites[x])
parasites[x].set_ylabel(cat_list[x])
new_axisline = parasites[x].get_grid_helper().new_fixed_axis
parasites[x].axis["right"+str(x+1)] = new_axisline(loc = "right",
axes = parasites[x],
offset = (offset, 0))
p_plot, = parasites[x].plot(t, t_num_list[x])
parasites[x].axis["right"+str(x+1)].label.set_color(p_plot.get_color())
offset = offset + 60
host.legend()
fig.add_axes(host)
plt.show()
This code results in the following graph:
https://i.stack.imgur.com/Wl7yC.png
Now I have to somehow be able to select certain points by selecting which axis im using. How do I make a selection menu for choosing an active axis and how do I then use mplcursors to select my points?
Thanks,
Ziga

Related

bokeh could not set initial ranges

I am a begineer in plotting graphs in bokeh. So please forgive me if this is a stupid question.
I am trying to plot a line grpah, where my data is in a dataframe and I have provided the x and y axis as lists.
But some of my data in y axis has nonetype objects in it.
when it is nonetype in "datapoints" column the corresponding "datapoint_count" has a list like [1]. Otherwise the "dataponts" colums dhould have a list of 20 floats and corresponding datapoint_count column should have a list of 1-20 digits.
So basically I want the x axis of the graph to show a range of 1-20y axis should plot the datapoints whichill range between 90.0 - 180.0
When I am running the code there is no python error but if I go to the browser and check developer's tool it says that the bokeh could not set initial ranges.
data=df
random_figure = figure(title='random', x_axis_label="Index", y_axis_label="random [ms]",
plot_width=800, plot_height=400, output_backend="webgl")
random_figure.add_tools(random_hover)
id_values = data['testcase_id'].drop_duplicates()
data_temp= data[['id', 'datapoints']].copy()
data_temp['datapoint_count'] = None
data_temp['datapoint_count'] = data_temp['datapoint_count'].astype(object)
for indexes, item in data_temp.iterrows():
if item['datapoints'] is None or str(item['datapoints']) == '[]': # this has nonetype or strings
item['datapoints'] = [0]
else:
item['datapoints'] = [float(x) for x in item['datapoints'].strip('[').strip(']').split(',')]
iter_nr = 0
raw_data_count = []
for each in item['datapoints']:
iter_nr += 1
datapoint_count.append(iter_nr)
data_temp.at[indexes, 'datapoint_count'] = datapoint_count
name_dict_random = {'name': [], 'legend': [], 'label': []}
logging.info('START OF DRAWINGS')
for ind, id in enumerate(id_values):
it_color = Turbo256[random.randint(0, 255)]
name_glyph_random = random_figure.line(x='datapoint_count',
y='datapoints',
line_width=2,
legend_label=str(id),
source=data_temp.where(
data_temp['id'] == id).dropna(),
color=it_color)
name_dict_random['name'].append(name_glyph_random)
name_dict_random['label'].append(str(id))
logging.info('AFTER DRAWINGS LOOP')
for label in range(len(data.id.unique())):
name_dict_random['legend'].append(random_figure.legend.items[label])
initial_value = []
options = list(data.id.unique())
for i, name in enumerate(options):
options[i] = str(name)
for i in range(len(options)):
if name_dict_random['label'][i] in initial_value:
name_dict_random['name'][i].visible = True
name_dict_random['legend'][i].visible = True
else:
name_dict_random['name'][i].visible = False
name_dict_random['legend'][i].visible = False
I have solved it now.
Actually though the dataframe showed that the rows content arrays they were actually categorized as objects.
Bokeh could not understand what to do with object in the axis.
So now I have referred them wih iloc:
x = data[data['id'] == id]['datapoint_count'].iloc[0]
y = data[data['id'] == id]['datapoint'].iloc[0]
name_glyph_handover = handover_figure.line(x=x, y=y, line_width=2,
legend_label=str(id), color=it_color)

Confidence Interval for large dataset

I would like to get a confidence interval for very large datasets. It is composed by around 700,000 points for x and y. I also tried to use less data, like 200 points, and with that it is possible to plot. But, when it comes to my specific datasets, it does not show the confidence interval.
For that, my code is based on:
x_x = np.array(y_test[:, 0]) #about 700,000 points
y_y = np.array(y_pred[:, 0]) #about 700,000 points
sns.set(style = 'whitegrid')
p = sns.FacetGrid(d, size = 4, aspect = 1.5)
p.map(plt.scatter, 'x_x', 'y_y', color = 'red')
p.map(sns.regplot, 'x_x', 'y_y', scatter = False, ci = 95,
fit_reg = True, color = 'blue')
p.map(sns.regplot, 'x_x', 'y_y', scatter = False, ci = 0,
fit_reg = True, color = 'darkgreen')
And also the Figure so far:

matplotlib x-axis ticks dates formatting and locations

I've tried to duplicate plotted graphs originally created with flotr2 for pdf output with matplotlib. I must say that flotr is way easyer to use... but that aside - im currently stuck at trying to format the dates /times on x-axis to desired format, which is hours:minutes with interval of every 2 hours, if period on x-axis is less than one day and year-month-day format if period is longer than 1 day with interval of one day.
I've read through numerous examples and tried to copy them, but outcome remains the same which is hours:minutes:seconds with 1 to 3 hour interval based on how long is the period.
My code:
colorMap = {
'speed': '#3388ff',
'fuel': '#ffaa33',
'din1': '#3bb200',
'din2': '#ff3333',
'satellites': '#bfbfff'
}
otherColors = ['#00A8F0','#C0D800','#CB4B4B','#4DA74D','#9440ED','#800080','#737CA1','#E4317F','#7D0541','#4EE2EC','#6698FF','#437C17','#7FE817','#FBB117']
plotMap = {}
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import matplotlib.dates as dates
fig = plt.figure(figsize=(22, 5), dpi = 300, edgecolor='k')
ax1 = fig.add_subplot(111)
realdata = data['data']
keys = realdata.keys()
if 'speed' in keys:
speed_index = keys.index('speed')
keys.pop(speed_index)
keys.insert(0, 'speed')
i = 0
for key in keys:
if key not in colorMap.keys():
color = otherColors[i]
otherColors.pop(i)
colorMap[key] = color
i += 1
label = u'%s' % realdata[keys[0]]['name']
ax1.set_ylabel(label)
plotMap[keys[0]] = {}
plotMap[keys[0]]['label'] = label
first_dates = [ r[0] for r in realdata[keys[0]]['data']]
date_range = first_dates[-1] - first_dates[0]
ax1.xaxis.reset_ticks()
if date_range > datetime.timedelta(days = 1):
ax1.xaxis.set_major_locator(dates.WeekdayLocator(byweekday = 1, interval=1))
ax1.xaxis.set_major_formatter(dates.DateFormatter('%Y-%m-%d'))
else:
ax1.xaxis.set_major_locator(dates.HourLocator(byhour=range(24), interval=2))
ax1.xaxis.set_major_formatter(dates.DateFormatter('%H:%M'))
ax1.xaxis.grid(True)
plotMap[keys[0]]['plot'] = ax1.plot_date(
dates.date2num(first_dates),
[r[1] for r in realdata[keys[0]]['data']], colorMap[keys[0]], xdate=True)
if len(keys) > 1:
first = True
for key in keys[1:]:
if first:
ax2 = ax1.twinx()
ax2.set_ylabel(u'%s' % realdata[key]['name'])
first = False
plotMap[key] = {}
plotMap[key]['label'] = u'%s' % realdata[key]['name']
plotMap[key]['plot'] = ax2.plot_date(
dates.date2num([ r[0] for r in realdata[key]['data']]),
[r[1] for r in realdata[key]['data']], colorMap[key], xdate=True)
plt.legend([value['plot'] for key, value in plotMap.iteritems()], [value['label'] for key, value in plotMap.iteritems()], loc = 2)
plt.savefig(path +"node.png", dpi = 300, bbox_inches='tight')
could someone point out why im not getting desired results, please?
Edit1:
I moved the formatting block after the plotting and seem to be getting better results now. They are still now desired results though. If period is less than day then i get ticks after every 2 hours (interval=2), but i wish i could get those ticks at even hours not uneven hours. Is that possible?
if date_range > datetime.timedelta(days = 1):
xax.set_major_locator(dates.DayLocator(bymonthday=range(1,32), interval=1))
xax.set_major_formatter(dates.DateFormatter('%Y-%m-%d'))
else:
xax.set_major_locator(dates.HourLocator(byhour=range(24), interval=2))
xax.set_major_formatter(dates.DateFormatter('%H:%M'))
Edit2:
This seemed to give me what i wanted:
if date_range > datetime.timedelta(days = 1):
xax.set_major_locator(dates.DayLocator(bymonthday=range(1,32), interval=1))
xax.set_major_formatter(dates.DateFormatter('%Y-%m-%d'))
else:
xax.set_major_locator(dates.HourLocator(byhour=range(0,24,2)))
xax.set_major_formatter(dates.DateFormatter('%H:%M'))
Alan
You are making this way harder on your self than you need to. matplotlib can directly plot against datetime objects. I suspect your problem is you are setting up the locators, then plotting, and the plotting is replacing your locators/formatters with the default auto versions. Try moving that block of logic about the locators to below the plotting loop.
I think that this could replace a fair chunk of your code:
d = datetime.timedelta(minutes=2)
now = datetime.datetime.now()
times = [now + d * j for j in range(500)]
ax = plt.gca() # get the current axes
ax.plot(times, range(500))
xax = ax.get_xaxis() # get the x-axis
adf = xax.get_major_formatter() # the the auto-formatter
adf.scaled[1./24] = '%H:%M' # set the < 1d scale to H:M
adf.scaled[1.0] = '%Y-%m-%d' # set the > 1d < 1m scale to Y-m-d
adf.scaled[30.] = '%Y-%m' # set the > 1m < 1Y scale to Y-m
adf.scaled[365.] = '%Y' # set the > 1y scale to Y
plt.draw()
doc for AutoDateFormatter
I achieved what i wanted by doing this:
if date_range > datetime.timedelta(days = 1):
xax.set_major_locator(dates.DayLocator(bymonthday=range(1,32), interval=1))
xax.set_major_formatter(dates.DateFormatter('%Y-%m-%d'))
else:
xax.set_major_locator(dates.HourLocator(byhour=range(0,24,2)))
xax.set_major_formatter(dates.DateFormatter('%H:%M'))

Storing plot objects in a list

I asked this question yesterday about storing a plot within an object. I tried implementing the first approach (aware that I did not specify that I was using qplot() in my original question) and noticed that it did not work as expected.
library(ggplot2) # add ggplot2
string = "C:/example.pdf" # Setup pdf
pdf(string,height=6,width=9)
x_range <- range(1,50) # Specify Range
# Create a list to hold the plot objects.
pltList <- list()
pltList[]
for(i in 1 : 16){
# Organise data
y = (1:50) * i * 1000 # Get y col
x = (1:50) # get x col
y = log(y) # Use natural log
# Regression
lm.0 = lm(formula = y ~ x) # make linear model
inter = summary(lm.0)$coefficients[1,1] # Get intercept
slop = summary(lm.0)$coefficients[2,1] # Get slope
# Make plot name
pltName <- paste( 'a', i, sep = '' )
# make plot object
p <- qplot(
x, y,
xlab = "Radius [km]",
ylab = "Services [log]",
xlim = x_range,
main = paste("Sample",i)
) + geom_abline(intercept = inter, slope = slop, colour = "red", size = 1)
print(p)
pltList[[pltName]] = p
}
# close the PDF file
dev.off()
I have used sample numbers in this case so the code runs if it is just copied. I did spend a few hours puzzling over this but I cannot figure out what is going wrong. It writes the first set of pdfs without problem, so I have 16 pdfs with the correct plots.
Then when I use this piece of code:
string = "C:/test_tabloid.pdf"
pdf(string, height = 11, width = 17)
grid.newpage()
pushViewport( viewport( layout = grid.layout(3, 3) ) )
vplayout <- function(x, y){viewport(layout.pos.row = x, layout.pos.col = y)}
counter = 1
# Page 1
for (i in 1:3){
for (j in 1:3){
pltName <- paste( 'a', counter, sep = '' )
print( pltList[[pltName]], vp = vplayout(i,j) )
counter = counter + 1
}
}
dev.off()
the result I get is the last linear model line (abline) on every graph, but the data does not change. When I check my list of plots, it seems that all of them become overwritten by the most recent plot (with the exception of the abline object).
A less important secondary question was how to generate a muli-page pdf with several plots on each page, but the main goal of my code was to store the plots in a list that I could access at a later date.
Ok, so if your plot command is changed to
p <- qplot(data = data.frame(x = x, y = y),
x, y,
xlab = "Radius [km]",
ylab = "Services [log]",
xlim = x_range,
ylim = c(0,10),
main = paste("Sample",i)
) + geom_abline(intercept = inter, slope = slop, colour = "red", size = 1)
then everything works as expected. Here's what I suspect is happening (although Hadley could probably clarify things). When ggplot2 "saves" the data, what it actually does is save a data frame, and the names of the parameters. So for the command as I have given it, you get
> summary(pltList[["a1"]])
data: x, y [50x2]
mapping: x = x, y = y
scales: x, y
faceting: facet_grid(. ~ ., FALSE)
-----------------------------------
geom_point:
stat_identity:
position_identity: (width = NULL, height = NULL)
mapping: group = 1
geom_abline: colour = red, size = 1
stat_abline: intercept = 2.55595281266726, slope = 0.05543539319091
position_identity: (width = NULL, height = NULL)
However, if you don't specify a data parameter in qplot, all the variables get evaluated in the current scope, because there is no attached (read: saved) data frame.
data: [0x0]
mapping: x = x, y = y
scales: x, y
faceting: facet_grid(. ~ ., FALSE)
-----------------------------------
geom_point:
stat_identity:
position_identity: (width = NULL, height = NULL)
mapping: group = 1
geom_abline: colour = red, size = 1
stat_abline: intercept = 2.55595281266726, slope = 0.05543539319091
position_identity: (width = NULL, height = NULL)
So when the plot is generated the second time around, rather than using the original values, it uses the current values of x and y.
I think you should use the data argument in qplot, i.e., store your vectors in a data frame.
See Hadley's book, Section 4.4:
The restriction on the data is simple: it must be a data frame. This is restrictive, and unlike other graphics packages in R. Lattice functions can take an optional data frame or use vectors directly from the global environment. ...
The data is stored in the plot object as a copy, not a reference. This has two
important consequences: if your data changes, the plot will not; and ggplot2 objects are entirely self-contained so that they can be save()d to disk and later load()ed and plotted without needing anything else from that session.
There is a bug in your code concerning list subscripting. It should be
pltList[[pltName]]
not
pltList[pltName]
Note:
class(pltList[1])
[1] "list"
pltList[1] is a list containing the first element of pltList.
class(pltList[[1]])
[1] "ggplot"
pltList[[1]] is the first element of pltList.
For your second question: Multi-page pdfs are easy -- see help(pdf):
onefile: logical: if true (the default) allow multiple figures in one
file. If false, generate a file with name containing the
page number for each page. Defaults to ‘TRUE’.
For your main question, I don't understand if you want to store the plot inputs in a list for later processing, or the plot outputs. If it is the latter, I am not sure that plot() returns an object you can store and retrieve.
Another suggestion regarding your second question would be to use either Sweave or Brew as they will give you complete control over how you display your multi-page pdf.
Have a look at this related question.

Error when generating pdf using script in R

I'm using R to loop through the columns of a data frame and make a graph of the resulting analysis. I don't get any errors when the script runs, but it generates a pdf that cannot be opened.
If I run the content of the script, it works fine. I wondered if there is a problem with how quickly it is looping through, so I tried to force it to pause. This did not seem to make a difference. I'm interested in any suggestions that people have, and I'm also quite new to R so suggestions as to how I can improve the approach are welcome too. Thanks.
for (i in 2:22) {
# Organise data
pop_den_z = subset(pop_den, pop_den[i] != "0") # Remove zeros
y = pop_den_z[,i] # Get y col
x = pop_den_z[,1] # get x col
y = log(y) # Log transform
# Regression
lm.0 = lm(formula = y ~ x) # make linear model
inter = summary(lm.0)$coefficients[1,1] # Get intercept
slop = summary(lm.0)$coefficients[2,1] # Get slope
# Write to File
a = c(i, inter, slop)
write(a, file = "C:/pop_den_coef.txt", ncolumns = 3, append = TRUE, sep = ",")
## Setup pdf
string = paste("C:/LEED/results/Images/R_graphs/Pop_den", paste(i-2), "City.pdf")
pdf(string, height = 6, width = 9)
p <- qplot(
x, y,
xlab = "Radius [km]",
ylab = "Population Density [log(people/km)]",
xlim = x_range,
main = "Analysis of Cities"
)
# geom_abline(intercept,slope)
p + geom_abline(intercept = inter, slope = slop, colour = "red", size = 1)
Sys.sleep(5)
### close the PDF file
dev.off()
}
The line should be
print(p + geom_abline(intercept = inter, slope = slop, colour = "red", size = 1))
In pdf devices, ggplot (and lattice) only writes to file when explicitly printed.