Visualising results from 'effects' package - data-visualization

Any ideas on how to control the tick-label size? For example, in the following code I'd like to control the font size of the words 'Male' and 'Female' on the x-axis and of the units of 'volunteer' on the y-axis.
Any ideas on how to plot prediction intervals rather than confidence intervals would also be much appreciated.
Code from here.
library(effects);
library(stats);
mod.cowles <- glm(volunteer ~ sex + neuroticism*extraversion, data=Cowles, family=binomial);
eff.cowles <- allEffects(mod.cowles, xlevels=list(neuroticism=0:24, extraversion=seq(0, 24, 6)));
plot(eff.cowles, par.strip.text = list(cex = 1.2), xlab=list(cex=2.8), cex=2.5 #par.settings=list(scales=list(cex=1.4),#doesn't work.
#par.scales=list(cex=1.4),#doesn't work.
#scales=list(cex=1.4),#doesn't work
#pscales=list(cex=1.5)#doesn't work.
)

On your first question, I think you're looking for
plot(eff.cowles, cex.lab=.4)
The cex.lab argument is probably what you need.
I don't know an easy way to do prediction intervals rather than confidence intervals, although I'm sure it's possible.

Related

How do I display all of my x axis labels in my bar plot in ggplot2?

So I have my graph made, which is really great. I am now just trying to get every x-axis label to show (the Patient column). Is there a way for me to do this in ggplot? (Please note that Patient is just 1-67.) I have tried a couple of other methods, like las or +theme(legend.key.size=unit(1,"in"),legend.title=element_text(size=22),legend.text = element_text(size=17))+axis(1,at=midpts,labels=names(Patient)), but neither has worked. Any advice is appreciated!
ggplot(data=V5.ACE2.double.replacement.and.redo.of.AUC.calculation.CSV.file,
       mapping=aes(x=Patient,y=Fluorescent.sum.over.240.min,fill=Top.20.),las=2)+
  geom_bar(stat='identity')+
  theme(panel.grid.major=element_blank(),panel.grid.minor=element_blank(),panel.background=element_blank())+
  labs(x="\nPatient Sample")+
  labs(y="\nFluorescence sum over 240 min")+
  theme(axis.title.x=element_text(size=26))+
  theme(axis.title.y=element_text(size=26))+
  theme(axis.text=element_text(size=18))+
  theme(legend.key.size=unit(1,"in"),legend.title=element_text(size=22),legend.text = element_text(size=17))
Your x-axis is the numeric patient ID and it's getting configured as a continuous scale. It sounds like you want a categorical scale. Turn the Patient column into a factor with something like this:
library(tidyverse)
V5.ACE2.double.replacement.and.redo.of.AUC.calculation.CSV.file <- V5.ACE2.double.replacement.and.redo.of.AUC.calculation.CSV.file %>% mutate(Patient = as_factor(Patient))
And you'll get a categorical axis.

How to fix the y-axis scale in a plot

I am fitting a line to estimate the slope of my graphs. The data sets are the same size. But look at these two pictures: the first one seems to have the larger slope, but that's not true; the second one has the larger slope. Because the y-axes have different scales, the first one only looks steeper. Is there any way to fix the scale of the y-axis so that I can see by eye which one has the bigger slope?
code:
x = np.array(list(range(0,df.shape[0]))) # = array([0, 1, 2, ..., 3598, 3599, 3600])
df1[skill]=pd.to_numeric(df1[skill])
fit = np.polyfit(x, df1[skill], 1)
fit_fn = np.poly1d(fit)
df['fit_fn(x)']=fit_fn(x)
df[['Hodrick-Prescott filter',skill,'fit_fn(x)']].plot(title=skill + date)
Two ways:
One, use matplotlib.pyplot.axis to get the axis limits of the first figure, then use the same function to set the second figure to those limits. (You could also use get_ylim and set_ylim, which are specific to the y-axis but require a direct reference to the Axes object.)
Two, plot both in a subplots figure and set the sharey argument to True (my preference, depending on the intended use).
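A minimal sketch of the second option, assuming two made-up noisy series (series_a and series_b are hypothetical stand-ins for the asker's data, not taken from the question). With sharey=True both panels get the same y-axis limits, so the steeper fit also looks steeper:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: two noisy series with different underlying slopes.
rng = np.random.default_rng(0)
x = np.arange(200)
series_a = 0.02 * x + rng.normal(0, 0.5, x.size)  # shallow slope
series_b = 0.10 * x + rng.normal(0, 0.5, x.size)  # steeper slope

# sharey=True puts both panels on the same y-axis limits.
fig, (ax_a, ax_b) = plt.subplots(1, 2, sharey=True, figsize=(8, 3))
for ax, y, name in ((ax_a, series_a, "series_a"), (ax_b, series_b, "series_b")):
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 least-squares fit
    ax.plot(x, y, label=name)
    ax.plot(x, slope * x + intercept, label=f"fit, slope={slope:.3f}")
    ax.legend()
```

For two separate figures, the first option amounts to something like ax_second.set_ylim(ax_first.get_ylim()), where ax_first and ax_second are the two Axes objects.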

"Zoom in" on a violinplot whilst keeping accurate quartile lines (matplotlib/seaborn)

TL;DR: How can I get a subrange of a violinplot whilst keeping accurate quartile lines?
I am using seaborn violinplots to make static charts for a report, but as far as I can tell, there's no way to redraw a particular area between limits whilst retaining the 25/median/75 quartile lines of the original dataset.
Here's my example dataset as a violin. The 25/median/75 values are left side: 1.0/5.0/9.0; right side: 2.0/5.0/9.0
My data has such a long tail that all the useful info is scrunched up into a tiny area. I want to ignore (but not throw away) the tail and show a closer look at the interesting bit.
I tried to reset the ylim using ax.set(ylim=(0, upp)), but the resultant graph is not great: it's jaggy and the inner lines don't meet the violin edge.
Is there a way to reset the y-axis limits but get a better quality result?
Next I tried to cut off the tail by dropping values from the dataset. I dropped anything over the 97th centile. The violin looks way better, but the quartile lines have been recalculated for this new dataset. They're showing a median of about 4, not 5 as per the original dataset.
I'm using inner="quartile", so the code that gets called in Seaborn is _ViolinPlotter::draw_quartiles
def draw_quartiles(self, ax, data, support, density, center, split=False):
    """Draw the quartiles as lines at width of density."""
    q25, q50, q75 = np.percentile(data, [25, 50, 75])
    self.draw_to_density(ax, center, q25, support, density, split,
                         linewidth=self.linewidth,
                         dashes=[self.linewidth * 1.5] * 2)
As you can see, it assumes (understandably) that one wants to draw the quartile lines at percentiles 25, 50 and 75. It'd be amazeballs if there was a way I could call draw_to_density with my own values (is there?).
At the moment, I am attempting to manually adjust the position of the lines. It's trivial to figure out & set the y-values:
for l in ax.lines:
    l.set_ydata(<get correct quartile value from original dataset>)
but I'm finding it hard to figure out the limits for x, i.e. the density of the distribution at the quartiles. It seems to involve gaussian kde, and tbh it's getting hacky and inelegant at this point. Is there an easy way to calculate how long each line should be?
What do you suggest?
Thanks for your help
Lnr
With thanks to @JohanC.
I added gridsize=1000 to the params of the violinplot and used ax.set(ylim=(0, upp)) to resize the y-axis to show the range from 0 to upp, where upp is the upper limit. Much prettier lookin' graph.
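A rough sketch of that fix, assuming a made-up long-tailed dataset (the column names group and value, and the choice of the 97th centile for upp, are illustrative assumptions). No rows are dropped, so the quartile lines are still computed from the full data; gridsize=1000 only makes the density outline finer, so it stays smooth after zooming:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical long-tailed data in two groups.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "group": np.repeat(["a", "b"], 500),
    "value": rng.lognormal(mean=1.5, sigma=1.0, size=1000),
})

# Upper display limit: here the 97th centile of the full data (an assumption).
upp = float(np.percentile(df["value"], 97))

fig, ax = plt.subplots()
# A finer density grid keeps the outline smooth once the axis is zoomed in.
sns.violinplot(data=df, x="group", y="value",
               inner="quartile", gridsize=1000, ax=ax)
# Zoom by changing the view limits only; the data behind the violin is untouched.
ax.set(ylim=(0, upp))
```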

Constructing a bubble trellis plot with lattice in R

First off, this is a homework question. The problem is ex. 2.6 from pg.26 of An Introduction to Applied Multivariate Analysis. It's laid out as:
Construct a bubble plot of the earthquake data using latitude and longitude as the scatterplot and depth as the circles, with greater depths giving smaller circles. In addition, divide the magnitudes into three equal ranges and label the points in your bubble plot with a different symbol depending on the magnitude group into which the point falls.
I have figured out that symbols, which is in base graphics, does not work well with lattice. Also, I haven't figured out whether lattice has the functionality to change symbol size (i.e. bubble size). I bought the lattice book in a fit of desperation last night, and as I see in some of the examples, it is possible to set symbol color and shape for each "cut" or panel. I am therefore working under the assumption that symbol size can also be manipulated, but I haven't been able to figure out how.
My code looks like:
plot(xyplot(lat ~ long | cut(mag, 3), data=quakes,
layout=c(3,1), xlab="Longitude", ylab="Latitude",
panel = function(x,y){
grid.circle(x,y,r=sqrt(quakes$depth),draw=TRUE)
}
))
Where I attempt to use the grid package to draw the circles, but when this executes, I just get a blank plot. Could anyone please point me in the right direction? I would be very grateful!
Here is some code for creating the plot that you need without using the lattice package. I obviously had to generate my own fake data, so you can disregard all of that and go straight to the plotting commands if you want.
####################################################################
#Pseudo Data
n = 20
latitude = sample(1:100,n)
longitude = sample(1:100,n)
depth = runif(n,0,.5)
magnitude = sample(1:100,n)
groups = rep(NA,n)
for(i in 1:n){
if(magnitude[i] <= 33){
groups[i] = 1
}else if (magnitude[i] > 33 & magnitude[i] <=66){
groups[i] = 2
}else{
groups[i] = 3
}
}
####################################################################
#The actual code for generating the plot
plot(latitude[groups==1],longitude[groups==1],col="blue",pch=19,ylim=c(0,100),xlim=c(0,100),
xlab="Latitude",ylab="Longitude")
points(latitude[groups==2],longitude[groups==2],col="red",pch=15)
points(latitude[groups==3],longitude[groups==3],col="green",pch=17)
points(latitude[groups==1],longitude[groups==1],col="blue",cex=1/depth[groups==1])
points(latitude[groups==2],longitude[groups==2],col="red",cex=1/depth[groups==2])
points(latitude[groups==3],longitude[groups==3],col="green",cex=1/depth[groups==3])
You just need to add default.units = "native" to grid.circle()
plot(xyplot(lat ~ long | cut(mag, 3), data=quakes,
layout=c(3,1), xlab="Longitude", ylab="Latitude",
panel = function(x,y){
grid.circle(x,y,r=sqrt(quakes$depth),draw=TRUE, default.units = "native")
}
))
Obviously you need to tinker with some of the settings to get what you want.
I have written a package called tactile that adds a function for producing bubbleplots using lattice.
tactile::bubbleplot(depth ~ lat*long | cut(mag, 3), data=quakes,
layout=c(3,1), xlab="Longitude", ylab="Latitude")

Interpolate in one direction

I have sampled data and plot it with imshow():
I would like to interpolate along the horizontal axis only, so that I can more easily distinguish samples and spot features.
Is it possible to interpolate in just one direction with MPL?
Update:
SciPy has a whole package with various interpolation methods.
I used the simplest one, interp1d, as suggested by tcaswell:
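import numpy as np
from scipy import interpolate

def smooth_inter_fun(r):
    # linear interpolation along one row, resampled at 10x the original density
    s = interpolate.interp1d(np.arange(len(r)), r)
    xnew = np.arange(0, len(r)-1, .1)
    return s(xnew)
new_data = np.vstack([smooth_inter_fun(r) for r in data])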
Linear and cubic results:
As expected :)
This tutorial covers a range of the interpolation methods available in numpy/scipy. If you want to interpolate in just one direction, I would work on each row independently and then re-assemble the results. You might also be interested in simply smoothing your data (for example: Python Smooth Time Series Data; Using strides for an efficient moving average filter).
def smooth_inter_fun(r):
    # whatever process you want to use
    return r  # placeholder: return the processed row
new_data = np.vstack([smooth_inter_fun(r) for r in data])
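As one possible way to fill in the per-row process above, here is a small sketch that swaps in a plain moving average (smoothing, not interpolation) and applies it to each row before re-stacking. The function name moving_average_row, the window size, and the toy data array are all hypothetical:

```python
import numpy as np

def moving_average_row(r, window=3):
    """Smooth one row with an unweighted moving average of the given window."""
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the window fits entirely,
    # so each output row has len(r) - window + 1 samples.
    return np.convolve(r, kernel, mode="valid")

# Toy 2-row "image"; in practice this would be the array passed to imshow().
data = np.array([[0.0, 3.0, 6.0, 9.0, 12.0],
                 [1.0, 1.0, 4.0, 1.0, 1.0]])

# Process each row independently, then re-assemble into a 2-D array.
new_data = np.vstack([moving_average_row(r) for r in data])
```

Because only the rows are processed, the vertical direction is left untouched, which is the same row-by-row pattern as the interp1d version.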