R: Is it possible to render more precise outline weights in ggplot2? - ggplot2

Create a simple stacked bar chart:
require(ggplot2)
g <- ggplot(mpg, aes(class)) +
geom_bar(aes(fill = drv, size=ifelse(drv=="4",2,1)),color="black",width=.5)
I can change the size values in size=ifelse(drv=="4",2,1) to lots of different values and I still get the same two line weights. Is there a way around this? (2,1) produces the same chart as (1.1,1) and (10,1). Ideally the thicker weight should be just a bit thicker than the standard outline, rather than ~10x bigger.
As added background, you can set size outside of aes() and have outline width scale as you'd expect, but I can't can't assign the ifelse condition outside of aes() without getting an error.

The two size values you give as aesthetics are just just arbitrary levels that are not taken at face value by ggplot.
You can use
ggplot(mpg, aes(class)) +
geom_bar(
aes(fill = drv, size = drv == "4"),
color = "black", width = .5) +
scale_size_manual(values = c(1, 2))
Where the values parameter allows you to specify the precise sizes for each level you define in the asesthetics.

Related

create separate legend for each row of facet_grid

I am using nested_facet() to plot a large number of experiments, yielding a 9x6 array. Each panel in the array is a stacked bargraph with variables indicated by color, same set of variables common to each row.
My code...
ggplot(data, aes(x=enzyme_drug, y=counts, fill = species)) +
geom_bar(position = "fill", stat = "identity") +
theme(axis.text.x = element_text(angle = 90)) +
guides(fill=guide_legend(ncol=1)) +
facet_grid(pH ~ combo, scales = "free")
The only problem with putting them all together into a facet grid is that the number of variables indicated by color in the figure legend is too high, resulting in adjacent colors that are hard to differentiate.
An easy way out of this would be to create a separate legend for each row, with the small number of variables for each row being much easier to differentiate.
Any ideas on how to do this?
The other alternative I realise is to loop the creation of separate facet_grids into a list and then put them together with ggarrange() - this yields unwieldy x-axis labels repeated for each row, although I could remove the x-axes manually for each row in the arrangement there must be a simpler method?
Thanks in advance...

how fix the y-axis's rate in plot

I am using a line to estimate the slope of my graphs. the data points are in the same size. But look at these two pictures. the first one seems to have a larger slope but its not true. the second one has larger slope. but since the y-axis has different rate, the first one looks to have a larger slope. is there any way to fix the rate of y-axis, then I can see with my eye which one has bigger slop?
code:
x = np.array(list(range(0,df.shape[0]))) # = array([0, 1, 2, ..., 3598, 3599, 3600])
df1[skill]=pd.to_numeric(df1[skill])
fit = np.polyfit(x, df1[skill], 1)
fit_fn = np.poly1d(fit)
df['fit_fn(x)']=fit_fn(x)
df[['Hodrick-Prescott filter',skill,'fit_fn(x)']].plot(title=skill + date)
Two ways:
One, use matplotlib.pyplot.axis to get the axis limits of the first figure and set the second figure to have the same axis limits (using the same function) (could also use get_ylim and set_ylim, which are specific to the y-axis but require directly referencing the Axes object)
Two, plot both in a subplots figure and set the argument sharey to True (my preferred, depending on the desired use)

How to fill a line in 2D image along a given radius with the data in a given line image?

I want to fill a 2D image along its polar radius, the data are stored in a image where each row or column corresponds to the radius in target image. How can I fill the target image efficiently? Such as with iradius or some functions? I do not prefer a pix-pix operation.
Are you looking for something like this?
number maxR = 100
image rValues := realimage("I(r)",4,maxR)
rValues = 10 + trunc(100*random())
image plot :=realimage("Ring",4,2*maxR,2*maxR)
rValues.ShowImage()
plot.ShowImage()
plot = rValues.warp(iradius,0)
You might also want to check out the relevant example code from the F1 help documentation of GMS itself:
Explaining warp a bit:
plot = rValues.warp(iradius,0)
Assigns values to plot based on a value-lookup in rValues.
For each pixel in plot a coordinate position in rValues is computed, and the value is simply looked up. If the computed coordinate is non-integer, bilinear interpolation between the 4 closest points is used.
In the example, the two 'formulas' for the coordinate calculation are simple x' = iradius and y' = 0 where iradius is an expression computed from the coordinate in plot, for convenience.
You can feed any expression into the parameters for warp( ) and the command is closely related to just using the square bracket notation of addressing values. In fact, the only difference is that warp performs the bilinear interpolation of values instead of truncating the coordinates to integer values.

How to start with the first axis tick at the origin in ggplot

I would like to always start with the first axis tick at the origin, such as in this image:
Playing around with xlim, expand_limits or scale_x_continuous doesn't really work - there's always a little spacing between first axis tick and origin. There should be an easy way to do this. Many thanks in advance.
Unfortunately, ggplot2 is set up expand the axes and thus pad the axis ticks, and there's not a simple function to force them to be flush at the origin. I find the best way to get axis ticks at the origin is to use coord_cartesian() to turn off axis expansion and manually specify limits
require(ggplot2)
ggplot(mtcars, aes(wt, mpg)) +
geom_point() +
theme_classic() +
coord_cartesian(expand = FALSE, #turn off axis expansion (padding)
xlim = c(1, 6), ylim = c(10, 35)) #manually set limits
I prefer to define limits in coord_cartesian() rather than in scale_y_...() and scale_x_...() because coord_cartesian() crops the view but does not alter the underlying data. scale...() functions remove data outside the limits, which could mess up fitted lines (stat_smooth()) or summary stats if you crop out an outlier.
The solution that worked for me is to set the option expand=c(0,0) together with the limits option, e.g.:
scale_y_continuous(expand = c(0, 0), limits = c(10,NA))
Y axis then starts at the origin with the first tick with label 10.

pyplot scatter plot marker size

In the pyplot document for scatter plot:
matplotlib.pyplot.scatter(x, y, s=20, c='b', marker='o', cmap=None, norm=None,
vmin=None, vmax=None, alpha=None, linewidths=None,
faceted=True, verts=None, hold=None, **kwargs)
The marker size
s:
size in points^2. It is a scalar or an array of the same length as x and y.
What kind of unit is points^2? What does it mean? Does s=100 mean 10 pixel x 10 pixel?
Basically I'm trying to make scatter plots with different marker sizes, and I want to figure out what does the s number mean.
This can be a somewhat confusing way of defining the size but you are basically specifying the area of the marker. This means, to double the width (or height) of the marker you need to increase s by a factor of 4. [because A = WH => (2W)(2H)=4A]
There is a reason, however, that the size of markers is defined in this way. Because of the scaling of area as the square of width, doubling the width actually appears to increase the size by more than a factor 2 (in fact it increases it by a factor of 4). To see this consider the following two examples and the output they produce.
# doubling the width of markers
x = [0,2,4,6,8,10]
y = [0]*len(x)
s = [20*4**n for n in range(len(x))]
plt.scatter(x,y,s=s)
plt.show()
gives
Notice how the size increases very quickly. If instead we have
# doubling the area of markers
x = [0,2,4,6,8,10]
y = [0]*len(x)
s = [20*2**n for n in range(len(x))]
plt.scatter(x,y,s=s)
plt.show()
gives
Now the apparent size of the markers increases roughly linearly in an intuitive fashion.
As for the exact meaning of what a 'point' is, it is fairly arbitrary for plotting purposes, you can just scale all of your sizes by a constant until they look reasonable.
Edit: (In response to comment from #Emma)
It's probably confusing wording on my part. The question asked about doubling the width of a circle so in the first picture for each circle (as we move from left to right) it's width is double the previous one so for the area this is an exponential with base 4. Similarly the second example each circle has area double the last one which gives an exponential with base 2.
However it is the second example (where we are scaling area) that doubling area appears to make the circle twice as big to the eye. Thus if we want a circle to appear a factor of n bigger we would increase the area by a factor n not the radius so the apparent size scales linearly with the area.
Edit to visualize the comment by #TomaszGandor:
This is what it looks like for different functions of the marker size:
x = [0,2,4,6,8,10,12,14,16,18]
s_exp = [20*2**n for n in range(len(x))]
s_square = [20*n**2 for n in range(len(x))]
s_linear = [20*n for n in range(len(x))]
plt.scatter(x,[1]*len(x),s=s_exp, label='$s=2^n$', lw=1)
plt.scatter(x,[0]*len(x),s=s_square, label='$s=n^2$')
plt.scatter(x,[-1]*len(x),s=s_linear, label='$s=n$')
plt.ylim(-1.5,1.5)
plt.legend(loc='center left', bbox_to_anchor=(1.1, 0.5), labelspacing=3)
plt.show()
Because other answers here claim that s denotes the area of the marker, I'm adding this answer to clearify that this is not necessarily the case.
Size in points^2
The argument s in plt.scatter denotes the markersize**2. As the documentation says
s : scalar or array_like, shape (n, ), optional
size in points^2. Default is rcParams['lines.markersize'] ** 2.
This can be taken literally. In order to obtain a marker which is x points large, you need to square that number and give it to the s argument.
So the relationship between the markersize of a line plot and the scatter size argument is the square. In order to produce a scatter marker of the same size as a plot marker of size 10 points you would hence call scatter( .., s=100).
import matplotlib.pyplot as plt
fig,ax = plt.subplots()
ax.plot([0],[0], marker="o", markersize=10)
ax.plot([0.07,0.93],[0,0], linewidth=10)
ax.scatter([1],[0], s=100)
ax.plot([0],[1], marker="o", markersize=22)
ax.plot([0.14,0.86],[1,1], linewidth=22)
ax.scatter([1],[1], s=22**2)
plt.show()
Connection to "area"
So why do other answers and even the documentation speak about "area" when it comes to the s parameter?
Of course the units of points**2 are area units.
For the special case of a square marker, marker="s", the area of the marker is indeed directly the value of the s parameter.
For a circle, the area of the circle is area = pi/4*s.
For other markers there may not even be any obvious relation to the area of the marker.
In all cases however the area of the marker is proportional to the s parameter. This is the motivation to call it "area" even though in most cases it isn't really.
Specifying the size of the scatter markers in terms of some quantity which is proportional to the area of the marker makes in thus far sense as it is the area of the marker that is perceived when comparing different patches rather than its side length or diameter. I.e. doubling the underlying quantity should double the area of the marker.
What are points?
So far the answer to what the size of a scatter marker means is given in units of points. Points are often used in typography, where fonts are specified in points. Also linewidths is often specified in points. The standard size of points in matplotlib is 72 points per inch (ppi) - 1 point is hence 1/72 inches.
It might be useful to be able to specify sizes in pixels instead of points. If the figure dpi is 72 as well, one point is one pixel. If the figure dpi is different (matplotlib default is fig.dpi=100),
1 point == fig.dpi/72. pixels
While the scatter marker's size in points would hence look different for different figure dpi, one could produce a 10 by 10 pixels^2 marker, which would always have the same number of pixels covered:
import matplotlib.pyplot as plt
for dpi in [72,100,144]:
fig,ax = plt.subplots(figsize=(1.5,2), dpi=dpi)
ax.set_title("fig.dpi={}".format(dpi))
ax.set_ylim(-3,3)
ax.set_xlim(-2,2)
ax.scatter([0],[1], s=10**2,
marker="s", linewidth=0, label="100 points^2")
ax.scatter([1],[1], s=(10*72./fig.dpi)**2,
marker="s", linewidth=0, label="100 pixels^2")
ax.legend(loc=8,framealpha=1, fontsize=8)
fig.savefig("fig{}.png".format(dpi), bbox_inches="tight")
plt.show()
If you are interested in a scatter in data units, check this answer.
You can use markersize to specify the size of the circle in plot method
import numpy as np
import matplotlib.pyplot as plt
x1 = np.random.randn(20)
x2 = np.random.randn(20)
plt.figure(1)
# you can specify the marker size two ways directly:
plt.plot(x1, 'bo', markersize=20) # blue circle with size 10
plt.plot(x2, 'ro', ms=10,) # ms is just an alias for markersize
plt.show()
From here
It is the area of the marker. I mean if you have s1 = 1000 and then s2 = 4000, the relation between the radius of each circle is: r_s2 = 2 * r_s1. See the following plot:
plt.scatter(2, 1, s=4000, c='r')
plt.scatter(2, 1, s=1000 ,c='b')
plt.scatter(2, 1, s=10, c='g')
I had the same doubt when I saw the post, so I did this example then I used a ruler on the screen to measure the radii.
I also attempted to use 'scatter' initially for this purpose. After quite a bit of wasted time - I settled on the following solution.
import matplotlib.pyplot as plt
input_list = [{'x':100,'y':200,'radius':50, 'color':(0.1,0.2,0.3)}]
output_list = []
for point in input_list:
output_list.append(plt.Circle((point['x'], point['y']), point['radius'], color=point['color'], fill=False))
ax = plt.gca(aspect='equal')
ax.cla()
ax.set_xlim((0, 1000))
ax.set_ylim((0, 1000))
for circle in output_list:
ax.add_artist(circle)
This is based on an answer to this question
If the size of the circles corresponds to the square of the parameter in s=parameter, then assign a square root to each element you append to your size array, like this: s=[1, 1.414, 1.73, 2.0, 2.24] such that when it takes these values and returns them, their relative size increase will be the square root of the squared progression, which returns a linear progression.
If I were to square each one as it gets output to the plot: output=[1, 2, 3, 4, 5]. Try list interpretation: s=[numpy.sqrt(i) for i in s]