Final Editing of a grid.arranged ggplot - ggplot2

I will try to explain my problems, but perhaps there are too many, so I don't know where to start. And I am running out of time :(
I have tested the ability of fungi to alter plastic surfaces at 2 different timepoints and in two batches. The method of surface investigation was ATR-FT-IR. I now have spectral IR data from 4 different substrates, each exposed to 5 different fungi for two different times. Every sample was measured 10 times (nearly always, but sadly not in every case). Naturally, I also ran control samples (no fungi and no treatment; sample treated but without fungi), again for the two batches. So for each substrate I end up with around 140 columns and 1820 rows. I reduced the data to the respective means and standard deviations in Excel and imported it as .xlsx, because .csv failed completely and I could not figure out why.
> head(pet)
Wavenumbers MEAN_PET_untreated SD_PET_untreated MEAN_c_PET_B1_AL1 SD_PET_B1_AL1 MEAN_c_PET_B1_AL2 SD_c_PET_B1_AL2
1 3997.805 8.021747e-05 0.0003198024 -5.862401e-05 0.0002445300 0.0001309613 0.0004636534
2 3995.877 7.575977e-05 0.0003168603 -4.503153e-05 0.0002384142 0.0001185064 0.0004360579
3 3993.948 7.713719e-05 0.0003169468 -3.218230e-05 0.0002414230 0.0001145128 0.0004352532
4 3992.020 7.847460e-05 0.0003191443 -3.255098e-05 0.0002519945 0.0001258732 0.0004388980
5 3990.091 7.835603e-05 0.0003159916 -4.792059e-05 0.0002617358 0.0001325122 0.0004465352
6 3988.163 7.727790e-05 0.0003063113 -6.286794e-05 0.0002593732 0.0001297744 0.0004532126
My goal was a multiplot showing averaged spectral data with geom_path and geom_ribbon per fungus, yielding 5 elements per plot (pure substrate, control t1, control t2, fungus treatment 1, fungus treatment 2). The dataset is really large, so I had problems handling it and created these plots manually, i.e. NOT by faceting.
F4<-ggplot(pet)+
geom_errorbar(aes(x = Wavenumbers, y = MEAN_c_PET_B2_AL2, ymin = MEAN_c_PET_B2_AL2 - SD_c_PET_B2_AL2, ymax = MEAN_c_PET_B2_AL2 + SD_c_PET_B2_AL2, group=1), alpha= .1, stat="identity", position = "identity", colour="red")+
geom_path(aes(x = Wavenumbers, y = MEAN_c_PET_B2_AL2), stat="identity", group= 1, colour= "red")+
geom_errorbar(aes(x = Wavenumbers, y = MEAN_c_PET_B2_AL1 ,ymax = MEAN_c_PET_B2_AL1 + SD_c_PET_B2_AL1, ymin = MEAN_c_PET_B2_AL1 - SD_c_PET_B2_AL1, group=1), alpha= .1, stat="identity", position = "identity", colour="purple")+
geom_path(aes(x = Wavenumbers, y = MEAN_c_PET_B2_AL1), stat="identity", group= 1, colour= "purple")+
geom_errorbar(aes(x = Wavenumbers, y = MEAN_PET_untreated, ymax = MEAN_PET_untreated + SD_PET_untreated, ymin = MEAN_PET_untreated - SD_PET_untreated, group=1), alpha= .1, stat="identity", position = "identity", colour="yellow")+
geom_path(aes(x = Wavenumbers, y = MEAN_PET_untreated), stat="identity", group= 1, colour= "yellow")+
geom_errorbar(aes(x = Wavenumbers, y = MEAN_F4_PET_B2_AL1, ymax = MEAN_F4_PET_B2_AL1 + SD_F4_PET_B2_AL1, ymin = MEAN_F4_PET_B2_AL1 - SD_F4_PET_B2_AL1, group=1), alpha= .1, stat="identity", position = "identity", colour="orange")+
geom_path(aes(x = Wavenumbers, y = MEAN_F4_PET_B2_AL1), stat="identity", group= 1, colour= "orange")+
geom_errorbar(aes(x = Wavenumbers, y = MEAN_F4_PET_B2_AL2, ymax = MEAN_F4_PET_B2_AL2 + SD_F4_PET_B2_AL2, ymin = MEAN_F4_PET_B2_AL2 - SD_F4_PET_B2_AL2, group=1), alpha= .1, stat="identity", position = "identity", colour="darkgreen")+
geom_path(aes(x = Wavenumbers, y = MEAN_F4_PET_B2_AL2), stat="identity", group= 1, colour= "darkgreen")+xlab(NULL)+ylab(NULL)+
scale_x_reverse(limits=c(4000 , 500))
So far I have combined the different ggplots with:
pets<-grid.arrange(F1, F2, F7,F4, F19, ncol = 1, nrow = 5)
ggsave("Multi.pdf", width = 210, height = 297, units = "mm", pets)
This is nearly fine. It is not elegant and very complicated, but I won't give up at this stage, as it has cost me a whole week. Sadly, I am not really happy with the design; in fact, I cannot use it as it is. Currently, I am trying to find solutions for:
a) Getting rid of the empty grid areas to the left and right of the plotted values. I use scale_x_reverse(limits=c(4000 , 500)), but the range is still extended on both sides of the x axis.
b) Creating a legend manually, because even if it were possible to do this via a shared legend or similar, it would always yield too many elements. I only want 5 elements with the same, always repeating colors (red = substrate pure, orange = cT_t1, yellow = cT_t2, green = f_t1, purple = f_t2).
c) Manually creating a y label (Absorbance) that spans all plots vertically. I tried labeling only the 3rd plot in the middle, but this indents that plot, so the ones above and below no longer line up with it on the left. If this were possible, I could use the per-plot labels to indicate the respective fungus (e.g. F4).
d) Creating a global x label, because if I label only the last element, the height of the last plot is reduced by the height of the label.
e) Giving it an overall title.
What also makes me nervous is that I get a warning only for geom_path, telling me that 1 row was removed. But shouldn't this also affect the geom_ribbon? Does it have something to do with the fact that I have to call the ribbon BEFORE I call geom_path? Otherwise, the lines would be hidden by the ribbon.
Removed 1 row(s) containing missing values (geom_path).
I am also wondering about the long execution time. One element needs 20 seconds, and the whole plot takes 2 minutes to compute. But at least it does not crash like Excel did before, including data loss. Is this normal for such large datasets? Or could it indicate a serious problem?
OK, finally, I hope someone out there has had to find similar workarounds. Because, like I said, I am not willing to spend another week on tidyr or reshape or mutate or whatever.
Thanks in advance! :)

Related

Calculating the size of an object using opencv and numpy poly1d

I'm looking to use a small numpy array to generate a curve that I can use to predict the height measurement at unknown points. I have several points that I am using to create a poly1d. I know it's possible: we use software at work that does it just fine, and when I used a different image as a test, plugging the values into Excel and getting the polynomial, it worked fine. But on a different image that can be calibrated, I get drastically different results.
Here is the image that I'm trying to measure.
The stick on the front of the pole contains known measurements. From bottom to top, they are 3'6" (42"), 6'6" (78"), 9'8" (116"), and 13' (156").
The picture has been through opencv undistort with a calibrated camera.
This is the function that actually performs the logic. x and y are gathered by cv2 EVENT_LBUTTONUP, and sent to this function.
Checking the lengths of the array is just to help me figure out why this isn't working, trying to generate a line to show the curve fit.
dist = self.firstClick-y
self.yData.append(dist)
if len(self.yData) > 4:
    print(self.poly(dist))
if len(self.yData) == 4:
    array = np.array(self.xData)
    array = np.expand_dims(array, axis=0)
    print(self.xData)
    print(self.yData)
    array=np.append(array, [self.yData], axis=0)
    print(array)
    x = array[:,0]
    y = array[:,1]
    self.poly = np.poly1d(np.polyfit(x, y, 2))
    poly1d = np.poly1d(self.poly)
    xp = np.linspace(-2, 20, 1)
    _ = plt.plot(x, y, '.', xp, self.poly(xp), '-', xp, self.poly(xp), '--')
    plt.ylim(0,200)
    plt.show()
When I run this code, my values quickly go into the tens of thousands when I attempt to collect the measurement at 18' 11" (the lowest wire).
Any help would be appreciated, I've been up all night trying to fit this curve.
Edit:
Sorry, I should have included the code used to display and scale the image.
self.img = cv2.imread(imagePath, cv2.IMREAD_ANYCOLOR)
self.scale_percent = 30
self.width = int(self.img.shape[1] * self.scale_percent/100)
self.height = int(self.img.shape[0] * self.scale_percent/100)
dsize = (self.width, self.height)
self.output = cv2.resize(self.img, dsize)
img = self.output
cv2.imshow('image', img)
cv2.setMouseCallback('image', self.click_event)
cv2.waitKey()
I just called this function to display the image and the below code to calibrate the values.
if self.firstClick == 0:
    self.firstClick = y
    cv2.putText(self.output, "Pole Base", (x, y), font, 1, (255, 255, 0), 2)
    cv2.imshow('image', self.output)
elif self.firstClick != 0 and self.secondClick == 0:
    self.secondClick = y
    print("The difference in first and second clicks is", self.firstClick - self.secondClick)
    first = self.firstClick - self.secondClick
    inch = first/42
    foot = inch*12
    self.foot = foot
    print("One foot is currently: ", foot)
    self.firstLine = 3.5*12
    self.secondLine = 6.5*12
    self.thirdLine = 9.67*12
    self.fourthLine = 13*12
    self.xData = np.array([self.firstLine, self.secondLine, self.thirdLine, self.fourthLine])
    self.yData.append(self.firstLine)
    print(self.firstLine)
    print(self.secondLine)
    print(self.thirdLine)
    print(self.fourthLine)
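A minimal sketch of the calibration fit described above, assuming the goal is a polynomial that maps a pixel distance from the pole base to a real height in inches. The four known heights come from the question; the pixel distances are hypothetical placeholder values, not measurements from the image. Two things in the posted snippet look worth checking against this sketch: `array[:,0]` and `array[:,1]` select the first two columns of a 2x4 array rather than its two rows, so the fit sees only two mixed values per axis, and `np.linspace(-2, 20, 1)` produces a single sample, so no curve can be drawn from it.

```python
import numpy as np

# Known heights of the marks on the calibration stick, in inches (from the question).
known_inches = np.array([42.0, 78.0, 116.0, 156.0])

# HYPOTHETICAL pixel distances from the pole base to each mark.
# Replace these with the values collected from the mouse clicks.
pixel_dist = np.array([100.0, 180.0, 260.0, 340.0])

# Fit a quadratic: pixel distance (x) -> height in inches (y).
# Both inputs are 1-D arrays of equal length.
coeffs = np.polyfit(pixel_dist, known_inches, 2)
to_inches = np.poly1d(coeffs)

# How well the curve reproduces the calibration marks.
fitted = to_inches(pixel_dist)
max_residual = float(np.max(np.abs(fitted - known_inches)))

# Sample the curve densely inside the calibrated range for plotting.
xp = np.linspace(pixel_dist.min(), pixel_dist.max(), 50)
curve = to_inches(xp)
```

Note that the lowest wire at 18' 11" (227") lies well above the highest calibration mark at 156", so any reading there is an extrapolation, and a quadratic can drift quickly outside the range it was fitted on.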

Why is there a space between the bars and the axis in ggplot2 bar graphs, and how do I get rid of it?

I've been building a bar graph in R, and I noticed a problem. Whenever the graph is made, it has a very small gap between the bars and the axis that lets a line of the background image show through. How can I get rid of this?
Code:
album_cover <- image_read("https://i.scdn.co/image/ab67616d0000b273922a12ba0b5a66f034dc9959")
ggplot(data=album_df, aes(x=rev(factor(track_names, track_names)), y=-1 * track_length)) +
ggtitle("Songs vs length")+
annotation_custom(rasterGrob(album_cover,
width = unit(1,"npc"),
height = unit(1,"npc")),
-Inf, Inf, -Inf, Inf)+
#geom_image(image = "https://i.scdn.co/image/ab67616d0000b273922a12ba0b5a66f034dc9959", size = Inf) +
geom_bar(stat="identity", position = "identity", color = 'NA', alpha = 0.9, width = 1, fill = 'white') +
scale_y_continuous(expand = c(0, 0), limits = c(-1 * max_track, 0)) +
scale_x_discrete(expand = c(0, 0)) +
theme(axis.title.x=element_blank(),
axis.title.y=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank()
) +
coord_flip()
Interesting issue. I've tried many things, including modification of many of the theme elements. It works with theme_void(), but then the issue resurfaces as you add back in the plot elements (namely the song titles on the axis, for some reason).
What finally did work is squishing the image width to ever so slightly less than 1 npc. In this case, just changing the width from 1 to 0.999 fixes the issue, and you no longer have the strip of the image hanging out on the right. For this, I made up my own data, but I'm using the same image:
df <- data.frame(
track_names=paste0('Song',1:8),
track_length=c(3.5,7.5,5,3,7,10,6,7.4)
)
album_cover <- image_read2("https://i.scdn.co/image/ab67616d0000b273922a12ba0b5a66f034dc9959")
ggplot(data=df, aes(x=track_names, y=-1*track_length)) +
annotation_custom(rasterGrob(album_cover,
width=unit(0.999,'npc'), height=unit(1,'npc')),
xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=Inf) +
geom_col(alpha=0.9, width=1, fill='white', color=NA) +
scale_y_continuous(expand=c(0,0)) +
scale_x_discrete(expand=c(0,0)) +
ggtitle('Songs vs Length') +
coord_flip()
Note that the same code gives the image below when width=unit(1, 'npc') is used in the rasterGrob() call (note the line at the right side of the image):

Set annotation for same coordinate points matplotlib

I have 12 different points, and 10 of them are related to the first two. I want to set a label for each of these 10 points individually, but sometimes two or more of them have the same coordinates, and I still want to show all the labels for that coordinate (not on top of each other, but readable).
As you can see in the picture below, two sets of points have the same coordinates and their labels overlap.
booleanFunction = np.array(["K","I","H" ,"G", "F", "E" , "D" , "M", "B", "A"])
pointsx = np.empty((rs.shape[1],1))
pointsy = np.empty((rs.shape[1],1))
....
....
....
pl.figure()
# pl.hold(True) is not needed: it was removed in matplotlib 3.0, where holding is always on
pl.plot(X1, Y1, 'ro', X2, Y2, 'y<')
pl.plot(pointsx, pointsy, 'b3')
for i in range (len(pointsx)):
    pl.annotate(booleanFunction[i], xy=(pointsx[i], pointsy[i]), xycoords='data', textcoords='data')
In one of my scripts, to avoid annotation overlap, I do something like this:
xoffset = 0.1
switch = -1
for i in range (len(pointsx)):
    pl.annotate(booleanFunction[i], xy=(pointsx[i], pointsy[i]),
                xytext=(pointsx[i]+switch*xoffset, pointsy[i]),
                xycoords='data', textcoords='data')
    switch*=-1
This writes your annotation text alternately shifted left and right by xoffset from the point you want to annotate. Of course, you can use something similar for the y direction or for both.
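Alternating left and right only separates two labels per coordinate. A possible variation, sketched below, stacks labels that share a coordinate by giving each repeat a larger vertical offset. The helper name `label_offsets`, the 5-point base offset, and the `step` parameter are assumptions made for illustration, not part of the original answer.

```python
from collections import defaultdict


def label_offsets(points, step=12):
    """Return one (dx, dy) text offset per point, in display points.

    Points that share a coordinate get increasing dy, so their labels
    stack above each other instead of overlapping.
    """
    seen = defaultdict(int)  # how many labels were already placed per coordinate
    offsets = []
    for x, y in points:
        key = (round(x, 6), round(y, 6))  # tolerate tiny float differences
        repeat = seen[key]
        offsets.append((5, 5 + repeat * step))
        seen[key] += 1
    return offsets


# Example: the 1st, 3rd and 4th points share a coordinate.
points = [(0.0, 1.0), (1.0, 1.0), (0.0, 1.0), (0.0, 1.0)]
offsets = label_offsets(points)
```

Each offset could then be passed to matplotlib as `pl.annotate(label, xy=(x, y), xytext=offset, textcoords='offset points')`, which positions the text relative to the annotated point in points rather than in data units.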

implementing ease in update loop

I want to animate a sprite from point y1 to point y2 with some sort of deceleration. When it reaches point y2, the speed of the object should be 0, so it stops completely.
I know the two points, and I know the object's starting speed.
The animation time is not so important to me. I can decide on it if needed.
For example: y1 = 0, y2 = 400, v0 = 250 pixels per second (= starting speed)
I read about easing functions, but I didn't understand how to actually implement one in the update loop.
Here's my update loop code, with the place that should somehow implement an easing function.
-(void)onTimerTick{
    double currentTime = CFAbsoluteTimeGetCurrent();
    float timeDelta = currentTime - self.lastUpdateTime; // seconds since the last tick
    self.lastUpdateTime = currentTime;
    float pixelsToMove = ????; // here needs to be some formula using v0, timeDelta, y2, y1
    sprite.y += pixelsToMove;
}
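One possible way to fill in the missing formula, sketched here in Python and assuming constant deceleration (an assumption; the question only fixes y1, y2 and v0). With distance d = y2 - y1, the object comes to rest exactly at y2 if the deceleration is v0^2 / (2d), which takes 2d / v0 seconds. The function names are hypothetical. Rather than accumulating per-tick deltas, the sketch computes the position from the total elapsed time, which avoids drift.

```python
def stop_time(y1, y2, v0):
    """Seconds needed to come to rest at y2 under constant deceleration."""
    return 2.0 * (y2 - y1) / v0


def position_at(y1, y2, v0, elapsed):
    """Position after `elapsed` seconds; clamps at y2 once the motion is over."""
    total = stop_time(y1, y2, v0)
    t = min(max(elapsed, 0.0), total)
    decel = v0 / total  # equals v0**2 / (2 * (y2 - y1))
    return y1 + v0 * t - 0.5 * decel * t * t


# Simulated update loop with a fixed 60 Hz tick, using the numbers from the question.
y, elapsed, dt = 0.0, 0.0, 1.0 / 60.0
while elapsed < stop_time(0.0, 400.0, 250.0):
    elapsed += dt
    y = position_at(0.0, 400.0, 250.0, elapsed)
```

In the timer callback this would correspond to adding timeDelta to a running elapsed time and assigning the returned position to sprite.y directly.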
Timing functions as Bézier curves
An easing timing function is basically a cubic Bézier curve from (0,0) to (1,1) where the horizontal axis is "time" and the vertical axis is "amount of change". Since a cubic Bézier curve is mathematically defined as
start*(1-t)^3 + 3*c1*t*(1-t)^2 + 3*c2*t^2*(1-t) + end*t^3
you can insert any time value and get the amount of change that should be applied. Note that both time and change are normalized (in the range of 0 to 1).
Note that the variable t is not the time value; t is how far along the curve you have come. The time value is the x value of the point along the curve.
The curve below is a sample "ease" curve that starts off slow, goes faster and slows down in the end.
If, for example, a third of the time had passed, you would calculate the amount of change that corresponds to and update the value of the animated property as
currentValue = beginValue + amountOfChange*(endValue-beginValue)
Example
Say you are animating the position from (50, 50) to (200, 150) using a curve with control points at (0.6, 0.0) and (0.5, 0.9) and a duration of 4 seconds (the control points are trying to be close to that of the image above).
When 1 second of the animation has passed (25% of total duration) the value along the curve is:
(0.25,y) = (0,0)*(1-t)^3 + 3*(0.6,0)*t*(1-t)^2 + 3*(0.5,0.9)*t^2*(1-t) + (1,1)*t^3
This means that we can calculate t from the x components:
0.25 = 1.8*t*(1-t)^2 + 1.5*t^2*(1-t) + t^3
Solving this numerically gives t ≈ 0.16859.
If we then input that t in
y = 2.7*t^2*(1-t) + t^3
we will get the "amount of change" for when 1 second of the duration has passed.
This gives y ≈ 0.0686, which means that about 7% of the value has changed after 25% of the time. This is because the curve starts out slow (you can see in the image that it is mostly flat in the beginning).
So finally: 1 second into the animation the position is
(x, y) = (50, 50) + 0.0686 * (200-50, 150-50)
(x, y) = (50, 50) + 0.0686 * (150, 100)
(x, y) = (50, 50) + (10.29, 6.86)
(x, y) = (50+10.29, 50+6.86)
(x, y) = (60.29, 56.86)
I hope this example helps you understand how to use timing functions.
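The lookup described above can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses the standard cubic Bézier form, which carries a binomial factor of 3 on the two middle terms, it assumes both control points have x values in [0, 1] so that x increases monotonically with t, and it finds t by bisection instead of an external solver. The function names are hypothetical.

```python
def bezier(p1, p2, t):
    """One coordinate of a cubic Bezier from 0 to 1 with control values p1, p2."""
    u = 1.0 - t
    return 3.0 * p1 * t * u * u + 3.0 * p2 * t * t * u + t ** 3


def ease(progress, c1, c2, tol=1e-9):
    """Map a time fraction (0..1) to a change fraction for control points c1, c2.

    c1 and c2 are (x, y) pairs. First solve x(t) = progress for t by
    bisection, then evaluate y(t) at that t.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bezier(c1[0], c2[0], mid) < progress:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return bezier(c1[1], c2[1], t)


# The example from the text: 25% of the duration, control points (0.6, 0.0) and (0.5, 0.9).
amount = ease(0.25, (0.6, 0.0), (0.5, 0.9))
begin, end = (50.0, 50.0), (200.0, 150.0)
position = tuple(b + amount * (e - b) for b, e in zip(begin, end))
```

In an update loop, progress would be the elapsed time divided by the chosen duration, clamped to the range 0 to 1.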

Storing plot objects in a list

I asked this question yesterday about storing a plot within an object. I tried implementing the first approach (aware that I did not specify that I was using qplot() in my original question) and noticed that it did not work as expected.
library(ggplot2) # add ggplot2
string = "C:/example.pdf" # Setup pdf
pdf(string,height=6,width=9)
x_range <- range(1,50) # Specify Range
# Create a list to hold the plot objects.
pltList <- list()
pltList[]
for(i in 1 : 16){
# Organise data
y = (1:50) * i * 1000 # Get y col
x = (1:50) # get x col
y = log(y) # Use natural log
# Regression
lm.0 = lm(formula = y ~ x) # make linear model
inter = summary(lm.0)$coefficients[1,1] # Get intercept
slop = summary(lm.0)$coefficients[2,1] # Get slope
# Make plot name
pltName <- paste( 'a', i, sep = '' )
# make plot object
p <- qplot(
x, y,
xlab = "Radius [km]",
ylab = "Services [log]",
xlim = x_range,
main = paste("Sample",i)
) + geom_abline(intercept = inter, slope = slop, colour = "red", size = 1)
print(p)
pltList[[pltName]] = p
}
# close the PDF file
dev.off()
I have used sample numbers in this case so the code runs if it is just copied. I did spend a few hours puzzling over this but I cannot figure out what is going wrong. It writes the first set of pdfs without problem, so I have 16 pdfs with the correct plots.
Then when I use this piece of code:
string = "C:/test_tabloid.pdf"
pdf(string, height = 11, width = 17)
grid.newpage()
pushViewport( viewport( layout = grid.layout(3, 3) ) )
vplayout <- function(x, y){viewport(layout.pos.row = x, layout.pos.col = y)}
counter = 1
# Page 1
for (i in 1:3){
for (j in 1:3){
pltName <- paste( 'a', counter, sep = '' )
print( pltList[[pltName]], vp = vplayout(i,j) )
counter = counter + 1
}
}
dev.off()
the result I get is the last linear model line (abline) on every graph, but the data does not change. When I check my list of plots, it seems that all of them become overwritten by the most recent plot (with the exception of the abline object).
A less important secondary question was how to generate a multi-page pdf with several plots on each page, but the main goal of my code was to store the plots in a list that I could access at a later date.
Ok, so if your plot command is changed to
p <- qplot(data = data.frame(x = x, y = y),
x, y,
xlab = "Radius [km]",
ylab = "Services [log]",
xlim = x_range,
ylim = c(0,10),
main = paste("Sample",i)
) + geom_abline(intercept = inter, slope = slop, colour = "red", size = 1)
then everything works as expected. Here's what I suspect is happening (although Hadley could probably clarify things). When ggplot2 "saves" the data, what it actually does is save a data frame, and the names of the parameters. So for the command as I have given it, you get
> summary(pltList[["a1"]])
data: x, y [50x2]
mapping: x = x, y = y
scales: x, y
faceting: facet_grid(. ~ ., FALSE)
-----------------------------------
geom_point:
stat_identity:
position_identity: (width = NULL, height = NULL)
mapping: group = 1
geom_abline: colour = red, size = 1
stat_abline: intercept = 2.55595281266726, slope = 0.05543539319091
position_identity: (width = NULL, height = NULL)
However, if you don't specify a data parameter in qplot, all the variables get evaluated in the current scope, because there is no attached (read: saved) data frame.
data: [0x0]
mapping: x = x, y = y
scales: x, y
faceting: facet_grid(. ~ ., FALSE)
-----------------------------------
geom_point:
stat_identity:
position_identity: (width = NULL, height = NULL)
mapping: group = 1
geom_abline: colour = red, size = 1
stat_abline: intercept = 2.55595281266726, slope = 0.05543539319091
position_identity: (width = NULL, height = NULL)
So when the plot is generated the second time around, rather than using the original values, it uses the current values of x and y.
I think you should use the data argument in qplot, i.e., store your vectors in a data frame.
See Hadley's book, Section 4.4:
The restriction on the data is simple: it must be a data frame. This is restrictive, and unlike other graphics packages in R. Lattice functions can take an optional data frame or use vectors directly from the global environment. ...
The data is stored in the plot object as a copy, not a reference. This has two
important consequences: if your data changes, the plot will not; and ggplot2 objects are entirely self-contained so that they can be save()d to disk and later load()ed and plotted without needing anything else from that session.
There is a bug in your code concerning list subscripting. It should be
pltList[[pltName]]
not
pltList[pltName]
Note:
class(pltList[1])
[1] "list"
pltList[1] is a list containing the first element of pltList.
class(pltList[[1]])
[1] "ggplot"
pltList[[1]] is the first element of pltList.
For your second question: Multi-page pdfs are easy -- see help(pdf):
onefile: logical: if true (the default) allow multiple figures in one
file. If false, generate a file with name containing the
page number for each page. Defaults to ‘TRUE’.
For your main question, I don't understand if you want to store the plot inputs in a list for later processing, or the plot outputs. If it is the latter, I am not sure that plot() returns an object you can store and retrieve.
Another suggestion regarding your second question would be to use either Sweave or Brew as they will give you complete control over how you display your multi-page pdf.
Have a look at this related question.