Adding geom_vline for eventdate after filtering for ID adds vlines for every ID's eventdate - ggplot2

I have a large dataset of patient measurements, with repeated measurements in long format for several IDs. Each measurement has a timepoint and a date, stored in a date variable. I also record whether the ID experienced an "event"; the date of the event is stored in a second date variable. Using ggplot2, I draw a plot of the measurements over time for each individual ID, and I want to add a vertical line at the date the event happened. I first filter the data for the ID I want to plot, then add the vline at the event date. However, when I add the vline I get a line for every eventdate, including those of IDs that were filtered out.
Here is some sample data (my real data has a lot more IDs):
library(tidyverse)
sampledata <- structure(list(ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), Measure1 = c(10, 20, 0, 30, 20, 10, 2, 0, 0, 0), timepoint = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5), time = structure(c(18628, 18748, 18840, 18932, 19024, 19205, 19297, 19024, 19113, 19205), class = "Date"), event = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), eventdate = structure(c(18779, 18779, 18779, 18779, 18779, 19024, 19024, 19024, 19024, 19024), class = "Date")), row.names = c(NA, 10L), class = "data.frame")
Here is the graph for ID 1:
filter(sampledata, ID %in% 1&Measure1 !="NA") %>% ggplot(aes(x = time, y = Measure1)) +
geom_line(size=0.3,linetype="solid") +
geom_point(size=2, color="#0073C2FF") +
geom_vline(xintercept = as.numeric(as.Date(sampledata$eventdate)), linetype=1) +
theme_gray() + theme(text = element_text(size=12), axis.text=element_text(size=8), legend.position="none", axis.title.y = element_blank()) +
labs(y="ylab", x = "Follow up") +
scale_x_date(date_labels = "%Y-%m-%d", date_breaks = "2 months")
graph picture link
As you can see, I get a vertical line for ID 1's eventdate (2021-06-01), but I also get a line for ID 2's eventdate (2022-02-01).
I guess I'm doing something wrong when filtering. Any idea as to how I can achieve the graph with only the vline for the selected ID? (My next step is to loop the graph so as to do the same graph for all the IDs so I do not want to hard code anything)
Thank you!

The issue is that you passed the eventdate column from the unfiltered dataset sampledata to xintercept, so you get a vline for each eventdate in the unfiltered data.
To fix this, map the column as an aesthetic, i.e. aes(xintercept = eventdate), so that geom_vline uses the filtered data passed to ggplot(). Even then you would be drawing several overlapping vlines, because the events and event dates are repeated on every row. To avoid that, I use data = ~ distinct(.x, event, eventdate) to reduce the layer's data to the unique events and event dates.
library(tidyverse)
sampledata <- structure(list(ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), Measure1 = c(10, 20, 0, 30, 20, 10, 2, 0, 0, 0), timepoint = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5), time = structure(c(18628, 18748, 18840, 18932, 19024, 19205, 19297, 19024, 19113, 19205), class = "Date"), event = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), eventdate = structure(c(18779, 18779, 18779, 18779, 18779, 19024, 19024, 19024, 19024, 19024), class = "Date")), row.names = c(NA, 10L), class = "data.frame")
filter(sampledata, ID %in% 1 & !is.na(Measure1)) %>%
ggplot(aes(x = time, y = Measure1)) +
geom_line(size = 0.3, linetype = "solid") +
geom_point(size = 2, color = "#0073C2FF") +
geom_vline(data = ~ distinct(.x, event, eventdate), aes(xintercept = eventdate), linetype = 1) +
theme_gray() +
theme(text = element_text(size = 12), axis.text = element_text(size = 8), legend.position = "none", axis.title.y = element_blank()) +
labs(y = "ylab", x = "Follow up") +
scale_x_date(date_labels = "%Y-%m-%d", date_breaks = "2 months")
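The difference between the two calls comes down to which rows supply the event dates. As a rough illustration only, here is the same bookkeeping in pandas rather than R; the frame below is a cut-down stand-in for sampledata (only the ID and eventdate columns, with the two event dates mentioned in the question), not the asker's real data.

```python
import pandas as pd

# Cut-down stand-in for sampledata: five rows per ID, one event date per ID.
sampledata = pd.DataFrame({
    "ID": [1] * 5 + [2] * 5,
    "eventdate": pd.to_datetime(["2021-06-01"] * 5 + ["2022-02-01"] * 5),
})

# What the original call used: every event date in the unfiltered data.
unfiltered_dates = sampledata["eventdate"].unique()

# What the fixed call uses: distinct event dates of the filtered rows only.
one_id = sampledata[sampledata["ID"] == 1]
filtered_dates = one_id["eventdate"].drop_duplicates()

print(len(unfiltered_dates), "distinct dates without filtering")
print(len(filtered_dates), "distinct date after filtering for ID 1")
```

Because the dates are taken from the filtered rows, the same code works unchanged inside a loop over IDs.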


Google BigQuery Standard SQL get weight summarize result by group

Original data
structure(list(Year = c(1999, 1999, 1999, 2000, 2000, 2000),
Country = c("a", "b", "b", "a", "a", "b"), number = c(2,
3, 4, 5, 3, 6), result = c(2, 4, 5, 6, 2, 2)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -6L))
What I need is a table with the columns
Year Country weightresult
where
weightresult = result * (number / sum(number))
That is, result is weighted by number, and sum(number) is taken within each Year, Country group.
The intermediate result is
structure(list(Year = c(1999, 1999, 1999, 2000, 2000, 2000),
Country = c("a", "b", "b", "a", "a", "b"), number = c(2,
3, 4, 5, 3, 6), result = c(2, 4, 5, 6, 2, 2), weight = c(2,
7, 7, 8, 8, 6), wre = c(2, 1.71428571428571, 2.85714285714286,
3.75, 0.75, 2)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
The final result I need is
structure(list(Country = c("a", "a", "b", "b"), Year = c(1999,
2000, 1999, 2000), wre = c(2, 4.5, 4.57142857142857, 2)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -4L))
How can I get this final result in BigQuery Standard SQL? My attempt:
SELECT
Year,
Country,
(number/(SUM(number) OVER (PARTITION BY Year, Country))) * result AS wre,
Count(*),
FROM `table`
Where
Year<=2020
GROUP BY Year,Country
ORDER BY Year,Country
And the error is
SELECT list expression references column number which is neither grouped nor aggregated at ...
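You do not need the window function. Within each group, summing result * number / SUM(number) is the same as SUM(number * result) / SUM(number), so a plain GROUP BY is enough. Use below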
select
year,
country,
sum(number * result) / sum(number) as weighted_result
from your_table
where year <= 2020
group by year,country
order by year,country
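For the sample data, this returns one row per year and country, with weighted_result equal to the wre values of the desired final result (2 and 4.5714 for countries a and b in 1999, 4.5 and 2 in 2000).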
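A quick way to check that the aggregated form agrees with the row-wise weights from the question is to replay the sample data outside BigQuery. The sketch below does that in pandas; the column names weighted and wre are only illustrative, and it is a sanity check of the arithmetic, not a replacement for the query.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Year": [1999, 1999, 1999, 2000, 2000, 2000],
    "Country": ["a", "b", "b", "a", "a", "b"],
    "number": [2, 3, 4, 5, 3, 6],
    "result": [2, 4, 5, 6, 2, 2],
})
keys = ["Year", "Country"]

# Row-wise form: weight each row by number / group total, then sum per group.
group_total = df.groupby(keys)["number"].transform("sum")
df["wre"] = df["result"] * df["number"] / group_total
row_wise = df.groupby(keys)["wre"].sum()

# Aggregated form used in the query: SUM(number * result) / SUM(number).
df["weighted"] = df["number"] * df["result"]
sums = df.groupby(keys)[["weighted", "number"]].sum()
aggregated = sums["weighted"] / sums["number"]

print(aggregated)
```

Both series are indexed by (Year, Country) and hold the same values up to floating-point rounding.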

Add linetype to geom_segment legend

Aim: add the linetypes of the segments to the legend, as well as the colour.
Problem: only the colour is showing.
Data:
m = as.data.frame(matrix(c(1:10), ncol = 2, nrow = 10))
Plot:
ggplot(m, aes(v1,v2)) + geom_segment(aes(x = 0, xend = 9.75, y = 10, yend = 10, colour = "PEL"), linetype = "dotted") + geom_segment(aes(x = 0, xend = 9.75, y = 5, yend = 5, colour = "AL1"), linetype = "longdash") + geom_segment(aes(x = 0, xend = 9.75, y = 2, yend = 2, colour = "ISQG"), linetype = "solid") + scale_colour_manual("legend", values = c("PEL" = "black", "AL1" = "blue", "ISQG" = "purple"), guide = guide_legend(override.aes = list(alpha = 1))) + theme(legend.position = "bottom")
I've tried adding scale_linetype_manual(values = c("PEL" = "dotted", "AL1" = "longdash", "ISQG" = "solid")), but nothing changes.
This answer is similar: Legend linetype in ggplot, but I couldn't figure out how to make it work with geom_segment.
Thank you in advance
The most ggplot-esque way of doing this is to map linetype inside the aes() calls as well. You must then ensure that the linetype and colour scales have the same title, breaks, limits, labels etc., so that the two legends are merged into one.
Alternatively, you can also include the linetype in the override.aes part of guide_legend().
library(ggplot2)
ggplot() +
  geom_segment(
    aes(x = 0, xend = 9.75, y = 10, yend = 10, colour = "PEL", linetype = "PEL")
  ) +
  geom_segment(
    aes(x = 0, xend = 9.75, y = 5, yend = 5, colour = "AL1", linetype = "AL1")
  ) +
  geom_segment(
    aes(x = 0, xend = 9.75, y = 2, yend = 2, colour = "ISQG", linetype = "ISQG")
  ) +
  scale_colour_manual(
    "legend",
    values = c("PEL" = "black", "AL1" = "blue", "ISQG" = "purple")
  ) +
  scale_linetype_manual(
    "legend",
    values = c("PEL" = "dotted", "AL1" = "longdash", "ISQG" = "solid")
  ) +
  theme(legend.position = "bottom")
Created on 2022-05-19 by the reprex package (v2.0.1)

Wrong coloring in ggplot line graphs

I have created a ggplot graph with three lines. Each line represents a different column in a data frame and is drawn in a different color. For some reason, the colors in the final graph do not match the code.
The data frame:
Scenario 1 Scenario 2 Scenario 3 Years
0.0260 0.0340 0.0366 1
0.0424 0.0562 0.0696 2
0.0638 0.0878 0.1150 3
0.0848 0.1280 0.1578 4
0.1096 0.1680 0.2074 5
0.1336 0.2106 0.2568 6
This is the code:
ggplot(ext2, aes(x = Years))+
geom_line(aes(y = `Scenario 1`, color = "darkblue"))+
geom_line(aes(y = `Scenario 2`, color = "darkred"))+
geom_line(aes(y = `Scenario 3`, color = "darkgreen"))+
xlab("Years")+
ylab("Quasi - extinction probability")+
ggtitle("2 mature individuals")+
geom_segment(aes(x = 45,y = 0.5, xend = 45, yend = 1.1),linetype = "longdash")+
geom_segment(aes(x = 75,y = 0.2, xend = 75, yend = 0.5),linetype = "longdash")+
geom_segment(aes(x = 0,y = 0.5, xend = 100, yend = 0.5),linetype = "longdash")+
geom_segment(aes(x = 0,y = 0.2, xend = 100, yend = 0.2),linetype = "longdash")+
geom_text(x = 20, y = 0.80, label = "CE")+
geom_text(x = 40, y = 0.35, label = "EN")+
scale_colour_manual(values = c("darkblue", "darkred","darkgreen"), labels = c("Scenario 1","Scenario 2","Scenario 3"))+
theme(legend.title = element_blank())+
theme_minimal()
and this is the graph:
Click here to see graph
The problem is that what I defined as 'scenario 3' in the code is actually a representation of 'scenario 2' in the data frame. You can see it according to the values under scenario 2 in the data frame.
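The colours are mismatched because a string inside aes(), such as color = "darkblue", is treated as a category label rather than as a colour. ggplot2 orders these labels alphabetically ("darkblue", "darkgreen", "darkred"), while the unnamed values and labels in scale_colour_manual are assigned in the order given, so the line for `Scenario 3` ends up drawn in darkred and labelled "Scenario 2". For ggplot, the data should be in long format before you plot. Then you can map the scenario column (i.e., name) to colour and set the colours of the individual lines manually (i.e., with scale_colour_manual).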
library(tidyverse)
ext_long <- ext2 %>%
pivot_longer(!Years)
ggplot(ext_long, aes(x = Years, color = name)) +
geom_line(aes(y = value)) +
xlab("Years") +
ylab("Quasi - extinction probability") +
ggtitle("2 mature individuals") +
geom_segment(aes(
x = 45,
y = 0.5,
xend = 45,
yend = 1.1
), linetype = "longdash") +
geom_segment(aes(
x = 75,
y = 0.2,
xend = 75,
yend = 0.5
), linetype = "longdash") +
geom_segment(aes(
x = 0,
y = 0.5,
xend = 100,
yend = 0.5
), linetype = "longdash") +
geom_segment(aes(
x = 0,
y = 0.2,
xend = 100,
yend = 0.2
), linetype = "longdash") +
geom_text(x = 20, y = 0.80, label = "CE") +
geom_text(x = 40, y = 0.35, label = "EN") +
scale_colour_manual(
values = c("darkblue", "darkred", "darkgreen"),
labels = c("Scenario 1", "Scenario 2", "Scenario 3")
) +
# theme_minimal() is a complete theme and resets earlier theme() settings, so apply it first
theme_minimal() +
theme(legend.title = element_blank())
Output (only have a small part of the data, which is the reason the lines do not extend across the graph)
Data
ext2 <- structure(
list(
Scenario.1 = c(0.026, 0.0424, 0.0638, 0.0848,
0.1096, 0.1336),
Scenario.2 = c(0.034, 0.0562, 0.0878, 0.128,
0.168, 0.2106),
Scenario.3 = c(0.0366, 0.0696, 0.115, 0.1578,
0.2074, 0.2568),
Years = 1:6
),
class = "data.frame",
row.names = c(NA,-6L)
)
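The mismatch in the question's original code can be replayed in a few lines of plain Python. This is only a model of the matching, under the assumption that the scale sorts the labels found inside aes() alphabetically and pairs them positionally with the unnamed values and labels; it does not call ggplot2.

```python
# Labels the question put inside aes(color = ...), keyed by the column plotted.
aes_labels = {
    "Scenario 1": "darkblue",
    "Scenario 2": "darkred",
    "Scenario 3": "darkgreen",
}

# Arguments the question gave to scale_colour_manual(), both unnamed.
values = ["darkblue", "darkred", "darkgreen"]
legend_labels = ["Scenario 1", "Scenario 2", "Scenario 3"]

# Assumed model: levels are the sorted labels, matched to values by position.
levels = sorted(aes_labels.values())
drawn = {
    level: (colour, label)
    for level, colour, label in zip(levels, values, legend_labels)
}

for column, level in aes_labels.items():
    colour, label = drawn[level]
    print(f"{column}: drawn in {colour}, legend says {label}")
```

Under this model the Scenario 2 and Scenario 3 lines swap both colour and legend label, which is the symptom described in the question.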

The second description of the x-axis in ggplot2?

I am wondering if there is a way to add a second description to the x-axis in ggplot2. Here "the second description" refers to the "Sample A / Sample B / two arrows" annotation colored in red (shown in the figure).
Please click for the figure!
Of course, I can just put the "second description" using PowerPoint as I did, but I just wonder if it is possible to add it using ggplot2.
Here is the code for the background plot.
library(ggplot2)
library(ggridges)
x <- data.frame(v1=rnorm(100, mean = -2, sd = 0.022),
v2=rnorm(100, mean = -1, sd = 0.022),
v3=rnorm(100, mean = 0, sd = 0.022),
v4=rnorm(100, mean = 1, sd = 0.022),
v5=rnorm(100, mean = 2, sd = 0.022),
v6=rnorm(100, mean = 3, sd = 0.022),
v7=rnorm(100, mean = 4, sd = 0.022))
colnames(x) <- c("A",
"B",
"C",
"D",
"E",
"F",
"G")
head(x)
# Manipulate the data
library(reshape2)
data <- melt(x)
head(data)
# Generating plot
colors <- rainbow(7)
ggplot(data, aes(x = value, y = variable)) +
geom_density_ridges(aes(fill = variable), alpha=0.6, bandwidth=0.1) +
scale_fill_manual(values = colors)+
theme(axis.title = element_text(size = 12),
axis.text = element_text(size = 10),
legend.text = element_text(size = 12),
plot.title = element_text(size = 17, face = "bold",
margin = margin(b=10), hjust = 0.5),
panel.spacing = unit(0.1, "lines"),
legend.position="none") +
geom_vline(xintercept = 0, linetype="dotted") +
geom_vline(xintercept = 2, linetype="dotted",
color = "red", size=1.2) +
xlab("") +
ylab("Groups") +
labs(title = 'Density plot of each group')
Thank you in advance!
I'm not 100% sure this is what you mean, but you can add text on the x-axis using the following in labs:
labs(x="← Sample A Sample B →")
I got the arrows from unicode here: http://xahlee.info/comp/unicode_arrows.html
There are bigger arrows in the link if needed.
EDIT:
Here's your code adapted with the new labels in red font:
ggplot(data, aes(x = value, y = variable)) +
geom_density_ridges(aes(fill = variable), alpha=0.6, bandwidth=0.1) +
scale_fill_manual(values = colors)+
theme(axis.title = element_text(size = 12),
axis.text = element_text(size = 10),
legend.text = element_text(size = 12),
plot.title = element_text(size = 17, face = "bold",
margin = margin(b=10), hjust = 0.5),
panel.spacing = unit(0.1, "lines"),
legend.position="none") +
geom_vline(xintercept = 0, linetype="dotted") +
geom_vline(xintercept = 2, linetype="dotted",
color = "red", size=1.2) +
xlab("🢀 Sample A Sample B 🢂") +
theme(axis.title.x = element_text(size=40,colour = "red")) +
ylab("Groups") +
labs(title = 'Density plot of each group')
You can also push the labels further apart by adding extra spaces. Bring them closer together with fewer spaces.

Count most recent zeros in pandas data frame

import pandas as pd
date_0 = list(pd.date_range('2017-01-01', periods=6, freq='MS'))
date_1 = list(pd.date_range('2017-01-01', periods=8, freq='MS'))
data_0 = [9, 8, 4, 0, 0, 0]
data_1 = [9, 9, 0, 0, 0, 7, 0, 0]
id_0 = [0]*6
id_1 = [1]*8
df = pd.DataFrame({'ids': id_0 + id_1, 'dates': date_0 + date_1, 'data': data_0 + data_1})
For each id (here 0 and 1) I want to know how long the run of zeros at the end of the time frame is.
For the given example, the result is id_0 = 3, id_1 = 2.
So how do I restrict the rows to that trailing run, so that I can run something like this:
df.groupby('ids').agg('count')
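First, label each run of consecutive equal values by comparing the column with its shifted self for inequality and taking the cumsum; multiplying by the zero mask then sets the label of every non-zero row to 0.
Then count the rows per id and run label, drop the run-label level of the MultiIndex, and keep the last row per id with drop_duplicates and keep='last':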
s = df['data'].ne(df['data'].shift()).cumsum().mul(~df['data'].astype(bool))
df = (s.groupby([df['ids'], s]).size()
.reset_index(level=1, drop=True)
.reset_index(name='val')
.drop_duplicates('ids', keep='last'))
print(df)
   ids  val
1    0    3
4    1    2
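As a cross-check, the trailing run can also be counted directly per group. The sketch below is an alternative, not the method above: the helper name trailing_zeros is made up for illustration, and it assumes the rows of each id are already sorted by date. It reverses each group and counts the rows seen before the first non-zero value, so a group that ends in a non-zero value gets 0.

```python
import pandas as pd

def trailing_zeros(values: pd.Series) -> int:
    """Count the zeros at the end of a series (hypothetical helper)."""
    # Walk backwards; the running count of non-zero values stays 0
    # only while we are still inside the trailing run of zeros.
    nonzero_seen = values.iloc[::-1].ne(0).cumsum()
    return int(nonzero_seen.eq(0).sum())

# Same sample data as in the question.
date_0 = list(pd.date_range('2017-01-01', periods=6, freq='MS'))
date_1 = list(pd.date_range('2017-01-01', periods=8, freq='MS'))
df = pd.DataFrame({
    'ids': [0] * 6 + [1] * 8,
    'dates': date_0 + date_1,
    'data': [9, 8, 4, 0, 0, 0] + [9, 9, 0, 0, 0, 7, 0, 0],
})

result = df.groupby('ids')['data'].apply(trailing_zeros)
print(result)
```

The result is a Series indexed by ids, which can be turned back into a frame with reset_index(name='val') if the same layout as above is wanted.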