R Markdown - PDF Table with conditional bold format for row maximum AND percentage format - dataframe

This question is similar to my past question: Conditionally format each cell containing the max value of a row in a data frame - R Markdown PDF
The difference is that in the past question my example printed a table of numbers, while this time the values are technically characters (numbers with percentage formatting).
Data for example:
---
title: "Untitled"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r, include=FALSE}
segment<- c('seg1', 'seg1', 'seg2', 'seg2', 'seg3', 'seg3', 'Tot')
subSegment<- c('subseg1.1', 'subseg1.2', 'subseg2.1', 'subseg2.2', 'subseg3.1', 'subseg3.2', "-")
co.1<- c(0.1, 0.4, 0.3, 0.2, 0.5, 0.4, 0.4)
co.2<- c(0.5, 0.3, 0.3, 0.2, 0.1, 0.5, 0.4)
co.3<- c(0.2, 0.1, 0.4, 0.4, 0.1, 0.1, 0.15)
co.4<- c(0.2, 0.2, 0.0, 0.2, 0.3, 0.0, 0.05)
total<- c(1,1,1,1,1,1,1)
df<-data.frame(segment, subSegment, co.1, co.2, co.3, co.4, total) %>%
rowwise() %>%
mutate(across(co.1:co.4, ~cell_spec(.x, 'latex', bold = ifelse(.x == max(c_across(co.1:co.4)), TRUE, FALSE))))
df %>%
kable(booktabs = TRUE,
caption = "Title",
align = "c",
escape = FALSE) %>%
kable_styling(latex_options = c("HOLD_position", "repeat_header", "scale_down"),
font_size = 6) %>%
pack_rows(index = table(fct_inorder(df$segment)),
italic = FALSE,
bold = FALSE,
underline = TRUE,
latex_gap_space = "1em",
background = "#f2f2f2")%>%
column_spec(1, monospace = TRUE, color = "white") %>%
row_spec(nrow(df), bold = TRUE)
```
So after doing this I get a very nice table:
My problem is that I want the numbers to be printed as percentages. I tried using scales::percent both before and after the conditional formatting, but neither works.
If I try to give the percentage format after the bold I get the error:
Error in UseMethod("round_any") :
no applicable method for 'round_any' applied to an object of class "character".
If I try to use it before the conditional bold then I can't find the maximum of each row since they are characters and not numbers.
aux.n<- df
aux.n[c(3:ncol(aux.n))] = sapply(aux.n[c(3:ncol(aux.n))], function(x) scales::percent(x, accuracy = 0.1))
I should add that this is just an example but actual numbers are stuff like 0.5471927 so it's really important to print "54.7%" instead of the full number.
libraries I used:
require("pacman")
p_load(tidyverse, reshape, reshape2, knitr, kableExtra, tinytex, scales, pander, janitor)

cell_spec() converts the values into character strings (LaTeX markup), so the percentage formatting has to be applied to those strings afterwards. With a bit of stringr and regex, the decimal values inside them can be replaced by percentages. Note that % is a reserved symbol in LaTeX, so it needs escaping.
---
output:
pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
require("pacman")
p_load(dplyr, tidyr, stringr, kableExtra, forcats, tinytex, scales, janitor)
```
```{r df, include=FALSE}
segment<- c('seg1', 'seg1', 'seg2', 'seg2', 'seg3', 'seg3', 'Tot')
subSegment<- c('subseg1.1', 'subseg1.2', 'subseg2.1', 'subseg2.2', 'subseg3.1', 'subseg3.2', "-")
co.1<- c(0.1, 0.4, 0.3, 0.2, 0.5, 0.4, 0.4)
co.2<- c(0.5, 0.3, 0.3, 0.2, 0.1, 0.5, 0.4)
co.3<- c(0.2, 0.1, 0.4, 0.4, 0.1, 0.1, 0.15)
co.4<- c(0.2, 0.2, 0.0, 0.2, 0.3, 0.0, 0.05)
total<- c(1,1,1,1,1,1,1)
df <-
data.frame(segment, subSegment, co.1, co.2, co.3, co.4, total) %>%
rowwise() %>%
mutate(across(co.1:co.4, ~cell_spec(.x, 'latex', bold = ifelse(.x == max(c_across(co.1:co.4)), TRUE, FALSE)))) %>%
ungroup() %>%
pivot_longer(starts_with("co."))%>%
mutate(pc = percent(as.numeric(str_extract(value, "0\\.\\d+|0")), accuracy = 0.1),
value = str_replace(value, "0\\.\\d+|0", pc),
value = str_replace(value, "%", "\\\\%")) %>%
select(-pc) %>%
pivot_wider() %>%
select(-total, everything(), total)
```
```{r kable, results='asis'}
df %>%
kable(booktabs = TRUE,
caption = "Title",
align = "c",
escape = FALSE) %>%
kable_styling(latex_options = c("HOLD_position", "repeat_header", "scale_down"),
font_size = 6) %>%
pack_rows(index = table(fct_inorder(df$segment)),
italic = FALSE,
bold = FALSE,
underline = TRUE,
latex_gap_space = "1em",
background = "#f2f2f2") %>%
column_spec(1, monospace = TRUE, color = "white") %>%
row_spec(nrow(df), bold = TRUE)
```
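The key point of this answer is the order of operations: decide which cell is the row maximum while the values are still numbers, and only then turn them into LaTeX strings with an escaped %. As a language-neutral illustration (a Python sketch, not part of the R answer; the function name format_row is hypothetical), the per-row logic looks like this:

```python
def format_row(values, digits=1):
    """Format one row of proportions as LaTeX percentages, bolding the row maximum.

    Hypothetical helper for illustration only; it mimics the bold-then-format
    logic of the R answer, it is not kableExtra's API.
    """
    row_max = max(values)  # decide on the numbers, before any string formatting
    cells = []
    for v in values:
        # Proportion -> percentage with `digits` decimals; '%' is reserved in LaTeX, so escape it.
        text = f"{v * 100:.{digits}f}\\%"
        if v == row_max:  # ties: every cell equal to the max is bolded, like `.x == max(...)`
            text = f"\\textbf{{{text}}}"
        cells.append(text)
    return cells


print(format_row([0.1, 0.5, 0.2, 0.2]))
print(format_row([0.5471927, 0.3, 0.1528073]))
```

With values such as 0.5471927 the maximum comes out as \textbf{54.7\%}, which is the "54.7%" display the question asks for.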

Related

Graphing Multiple Column Averages from Different dfs Representing Different Years

Below is a sample of the data:
df_1 <- data.frame(total = c(0.9, 0.4, 0.2), white = c(0.6, 0.2, 0.1), black = c(0.3, 0.2, 0.1), immigrant = c(0.7, 0.3, 0.9))
df_2 <- data.frame(total = c(0.8, 0.7, 0.6), white = c(0.4, 0.3, 0.2), black = c(0.4, 0.4, 0.4), immigrant = c(0.9, 0.2, 0.1))
df_3 <- data.frame(total = c(0.6, 0.8, 0.9), white = c(0.4, 0.2, 0.7), black = c(0.2, 0.6, 0.2), immigrant = c(0.6, 0.8, 0.5))
Hi, I am interested in using ggplot2 to graph the dataframes above. In my example, each dataframe represents a different decade as follows: df_1 represents 1930, df_2 represents 1990, and df_3 represents 2020. I am interested in calculating the mean/average of each of the four columns and then graphing the results. I would like the x-axis to represent each year (1930, 1990, and 2020) and the y-axis to represent the calculated means (which should range from 0-1). The columns in all of the dataframes show different demographic groups and would be visualized as a point in the graph. Below is an idea of what I am envisioning.
Illustration of the desired graph
I tried grouping the dataframes first but then I am not sure how to categorize each dataframe as a different year. The code below is something I adapted from another graph I made but it didn't work as expected. Note, 'ratio' is meant to represent the calculated means of each column.
Consideration:
The number of rows in each column may be different throughout the dataframes
list(df_1,
df_2,
df_3) %>%
lapply(function(x) setNames(x, 'ratio')) %>%
{do.call(bind_rows, c(., .id = 'demographic'))} %>%
mutate(ratio = mean(ratio)) %>%
group_by(demographic) %>%
ggplot(aes(ratio, n, colour = demographic, group = demographic)) +
labs(x="Mean", y="Year", ))
If you want your plot to be a ggplot, then it's important for your data to be tidy. That means that 1) each variable must have its own column, 2) each observation must have its own row, and 3) each value must have its own cell. These requirements also imply that all relevant values are in one dataset, not distributed over multiple datasets.
One option is to assign a year variable to each dataset, bind your datasets together, and then "lengthen" your dataset using pivot_longer(), so you can see each combination of year and your grouping variable. Then you can use summarize() to average by year and your grouping variable.
library(tidyverse)
df_1 <- data.frame(total = c(0.9, 0.4, 0.2), white = c(0.6, 0.2, 0.1), black = c(0.3, 0.2, 0.1), immigrant = c(0.7, 0.3, 0.9))
df_2 <- data.frame(total = c(0.8, 0.7, 0.6), white = c(0.4, 0.3, 0.2), black = c(0.4, 0.4, 0.4), immigrant = c(0.9, 0.2, 0.1))
df_3 <- data.frame(total = c(0.6, 0.8, 0.9), white = c(0.4, 0.2, 0.7), black = c(0.2, 0.6, 0.2), immigrant = c(0.6, 0.8, 0.5))
df_1$year <- 1930
df_2$year <- 1990
df_3$year <- 2020
bigdf <- rbind(df_1, df_2, df_3) %>%
pivot_longer(cols = -year) %>%
mutate(year = as.factor(year)) %>%
group_by(year, name) %>%
summarize(value = mean(value))
ggplot(bigdf, aes(x = year, y = value,
color = name, group = name)) +
geom_path() + geom_point()
Small edit:
If you want to reorder the labels in the legend, turn name into a factor whose levels are in the order you want, then rebuild the plot.
bigdf <- bigdf %>%
mutate(name = factor(name,
levels = c("total",
"black",
"white",
"immigrant")))
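For comparison, here is a pandas sketch of the same tidy reshape (not part of the answer above): tag each frame with its year, stack them, lengthen with melt() (roughly the analogue of pivot_longer()), and average with groupby() (roughly group_by() + summarize()). Because the frames are stacked row-wise, they may have different numbers of rows, which covers the consideration in the question.

```python
import pandas as pd

df_1 = pd.DataFrame({"total": [0.9, 0.4, 0.2], "white": [0.6, 0.2, 0.1],
                     "black": [0.3, 0.2, 0.1], "immigrant": [0.7, 0.3, 0.9]})
df_2 = pd.DataFrame({"total": [0.8, 0.7, 0.6], "white": [0.4, 0.3, 0.2],
                     "black": [0.4, 0.4, 0.4], "immigrant": [0.9, 0.2, 0.1]})
df_3 = pd.DataFrame({"total": [0.6, 0.8, 0.9], "white": [0.4, 0.2, 0.7],
                     "black": [0.2, 0.6, 0.2], "immigrant": [0.6, 0.8, 0.5]})

# Tag each frame with its year and stack them into one dataset.
frames = {1930: df_1, 1990: df_2, 2020: df_3}
long = pd.concat([d.assign(year=y) for y, d in frames.items()], ignore_index=True)

# Lengthen: one row per (year, demographic group, value).
long = long.melt(id_vars="year", var_name="name", value_name="value")

# One mean per year and group, ready to plot with year on x and the mean on y.
means = long.groupby(["year", "name"], as_index=False)["value"].mean()
print(means)
```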

How to replace the na with nothing in ggplot2 but not the other significance level?

I just want to have star(s) in my plot and not the na. How to replace the na with nothing in ggplot2 but not the other significance level?
Here is the script:
ggplot(df, aes(x=Gene, y=Count, fill=Stage))+
geom_boxplot()+theme_bw()+
theme(axis.text.x = element_text(size = 10, angle = 45, hjust = 1, colour = 'black'))+
stat_compare_means(label.y = 15.5,label = "p.signif")
Thanks for any help.
It's difficult to answer your question without a minimal, reproducible example but here is one potential solution:
library(ggplot2)
library(ggpubr)
# use an example dataset
df <- ToothGrowth
# Default settings (has "ns")
ggplot(df, aes(x=supp, y=len, fill=dose))+
geom_boxplot()+
theme_bw()+
theme(axis.text.x = element_text(size = 10, angle = 45, hjust = 1, colour = 'black'))+
stat_compare_means(label = "p.signif")
# Only plot the p.signif label if it is significant (i.e. not "ns", p <= 0.05)
ggplot(df, aes(x=supp, y=len, fill=dose))+
geom_boxplot()+
theme_bw()+
theme(axis.text.x = element_text(size = 10, angle = 45, hjust = 1, colour = 'black'))+
stat_compare_means(aes(label = ifelse(..p.signif.. != "ns", ..p.signif.., "")))
Created on 2022-08-23 by the reprex package (v2.0.1)
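The significance stars are just a binning of p-values. The sketch below (Python, with a hypothetical signif_label helper, not part of ggpubr) mirrors the default cutpoints that ggpubr documents for p.signif (**** for p <= 0.0001, *** for p <= 0.001, ** for p <= 0.01, * for p <= 0.05, otherwise ns) and returns an empty string instead of ns, which is the effect of the label expression in the answer. Check the cutpoints against your ggpubr version.

```python
def signif_label(p, hide_ns=True):
    """Map a p-value to significance stars; '' (or 'ns') when p > 0.05.

    Cutpoints follow ggpubr's documented defaults (an assumption about your version).
    """
    cutpoints = [(0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")]
    for cut, stars in cutpoints:
        if p <= cut:  # right-closed bins, so p == 0.05 still gets '*'
            return stars
    return "" if hide_ns else "ns"


for p in (0.00005, 0.003, 0.03, 0.2):
    print(p, repr(signif_label(p)))
```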

What to do when pandas interval_range gives floating intervals

I wrote something like this:
cut_bins = pd.interval_range(start = -0.4, freq = 0.1, end = 0.8)
and it gives me this
How can I have them rounded to one decimal place?
One answer would be to split the object, round each bound of the intervals through a loop and rebuild the IntervalIndex:
import pandas as pd
cut_bins = pd.interval_range(start = -0.4, freq = 0.1, end = 0.8)
pd.IntervalIndex([
pd.Interval( round(i.left,1), round(i.right,1), i.closed )
for i in cut_bins
])
Out:
IntervalIndex([(-0.4, -0.3], (-0.3, -0.2], (-0.2, -0.1], (-0.1, 0.0], (0.0, 0.1] ... (0.3, 0.4], (0.4, 0.5], (0.5, 0.6], (0.6, 0.7], (0.7, 0.8]],
closed='right',
dtype='interval[float64]')
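An alternative that avoids the floating-point noise in the first place (a sketch, not the answer's method) is to build the breaks in integer tenths and divide once, then create the index with IntervalIndex.from_breaks. Each k / 10 is the float closest to the decimal value, so bounds print as -0.3 rather than something like -0.30000000000000004. The sample values passed to pd.cut are made up for illustration.

```python
import numpy as np
import pandas as pd

# Integer tenths -4 .. 8 divided once by 10 give the 13 breaks -0.4, -0.3, ..., 0.8.
breaks = np.arange(-4, 9) / 10
cut_bins = pd.IntervalIndex.from_breaks(breaks, closed="right")
print(cut_bins)

# The bins can be passed straight to pd.cut (illustrative values).
print(pd.cut([-0.35, 0.05, 0.72], bins=cut_bins))
```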

Sankey with Matplotlib

This Sankey has two inputs, K and S; three outputs, H, F and Sp; and the rest, x.
The inputs should come from the left side and the outputs should go to the right side.
The rest should go to the top.
from matplotlib.sankey import Sankey
import matplotlib.pyplot as plt
fig = plt.figure(figsize = [10,10])
ax = fig.add_subplot(1,1,1)
ax.set(yticklabels=[],xticklabels=[])
ax.text(-10,10, "xxx")
Sankey(ax=ax, flows = [ 20400,3000,-19900,-400,-2300,800],
labels = ['K', 'S', 'H', 'F', 'Sp', 'x'],
orientations = [ 1, -1, 1, 0, -1, -1 ],
scale=1, margin=100, trunklength=1.0).finish()
plt.tight_layout()
plt.show()
I played a lot with the orientations, but nothing works or looks nice.
Also, is there a way to set a different color for every arrow?
The scale of the Sankey should be such that input-flow times scale is about 1.0 and output-flow times scale is about -1.0 (see docs). Therefore, about 1/25000 is a good starting point for experimentation. The margin should be a small number, maybe around 1, or leave it out. I think the only way to have individual colors, is to chain multiple Sankeys together (with add), but that's probably not what you want. Use plt.axis("off") to suppress the axes completely.
My test code:
from matplotlib.sankey import Sankey
import matplotlib.pyplot as plt
fig = plt.figure(figsize = [10,10])
ax = fig.add_subplot(1,1,1)
Sankey(ax=ax, flows = [ 20400,3000,-19900,-400,-2300,-800],
labels = ['K', 'S', 'H', 'F', 'Sp', 'x'],
orientations = [ 1, -1, 1, 0, -1, -1 ],
scale=1/25000, trunklength=1,
edgecolor = '#099368', facecolor = '#099368'
).finish()
plt.axis("off")
plt.show()
Generated Sankey:
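The scale rule of thumb above can be computed instead of guessed. The hypothetical sankey_scale helper below (not a matplotlib function) divides a target of 1.0 by the larger of total inflow and total outflow. Note that the test code also changes x from 800 to -800: x is an outflow, and with that sign the inputs and outputs balance exactly.

```python
def sankey_scale(flows, target=1.0):
    """Return a scale so that the larger of total inflow/outflow maps to `target`."""
    inflow = sum(f for f in flows if f > 0)
    outflow = -sum(f for f in flows if f < 0)
    return target / max(inflow, outflow)


flows = [20400, 3000, -19900, -400, -2300, -800]
print(sum(flows))           # net flow: 0 here, so inputs and outputs balance
print(sankey_scale(flows))  # 1/23400, close to the 1/25000 starting point
```

The result can then be passed as scale=sankey_scale(flows) to Sankey(...).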
With different colors:
from matplotlib.sankey import Sankey
import matplotlib.pyplot as plt
from matplotlib import rcParams
plt.rc('font', family = 'serif')
plt.rcParams['font.size'] = 10
plt.rcParams['font.serif'] = "Linux Libertine"
fig = plt.figure(figsize = [6,4], dpi = 330)
ax = fig.add_subplot(1, 1, 1,)
s = Sankey(ax = ax, scale = 1/40000, unit = 'kg', gap = .4, shoulder = 0.05,)
s.add(
flows = [3000, 20700, -23700,],
orientations = [ 1, 1, 0, ],
labels = ["S Value", "K Value", None, ],
trunklength = 1, pathlengths = 0.4, edgecolor = '#000000', facecolor = 'darkgreen',
lw = 0.5,
)
s.add(
flows = [23700, -800, -2400, -20500],
orientations = [0, 1, -1, 0],
labels = [None, "U Value", "Sp Value", None],
trunklength=1.5, pathlengths=0.5, edgecolor = '#000000', facecolor = 'grey',
prior = 0, connect = (2,0), lw = 0.5,
)
s.add(
flows = [20500, -20000, -500],
orientations = [0, -1, -1],
labels = [None, "H Value", "F Value"],
trunklength =1, pathlengths = 0.5, edgecolor = '#000000', facecolor = 'darkred',
prior = 1, connect = (3,0), lw = 0.5,
)
diagrams = s.finish()
for d in diagrams:
    for t in d.texts:
        t.set_horizontalalignment('left')
diagrams[0].texts[0].set_position(xy = [-0.58, 0.9,]) # S
diagrams[0].texts[1].set_position(xy = [-1.5, 0.9,]) # K
diagrams[2].texts[1].set_position(xy = [ 2.35, -1.2,]) # H
diagrams[2].texts[2].set_position(xy = [ 1.75, -1.2,]) # F
diagrams[1].texts[2].set_position(xy = [ 0.7, -1.2]) # Sp
diagrams[1].texts[1].set_position(xy = [ 0.7, 0.9,]) # U
# print(diagrams[0].texts[0])
# print(diagrams[0].texts[1])
# print(diagrams[1].texts[0])
# print(diagrams[1].texts[1])
# print(diagrams[1].texts[2])
# print(diagrams[2].texts[0])
# print(diagrams[2].texts[1])
# print(diagrams[2].texts[2])
plt.axis("off")
plt.show()
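When diagrams are chained with add(), each stage should balance on its own, and the two flows joined through prior and connect must cancel (matplotlib complains about a mismatched connection). The hypothetical check_chain helper below only does arithmetic on the numbers from the answer, so you can check them before plotting; it does not call matplotlib.

```python
def check_chain(stages, tol=1e-9):
    """Return a list of problems in a chained Sankey definition.

    Each stage is (flows, prior, connect), mirroring the arguments given to
    Sankey.add(); prior and connect are None for the first stage.
    """
    problems = []
    for i, (flows, prior, connect) in enumerate(stages):
        if abs(sum(flows)) > tol:
            problems.append(f"stage {i}: flows sum to {sum(flows)}, not 0")
        if prior is not None:
            out_idx, in_idx = connect  # flow index in the prior stage, flow index in this stage
            joined = stages[prior][0][out_idx] + flows[in_idx]
            if abs(joined) > tol:
                problems.append(f"stage {i}: connect={connect} joins flows that do not cancel ({joined})")
    return problems


stages = [
    ([3000, 20700, -23700], None, None),
    ([23700, -800, -2400, -20500], 0, (2, 0)),
    ([20500, -20000, -500], 1, (3, 0)),
]
print(check_chain(stages))  # [] means every stage balances and every link matches
```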

How to format slider

I have a slider:
time_ax = fig.add_axes([0.1, 0.05, 0.8, 0.03])
var_time = Slider(time_ax, 'Time', 0, 100, valinit=10, valfmt='%0.0f')
var_time.on_changed(update)
and I want to customize the appearance of this slider:
I can add axisbg parameter to add_axes function, which will change default white background to assigned color, but that's all I see possible for now.
So, how to change other slider components:
slider border (default: black)
default value indicator (default: red)
slider progress (default: blue)
The slider border is just the spines of the Axes instance. The progress bar can be directly accessed for basic customization in the constructor, and the initial status indicator is an attribute of the slider. I was able to change all of those things:
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider
fig = plt.figure()
time_ax = fig.add_axes([0.1, 0.05, 0.8, 0.03])
# Facecolor and edgecolor control the slider itself
var_time = Slider(time_ax, 'Time', 0, 100, valinit=10, valfmt='%0.0f',
facecolor='c', edgecolor='r')
# The vline attribute controls the initial value line
var_time.vline.set_color('blue')
# The spines of the axis control the borders
for spine in time_ax.spines.values():
    spine.set_color('magenta')
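Newer matplotlib releases (3.5 and later, as far as I know; check your version) restyled the slider and added constructor arguments for the same parts: initcolor for the valinit marker line, track_color for the unfilled track, and handle_style for the draggable knob, while extra keyword arguments still go to the filled bar. In those releases the slider axes may be drawn without a frame, so the spine trick may have no visible effect. A minimal sketch under that assumption:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop it for a GUI
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider

fig = plt.figure()
time_ax = fig.add_axes([0.1, 0.05, 0.8, 0.03])

var_time = Slider(
    time_ax, "Time", 0, 100, valinit=10, valfmt="%0.0f",
    initcolor="blue",         # marker line at valinit (default: red)
    track_color="lightgrey",  # the unfilled track behind the bar
    handle_style={"facecolor": "magenta", "edgecolor": "black"},  # the draggable knob
    facecolor="c",            # remaining kwargs style the filled "progress" bar
)
```

With an interactive backend, call plt.show() afterwards to see the result.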
You can change the color of the box when you define the axes that the slider is drawn in:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider
fig, ax = plt.subplots()
plt.subplots_adjust(left=0.25, bottom=0.25)
t = np.arange(0.0, 1.0, 0.001)
a0 = 5
f0 = 3
s = a0*np.sin(2*np.pi*f0*t)
l, = plt.plot(t,s, lw=2, color='red')
plt.axis([0, 1, -10, 10])
axcolor = 'lightgoldenrodyellow'
# axisbg was replaced by facecolor in matplotlib 2.0
axfreq = plt.axes([0.03, 0.25, 0.03, 0.65], facecolor=axcolor)
axamp = plt.axes([0.08, 0.25, 0.03, 0.65], facecolor=axcolor)
sfreq = Slider(axfreq, 'Freq', 0.1, 30.0, valinit=f0)
samp = Slider(axamp, 'Amp', 0.1, 10.0, valinit=a0)
# The vline attribute controls the initial value line
samp.vline.set_color('green')