What am I doing wrong with order()? - dataframe

I have a dataframe:
  p.value            gene
2 0.302436613525335  UBE2Q2P2;100134869
3 0.15026618422578   SDR16C6P;442388
4 0.747366468058889  GTPBP6;8225
5 0.694564330199746  EFCAB12;90288
I want to sort it by p.value in descending order. I ran the code
df<-df[order(-p.value),] and got the error message Error in order(p.value) : object 'p.value' not found
When I tried doing that for mtcars with column 'mpg' it did work. I just changed the variables' names, so I really am perplexed.
Thanks guys

df <-
structure(list(p.value = c(0.302436613525335, 0.15026618422578,
0.747366468058889, 0.694564330199746),
gene = c("UBE2Q2P2;100134869",
"SDR16C6P;442388", "GTPBP6;8225", "EFCAB12;90288")),
row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))
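order() does not look inside df for its arguments, so a bare p.value is searched for in your workspace and not found. (It presumably worked for mtcars because mtcars had been attach()ed or a separate mpg variable existed.) You have to use df$ to access p.value: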
df[order(-df$p.value),]
A dplyr alternative:
dplyr::arrange(df, -p.value)


tidyverse across where(!is.factor)?

I would like to create factor variables for all non-factor columns. I tried:
dat %>%
mutate(across(where(!is.factor), as.factor, .names = "{.col}_factor"))
But get error message:
Error in `mutate()`:
! Problem while computing `..1 = across(where(!is.factor), as.factor, .names = "{.col}_factor")`.
Caused by error in `across()`:
! invalid argument type
Run `rlang::last_error()` to see where the error occurred.
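where() needs a predicate function, and !is.factor is not one: it tries to negate the function object itself, hence "invalid argument type". Wrap the test in a purrr-style lambda formula instead: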
dat %>%
mutate(across(where(~!is.factor(.x)), as.factor, .names = "{.col}_factor"))

How to have partially italicized columns in pdf output?

This question is related to Creating a data frame that produces partially italicized cells with pkg:sjPlot functions
I'd like to have partially italicized cells in a kable. I have tried
library(tidyverse); library(kableExtra)
sum_dat_final2 <- list(Site = c("Hanauma Bay", "Hanauma Bay", "Hanauma Bay", "Waikiki", "Waikiki", "Waikiki"),
Coral_taxon = expression( italic(Montipora)~ spp.,
italic(Pocillopora)~spp.,
italic(Porites)~spp.,
italic(Montipora)~ spp.,
italic(Pocillopora)~spp.,
italic(Porites)~spp.))
sum_dat_final2 %>%
as.data.frame()%>%
kbl(longtable = F, "latex")
and got this error Error in as.data.frame.default(x[[i]], optional = TRUE) : cannot coerce class ‘"expression"’ to a data.frame
Many thanks in advance!!
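You can italicize specific parts by wrapping them in $...$ (LaTeX math mode, which typesets letters in italics). For this to work, you need to set escape = F in your kbl call so the dollar signs reach LaTeX unescaped.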
```{r}
library(tidyverse); library(kableExtra)
sum_dat_final2 <- list(Site = c("Hanauma Bay", "Hanauma Bay", "Hanauma Bay", "Waikiki", "Waikiki", "Waikiki"),
"Coral_taxon" = c("$Montipora$$~$ spp.",
"$Pocillopora$$~$spp.",
"$Porites$$~$spp.",
"$Montipora$$~$ spp.",
"$Pocillopora$$~$spp.",
"$Porites$$~$spp."))
sum_dat_final2 %>%
as.data.frame()%>%
kbl(longtable = F, "latex",
escape = F,
col.names = c("Site", "Coral taxonomie"))
```

How to method chain .agg() and .assign() functions in Pandas

I am looking to replicate this Dplyr query in Pandas but am having trouble chaining the .agg() and .assign() functions together, and would be grateful for any advice.
Dplyr code:
counties_selected %>%
group_by(state) %>%
summarize(total_area = sum(land_area),
total_population = sum(population)) %>%
mutate(density = total_population / total_area) %>%
arrange(desc(density))
Attempt at the same in Pandas:
Within the .assign() part I am redirecting the variable back into the original dataframe, but nothing else works
counties.\
groupby('state').\
agg(total_area = ('land_area', 'sum'),
total_population = ('population', 'sum')).\
reset_index().\
assign(density = counties['total_population'] / counties['total_area']).\
arrange('density', ascending = False).\
head()
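The problem is that you need a lambda to refer to the intermediate DataFrame produced by the previous chained methods; counties is still the original, unaggregated data, so it has no total_population or total_area columns. Change: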
assign(density = counties['total_population'] / counties['total_area'])
to:
assign(density = lambda x: x['total_population'] / x['total_area'])
Another problem is that pandas DataFrames have no arrange method. Instead of:
arrange('density', ascending = False)
use the method DataFrame.sort_values:
sort_values('density', ascending = False)
All together, wrapping the chain in parentheses lets each line start with . instead of ending with a backslash:
df = (counties.groupby('state')
.agg(total_area = ('land_area', 'sum'),
total_population = ('population', 'sum'))
.reset_index()
.assign(density = lambda x: x['total_population'] / x['total_area'])
.sort_values('density', ascending = False)
.head())
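To see both fixes in action, here is a minimal self-contained sketch. The counties frame below is made-up toy data (the question does not show the real dataset), and the names raised and result are only for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's `counties` frame.
counties = pd.DataFrame({
    "state": ["A", "A", "B", "B"],
    "land_area": [10.0, 30.0, 5.0, 5.0],
    "population": [100, 300, 400, 600],
})

# The aggregated columns exist only on the intermediate frame, so looking
# them up on the original `counties` raises a KeyError.
try:
    counties["total_population"]
    raised = False
except KeyError:
    raised = True

result = (
    counties.groupby("state")
    .agg(total_area=("land_area", "sum"),
         total_population=("population", "sum"))
    .reset_index()
    # The lambda receives the intermediate (aggregated) frame as x.
    .assign(density=lambda x: x["total_population"] / x["total_area"])
    .sort_values("density", ascending=False)
)
print(result)
```

Because assign() calls the lambda with the frame it is invoked on, the same chain keeps working if more steps are inserted before it.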
With datar, it is easy to port your dplyr code to python code, without learning pandas APIs:
from datar.all import f, group_by, summarize, sum, mutate, arrange, desc
counties_selected >> \
group_by(f.state) >> \
summarize(total_area = sum(f.land_area),
total_population = sum(f.population)) >> \
mutate(density = f.total_population / f.total_area) >> \
arrange(desc(f.density))
I am the author of the package. Feel free to submit issues if you have any questions.

Issue when trying to plot geom_tile using ggplotly

I would like to plot a ggplot2 image using ggplotly
What I am trying to do is first plot rectangles with a grey fill and no aesthetic mapping, and then, in a second step, plot tiles whose colors are based on aesthetics. My code works when I use ggplot but crashes when I try to use ggplotly to make the graph interactive.
Here is some sample code
library(ggplot2)
library(data.table)
library(plotly)
library(dplyr)
x = rep(c("1", "2", "3"), 3)
y = rep(c("K", "B","A"), each=3)
z = sample(c(NA,"A","L"), 9,replace = TRUE)
df <- data.table(x,y,z)
p<-ggplot(df)+
geom_tile(aes(x=x,y=y),width=0.9,height=0.9,fill="grey")
p<-p+geom_tile(data=filter(df,z=="A"),aes(x=x,y=y,fill=z),width=0.9,height=0.9)
p
But when I type this
ggplotly(p)
I get the following error
Error in `[.data.frame`(g, , c("fill_plotlyDomain", "fill")) :
undefined columns selected
The versions I use are
> packageVersion("plotly")
[1] ‘4.7.1’
> packageVersion("ggplot2")
[1] ‘2.2.1.9000’
##########Edited example for Arthur
p<-ggplot(df)+
geom_tile(aes(x=x,y=y,fill="G"),width=0.9,height=0.9)
p<- p+geom_tile(data=filter(df,z=="A"),aes(x=x,y=y,fill=z),width=0.9,height=0.9)
p<-p+ scale_fill_manual(
guide = guide_legend(title = "test",
override.aes = list(
fill =c("red","white") )
),
values = c("red","grey"),
labels=c("A",""))
p
This works, but ggplotly(p) adds the grey bar labeled G to the legend.
The output of the ggplotly function is a list with the plotly class. It gets printed as a Plotly graph, but you can still work with it as a list. Moreover, the documentation indicates that modifying the list makes it possible to clear all or part of the legend. One only has to understand how the data is structured.
p<-ggplot(df)+
geom_tile(aes(x=x,y=y,fill=z),width=0.9,height=0.9)+
scale_fill_manual(values = c(L='grey', A='red'), na.value='grey')
p2 <- ggplotly(p)
str(p2)
The global legend setting is in p2$x$layout$showlegend; setting it to FALSE hides the legend entirely.
The group-specific legend settings appear in each of the 9 p2$x$data elements, each with its own showlegend attribute. Only 3 of them are set to TRUE, corresponding to the 3 keys in the legend. The following loop thus clears all the undesired labels:
for(i in seq_along(p2$x$data)){
if(p2$x$data[[i]]$legendgroup!='A'){
p2$x$data[[i]]$showlegend <- FALSE
}
}
Voilà!
This works here:
p <- ggplot(df)+
geom_tile(aes(x=x,y=y,fill=z),width=0.9,height=0.9)+
scale_fill_manual(values = c(L='grey', A='red'), na.value='grey')
ggplotly(p)
I guess your problem comes from the use of 2 different data sources, df and filter(df,z=="A"), with columns with the same name.
[Note this is not an Answer Yet]
(Putting for reference, as it is beyond the limits for comments.)
The problem is rather complicated.
I just finished debugging the code of plotly. It seems like it's occurring here.
I have opened an issue in GitHub
Here is the minimal code for the reproduction of the problem.
library(ggplot2)
set.seed(1503)
df <- data.frame(x = rep(1:3, 3),
y = rep(1:3, 3),
z = sample(c("A","B"), 9,replace = TRUE),
stringsAsFactors = F)
p1 <- ggplot(df)+
geom_tile(aes(x=x,y=y, fill="grey"), color = "black")
p2 <- ggplot(df)+
geom_tile(aes(x=x,y=y),fill="grey", color = "black")
class(plotly::ggplotly(p1))
#> [1] "plotly" "htmlwidget"
class(plotly::ggplotly(p2))
#> Error in `[.data.frame`(g, , c("fill_plotlyDomain", "fill")): undefined columns selected

Concatenate DataFrames.DataFrame in Julia

I have a problem when I try to concatenate multiple DataFrames (a datastructure from the DataFrames package!) with the same columns but different row numbers. Here's my code:
using(DataFrames)
DF = DataFrame()
DF[:x1] = 1:1000
DF[:x2] = rand(1000)
DF[:time] = append!( [0] , cumsum( diff(DF[:x1]).<0 ) ) + 1
DF1 = DF[DF[:time] .==1,:]
DF2 = DF[DF[:time] .==round(maximum(DF[:time])),:]
DF3 = DF[DF[:time] .==round(maximum(DF[:time])/4),:]
DF4 = DF[DF[:time] .==round(maximum(DF[:time])/2),:]
DF1[:T] = "initial"
DF2[:T] = "final"
DF3[:T] = "1/4"
DF4[:T] = "1/2"
DF = [DF1;DF2;DF3;DF4]
The last line gives me the error
MethodError: Cannot `convert` an object of type DataFrames.DataFrame to an object of type LastMain.LastMain.LastMain.DataFrames.AbstractDataFrame
This may have arisen from a call to the constructor LastMain.LastMain.LastMain.DataFrames.AbstractDataFrame(...),
since type constructors fall back to convert methods.
I don't understand this error message. Can you help me out? Thanks!
I just ran into this exact problem on Julia 0.5.0 x86_64-linux-gnu, DataFrames 0.8.5, with both hcat and vcat.
Neither clearing the workspace nor reloading DataFrames solved the problem, but restarting the REPL fixed it immediately.