Statistical Tests in R - sql

I want to run pairwise t-tests with Bonferroni-adjusted p-values on a stacked data set.
This is my code:
stat.2 <- stack.2 %>%
group_by(modules) %>%
t_test(values ~ phenotype) %>%
adjust_pvalue(method = "bonferroni") %>%
add_significance("p.adj")
I get the following error:
Error in mutate():
! Problem while computing data = map(.data$data, .f, ...).
Caused by error in t.test.default():
! not enough 'y' observations
Run rlang::last_error() to see where the error occurred.
Here's the data which I'm working on:

First I created reproducible data:
df <- data.frame(phenotype = c("Mesenchymal", "Classical", "Classical", "Mesenchymal", "Proneural", "Mesenchymal", "Proneural", "Messenchymal", "Messenchymal", "Classical", "Mesenchymal"),
values = runif(11, 0, 1),
modules = rep("MEmaroon", 11))
You can use this code:
library(dplyr)
library(rstatix)
df %>%
group_by(modules) %>%
t_test(values ~ phenotype) %>%
adjust_pvalue(method = "bonferroni") %>%
add_significance("p.adj")
Output:
# A tibble: 6 × 11
modules .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif
<chr> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr>
1 MEmaroon values Classical Mesench… 3 4 0.668 4.25 0.538 1 ns
2 MEmaroon values Classical Messenc… 3 2 0.361 1.48 0.763 1 ns
3 MEmaroon values Classical Proneur… 3 2 -0.0161 2.90 0.988 1 ns
4 MEmaroon values Mesenchymal Messenc… 4 2 -0.0136 1.32 0.991 1 ns
5 MEmaroon values Mesenchymal Proneur… 4 2 -0.749 2.84 0.511 1 ns
6 MEmaroon values Messenchymal Proneur… 2 2 -0.380 1.33 0.756 1 ns
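For what it's worth, the "not enough 'y' observations" error is typically raised by t.test() when one of the two groups being compared has fewer than two observations (under the default Welch test). The sketch below uses made-up data (the data frame `toy` and its values are assumptions, not the asker's data): it counts observations per module and phenotype and drops the combinations that are too small before testing.

```r
library(dplyr)

# Made-up data: in module "B", "Proneural" has a single observation
toy <- data.frame(
  modules   = c(rep("A", 6), rep("B", 5)),
  phenotype = c("Classical", "Classical", "Mesenchymal", "Mesenchymal",
                "Proneural", "Proneural",
                "Classical", "Classical", "Mesenchymal", "Mesenchymal",
                "Proneural"),
  values    = seq(0.1, 1.1, by = 0.1)
)

# Module/phenotype combinations that are too small for a two-sample t-test
small_groups <- toy %>%
  count(modules, phenotype, name = "n") %>%
  filter(n < 2)

# Keep only combinations with at least two observations
toy_ok <- toy %>%
  group_by(modules, phenotype) %>%
  filter(n() >= 2) %>%
  ungroup()
```

The filtered `toy_ok` can then go through the same group_by(modules) %>% t_test(values ~ phenotype) pipeline; whether dropping small groups is appropriate depends on the analysis.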

Related

How to melt a dataframe with tidyverse, and create a new column

I have pet survey data from 6 households.
The households are split into levels (a,b).
I would like to melt the dataframe by animal name (id.var), household (var.name), and abundance (value.name), whilst adding a new column ("level") for the levels a and b.
My dataframe looks like this:
pet abundance data
I can split it using reshape2::melt, but I don't know how to cut the a, b from the column names and make a new column of them. Please help.
raw_data = as.data.frame(raw_data)
melt(raw_data,
id.vars = 'Animal', variable.name = 'Site', value.name = 'Abundance')
Having a go on some simulated data, pivot_longer is your best bet:
library(tidyverse)
df <- tibble(
Animal = c("dog", "cat", "fish", "horse"),
`1a` = sample(1:10, 4),
`1b` = sample(1:10, 4),
`2a` = sample(1:10, 4),
`2b` = sample(1:10, 4),
`3a` = sample(1:10, 4),
`3b` = sample(1:10, 4)
)
df |>
pivot_longer(
-Animal,
names_to = c("Site", "level"),
values_to = "Abundance",
names_pattern = "(.)(.)"
) |>
arrange(Site, level)
#> # A tibble: 24 × 4
#> Animal Site level Abundance
#> <chr> <chr> <chr> <int>
#> 1 dog 1 a 9
#> 2 cat 1 a 5
#> 3 fish 1 a 8
#> 4 horse 1 a 6
#> 5 dog 1 b 4
#> 6 cat 1 b 2
#> 7 fish 1 b 8
#> 8 horse 1 b 10
#> 9 dog 2 a 8
#> 10 cat 2 a 3
#> # … with 14 more rows
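Note that the pattern "(.)(.)" captures exactly one character for the site and one for the level, so it fits names like 1a but not 10a. A minimal sketch, assuming site numbers can have several digits and levels are single lowercase letters (the tibble `wide` and its values are made up for illustration):

```r
library(tidyr)
library(tibble)

# Made-up wide data with one- and two-digit site numbers
wide <- tibble(
  Animal = c("dog", "cat"),
  `9a`  = c(1L, 2L),
  `9b`  = c(3L, 4L),
  `10a` = c(5L, 6L),
  `10b` = c(7L, 8L)
)

long <- wide |>
  pivot_longer(
    -Animal,
    names_to = c("Site", "level"),
    # one or more digits for the site, then a single letter for the level
    names_pattern = "(\\d+)([a-z])",
    values_to = "Abundance"
  )
```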

broom::tidy.lm() -- how to set number of digits?

I'm trying to use broom::tidy() to extract the ANOVA summary table for linear models so that it displays better, particularly for multivariate linear models.
I can't find a way to control the number of decimal digits that appear in the result for the sums of squares and statistics.
Here is an example with a simple lm(). Note that I can get more digits in the output with print(aov1, digits=). [Sorry: reprex isn't working on my machine.]
> data("mtcars")
>
> mtcars$cyl <- factor(mtcars$cyl) # make factors
> mtcars$am <- factor(mtcars$am)
>
> aov1 <- anova(lm(mpg ~ cyl + am, data=mtcars))
> aov1
Analysis of Variance Table
Response: mpg
Df Sum Sq Mean Sq F value Pr(>F)
cyl 2 824.78 412.39 43.6566 2.477e-09 ***
am 1 36.77 36.77 3.8922 0.05846 .
Residuals 28 264.50 9.45
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> print(aov1, digits=10) # print with more digits
Analysis of Variance Table
Response: mpg
Df Sum Sq Mean Sq F value Pr(>F)
cyl 2 824.7845901 412.3922950 43.65661 2.4769e-09 ***
am 1 36.7669195 36.7669195 3.89221 0.058457 .
Residuals 28 264.4956779 9.4462742
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
But tidy() doesn't appear to have any way to control the number of digits that I can find...
> tidy(aov1)
# A tibble: 3 x 6
term df sumsq meansq statistic p.value
<chr> <int> <dbl> <dbl> <dbl> <dbl>
1 cyl 2 825. 412. 43.7 2.48e-9
2 am 1 36.8 36.8 3.89 5.85e-2
3 Residuals 28 264. 9.45 NA NA
>
> tidy(aov1, digits=7)
# A tibble: 3 x 6
term df sumsq meansq statistic p.value
<chr> <int> <dbl> <dbl> <dbl> <dbl>
1 cyl 2 825. 412. 43.7 2.48e-9
2 am 1 36.8 36.8 3.89 5.85e-2
3 Residuals 28 264. 9.45 NA NA
>
> print(tidy(aov1), digits=7)
# A tibble: 3 x 6
term df sumsq meansq statistic p.value
<chr> <int> <dbl> <dbl> <dbl> <dbl>
1 cyl 2 825. 412. 43.7 2.48e-9
2 am 1 36.8 36.8 3.89 5.85e-2
3 Residuals 28 264. 9.45 NA NA
My goal is actually more general: to extract the univariate tests for each response in a multivariate linear model.
> data(NeuroCog, package="heplots")
> NC.mlm <- lm(cbind( Speed, Attention, Memory, Verbal, Visual, ProbSolv) ~ Dx,
+ data=NeuroCog)
> car::Anova(NC.mlm)
Type II MANOVA Tests: Pillai test statistic
Df test stat approx F num Df den Df Pr(>F)
Dx 2 0.2992 6.8902 12 470 1.562e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I can do this by reshaping the data to long format and group_by(response) followed by some tidy processing, but I want more digits in the values for the SS and F statistics.
> #' Reshape from wide to long
> NC_long <- NeuroCog |>
+ select(-SocialCog, -Age, -Sex) |>
+ tidyr::gather(key = response, value = "value", Speed:ProbSolv)
>
> NC_long |>
+ mutate(response = factor(response, levels=unique(response))) |> # keep variable order
+ group_by(response) |>
+ do(tidy(anova(lm(value ~ Dx, .)))) |>
+ filter(term != "Residuals") |>
+ select(-term) |>
+ rename(F = statistic, df1 = df,
+ SS = sumsq, MS =meansq) |>
+ mutate(df2 = 239) |> # kludge: extract dfe from object?
+ relocate(df2, .after = df1) |>
+ mutate(signif = noquote(gtools::stars.pval(p.value))) |>
+ mutate(p.value = noquote(scales::pvalue(p.value)))
# A tibble: 6 x 8
# Groups: response [6]
response df1 df2 SS MS F p.value signif
<fct> <int> <dbl> <dbl> <dbl> <dbl> <noquote> <noquote>
1 Speed 2 239 8360. 4180. 37.1 <0.001 ***
2 Attention 2 239 5579. 2790. 17.4 <0.001 ***
3 Memory 2 239 3764. 1882. 13.9 <0.001 ***
4 Verbal 2 239 4672. 2336. 27.3 <0.001 ***
5 Visual 2 239 3692. 1846. 16.6 <0.001 ***
6 ProbSolv 2 239 4165. 2083. 25.1 <0.001 ***
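On the digits question: as far as I know, tidy() returns the values at full double precision, and the rounding seen above comes from how tibbles are printed, not from tidy() itself. A minimal sketch of two ways to see more digits, assuming the pillar.sigfig option of the pillar package (which handles tibble printing) and, alternatively, a plain data.frame conversion:

```r
library(broom)

aov1 <- anova(lm(mpg ~ factor(cyl) + factor(am), data = mtcars))
tidied <- tidy(aov1)

# The stored values are not rounded; only the printed display is
tidied$sumsq

# Option 1: ask pillar for more significant digits when printing tibbles
old <- options(pillar.sigfig = 7)
print(tidied)
options(old)  # restore the previous setting

# Option 2: print as a plain data.frame, where digits= is honoured
print(as.data.frame(tidied), digits = 7)
```

The same options should apply to the grouped result of the long-format pipeline, since that is also a tibble.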

Iteratively get the max of a data frame column, add one and repeat for all rows in r

I need to perform a database operation where I'll be adding new data to an existing table and then assigning the new rows a unique id. I'm asking about this in R so I can get the logic straight before I attempt to rewrite it in sql or pyspark.
Imagine that I've already added the new data to the existing data. Here's a simplified version of what it might look like:
library(tidyverse)
df <- tibble(id = c(1, 2, 3, NA, NA),
descriptions = c("dodgers", "yankees","giants", "orioles", "mets"))
# A tibble: 5 x 2
id descriptions
<dbl> <chr>
1 1 dodgers
2 2 yankees
3 3 giants
4 NA orioles
5 NA mets
What I want is:
# A tibble: 5 x 2
id descriptions
<dbl> <chr>
1 1 dodgers
2 2 yankees
3 3 giants
4 4 orioles
5 5 mets
And I can't use arrange() with rowid_to_column(), because existing ids may have been deleted.
To get a unique id for the NA rows without changing the existing ones, I want to take the max of the id column, add one, replace NA with that value, and then move on to the next row. My instinct was to do something like df %>% mutate(new_id = max(id, na.rm = TRUE) + 1), but that only gets the max plus one, not a new max for each row. I feel like I could do this with a mapping function, but what I've tried returns a result identical to the input dataframe:
df %>%
mutate(id = ifelse(is.na(id),
map_dbl(id, ~max(.) + 1, na.rm = FALSE),
id))
# A tibble: 5 x 2
id descriptions
<dbl> <chr>
1 1 dodgers
2 2 yankees
3 3 giants
4 NA orioles
5 NA mets
Thanks in advance--now if someone can help me directly in sql, that's also a plus!
SQL option, using sqldf for demo:
sqldf::sqldf("
with cte as (
select max(id) as maxid from df
)
select cte.maxid + row_number() over () as id, df.descriptions
from df
left join cte where df.id is null
union
select * from df where id is not null")
# id descriptions
# 1 1 dodgers
# 2 2 yankees
# 3 3 giants
# 4 4 orioles
# 5 5 mets
Here is one method: add the max value to the cumulative sum of a logical vector that flags the NA values, then coalesce the result with the original 'id' column.
library(dplyr)
df <- df %>%
mutate(id = coalesce(id, max(id, na.rm = TRUE) + cumsum(is.na(id))))
-output
df
# A tibble: 5 x 2
id descriptions
<dbl> <chr>
1 1 dodgers
2 2 yankees
3 3 giants
4 4 orioles
5 5 mets
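To see why the cumsum() approach works, the intermediate vectors can be spelled out in base R. This is only an illustrative sketch; the names `offset` and `candidate` are mine, and ifelse() stands in for coalesce():

```r
id <- c(1, 2, 3, NA, NA)

# Running count of missing ids: 0 for existing rows, then 1, 2, ... for NAs
offset <- cumsum(is.na(id))

# Current max plus the running count gives a fresh id for each NA row
candidate <- max(id, na.rm = TRUE) + offset

# Keep existing ids, fill the gaps from the candidates
new_id <- ifelse(is.na(id), candidate, id)
```

Here `offset` is 0 0 0 1 2 and `candidate` is 3 3 3 4 5, so only the last two positions are taken from `candidate`.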

R summarise_at dynamically by condition : mean for some columns, sum for others

I would like the result below, but with the conditions inside summarise_at().
Edit #1: I've added the word "dynamically" to the title. I use vars(c()) in summarise_at() to keep the examples short and clear, but in practice I need contains(), starts_with() and matches(,, perl=TRUE), because I have 50 columns, with many sum() and some mean().
The goal is to generate dynamic SQL with tbl()..%>% group_by() ... %>% summarise_at()...%>% collect().
Edit #2: I added the generated SQL to my second example.
library(tidyverse)
(mtcars
%>% group_by(carb)
%>% summarise_at(vars(c("mpg","cyl","disp")), list (~mean(.),~sum(.)))
# I don't want this line below, I would like a conditional in summarise_at() because I have 50 columns in my real case
%>% select(carb,cyl_mean,disp_mean,mpg_sum)
)
#> # A tibble: 6 x 4
#> carb cyl_mean disp_mean mpg_sum
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 4.57 134. 177.
#> 2 2 5.6 208. 224
#> 3 3 8 276. 48.9
#> 4 4 7.2 309. 158.
#> 5 6 6 145 19.7
#> 6 8 8 301 15
Created on 2020-02-19 by the reprex package (v0.3.0)
This works, but I want only sum for mpg, and only mean for cyl and disp:
library(RSQLite)
library(dbplyr)
library(tidyverse)
library(DBI)
db <- dbConnect(SQLite(),":memory:")
dbCreateTable(db, "mtcars_table", mtcars)
(tbl( db, build_sql( con=db,"select * from mtcars_table" ))
%>% group_by(carb)
%>% summarise_at(vars(c("mpg","cyl","disp")), list (~mean(.),~sum(.)))
%>% select(carb,cyl_mean,disp_mean,mpg_sum)
%>% show_query()
)
#> <SQL>
#> Warning: Missing values are always removed in SQL.[...] to silence this warning
#> SELECT `carb`, `cyl_mean`, `disp_mean`, `mpg_sum`
#> FROM (SELECT `carb`, AVG(`mpg`) AS `mpg_mean`, AVG(`cyl`) AS `cyl_mean`, AVG(`disp`) AS `disp_mean`, SUM(`mpg`) AS `mpg_sum`, SUM(`cyl`) AS `cyl_sum`, SUM(`disp`) AS `disp_sum`
#> FROM (select * from mtcars_table)
#> GROUP BY `carb`)
#> # Source: lazy query [?? x 4]
#> # Database: sqlite 3.30.1 [:memory:]
#> # … with 4 variables: carb <dbl>, cyl_mean <lgl>, disp_mean <lgl>,
#> # mpg_sum <lgl>
I tried variations like the following, but they either don't work or produce an error:
(mtcars %>% group_by(carb)%>% summarise_at(vars(c("mpg","cyl","disp")),ifelse(vars(contains(names(.),"mpg")),list(sum(.)),list(mean(.)))) )
This one is not good either: it produces too many columns.
library(tidyverse)
(mtcars %>% group_by(carb)%>% summarise_at(vars(c("mpg","cyl","disp")),ifelse ((names(.)=="mpg"), list(~sum(.)) , list(~mean(.)))))
#> # A tibble: 6 x 34
#> carb mpg_sum cyl_sum disp_sum mpg_mean..2 cyl_mean..2 disp_mean..2
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 177. 32 940. 25.3 4.57 134.
#> 2 2 224 56 2082. 22.4 5.6 208.
#> 3 3 48.9 24 827. 16.3 8 276.
#> 4 4 158. 72 3088. 15.8 7.2 309.
#> 5 6 19.7 6 145 19.7 6 145
#> 6 8 15 8 301 15 8 301
#> # … with 27 more variables: mpg_mean..3 <dbl>, cyl_mean..3 <dbl>,
#> # disp_mean..3 <dbl>, mpg_mean..4 <dbl>, cyl_mean..4 <dbl>,
#> # disp_mean..4 <dbl>, mpg_mean..5 <dbl>, cyl_mean..5 <dbl>,
#> # disp_mean..5 <dbl>, mpg_mean..6 <dbl>, cyl_mean..6 <dbl>,
#> # disp_mean..6 <dbl>, mpg_mean..7 <dbl>, cyl_mean..7 <dbl>,
#> # disp_mean..7 <dbl>, mpg_mean..8 <dbl>, cyl_mean..8 <dbl>,
#> # disp_mean..8 <dbl>, mpg_mean..9 <dbl>, cyl_mean..9 <dbl>,
#> # disp_mean..9 <dbl>, mpg_mean..10 <dbl>, cyl_mean..10 <dbl>,
#> # disp_mean..10 <dbl>, mpg_mean..11 <dbl>, cyl_mean..11 <dbl>,
#> # disp_mean..11 <dbl>
Some other attempts and remarks: I would like a conditional sum(.) or mean(.) depending on the name of the column in summarise().
It would be good if it accepted more than just primitive functions.
In the end it's for tbl()..%>% group_by() ... %>% summarise_at()...%>% collect(), to generate conditional SQL with AVG() and SUM().
A T-SQL function like ~(convert(varchar()) works for mutate_at(), and similarly ~AVG() works for summarise_at(), but I arrive at the same point: a conditional summarise_at() that depends on the column names doesn't work.
:)
An option is to group_by 'carb', then add the sum of 'mpg' as a second grouping variable, and then use summarise_at on the remaining variables:
library(dplyr)
mtcars %>%
group_by(carb) %>%
group_by(mpg_sum = sum(mpg), .add = TRUE) %>%
summarise_at(vars(cyl, disp), list(mean = mean))
# A tibble: 6 x 4
# Groups: carb [6]
# carb mpg_sum cyl_mean disp_mean
# <dbl> <dbl> <dbl> <dbl>
#1 1 177. 4.57 134.
#2 2 224 5.6 208.
#3 3 48.9 8 276.
#4 4 158. 7.2 309.
#5 6 19.7 6 145
#6 8 15 8 301
Or, using across() (in the devel version of dplyr when this was written), this can be done in a single summarise by wrapping the block of columns in across, keeping the single column by itself, and applying a different function to each:
mtcars %>%
group_by(carb) %>%
summarise(across(one_of(c("cyl", "disp")), list(mean = mean)),
mpg_sum = sum(mpg))
# A tibble: 6 x 4
# carb cyl_mean disp_mean mpg_sum
# <dbl> <dbl> <dbl> <dbl>
#1 1 4.57 134. 177.
#2 2 5.6 208. 224
#3 3 8 276. 48.9
#4 4 7.2 309. 158.
#5 6 6 145 19.7
#6 8 8 301 15
NOTE: summarise_at/summarise_if/mutate_at/mutate_if/... etc. will be superseded by the across verb with the default functions (summarise/mutate/filter/...) in the upcoming releases
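Since the question asks for dynamic column selection, here is a minimal sketch of the same idea with tidyselect helpers instead of hard-coded names. It assumes a dplyr version that provides across() (1.0.0 or later), and the regular expressions are just examples of how the 50 real columns might be matched:

```r
library(dplyr)

res <- mtcars %>%
  group_by(carb) %>%
  summarise(
    # columns whose names match the pattern get a mean
    across(matches("^(cyl|disp)$"), mean, .names = "{.col}_mean"),
    # columns selected by prefix get a sum
    across(starts_with("mp"), sum, .names = "{.col}_sum")
  )
```

Each across() call pairs one selection with one function, so the "condition" lives in the selection helper rather than in an ifelse(). Whether this translates to SQL on a remote table depends on the dbplyr version in use.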
A workaround with a regex, while waiting for across():
library(RSQLite)
library(dbplyr)
library(tidyverse)
library(DBI)
db <- dbConnect(SQLite())
mtcars_table <- mtcars %>% rename(mpg_sum=mpg,cyl_mean=cyl,disp_mean=disp )
RSQLite::dbWriteTable(db, "mtcars_table", mtcars_table)
req<-as.character((tbl( db, build_sql( con=db,"select * from mtcars_table" ))
%>% group_by(carb)
%>% summarise_at(vars(c(ends_with("mean"), ends_with("sum")) ), ~sum(.))
) %>% sql_render())
#> Warning: Missing values are always removed in SQL.
#> Use `SUM(x, na.rm = TRUE)` to silence this warning
#> This warning is displayed only once per session.
req<-gsub("(SUM)(\\(.{1,30}mean.{1,10}\\))", "AVG\\2", req, perl=TRUE)
print(req)
#> [1] "SELECT `carb`, AVG(`cyl_mean`) AS `cyl_mean`, AVG(`disp_mean`) AS `disp_mean`,
# SUM(`mpg_sum`) AS `mpg_sum`\nFROM (select * from mtcars_table)\n
# GROUP BY `carb`"
dbGetQuery(db, req)
#> carb cyl_mean disp_mean mpg_sum
#> 1 1 4.571429 134.2714 177.4
#> 2 2 5.600000 208.1600 224.0
#> 3 3 8.000000 275.8000 48.9
#> 4 4 7.200000 308.8200 157.9
#> 5 6 6.000000 145.0000 19.7
#> 6 8 8.000000 301.0000 15.0
sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] DBI_1.1.0 forcats_0.4.0 stringr_1.4.0 dplyr_0.8.4 purrr_0.3.3
[6] readr_1.3.1 tidyr_1.0.2 tibble_2.1.3 ggplot2_3.2.1 tidyverse_1.3.0
[11] dbplyr_1.4.2 RSQLite_2.2.0
loaded via a namespace (and not attached):
[1] xfun_0.10 tidyselect_1.0.0 haven_2.2.0 lattice_0.20-38 colorspace_1.4-1
[6] vctrs_0.2.2 generics_0.0.2 htmltools_0.4.0 blob_1.2.1 rlang_0.4.4
[11] pillar_1.4.3 glue_1.3.1 withr_2.1.2 bit64_0.9-7 modelr_0.1.5
[16] readxl_1.3.1 lifecycle_0.1.0 munsell_0.5.0 gtable_0.3.0 cellranger_1.1.0
[21] rvest_0.3.5 memoise_1.1.0 evaluate_0.14 knitr_1.25 callr_3.3.2
[26] ps_1.3.0 fansi_0.4.1 broom_0.5.2 Rcpp_1.0.3 clipr_0.7.0
[31] scales_1.1.0 backports_1.1.5 jsonlite_1.6.1 fs_1.3.1 bit_1.1-15.1
[36] hms_0.5.3 digest_0.6.23 stringi_1.4.5 processx_3.4.1 grid_3.6.1
[41] cli_2.0.1 tools_3.6.1 magrittr_1.5 lazyeval_0.2.2 whisker_0.4
[46] crayon_1.3.4 pkgconfig_2.0.3 xml2_1.2.2 reprex_0.3.0 lubridate_1.7.4
[51] assertthat_0.2.1 rmarkdown_1.16 httr_1.4.1 rstudioapi_0.10 R6_2.4.1
[56] nlme_3.1-141 compiler_3.6.1

dbplyr, dplyr, and functions with no SQL equivalents [eg `slice()`]

library(tidyverse)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars)
mtcars2 <- tbl(con, "mtcars")
I can create this mock SQL database above. And it's very cool that I can perform standard dplyr functions on this "database":
mtcars2 %>%
group_by(cyl) %>%
summarise(mpg = mean(mpg, na.rm = TRUE)) %>%
arrange(desc(mpg))
#> # Source: lazy query [?? x 2]
#> # Database: sqlite 3.29.0 [:memory:]
#> # Ordered by: desc(mpg)
#> cyl mpg
#> <dbl> <dbl>
#> 1 4 26.7
#> 2 6 19.7
#> 3 8 15.1
It appears I'm unable to use dplyr functions that have no direct SQL equivalents, (eg dplyr::slice()). In the case of slice() I can use the alternative combination of filter() and row_number() to get the same results as just using slice(). But what happens when there's not such an easy workaround?
mtcars2 %>% slice(1:5)
#>Error in UseMethod("slice_") :
#> no applicable method for 'slice_' applied to an object of class
#> "c('tbl_SQLiteConnection', 'tbl_dbi', 'tbl_sql', 'tbl_lazy', 'tbl')"
When dplyr functions have no direct SQL equivalents can I force their use with dbplyr, or is the only option to get creative with dplyr verbs that do have SQL equivalents, or just write the SQL directly (which is not my preferred solution)?
I understood this question as: "How can I make slice() work for SQL databases?" This is different from "forcing their use" but still might work in your case.
The example below shows how to implement a "poor man's" variant of slice() that works on the database. We still need to do the legwork and implement it with verbs that work on the database, but then we can use it similarly to data frames.
Read more about S3 classes in http://adv-r.had.co.nz/OO-essentials.html#s3.
library(tidyverse)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars)
mtcars2 <- tbl(con, "mtcars")
# mtcars2 has a class attribute
class(mtcars2)
#> [1] "tbl_SQLiteConnection" "tbl_dbi" "tbl_sql"
#> [4] "tbl_lazy" "tbl"
# slice() is an S3 generic
slice
#> function(.data, ..., .preserve = FALSE) {
#> UseMethod("slice")
#> }
#> <bytecode: 0x560a03460548>
#> <environment: namespace:dplyr>
# we can implement a "poor man's" variant of slice()
# for the particular class. (It doesn't work quite the same
# in all cases.)
#' @export
slice.tbl_sql <- function(.data, ...) {
rows <- c(...)
.data %>%
mutate(...row_id = row_number()) %>%
filter(...row_id %in% !!rows) %>%
select(-...row_id)
}
mtcars2 %>%
slice(1:5)
#> # Source: lazy query [?? x 11]
#> # Database: sqlite 3.29.0 [:memory:]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
Created on 2019-12-07 by the reprex package (v0.3.0)
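The mechanism behind this is ordinary S3 dispatch: a generic calls UseMethod(), and R picks the method that matches the object's class. A self-contained sketch in base R, where the generic `first_rows` and the class `remote_tbl` are hypothetical names invented for illustration (they are not part of dplyr or dbplyr):

```r
# A generic: dispatches on the class of x
first_rows <- function(x, n) UseMethod("first_rows")

# Fallback for ordinary in-memory objects
first_rows.default <- function(x, n) head(x, n)

# Method for a made-up "remote table" class: build a query string
# instead of touching any data
first_rows.remote_tbl <- function(x, n) {
  paste0("SELECT * FROM ", x$name, " LIMIT ", n)
}

local_result <- first_rows(mtcars, 2)

remote <- structure(list(name = "mtcars"), class = "remote_tbl")
remote_query <- first_rows(remote, 2)
```

Defining slice.tbl_sql works the same way: it supplies a method for a class that the generic previously had no method for.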