R Shiny: Build an interactive SQL query and copy data into the global environment

I am trying to build a Shiny app that lets users query data. Users supply a list of values of interest, which is used to filter data stored in a database. Unfortunately, dplyr's SQL translation does not appear to be the most efficient/performant solution, so I build the filter as a string and pass it to the database via sql(). The string manipulations also act as a crude failsafe against entry errors and variations. The following code illustrates the string building and data query process:
library(tibble)
library(dplyr)
library(dbplyr)
library(shiny)
library(RSQLite)
library(DBI)
library(stringr)
# Create string variables to experiment
mtcars1 <- tibble::rownames_to_column(mtcars, var = "model") %>% # Create "model" column from the row names
  dplyr::mutate(carmaker = stringr::word(model, 1))              # Create "carmaker" column from the first word of "model"
# Establish example database
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, mtcars1, "mtcars1", temporary = FALSE)
# Query example
# Input
string_input <- "Mazda, Merc"
# Prepare input string to be used in SQL
string_filter <- string_input %>%
  base::gsub("[,]+", " ", .) %>%          # replace commas with blanks
  stringr::str_squish(.) %>%              # trim and collapse repeated blanks
  base::gsub(" ", ",", .) %>%             # replace blanks with commas
  base::gsub("(\\w+)", "'\\1'", .) %>%    # wrap each word in single quotes
  base::paste0("carmaker in (", ., ")")   # build the SQL WHERE condition
# Data query
data <- tbl(con, "mtcars1") %>%
  filter(sql(string_filter)) %>%
  show_query() %>%   # prints the generated SQL and returns the query invisibly
  collect()
I'd like to implement this code in a shiny app:
# Shiny user interface
ui <- fluidPage(
  textInput(inputId = "string_input", label = "Input", value = "", placeholder = "Enter list of car models without commas"),
  actionButton(inputId = "go", label = "Go"),
  textOutput(outputId = "string_output")
)
# Shiny server function
server <- function(input, output){
  observeEvent(input$go, {
    output$string_output <- reactive({
      input$string_input %>%
        base::gsub("[,]+", " ", .) %>%          # replace commas with blanks
        stringr::str_squish(.) %>%              # trim and collapse repeated blanks
        base::gsub(" ", ",", .) %>%             # replace blanks with commas
        base::gsub("(\\w+)", "'\\1'", .) %>%    # wrap each word in single quotes
        base::paste0("carmaker in (", ., ")")   # build the SQL WHERE condition
    })
  })
}
# Launch shiny app
shinyApp(ui, server)
The app takes the list of values supplied by the user as input, transforms it, and shows the transformed list as output.
Here is what I want to do:
I would like to store the transformed string_input in a separate local object for further use in the app, that is, I want to pass string_input to the data query as in the non-Shiny example above.
I would like to copy the data query result to R's global environment so that I can keep using it after closing the app.
Regarding my second point: I read that one can use <- and <<-, but I could not make it work in a reactive context.

I figured out the solution:
library(tibble)
library(dplyr)
library(dbplyr)
library(shiny)
library(RSQLite)
library(DBI)
library(stringr)
# Create string variables to experiment
mtcars1 <- tibble::rownames_to_column(mtcars, var = "model") %>% # Create "model" column from the row names
  dplyr::mutate(carmaker = stringr::word(model, 1))              # Create "carmaker" column from the first word of "model"
# Establish example database
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, mtcars1, "mtcars1", temporary = FALSE)
# Shiny user interface
ui <- fluidPage(
  textInput(inputId = "string_input",
            label = "Input",
            value = "",
            placeholder = "Enter a list of car makers (e.g. Mazda, Merc)"),
  textOutput(outputId = "string_output"),
  actionButton(inputId = "go", label = "Go"),
  tableOutput(outputId = "data_output")
)
# Custom function to save reactive object to global environment
saveData <- function(x) {
  export <<- x  # assign to `export` in the enclosing (here: global) environment
}
# Shiny server function
server <- function(input, output){
  list <- reactive({
    input$string_input %>%
      base::gsub("[,]+", " ", .) %>%          # replace commas with blanks
      stringr::str_squish(.) %>%              # trim and collapse repeated blanks
      base::gsub(" ", ",", .) %>%             # replace blanks with commas
      base::gsub("(\\w+)", "'\\1'", .) %>%    # wrap each word in single quotes
      base::paste0("carmaker in (", ., ")")   # build the SQL WHERE condition
  })
  output$string_output <- reactive({list()})
  data <- eventReactive(input$go, {
    tbl(con, "mtcars1") %>%
      dplyr::filter(sql(!!list())) %>%
      dplyr::collect()
  })
  output$data_output <- renderTable(data())
  observeEvent(input$go, {
    saveData(data())
  })
}
# Launch shiny app
shinyApp(ui, server)
The trick was to define the function saveData, pass the reactive data object to it, and assign it to export via <<-.
Honestly, I do not understand all of the fundamentals, so any suggestions for improvement are welcome. However, it works.
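One possible refinement, since suggestions are welcome: instead of building the WHERE clause by string manipulation, the filter values can be bound as query parameters, which avoids quoting problems with unexpected input. This is only a minimal sketch, not part of the original answer; it assumes the same con and mtcars1 table and a raw input string like string_input above:
makers <- strsplit(trimws(string_input), "[,[:space:]]+")[[1]]           # split raw input into values
placeholders <- paste(rep("?", length(makers)), collapse = ", ")         # one ? per value
query <- paste0("SELECT * FROM mtcars1 WHERE carmaker IN (", placeholders, ")")
data <- DBI::dbGetQuery(con, query, params = as.list(makers))            # values are bound, not pasted into the SQL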

For your first question:
# Shiny server function
server <- function(input, output){
  string_output <- eventReactive(input$go, {
    input$string_input %>%
      base::gsub("[,]+", " ", .) %>%          # replace commas with blanks
      stringr::str_squish(.) %>%              # trim and collapse repeated blanks
      base::gsub(" ", ",", .) %>%             # replace blanks with commas
      base::gsub("(\\w+)", "'\\1'", .) %>%    # wrap each word in single quotes
      base::paste0("carmaker in (", ., ")")   # build the SQL WHERE condition
  })
  output$string_output <- renderText(string_output())
}
The string_output() reactive is now available both for output and for the data query.
Note that you could also use input$string_input instead of input$go as the trigger, so the output updates while you type the criteria.
You can then use input$go to query the data:
data <- eventReactive(input$go, { dbGetQuery(yourConnection, YourQuery(string_output())) })
output$data <- renderTable(data())
I am not sure you can write directly from Shiny to R's global environment, but you can certainly save data() as a file on the server.
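For example, a minimal sketch of saving data() to disk from inside the server (the file name is illustrative, not from the original answer):
observeEvent(input$go, {
  saveRDS(data(), file = "query_result.rds")  # written to the app's working directory on the server
})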

Related

Creating a function that returns a new dataframe that can be saved, but will not rewrite the new dataframe each time it runs

I want to create a function that takes a dataframe, makes changes, and creates a new dataframe. I want to apply this code to 200+ datasets. How do I apply a function that will save a new dataframe, but not continuously overwrite it?
Here is my code:
for (i in KO_AFEPSI) {
  i <- i[i$AFEPSI != 1,]
  i <- separate(i, exon, into = c("chromosome", "start", "end"))
  i_plus <- subset(i, strand == "+")
  i_plus <- i_plus[order(i_plus$start),]
  i_plus <- transform(i_plus, Order = ave(1:nrow(i_plus), gene, FUN = seq_along))
  i_minus <- subset(i, strand == "-")
  i_minus <- i_minus[order(i_minus$start, decreasing = TRUE),]
  i_minus <- transform(i_minus, Order = ave(1:nrow(i_minus), gene, FUN = seq_along))
  i <- bind_rows(i_plus, i_minus)
  i_upstream_exon <- i %>% group_by(gene) %>% slice_min(Order)
  i_downstream_exon <- i %>% group_by(gene) %>% slice_max(Order)
  data <- merge(i_upstream_exon, i_downstream_exon, by = "gene", all = FALSE)
}
I basically want to save a new "data" for each dataframe I run through the function. How should I go about this?
Should I be using lapply?
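One common approach (a sketch, not from the original thread, assuming KO_AFEPSI is a list of dataframes): move the loop body into a function that returns the merged result, then lapply over the list so each result is kept instead of `data` being overwritten on every iteration:
library(dplyr)
library(tidyr)

process_one <- function(df) {
  df <- df[df$AFEPSI != 1, ]
  df <- separate(df, exon, into = c("chromosome", "start", "end"))
  df_plus  <- subset(df, strand == "+")
  df_plus  <- df_plus[order(df_plus$start), ]
  df_plus  <- transform(df_plus, Order = ave(1:nrow(df_plus), gene, FUN = seq_along))
  df_minus <- subset(df, strand == "-")
  df_minus <- df_minus[order(df_minus$start, decreasing = TRUE), ]
  df_minus <- transform(df_minus, Order = ave(1:nrow(df_minus), gene, FUN = seq_along))
  df <- bind_rows(df_plus, df_minus)
  up   <- df %>% group_by(gene) %>% slice_min(Order)
  down <- df %>% group_by(gene) %>% slice_max(Order)
  merge(up, down, by = "gene", all = FALSE)   # the per-dataframe result is returned, not assigned
}

# one result per input dataframe, collected in a list instead of being overwritten
results <- lapply(KO_AFEPSI, process_one)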

Iterating to create tabs with gt in quarto

Something that is very handy is to iterate over a variable and dynamically create tabs based on the values of that variable (here homeworld). This works well with the results: asis chunk option. I can make that work, but there is some strange interaction with the {gt} package whereby I can only make gt work inside purrr::walk if I use gt::as_raw_html. However, if I just produce a single table outside of purrr::walk, I don't need gt::as_raw_html. Here is the error message I get when {gt} does not work:
Error running filter
/Applications/quarto/share/filters/quarto-pre/quarto-pre.lua:
...lications/quarto/share/filters/quarto-pre/quarto-pre.lua:2410:
attempt to concatenate a nil value (local 'v') stack traceback:
...lications/quarto/share/filters/quarto-pre/quarto-pre.lua:2417: in
function
<...lications/quarto/share/filters/quarto-pre/quarto-pre.lua:2415>
Here is the Quarto code (Quarto version 1.1.175) to reproduce:
---
title: "Untitled"
format: html
execute:
  warning: false
---
```{r r-pkgs}
library(dplyr)
library(glue)
library(gt)
library(purrr)
## just to simplify
starwars <- starwars %>%
  filter(!is.na(sex))
```
# Does work
::: {.panel-tabset}
```{r}
#| results: asis
walk(
  unique(starwars$sex), \(hw) {
    cat(glue("## {hw} \n\n"))
    starwars %>%
      filter(sex == hw) %>%
      count(homeworld) %>%
      head() %>%
      gt() %>%
      as_raw_html() %>%
      print()
    cat("\n\n")
  }
)
```
:::
# Does not work
::: {.panel-tabset}
```{r}
#| results: asis
#| eval: false
walk(
  unique(starwars$sex), \(hw) {
    cat(glue("## {hw} \n\n"))
    starwars %>%
      filter(sex == hw) %>%
      count(homeworld) %>%
      head() %>%
      gt() %>%
      print()
    cat("\n\n")
  }
)
```
:::
## single does work
```{r}
#| results: asis
starwars %>%
  count(homeworld) %>%
  head() %>%
  gt()
```
Posting back the answer from the duplicate issue in quarto-dev/quarto-cli#2370.
About the issue
The behavior you see has to do with the print method used, and with iterating inside walk().
Using print() after gt() and after gt() %>% as_raw_html() does not have the same effect, because the print method dispatched is not the same. In the context of knitr, this matters.
Using as_raw_html() makes sense for including the table as raw HTML in such a document, and it will probably give the same result as when a gt object is printed while knitting to HTML (through the use of htmltools). When you use gt() %>% print(), it does not use the printing method that applies when gt() alone is the last expression in a chunk (the knit_print() method; see more on that here for a deeper understanding).
More on how to create content dynamically with knitr
Let me add some context about knitr and dynamically created content.
Iterating to dynamically create content in knitr requires using the correct print method (usually knit_print()), and it is better to iterate over child content with the knitr::knit_child() function, which correctly handles the printed output when sewing results into the document. We have some resources about that in the R Markdown Cookbook that apply to Quarto as well.
About knit_child() : https://bookdown.org/yihui/rmarkdown-cookbook/child-document.html
About knit_expand() also : https://bookdown.org/yihui/rmarkdown-cookbook/knit-expand.html
As an example, this is how we recommend dynamically creating content with knitr so that R code results are correctly mixed with other content, especially when the dynamically created content is a mix of raw markdown and R code output.
---
title: "Untitled"
format: html
execute:
  warning: false
  keep-md: true
---
```{r r-pkgs}
library(dplyr)
library(glue)
library(gt)
library(purrr)
## just to simplify
starwars <- starwars %>%
filter(!is.na(sex))
```
# Tables
::: {.panel-tabset}
```{r}
#| output: asis
res <- purrr::map_chr(unique(starwars$sex), \(hw) {
  knitr::knit_child(text = c(
    "## `r hw`",
    "",
    "```{r}",
    "#| echo: false",
    "starwars %>%",
    " filter(sex == hw) %>%",
    " count(homeworld) %>%",
    " head() %>%",
    " gt()",
    "```",
    "",
    ""
  ), envir = environment(), quiet = TRUE)
})
cat(res, sep = '\n')
```
:::
You could also put the child content in a separate file for easier writing, as in the Cookbook, by writing the document you would need for a single value of the variable you are iterating over:
## `r hw`
```{r}
#| echo: false
starwars %>%
  filter(sex == hw) %>%
  count(homeworld) %>%
  head() %>%
  gt()
```
This is the safest way to mix markdown content (like ##) with R code output (like tables or htmlwidgets), which is content you can't easily just cat() into the file.
See more in the issue

What is the preferred way to extract SQL from a tbl_dbi as a string?

I would like to extract the SQL from a tbl_dbi and save it as a character vector.
I have tried
library(dplyr)
my_tbl_dbi <- tbl(conn, "myTable")
my_query <- capture.output(show_query(my_tbl_dbi), type = "message") %>%
  paste0(collapse = " ")
This worked for me in the past but now is not working, perhaps because I was playing around with the sink function. My question is "Is there a more robust/standard way to pull out the SQL query from a tbl_dbi or is what I am doing the only way to do it?"
as.character(db_sql_render(my_tbl_dbi$src$con, my_tbl_dbi))
Note:
methods("show_query")
## [1] show_query.tbl_lazy* show_query.tbl_sql*
dbplyr:::show_query.tbl_sql
## function (x, ...)
## {
## message("<SQL>\n", db_sql_render(x$src$con, x))
## invisible(x)
## }
## <environment: namespace:dbplyr>
The source for db_sql_render.DBIConnection():
db_sql_render.DBIConnection <- function(con, sql, ...) {
  qry <- sql_build(sql, con = con, ...)
  sql_render(qry, con = con, ...)
}
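As a side note (not part of the original answer, and assuming a reasonably recent dbplyr version): remote_query() is an exported helper that returns the SQL without reaching into the object's internals:
my_query <- as.character(dbplyr::remote_query(my_tbl_dbi))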

DBI dbWriteTable

I really like the dbWriteTable function from DBI (I usually use RSQLite or ROracle as the backend).
I use that function to import a lot of Excel spreadsheets. The problem is that these spreadsheets were created over a long period, so columns are added/deleted or change names from one document to another.
So my question is: does anyone have a relatively quick way to add data to a database without having to match the field list perfectly?
Here is sample script that I use
require(RSQLite)
require(readxl)
require(magrittr) # for the %>% pipe
# Create database file
conn <- dbConnect(drv = SQLite(), dbname = "path to database")
# Define import function
excel2sqltable <- function(conn, file, table) {
  source.df <- read_excel(path = file, col_names = TRUE) %>%
    cbind("SourceFile" = file, .)
  names(source.df) <- source.df %>%
    data.frame(check.names = TRUE) %>%
    {gsub("[.]", x = names(.), replacement = "_")}
  print(paste("Importing ", file))
  setOldClass(c("tbl_df", "data.frame"))
  dbWriteTable(conn = conn, name = table, value = source.df, append = TRUE)
}
With that function I can do:
sapply(list.files(),FUN = function(x){excel2sqltable(conn,x,"Imports")})
You can use this as a guide:
library(RSQLite)
library(magrittr) # for the %>% pipe
sqlite_conn <- dbConnect(drv = SQLite(), dbname = 'data_dump.sqlite')
excel2sqltable <- function(conn, file, table) {
  source.df <- readxl::read_excel(path = file, col_names = TRUE) %>%
    cbind("SourceFile" = file, .)
  names(source.df) <- source.df %>%
    data.frame(check.names = TRUE) %>%
    {gsub("[.]", x = names(.), replacement = "_")}
  if (!dbExistsTable(conn, table)) {
    dbWriteTable(conn = conn, name = table, value = source.df)
  } else {
    # Get both dataframe columns and table columns
    df_cols <- colnames(source.df)
    tbl_cols <- dbListFields(conn, table)
    # Check if there are columns in the dataframe
    # that are not in the destination table.
    # Loop through the missing columns and add
    # them to the database table
    if (length(setdiff(df_cols, tbl_cols)) > 0) {
      missing_cols <- setdiff(df_cols, tbl_cols)
      for (col_name in missing_cols) {
        dbExecute(conn, sprintf('ALTER TABLE %s ADD %s VARCHAR', table, col_name))
      }
    }
    setOldClass(c("tbl_df", "data.frame"))
    dbWriteTable(conn = conn, name = table, value = source.df, append = TRUE)
  }
}
lapply(list.files(), function(x) {
  excel2sqltable(sqlite_conn, x, "Imports")
})
dbDisconnect(sqlite_conn)
I hope it serves a purpose.

Convert topicmodels output to JSON

I use the following function to convert the topicmodels output to JSON for use in LDAvis.
topicmodels_json_ldavis <- function(fitted, corpus, doc_term){
  ## Required packages
  library(topicmodels)
  library(dplyr)
  library(stringi)
  library(tm)
  library(LDAvis)
  ## Find required quantities
  phi <- posterior(fitted)$terms %>% as.matrix
  theta <- posterior(fitted)$topics %>% as.matrix
  vocab <- colnames(phi)
  doc_length <- vector()
  for (i in 1:length(corpus)) {
    temp <- paste(corpus[[i]]$content, collapse = ' ')
    doc_length <- c(doc_length, stri_count(temp, regex = '\\S+'))
  }
  temp_frequency <- inspect(doc_term)
  freq_matrix <- data.frame(ST = colnames(temp_frequency),
                            Freq = colSums(temp_frequency))
  rm(temp_frequency)
  ## Convert to json
  json_lda <- LDAvis::createJSON(phi = phi, theta = theta,
                                 vocab = vocab,
                                 doc.length = doc_length,
                                 term.frequency = freq_matrix$Freq)
  return(json_lda)
}
but I receive the following error
Error in LDAvis::createJSON(phi = phi, theta = theta, vocab = vocab, doc.length = doc_length, : Length of doc.length not equal
to the number of rows in theta; both should be equal to the number of
documents in the data.
Here is my complete code:
data <- read.csv("textmining.csv")
corpus <- Corpus(DataframeSource(data.frame(data$reasonforleaving)))
# Remove punctuations and numbers because they are generally uninformative.
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
# Convert all words to lowercase.
corpus <- tm_map(corpus, content_transformer(tolower))
# Remove stopwords such as "a", "the", etc.
corpus <- tm_map(corpus, removeWords, stopwords("english"))
# Use the SnowballC package to do stemming.
library(SnowballC)
corpus <- tm_map(corpus, stemDocument)
# remove extra words
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
corpus <- tm_map(corpus, toSpace, "still")
corpus <- tm_map(corpus, toSpace, "also")
# Remove excess white spaces between words.
corpus <- tm_map(corpus, stripWhitespace)
# Inspect the first document to see what it looks like.
corpus[[1]]$content
dtm <- DocumentTermMatrix(corpus)
# remove empty documents
library(slam)
dtm = dtm[row_sums(dtm)>0,]
# Use topicmodels package to conduct LDA analysis.
burnin <- 500
iter <- 1000
keep <- 30
k <- 5
result55 <- LDA(dtm, 5)
ldaoutput = topicmodels_json_ldavis(result55,corpus, dtm)
Do you know why I receive the error?
Thanks
I had the same issue with the same code, and found this function here:
topicmodels2LDAvis <- function(x, ...){
  post <- topicmodels::posterior(x)
  if (ncol(post[["topics"]]) < 3) stop("The model must contain > 2 topics")
  mat <- x@wordassignments
  LDAvis::createJSON(
    phi = post[["terms"]],
    theta = post[["topics"]],
    vocab = colnames(post[["terms"]]),
    doc.length = slam::row_sums(mat, na.rm = TRUE),
    term.frequency = slam::col_sums(mat, na.rm = TRUE)
  )
}
Much simpler to use: just pass your LDA result as the argument:
result55 <- LDA(dtm, 5)
serVis(topicmodels2LDAvis(result55))
Problem
Your problem is the for (i in 1:length(corpus)) loop in
doc_length <- vector()
for (i in 1:length(corpus)) {
  temp <- paste(corpus[[i]]$content, collapse = ' ')
  doc_length <- c(doc_length, stri_count(temp, regex = '\\S+'))
}
Remember, you have removed some "empty" documents from your DocumentTermMatrix in dtm = dtm[row_sums(dtm)>0,],
so your vector length here is going to be too big.
Suggestion
You may want to keep a vector of the empty docs around as it will help you not only to generate the JSON but also to go back and forth between your empty and full document sets.
doc.length = colSums( as.matrix(tdm) > 0 )[!empty.docs]
My suggestion assumes you have the full tdm with empty docs in place
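For example, a minimal sketch of that bookkeeping (assuming dtm is the full DocumentTermMatrix before the empty rows were dropped; variable names are illustrative):
library(slam)
empty.docs <- row_sums(dtm) == 0   # flag documents with no remaining terms
dtm <- dtm[!empty.docs, ]          # drop them before fitting the LDA model
doc_length <- row_sums(dtm)        # tokens per kept document; length now matches nrow(theta)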