DBI dbWriteTable - sql

I really like the dbWriteTable function from DBI (I usually use RSQLite or ROracle as the backend).
I use that function to import a lot of Excel spreadsheets. The problem is that these spreadsheets were created over a long period, so columns get added, deleted, or renamed from one document to another.
So my question is: does anyone have a relatively quick way to add data to the database without having to match the field list perfectly?
Here is a sample script that I use:
require(RSQLite)
require(readxl)
require(dplyr)  # for the %>% pipe used below
# Create database file
conn <- dbConnect(drv = SQLite(), dbname = "path to database")
# Define import function
excel2sqltable <- function(conn, file, table) {
  source.df <- read_excel(path = file, col_names = TRUE) %>%
    cbind("SourceFile" = file, .)
  names(source.df) <- source.df %>%
    data.frame(check.names = TRUE) %>%
    {gsub("[.]", x = names(.), replacement = "_")}
  print(paste("Importing ", file))
  setOldClass(c("tbl_df", "data.frame"))
  dbWriteTable(conn = conn, name = table, value = source.df, append = TRUE)
}
With that function I can do:
sapply(list.files(),FUN = function(x){excel2sqltable(conn,x,"Imports")})

You can use this as a guide:
library(RSQLite)
library(dplyr)  # for the %>% pipe used below
sqlite_conn <- dbConnect(drv = SQLite(), dbname = 'data_dump.sqlite')
excel2sqltable <- function(conn, file, table) {
  source.df <- readxl::read_excel(path = file, col_names = TRUE) %>%
    cbind("SourceFile" = file, .)
  names(source.df) <- source.df %>%
    data.frame(check.names = TRUE) %>%
    {gsub("[.]", x = names(.), replacement = "_")}
  if (!dbExistsTable(conn, table)) {
    dbWriteTable(conn = conn, name = table, value = source.df)
  } else {
    # Get both dataframe columns and table columns
    df_cols <- colnames(source.df)
    tbl_cols <- dbListFields(conn, table)
    # Check if there are columns in the dataframe that are not in the
    # destination table, then loop through the missing columns and add
    # them to the database table
    missing_cols <- setdiff(df_cols, tbl_cols)
    if (length(missing_cols) > 0) {
      for (col_name in missing_cols) {
        # dbExecute() runs the statement and releases the result set
        dbExecute(conn, sprintf('ALTER TABLE %s ADD %s VARCHAR', table, col_name))
      }
    }
    setOldClass(c("tbl_df", "data.frame"))
    dbWriteTable(conn = conn, name = table, value = source.df, append = TRUE)
  }
}
lapply(list.files(), function(x) {
  excel2sqltable(sqlite_conn, x, "Imports")
})
dbDisconnect(sqlite_conn)
I hope it serves a purpose.

Related

Creating a function that returns a new dataframe that can be saved, but will not rewrite the new dataframe each time it runs

I want to create a function that takes a dataframe, makes changes, and creates a new dataframe. I want to apply this code to 200+ datasets. How do I apply a function that will save a new dataframe, but not continuously overwrite it?
Here is my code:
for (i in KO_AFEPSI) {
  i <- i[i$AFEPSI != 1, ]
  i <- separate(i, exon, into = c("chromosome", "start", "end"))
  i_plus <- subset(i, strand == "+")
  i_plus <- i_plus[order(i_plus$start), ]
  i_plus <- transform(i_plus, Order = ave(1:nrow(i_plus), gene, FUN = seq_along))
  i_minus <- subset(i, strand == "-")
  i_minus <- i_minus[order(i_minus$start, decreasing = TRUE), ]
  i_minus <- transform(i_minus, Order = ave(1:nrow(i_minus), gene, FUN = seq_along))
  i <- bind_rows(i_plus, i_minus)
  i_upstream_exon <- i %>% group_by(gene) %>% slice_min(Order)
  i_downstream_exon <- i %>% group_by(gene) %>% slice_max(Order)
  data <- merge(i_upstream_exon, i_downstream_exon, by = "gene", all = FALSE)
}
I basically want to save a new "data" for each dataframe I run through the function. How should I go about this?
Should I be using lapply?
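One way to do this (a minimal sketch, not part of the original thread, assuming KO_AFEPSI is a list of data frames) is to turn the loop body into a function that returns the merged result and then lapply over the list, so each input produces its own output instead of overwriting data:
library(dplyr)
library(tidyr)
process_afepsi <- function(df) {
  df <- df[df$AFEPSI != 1, ]
  df <- separate(df, exon, into = c("chromosome", "start", "end"))
  df_plus <- subset(df, strand == "+")
  df_plus <- df_plus[order(df_plus$start), ]
  df_plus <- transform(df_plus, Order = ave(1:nrow(df_plus), gene, FUN = seq_along))
  df_minus <- subset(df, strand == "-")
  df_minus <- df_minus[order(df_minus$start, decreasing = TRUE), ]
  df_minus <- transform(df_minus, Order = ave(1:nrow(df_minus), gene, FUN = seq_along))
  df <- bind_rows(df_plus, df_minus)
  upstream <- df %>% group_by(gene) %>% slice_min(Order)
  downstream <- df %>% group_by(gene) %>% slice_max(Order)
  merge(upstream, downstream, by = "gene", all = FALSE)
}
# One merged data frame per input; the names of KO_AFEPSI are preserved
results <- lapply(KO_AFEPSI, process_afepsi)
Each element of results can then be written out separately (for example in a loop over names(results)) without any element overwriting another.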

R Shiny: Build an interactive SQL query and copy data into global environment

I am trying to build a Shiny app that enables users to query data. Users are supposed to provide a list of values of interest that are used to filter data stored in a database. Unfortunately, dplyr's translation does not appear to be the most efficient/performant solution, so I have to build a string and pass it to the database via sql(). The string manipulations are also a crude fail-safe to handle entry errors/varieties. The following code illustrates the string building and data query process:
library(tibble)
library(dplyr)
library(dbplyr)
library(shiny)
library(RSQLite)
library(DBI)
library(stringr)
# Create string variables to experiment
mtcars1 <- tibble::rownames_to_column(mtcars, var = "model") %>% # Create column based on row names
  dplyr::mutate(carmaker = stringr::word(model, 1)) # Create column with first word of column with row names
# Establish example database
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, mtcars1, "mtcars1", temporary = FALSE)
# Query example
# Input
string_input <- "Mazda, Merc"
# Prepare input string to be used in SQL
string_filter <- string_input %>%
  base::gsub("[,]+", " ", .) %>% # remove commas
  stringr::str_squish(.) %>% # remove multiple blanks
  base::gsub(" ", ",", .) %>% # substitute blanks for commas
  base::gsub("(\\w+)", "'\\1'", .) %>% # enclose words with single quotation marks
  base::paste0("carmaker in (", ., ")") # create sql where statement
# Data query
data <- tbl(con, "mtcars1") %>%
  filter(sql(string_filter)) %>%
  show_query() %>%
  collect()
I'd like to implement this code in a shiny app:
# Shiny user interface
ui <- fluidPage(
  textInput(inputId = "string_input", label = "Input", value = "",
            placeholder = "Enter list of car models without commas"),
  actionButton(inputId = "go", label = "Go"),
  textOutput(outputId = "string_output")
)
# Shiny server function
server <- function(input, output){
  observeEvent(input$go, {
    output$string_output <- reactive({
      input$string_input %>%
        base::gsub("[,]+", " ", .) %>% # remove commas
        stringr::str_squish(.) %>% # remove multiple blanks
        base::gsub(" ", ",", .) %>% # substitute blanks for commas
        base::gsub("(\\w+)", "'\\1'", .) %>% # enclose words with single quotation marks
        base::paste0("carmaker in (", ., ")") # create sql where statement
    })
  })
}
# Launch shiny app
shinyApp(ui, server)
The app takes the list of values supplied by the user as input, transforms it, and shows the transformed list as output.
Here is what I want to do:
I would like to store the transformed string_input in an extra local object for further use in the app, that is, I want to pass the string_input to the data query, similar to the non-Shiny example above.
I would like to copy the data query result to R's global environment so that I can use it even after closing the app.
Regarding my second point: I read that one can use <- and <<-, but I could not make it work in a reactive context.
I figured out the solution:
library(tibble)
library(dplyr)
library(dbplyr)
library(shiny)
library(RSQLite)
library(DBI)
library(stringr)
# Create string variables to experiment
mtcars1 <- tibble::rownames_to_column(mtcars, var = "model") %>% # Create column based on row names
  dplyr::mutate(carmaker = stringr::word(model, 1)) # Create column with first word of column with row names
# Establish example database
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, mtcars1, "mtcars1", temporary = FALSE)
# Shiny user interface
ui <- fluidPage(
  textInput(inputId = "string_input",
            label = "Input",
            value = "",
            placeholder = "Enter a list of car makers (e.g. Mazda, Merc)"),
  textOutput(outputId = "string_output"),
  actionButton(inputId = "go", label = "Go"),
  tableOutput(outputId = "data_output")
)
# Custom function to save reactive object to global environment
saveData <- function(x) {
  export <<- x
}
# Shiny server function
server <- function(input, output){
  list <- reactive({
    input$string_input %>%
      base::gsub("[,]+", " ", .) %>% # remove commas
      stringr::str_squish(.) %>% # remove multiple blanks
      base::gsub(" ", ",", .) %>% # substitute blanks for commas
      base::gsub("(\\w+)", "'\\1'", .) %>% # enclose words with single quotation marks
      base::paste0("carmaker in (", ., ")") # create sql where statement
  })
  output$string_output <- reactive({ list() })
  data <- eventReactive(input$go, {
    tbl(con, "mtcars1") %>%
      dplyr::filter(sql(!!list())) %>%
      dplyr::collect()
  })
  output$data_output <- renderTable(data())
  observeEvent(input$go, {
    saveData(data())
  })
}
# Launch shiny app
shinyApp(ui, server)
The trick was to define the function saveData, pass the reactive data object to it and assign it to export via <<-.
Honestly, I do not understand all of the fundamentals so any suggestions for improvement are welcome. However, it works.
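As a small variant (an addition, not from the original answer), the same effect can be achieved with assign(), which names the target environment explicitly instead of relying on <<- scoping:
saveData <- function(x) {
  # write x into the global environment under the name "export"
  assign("export", x, envir = globalenv())
}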
For your first question:
# Shiny server function
server <- function(input, output){
  string_output <- eventReactive(input$go, {
    input$string_input %>%
      base::gsub("[,]+", " ", .) %>% # remove commas
      stringr::str_squish(.) %>% # remove multiple blanks
      base::gsub(" ", ",", .) %>% # substitute blanks for commas
      base::gsub("(\\w+)", "'\\1'", .) %>% # enclose words with single quotation marks
      base::paste0("carmaker in (", ., ")") # create sql where statement
  })
  output$string_output <- renderText(string_output())
}
The string_output() reactive is now available for the output as well as for the data query.
Note that you could also use input$string_input instead of input$go as the trigger, to update the output while you type the criteria.
You can then use input$go to query the data:
data <- eventReactive(input$go, { dbGetQuery(yourConnection, YourQuery(string_output())) })
output$data <- renderTable(data())
Not sure you can directly write from Shiny to R's environment, but you can for sure save data() as a file on the server.
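For example (a sketch only, not part of the original answer), the query result could be persisted with saveRDS() in the same observer, so it is still available after the app closes:
observeEvent(input$go, {
  # write the queried data to disk on the server; read it back later with readRDS()
  saveRDS(data(), file = "query_result.rds")
})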

What is the preferred way to extract SQL from a tbl_dbi as a string?

I would like to extract the SQL from a tbl_dbi and save it as a character vector.
I have tried
library(dplyr)
my_tbl_dbi <- tbl(conn, "myTable")
my_query <- capture.output(show_query(my_tbl_dbi), type = "message") %>%
  paste0(collapse = " ")
This worked for me in the past but now is not working, perhaps because I was playing around with the sink function. My question is "Is there a more robust/standard way to pull out the SQL query from a tbl_dbi or is what I am doing the only way to do it?"
as.character(db_sql_render(my_tbl_dbi$src$con, my_tbl_dbi))
Note:
methods("show_query")
## [1] show_query.tbl_lazy* show_query.tbl_sql*
dbplyr:::show_query.tbl_sql
## function (x, ...)
## {
## message("<SQL>\n", db_sql_render(x$src$con, x))
## invisible(x)
## }
## <environment: namespace:dbplyr>
The source for db_sql_render.DBIConnection():
db_sql_render.DBIConnection <- function(con, sql, ...) {
  qry <- sql_build(sql, con = con, ...)
  sql_render(qry, con = con, ...)
}
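As an aside (an addition, assuming a reasonably recent dbplyr where db_sql_render() is no longer exported), remote_query() returns the same SQL and can be coerced to a character string:
# remote_query() returns an SQL object; as.character() gives a plain string
my_query <- as.character(dbplyr::remote_query(my_tbl_dbi))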

shinyApp with mysql database

The shinyApp code is attached. I'm using data from a MySQL database and want to display a data table based on the query, as well as the str and summary of the data. The code runs properly. Do I need to connect to the database again and again to display str, summary, and the table in my Shiny app?
library(shiny)
library(DBI)
library(RMySQL)
ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      selectInput("dataset", "Choose a dataset:",
                  choices = c("city", "country", "countrylanguage")),
      numericInput("obs", "Number of observations to view:", 10),
      submitButton("Update View")
    ),
    mainPanel(
      tabsetPanel(tabPanel("Table", tableOutput("view")),
                  tabPanel("Str", verbatimTextOutput("dump")),
                  tabPanel("Summary", verbatimTextOutput("stats"))
      )
    )
  )
)
# Define server logic required to summarize and view the
# selected dataset
server <- function(input, output) {
  output$view <- renderTable({
    conn <- dbConnect(drv = RMySQL::MySQL(), dbname = "world", host = "localhost", username = "root", password = "xxxx")
    on.exit(dbDisconnect(conn), add = TRUE)
    head(dbReadTable(conn = conn, name = input$dataset), n = input$obs)
  })
  output$dump <- renderPrint({
    conn <- dbConnect(drv = RMySQL::MySQL(), dbname = "world", host = "localhost", username = "root", password = "xxxx")
    on.exit(dbDisconnect(conn), add = TRUE)
    str(dbReadTable(conn = conn, name = input$dataset))
  })
  output$stats <- renderPrint({
    conn <- dbConnect(drv = RMySQL::MySQL(), dbname = "world", host = "localhost", username = "root", password = "xxxx")
    on.exit(dbDisconnect(conn), add = TRUE)
    summary(dbReadTable(conn = conn, name = input$dataset), n = input$obs)
  })
  # Return the requested dataset
  datasetInput <- reactive({
    switch(input$dataset)
  })
}
shinyApp(ui, server)
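One way to avoid reconnecting in every render function (a sketch that is not part of the original post, assuming the same database and credentials) is to open a single connection per session, read the selected table once in a reactive, and reuse it in all three outputs:
server <- function(input, output, session) {
  # one connection per session, closed when the session ends
  conn <- dbConnect(drv = RMySQL::MySQL(), dbname = "world", host = "localhost",
                    username = "root", password = "xxxx")
  session$onSessionEnded(function() dbDisconnect(conn))
  # read the selected table once; all outputs reuse this reactive
  datasetInput <- reactive({
    dbReadTable(conn = conn, name = input$dataset)
  })
  output$view <- renderTable(head(datasetInput(), n = input$obs))
  output$dump <- renderPrint(str(datasetInput()))
  output$stats <- renderPrint(summary(datasetInput()))
}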

dplyr distinct column values sql

I use dplyr in conjunction with a PostgreSQL database, which makes a reproducible example a bit hard for me. Anyway, I want to use the distinct function to filter out messy data, i.e. duplicate timestamps. So far I have:
db <- src_postgres(dbname = "a", host = "b", port = 1234,
                   user = "c")
measurements <- tbl(db, "measurement")
sites <- group_by(measurements, site)
sites_clean <- filter(sites, site < 38)
sites_clean <- distinct(sites_clean, timestamp)
P_stats <- summarise(
  sites_clean,
  count = n(),
  P = mean(p_sum)
)
collect(P_stats)
I get the error:
Error: Can't calculate distinct only on specified columns with SQL
Is there a workaround for this? Will dplyr support this in a future version?
Update
I followed the documentation and created a minimal working example using an SQLite database (also, thanks beginneR for the %>% reminder).
library(dplyr)
set.seed(1)
my_db <- src_sqlite("my_db.sqlite3", create = TRUE)
meas <- data.frame(id = 1:30,
                   timestamp = sample(Sys.time() + c(1, 2, 3) * 3600,
                                      size = 30, replace = TRUE),
                   site = sample(c(1, 2, 40), size = 30, replace = TRUE),
                   p_sum = rpois(30, 2))
meas_sqlite <- copy_to(my_db, meas, temporary = FALSE)
meas_tbl <- tbl(my_db, "meas")
P_stats <- group_by(meas_tbl, site, timestamp) %>%
  summarise(P = mean(p_sum)) %>%
  group_by(site) %>%
  filter(site < 38) %>%
  summarise(count = n(), P = mean(P))
collect(P_stats)
While this works, I feel it is not as clean as it could be. Also I need to try the suggestion with row_number(timestamp) == 1 on the PostgreSQL instance.
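For reference, the row_number() suggestion mentioned above would slot in roughly like this (a sketch only, untested; it needs a backend with window-function support, such as PostgreSQL):
P_stats <- meas_tbl %>%
  filter(site < 38) %>%
  group_by(site, timestamp) %>%
  filter(row_number(timestamp) == 1L) %>% # keep one row per timestamp within each site
  group_by(site) %>%
  summarise(count = n(), P = mean(p_sum))
collect(P_stats)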