Run SQL script from R with variables defined in R - sql

I have an SQL script which I need to run using RStudio. However, my SQL script has one variable that is defined in my R environment. I am using dbGetQuery; however, I do not know (and I didn't find a solution) how to pass this variable.
library(readr)
library(DBI)
library(odbc)
library(RODBC)
# create connection (fake one here)
con <- odbcConnect(...)
dt = Sys.Date()
df = dbGetQuery(con, statement = read_file('Query.sql'))
The file 'Query.sql' makes reference to dt. How do I make the file recognize my variable dt?

There are several options, but my preferred one is "bound parameters".
If, for instance, your 'Query.sql' looks something like
select ...
from MyTable
where CreatedDateTime > ?
The ? is a place-holder for a binding.
Then you can do
con <- dbConnect(...) # from DBI
df = dbGetQuery(con, statement = read_file('Query.sql'), params = list(dt))
With more parameters, add more ?s and more objects to the list, as in
qry <- "select ... where a > ? and b < ?"
newdat <- dbGetQuery(con, qry, params = list(var1, var2))
If you need a SQL IN clause, it gets a little dicey: a single ? binds one value rather than a whole vector, so you need one ? per candidate value.
candidate_values <- c(2020, 1997, 1996, 1901)
qry <- paste("select ... where a > ? and b in (", paste(rep("?", length(candidate_values)), collapse=","), ")")
qry
# [1] "select ... where a > ? and b in ( ?,?,?,? )"
df <- dbGetQuery(con, qry, params = c(list(avar), as.list(candidate_values)))
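For anyone who wants to try the IN-clause binding end to end, here is a minimal sketch against an in-memory SQLite database. The table contents and the value of avar are made up for illustration, and RSQLite is only a stand-in for whatever driver you actually use.

```r
library(DBI)

# Hypothetical in-memory table standing in for the real MyTable.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "MyTable",
             data.frame(a = 1:5, b = c(2020, 1997, 1996, 1901, 2010)))

avar <- 1                                  # illustrative scalar parameter
candidate_values <- c(2020, 1997, 1996)    # illustrative IN-list values

# Build one "?" per candidate value, then bind everything as a flat list.
placeholders <- paste(rep("?", length(candidate_values)), collapse = ", ")
qry <- paste0("select a, b from MyTable where a > ? and b in (",
              placeholders, ") order by a")

res <- dbGetQuery(con, qry,
                  params = c(list(avar), as.list(candidate_values)))
dbDisconnect(con)
```

The order of the list passed to params has to match the order of the ?s in the query text.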

Related

Get the results sets of a parametrized query `rbind`ed *and* directly in the database using R's `DBI`

Using R's DBI, I need to:
1) run a parametrized query with different parameters (i.e. a vector of parameters);
2) get the results sets concatenated (i.e. rbinded as per R terminology or unioned as per SQL terminology);
3) and get the resulting table in the database for further manipulation.
dbBind/dbGetQuery fulfils requirements 1 and 2, but I then need to write the resulting data frame to the database using dbWriteTable, which is inefficient:
library(DBI)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "iris", iris)
res <- dbGetQuery(con,
"select * from iris where Species = ?",
params = list(c("setosa", "versicolor")))
dbWriteTable(con, "mytable", res)
Conversely, dbExecute fulfils requirement 3, but I don't think it has the "rbind feature". Of course, this throws an error because the table would get overwritten:
dbExecute(con,
"create table mytable as select * from iris where Species = ?",
params = list(c("setosa", "versicolor")))
What is the most efficient/recommended way of doing so?
Notes:
I am not the DBA and can only access the database through R.
My example is too trivial and could be achieved in a single query. My use case really requires a parametrized query to be run multiple times with different parameters.
I have to use Oracle, but I am interested in a solution even if it doesn't work with Oracle.
1) Create the table with the first parameter and then insert each of the others into it.
library(RSQLite)
con <- dbConnect(SQLite())
dbWriteTable(con, "iris", iris)
parms <- c("setosa", "versicolor")
dbExecute(con, "create table mytable as
select * from iris where Species = ?",
params = list(parms[1]))
for (p in parms[-1]) {
dbExecute(con, "insert into mytable
select * from iris where Species = ?",
params = list(p))
}
# check
res <- dbGetQuery(con, "select * from mytable")
str(res)
2) Alternatively, generate the text of an SQL statement that does it all. sqldf pulls in RSQLite and gsubfn, which supplies the fn$ prefix that enables the text substitution.
library(sqldf)
con <- dbConnect(SQLite())
dbWriteTable(con, "iris", iris)
parms <- c("setosa", "versicolor")
parmString <- toString(sprintf("'%s'", parms))
fn$dbExecute(con, "create table mytable as
select * from iris where Species in ($parmString)")
# check
res <- dbGetQuery(con, "select * from mytable")
str(res)
3) A variation of (2) is to insert the appropriate number of question marks.
library(sqldf)
con <- dbConnect(SQLite())
dbWriteTable(con, "iris", iris)
params <- list("setosa", "versicolor")
quesString <- toString(rep("?", length(params)))
fn$dbExecute(con, "create table mytable as
select * from iris where Species in ($quesString)", params = params)
# check
res <- dbGetQuery(con, "select * from mytable")
str(res)
Based on @r2evans's comment and @G.Grothendieck's answer, instead of query/download/combine/upload, I used a parameterized query that inserts directly into a table.
First, I created the table with the appropriate columns to collect the results:
library(DBI)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
create_table <-
"CREATE TABLE warpbreaks2 (
breaks real,
wool text,
tension text
)"
dbExecute(con, create_table)
Then I executed an INSERT INTO step:
dbWriteTable(con, "warpbreaks", warpbreaks)
insert_into <-
"INSERT INTO warpbreaks2
SELECT warpbreaks.breaks,
warpbreaks.wool,
warpbreaks.tension
FROM warpbreaks
WHERE tension = ?"
dbExecute(con, insert_into, params = list(c("L", "M")))
This is a dummy example for illustration purposes. It could be achieved more directly with e.g.:
direct_query <-
"CREATE TABLE warpbreaks3 AS
SELECT *
FROM warpbreaks
WHERE tension IN ('L', 'M')"
dbExecute(con, direct_query )
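If you would rather spell out the loop over parameters, a possible variant is to run one INSERT per value inside a single transaction, so that either all inserts land or none do. This is only a sketch under the same SQLite assumptions as above; the vector name tensions is made up, and I have not tried it against Oracle.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "warpbreaks", warpbreaks)
dbExecute(con, "CREATE TABLE warpbreaks2 (breaks real, wool text, tension text)")

tensions <- c("L", "M")   # illustrative parameter values

# One INSERT per parameter value, all inside one transaction.
dbWithTransaction(con, {
  for (tn in tensions) {
    dbExecute(con,
              "INSERT INTO warpbreaks2
               SELECT breaks, wool, tension FROM warpbreaks WHERE tension = ?",
              params = list(tn))
  }
})

# Check how many rows were collected in the target table.
n <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM warpbreaks2")$n
dbDisconnect(con)
```

The data never leaves the database here; only the final row count is pulled back into R as a check.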

List of objects in SQL-Server using R [duplicate]

I'm trying to catalog the structure of a MSSQL 2008 R2 database using R/RODBC. I have set up a DSN, connected via R and used the sqlTables() command but this is only getting the 'system databases' info.
library(RODBC)
conn1 <- odbcConnect('my_dsn')
sqlTables(conn1)
However if I do this:
library(RODBC)
conn1 <- odbcConnect('my_dsn')
sqlQuery(conn1, 'USE my_db_1')
sqlTables(conn1)
I get the tables associated with the my_db_1 database. Is there a way to see all of the databases and tables without manually typing in a separate USE statement for each?
There may or may not be a more idiomatic way to do this directly in SQL, but we can piece together a data set of all tables from all databases (a bit more programmatically than repeated USE xyz; statements) by getting a list of databases from master..sysdatabases and passing these as the catalog argument to sqlTables - e.g.
library(RODBC)
library(DBI)
##
tcon <- RODBC::odbcConnect(
dsn = "my_dsn",
uid = "my_uid",
pwd = "my_pwd"
)
##
db_list <- RODBC::sqlQuery(
channel = tcon,
query = "SELECT name FROM master..sysdatabases")
##
RODBC::sqlTables(
channel = tcon,
catalog = db_list[14, 1]
)
(I can't show any of the output for confidentiality reasons, but it produces the correct results.) Of course, in your case you probably want to do something like
all_metadata <- lapply(db_list$name, function(DB) {
RODBC::sqlTables(
channel = tcon,
catalog = DB
)
})
# or some more efficient variant of data.table::rbindlist...
meta_df <- do.call("rbind", all_metadata)

How do I substitute values from the R session into SQL bind variable placeholders?

I want to re-use raw SQL within an R script. However, SQL has variable binding that lets us parameterize the query.
Is there a quick way to directly substitute values from the R session into bind variable placeholders when using SQL within dbplyr?
I guess it doesn't have to be dbplyr, but that's what I was using.
I recall that RMarkdown supports an SQL engine that lets a chunk bind SQL variables to values in the (global?) environment. (Search for the text "If you need to bind the values of R variables into SQL queries" in that page.) Based on this, it seems that someone has already set up a way to do easy variable binding.
For example, the code below makes the program "Oracle SQL Developer" prompt me to enter a value for :param1 when I run it.
select
*
from
( select 'test' as x, 'another' as y from dual )
where x = :param1 and y = :param2
I would like to take that same code in R and run it with some parameters. This does not work, but it's kind of what I imagine might work if there was a function to do it:
# Assume "con" is a DB connection already established to an Oracle db.
tbl( con,
args_for_oracle_sql(
"select
*
from
( select 'test' as x, 'another' as y from dual )
where x = :param1 and y = :param2 ",
# Passing the named parameters
param1 = "test",
param2 = "another"
)
)
# Here's another interface idea that is perhaps similar to
# what is shown here for SQL: https://bookdown.org/yihui/rmarkdown/language-engines.html#sql
raw_sql <- "
select
*
from
( select 'test' as x, 'another' as y from dual )
where x = :param1 and y = :param2 "
# Set variables that match parameter names in the current environment.
param1 <- "test"
param2 <- "another"
tbl( con,
exc_with_args_for_oracle_sql(
# Pass raw SQL with the ":param1" markers
sql = raw_sql,
# Pass the function an environment that contains the values
# of the named parameters in the SQL. In this case, the
# current environment where I've set these values above.
env_with_args = environment()
)
)
By the way, I'm not sure which of the following libraries are needed, but this is what I load:
library(RODBC)
library(RODBCext)
library(RODBCDBI)
library(DBI)
library(dplyr)
library(dbplyr)
library(odbc)
Use the build_sql() function from dbplyr (for strings)
library(DBI)
library(dbplyr)
library(odbc)
param1 = readline("What is the value of param1 ?") # base R
# or, in RStudio: param1 = rstudioapi::askForPassword("What is the value of param1 ?")
param2 = readline("What is the value of param2 ?") #
con = dbConnect('XXX') # Connection
# write your query (dbplyr)
sql_query = build_sql("select
*
from
( select 'test' as x, 'another' as y from dual )
where x = ",param1," and y = ", param2, con = con)
df = dbGetQuery(con,sql_query)
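Another option, if your driver supports named placeholders, is to let DBI bind the :param1 / :param2 markers directly via params. The sketch below uses an in-memory RSQLite connection as a stand-in, and drops "from dual" because SQLite does not need it. Whether the same binding works against Oracle depends on the driver you connect with, so treat that part as an assumption to verify.

```r
library(DBI)

# Stand-in connection; the real one would point at the Oracle database.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

raw_sql <- "
  select *
  from ( select 'test' as x, 'another' as y )
  where x = :param1 and y = :param2"

# Names in the list are matched to the :name markers in the SQL text.
res <- dbGetQuery(con, raw_sql,
                  params = list(param1 = "test", param2 = "another"))
dbDisconnect(con)
```

Unlike pasting values into the string, the raw SQL stays untouched and the values travel separately.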

Dynamic SQL Query in R (WHERE)

I am trying out some dynamic SQL queries using R and the postgres package to connect to my DB.
Unfortunately I get an empty data frame if I execute the following statement:
x <- "Mean"
query1 <- dbGetQuery(con, statement = paste(
"SELECT *",
"FROM name",
"WHERE statistic = '",x,"'"))
I believe that there is a syntax error somewhere in the last line. I already changed the commas and quotation marks in every possible way, but nothing seems to work.
Does anyone have an idea how I can construct this SQL query with a dynamic WHERE statement using an R variable?
You should use paste0 instead of paste: paste inserts a space between its arguments by default, which is what produces the wrong result here. paste(..., sep='') also works but is slightly less efficient (see ?paste0).
Also, you should consider preparing your SQL statement in a separate variable. That way you can always easily check what SQL is being produced.
I would use this (and I am using this all the time):
x <- "Mean"
sql <- paste0("select * from name where statistic='", x, "'")
# print(sql)
query1 <- dbGetQuery(con, sql)
In case I have SQL inside a function, I always add a debug parameter so I can see what SQL is used:
get_statistic <- function(x = NA, debug = FALSE) {
sql <- paste0("select * from name where statistic='", x, "'")
if(debug) print(sql)
query1 <- dbGetQuery(con, sql)
query1
}
Then I can simply use get_statistic('Mean', debug=TRUE) and I will see immediately if generated SQL is really what I expected.
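One caveat with any paste0-built SQL: a value that itself contains a single quote breaks the statement (and opens the door to SQL injection). A minimal sketch of quoting the value with DBI's dbQuoteString() follows. ANSI() is a dummy connection used only for illustration, and the value O'Brien is made up; with a real connection you would pass con instead.

```r
library(DBI)

x <- "O'Brien"   # illustrative value containing a quote

# Naive version: the embedded quote ends the SQL string literal too early.
naive <- paste0("select * from name where statistic = '", x, "'")

# Quoted version: dbQuoteString() adds the outer quotes and escapes inner ones.
safe <- paste0("select * from name where statistic = ",
               as.character(dbQuoteString(ANSI(), x)))

print(naive)
print(safe)
```

Printing both strings, as with the debug flag above, makes the difference easy to see.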
The Problem
The problem may be that you have spaces around Mean:
x <- "Mean"
s <- paste(
"SELECT *",
"FROM name",
"WHERE statistic = '",x,"'")
giving:
> s
[1] "SELECT * FROM name WHERE statistic = ' Mean '"
Corrected Version
Instead try:
s <- sprintf("select * from name where statistic = '%s'", x)
giving:
> s
[1] "select * from name where statistic = 'Mean'"
gsubfn
You could also try this:
library(gsubfn)
fn$dbGetQuery(con, "SELECT *
FROM name
WHERE statistic = '$x'")
Try this:
require(stringi)
stri_paste("SELECT * ",
"FROM name ",
"WHERE statistic = '",x,"'",collapse="")
## [1] "SELECT * FROM name WHERE statistic = 'Mean'"
or use concatenate operator %+%
"SELECT * FROM name WHERE statistic ='" %+% x %+% "'"
## [1] "SELECT * FROM name WHERE statistic ='Mean'"
A newer way to do this is with the glue package, part of the tidyverse. It is described as "An implementation of interpreted string literals, inspired by Python's Literal String Interpolation."
Using glue, you would do:
library(glue)
library(DBI)
x <- "Mean"
query1 <- glue_sql("
SELECT *
FROM name
WHERE statistic = ({x})
", .con = con)
dbGetQuery(con, query1)
It's a great package due to its flexibility. For example, let's say you wanted to import mean, median and mode statistics. Then you would switch the comparison to IN and add an asterisk inside the braces, so that the vector is expanded into a comma-separated list of quoted values:
x <- c("Mean", "Median", "Mode")
query2 <- glue_sql("
SELECT *
FROM name
WHERE statistic IN ({x*})
", .con = con)
dbGetQuery(con, query2)

Specifying an SQL where statement based on a string in R (using rodbc)

This is my first attempt at using R to access data from within MS Access using ODBC.
The following query works:
id <- levels(assetid)[assetid[,1]][12]
qry <- "SELECT DriverName FROM Data WHERE ID = 'idofinterest'"
sqlQuery(con, qry)
However, I would like to know if there is a way to use the variable "id" in the "qry" statement (without using paste)? I have seen some statements on the web with $ and % signs - however I haven't had any success in using them.
Thanks.
Why don't you want to use paste? Anyway, sprintf is an alternative means of string munging.
qry <- sprintf("SELECT DriverName FROM Data WHERE ID = '%s'", id)
sqlQuery(con, qry)
Try fn$ from the gsubfn package:
> library(gsubfn)
> id <- "abc"
> fn$identity("SELECT DriverName FROM Data WHERE ID = '$id'")
[1] "SELECT DriverName FROM Data WHERE ID = 'abc'"
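A further paste-free possibility is DBI's sqlInterpolate(), which fills ?name markers and quotes the values for you. This is just a sketch: ANSI() is a dummy connection used to generate the string, and the id value "abc" is illustrative. The resulting text can then be handed to sqlQuery() as an ordinary character string.

```r
library(DBI)

id <- "abc"   # illustrative value

# ?id is the marker; the value is supplied as a named argument.
qry <- sqlInterpolate(ANSI(),
                      "SELECT DriverName FROM Data WHERE ID = ?id",
                      id = id)
qry_text <- as.character(qry)
print(qry_text)

# With an open RODBC channel `con`, this would then be run as:
# sqlQuery(con, qry_text)
```

Note that the quotes around the value are added by sqlInterpolate(), so the template itself contains none.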