knitr sql chunk not saving data into variable - sql

My RMarkdown notebook with a SQL chunk runs fine when I run all the chunks one by one interactively, but when I try to knit, the SQL chunk does not save the data into the specified variable. When the dataset that was supposed to be generated by the SQL chunk is referenced in later R chunks, the dataset variable is simply empty.
Here's an example
```{r setup, include=FALSE, warning=FALSE, message=FALSE}
# load necessary libraries
library(bigrquery)
library(knitr)
library(tidyverse)
db <- dbConnect(dbi_driver(), dataset = 'sandbox', project = 'project_id', use_legacy_sql = FALSE)
df <- NULL
```
```{sql, connection=db, output.var=df}
select * from example_dataset
limit 10
```
returns dataset
```{r}
head(df)
```
NULL
I've tried the solution here (R: Knitr gives error for SQL-chunk), but it didn't solve my problem.

Just ran into the same problem and it looks like you need to quote the variable you are assigning.
```{sql, connection=db, output.var="df"}
select * from example_dataset
limit 10
```
Source: http://rmarkdown.rstudio.com/authoring_knitr_engines.html#sql
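A rough sketch of why the quotes matter, assuming knitr evaluates chunk options as ordinary R expressions before the SQL engine sees them. The helper store_result, the environment env and the in-memory SQLite table are hypothetical stand-ins for illustration, not knitr's actual code.

```r
library(DBI)

# In-memory SQLite database standing in for the BigQuery connection (assumption).
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "example_dataset", data.frame(id = 1:3))

# Hypothetical helper mimicking what an SQL chunk does with output.var:
# run the query and, if a variable name was given, assign the result to it.
store_result <- function(varname, sql, envir) {
  data <- dbGetQuery(con, sql)
  if (is.character(varname)) assign(varname, data, envir = envir)
  invisible(data)
}

env <- new.env()
assign("df", NULL, envir = env)

# output.var=df: the option evaluates to the current value of df (NULL),
# so there is no name to assign to and df stays NULL.
store_result(get("df", envir = env), "select * from example_dataset limit 10", env)
unquoted_result <- get("df", envir = env)

# output.var="df": the option evaluates to the string "df",
# so the query result is stored under that name.
store_result("df", "select * from example_dataset limit 10", env)
quoted_result <- get("df", envir = env)

dbDisconnect(con)
```

This would also explain why initialising df <- NULL in the setup chunk hides the problem: the unquoted option is valid R, it just carries no variable name.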

Related

How to prevent errors in format (string-variable) when downloading data from Google Big Query to R?

I have some data (Tweets streamed from Twitter's REST API) stored in Google Big Query, which looks like this in the preview:
'I’m up by myself.'
However, when I download it into R, it looks like this;
'I’m up by myself.'
Is there any way to prevent it?
I am using this code to download the data in R:
library(bigrquery)
project_id <- "my_project"
sql_string <-
"SELECT
text,
FROM my_under_project.my_table
LIMIT 500
;"
test <- query_exec(sql_string, project = project_id, useLegacySql = FALSE, allowLargeResults=TRUE, max_pages = Inf)
str(test)
# 'data.frame': 500 obs. of 1 variable:
#$ text: chr "tweets" ...
The data from 'text' is stored as a string in Big Query.
Any help is appreciated! Thanks in advance!
I downloaded the data with 'bq_table_download' from the same package (instead of query_exec) and that solved the problem!
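A minimal sketch of that approach, assuming the query is first run with bq_project_query and the resulting table is then downloaded. The wrapper name download_query is hypothetical; project_id and sql_string are the names used in the question.

```r
# Hypothetical wrapper around the two bigrquery calls.
download_query <- function(project_id, sql_string) {
  # Run the query as a job; the return value refers to the result table.
  tb <- bigrquery::bq_project_query(project_id, sql_string)
  # Download the contents of that table into a data frame.
  bigrquery::bq_table_download(tb)
}
```

Calling download_query(project_id, sql_string) requires BigQuery authentication, so it is not executed here.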
Special characters when importing from BigQuery to R

Issue automating CSV import to an RSQLite DB

I'm trying to automate writing CSV files to an RSQLite DB.
I am doing so by indexing csvFiles, which is a list of data.frame variables stored in the environment.
I can't seem to figure out why my dbWriteTable() code works perfectly fine when I enter it manually but not when I try to index the name and value fields.
### CREATE DB ###
mydb <- dbConnect(RSQLite::SQLite(),"")
# FOR LOOP TO BATCH IMPORT DATA INTO DATABASE
for (i in 1:length(csvFiles)) {
dbWriteTable(mydb,name = csvFiles[i], value = csvFiles[i], overwrite=T)
i=i+1
}
# EXAMPLE CODE THAT SUCCESSFULLY MANUAL IMPORTS INTO mydb
dbWriteTable(mydb,"DEPARTMENT",DEPARTMENT)
When I run the for loop above, I'm given this error:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'DEPARTMENT': No such file or directory
# note that 'DEPARTMENT' is the value of csvFiles[1]
Here's the dput output of csvFiles:
c("DEPARTMENT", "EMPLOYEE_PHONE", "PRODUCT", "EMPLOYEE", "SALES_ORDER_LINE",
"SALES_ORDER", "CUSTOMER", "INVOICES", "STOCK_TOTAL")
I've researched this error and it seems to be related to my working directory; however, I don't really understand what to change, as I'm not even trying to manipulate files from my computer, simply data.frames already in my environment.
Please help!
Simply use get() for the value argument as you are passing a string value when a dataframe object is expected. Notice your manual version does not have DEPARTMENT quoted for value.
# FOR LOOP TO BATCH IMPORT DATA INTO DATABASE
for (i in seq_along(csvFiles)) {
dbWriteTable(mydb,name = csvFiles[i], value = get(csvFiles[i]), overwrite=T)
}
Alternatively, consider building a list of named dataframes with mget and loop element-wise between list's names and df elements with Map:
dfs <- mget(csvFiles)
output <- Map(function(n, d) dbWriteTable(mydb, name = n, value = d, overwrite=T), names(dfs), dfs)
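The mget approach can be tried end to end with a small self-contained sketch. The two data frames and their values are made up, and an in-memory SQLite database is assumed in place of the asker's mydb.

```r
library(DBI)

# Two small data frames standing in for the imported CSV data (made-up values).
DEPARTMENT <- data.frame(id = 1:2, name = c("Sales", "IT"))
PRODUCT <- data.frame(id = 1:3, price = c(9.5, 12, 3.25))
csvFiles <- c("DEPARTMENT", "PRODUCT")

mydb <- dbConnect(RSQLite::SQLite(), ":memory:")

# Look up all data frames by name, then write each one under its own name.
dfs <- mget(csvFiles)
for (n in names(dfs)) {
  dbWriteTable(mydb, name = n, value = dfs[[n]], overwrite = TRUE)
}

# Check what ended up in the database.
tables <- sort(dbListTables(mydb))
n_products <- dbGetQuery(mydb, "SELECT COUNT(*) AS n FROM PRODUCT")$n

dbDisconnect(mydb)
```

Passing the data frame itself (dfs[[n]]) rather than its name avoids dbWriteTable interpreting the string as a file path.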

Passing query as variable to Rmarkdown sql chunk

I'm trying to use the SQL chunk function available in the preview version of RStudio 1.0 to connect to a SQL Server (using the RSQLServer backend for DBI) and I'm having some difficulty passing variables.
If I connect to the server and then put the query in the chunk it works as expected
```{r, eval = F}
svr <- dbConnect(RSQLServer::SQLServer(), "Server_name", database = 'Database_name')
query <- 'SELECT TOP 10 * FROM Database_name.dbo.table_name'
```
```{sql, connection = svr, eval = F}
SELECT TOP 10 * FROM Database_name.dbo.table_name
```
But if I try to pass the query as a variable it throws an error
```{sql, connection = svr, eval = F}
?query
```
Error: Unable to retrieve JDBC result set for 'SELECT TOP 10 * FROM Database_name.dbo.table_name': Incorrect syntax near 'SELECT TOP 10 * FROM Database_name.dbo.table_name'.
Failed to execute SQL chunk
I think it's related to the way R wraps character vectors in quotes, because I get the same error if I run the following code.
```{sql, connection = svr, eval = F}
'SELECT TOP 10 * FROM Database_name.dbo.table_name'
```
Is there a way I can get around this error?
Currently I can achieve what I want by using inline expressions to print the query, using pygments for highlighting, and running the query in an R chunk with DBI commands, but using SQL code chunks would be a bit nicer.
It looks like the mechanism described under "Using R variables in queries" applies some kind of escaping and can therefore only be used in cases like the example from the documentation (SELECT * FROM trials WHERE subjects >= ?subjects), not to set up the whole query dynamically.
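The escaping can be illustrated with DBI::sqlInterpolate, on the assumption that the SQL engine handles ?variables in a similar way. The SQLite connection below is only a stand-in for the SQL Server connection from the question.

```r
library(DBI)

# Stand-in connection (assumption); only used for quoting rules, no query is run.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
query <- "SELECT TOP 10 * FROM Database_name.dbo.table_name"

# A character value is interpolated as a quoted SQL string literal,
# so the whole "query" becomes a string instead of a statement.
interpolated <- sqlInterpolate(con, "?query", query = query)
as_text <- as.character(interpolated)

# The intended use: a single value placed inside an otherwise fixed query.
filtered <- sqlInterpolate(
  con,
  "SELECT * FROM trials WHERE subjects >= ?subjects",
  subjects = 10L
)

dbDisconnect(con)
```

Printing as_text shows the query wrapped in single quotes, which matches the "Incorrect syntax near" error reported in the question.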
Instead, the chunk option code can be used to achieve the desired behavior:
The example uses the SQLite sample database from sqlitetutorial.net. Unzip it to your working directory before running the code.
```{r}
library(DBI)
db <- dbConnect(RSQLite::SQLite(), dbname = "chinook.db")
query <- "SELECT * FROM tracks"
```
```{sql, connection=db, code = query}
```
I haven't been able to find a way to print and execute the query in the same chunk. However, with a few extra lines of code it is possible to achieve my desired output.
Printing is solved by CL.'s answer and then I can use EXEC to run the code.
```{sql, code = query}
```
```{sql, connection = svr}
EXEC (?query)
```

SQL code in Rnw document with knitr

I used the following SQL code in an .Rmd document. However, I want to use the same SQL code in an .Rnw document.
```{r label = setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, max.print = NA)
```
```{r, echo=FALSE, results='hide'}
library(DBI)
db <- dbConnect(RSQLite::SQLite(), dbname = "survey.db")
dbListTables(db)
```
```{sql, label = Q1, connection=db, tab.cap = "Table Caption"}
SELECT *
FROM Person;
```
I would prefer to keep the code formatting and the printing of the query output.
Porting the RMarkdown to RNW requires some tweaking:
Of course, chunk delimiters need to be adjusted: The RNW equivalent of ```{r, echo=FALSE} is <<echo=FALSE>>= and RNW chunks end with @. (See the minimal RNW example.)
Importantly, while chunks in RMarkdown documents always specify an engine, the engine in RNW is implicitly R unless the option engine is set. So ```{r} becomes simply <<>>=, but the equivalent of ```{sql} is <<engine="sql">>=.
RMarkdown includes some very useful magic when embedding SQL chunks, see knitr Language Engines: SQL on rmarkdown.rstudio.com. By default, results are rendered as a nice table and only the first 10 results are printed. In RNW, we need to take care of this on our own.
For embedding SQL in RMarkdown, note that the SQL connection must be passed to the SQL chunk via the connection option. The option output.var can be used to specify the name of the object to which the result of the query will be assigned.
A simple solution (see previous revision) would just assign the SQL result to an object, say res, using output.var and add another R chunk that prints res nicely, e.g. using xtable. However, there is a more elegant approach using hooks:
The example uses the SQLite sample database from sqlitetutorial.net. Unzip it to your working directory before running the code.
\documentclass{article}
\begin{document}
\thispagestyle{empty}
<<include=FALSE>>=
library(knitr)
library(DBI)
knit_hooks$set(formatSQL = function(before, options, envir) {
if (!before && opts_current$get("engine") == "sql") {
sqlData <- get(x = opts_current$get("output.var"))
max.print <- min(nrow(sqlData), opts_current$get("max.print"))
myxtable <- do.call(xtable::xtable, c(list(x = sqlData[1:max.print, ]), opts_current$get("xtable.args")))
capture.output(myoutput <-do.call(xtable::print.xtable, c(list(x = myxtable, file = "test.txt"), opts_current$get("print.xtable.args"))))
return(asis_output(paste(
"\\end{kframe}",
myoutput,
"\\begin{kframe}")))
}
})
opts_chunk$set(formatSQL = TRUE)
opts_chunk$set(output.var = "formatSQL_result")
opts_chunk$set(max.print = getOption("max.print"))
@
<<echo=FALSE, results="hide">>=
db <- dbConnect(RSQLite::SQLite(), dbname = "chinook.db")
@
<<engine = "sql", connection=db, max.print = 8, xtable.args=list(caption = "My favorite artists?", label="tab:artist"), print.xtable.args=list(comment=FALSE, caption.placement="top")>>=
SELECT * FROM artists;
@
\end{document}
A new chunk hook formatSQL is added. (Chunk hooks run whenever the corresponding chunk option is not NULL.) After a chunk with engine="sql", it reads the SQL results into sqlData. Then, it uses xtable to print the first max.print rows of the result.
By default, the chunk hook formatSQL is activated (i.e. it is globally set to TRUE) and SQL results are stored in formatSQL_result. The chunk option max.print controls the number of rows to be printed (set it to Inf to print all rows, always).
The table produced by xtable is highly customizable. The chunk option xtable.args is passed to xtable and print.xtable.args is passed to print.xtable. In the example these options are used to set a caption, a label and to suppress xtable's default comment.
Note that syntax highlighting for non-R code in RNW requires installing highlight and, on Windows, adding its directory to the PATH.
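For comparison, the simpler alternative mentioned earlier (store the result via output.var, then print it with xtable in a separate R chunk) might look roughly like this. The in-memory artists table is made up, and res stands in for the object that output.var = "res" would create.

```r
library(DBI)

# Made-up table in an in-memory SQLite database (assumption).
db <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(db, "artists", data.frame(ArtistId = 1:3, Name = c("A", "B", "C")))

# Stand-in for what output.var = "res" would provide after the SQL chunk.
res <- dbGetQuery(db, "SELECT * FROM artists")
dbDisconnect(db)

# Build the LaTeX table; caption and label mirror the options used above.
tab <- xtable::xtable(res, caption = "My favorite artists?", label = "tab:artist")

# Capture the LaTeX code that print() would emit.
latex <- capture.output(
  print(tab, comment = FALSE, caption.placement = "top", include.rownames = FALSE)
)
```

In an RNW document the print() call would sit in its own chunk with results="asis" so that the LaTeX is passed through unchanged.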

knitr and SQL query

I am having difficulty querying a SQL database from a knitr chunk. I can establish a connection and the query works in an R session but hangs indefinitely when knitting from RStudio.
---
title: "Untitled"
author: "XXXXX XXXXXXXXX"
date: "Monday, April 27, 2015"
output: html_document
---
TEST TEST
```{r}
library(RJDBC)
jd<-JDBC(driverClass = "com.osisoft.jdbc.Driver",classPath = "C://Program Files (x86)//PIPC//JDBC//PIJDBCDriver.jar")
piDB<-dbConnect(drv = jd,"jdbc:pisql://XX.XXX.XX.XX/Data Source=XXX;Integrated Security=SSPI")
sql1<-"SELECT * FROM pipoints"
sql.dat <- dbGetQuery(piDB, sql1)
dbDisconnect(piDB)
print('Success')
```
If you can use a different connection driver, try ODBC.
RODBC works fine with knitr in RStudio:
```{r}
library(RODBC)
myconn = odbcConnect('myServer')
myquery = paste0("") #add some query
data = sqlQuery(myconn, myquery)
head(data)
```
With RStudio v1.0 you can now use sql chunks directly from your RMarkdown or RNotebook. I use the odbc package for this. I like this because it avoids hard-coding login details into your projects while still creating projects that run end-to-end without user input.
An RMarkdown example below:
```{r}
# Unfortunately, odbc is not on CRAN yet
# So we will need devtools
# install.packages("devtools")
library(devtools)
devtools::install_github("rstats-db/odbc")
# Get connection info from the Windows ODBC Data Source Administrator using the name you set manually.
# If you don't know what this is, just search in the windows start menu for "ODBC Data Source Administrator"
con <- dbConnect(odbc::odbc(), 'MyDataWarehouse')
```
```{sql, connection = con, output.var = "result"}
-- This is sql code, comments need to be marked accordingly
SELECT * FROM SOMETABLE LIMIT 200;
```
```{r}
# And the result is available in the next chunk!
result
```