I am having difficulty querying a SQL database from a knitr chunk. I can establish a connection and the query works in an R session but hangs indefinitely when knitting from RStudio.
---
title: "Untitled"
author: "XXXXX XXXXXXXXX"
date: "Monday, April 27, 2015"
output: html_document
---
TEST TEST
```{r}
library(RJDBC)
jd<-JDBC(driverClass = "com.osisoft.jdbc.Driver",classPath = "C://Program Files (x86)//PIPC//JDBC//PIJDBCDriver.jar")
piDB<-dbConnect(drv = jd,"jdbc:pisql://XX.XXX.XX.XX/Data Source=XXX;Integrated Security=SSPI")
sql1<-"SELECT * FROM pipoints"
sql.dat <- dbGetQuery(piDB, sql1)
dbDisconnect(piDB)
print('Success')
```
If you can use a different connection driver, try ODBC.
RODBC works fine with knitr in RStudio:
```{r}
library(RODBC)
myconn = odbcConnect('myServer')
myquery = paste0("") #add some query
data = sqlQuery(myconn, myquery)
head(data)
```
With RStudio v1.0 you can now use SQL chunks directly in your R Markdown documents and R Notebooks. I use the odbc package for this. I like this approach because it avoids hard-coding login details into your projects while still letting them run end-to-end without user input.
An RMarkdown example below:
```{r}
# Unfortunately, odbc is not on CRAN yet
# So we will need devtools
# install.packages("devtools")
library(devtools)
devtools::install_github("rstats-db/odbc")
library(DBI) # provides dbConnect()
# Get connection info from the Windows ODBC Data Source Administrator using the name you set manually.
# If you don't know what this is, just search in the Windows start menu for "ODBC Data Source Administrator"
con <- dbConnect(odbc::odbc(), 'MyDataWarehouse')
```
```{sql, connection = con, output.var = "result"}
-- This is sql code, comments need to be marked accordingly
SELECT * FROM SOMETABLE LIMIT 200;
```
```{R}
# And the result is available in the next chunk!
result
```
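For anyone curious what the sql chunk amounts to, the sketch below does roughly the same thing in plain R: run the query through DBI and assign the result to a variable. It assumes an in-memory SQLite database (via the RSQLite package) as a stand-in for 'MyDataWarehouse', and the table sometable with its columns is made up for illustration.

```r
library(DBI)

# Assumption: an in-memory SQLite database stands in for the ODBC data source
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table, created only so the query has something to return
dbWriteTable(con, "sometable", data.frame(id = 1:300, value = (1:300) * 2))

# Roughly what a sql chunk with output.var = "result" does
result <- dbGetQuery(con, "SELECT * FROM sometable LIMIT 200")

nrow(result)
head(result)

dbDisconnect(con)
```

With a real ODBC source, only the dbConnect() line should need to change.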
I have very limited experience with SQL or R, but I have received a very important dataset that was originally created in a SQL database. It was exported into three separate text files, one of which (3.92 GB) is too large to open in Notepad. I need to combine these three files in SQL as they originally were and export them into SPSS and R file formats. (I do have SQL Server Management Studio 17 on my computer, but my experience with it is very limited.) I have asked the providers for guidance, but they have not been forthcoming. Any help would be greatly appreciated. The three tables are tblSurveyValueLabel, tblSurveyVariableLabel, and tblSurveyAnswer.
Code so far:
library(dbplyr)
library(haven)
library(RMariaDB)
library(RPostgres)
library(RSQLite)
library(odbc)
library(bigrquery)
library(dplyr)
con <- dbConnect(odbc(),
  Driver = "SQL Server",
  Server = "servername",
  Database = "dbname",
  UID = "sa",
  PWD = rstudioapi::askForPassword("Database password"),
  Port = 1433
)
sa_db <- tbl(con, "tblSurveyAnswer")
sa_db
val_db <- tbl(con, "tblSurveyValueLabel")
# val_db
var_db <- tbl(con, "tblSurveyVariableLabel")
# var_db
saval_db <- merge(sa_db, val_db, by = c("Survey","Question"), all = TRUE)
saval_db_order <- saval_db[order(saval_db$Survey, saval_db$Question), ] # Sorting data by Survey/Question
str(saval_db)
savalvar_db <- merge(saval_db, var_db, by = c("Survey", "Question"), all = TRUE)
str(savalvar_db)
write_sav(savalvar_db, "savalvar_db.sav")
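One thing to watch in the code above: tbl() returns a lazy reference to a database table, whereas merge() and order() expect in-memory data frames. A possible way around this, sketched below, is to use dplyr joins, which dbplyr translates to SQL, and to call collect() before writing the file. The sketch assumes an in-memory SQLite database with tiny made-up versions of the three tables, and it uses left_join() for portability. The closest dplyr analogue of merge(..., all = TRUE) is full_join(), whose availability depends on the backend.

```r
library(DBI)
library(dplyr)
library(dbplyr)

# Assumption: in-memory SQLite stands in for the SQL Server connection
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical miniature tables with the join keys used in the question
dbWriteTable(con, "tblSurveyAnswer",
             data.frame(Survey = c(2, 1, 1), Question = c(1, 2, 1), Answer = c(5, 3, 4)))
dbWriteTable(con, "tblSurveyValueLabel",
             data.frame(Survey = c(1, 1, 2), Question = c(1, 2, 1), ValueLabel = c("a", "b", "c")))
dbWriteTable(con, "tblSurveyVariableLabel",
             data.frame(Survey = c(1, 1, 2), Question = c(1, 2, 1), VariableLabel = c("x", "y", "z")))

sa_db  <- tbl(con, "tblSurveyAnswer")
val_db <- tbl(con, "tblSurveyValueLabel")
var_db <- tbl(con, "tblSurveyVariableLabel")

# The joins and the sort run inside the database; collect() pulls the result into R
savalvar <- sa_db %>%
  left_join(val_db, by = c("Survey", "Question")) %>%
  left_join(var_db, by = c("Survey", "Question")) %>%
  arrange(Survey, Question) %>%
  collect()

str(savalvar)

# The collected data frame can then be written out, e.g.:
# haven::write_sav(savalvar, "savalvar_db.sav")

dbDisconnect(con)
```

Doing the joins in the database also avoids loading the 3.92 GB table into R before it has been combined.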
I use the set.seed() function in RStudio to reproduce the same random output, specifically for k-means clustering. It works in RStudio and in the R console.
However, when I run the same command in an external script in SSMS, I get different numbers every time (the code runs successfully, but each run ends with a different result, which I don't want).
T-SQL external script is:
EXEC sp_execute_external_script @language = N'R', @script = N'
library("plyr")
library("FactoMineR")
library("factoextra")
library("corrplot")
library("dplyr")
library("cluster")
library("ggplot2")
library("magrittr")
str(InputDataSet)
costs <- as.data.frame(InputDataSet)
costs.active <- costs[,c(3, 5, 6, 7)]
set.seed(123)
km.res <- kmeans(costs.active, 10, nstart = 25)
dd <- cbind(costs, cluster = km.res$cluster)
d <- ddply(dd, .(cluster), nrow)
OutputDataSet <- as.data.frame(d);', @input_data_1 = '...'
What follows after that is just classic SQL source code.
EDIT
I have now found that, for example, the rnorm() function works fine. Could the problem be related to the kmeans function?
Can anyone help me, please?
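One possible explanation, which nothing in the post confirms, is that the rows of InputDataSet arrive in a different order on each run (a SQL query without ORDER BY does not guarantee row order). kmeans picks its starting centers by row index, so the same seed on reordered data can give different clusters or cluster numbers, while rnorm() does not depend on the data at all. The sketch below uses made-up data in place of costs.active and a hypothetical helper run_kmeans() that sorts the rows before clustering, so a shuffled copy gives the same assignment.

```r
set.seed(42)
# Hypothetical stand-in for costs.active: 200 rows, 4 numeric columns
costs.active <- as.data.frame(matrix(rnorm(800), ncol = 4))

# Hypothetical helper: fix the row order, then seed and cluster
run_kmeans <- function(x) {
  x <- x[do.call(order, x), , drop = FALSE]  # deterministic row order
  set.seed(123)
  kmeans(x, 10, nstart = 25)
}

# Same data, rows in a random order (as an unordered query might return them)
shuffled <- costs.active[sample(nrow(costs.active)), ]

a <- run_kmeans(costs.active)
b <- run_kmeans(shuffled)

# Compare the cluster assignments of the two runs
identical(unname(a$cluster), unname(b$cluster))
```

If row order does turn out to be the cause, adding an ORDER BY to the input query would be the equivalent fix on the SQL side.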
My RMarkdown notebook with a SQL chunk runs fine when I run all the chunks one by one interactively, but when I try to knit, the SQL chunk does not save the data into the specified variable. When the dataset that was supposed to be generated by the SQL chunk is referenced in later R chunks, the dataset variable is simply empty.
Here's an example
```{r setup, include=FALSE, warning=FALSE, message=FALSE}
# load necessary libraries
library(bigrquery)
library(knitr)
library(tidyverse)
db <- dbConnect(dbi_driver(), dataset = 'sandbox', project = 'project_id', use_legacy_sql = FALSE)
df <- NULL
```
```{sql, connection=db, output.var=df}
select * from example_dataset
limit 10
```
returns dataset
```{r}
head(df)
```
NULL
I've tried the solution here (R: Knitr gives error for SQL-chunk), but it didn't solve my problem.
Just ran into the same problem and it looks like you need to quote the variable you are assigning.
```{sql, connection=db, output.var="df"}
select * from example_dataset
limit 10
```
Source: http://rmarkdown.rstudio.com/authoring_knitr_engines.html#sql
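To see why the quotes matter: chunk options are evaluated as R code, so an unquoted df hands over the current value of df (NULL in the question) instead of its name. The snippet below is a simplified illustration of that difference, not knitr's actual code. The names unquoted_option, quoted_option and query_result are made up.

```r
df <- NULL

# Unquoted: the option receives the *value* of df, which is NULL here
unquoted_option <- df
# Quoted: the option receives the *name* of the variable to create
quoted_option <- "df"

is.null(unquoted_option)
is.character(quoted_option)

# Hypothetical stand-in for the query result
query_result <- data.frame(x = 1:10)

# With a name in hand, the result can be stored under that name
assign(quoted_option, query_result)
head(df)
```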
I'm trying to use the SQL chunk function available in the preview version of RStudio 1.0 to connect to a SQL Server (using the RSQLServer backend for DBI) and I'm having some difficulty passing variables.
If I connect to the server and then put the query in the chunk it works as expected
```{r, eval = F}
svr <- dbConnect(RSQLServer::SQLServer(), "Server_name", database = 'Database_name')
query <- 'SELECT TOP 10 * FROM Database_name.dbo.table_name'
```
```{sql, connection = svr, eval = F}
SELECT TOP 10 * FROM Database_name.dbo.table_name
```
But if I try to pass the query as a variable it throws an error
```{sql, connection = svr, eval = F}
?query
```
Error: Unable to retrieve JDBC result set for 'SELECT TOP 10 * FROM Database_name.dbo.table_name': Incorrect syntax near 'SELECT TOP 10 * FROM Database_name.dbo.table_name'.
Failed to execute SQL chunk
I think it's related to the way R wraps character vectors in quotes, because I get the same error if I run the following code.
```{sql, connection = svr, eval = F}
'SELECT TOP 10 * FROM Database_name.dbo.table_name'
```
Is there a way I can get around this error?
Currently I can achieve what I want by using inline expressions to print the query, using pygments for highlighting, and running the query in an R chunk with DBI commands, but using code chunks would be a bit nicer.
It looks like the feature described under "Using R variables in queries" applies some kind of escaping and can therefore only be used in cases like the example from the documentation (SELECT * FROM trials WHERE subjects >= ?subjects), but not to set up the whole query dynamically.
Instead, the chunk option code can be used to achieve the desired behavior:
The example uses the SQLite sample database from sqlitetutorial.net. Unzip it to your working directory before running the code.
```{r}
library(DBI)
db <- dbConnect(RSQLite::SQLite(), dbname = "chinook.db")
query <- "SELECT * FROM tracks"
```
```{sql, connection=db, code = query}
```
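The escaping mentioned above can be reproduced outside knitr. The sketch below assumes that knitr fills in ?name placeholders with DBI::sqlInterpolate() (I have not checked every knitr version). It uses DBI's dummy ANSI() connection, so no database is needed.

```r
library(DBI)

# ANSI() is a dummy connection shipped with DBI
con <- ANSI()

# A value substituted into a query, as in the documentation example
sqlInterpolate(con,
               "SELECT * FROM trials WHERE subjects >= ?subjects",
               subjects = 10)

# A whole query passed as a value comes back as a quoted string literal,
# which the server then rejects as invalid SQL
query <- "SELECT * FROM tracks"
whole <- sqlInterpolate(con, "?query", query = query)
whole
```

This matches the error in the question: the server receives the query wrapped in single quotes rather than the query itself.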
I haven't been able to find a way to print and execute in the same chunk; however, with a few extra lines of code it is possible to achieve my desired output.
Printing is solved by CL.'s answer, and then I can use EXEC to run the code.
```{sql, code = query, eval = FALSE}
```
```{sql, connection = svr}
EXEC (?query)
```
I used the following sql code in .Rmd document. However, I want to use the same SQL code in .Rnw document.
```{r label = setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, max.print = NA)
```
```{r, echo=FALSE, results='hide'}
library(DBI)
db <- dbConnect(RSQLite::SQLite(), dbname = "survey.db")
dbListTables(db)
```
```{sql, label = Q1, connection=db, tab.cap = "Table Caption"}
SELECT *
FROM Person;
```
I would like to keep the code formatting and the printed output.
Porting the RMarkdown to RNW requires some tweaking:
Of course, chunk delimiters need to be adjusted: The RNW equivalent of ```{r, echo=FALSE} is <<echo=FALSE>>= and RNW chunks end with @. (See the minimal RNW example.)
Importantly, while chunks in RMarkdown documents always specify an engine, the engine in RNW is implicitly R unless the option engine is set. So ```{r} becomes simply <<>>=, but the equivalent of ```{sql} is <<engine="sql">>=.
RMarkdown includes some very useful magic when embedding SQL chunks, see knitr Language Engines: SQL on rmarkdown.rstudio.com. By default, results are rendered as a nice table and only the first 10 results are printed. In RNW, we need to take care of this on our own.
For embedding SQL in RMarkdown, note that the SQL connection must be passed to the SQL chunk via the connection option. The option output.var can be used to specify the name of the object to which the result of the query will be assigned.
A simple solution (see previous revision) would just assign the SQL result to an object, say res, using output.var and add another R chunk that prints res nicely, e.g. using xtable. However, there is a more elegant approach using hooks:
The example uses the SQLite sample database from sqlitetutorial.net. Unzip it to your working directory before running the code.
\documentclass{article}
\begin{document}
\thispagestyle{empty}
<<include=FALSE>>=
library(knitr)
library(DBI)
knit_hooks$set(formatSQL = function(before, options, envir) {
  if (!before && opts_current$get("engine") == "sql") {
    sqlData <- get(x = opts_current$get("output.var"))
    max.print <- min(nrow(sqlData), opts_current$get("max.print"))
    myxtable <- do.call(xtable::xtable, c(list(x = sqlData[1:max.print, ]), opts_current$get("xtable.args")))
    capture.output(myoutput <- do.call(xtable::print.xtable, c(list(x = myxtable, file = "test.txt"), opts_current$get("print.xtable.args"))))
    return(asis_output(paste(
      "\\end{kframe}",
      myoutput,
      "\\begin{kframe}")))
  }
})
opts_chunk$set(formatSQL = TRUE)
opts_chunk$set(output.var = "formatSQL_result")
opts_chunk$set(max.print = getOption("max.print"))
@
<<echo=FALSE, results="hide">>=
db <- dbConnect(RSQLite::SQLite(), dbname = "chinook.db")
@
<<engine = "sql", connection=db, max.print = 8, xtable.args=list(caption = "My favorite artists?", label="tab:artist"), print.xtable.args=list(comment=FALSE, caption.placement="top")>>=
SELECT * FROM artists;
@
\end{document}
A new chunk hook formatSQL is added. (Chunk hooks run whenever the corresponding chunk option is not NULL.) After a chunk with engine="sql", it reads the SQL results into sqlData. Then, it uses xtable to print the first max.print rows of the result.
By default, the chunk hook formatSQL is activated (i.e. it is globally set to TRUE) and SQL results are stored in formatSQL_result. The chunk option max.print controls the number of rows to be printed (set it to Inf to print all rows, always).
The table produced by xtable is highly customizable. The chunk option xtable.args is passed to xtable and print.xtable.args is passed to print.xtable. In the example these options are used to set a caption, a label and to suppress xtable's default comment.
Note that syntax highlighting for non-R code in RNW requires installing highlight and, on Windows, adding its directory to the PATH.
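For comparison, the simpler approach mentioned earlier (assign the SQL result to res via output.var and print it from a separate R chunk) could look roughly like the sketch below. The data frame is a made-up stand-in for the query result, and max_rows is a hypothetical name mirroring max.print = 8.

```r
library(xtable)

# Hypothetical stand-in for the object filled via output.var = "res"
res <- data.frame(ArtistId = 1:12, Name = paste("Artist", 1:12))

max_rows <- 8  # mirrors max.print = 8 in the example above

tab <- xtable(head(res, max_rows),
              caption = "My favorite artists?",
              label = "tab:artist")

# In an RNW document this would sit in a chunk with results = "asis"
print(tab, comment = FALSE, caption.placement = "top")
```

The hook version saves repeating this printing chunk after every SQL chunk.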