SQL "where" clause failing with R JDBC HANA connection - sql

I've had nothing but trouble connecting to my company's HANA db through R but finally had a breakthrough, however now my sql statement is failing in subsetting data using a "where" statement.
The following returns a data frame of 10 observations across 9 variables
# Fetch all results
rs <- dbSendQuery(jdbcConnection, 'SELECT TOP 10
VISITTYPE,
ACCOUNT,
PLANNEDSTART,
PLANNEDEND,
EXECUTIONSTART,
EXECUTIONEND,
STATUS,
SOURCE,
ACCOUNT_NAME
FROM "_SYS_BIC"."cona-reporting.field-sales/Q_CA_R_SpringVisit"')
a <- dbFetch(rs)
However when I throw a where into it, I receive an error.
rs <- dbSendQuery(jdbcConnection, 'SELECT TOP 10
VISITTYPE,
ACCOUNT,
PLANNEDSTART,
PLANNEDEND,
EXECUTIONSTART,
EXECUTIONEND,
STATUS,
SOURCE,
ACCOUNT_NAME
FROM "_SYS_BIC"."cona-reporting.field-sales/Q_CA_R_SpringVisit" WHERE VISITTYPE = ZR')
Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", :
Unable to retrieve JDBC result set for SELECT TOP 10
VISITTYPE,
ACCOUNT,
PLANNEDSTART,
PLANNEDEND,
EXECUTIONSTART,
EXECUTIONEND,
STATUS,
SOURCE,
ACCOUNT_NAME
FROM "_SYS_BIC"."cona-reporting.field-sales/Q_CA_R_SpringVisit" WHERE VISITTYPE = ZR (SAP DBTech JDBC: [260] (at 222): invalid column name: ZR: line 11 col 101 (at pos 222))
What does this mean? ZR is not a column, it is a value within the column. Tried placing ZR in quotes to no other effect.
My double and single quote syntax is based on this other question I've asked.
Issues connecting R to HANA db with many special characters
Never got it working with RODBC so tried JODBC.

Likely it is handling of quotes within an embedded quote-enclosed string, further complicated by the double quote symbols used in SQL for identifiers. However, consider parameterization (an industry best practice whenever running SQL in application layer such as R) to avoid the need of quote punctuation or concatenation. Like most JDBC APIs, RJDBC supports parameterization. Also note, dbGetQuery summarily equates to dbSendQuery + dbFetch:
sql <- 'SELECT TOP 10 VISITTYPE,
ACCOUNT,
PLANNEDSTART,
PLANNEDEND,
EXECUTIONSTART,
EXECUTIONEND,
STATUS,
SOURCE,
ACCOUNT_NAME
FROM "_SYS_BIC"."cona-reporting.field-sales/Q_CA_R_SpringVisit"
WHERE VISITTYPE = ?'
param <- 'ZR'
df <- dbGetQuery(jdbcConnection, sql, param)

To complete the previous answer (which is of course preferable as it uses bind variables) here is described the *root cause** of the problem:
The use of a single quote in a single quoted string must be of course escaped
Contrary to the Oracle escaping using doubling the quote R uses backslash.
i.e. the proper usage is as follows:
> df <- dbGetQuery(jdbcConnection,
+ 'select * from "DUAL" where "DUMMY" = \'X\'')
> df
DUMMY
1 X
alternative way using double quoted string
> df <- dbGetQuery(jdbcConnection,
+ "select * from \"DUAL\" where \"DUMMY\" = 'X'")
> df
DUMMY
1 X

Related

R: Summary of SQL Tables

I am working with the R programming language.
Normally, when I want to get the summary of a table, I can use something like the "str()" function or the "summary()" function:
str(my_table)
summary(my_table)
However, now I am trying to do this with tables on a server.
For instance, I am trying to get the summaries of variable types for a specific table (e.g. "my_table") on a server. I found a very indirect way to do this:
#load libraries
library(OBDC)
library(RODBC)
library(dbi)
#establish a connection and name it as "dbhandle"
rs <- dbSendQuery(dbhandle, 'select * from my_table limit 1')
dbColumnInfo(rs)
My Question: Is there a more "direct" way to do this? For example, can I get information about each column (e.g. whether the column is integer, character, date, etc.) in a table without first sending the query and then requesting the information? Can I do this directly?
Thanks!
You could try using fetch() from "RMySQL" to turn your SQL query into an R object (e.g. data frame)
library(RMySQL)
rs <- dbSendQuery(dbhandle, 'select * from my_table limit 1')
# Get the results from MySQL into R
my_table = fetch(rs, n=-1)
# clear result
dbClearResult(rs)
rm(rs)
Then use the functions you describe.
str(my_table)
summary(my_table)

I've performed a JOIN using bigrquery and the dbGetQuery function. Now I'd like to query the temporary table I've created but can't connect

I'm afraid that if a bunch of folks start running my actual code I'll be billed for the queries so my example code is for a fake database.
I've successfully established my connection to BigQuery:
con <- dbConnect(
bigrquery::bigquery(),
project = 'myproject',
dataset = 'dataset',
billing = 'myproject'
)
Then performed a LEFT JOIN using the coalesce function:
dbGetQuery(con,
"SELECT
`myproject.dataset.table_1x`.Pokemon,
coalesce(`myproject.dataset.table_1`.Type_1,`myproject.dataset.table_2`.Type_1) AS Type_1,
coalesce(`myproject.dataset.table_1`.Type_2,`myproject.dataset.table_2`.Type_2) AS Type_2,
`myproject.dataset.table_1`.Total,
`myproject.dataset.table_1`.HP,
`myproject.dataset.table_1`.Attack,
`myproject.dataset.table_1`.Special_Attack,
`myproject.dataset.table_1`.Defense,
`myproject.dataset.table_1`.Special_Defense,
`myproject.dataset.table_1`.Speed,
FROM `myproject.dataset.table_1`
LEFT JOIN `myproject.dataset.table_2`
ON `myproject.dataset.table_1`.Pokemon = `myproject.dataset.table_2`.Pokemon
ORDER BY `myproject.dataset.table_1`.ID;")
The JOIN produced the table I intended and now I'd like to query that table but like...where is it? How do I connect? Can I save it locally so that I can start working my analysis in R? Even if I go to BigQuery, select the Project History tab, select the query I just ran in RStudio, and copy the Job ID for the temporary table, I still get the following error:
Error: Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Run `rlang::last_error()` to see where the error occurred.
And if I follow up:
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/rlang_error>
Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Backtrace:
1. DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
2. DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
3. DBI:::.local(conn, statement, ...)
5. bigrquery::dbSendQuery(conn, statement, ...)
6. bigrquery:::BigQueryResult(conn, statement, ...)
7. bigrquery::bq_job_wait(job, quiet = conn#quiet)
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/rlang_error>
Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Backtrace:
x
1. +-DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
2. \-DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
3. \-DBI:::.local(conn, statement, ...)
4. +-DBI::dbSendQuery(conn, statement, ...)
5. \-bigrquery::dbSendQuery(conn, statement, ...)
6. \-bigrquery:::BigQueryResult(conn, statement, ...)
7. \-bigrquery::bq_job_wait(job, quiet = conn#quiet)
Can someone please explain? Is it just that I can't query a temporary table with the bigrquery package?
From looking at the documentation here and here, the problem might just be that you did not assign the results anywhere.
local_df = dbGetQuery(...
should take the results from your database query and copy them into local R memory. Take care as there is no check for the size of the results, so it is easy to run out of memory in when doing this.
You have tagged the question with dbplyr, but it looks like you are just using the DBI package. If you want to be writing R and have it translated to SQL, then you can do this using dbplyr. It would look something like this:
con <- dbConnect(...) # your connection details here
remote_tbl1 = tbl(con, from = "table_1")
remote_tbl2 = tbl(con, from = "table_2")
new_remote_tbl = remote_tbl1 %>%
left_join(remote_tbl2, by = "Pokemon", suffix = c("",".y")) %>%
mutate(Type_1 = coalesce(Type_1, Type_1.y),
Type_2 = coalesce(Type_2, Type_2.y)) %>%
select(ID, Pokemon, Type_1, Type_2, ...) %>% # list your return columns
arrange(ID)
When you use this approach, new_remote_tbl can be thought of as a new table in the database which you can query and manipulate further. (It is not actually a table - no data was saved to disc - but you can query it and interact with it as if it were and the database will produce it for you on demand).
There are some limitations of working with a remote table (the biggest is you are limited to commands that dbplyr can translate into SQL). When you want to copy the current remote table into local R memory, use collect:
local_df = remote_df %>%
collect()

R Pass required variable from ODBC/HANA connection to sql statement

I have a table I am trying to call with my usual method
sql <- 'SELECT TOP 10 *
FROM "_SYS_BIC"."data-path.self-service.DOIP/table_name"'
df <- dbGetQuery(jdbcConnection, sql)
and receive the error
Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", :
Unable to retrieve JDBC result set for SELECT TOP 10 *
FROM "_SYS_BIC"."data-path.self-service.DOIP/table_name" (SAP DBTech JDBC: [2048]: column store error: search table error: [34023] Instantiation of calculation model failed;exception 306106: Undefined variable: $$IP_ExtractionWeekFrom$$. Variable is marked as required but not set in the query)
I've been trying to insert IP_ExtractionWeekFrom into the sql statement with a where clause with no luck
param1 <- 201943
sql <- 'SELECT TOP 10 *
FROM "_SYS_BIC"."ccf-edw.self-service.DOIP/R_CA_B_DemandPlan" where
"$$IP_ExtractionWeek$$" = ?'
SpringVisit <- dbGetQuery(jdbcConnection, sql, param1)
I've tried the term surrounded by the "$$" and without, and both with and without "$$" sourrounded in quotes and not. Usually am met with an "invalid column name" error.
Is this supposed to be called with something other than a where clause?
Consider maintaining your working Tableau query with the integration of parameters in R with properly handling of double quotes for identifiers and single quotes for literals.
Additionally, parameterization is not supported with the old ('PLACEHOLDER'= ('<varname>', <varvalue>)) syntax.
Instead, as explained in How to escape sql injection from HANA placeholder use the PLACEHOLDER."<varname>" => ? syntax.
param1 <- 201943
sql <- "SELECT TOP 10 *
FROM \"_SYS_BIC\".\"ccf-edw.self-service.DOIP/R_CA_B_DemandPlan\"(
PLACEHOLDER.\"$$IP_ExtractionWeekFrom$$\", ?),
PLACEHOLDER.\"$$IP_ExtractionWeekTo$$\",?)
)\"_SYS_BIC\".\"ccf-edw.self-service.DOIP/R_CA_B_DemandPlan\"
WHERE (1 <> 0)"
SpringVisit <- dbGetQuery(jdbcConnection, sql, param1, param1)
Additionally, if your JDBC already connects to the schema_SYS_BIC, use the synonymous qualifier :: as original query in order to reference package and calculation view:
sql <- "SELECT TOP 10 *
FROM \"ccf-edw.self-service.DOIP::R_CA_B_DemandPlan\"(
PLACEHOLDER.\"$$IP_ExtractionWeekFrom$$\", ?),
PLACEHOLDER.\"$$IP_ExtractionWeekTo$$\", ? )
)\"ccf-edw.self-service.DOIP::R_CA_B_DemandPlan\"
WHERE (1 <> 0)"

dbReadTable won't pull data but dbListFields will see correct fields

I am trying to pull data from a SQL database that I have access to. I can connect to the database, see the tables and get the fields associated with a given table, but cannot read a table into an R variable.
I'm working in R Studio, in case this makes a difference.
I have tried using online code snippets (new to R) and these work, except for the dbReadTable() examples. I have used both "Payments" and name="Payments" as the second argument, and both with and without "" quotes.
library(DBI)
con<-(dbConnect(odbc::odbc(), .connection_string="Driver={SQL Server},
Server=example_1234
Database=exampleDB
TrustedConnection=TRUE")
testing123 <- dbListFields(con,"Payments")
testing456 <- dbReadTable(con,"Payments")
I expect a connection to the database which is now named con. This works.
I expect testing123 to contain all the fields in "Payments". This also works.
I expect testing456 to be a data.frame copy of Payments. This produces:
Error: 'SELECT * FROM "Payments"
nanodbc/nanobdc.cpp:1587 42s02 [Microsoft][ODBC SQL SERVER DRIVER][SQL SERVER]Invalid pbject name 'Payments'.
It's slightly different without "Payments" as the argument - simply saying "Object "Payments" not found".
Any help much appreciated.
I suspect that it's because your table is in a different catalog or schema.
Rationale: DBI::dbListFields is doing select * from ... limit 0 (which is not correct syntax for sql server), but odbc::dbListFields is really calling a C++ function connection_sql_columns that is SQL Server specific. It might be permitting you to be a touch sloppy in that it will find the table even if you do not specify the catalog and/or schema. This is why your dbListFields is working. However, DBI::dbReadTable is really doing select * from ... under the hood (and odbc:: is not overriding it), so it is not allowing you to omit the schema (and/or catalog).
First, find the specific table information for your case:
DBI::dbGetQuery(con, "select top 1 table_catalog, table_schema, table_name, column_name from information_schema.columns where table_name='events'")
# table_catalog table_schema table_name column_name
# 1 my_catalog dbo Payments Id
(I'm projecting what you'll find.)
From here, try one of the following until it works:
x <- DBI::dbReadTable(con, DBI::SQL("[Payments]")) # equivalent to the original
x <- DBI::dbReadTable(con, DBI::SQL("[dbo].[Payments]"))
x <- DBI::dbReadTable(con, DBI::SQL("[my_catalog].[dbo].[Payments]"))
My guess is that DBI::dbGetQuery(con, "select top 1 * from Payments") will not work, so for "regular queries" you'll need to use the same hierarchy of catalog.schema.table, such as one of
DBI::dbGetQuery(con, "select top 1 * from dbo.Payments")
DBI::dbGetQuery(con, "select top 1 * from [dbo].[Payments]")
DBI::dbGetQuery(con, "select top 1 * from [my_catalog].[dbo].[Payments]")
(The use of the [ and ] quoted-identifier brackets are often a personal preference, strictly required in only some corner cases.)
Try changing your con argument just slightly:
con <- DBI::dbConnect(odbc::odbc(),
Driver = "SQL Server",
Server = "example_1234",
Database = "exampleDB",
TrustedConnection = TRUE)
# read table to df
testing456 <- dbReadTable(con,"Payments")
# you can also use SQL queries directly, such as:
testing789 <- dbGetQuery(con, statement = "SELECT * FROM Payments WHERE ...")

Query data from Hana database with R, dealing with quotes

I use R to read data from a Hana database. Some of the table names include backslashes, which forces me to use quotation marks. I cannot read these tables using R. Let me show you an example ...
This SQL works in Hana:
SELECT COUNT(*) FROM P3O."/BBB/BBB";
When I try to use the same code to read the data with R from the Hana database I get these errors:
library("RODBC")
channel <- odbcConnect("xxx",uid="xxx",pwd="xxx")
query <-
paste("'","SELECT COUNT(*) FROM P30.", "\"/BBB/BBB\"","'",sep="")
RAW_dataHana <- sqlQuery(channel, query)
close(channel)
I get following errors:
Syntax error or access violation;257 sql syntax error: incorrect
syntax near \"SELECT COUNT() FROM ... [2] "[RODBC] ERROR: Could not
SQLExecDirect ''SELECT COUNT() FROM P30.\"/BBB/BBB\"''"
I think it has something to do with the quotation, but when I check the code with this, I think I get the correct query:
x = paste("'","SELECT COUNT(*) FROM P30.", "\"/BBB/BBB\"", "'",sep="")
cat(x)
> cat(x)
'SELECT COUNT(*) FROM P30."/BBB/BBB"'
Just got now to check my RODBC test code...
The easiest way to handle the quoting is to enclose the query string in single quotes, e.g.:
sales_fact<-sqlQuery (ch, 'SELECT TOP 200 "ORDERID", "VAR_INDICATOR",
sum("ORDER_CNT") AS "ORDER_CNT",
sum("VARIANCE") AS "VARIANCE",
sum("VARIANCE_PCT") AS "VARIANCE_PCT",
sum("BUDGET") AS "BUDGET",
sum("ACTUAL") AS "ACTUAL"
FROM "_SYS_BIC"."test/ODERS_CV"
GROUP BY "ORDERID", "VAR_INDICATOR"')
This also works with paste():
queryText <- 'SELECT TOP 200 "ORDERID", "VAR_INDICATOR",
sum("ORDER_CNT") AS "ORDER_CNT",...'
queryText <- paste(sep = '', queryText, ' "_SYS_BIC"."test/ODERS_CV"
GROUP BY "ORDERID", "VAR_INDICATOR"')