Execute a for loop in an R script in SAP HANA

I have written several R scripts which I call in RLANG stored procedures in SAP HANA.
The scripts worked fine until I included a for-loop in the R script, at which point I got this error:
Could not execute 'CREATE_PROCEDURE USE_ML(IN pred "PRED", IN model "MODEL", OUT result "RES") LANGUAGE RLANG ...'
SAP DBTech JDBC: [257]: sql syntax error: unterminated external language"
The piece of code I have included is similar to the following one and works fine if launched directly in my R console.
pred <- data.frame(vendor = as.factor(c("John", "Jack", "John", "Jack")),
product = as.factor(c("Milk", "Water", "Beef", "Water")))
modLevel <- list(vendor = as.factor(c("John", "William", "Jack")),
product = as.factor(c("Milk","Beef", "Water", "Peanut")))
params <- c("vendor", "product")
for (p in params){
pred[,p] <- factor(pred[,p], levels(modLevel[[p]]))
}
Needless to say, I have a larger number of parameters to pass through this for-loop.
My question is the following: is it (1) possible to include this R for-loop in the SQL statement without getting the syntax error, or (2) must I change the structure of my script (and if so, how)?
Any help would be very much appreciated.
EDIT Here is the full SQL procedure in HANA.
DROP PROCEDURE USE_ML_MODEL;
CREATE PROCEDURE USE_ML_MODEL(IN pred "PRED", IN model "MODEL", OUT result "RES")
LANGUAGE RLANG AS
BEGIN
modLevel <- unserialize(model$MOD_LEV[[1]])
params <- c("VENDOR", "PRODUCT")
for (p in params){
pred[,p] <- factor(pred[,p], levels(modLevel[[p]]))
}
result <- pred
END;
DROP PROCEDURE SQL_R_USE_MODEL;
CREATE PROCEDURE SQL_R_USE_MODEL(OUT result "RES")
LANGUAGE SQLSCRIPT AS
BEGIN
pred = SELECT VENDOR, PRODUCT FROM "PRED";
model = SELECT * FROM "MODEL";
CALL USE_ML_MODEL(:pred, :model , result);
END;
CALL SQL_R_USE_MODEL("RES") WITH OVERVIEW;

It appears that using a while loop in the R script solved the problem.
p <- 1
while (p <= length(params)){
pred[,params[p]] <- factor(pred[,params[p]], levels(modLevel[[params[p]]]))
p <- p+1
}
I'll leave the question open in case someone can explain what is the problem using for.

Related

R executing T-SQL stored procedure with JSON as input

I'm generating a JSON string in R Shiny, and the next step is to send that JSON to SQL Server for further processing and to fetch the results.
When I use glue_sql to interpolate the json the resulting query executes successfully in SQL Management Studio but returns a data.frame with 0 obs. and 0 variables in R.
jsoninput <-'[
{"start_date":20131231,"end_date":20151231,"fetch_date":20151231,"id":1},
{"start_date":20121231,"end_date":20141231,"fetch_date":20151231,"id":2},
{"start_date":20121231,"end_date":20141231,"fetch_date":20141231,"id":3}
]'
query <- glue::glue_sql("USE DWH EXEC usp_fetchFromJSON @p_json = {jsoninput}", .con = dbConnectionObject_DWH)
res <- DBI::dbGetQuery(conn = dbConnectionObject_DWH, query)
USE DWH EXEC usp_fetchFromJSON @p_json = '[
{"start_date":20131231,"end_date":20151231,"fetch_date":20151231,"id":1},
{"start_date":20121231,"end_date":20141231,"fetch_date":20151231,"id":2},
{"start_date":20121231,"end_date":20141231,"fetch_date":20141231,"id":3}
]'
Thanks!

I've performed a JOIN using bigrquery and the dbGetQuery function. Now I'd like to query the temporary table I've created but can't connect

I'm afraid that if a bunch of folks start running my actual code I'll be billed for the queries so my example code is for a fake database.
I've successfully established my connection to BigQuery:
con <- dbConnect(
bigrquery::bigquery(),
project = 'myproject',
dataset = 'dataset',
billing = 'myproject'
)
Then performed a LEFT JOIN using the coalesce function:
dbGetQuery(con,
"SELECT
`myproject.dataset.table_1`.Pokemon,
coalesce(`myproject.dataset.table_1`.Type_1,`myproject.dataset.table_2`.Type_1) AS Type_1,
coalesce(`myproject.dataset.table_1`.Type_2,`myproject.dataset.table_2`.Type_2) AS Type_2,
`myproject.dataset.table_1`.Total,
`myproject.dataset.table_1`.HP,
`myproject.dataset.table_1`.Attack,
`myproject.dataset.table_1`.Special_Attack,
`myproject.dataset.table_1`.Defense,
`myproject.dataset.table_1`.Special_Defense,
`myproject.dataset.table_1`.Speed,
FROM `myproject.dataset.table_1`
LEFT JOIN `myproject.dataset.table_2`
ON `myproject.dataset.table_1`.Pokemon = `myproject.dataset.table_2`.Pokemon
ORDER BY `myproject.dataset.table_1`.ID;")
The JOIN produced the table I intended and now I'd like to query that table but like...where is it? How do I connect? Can I save it locally so that I can start working my analysis in R? Even if I go to BigQuery, select the Project History tab, select the query I just ran in RStudio, and copy the Job ID for the temporary table, I still get the following error:
Error: Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Run `rlang::last_error()` to see where the error occurred.
And if I follow up:
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/rlang_error>
Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Backtrace:
1. DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
2. DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
3. DBI:::.local(conn, statement, ...)
5. bigrquery::dbSendQuery(conn, statement, ...)
6. bigrquery:::BigQueryResult(conn, statement, ...)
7. bigrquery::bq_job_wait(job, quiet = conn@quiet)
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/rlang_error>
Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Backtrace:
x
1. +-DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
2. \-DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
3. \-DBI:::.local(conn, statement, ...)
4. +-DBI::dbSendQuery(conn, statement, ...)
5. \-bigrquery::dbSendQuery(conn, statement, ...)
6. \-bigrquery:::BigQueryResult(conn, statement, ...)
7. \-bigrquery::bq_job_wait(job, quiet = conn@quiet)
Can someone please explain? Is it just that I can't query a temporary table with the bigrquery package?
From looking at the documentation here and here, the problem might just be that you did not assign the results anywhere.
local_df = dbGetQuery(...
should take the results from your database query and copy them into local R memory. Take care, as there is no check on the size of the results, so it is easy to run out of memory when doing this.
You have tagged the question with dbplyr, but it looks like you are just using the DBI package. If you want to be writing R and have it translated to SQL, then you can do this using dbplyr. It would look something like this:
con <- dbConnect(...) # your connection details here
remote_tbl1 = tbl(con, from = "table_1")
remote_tbl2 = tbl(con, from = "table_2")
new_remote_tbl = remote_tbl1 %>%
left_join(remote_tbl2, by = "Pokemon", suffix = c("",".y")) %>%
mutate(Type_1 = coalesce(Type_1, Type_1.y),
Type_2 = coalesce(Type_2, Type_2.y)) %>%
select(ID, Pokemon, Type_1, Type_2, ...) %>% # list your return columns
arrange(ID)
When you use this approach, new_remote_tbl can be thought of as a new table in the database which you can query and manipulate further. (It is not actually a table - no data was saved to disc - but you can query it and interact with it as if it were and the database will produce it for you on demand).
There are some limitations of working with a remote table (the biggest is you are limited to commands that dbplyr can translate into SQL). When you want to copy the current remote table into local R memory, use collect:
local_df = new_remote_tbl %>%
collect()

postgres cursor unresolved reference in python

I wish to check if a record exists and, if it does, read the records in another table. I am using the same database cursor that I created, but it shows an unresolved reference for the cursor inside the if block.
My code:
import psycopg2
conn=psycopg2.connect(host='localhost', database='my_first_db', user='postgres', password='postgres')
curr= conn.cursor()
res=curr.execute("select EXISTS(select * from teachers where t_name='xoxo' AND pass='xoxo2020')")
if curr.fetchone()[0]==1 :
{
curr.execute("select * from students")
result=curr.fetchall()
for x in result:
print(x)
#print('Table exists')
}
else:
print("not found")
print(res)
curr.close()
conn.close()
The curr in the second line of the if block shows the unresolved reference error.
Thanks.
My familiarity with psycopg2 is limited, so you may have to adjust the following somewhat. But in straight SQL this can be accomplished in a single statement:
select *
from students
where exists (select null
from teachers
where t_name = 'xoxo'
and pass = 'xoxo2020'
);
As best as I can determine, this translates to psycopg2 as follows (subject to the above):
import psycopg2

conn = psycopg2.connect(host='localhost', database='my_first_db', user='postgres', password='postgres')
curr = conn.cursor()
# A single statement performs both the existence check and the read.
curr.execute("select * from students where exists (select null from teachers where t_name='xoxo' and pass='xoxo2020')")
if curr.rowcount > 0:
    result = curr.fetchall()
    for x in result:
        print(x)
else:
    print("not found")
curr.close()
conn.close()
The main idea is when working with sql stop thinking in terms of
check A; If it exists then do B;
But rather think in terms of
do B where A;
In other words let a single sql statement do ALL the work, including any checking needed.
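The "do B where A" idea can be sketched end to end as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for Postgres so the example runs anywhere, the `s_name` column and the sample rows are made up, and `students_if_teacher` is a hypothetical helper name. With psycopg2 the placeholders would be `%s` rather than `?`.

```python
import sqlite3

# In-memory SQLite stands in for the Postgres database (assumption for this sketch).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table teachers (t_name text, pass text)")
cur.execute("create table students (s_name text)")  # hypothetical column
cur.execute("insert into teachers values ('xoxo', 'xoxo2020')")
cur.executemany("insert into students values (?)", [("ann",), ("bob",)])


def students_if_teacher(cur, name, password):
    """Return all students, but only if a matching teacher row exists."""
    cur.execute(
        "select * from students "
        "where exists (select null from teachers where t_name = ? and pass = ?)",
        (name, password),
    )
    return cur.fetchall()


found = students_if_teacher(cur, "xoxo", "xoxo2020")
missing = students_if_teacher(cur, "xoxo", "wrong")

for row in found:
    print(row)
if not missing:
    print("not found")
conn.close()
```

Binding the name and password as parameters, instead of pasting them into the SQL string, also avoids quoting problems and SQL injection.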
For me, it worked to move the code from inside the if block into a function defined outside it and to call that function from the if block. Other answers to the question are still welcome.
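One likely contributor to the original error, not stated in the answers above, is the pair of braces around the if body: Python delimits blocks by indentation, not by `{` and `}`. The original indentation is lost in the listing, so this is an assumption about the cause rather than a confirmed diagnosis. A minimal sketch (the `compiles` helper is hypothetical) that checks whether Python accepts each layout:

```python
# The braces mimic the question's layout; the body is deliberately unindented.
with_braces = 'if True:\n{\nprint("inside")\n}\n'
with_indent = 'if True:\n    print("inside")\n'


def compiles(source):
    """Return True if Python accepts the source, False on a syntax error."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


print(compiles(with_braces))
print(compiles(with_indent))
```

Moving the body into a function, as described above, probably worked because the function body was written with plain indentation and no braces.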

R Pass required variable from ODBC/HANA connection to sql statement

I have a table I am trying to call with my usual method
sql <- 'SELECT TOP 10 *
FROM "_SYS_BIC"."data-path.self-service.DOIP/table_name"'
df <- dbGetQuery(jdbcConnection, sql)
and receive the error
Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", :
Unable to retrieve JDBC result set for SELECT TOP 10 *
FROM "_SYS_BIC"."data-path.self-service.DOIP/table_name" (SAP DBTech JDBC: [2048]: column store error: search table error: [34023] Instantiation of calculation model failed;exception 306106: Undefined variable: $$IP_ExtractionWeekFrom$$. Variable is marked as required but not set in the query)
I've been trying to insert IP_ExtractionWeekFrom into the sql statement with a where clause with no luck
param1 <- 201943
sql <- 'SELECT TOP 10 *
FROM "_SYS_BIC"."ccf-edw.self-service.DOIP/R_CA_B_DemandPlan" where
"$$IP_ExtractionWeek$$" = ?'
SpringVisit <- dbGetQuery(jdbcConnection, sql, param1)
I've tried the term with and without the surrounding "$$", and both with and without quotes. I am usually met with an "invalid column name" error.
Is this supposed to be called with something other than a where clause?
Consider keeping your working Tableau query and integrating the parameters in R, taking care to use double quotes for identifiers and single quotes for literals.
Additionally, parameterization is not supported with the old ('PLACEHOLDER'= ('<varname>', <varvalue>)) syntax.
Instead, as explained in How to escape sql injection from HANA placeholder use the PLACEHOLDER."<varname>" => ? syntax.
param1 <- 201943
sql <- "SELECT TOP 10 *
FROM \"_SYS_BIC\".\"ccf-edw.self-service.DOIP/R_CA_B_DemandPlan\"(
PLACEHOLDER.\"$$IP_ExtractionWeekFrom$$\" => ?,
PLACEHOLDER.\"$$IP_ExtractionWeekTo$$\" => ?
) \"_SYS_BIC\".\"ccf-edw.self-service.DOIP/R_CA_B_DemandPlan\"
WHERE (1 <> 0)"
SpringVisit <- dbGetQuery(jdbcConnection, sql, param1, param1)
Additionally, if your JDBC connection already uses the schema _SYS_BIC, use the :: qualifier, as in the original query, to reference the package and calculation view:
sql <- "SELECT TOP 10 *
FROM \"ccf-edw.self-service.DOIP::R_CA_B_DemandPlan\"(
PLACEHOLDER.\"$$IP_ExtractionWeekFrom$$\" => ?,
PLACEHOLDER.\"$$IP_ExtractionWeekTo$$\" => ?
) \"ccf-edw.self-service.DOIP::R_CA_B_DemandPlan\"
WHERE (1 <> 0)"
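The shape of the `PLACEHOLDER."<varname>" => ?` syntax may be easier to see when the statement is assembled programmatically. The following is a minimal sketch in Python that only builds the SQL text and does not connect to HANA; `build_placeholder_query` is a hypothetical helper, and the table alias and WHERE clause from the queries above are omitted for brevity.

```python
def build_placeholder_query(view, variables, top=10):
    """Build a HANA SELECT with one PLACEHOLDER."$$name$$" => ? per variable.

    `view` and `variables` are interpolated as identifiers, so they must come
    from trusted code, never from user input; only the values are bound later.
    """
    placeholders = ", ".join(
        'PLACEHOLDER."$${}$$" => ?'.format(name) for name in variables
    )
    return 'SELECT TOP {} * FROM "_SYS_BIC"."{}"({})'.format(top, view, placeholders)


sql = build_placeholder_query(
    "ccf-edw.self-service.DOIP/R_CA_B_DemandPlan",
    ["IP_ExtractionWeekFrom", "IP_ExtractionWeekTo"],
)
print(sql)
```

Each `?` is then filled by one bound value, in order, when the statement is executed, which is what the two `param1` arguments to dbGetQuery do in the R code above.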

ROracle Errors When Trying to Use Bound Parameters

I'm using ROracle on a Win7 machine running the following R version:
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 1.1
year 2014
month 07
day 10
svn rev 66115
language R
version.string R version 3.1.1 (2014-07-10)
nickname Sock it to Me
Eventually, I'm going to move the script to a *nix machine, cron it, and run it with RScript.
I want to do something similar to:
select * from tablename where 'thingy' in ('string1','string2')
This would return two rows with all columns in SQLDeveloper (or Toad, etc).
(Ultimately, I want to pull results from one DB into a single column in a data.frame then use those results to loop through
and pull results from a second db, but I also need to be able to do just this function as well.)
I'm following the documentation for ROracle from here.
I've also looked at this (which didn't get an answer):
Bound parameters in ROracle SELECT statements
When I attempt the query from ROracle, I get two different errors, depending on whether I try a dbGetQuery() or dbSendQuery().
As background, here are the versions, queries and data I'm using:
Driver name: Oracle (OCI)
Driver version: 1.1-11
Client version: 11.2.0.3.0
The connection information is standard:
library(ROracle)
ora <- dbDriver("Oracle")
dbcon <- dbConnect(ora, username = "username", password = "password", dbname = "dbnamefromTNS")
These two queries return the expected results:
rs_send <- dbSendQuery(dbcon, "select * from tablename where columname_A = 'thingy' and rownum <= 1000")
rs_get <- dbGetQuery(dbcon, "select * from tablename where columname_A = 'thingy' and rownum <= 1000")
That is to say, 1000 rows from tablename where 'thingy' exists in columnname_A.
I have a data.frame of one column, with two rows.
my.data = data.frame(RANDOM_STRING = as.character(c('string1', 'string2')))
and str(my.data) returns this:
str(my.data)
'data.frame': 2 obs. of 1 variable:
$ RANDOM_STRING: chr "string1" "string2"
my attempted queries are:
nope <- dbSendQuery(dbcon, "select * from tablename where column_A = 'thingy' and widget_name =:1", data = data.frame(widget_name =my.data$RANDOM_STRING))
which gives me an error of:
Error in .oci.SendQuery(conn, statement, data = data, prefetch = prefetch, :
bind data does not match bind specification
and
not_this_either <- dbGetQuery(dbcon, "select * from tablename where column_A = 'thingy' and widget_name =:1", data = data.frame(widget_name =my.data$RANDOM_STRING))
which gives me an error of:
Error in .oci.GetQuery(conn, statement, data = data, prefetch = prefetch, :
bind data has too many rows
I'm guessing that my problem is in the data=(widget_name=my.data$RANDOM_STRING) part of the queries, but haven't been able to rubber duck my way through it.
Also, I'm very curious as to why I get two separate and different errors depending on whether the queries use the send (and fetch later) format or the get format.
If you like the tidyverse there's a slightly more compact way to achieve the above using purrr
library(ROracle)
library(purrr)
ora <- dbDriver("Oracle")
con <- dbConnect(ora, username = "username", password = "password", dbname = "yourdbnamefromTNSlist")
yourdatalist <- c(12345, 23456, 34567)
output <- map_df(yourdatalist, ~ dbGetQuery(con, "select * from YourTableNameHere where YOURCOLUMNNAME = :d", .x))
Figured it out.
It wasn't a problem with Oracle or ROracle (I'd suspected this) but with my R code.
I stumbled over the answer trying to solve another problem.
This answer about "dynamic strings" was the thing that got me moving towards a solution.
It doesn't fit exactly, but close enough to rubberduck my way to an answer from there.
The trick is to wrap the whole thing in a function and run an ldply on it:
library(ROracle)
library(plyr)  # provides ldply
ora <- dbDriver("Oracle")
con <- dbConnect(ora, username = "username", password = "password", dbname = "yourdbnamefromTNSlist")
yourdatalist <- c(12345, 23456, 34567)
thisfinallyworks <- function(x) {
dbGetQuery(con, "select * from YourTableNameHere where YOURCOLUMNNAME = :d", data = x)
}
ldply(yourdatalist, thisfinallyworks)
row1 of results where datapoint in YOURCOLUMNNAME = 12345
row2 of results where datapoint in YOURCOLUMNNAME = 23456
row3 of results where datapoint in YOURCOLUMNNAME = 34567
etc
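The pattern in both answers is the same: run the query once per value with a single bound parameter, then combine the per-value results. A minimal sketch of that pattern in Python, assuming an in-memory SQLite database as a stand-in for Oracle; the `widgets` table, its rows, and the `rows_for` helper are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the Oracle database (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("create table widgets (id integer, label text)")
conn.executemany(
    "insert into widgets values (?, ?)",
    [(12345, "a"), (23456, "b"), (99999, "z")],
)


def rows_for(value):
    """Run the query with a single bound value and return its rows."""
    return conn.execute(
        "select id, label from widgets where id = ?", (value,)
    ).fetchall()


your_data_list = [12345, 23456, 34567]
# One query per value; the per-value results are concatenated in input order.
combined = [row for value in your_data_list for row in rows_for(value)]
print(combined)
conn.close()
```

Values with no matching row (34567 here) simply contribute nothing to the combined result.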