I'm generating JSON in R Shiny, and the next step is to send that JSON to SQL Server for further processing and to fetch results.
When I use glue_sql to interpolate the JSON, the resulting query executes successfully in SQL Server Management Studio but returns a data.frame with 0 obs. and 0 variables in R.
jsoninput <-'[
{"start_date":20131231,"end_date":20151231,"fetch_date":20151231,"id":1},
{"start_date":20121231,"end_date":20141231,"fetch_date":20151231,"id":2},
{"start_date":20121231,"end_date":20141231,"fetch_date":20141231,"id":3}
]'
query <- glue::glue_sql("USE DWH EXEC usp_fetchFromJSON @p_json = {jsoninput}", .con = dbConnectionObject_DWH)
res <- DBI::dbGetQuery(conn = dbConnectionObject_DWH, query)
The interpolated query, which runs fine when pasted into SSMS, looks like this:
USE DWH EXEC usp_fetchFromJSON @p_json = '[
{"start_date":20131231,"end_date":20151231,"fetch_date":20151231,"id":1},
{"start_date":20121231,"end_date":20141231,"fetch_date":20151231,"id":2},
{"start_date":20121231,"end_date":20141231,"fetch_date":20141231,"id":3}
]'
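One variant I plan to try (a hedged guess on my part, in case the procedure's row-count messages come back as an empty first result set) is prepending SET NOCOUNT ON to the batch:
# Hedged variant: suppress "(n rows affected)" messages so they cannot come
# back as an empty first result set before the procedure's final SELECT.
query <- glue::glue_sql(
  "SET NOCOUNT ON; USE DWH; EXEC usp_fetchFromJSON @p_json = {jsoninput}",
  .con = dbConnectionObject_DWH
)
res <- DBI::dbGetQuery(conn = dbConnectionObject_DWH, query)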
Thanks!
I'm afraid that if a bunch of folks start running my actual code I'll be billed for the queries, so my example code uses a fake database.
I've successfully established my connection to BigQuery:
con <- dbConnect(
bigrquery::bigquery(),
project = 'myproject',
dataset = 'dataset',
billing = 'myproject'
)
Then I performed a LEFT JOIN using the coalesce function:
dbGetQuery(con,
"SELECT
`myproject.dataset.table_1`.Pokemon,
coalesce(`myproject.dataset.table_1`.Type_1,`myproject.dataset.table_2`.Type_1) AS Type_1,
coalesce(`myproject.dataset.table_1`.Type_2,`myproject.dataset.table_2`.Type_2) AS Type_2,
`myproject.dataset.table_1`.Total,
`myproject.dataset.table_1`.HP,
`myproject.dataset.table_1`.Attack,
`myproject.dataset.table_1`.Special_Attack,
`myproject.dataset.table_1`.Defense,
`myproject.dataset.table_1`.Special_Defense,
`myproject.dataset.table_1`.Speed
FROM `myproject.dataset.table_1`
LEFT JOIN `myproject.dataset.table_2`
ON `myproject.dataset.table_1`.Pokemon = `myproject.dataset.table_2`.Pokemon
ORDER BY `myproject.dataset.table_1`.ID;")
The JOIN produced the table I intended, and now I'd like to query that table, but where is it? How do I connect to it? Can I save it locally so that I can start working on my analysis in R? Even if I go to BigQuery, select the Project History tab, select the query I just ran in RStudio, and copy the Job ID for the temporary table, I still get the following error:
Error: Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Run `rlang::last_error()` to see where the error occurred.
And if I follow up:
> rlang::last_error()
<error/rlang_error>
Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Backtrace:
1. DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
2. DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
3. DBI:::.local(conn, statement, ...)
5. bigrquery::dbSendQuery(conn, statement, ...)
6. bigrquery:::BigQueryResult(conn, statement, ...)
7. bigrquery::bq_job_wait(job, quiet = conn@quiet)
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/rlang_error>
Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Backtrace:
x
1. +-DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
2. \-DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
3. \-DBI:::.local(conn, statement, ...)
4. +-DBI::dbSendQuery(conn, statement, ...)
5. \-bigrquery::dbSendQuery(conn, statement, ...)
6. \-bigrquery:::BigQueryResult(conn, statement, ...)
7. \-bigrquery::bq_job_wait(job, quiet = conn@quiet)
Can someone please explain? Is it just that I can't query a temporary table with the bigrquery package?
From looking at the documentation here and here, the problem might just be that you did not assign the results anywhere.
local_df = dbGetQuery(...
should take the results from your database query and copy them into local R memory. Take care, as there is no check on the size of the results, so it is easy to run out of memory when doing this.
You have tagged the question with dbplyr, but it looks like you are just using the DBI package. If you want to write R and have it translated to SQL, then you can do this using dbplyr. It would look something like this:
library(DBI)
library(dplyr)

con <- dbConnect(...) # your connection details here
remote_tbl1 = tbl(con, from = "table_1")
remote_tbl2 = tbl(con, from = "table_2")
new_remote_tbl = remote_tbl1 %>%
left_join(remote_tbl2, by = "Pokemon", suffix = c("",".y")) %>%
mutate(Type_1 = coalesce(Type_1, Type_1.y),
Type_2 = coalesce(Type_2, Type_2.y)) %>%
select(ID, Pokemon, Type_1, Type_2, ...) %>% # list your return columns
arrange(ID)
When you use this approach, new_remote_tbl can be thought of as a new table in the database which you can query and manipulate further. (It is not actually a table - no data was saved to disk - but you can query and interact with it as if it were, and the database will produce it for you on demand.)
There are some limitations to working with a remote table (the biggest is that you are limited to commands that dbplyr can translate into SQL). When you want to copy the current remote table into local R memory, use collect:
local_df = new_remote_tbl %>%
collect()
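One more hedged tip (my addition, not part of the original answer): you can inspect the SQL that dbplyr generates for the remote table before collecting anything.
# Print the SQL that dbplyr will send to BigQuery for this remote table;
# nothing is executed until you collect() or print results.
new_remote_tbl %>% show_query()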
I am using the boto3 library with execute_statement to get data from an RDS cluster using the Data API.
The query works fine if I select 1 or 2 columns, but as soon as I add another column to the query, it returns an error: (BadRequestException) permission denied for relation table_name.
I have checked using pgAdmin that the permissions for the user I am using are intact to query the whole DB.
The function included in the call:
def execute_query(self, sql_query, sql_parameters=[]):
    """
    Aurora Data API execute query. Generally used for select statements.
    :param sql_query: query
    :param sql_parameters: parameters in the SQL query
    :return: Data API response
    """
    client = self.api_access()
    response = client.execute_statement(
        resourceArn=RESOURCE_ARN,
        secretArn=SECRET_ARN,
        database='db_name',
        sql=sql_query,
        includeResultMetadata=True,
        parameters=sql_parameters)
    return response
Function call: no errors
query = '''
SELECT id
FROM schema_name.table_name
limit 1
'''
print(query)
result = conn.execute_query(query)
print(result)
Function call: fails with the above error
query = '''
SELECT id,name,event
FROM schema_name.table_name
limit 1
'''
print(query)
result = conn.execute_query(query)
print(result)
Is there a horizontal (column) limit on what we can get from the Data API using boto3? I know there is a 1 MB limit, but as per the documentation it should still return something if the response exceeds that limit.
The backend is Postgres RDS.
UPDATE:
I can select the same column several times and it's not a problem:
query = '''
SELECT id,event,event,event,event,event
FROM schema_name.table_name
limit 1
'''
print(query)
result = conn.execute_query(query)
print(result)
So this means there are some columns that I cannot select.
I didn't know there was column-level security on some of the tables. If column-level security is defined in Postgres for the user you are using, then obviously you cannot select those columns.
I am using R to handle large datasets (the largest data frame is 30,000,000 x 120). These are stored in Azure Data Lake Storage as parquet files, and we need to query them daily and store them in a local SQL database. Parquet files can be read without loading the data into memory, which is handy. However, creating SQL tables from parquet files is more challenging, as I'd prefer not to load the data into memory.
Here is the code I used. Unfortunately, this is not a perfect reprex, as the SQL database needs to exist for this to work.
# load packages
library(tidyverse)
library(arrow)
library(sparklyr)
library(DBI)
# Create test data
test <- data.frame(matrix(rnorm(20), nrow = 10))
# Save as parquet file
parquet_path <- tempfile(fileext = ".parquet")
write_parquet(test, parquet_path)
# Load main table
sc <- spark_connect(master = "local", spark_home = spark_home_dir())
test <- spark_read_parquet(sc, name = "test_main", path = parquet_path, memory = FALSE, overwrite = TRUE)
# Save into SQL table
DBI::dbWriteTable(conn = connection,
                  name = DBI::Id(schema = "schema", table = "table"),
                  value = test)
Is it possible to write a SQL table without loading parquet files into memory?
I lack experience with T-SQL bulk import and export, but this is likely where you'll find your answer.
library(arrow)
library(DBI)
test <- data.frame(matrix(rnorm(20), nrow=10))
f <- tempfile(fileext = '.parquet')
write_parquet(test, f)
#Upload table using bulk insert
dbExecute(connection,
paste("
BULK INSERT [database].[schema].[table]
FROM '", gsub('\\\\', '/', f), "' FORMAT = 'PARQUET';
")
)
Here I use T-SQL's own BULK INSERT command.
Disclaimer: I have not yet used this command in T-SQL, so it may be riddled with errors. For example, I can't see a place to specify snappy compression within the documentation, although it can be specified if one instead defines a custom file format with CREATE EXTERNAL FILE FORMAT.
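If you do go the custom file format route, here is a hedged sketch of what that might look like (my addition and also untested; CREATE EXTERNAL FILE FORMAT requires PolyBase or Azure Synapse, and the codec string should be verified against the documentation):
# Hedged sketch: define a reusable parquet file format with snappy compression.
# Untested; check the DATA_COMPRESSION codec string against the T-SQL docs.
dbExecute(connection, "
  CREATE EXTERNAL FILE FORMAT parquet_snappy
  WITH (
    FORMAT_TYPE = PARQUET,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
  );")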
Now, BULK INSERT only inserts into an existing table. For your specific case, where you'd like to create a new table from the file, you would likely be looking for OPENROWSET with CREATE TABLE ... AS [select statement].
# column_defs is a named list/vector: names are the column names,
# values are their T-SQL data-type definitions
column_definition <- paste(names(column_defs), column_defs, collapse = ', ')
dbExecute(connection,
  paste0("CREATE TABLE MySqlTable
    AS
    SELECT *
    FROM OPENROWSET(
      BULK '", f, "', FORMAT = 'PARQUET'
    ) WITH (
      ", column_definition, "
    );
  ")
)
where column_defs would be a named list or vector giving the T-SQL data-type definition for each column. A (more or less) complete translation from R data types to T-SQL data types is available on the T-SQL documentation page (note two very necessary translations that are not present: Date and POSIXlt). Once again, a disclaimer: my time in T-SQL did not get as far as BULK INSERT or similar.
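As a hedged sketch of my own (not part of the original answer): one way to build column_defs is to let DBI translate the data frame's column types into the connected backend's SQL types, keeping the Date/POSIXlt caveat above in mind.
# Ask DBI for the backend's SQL type of each column of the example data frame;
# the result is a named character vector, e.g. c(X1 = "FLOAT", X2 = "FLOAT").
column_defs <- DBI::dbDataType(connection, test)
column_definition <- paste(names(column_defs), column_defs, collapse = ', ')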
I have written several R scripts which I call in RLANG stored procedures in SAP HANA.
So far, the scripts worked fine until I included a for-loop in the R script, at which point I got this error:
Could not execute 'CREATE_PROCEDURE USE_ML(IN pred "PRED", IN model "MODEL", OUT result "RES") LANGUAGE RLANG ...'
SAP DBTech JDBC: [257]: sql syntax error: unterminated external language"
The piece of code I have included is similar to the following one and works fine if launched directly in my R console.
pred <- data.frame(vendor = as.factor(c("John", "Jack", "John", "Jack")),
product = as.factor(c("Milk", "Water", "Beef", "Water")))
modLevel <- list(vendor = as.factor(c("John", "William", "Jack")),
product = as.factor(c("Milk","Beef", "Water", "Peanut")))
params <- c("vendor", "product")
for (p in params){
pred[,p] <- factor(pred[,p], levels(modLevel[[p]]))
}
Needless to say, I have a larger number of parameters that I want to pass through this for-loop.
My question is the following: is it (1) possible to include this for-loop from R in the SQL statement without getting the syntax error, or (2) must I change the structure of my script (and if so, how)?
Any help would be very much appreciated.
EDIT: Here is the full SQL procedure in HANA.
DROP PROCEDURE USE_ML_MODEL;
CREATE PROCEDURE USE_ML_MODEL(IN pred "PRED", IN model "MODEL", OUT result "RES")
LANGUAGE RLANG AS
BEGIN
modLevel <- unserialize(model$MOD_LEV[[1]])
params <- c("VENDOR", "PRODUCT")
for (p in params){
pred[,p] <- factor(pred[,p], levels(modLevel[[p]]))
}
result <- pred
END;
DROP PROCEDURE SQL_R_USE_MODEL;
CREATE PROCEDURE SQL_R_USE_MODEL(OUT result "RES")
LANGUAGE SQLSCRIPT AS
BEGIN
pred = SELECT VENDOR, PRODUCT FROM "PRED";
model = SELECT * FROM "MODEL";
CALL USE_ML_MODEL(:pred, :model , result);
END;
CALL SQL_R_USE_MODEL("RES") WITH OVERVIEW;
It appears that using a while loop in the R script solved the problem.
p <- 1
while (p <= length(params)){
pred[,params[p]] <- factor(pred[,params[p]], levels(modLevel[[params[p]]]))
p <- p+1
}
I'll leave the question open in case someone can explain what the problem with for is.
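For what it's worth, a hedged alternative of my own (untested inside HANA's RLANG runtime) is to drop the loop keyword entirely and recode all columns in one assignment:
# Loop-free equivalent of the for/while versions above.
pred[params] <- lapply(params, function(p) factor(pred[[p]], levels(modLevel[[p]])))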
I have a number of functions written on our Microsoft SQL servers.
I can easily access and query all data normally, but I cannot execute functions on the server using RODBC.
How can I execute SQL functions using R? Are there other packages that can do this?
Or do I need to switch strategies completely?
Example:
require(RODBC)
db <- odbcConnect("db")
df <- sqlQuery(channel = db, query = "USE [Prognosis]
GO
SELECT * FROM [dbo].[Functionname] ('information_variable')
GO")
Error message:
"42000 102 [Microsoft][ODBC SQL Server Driver][SQL Server]Incorrect syntax near 'GO'."
[2] "[RODBC] ERROR: Could not SQLExecDirect 'USE... "
This turned out to work:
df <- sqlQuery(channel = db,
               query = "SELECT * FROM [dbo].[Functionname] ('information_variable')")
So I dropped the USE [database] statement and the GO batch separators.
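A hedged follow-up sketch (my addition): two other ways to keep targeting the right database without USE/GO, since GO is an SSMS batch separator rather than T-SQL that the ODBC driver understands. The connection-string values are placeholders.
library(RODBC)
# Option 1: point the connection at the database up front
db <- odbcDriverConnect("driver={SQL Server};server=myserver;database=Prognosis;trusted_connection=yes")
# Option 2: use three-part naming so the query works from any database context
df <- sqlQuery(channel = db,
               query = "SELECT * FROM [Prognosis].[dbo].[Functionname] ('information_variable')")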