R Markdown / knitr with sql engine - error when creating new table in Oracle database

I am writing an R Markdown file with a sequence of sql chunks. Each chunk contains a CREATE TABLE statement that writes data into an Oracle database. For instance:
```{r setup}
library(knitr)
library(ROracle)
con <- dbConnect(dbDriver("Oracle"), ...)
```
```{sql, connection = "con"}
CREATE TABLE TEST AS SELECT * FROM TESTTABLE
```
Ideally, I would like to get a report that contains the SQL statements, each followed by a message of success or failure. However, when I run the chunks, the tables are created successfully, but the following error messages appear:
Error in seq_len(ncol(data)) : argument must be coercible to non-negative integer
In addition: Warning message:
In seq_len(ncol(data)) : first element used of 'length.out' argument
This suggests to me that the result of the SQL statement (which is a logical TRUE or FALSE) cannot be transformed and displayed correctly. A workaround is to assign the result of the SQL to R via the output.var option and then print it in a separate R chunk.
```{sql, connection = "con", output.var = "createTableResult"}
CREATE TABLE TEST AS
SELECT * FROM TESTTABLE
```
```{r}
createTableResult
```
[1] TRUE
Is there a way to avoid this workaround?
Can I somehow get the resulting TRUE or FALSE message directly from the SQL chunk?

Related

Create table name using username in Hive query running in Oozie workflow?

I've got a Hive SQL script/action as part of an Oozie workflow. I'm doing a CREATE TABLE AS SELECT to output the results. I want to name the table using the username plus an appended string (e.g. "User123456_output_table"), but can't seem to get the correct syntax.
set tablename=${hivevar:current_user()};
CREATE TABLE `${hiveconf:tablename}_output_table` AS SELECT ...
That doesn't work and gives:
Error while compiling statement: FAILED: IllegalArgumentException java.net.URISyntaxException: Relative path in absolute URI: ${hivevar:current_user()%7D_output_table
Or changing the first line to set tablename=${current_user()}; starts running the SELECT query but eventually stops with:
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.ql.metadata.HiveException: [${current_user()}_output_table]: is not a valid table name
Or changing the first line to set tablename=current_user(); starts running the SELECT query but eventually stops with:
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.ql.metadata.HiveException: [current_user()_output_table]: is not a valid table name
Alternatively, is there a way to pass the username from the Oozie workflow via a parameter?
I'm using Hue to do all this rather than the command line.
Thanks

This is wrong: set tablename=${hivevar:current_user()}; - the function will not be evaluated; the text is substituted as is.
Hive does not evaluate variables before substitution. It substitutes them as is, and any functions in variable values are NOT evaluated. Variables are just text replacement.
This:
set tablename=current_user();
CREATE TABLE `${hiveconf:tablename}_output_table` ...
gets resolved as
CREATE TABLE `current_user()_output_table` ...
Functions are not supported in table names, so it will not work this way.
The solution is to calculate functions outside the script and pass them as parameters.
See this blog: https://prodlife.wordpress.com/2013/12/06/parameterizing-hive-actions-in-oozie-workflows/
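To make the "just text replacement" point concrete, here is a small Python simulation. It is only a sketch of the behaviour described above, not Hive's actual implementation; the function name `substitute_hiveconf` and the username `User123456` (taken from the question's example) are illustrative assumptions.

```python
import re


def substitute_hiveconf(statement, variables):
    """Replace each ${hiveconf:name} with its raw text value, without evaluating it."""
    def replace(match):
        name = match.group(1)
        # Unknown variables are left untouched in this sketch.
        return variables.get(name, match.group(0))
    return re.sub(r"\$\{hiveconf:([^}]+)\}", replace, statement)


template = "CREATE TABLE `${hiveconf:tablename}_output_table` AS SELECT 1"

# The function call is never evaluated; it is pasted in as plain text.
broken = substitute_hiveconf(template, {"tablename": "current_user()"})

# A value computed outside the script (hypothetical username) yields a valid name.
working = substitute_hiveconf(template, {"tablename": "User123456"})

print(broken)
print(working)
```

The second call corresponds to the suggested fix: work out the username before the script runs and hand it in as an ordinary parameter.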

How to write tables into Panoply using RPostgreSQL?

I am trying to write a table into my data warehouse using the RPostgreSQL package
library(DBI)
library(RPostgreSQL)
pano = dbConnect(dbDriver("PostgreSQL"),
                 host = 'db.panoply.io',
                 port = '5439',
                 user = panoply_user,
                 password = panoply_pw,
                 dbname = mydb)
RPostgreSQL::dbWriteTable(pano, "mtcars", mtcars[1:5, ])
I am getting this error:
Error in postgresqlpqExec(new.con, sql4) :
RS-DBI driver: (could not Retrieve the result : ERROR: syntax error at or near "STDIN"
LINE 1: ..."hp","drat","wt","qsec","vs","am","gear","carb" ) FROM STDIN
^
)
The above code writes into Panoply as a 0 row, 0 byte table. Columns seem to be properly entered into Panoply but nothing else appears.

First and most important: Redshift <> PostgreSQL.
Redshift does not use the Postgres bulk loader, so loading FROM STDIN is NOT allowed.
There are several options, and which one you should choose depends on your needs, especially the volume of data.
For a high volume of data, you should write to S3 first and then use the Redshift COPY command.
There are many options; take a look at
https://github.com/sicarul/redshiftTools
For a low volume, see
inserting multiple records at once into Redshift with R
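For the low-volume route, the usual idea is a single INSERT that carries several rows. A minimal Python sketch of building such a statement follows. Assumptions: a DB-API driver that uses `%s` placeholders (psycopg2 is one such driver), and table and column names that come from trusted code, since they are interpolated directly. The helper name `build_multirow_insert` is made up for illustration.

```python
def build_multirow_insert(table, columns, rows):
    """Return (sql, params) for one INSERT statement that carries every row."""
    if not rows:
        raise ValueError("rows must not be empty")
    # One "(%s, %s, ...)" group per row; the values travel separately as parameters.
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    placeholders = ", ".join([row_placeholder] * len(rows))
    column_list = ", ".join('"{}"'.format(c) for c in columns)
    sql = 'INSERT INTO "{}" ({}) VALUES {}'.format(table, column_list, placeholders)
    # Flatten the rows in the same order as the placeholders.
    params = [value for row in rows for value in row]
    return sql, params


sql, params = build_multirow_insert(
    "mtcars", ["mpg", "cyl"], [(21.0, 6), (22.8, 4)]
)
print(sql)
print(params)
```

The resulting pair would then be passed to something like `cursor.execute(sql, params)`; for anything beyond a few thousand rows the S3 plus COPY route above is the better fit.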

how to run multiple sql statements in airflow jinja template using jdbc hook

Trying to run a hive sql using jdbchook and jinja template through airflow. Template works fine for a single sql statement but throws a parsing error with multiple statements.
DAG
p1 = JdbcOperator(
    task_id=DAG_NAME+'_create',
    jdbc_conn_id='big_data_hive',
    sql='/mysql_template.sql',
    params={'env': ENVIRON},
    autocommit=True,
    dag=dag)
Template
create table {{params.env}}_fct.hive_test_templated
(cookie_id string
,sesn_id string
,load_dt string)
;
INSERT INTO {{params.env}}_fct.hive_test_templated
select* from {{params.env}}_fct.hive_test
;
Error: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ParseException line 7:0 missing EOF at ';' near ')'
The template queries work fine when I run them in Hue.

tobi is correct: the easiest way to do this is to parse your SQL into a list of statements and execute them sequentially.
The way that I do this is to use the sqlparse Python library to split the string into a list of SQL statements and then pass them down to the hook (which inherits from the dbapi hook). The dbapi base class accepts a list of SQL statements and executes them sequentially; this could easily be implemented in the hive hook too. In the following example my "CustomSnowflakeHook" inherits from the dbapi hook, and the run method in the dbapi hook accepts a list of SQL statements:
hook = hooks.CustomSnowflakeHook(snowflake_conn_id=self.snowflake_conn_id)
sql = sqlparse.split(sqlparse.format(self.sql, strip_comments=True))
hook.run(
    sql,
    autocommit=self.autocommit,
    parameters=self.parameters)
From the dbapi hook:
def run(self, sql, autocommit=False, parameters=None):
    """
    Runs a command or a list of commands. Pass a list of sql
    statements to the sql parameter to get them to execute
    sequentially

    :param sql: the sql statement to be executed (str) or a list of
        sql statements to execute
    :type sql: str or list
    :param autocommit: What to set the connection's autocommit setting to
        before executing the query.
    :type autocommit: bool
    :param parameters: The parameters to render the SQL query with.
    :type parameters: mapping or iterable
    """
    if isinstance(sql, basestring):
        sql = [sql]

    with closing(self.get_conn()) as conn:
        if self.supports_autocommit:
            self.set_autocommit(conn, autocommit)

        with closing(conn.cursor()) as cur:
            for s in sql:
                if sys.version_info[0] < 3:
                    s = s.encode('utf-8')
                self.log.info(s)
                if parameters is not None:
                    cur.execute(s, parameters)
                else:
                    cur.execute(s)

        if not getattr(conn, 'autocommit', False):
            conn.commit()
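
The split-then-execute-sequentially pattern can be tried out with only the Python standard library. This is a rough sketch under stated assumptions: SQLite stands in for Hive, and a naive split on `;` stands in for `sqlparse.split` (it would break on semicolons inside string literals or comments). The column types and the sample row are made up for the demo.

```python
import sqlite3
from contextlib import closing


def split_statements(script):
    """Naively split a script on ';' and drop empty fragments."""
    return [part.strip() for part in script.split(";") if part.strip()]


def run_statements(conn, statements):
    """Execute each statement on its own, in order, then commit once."""
    with closing(conn.cursor()) as cur:
        for statement in statements:
            cur.execute(statement)
    conn.commit()


script = """
create table hive_test (cookie_id text, sesn_id text, load_dt text);
insert into hive_test values ('c1', 's1', '2020-01-01');
create table hive_test_templated (cookie_id text, sesn_id text, load_dt text);
insert into hive_test_templated select * from hive_test;
"""

conn = sqlite3.connect(":memory:")
statements = split_statements(script)
run_statements(conn, statements)

row_count = conn.execute("select count(*) from hive_test_templated").fetchone()[0]
print(len(statements), row_count)
```

Each `cur.execute` call receives exactly one statement, which is what a driver that rejects multi-statement strings needs.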

It seems to me that Hue parses the statement differently. Sometimes there are statement separators implemented which allow this to happen.
Airflow seems not to have those separators.
So the easiest way would be to separate the two statements and execute those statements in two separate tasks.

R-SQL Invalid value from generic function ‘fetch’, class “try-error”, expected “data.frame”

I am having a problem fetching some data from a database using ROracle. Everything else works perfectly (I can get data from other tables without any problem), but one of the tables throws an error:
library(ROracle)
con <- dbConnect(dbDriver("Oracle"),"xxx/x",username="user",password="pwd")
spalten<- dbListFields(con, name="xyz", schema = "x") # i still get the name of the columns for this table
rs <- dbSendQuery(con, "Select * From x.xyz") # no error
data <- fetch(rs) # this line throws an error
dbDisconnect(con)
Error in .valueClassTest(ans, "data.frame", "fetch") :
invalid value from generic function ‘fetch’, class “try-error”, expected “data.frame”
I followed this question on Stack Overflow and selected the columns explicitly:
rs <- dbSendQuery(con, "Select a From x.xyz")
but that did not work either and gave me the same error.
Any ideas what I am doing wrong?
P.S. I have checked the sql query in Oracle SQL Developer, and I do get the data table there
Update:
If anyone can help me to locate/query my Oracle error log, then perhaps I can find out what is actually happening on the database server with my troublesome query.

This is for debugging purposes only. Try running your code in the following tryCatch construct. It will display all warnings and errors which are happening.
result <- tryCatch({
  con <- dbConnect(dbDriver("Oracle"),"xxx/x",username="user",password="pwd")
  spalten <- dbListFields(con, name="xyz", schema = "x")
  rs <- dbSendQuery(con, "Select * From x.xyz") # no error
  data <- fetch(rs) # this line throws an error
  dbDisconnect(con)
}, warning = function(war) {
  print(paste("warning: ",war))
}, error = function(err) {
  print(paste("error: ",err))
})
print(paste("result =",result))

I know I'm late to the game on this question, but I had a similar issue and discovered the problem: my query also ran fine in SQL Developer, but SQL Developer only fetches 50 rows at a time, whereas ROracle fetches the whole data set. So an issue that appears later in the data set won't show up immediately in SQL Developer. Once I paged through the results in SQL Developer, it threw an error at a certain point because there was a problem with the actual value stored in the table. I'm not sure how an invalid value got there, but fixing it fixed the problem in ROracle.
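
The same "page through until it breaks" check can be automated by fetching in batches and noting how many rows came back before the error. Below is a rough Python sketch against the generic DB-API `fetchmany` method rather than ROracle. `FakeCursor` is a hypothetical stand-in that simulates a bad value at one row, and the function name is made up for illustration.

```python
def rows_read_before_error(cursor, batch_size=50):
    """Fetch in batches; return (rows_read, error), where error is None on success."""
    rows_read = 0
    while True:
        try:
            batch = cursor.fetchmany(batch_size)
        except Exception as exc:  # the concrete error type depends on the driver
            return rows_read, exc
        if not batch:
            return rows_read, None
        rows_read += len(batch)


class FakeCursor:
    """Hypothetical stand-in for a DB-API cursor whose data is bad at one row."""

    def __init__(self, total_rows, bad_row):
        self.total_rows = total_rows
        self.bad_row = bad_row
        self.position = 0

    def fetchmany(self, size):
        end = min(self.position + size, self.total_rows)
        # Fail when the requested batch would include the bad row.
        if self.position <= self.bad_row < end:
            raise ValueError("invalid value in row %d" % self.bad_row)
        batch = [(i,) for i in range(self.position, end)]
        self.position = end
        return batch


rows_read, error = rows_read_before_error(FakeCursor(total_rows=500, bad_row=130))
print(rows_read, error)
```

The row count narrows the search to one batch; a smaller `batch_size` narrows it further.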

select count(*) not working with perl DBI

The goal of my code is to return a count of the number of rows in a table based on one specific parameter.
Here is code that works:
######### SQL Commands
### Connect to the SQL Database
my $dbh = DBI->connect($data_source, $user, $pass)
or die "Can't connect to $data_source: $DBI::errstr";
### Prepare the SQL statement and execute
my $sth1 = $dbh->selectrow_array("select count(*) from TableInfo where Type = '2'")
or die "Can't connect to $data_source: $DBI::errstr";
### Disconnect from Database statements are completed.
$dbh->disconnect;
######### end SQL Commands
print $sth1;
This will successfully print a number which is 189 in this instance.
When I try to use the same code but change the "Type = '2'" (which should return a value of 2000) I get the following error:
DBD::ODBC::db selectrow_array failed: [Microsoft][ODBC Driver 11 for SQL Server]Numeric value out of range (SQL-22003) at ./getTotalExpSRP.cgi line 33, <DATA> line 225.
Can't connect to dbi:ODBC:BLTSS_SRP: [Microsoft][ODBC Driver 11 for SQL Server]Numeric value out of range (SQL-22003) at ./getTotalExpSRP.cgi line 33, <DATA> line 225.
I've searched everywhere but I cannot find out why this happens.
From the symptom of the issue I would guess that there is a limit to the size of the results returned but I cannot find any supporting evidence of this.
I have run a trace on my Microsoft SQL 2005 server and can confirm that the sql statement is being run properly with no errors.
I've viewed my odbc trace log but unfortunately I cannot derive any useful information when comparing a working example to the failed one.
Any help would be appreciated!!
Thanks,

Now we've seen the trace I can explain this. DBD::ODBC calls SQLDescribeCol and is told:
DescribeCol column = 1, name = , namelen = 0, type = unknown(0), precision/column size = 10, scale = 0, nullable = 1
display size = 11
Then it calls SQLColAttribute and is told the column size is 4. Since the column type was unknown (why the driver did that I'm not sure) DBD::ODBC decides to bind the column as a char(4) and so as soon as the count is > 3 digits it will overflow.
The version of DBI and DBD::ODBC being used here is really old and I suspect the latest versions will cope better with this.
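
The arithmetic behind that overflow can be spelled out in a few lines of Python. This is only an illustration, not DBD::ODBC's actual code. It assumes one byte of the 4-byte buffer is reserved for a string terminator, which is what the "> 3 digits" limit implies.

```python
def fits_in_char_buffer(value, buffer_size):
    """True if the decimal text of value plus a terminating NUL fits in buffer_size bytes."""
    text = str(value)
    return len(text) + 1 <= buffer_size


BUFFER_SIZE = 4  # column size reported by SQLColAttribute in the trace

small_count_fits = fits_in_char_buffer(189, BUFFER_SIZE)   # the count that worked
large_count_fits = fits_in_char_buffer(2000, BUFFER_SIZE)  # the count that failed

print(small_count_fits, large_count_fits)
```

Under that assumption 999 is the largest count that fits, which matches 189 working and 2000 failing.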

Numeric value out of range is a type conversion error. Is TYPE supposed to be a number or a character/string? If it should be a number, use a number:
my $sth1 = $dbh->selectrow_array(
"select count(*) from TableInfo where Type=2") # Type=2, not Type='2'
or use placeholders and let Perl and the database driver worry about type conversions
my $sth = $dbh->prepare("select count(*) from TableInfo where Type=?");
$sth->execute(2);
$sth->execute('2'); # same thing
my $st1 = $sth->fetchall_arrayref;
