RODBC package: How to get a logical value for a "Does the table exist?" type of query?

I am trying to convert an R/Shiny/SQL application to use data from SQL Server instead of Oracle. The original code contains many conditions of the following type: if the table exists, use it as a data set, otherwise upload new data. I was looking for a counterpart of the dbExistsTable command from the DBI/ROracle packages, but odbcTableExists is unfortunately an internal RODBC function that is not exported to the R environment. RODBCDBI, a wrapper for the RODBC package that allows DBI-style commands, also does not seem to work. Any ideas?
Here is some code example:
library(RODBC)
library(RODBCDBI)
con <- odbcDriverConnect('driver={SQL Server};server=xx.xx.xx.xxx;database=test;uid=user;pwd=pass123')
odbcTableExists(con, "table")
Error: could not find function "odbcTableExists"
dbExistsTable(con,"table")
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘dbExistsTable’ for signature ‘"RODBC", "character"’

You could use
[Table] %in% sqlTables(conn)$TABLE_NAME
where [Table] is a character string giving the name of the table you are looking for.
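Wrapped up as a small helper, a minimal sketch of the "use it if it exists, otherwise upload" pattern might look like this; the helper tableExists, the table name "my_table" and the data frame newData are hypothetical placeholders, not part of the original question:
library(RODBC)

# Returns TRUE if the table is visible through the connection
tableExists <- function(conn, table) {
  table %in% sqlTables(conn)$TABLE_NAME
}

con <- odbcDriverConnect('driver={SQL Server};server=xx.xx.xx.xxx;database=test;uid=user;pwd=pass123')

if (tableExists(con, "my_table")) {
  dat <- sqlFetch(con, "my_table")                # use the existing table as the data set
} else {
  sqlSave(con, newData, tablename = "my_table")   # upload new data
}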

Related

How to use dbWriteTable on a connection with no default Database

I've seen many posts on SO and the DBI GitHub regarding trouble using DBI::dbWriteTable (e.g. [1], [2]). These mostly have to do with the use of non-default schemas and the like.
That's not my case.
I have a server running SQL Server 2014. This server contains multiple databases.
I'm developing a program which interacts with many of these databases at the same time. I therefore defined my connection using DBI::dbConnect() without a Database= argument.
I've so far only had to do SELECTs on the databases, and this connection works just fine with dbGetQuery(). I just need to name my tables including the database name, DatabaseFoo.dbo.TableBar, which is more than fine, since it makes things transparent and intentional. It also stops me from being lazy and omitting the database name in calls against whichever DB I happened to name in the connection.
I now need to add data to a table, and I can't get it to work. A call to
DBI::dbWriteTable(conn, "DatabaseFoo.dbo.TableBar", myData, append = TRUE)
works, but creates a table named DatabaseFoo.dbo.TableBar in the master Database, which isn't what I meant (I didn't even know there was a master Database).
The DBI::dbWriteTable man page states the name should be
A character string specifying the unquoted DBMS table name, or the result of a call to dbQuoteIdentifier().
So I tried dbQuoteIdentifier() (and a few other variations):
DBI::dbWriteTable(conn,
                  DBI::dbQuoteIdentifier(conn, "DatabaseFoo.dbo.TableBar"),
                  myData)
# no error, same problem as above

DBI::dbWriteTable(conn,
                  DBI::dbQuoteIdentifier(conn, DBI::SQL("DatabaseFoo.dbo.TableBar")),
                  myData)
# Error: Can't unquote DatabaseFoo.dbo.TableBar

DBI::dbWriteTable(conn,
                  DBI::SQL("DatabaseFoo.dbo.TableBar"),
                  myData)
# Error: Can't unquote DatabaseFoo.dbo.TableBar

DBI::dbWriteTable(conn,
                  DBI::dbQuoteIdentifier(conn,
                                         DBI::Id(catalog = "DatabaseFoo",
                                                 schema = "dbo",
                                                 table = "TableBar")),
                  myData)
# Error: Can't unquote "DatabaseFoo"."dbo"."TableBar"

DBI::dbWriteTable(conn,
                  DBI::Id(catalog = "DatabaseFoo",
                          schema = "dbo",
                          table = "TableBar"),
                  myData)
# Error: Can't unquote "DatabaseFoo"."dbo"."TableBar"
In the DBI::Id() attempts, I also tried using cluster instead of catalog. No effect, identical error.
However, if I change my dbConnect() call to add a Database="DatabaseFoo" argument, I can simply use dbWriteTable(conn, "TableBar", myData) and it works.
So the question becomes, am I doing something wrong? Is this related to the problems in the other questions?
This is a shortcoming in the DBI package. The dev version (DBI >= 1.0.0.9002) no longer suffers from this problem and will hit CRAN as DBI 1.1.0 soonish.
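Once a fixed DBI (1.1.0 or later) is installed, a sketch of the intended call might look like the following; the connection details and object names are the made-up ones from the question, and odbc is assumed as the backend only for illustration:
library(DBI)

# Hypothetical connection with no default Database, as in the question
conn <- dbConnect(odbc::odbc(),
                  Driver = "SQL Server",
                  Server = "xx.xx.xx.xxx",
                  UID    = "user",
                  PWD    = "pass123")

# With DBI >= 1.1.0 the fully qualified identifier is passed through as intended
dbWriteTable(conn,
             Id(catalog = "DatabaseFoo", schema = "dbo", table = "TableBar"),
             myData,
             append = TRUE)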

Using Jinja template variables with BigQueryOperator in Airflow

I'm attempting to use the BigQueryOperator in Airflow by using a variable to populate the sql= attribute. The problem I'm running into is that the file extension is dropped when using Jinja variables. I've setup my code as follows:
dag = DAG(
    dag_id='data_ingest_dag',
    template_searchpath=['/home/airflow/gcs/dags/sql/'],
    default_args=DEFAULT_DAG_ARGS
)

bigquery_transform = BigQueryOperator(
    task_id='bq-transform',
    write_disposition='WRITE_TRUNCATE',
    sql="{{dag_run.conf['sql_script']}}",
    destination_dataset_table='{{dag_run.conf["destination_dataset_table"]}}',
    dag=dag
)
The passed variable contains the name of a SQL file stored in the separate SQL directory. If I pass the value as a static string, sql="example_file.sql", everything works fine. However, when I pass example_file.sql via the Jinja template variable, the file extension is apparently dropped and I receive this error:
BigQuery job failed.
Final error was: {u'reason': u'invalidQuery', u'message': u'Syntax error: Unexpected identifier "example_file" at [1:1]', u'location': u'query'}
Additionally, I've tried hardcoding ".sql" onto the end of the variable, anticipating that the extension would be dropped. However, this causes the entire variable reference to be interpreted as a string.
How do you use variables to populate BigQueryOperator attributes?
Reading the BigQueryOperator docstring, it seems that you can provide the SQL statement in two ways:
1. As a string that can contain templating macros
2. As a reference to a file that can contain templating macros (the file contents, not the file name)
You can template the SQL statement, but not the file name. In fact, your error message shows that BigQuery did not recognize the identifier "example_file": if you inspect the BigQuery history for the project which ran that query, you will see that the query string was "example_file.sql", which is not a valid SQL statement, hence the error.
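For illustration, a minimal sketch of the working pattern (static file name, templating inside the file); the import path assumes an Airflow 1.x contrib layout, and example_file.sql and DEFAULT_DAG_ARGS are reused from the question as placeholders:
from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

dag = DAG(
    dag_id='data_ingest_dag',
    template_searchpath=['/home/airflow/gcs/dags/sql/'],
    default_args=DEFAULT_DAG_ARGS,
)

# The file name itself is static; Airflow resolves it against template_searchpath
# and renders any Jinja macros found inside example_file.sql before running it.
bigquery_transform = BigQueryOperator(
    task_id='bq-transform',
    write_disposition='WRITE_TRUNCATE',
    sql='example_file.sql',
    destination_dataset_table='{{dag_run.conf["destination_dataset_table"]}}',
    dag=dag,
)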

Apache Spark JDBC connection to Apache Drill error

I am sending a query to Apache Drill from Apache Spark, and I am getting the following error:
java.sql.SQLException: Failed to create prepared statement: PARSE ERROR: Encountered "\"" at line 1, column 23.
When I traced it, I found that I need to write a custom SQL dialect. The problem is that I cannot find any examples for pyspark; all the examples are for Scala or Java. Any help is highly appreciated!
Here is the pyspark code :
dataframe_mysql = spark.read.format("jdbc") \
    .option("url", "jdbc:drill:zk=ip:2181;schema=dfs") \
    .option("driver", "org.apache.drill.jdbc.Driver") \
    .option("dbtable", "dfs.`/user/titanic_data/test.csv`") \
    .load()
It looks like you have used a double quote in your SQL query (please share your SQL).
By default Drill uses backticks (`) for quoting identifiers.
But you can change this by setting the system/session option (when you are already connected to Drill via JDBC, for example) or by specifying it in the JDBC connection string. You can find more information here:
https://drill.apache.org/docs/lexical-structure/#identifier-quotes
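For example, from an existing Drill connection the quoting character can be switched with a session option; this sketch just uses the option name from the Drill docs linked above:
ALTER SESSION SET `planner.parser.quoting_identifiers` = '"';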
I navigated to the Drill Web UI and updated the planner.parser.quoting_identifiers parameter to ". Then I edited my query as below:
dataframe_mysql = spark.read.format("jdbc") \
    .option("url", "jdbc:drill:zk=ip:2181;schema=dfs;") \
    .option("driver", "org.apache.drill.jdbc.Driver") \
    .option("dbtable", "dfs.\"/user/titanic_data/test.csv\"") \
    .load()
And it worked like a charm!

How to use insert_job

I want to run a BigQuery SQL query using the insert method.
I ran the following code:
JobConfigurationQuery = Google::Apis::BigqueryV2::JobConfigurationQuery
bq = Google::Apis::BigqueryV2::BigqueryService.new
scopes = [Google::Apis::BigqueryV2::AUTH_BIGQUERY]
bq.authorization = Google::Auth.get_application_default(scopes)
bq.authorization.fetch_access_token!
query_config = {query: "select colA from [dataset.table]"}
qr = JobConfigurationQuery.new(configuration:{query: query_config})
bq.insert_job(projectId, qr)
and I got an error as below:
Caught error invalid: Job configuration must contain exactly one job-specific configuration object (e.g., query, load, extract, spreadsheetExtract), but there were 0:
Please let me know how to use the insert_job method.
I'm not sure what client library you're using, but insert_job probably takes a JobConfiguration. You should create one of those and set its query parameter to the JobConfigurationQuery you've created.
This is necessary because you can insert various jobs (load, copy, extract) with different types of configurations through this one API method, and they all take a single configuration object with a subfield that specifies which type of job to insert and its details.
More info from BigQuery's documentation:
jobs.insert documentation
job resource: note the "configuration" field and its "query" subfield
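Using the classes already shown in the question, a hedged sketch of that wrapping might look like the following; it assumes the google-api-client constructors accept keyword arguments as in the question's snippet, and projectId is the same placeholder used there:
job = Google::Apis::BigqueryV2::Job.new(
  configuration: Google::Apis::BigqueryV2::JobConfiguration.new(
    query: Google::Apis::BigqueryV2::JobConfigurationQuery.new(
      query: "select colA from [dataset.table]"
    )
  )
)
# The Job now contains exactly one job-specific configuration object (query)
bq.insert_job(projectId, job)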

See the SQL commands generated by EntityFramework: Cast exception

Based on this and this, I'm doing the following to get the SQL generated by Entity Framework 5.0:
var query = from s in db.ClassesDetails
            where s.ClassSet == "SetOne"
            orderby s.ClassNum
            select s.ClassNum;

var objectQuery = (System.Data.Objects.ObjectQuery)query; // <= problem!
var sql = objectQuery.ToTraceString();
However on the second line I get the following exception:
Unable to cast object of type 'System.Data.Entity.Infrastructure.DbQuery`1[System.Int16]' to type 'System.Data.Objects.ObjectQuery'.
Did something change since those SO answers were posted? What do I need to do to get the queries as strings? We're running against Azure SQL, so we can't run the usual SQL profiler tools :(
ObjectQuery is created when you are using ObjectContext. When you are using DbContext, it creates and uses DbQuery instead. Also note that it is actually not DbQuery but DbQuery<T>. To display the SQL when you have a DbQuery, you can just call .ToString() on the DbQuery instance, so no cast should be required. Note that parameter values will not be displayed, though. Parameter values were added to the output only recently, in EF6 - if you need them you can try the latest nightly build from http://entityframework.codeplex.com
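Applied to the query from the question, a minimal sketch of that approach (no cast required; parameter values will not appear in the output):
var query = from s in db.ClassesDetails
            where s.ClassSet == "SetOne"
            orderby s.ClassNum
            select s.ClassNum;

// ToString() on the DbQuery<T> returns the SQL that EF will send to the server
var sql = query.ToString();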