How to use dbWriteTable on a connection with no default Database

I've seen many posts on SO and the DBI Github regarding trouble using DBI::dbWriteTable (i.e. [1], [2]). These mostly have to do with the use of non-default schemas or whatnot.
That's not my case.
I have a server running SQL Server 2014. This server contains multiple databases.
I'm developing a program which interacts with many of these databases at the same time. I therefore defined my connection using DBI::dbConnect() without a Database= argument.
I've so far only had to do SELECTs on the databases, and this connection works just fine with dbGetQuery(). I just need to name my tables with the database name included: DatabaseFoo.dbo.TableBar. That's more than fine, since it makes things transparent and intentional, and it stops me from being lazy and omitting the database name for whichever DB I happened to name in the connection.
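For illustration, a fully qualified query of the kind that already works on this database-less connection (a sketch reusing the question's placeholder names):
DBI::dbGetQuery(conn, "SELECT TOP 10 * FROM DatabaseFoo.dbo.TableBar")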
I now need to add data to a table, and I can't get it to work. A call to
DBI::dbWriteTable(conn, "DatabaseFoo.dbo.TableBar", myData, append = TRUE)
works, but creates a table named DatabaseFoo.dbo.TableBar in the master Database, which isn't what I meant (I didn't even know there was a master Database).
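(For reference, the stray table can be confirmed and dropped with plain SQL; the bracket quoting below is an illustrative addition, needed because the mis-created name itself contains dots, and it assumes the table landed in master's dbo schema:)
DBI::dbGetQuery(conn, "SELECT name FROM master.sys.tables")
DBI::dbExecute(conn, "DROP TABLE master.dbo.[DatabaseFoo.dbo.TableBar]")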
The DBI::dbWriteTable man page states the name should be
A character string specifying the unquoted DBMS table name, or the result of a call to dbQuoteIdentifier().
So I tried dbQuoteIdentifier() (and a few other variations):
DBI::dbWriteTable(conn,
                  DBI::dbQuoteIdentifier(conn, "DatabaseFoo.dbo.TableBar"),
                  myData)
# no error, same problem as above

DBI::dbWriteTable(conn,
                  DBI::dbQuoteIdentifier(conn, DBI::SQL("DatabaseFoo.dbo.TableBar")),
                  myData)
# Error: Can't unquote DatabaseFoo.dbo.TableBar

DBI::dbWriteTable(conn,
                  DBI::SQL("DatabaseFoo.dbo.TableBar"),
                  myData)
# Error: Can't unquote DatabaseFoo.dbo.TableBar

DBI::dbWriteTable(conn,
                  DBI::dbQuoteIdentifier(conn,
                                         DBI::Id(catalog = "DatabaseFoo",
                                                 schema = "dbo",
                                                 table = "TableBar")),
                  myData)
# Error: Can't unquote "DatabaseFoo"."dbo"."TableBar"

DBI::dbWriteTable(conn,
                  DBI::Id(catalog = "DatabaseFoo",
                          schema = "dbo",
                          table = "TableBar"),
                  myData)
# Error: Can't unquote "DatabaseFoo"."dbo"."TableBar"
In the DBI::Id() attempts, I also tried using cluster instead of catalog. No effect, identical error.
However, if I change my dbConnect() call to add a Database="DatabaseFoo" argument, I can simply use dbWriteTable(conn, "TableBar", myData) and it works.
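A minimal sketch of that working per-database connection (the odbc arguments here are illustrative placeholders, not the question's actual setup):
connFoo <- DBI::dbConnect(odbc::odbc(),
                          Driver = "SQL Server",
                          Server = "myServer",        # placeholder server name
                          Database = "DatabaseFoo",
                          Trusted_Connection = "Yes")
DBI::dbWriteTable(connFoo, "TableBar", myData, append = TRUE)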
So the question becomes, am I doing something wrong? Is this related to the problems in the other questions?

This is a shortcoming in the DBI package. The dev version (DBI >= 1.0.0.9002) no longer suffers from this problem; it will hit CRAN as DBI 1.1.0 soonish.
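If so, the Id() spelling from the question should work unchanged once the fix is installed; an untested sketch against the new version:
# with DBI >= 1.1.0 this is expected to write to DatabaseFoo.dbo.TableBar
DBI::dbWriteTable(conn,
                  DBI::Id(catalog = "DatabaseFoo",
                          schema = "dbo",
                          table = "TableBar"),
                  myData,
                  append = TRUE)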

Related

Fetching big SQL table in the web app session

I'm quite new to web apps, so apologies if my question is a bit basic. I'm developing a web app with R Shiny where the inputs are very large tables from an Azure SQL server: 20 tables, each on the order of hundreds of thousands of rows and hundreds of columns containing numbers, characters, etc. I have no problem calling them; my main issue is that fetching everything from the Azure SQL server takes so much time, approximately 20 minutes, so the user of the web app has to wait quite a long time.
I'm using the DBI package as follows:
db_connect <- function(database_config_name) {
  dbConfig <- config::get(database_config_name)
  connection <- DBI::dbConnect(odbc::odbc(),
                               Driver = dbConfig$driver,
                               Server = dbConfig$server,
                               UID = dbConfig$uid,
                               PWD = dbConfig$pwd,
                               Database = dbConfig$database,
                               encoding = "latin1")
  return(connection)
}
and then fetching tables by:
connection <- db_connect(db_config_name)
table <- dplyr::tbl(connection,
                    dbplyr::in_schema(fetch_schema_name(db_config_name, table_name, data_source_type),
                                      fetch_table_name(db_config_name, table_name, data_source_type)))
I searched a lot but didn't come across a good solution; I'd appreciate any solution that can tackle this problem.
I work with R accessing SQL Server (not Azure) daily. For larger data (as in your example), I always revert to using the command-line tool sqlcmd; it is significantly faster. The only pain points for me were learning the arguments and working around the fact that it does not return proper CSV, so some post-query munging is required. You may have the additional pain point of having to adjust my example to connect to your Azure instance (I do not have an account).
In order to use this in a shiny environment and preserve its interactivity, I use the processx package to start the process in the background and then poll its exit status periodically to determine when it has completed.
Up front: this is mostly a "loose guide"; I do not pretend that it is a fully-functional solution for you. There might be some rough edges that you need to work through yourself. For instance, while I say you can do it asynchronously, it is up to you to work the polling process and delayed data availability into your shiny application. My answer here covers starting the process and reading the file once complete. And finally, if encoding= is an issue for you, I don't know whether sqlcmd handles non-latin encodings correctly, nor if or how to fix that with its very limited, even antiquated, arguments.
Steps:
Save the query into a text file. Short queries can be provided on the command-line, but past some point (128 chars? I don't know that it's clearly defined, and have not looked enough recently) it just fails. Using a query-file is simple enough and always works, so I always use it.
I always use temporary files for each query instead of hard-coding the filename; this just makes sense. For convenience (for me), I use the same tempfile base name and append .sql for the query and .csv for the returned data, that way it's much easier to match query-to-data in the temp files. It's a convention I use, nothing more.
tf <- tempfile()
# using the same tempfile base name for both the query and csv-output temp files
querytf <- paste0(tf, ".sql")
writeLines(query, querytf)
csvtf <- paste0(tf, ".csv")
# these may be useful in troubleshooting, but not always [^2]
stdouttf <- paste0(tf, ".stdout")
stderrtf <- paste0(tf, ".stderr")
Make the call. I suggest you see how fast this is in a synchronous way first to see if you need to add an async query and polling in your shiny interface.
exe <- "/path/to/sqlcmd" # or "sqlcmd.exe"
args <- c("-W", "-b", "-s", "\037", "-i", querytf, "-o", csvtf,
          "-S", dbConfig$server, "-d", dbConfig$database,
          "-U", dbConfig$uid, "-P", dbConfig$pwd)
## as to why I use "\037", see [^1]
## note that the user id and password will be visible on the shiny server
## via a `ps -fax` command-line call
proc <- processx::process$new(command = exe, args = args,
                              stdout = stdouttf, stderr = stderrtf) # other args exist
# this should return immediately, and should be TRUE until
# data retrieval is done (or error)
proc$is_alive()
# this will hang (pause R) until retrieval is complete; if/when you
# shift to asynchronous queries, do not do this
proc$wait()
One can use processx::run instead of process$new and proc$wait(), but I thought I'd start you down this path in case you want/need to go asynchronous.
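For the synchronous route, the processx::run() equivalent is a sketch like this; error_on_status = FALSE keeps it from throwing so you can inspect the exit status yourself:
# note: run() captures stdout/stderr itself unless you pass stdout=/stderr= file paths
res <- processx::run(command = exe, args = args, error_on_status = FALSE)
res$status   # 0 on success, analogous to proc$get_exit_status() below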
If you go with an asynchronous operation, then periodically check (perhaps every 3 or 10 seconds) proc$is_alive(). Once that returns FALSE, you can start processing the file. During this time, shiny will continue to operate normally. (If you do not go async and therefore choose to proc$wait(), then shiny will hang until the query is complete.)
If you neither wait nor poll and just try to read the file immediately, that's a mistake. The file may not exist, in which case reading errs with No such file or directory; it may exist but be empty; or it may exist with incomplete data. So really, make a firm decision: stay synchronous and call proc$wait(), or go asynchronous and poll periodically until proc$is_alive() returns FALSE.
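A loose shiny sketch of that polling decision (assumes proc from above; the actual file parsing is shown in the next step):
query_done <- reactiveVal(FALSE)
observe({
  req(!query_done())
  if (proc$is_alive()) {
    invalidateLater(3000)   # re-check in 3 seconds; shiny stays responsive
  } else {
    query_done(TRUE)        # safe to check the exit status and read the csv now
  }
})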
Reading in the file. There are three "joys" of using sqlcmd that require special handling of the file.
It does not do embedded quotes consistently, which is why I chose to use "\037" as a separator. (See [^1].)
It adds a line of dashes under the column names, which will corrupt the auto-classing of data when R reads in the data. For this, we do a two-step read of the file.
Nulls in the database are the literal NULL string in the data. For this, we update the na.strings= argument when reading the file.
exitstat <- proc$get_exit_status()
if (exitstat == 0) {
  ## read #1: get the column headers
  tmp1 <- read.csv(csvtf, nrows = 2, sep = "\037", header = FALSE)
  colnms <- unlist(tmp1[1, ], use.names = FALSE)
  ## read #2: read the rest of the data
  out <- read.csv(csvtf, skip = 2, header = FALSE, sep = "\037",
                  na.strings = c("NA", "NULL"), quote = "")
  colnames(out) <- colnms
} else {
  # you should check both stdout and stderr files, see [^2]
  stop("'sqlcmd' exit status: ", exitstat)
}
Notes:
[^1]: After a lot of pain with several issues (some in sqlcmd.exe, some in data.table::fread and other readers, all dealing with CSV-format non-compliance), at one point I chose to stop working with comma-delimited returns, instead opting for the "\037" field delimiter. It works fine with all CSV-reading tools and has fixed so many problems (some not mentioned here). If you're not concerned, feel free to change the args to "-s", "," (adjusting the read as well).
[^2]: sqlcmd seems to use stdout or stderr in different ways when there are problems. I'm sure there's a rationale somewhere, but the point is: if there is a problem, check both files. I added the use of both stdout= and stderr= because of a lot of troubleshooting I did, and continue to do when I munge a query. Using them is not strictly required, but you might be throwing caution to the wind if you omit those options.
By the way, if you choose to only use sqlcmd for all of your queries, there is no need to create a connection object in R. That is, db_connect may not be necessary. In my use, I tend to use "real" R DBI connections for known-small queries and the bulk sqlcmd for anything above around 10K rows. There is a tradeoff; I have not measured it sufficiently in my environment to know where the tipping point is, and it is likely different in your case.

Enable Impala Impersonation on Superset

Is there a way to make the logged-in user (on Superset) the one who runs the queries on Impala?
I tried to enable the "Impersonate the logged on user" option on Databases, but with no success: all the queries still run on Impala as the superset user.
I'm trying to achieve the same! This will not completely answer the question, since it does not fully work yet, but I want to share my research in order to maybe help another soul trying to use this instrument outside very basic use cases.
I went deep into the code and found out that impersonation is not implemented for Impala, so you cannot achieve this from the UI. I found this PR https://github.com/apache/superset/pull/4699 that for whatever reason was never merged into the codebase, and tried to copy&paste its code into my Superset version (1.1.0), but it didn't work. By adding some logs I can see that the configuration with the impersonation is updated, but the actual Impala query still runs as the user I used to start the process.
As you can imagine, I am a complete noob at this. However, I found out that the impersonation happens when you create a cursor, and there is a constructor parameter in which you can pass the impersonation configuration.
I managed to correctly (at least to my understanding) implement impersonation for the SQL Lab part.
In sql_lab.py, you have to add the following lines to the execute_sql_statements method:
with closing(engine.raw_connection()) as conn:
    # closing the connection closes the cursor as well
    cursor = conn.cursor(**database.cursor_kwargs)
where cursor_kwargs is defined in db_engine_specs/impala.py as follows:
@classmethod
def get_configuration_for_impersonation(cls, uri, impersonate_user, username):
    logger.info(
        'Passing Impala execution_options.cursor_configuration for impersonation')
    return {'execution_options': {
        'cursor_configuration': {'impala.doas.user': username}}}

@classmethod
def get_cursor_configuration_for_impersonation(cls, uri, impersonate_user,
                                               username):
    logger.debug('Passing Impala cursor configuration for impersonation')
    return {'configuration': {'impala.doas.user': username}}
Finally, in models/core.py you have to add the following bit to the get_sqla_engine def:
params = extra.get("engine_params", {})  # already there; shown just so you can find the line
self.cursor_kwargs = self.db_engine_spec.get_cursor_configuration_for_impersonation(
    str(url), self.impersonate_user, effective_username)  # this is the line I added
...
params.update(self.get_encrypted_extra())  # already there
# new stuff
configuration = {}
configuration.update(
    self.db_engine_spec.get_configuration_for_impersonation(
        str(url),
        self.impersonate_user,
        effective_username))
if configuration:
    params.update(configuration)
As you can see, I just shamelessly pasted the code from the PR. However, as I said, this kind of works only for SQL Lab. The dashboards use an entirely different way of querying Impala that I have not yet figured out.
This means that queries for the dashboards are handled differently, and there isn't something like this:
with closing(engine.raw_connection()) as conn:
    # closing the connection closes the cursor as well
    cursor = conn.cursor(**database.cursor_kwargs)
My gut (and debugging) feeling is that you first need to understand the sqlalchemy part and extend a new ImpalaEngine class that uses a custom cursor with the impersonation conf. Or something like that; however, it is not as simple (if we want to call this simple) as the sql_lab part. So, the trick is to find out where the query is executed and create a cursor with the impersonation configuration. Easy, isn't it?
I hope this sheds some light for you and the others who have this issue. Let me know if you find another way to solve it, or if this comment was useful.
Update: something really useful
A colleague of mine successfully implemented impersonation with Impala without touching anything Superset-related, instead working directly with the impyla lib. A PR was opened with the code changes; you can apply the patch directly to the impyla source used by Superset. You have to edit both dbapi.py and hiveserver2.py.
As a reminder: we are still testing this, and we do not know if it works with different accounts using the same Superset instance.

Linked SQL table in Access 2003 (!) not updatable

I'm working in a legacy app for the moment, upgrading Access 2003 to link to SQL Server tables (2008 R2 or later). With tables linked by code, I can insert, but not update or delete. I've tried everything on the web, no dice. Details below.
Being terse so not tl;dr.
Tables first created using upsizing wizard. In use, app has to connect to different ones in same schema, so can't just set and forget. Can't do local DSN's, many installs, though DSN file is possible. But problems there too, DSN not found. Details later.
Before the rest: Soon I'm further updating this app to Access 2016 or so. If this is different enough / easier there, I'll wait a few days. Maybe someone could suggest the best refsite for that.
* problem details follow *
Using a DSN and the UI to link a table, I get an editable table. Hurray.
But when I use the code below (found on every refsite), the link is made but only selecting and inserting work. Everything else fails fails fails, no matter what.
Public Function LinkToSqlTable(sqlInstance As String, sqlDb As String, _
                               sqlTableName As String, localTableName As String)
    Dim linked As New TableDef
    ' ***factored-out functionality, known to work: reader can ignore*** '
    DeleteTable localTableName
    ' connection-string steps, placeholders replaced by args '
    Dim sCnx As String
    sCnx = "ODBC;Driver=SQL Server;Server=_instance_;" & _
           "Database=_db_;Integrated Security=SSPI"
    sCnx = Replace(sCnx, "_instance_", sqlInstance)
    sCnx = Replace(sCnx, "_db_", sqlDb)
    ' linked-table steps '
    Set linked = CurrentDb.CreateTableDef(localTableName)
    linked.Connect = sCnx
    linked.SourceTableName = sqlTableName
    CurrentDb.TableDefs.Append linked
    ' ui '
    RefreshDatabaseWindow
End Function
* ID column or permissions? *
Originally I thought the problem was the lack of an identity column. I added one, but no change. At least now I have a PK field like I should. ;-)
When I link the table manually, the UI demands to know the ID column. So could that still be it? Fine, but how do I set that in code? Searches revealed nothing.
I assume then it's permissions as sites etc. say. I also took all the steps I could think of to fix that. No dice.
* things I've tried *
Aside from the ID-column stuff I said before, these things (not in order):
Since DSN saved as a file, tried using it as exampled, in cnx string. Fail.
Used DSN contents, carefully winnowed & translated, in cnx string. Fail.
Used connection string from the table that I had connected manually with DSN. Fail.
Changed driver in cnx string across all major options, even omitted it. Fail.
Changed security in cnx to Integrated Security=SSPI and other options, and omitted entirely. Fail.
I added my actual local user as exampled, with and without password. Fail.
(Previous few options tried across earlier options, though not 100% coverage.)
In SQL Server, using SSMS, I tried security power:
Added a SQL-authentication login to the instance
Matching user to the default db seen here
Gave that login-user read and write permissions in db here (plus others, sometimes)
Added matching id & pw to the cnx string. Fail.
I tried temporarily setting up this db in SQL Server with let-everyone-do-everything "security". Fail.
This, that, and the other thing. Everything fails!!
So a permissions issue? Some way to use DSN file after all? Mismatched permission settings in my cnx string? Boneheaded oversight? Something else that I've missed? I'm pretty good at both SQL Server and Access, but only at a basic level in their security stuff and connection strings are the devil.
* retrieved table properties *
Just in case they help, I retrieved these (after objects added to TableDefs collection).
** This one, done in UI and with DSN and this-is-ID-field, worked with editing: **
Name = dbo_tblSendTo
Updatable = False
DateCreated = 4/19/2016 11:11:40 AM
LastUpdated = 4/19/2016 11:11:42 AM
Connect = ODBC;Description=SQL Server tables for TeleSales 5;DRIVER=SQL Server Native Client 10.0;SERVER=(local)\sqlexpress;Trusted_Connection=Yes;APP=Microsoft Office 2003;WSID=CMSERVER;DATABASE=TS5_General;
Attributes = 536870912
SourceTableName = dbo.tblSendTo
RecordCount = -1
ValidationRule =
ValidationText =
ConflictTable =
ReplicaFilter =
** And this one, from table linked via code, didn't: **
Name = tblSendTo
Updatable = False
DateCreated = 4/19/2016 11:17:51 AM
LastUpdated = 4/19/2016 11:17:51 AM
Connect = ODBC;Description=SQL Server tables for TeleSales 5;DRIVER=SQL Server Native Client 10.0;SERVER=(local)\sqlexpress;Trusted_Connection=Yes;APP=Microsoft Office 2003;WSID=CMSERVER;DATABASE=TS5_General;
Attributes = 536870912
SourceTableName = dbo.tblSendTo
RecordCount = -1
ValidationRule =
ValidationText =
ConflictTable =
ReplicaFilter =
* my plea *
So..... Please someone help me out. I don't like feeling stupid like this, and regrettably I need to do this instead of replacing it with .NET code or similar.
Thanks, anyone who can...
Ed.
Alas, I am able to answer my own question.
edited a little since first posted in reply to HansUp's comments
I had added an identity column to the table that I couldn't edit. However, I had not set it up as a primary key. It turns out that making a column an identity doesn't automatically make it a primary key.
But the latter, making it a primary key using either of the two possible DDL syntaxes, is crucial. Since I thought I had already dealt with the "no edits without a unique key" problem, I focused on permissions.
All of the permissions things here, then, are just a sideshow.
The upshot of this is to be sure to add an identity column and make it a primary key if for some reason your original table schema didn't have that.
If I have the time, I will be trimming the question to reflect what I've discovered.

RODBC package: How to get a logical value for the "Does the Table Exists?" query type?

I am trying to convert an R/Shiny/SQL application to use data from SQL Server instead of Oracle. The original code contains many conditions of the following type: if the table exists, use it as a data set; otherwise, upload new data. I was looking for a counterpart of the dbExistsTable command from the DBI/ROracle packages, but odbcTableExists is unfortunately an internal RODBC function, not usable from the R environment. A wrapper for the RODBC package that allows DBI-style commands, RODBCDBI, also seems not to work. Any ideas?
Here is some code example:
library(RODBC)
library(RODBCDBI)
con <- odbcDriverConnect('driver={SQL Server};server=xx.xx.xx.xxx;database=test;uid=user;pwd=pass123')
odbcTableExists(con, "table")
Error: could not find function "odbcTableExists"
dbExistsTable(con,"table")
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘dbExistsTable’ for signature ‘"RODBC", "character"’
You could use
[Table] %in% sqlTables(conn)$TABLE_NAME
where [Table] is a character string naming the table you are looking for.
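Wrapped up as a small helper to mirror the DBI-style check (a sketch; sqlTables(), sqlFetch(), and sqlSave() are standard RODBC functions, and myData stands in for whatever new data you would upload):
tableExists <- function(channel, tbl) {
  tbl %in% RODBC::sqlTables(channel)$TABLE_NAME
}

# the "use the table if it exists, otherwise upload new data" pattern from the question
if (tableExists(con, "table")) {
  dat <- RODBC::sqlFetch(con, "table")
} else {
  RODBC::sqlSave(con, myData, tablename = "table", rownames = FALSE)
}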

DBD::Oracle, Cursors and Environment under mod_perl

Need some help, because I can't find any solution for my problems with DBD::Oracle.
So at first, this is the current situation:
We are running Apache2 with mod_perl 2.0.4 at our company
The Apache web server was set up with a startup script that sets some environment variables (LD_LIBRARY_PATH, ORACLE_HOME, NLS_LANG)
In httpd.conf there are also environment variables for LD_LIBRARY_PATH and ORACLE_HOME (via SetEnv)
We are generally using the perl module DBI with driver DBD::Oracle to connect to our main database
Before we create a new instance of DBI, we also set some Perl environment variables via %ENV: ORACLE_HOME and NLS_LANG.
So far, this works fine. But now we are extending our system and need to connect to a remote database. Again, we are using DBI and DBD::Oracle. But for now there are some new conditions:
New connection must run in parallel to the existing one
TNSNAMES.ORA for the new connection is placed at a different location (not at $ORACLE_HOME.'/network/admin')
New database contents are provided by stored procedures, which we are fetching with DBD::Oracle and cursors (like explained here: https://metacpan.org/pod/DBD::Oracle#Binding-Cursors)
The stored procedures return object types and collection types containing attributes of Oracle type DATE
To get these dates in a readable format, we set a new env variable $ENV{NLS_DATE_FORMAT}
To ensure the date format we additionally alter the session by alter session set nls_date_format ...
Okay, this works fine, too, but only if we make a new connection from the console. The new TNS location is found by the script, the connection can be established, and fetching data from the procedures by cursor also works. All DATE types are formatted as specified.
Now, if we try to make this connection in the Apache environment, it fails. First, the data source name cannot be resolved by DBI/DBD::Oracle. I think this is because our new TNSNAMES.ORA file, or rather its location (published via $ENV{TNS_ADMIN}), is not found by DBI/DBD::Oracle in the Apache context. But I don't know why.
The second problem is (if I create a dirty workaround for the first one) that the date format published by $ENV{NLS_DATE_FORMAT} only works at the first level of our cursor select.
BEGIN OPEN :cursor FOR SELECT * FROM TABLE(stored_procedure); END;
The example above returns collection types of objects containing date attributes. In the Apache context, the format published by NLS_DATE_FORMAT is not recognized. If I use a simple form of the example like this
BEGIN OPEN :cursor FOR SELECT SYSDATE FROM TABLE(stored_procedure); END;
the result (a single date field) is formatted correctly. So I think the subordinate structures are not formatted because $ENV{NLS_DATE_FORMAT} works only in the console context and not in the Apache context.
So there must be a problem with the Perl environment variables (%ENV) under Apache and mod_perl. Maybe a problem with mod_perl?
I am at my wit's end. Maybe someone in the whole wide world has a solution ... and excuse my English :-) If you need further explanation, I will try to be more precise.
If your problem is that changes to %ENV made while processing a request don't seem to be honoured, this is because mod_perl assumes you might be running multiple threads and doesn't actually change the process environment when you change %ENV, so external libraries (like the Oracle client) and child processes don't see the change.
You can work around it by first using the prefork MPM, so there aren't any threading issues, and then making changes to the environment using Env::C instead of the %ENV hash.