Goal: run SQL queries against a data frame from R.
Approach used: dbWriteTable, to write the data frame to the database as a table that I could then query with SQL and join to other tables already in the DB.
Issue: dbWriteTable seems to execute successfully, but the table does not appear to actually exist in the DB, and errors are thrown when attempting to query it. Details below:
Data frame name: testing_df (a one-column data frame)
channel <- DBI::dbConnect(odbc::odbc(), "data_source_name", uid="user_name", pwd='password')
dbGetQuery(channel,"use role role_name;")
dbGetQuery(channel,"use warehouse warehouse_name;")
dbGetQuery(channel,"use schema schema_name;")
dbGetQuery(channel,"use database db_name;")
table_name <- Id(database = "database_name", schema = "schema_name", table = "table_testing")
dbWriteTable(conn = channel,
             name = table_name,
             value = testing_df,
             overwrite = TRUE)
dbReadTable(channel, name = table_name)
dbExistsTable(channel, name = table_name)
dbReadTable returns the expected data frame.
dbExistsTable provides the following output:
> dbExistsTable(channel,name=table_name)
[1] TRUE
Issue: The table cannot be located in the actual database UI, and when running the following in R:
desired_output <- dbGetQuery(channel, "select * from database_name.schema_name.table_testing;")
desired_output
I get the following error:
SQL compilation error: Object 'table_testing' does not exist or not authorized.
I am able to check in the database and see that the table actually does not exist.
Question: Does anyone know whether dbWriteTable is actually supposed to write the table to the database, or am I misunderstanding the purpose of dbWriteTable? Are there better ways to approach this task?
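One quick way to see where dbWriteTable actually put the table is to query the catalog directly; a minimal sketch, assuming a Snowflake-style information_schema and the placeholder names above:

# look for the table anywhere in the database, regardless of schema
dbGetQuery(channel, "select table_catalog, table_schema, table_name
                     from database_name.information_schema.tables
                     where table_name ilike 'table_testing'")

If this returns a row under an unexpected schema or with unexpected casing, that points at the mismatch: names written via DBI::Id() are typically quoted and therefore case-sensitive, which can leave the table invisible to an unquoted select * from ... query.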
I have a question similar to this Stack Overflow post.
How can I create a persistent table from a SQL query in a database (I use a DB2 database)? My goal is to use a table from one schema and to permanently create a more or less modified table in another schema.
What works so far is to pull the data to R and subsequently create a table in a different schema:
dplyr::tbl(con, in_schema("SCHEMA_A", "TABLE")) %>%
  collect() %>%
  DBI::dbWriteTable(con, Id(schema = "SCHEMA_B", table = "NEW_TABLE"), ., overwrite = TRUE)
However, I'd like to incorporate the compute() function in a dplyr pipeline so that I do not have to pull the data into R; that is, I'd like to keep the data on the database. As a side note: I do not know how I would substitute dplyr's copy_to() for DBI's dbWriteTable(); being able to do that would also help me.
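For reference, a hedged sketch of what that copy_to() substitution might look like (untested; it still pulls the data into R first, and it assumes your dbplyr version accepts in_schema() names here):

local_df <- dplyr::tbl(con, in_schema("SCHEMA_A", "TABLE")) %>% collect()
dplyr::copy_to(con, local_df,
               name = in_schema("SCHEMA_B", "NEW_TABLE"),
               temporary = FALSE, overwrite = TRUE)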
Unfortunately, I am not able to make it work, even after reading ?compute() and its online documentation. The following code framework does not work and results in an error:
dplyr::tbl(con, in_schema("SCHEMA_A", "TABLE")) %>%
  dplyr::compute(in_schema("SCHEMA_B", "NEW_TABLE"), analyze = FALSE, temporary = FALSE)
Is there a solution for using compute() or some other solution applicable to a dplyr pipeline?
I use a custom function that takes the SQL query behind a remote table, converts it into a query that can be executed on the SQL server to save a new table, and then executes that query using the DBI package. Key details are below; full details (and other functions I find useful) are in my GitHub repository here.
write_to_database <- function(input_tbl, db_connection, db, schema, tbl_name){
  # build the SQL query from the rendered dplyr SQL
  sql_query <- glue::glue("SELECT *\n",
                          "INTO {db}.{schema}.{tbl_name}\n",
                          "FROM (\n",
                          dbplyr::sql_render(input_tbl),
                          "\n) AS from_table")
  # run query
  DBI::dbExecute(db_connection, as.character(sql_query))
}
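As a usage sketch (the schema, table, and database names are placeholders; con is assumed to be an open DBI connection):

remote_tbl <- dplyr::tbl(con, dbplyr::in_schema("SCHEMA_A", "TABLE")) %>%
  dplyr::filter(VARIABLE == "ABC")
write_to_database(remote_tbl, con, "MY_DB", "SCHEMA_B", "NEW_TABLE")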
The essence of the idea is to construct an SQL query that, if executed directly in your database, would give you the desired outcome. In my application this takes the form:
SELECT *
INTO db.schema.table
FROM (
/* sub query for existing table */
) AS alias
Note that this uses SQL Server, and your particular SQL syntax might differ. INTO is the SQL Server pattern for writing a table. In the example linked to in the question, the syntax is TO TABLE.
Thanks to @Simon.S.A., I could solve my problem. As he showed in his reply, one can define a custom function and incorporate it in a dplyr pipeline. My adapted code looks like this:
# Custom function
write_to_database <- function(input_tbl, db_connection, schema, tbl_name){
  # Build the CREATE TABLE query from the rendered dplyr SQL
  sql_query <- glue::glue("CREATE TABLE {schema}.{tbl_name} AS (\n",
                          "SELECT * FROM (\n",
                          dbplyr::sql_render(input_tbl),
                          "\n)) WITH DATA;")
  # Drop the table if it already exists (DB2 has no DROP TABLE IF EXISTS)
  DBI::dbExecute(db_connection, glue::glue("BEGIN\n",
                                           "IF EXISTS\n",
                                           "(SELECT TABNAME FROM SYSCAT.TABLES WHERE TABSCHEMA = '{schema}' AND TABNAME = '{tbl_name}') THEN\n",
                                           "PREPARE stmt FROM 'DROP TABLE {schema}.{tbl_name}';\n",
                                           "EXECUTE stmt;\n",
                                           "END IF;\n",
                                           "END"))
  # Run the query
  DBI::dbExecute(db_connection, as.character(sql_query))
}
# Dplyr pipeline
dplyr::tbl(con, in_schema("SCHEMA_A", "SOURCE_TABLE_NAME")) %>%
  dplyr::filter(VARIABLE == "ABC") %>%
  show_query() %>%
  write_to_database(., con, "SCHEMA_B", "NEW_TABLE_NAME")
It turns out that DB2 does not support DROP TABLE IF EXISTS, so some additional programming is necessary; I used this Stack Overflow post to get it done. Furthermore, in my case I do not need to specify the database explicitly, so the db parameter of the custom function is left out.
I am working on an IronPython script that pulls some new data from an Oracle database using DatabaseDataSource. I can get the script to pull the data just fine if I simply overwrite or append to a table as the output. However, I would like to handle the output of one query in my script and then use those results to generate another query. Does anybody know how I can do this?
Here's some abbreviated code for what I'm doing now for the first query:
PROVIDER = "System.Data.OracleClient"
DATASOURCE = "Data Source=(DESCRIPTION=(ADDRESS=(COMMUNITY=TCP)(PROTOCOL = TCP) (HOST=host)(PORT=1521))(CONNECT_DATA=(SID=sid))); UserId=userid;Password=password"
SQL = "SELECT PARM FROM PARAMETERS WHERE ..."
dbsettings = DatabaseDataSourceSettings(PROVIDER, DATASOURCE, SQL)
ds = DatabaseDataSource(dbsettings)
outputTable.ReplaceData(ds)
This works, but it just replaces outputTable with the data from the query, obviously. What I'd like to do is read the data I get back in the form of an array. Then, based on the results, I will generate another query or set of queries, and eventually merge the data from several of these into one table.
I am using R v3.0.3 (32-bit) and Access 2013. I have created a database table in Access with two tuples and wish to be able to query the database through R. The SQL to do this is trivial, but the necessary parameters are contained within an XML document.
Is there a way in R to use XML values from a DOM tree as direct input into an SQL query? I've read posts on here which show it can be done in SQL Server with statements such as SELECT .value(...).
Thanks in advance
Consider using the XML package to extract the XML into a data frame, then use the various columns as parameters in the needed MS Access query:
library(RODBC)
library(XML)
# XML IMPORT
doc <- xmlParse("C:\\Path\\To\\XMLFile.xml")
xmldf <- xmlToDataFrame(nodes = getNodeSet(doc, "//MainNodeElement"))
# MS ACCESS CONNECTION
conn <- odbcDriverConnect('driver={Microsoft Access Driver (*.mdb, *.accdb)};
                           DBQ=C:\\Path\\To\\AccessDatabase.accdb')
# LOOP THROUGH ROWS OF XML DF (R indexes from 1)
for (i in seq_len(nrow(xmldf))) {
  # MODIFY SELECT OR ACTION QUERY TO MEET NEEDS, EVEN ADD PARAMS
  querydf <- sqlQuery(conn, paste0("select * from table1 where param='",
                                   xmldf$colname[i], "'"))
}
close(conn)
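Note that the loop above overwrites querydf on each pass; if you need the result for every row, a hedged variant collects them into a list first (same placeholder table and column names; run it before close(conn)):

results <- lapply(seq_len(nrow(xmldf)), function(i) {
  sqlQuery(conn, paste0("select * from table1 where param='",
                        xmldf$colname[i], "'"))
})
# stack the per-row results into one data frame
querydf <- do.call(rbind, results)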
I'm having trouble creating a table using RODBC's sqlSave (or, more accurately, writing data to the created table).
This is different from the existing sqlSave questions/answers because the problems they were experiencing were different (I can create tables, whereas they could not), I've already incorporated their solutions unsuccessfully (such as closing and reopening the connection before running sqlSave), and the error message is different, the only exception being a post that differed in the two ways above.
I'm using MS SQL Server 2008 and 64-bit R on a Windows RDP.
I have a simple data frame with only 1 column full of 3, 4, or 5-digit integers.
> head(df)
colname
1 564
2 4336
3 24810
4 26206
5 26433
6 26553
When I try to use sqlSave, no data is written to the table. Additionally, the error message makes it sound like the table can't be created, even though the table does in fact get created, with 0 rows.
Based on a suggestion I found, I've tried closing and re-opening the RODBC connection right before running sqlSave. Even though I use append = TRUE, I've also tried dropping the table beforehand, but neither affects anything.
> sqlSave(db3, df, table = "[Jason].[dbo].[df]", append = TRUE, rownames = FALSE)
Error in sqlSave(db3, df, table = "[Jason].[dbo].[df]", :
42S01 2714 [Microsoft][ODBC SQL Server Driver][SQL Server]There is already
an object named 'df' in the database.
[RODBC] ERROR: Could not SQLExecDirect 'CREATE TABLE [Jason].[dbo].[df]
("df" int)'
I've also tried using sqlUpdate() on the table once it's been created. It doesn't matter whether I create it in R or in SQL Server Management Studio; I get the error "table not found on channel".
Finally, note that I have also tried this without append = TRUE and when creating a new table, as well as with and without the rownames option.
Mr.Flick from Freenode's #R had me check if I could read in the empty table using sqlQuery and indeed, I can.
Update
I've gotten a bit closer with the following steps:
1. I created an ODBC connection that goes directly to my database within the SQL Server, instead of going to the default (Master) DB and then specifying the path to the table in the table = or tablename = arguments.
2. I created the table in SQL Server Management Studio as follows:
GO
CREATE TABLE [dbo].[testing123](
    [Person_DIMKey] [int] NULL
) ON [PRIMARY]
GO
3. In R I used sqlUpdate() with my new ODBC connection and no brackets around the table name.
4. Now sqlUpdate() sees the table; however, it complains that it needs a unique column.
5. Indicating that the only column in the table is the unique column, with index = colname, results in an error saying that the column does not exist.
6. I dropped and recreated the table specifying a primary key:
GO
CREATE TABLE [dbo].[jive_BNR_Person_DIMKey](
    [jive_BNR_Person_DIMKey] [int] NOT NULL PRIMARY KEY
) ON [PRIMARY]
GO
which generated both a primary key and an index (according to the GUI of SQL Server Management Studio) named PK__jive_BNR__2754EC2E30F848ED.
7. I specified this index/key as the unique column in sqlUpdate(), but I get the following error:
Error in sqlUpdate(db4, jive_BNR_Person_DIMKey, tablename = "jive_BNR_Person_DIMKey", :
  index column(s) PK__jive_BNR__2754EC2E30F848ED not in database table
For the record, I was specifying the correct column name (not "colname") for index; thanks to MrFlick for requesting clarification.
After hours of working on this, I was finally able to get sqlSave to work while specifying the table name. Deep breath; where to start. Here is the list of things I did to get this to work:
1. Open the 32-bit ODBC Administrator and create a User DSN configured for your specific database. In my case, I am creating a global temp table, so I linked to tempdb. Use this connection name in your odbcConnect() call; here is mine: myconn2 <- odbcConnect("SYSTEMDB").
2. Then I defined my data types with the following code: columnTypes <- list(Record = "VARCHAR(10)", Case_Number = "VARCHAR(15)", Claim_Type = "VARCHAR(15)", Block_Date = "datetime", Claim_Processed_Date = "datetime", Status = "VARCHAR(100)").
3. I then updated my data frame column classes using as.character and as.Date to match the data types listed above.
4. Because I had already created the table while working on this, I had to drop it with sqlDrop(myconn2, "##R_Claims_Data").
5. I then ran: sqlSave(myconn2, MainClmDF2, tablename = "##R_Claims_Data", verbose = TRUE, rownames = FALSE, varTypes = columnTypes)
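Putting those five steps together, a minimal consolidated sketch (the class conversions assume the columns named in columnTypes; adjust them to your data):

library(RODBC)
myconn2 <- odbcConnect("SYSTEMDB")
columnTypes <- list(Record = "VARCHAR(10)", Case_Number = "VARCHAR(15)",
                    Claim_Type = "VARCHAR(15)", Block_Date = "datetime",
                    Claim_Processed_Date = "datetime", Status = "VARCHAR(100)")
# match the R classes to the declared SQL types
MainClmDF2$Record <- as.character(MainClmDF2$Record)  # likewise for the other VARCHAR columns
MainClmDF2$Block_Date <- as.Date(MainClmDF2$Block_Date)
MainClmDF2$Claim_Processed_Date <- as.Date(MainClmDF2$Claim_Processed_Date)
# drop any leftover copy, then write the table
sqlDrop(myconn2, "##R_Claims_Data")
sqlSave(myconn2, MainClmDF2, tablename = "##R_Claims_Data",
        verbose = TRUE, rownames = FALSE, varTypes = columnTypes)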
Then my head fell off because it worked! I really hope this helps someone going forward. Here are the links that helped me get to this point:
Table not found
sqlSave in R
RODBC
After re-reading the RODBC vignette, here's the simple solution that worked:
sqlDrop(db, "df", errors = FALSE)
sqlSave(db, df)
Done.
After experimenting with this a lot more for several days, it seems that the problems stemmed from the use of the additional options, particularly table = or, equivalently, tablename =. Those should be valid options, but somehow they manage to cause problems with my particular combination of RStudio (Windows, 64-bit, desktop version, current build), R (Windows, 64-bit, v3), and/or MS SQL Server 2008.
sqlSave(db, df) will also work without sqlDrop(db, "df") if the table has never existed, but as a best practice I'm writing try(sqlDrop(db, "df", errors = FALSE), silent = TRUE) before all sqlSave statements in my code.
We had this same problem, which after a bit of testing we solved simply by not using square brackets in the schema and table name reference. That is, rather than writing
table = "[Jason].[dbo].[df]"
instead write
table = "Jason.dbo.df"
Appreciate this is now long past the original question, but just for anyone else who subsequently trips up on this problem, this is how we solved it. For reference, we found this out by writing a simple one-item data frame to a new table, which, when inspected in SQL, contained the square brackets in the table name.
Here are a few rules of thumb:
1. If things aren't working out, manually specify the column types, just as @d84_n1nj4 suggested:
columnTypes <- list(Record = "VARCHAR(10)", Case_Number = "VARCHAR(15)", Claim_Type = "VARCHAR(15)", Block_Date = "datetime", Claim_Processed_Date = "datetime", Status = "VARCHAR(100)")
sqlSave(myconn2, MainClmDF2, tablename = "##R_Claims_Data", verbose = TRUE, rownames = FALSE, varTypes = columnTypes)
2. If #1 doesn't work, continue to specify the columns, but specify them all as VARCHAR(255). Treat this as a temp or staging table, and move the data over with sqlQuery in your next step, just as @danas.zuokas suggested (shown below). This should work, but even if it doesn't, it gets you closer to the metal and puts you in a better position to debug the problem with SQL Server Profiler if you need it. And yes, if you still have a problem, it's likely due to either a parsing error or a type conversion.
columnTypes <- list(Record = "VARCHAR(255)", Case_Number = "VARCHAR(255)", Claim_Type = "VARCHAR(255)", Block_Date = "VARCHAR(255)", Claim_Processed_Date = "VARCHAR(255)", Status = "VARCHAR(255)")
sqlSave(myconn2, MainClmDF2, tablename = "##R_Claims_Data", verbose = TRUE, rownames = FALSE, varTypes = columnTypes)
sqlQuery(myconn2, 'insert into real_table select * from ##R_Claims_Data')
3. Due to RODBC's implementation, and not due to any inherent limitation in T-SQL, R's logical type (i.e. TRUE/FALSE) will not convert to T-SQL's BIT type (i.e. 1/0), so don't try it. Either convert the logical column to 0/1 in the R layer (see the sketch below) or take it down to the SQL layer as a VARCHAR(5) and convert it to a BIT there.
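For the R-layer route, a minimal sketch (flag is a hypothetical logical column):

# convert an R logical column to 0/1 integers before sqlSave
MainClmDF2$flag <- as.integer(MainClmDF2$flag)  # TRUE -> 1, FALSE -> 0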
In addition to some of the answers posted earlier, here's my workaround. NOTE: I use this as part of a small ETL process, and the destination table in the DB is dropped and recreated each time.
Basically, you want to name your data frame whatever your destination table is named:
RodbcTest <- read.xlsx('test.xlsx', sheet = 4, startRow = 1, colNames = TRUE, skipEmptyRows = TRUE)
Then make sure your connection string includes the target database (not just server):
conn <- odbcDriverConnect(paste("DRIVER={SQL Server};Server=localhost\\sqlexpress;Database=Charter;Trusted_Connection=TRUE"))
After that, I run a simple sqlQuery that conditionally drops the table if it exists:
sqlQuery(conn, "IF OBJECT_ID('Charter.dbo.RodbcTest') IS NOT NULL DROP TABLE Charter.dbo.RodbcTest;")
Then, finally, run sqlSave without the tablename param, which will create the table and populate it with your data frame:
sqlSave(conn, RodbcTest, safer = FALSE, fast = TRUE)
I've encountered the same problem. The way I found around it is to create an empty table using regular CREATE TABLE SQL syntax and then append to it via sqlSave. For some reason, when I tried it your way, I could actually see the table name in the MSSQL database (even after R threw the error message you showed above) but it would be empty.
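A minimal sketch of that create-then-append approach, reusing the question's db3 connection and one-column df (the INT column type is an assumption):

# create the empty table explicitly, then append the data frame into it
sqlQuery(db3, "CREATE TABLE dbo.df (colname INT);")
sqlSave(db3, df, tablename = "df", append = TRUE, rownames = FALSE)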