How to insert R dataframe into existing table in SQL Server

After trying a few different packages and methods found online, I am yet to find a solution that works for inserting a dataframe from R into an existing table in SQL Server.
I've had great success doing this with MySQL, but SQL Server seems to be more difficult.
I have managed to write a new table using the DBI package, but I can't find a way to insert into an existing table with this method. Looking at the documentation, there doesn't seem to be a way of inserting.
As there are more than 1000 rows of data, using sqlQuery from the RODBC package also seems infeasible.
Can anybody suggest a working method for inserting large amounts of data from a dataframe into an existing SQL table?

I've had similar needs using R and PostgreSQL with the R Postgres-specific drivers, and I imagine similar issues exist with SQL Server. The best solution I found was to write to a temporary table in the database, using either dbWriteTable or one of the underlying functions that write from a stream to load very large tables (for Postgres, postgresqlCopyInDataframe, for example). The latter usually requires more work in terms of defining and aligning SQL data types and R class types to ensure writing, whereas dbWriteTable tends to be a bit easier. Once the data is in a temporary table, you can then issue an SQL statement to insert into your target table just as you would within the database environment. Below is an example using high-level DBI library database calls:
# Wrap the whole operation in a transaction
dbExecute(conn, "start transaction;")
# Stage the data frame in a scratch table
dbExecute(conn, "drop table if exists myTempTable")
dbWriteTable(conn, "myTempTable", df)
# Copy from the scratch table into the existing target table, then clean up
dbExecute(conn, "insert into myRealTable(a,b,c) select a,b,c from myTempTable")
dbExecute(conn, "drop table if exists myTempTable")
dbExecute(conn, "commit;")
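For SQL Server specifically, a minimal sketch of the same staging pattern using the DBI and odbc packages might look like the following; the driver name, server, database, and the myRealTable/myTempTable names and columns are assumptions to adjust for your environment:
library(DBI)
library(odbc)
# Connection details are placeholders
conn <- dbConnect(odbc::odbc(),
                  Driver = "ODBC Driver 17 for SQL Server",
                  Server = "myserver", Database = "mydb",
                  Trusted_Connection = "yes")
# Stage the data frame, insert into the existing table, then drop the staging table
dbWriteTable(conn, "myTempTable", df, overwrite = TRUE)
dbExecute(conn, "INSERT INTO myRealTable (a, b, c) SELECT a, b, c FROM myTempTable")
dbExecute(conn, "DROP TABLE myTempTable")
dbDisconnect(conn)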

Related

Best practice for data migration

We have developed a system to replace manual work that was previously done using many Excel files.
Is there a best practice for data migration? I want to use a backend language like .NET to do the validation and insert into tables, rather than using SQL to do the migration.
The total is around 12K rows in Excel, but spread across many tables, so performance is not a big concern and it is a one-time job.
I would add a few calculated columns in Excel that would generate SQL Insert / Update scripts. Something like ="INSERT INTO table (column) VALUES ('"&A1&"');"
Then just copy the calculated column and run it through a SQL client. I used to have a macro that ran it directly from Excel through OLEDB, highlighting failed expressions and storing SQL exceptions next to them.
That way the data can be easily tidied, corrected and SQL re-run as needed.
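The same trick can be sketched in R instead of Excel, in keeping with the rest of this thread; this is purely illustrative (the table and column names are made up) and the naive quoting is not safe against special characters or injection:
# df is a hypothetical data frame with columns 'id' and 'name'
stmts <- sprintf("INSERT INTO myTable (id, name) VALUES (%d, '%s');",
                 df$id, gsub("'", "''", df$name))
writeLines(stmts, "inserts.sql")  # then run the file through your SQL client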

Move data from one database to another over ODBC

Let's say I am connected to two different databases (one SQLite and one Oracle) over ODBC in one program. Is it possible to execute a query on one database and insert the result set as a new table in the second database directly, by just passing something like a data cursor, i.e. without the hurdle of creating insert statements with explicit values from the result set and executing those on the destination database?
As far as I am aware you cannot do that. You can use something like a join engine to do it (e.g., http://www.easysoft.com/products/data_access/odbc_odbc_join_engine/index.html) but it might be overkill if you are doing it just the once.
If you use Oracle then you can use Oracle Heterogeneous Services which can work with ODBC sources. Have a look at: http://www.dba-oracle.com/t_heterogeneous_database_connections_sql_server.htm

Postgres, plpgsql: Is there a way to connect to other DB from inside of a stored procedure?

I have two DBs, one of which is fed with filtered data from the other. Right now I'm using a Perl script which executes a query on the foreign DB, stores the result in a CSV file, and loads it into the local DB using the \COPY syntax.
Is there a way to write a plpgsql function which will connect to the foreign DB and load the filtered data into the local DB? (I know it can be done in e.g. plperl, but I'm looking for a more "native" way.)
There is also DBI-LINK, which supports many more databases :)
Currently, PostgreSQL has dblink, but it only supports connecting to other PostgreSQL instances - not any other database, sadly.
I would recommend PL/Proxy, which is significantly easier to use - just write the desired stored procedure on the target database (with some minor caveats, like not using enumerated types), declare the same function on the source, and PL/Proxy will handle the communications. It is the basis for Skype's distributed database architecture and is production-ready.

MS-SQL Bulk Insert with RODBC

Is it possible to perform a bulk insert into an MS-SQL Server (2000, 2005, 2008) using the RODBC package?
I know that I can do this using freebcp, but I'm curious if the RODBC package implements this portion of the Microsoft SQL API and if not, how difficult it would be to implement it.
Check out the newer odbc and DBI packages. DBI::dbWriteTable writes around 20,000 records per second... much, much faster than the row-by-row inserts from RODBC::sqlSave().
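A hedged sketch of that approach (the DSN and table name are placeholders):
library(DBI)
con <- dbConnect(odbc::odbc(), dsn = "myDSN")
# append = TRUE adds the rows to the existing table instead of creating a new one
dbWriteTable(con, "myRealTable", df, append = TRUE)
dbDisconnect(con)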
You're probably looking for ?sqlSave, which uses a parametrized INSERT INTO query (taking place in one operation) when you set fast = TRUE.
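For example, a minimal sketch of that call (the DSN and table name are placeholders):
library(RODBC)
ch <- odbcConnect("myDSN")
# append = TRUE targets an existing table; fast = TRUE uses a parametrized INSERT
sqlSave(ch, df, tablename = "myRealTable",
        append = TRUE, rownames = FALSE, fast = TRUE)
odbcClose(ch)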
Now you can use dbBulkCopy from the new rsqlserver package:
A typical scenario:
You create a matrix
You save it as a csv file
You call dbBulkCopy to read the file and insert it, internally using the bcp tool of MS SQL Server
This assumes that your table is already created in the database:
# nrow and ncol are assumed to be the desired dimensions, defined elsewhere
dat <- matrix(round(rnorm(nrow * ncol), 2), nrow, ncol)
id.file <- "temp_file.csv"
write.csv(dat, file = id.file, row.names = FALSE)
# Reads the csv and bulk-loads it into the existing NEW_BP_TABLE via bcp
dbBulkCopy(conn, 'NEW_BP_TABLE', value = id.file)
Using RODBC, the fastest insert we've been able to create (260 million row insert) looks like the following (in R pseudo code):
ourDataFrame <- sqlQuery(OurConnection, "SELECT myDataThing1, myDataThing2
FROM myData")
ourDF <- doStuff(ourDataFrame)
write.csv(ourDF,ourFile)
sqlQuery(OurConnection, "CREATE TABLE myTable ( la [La], laLa [LaLa]);
BULK INSERT myTable FROM 'ourFile'
WITH (yourBulkInsertOptions);")
If you're running this between servers, you need a network drive that the R server can write to (e.g., one server with permissions for writing to the DB uses Rscript to productionize the code) and that the SQL Server can read from.
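Putting the pieces together, a hedged sketch of that flow might look like this; the UNC path and the BULK INSERT options are assumptions and must match how the CSV is written (note that write.csv quotes strings by default, which BULK INSERT does not strip):
library(RODBC)
# myTable must already exist, e.g. created as in the pseudo code above
write.csv(ourDF, "\\\\fileserver\\share\\ourFile.csv", row.names = FALSE)
sqlQuery(OurConnection,
         "BULK INSERT myTable
          FROM '\\\\fileserver\\share\\ourFile.csv'
          WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n', FIRSTROW = 2);")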
From everything I can find, there is NO solution for bulk insert to MySQL and nothing that works with SSIS, which is why Microsoft is including in-database analytics with SQL Server 2016 after buying Revolution R Analytics.
I tried to comment on the previous answer but don't have the reputation to do it.
The rsqlserver package needs to run with rClr, and neither of those packages is well-behaved, especially because rsqlserver's INSERT functions have poor data type handling. So if you use it, you'll have no idea what you're looking at in the SQL table, as much of the information in your data.frame will have been transformed.
Considering the RODBC package has been around for 15 years, I'm pretty disappointed that no one has created a bulk insert function...
Our n2khelper package can use bcp (bulk copy) when it is available, and falls back to multiple INSERT statements when it is not.
You can find the package at https://github.com/INBO-Natura2000/n2khelper
Install it with devtools::install_git("https://github.com/INBO-Natura2000/n2khelper") and look for the odbc_insert() function.
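If a package dependency is not an option, the same idea can be sketched by shelling out to the bcp command-line utility from R. This is only an illustration: the server, database, table, and file path are placeholders, and bcp must be installed on the machine running R.
# Write the data without headers or quotes; bcp's -c character mode expects plain text
write.table(df, "C:/temp/df.csv", sep = ",", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
# -c character mode, -t field terminator, -S server, -T trusted connection
system2("bcp", c("mydb.dbo.myRealTable", "in", "C:/temp/df.csv",
                 "-S", "myserver", "-T", "-c", "-t,"))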

Move Data from Oracle to SQL Server

I would like to copy parts of an Oracle DB to a SQL Server DB. I need to move the data because the Oracle box is being decommissioned. I only need the data for reference purposes, so I don't need indexes, stored procedures, constraints, etc. All I need is the data.
I have a link to the Oracle DB in SQL Server. I have tested the following query, which seemed to work just fine:
select *
into NewTableName
from linkedserver.OracleTable
I was wondering if there are any potential issues with using this approach?
Using SSIS (SQL Server Integration Services) may be a good alternative, especially if your table names are the same on both servers. Use the import wizard and it should create the destination tables for you and let you edit any mappings.
The only issue I see with that is that you will need to execute it for each and every table you need. Glad you are decommissioning the Oracle server :-). Otherwise, if you are not concerned with indexes or any of the existing sprocs, I don't see any issue with what you are doing.
The "select *" approach could be very slow if the tables are large. Consider writing Pro*C in that case, or use FastReader: http://www.wisdomforce.com/products-FastReader.html
A faster and easier approach might be to use the Data Transformation Services, depending on the number of objects you're trying to copy over.