In SQL Server 2016 they introduced parallel inserts into existing tables. By not having certain features on the target table SQL Server can insert the data in parallel streams.
Using the syntax of
INSERT [tableName] WITH (TABLOCK)
SELECT .....
The data will be inserted in parallel. I have seen great improvements using this. What normally would take about 10 minutes to insert 120 million, using this new feature takes only about 30 seconds.
How can I use this new setting in SSIS? I am using Visual Studio 2015 Enterprise and SQL Server 2016.
I know I can use a "Execute SQL Task" and put something like this in, but what I'm wondering is how to use this in the Data Flow? Is there a specific Connection Manager and setting in the Destination Adapter?
in sql server 2016, we need to provide two condition for using parallelism for our insert operations. The first one is compatibility level of database must be set at 130. So before you run your ssis package check your database compatibility level.
SELECT name, compatibility_level FROM sys.databases
the second condition is using TABLOCK hint. In SSIS Package you can choose TABLOCK hint with OLEDB Destination.
No, you cannot utilize Parallel inserts in DTF.
According to Microsoft description of Parallel Inserts in SQL 2016, it can be used only if executing INSERT ... SELECT ... statement with some limitations. Data Flow prepares data table in memory of SSIS Server, OLEDB or ODBC destination will try to load it with 'INSERTorINSERT BULK` statements, which are not subject to parallel operations.
Related
We have a linked server (OraOLEDB.Oracle) defined in the SQL Server environment. Oracle 12c, SQL Server 2016. There is also an Oracle client (64 bit) installed on SQL Server.
When retrieving data from Oracle (a simple query, getting all columns from a 3M row, fairly narrow table, with varchars, dates and integers), we are seeing the following performance numbers:
sqlplus: select from Oracle > OS File on the SQL Server itself
less than 2k rows/sec
SSMS: insert into a SQL Server table select from Oracle using OpenQuery (passthrough to Oracle, so remote execution)
less than 2k rows/sec
SQL Export/Import tool (in essence, SSIS): insert into a SQL Server table, using the OLEDB Oracle for source and OLEDB SQL Server for target
over 30k rows/second
Looking for ways to improve throughput using OpenQuery/OpenResultSet, to match SSIS throughput. There is probably some buffer/flag somewhere that allows to achieve the same?
Please advise...
Thank you!
--Alex
There is probably some buffer/flag somewhere that allows to achieve the same?
Probably looking for the FetchSize parameter
FetchSize - specifies the number of rows the provider will fetch at a
time (fetch array). It must be set on the basis of data size and the
response time of the network. If the value is set too high, then this
could result in more wait time during the execution of the query. If
the value is set too low, then this could result in many more round
trips to the database. Valid values are 1 to 429,496, and 296. The
default is 100.
eg
exec sp_addlinkedserver N'MyOracle', 'Oracle', 'ORAOLEDB.Oracle', N'//172.16.8.119/xe', N'FetchSize=2000', ''
See, eg https://blogs.msdn.microsoft.com/dbrowne/2013/10/02/creating-a-linked-server-for-oracle-in-64bit-sql-server/
I think there are many way to enhance the performance on the INSERT query, I suggest reading the following article to get more information about data loading performance.
The Data Loading Performance Guide
There are one method you can try which is minimizing the logging by using clustered index. check the link below for more information:
New update on minimal logging for SQL Server 2008
Hello I have to pass a select from a database that is on an ip address to another (identical) database that is on a completely different IP, below the query how to pass to make the switch?
Sql Code:
/*Insert into database with same name into same table addres:: 172.16.50.98*/
Insert into
/* select from database address: 172.16.50.96*/
SELECT IdUtente,Longitudine,Latitudine,Stato,DataCreazione
FROM Quote.dbo.Marcatura
where DataCreazione>'2019-01-08 18:37:28.773'
Linked Server/ OpenQuery is the way to achieve this. have a look on this.
including parameters in OPENQUERY
If the data that's being imported isn't large and this won't be a reoccurring task a linked server would probably be the better option. Creating one through the SSMS GUI is easier if you haven't done this before, but an example of creating one using the SP_ADDLINKEDSERVER stored procedure through T-SQL is below. If your account doesn't have access to the other server the SP_ADDLINKEDSRVLOGIN stored procedure will need to be used to configure the linked server with an account that has the appropriate permissions on the source server, as well as database and any referenced objects. While using the linked server syntax (4 part name) is simpler and easier to read, I'd strongly recommend doing the insert with OPENQUERY instead if only one linked server will be used. This will execute the SQL on the source server, applying any filters there and only return the necessary rows, whereas the linked server syntax will return all the rows before performing the filtering. You can read more about the differences between the two here. You indicated the database name is the same on both servers, and this assumes the same for the table and schema names as well. Make sure to update these accordingly if they differ.
If a large volume of the data will imported or if this will be a regular process creating an SSIS package and setting this to run as a SQL Agent job will be the better approach. If you choose to go this route there are a number of things to consider, but the links below will help you get started. SQL Server Data Tools (SSDT) is where the packages can be developed. While not necessary, executing the packages from the SSIS Catalog, SSISDB, will be much more beneficial than just the using the file system. Either an OLE DB or SQL Server Destination can be used since the table that's being loaded to is on SQL Server, however a SQL Server Destination can only be used on a local database.
Linked Server:
--Create linked server
--SQL product name and SQLNCLI11 provider for SQL Server
EXEC [MASTER].DBO.SP_ADDLINKEDSERVER #server = N'MyLinkedServer', #srvproduct=N'SQL',
#provider=N'SQLNCLI11', #datasrc=N'ServerIPAddress'
--OPENQUERY insert
INSERT INTO Quote.dbo.Marcatura (IdUtente, Longitudine, Latitudine, Stato, DataCreazione)
SELECT
IdUtente,
Longitudine,
Latitudine,
Stato,
DataCreazione
FROM OPENQUERY(MyLinkedServer, '
SELECT
IdUtente,
Longitudine,
Latitudine,
Stato,
DataCreazione
FROM Quote.dbo.Marcatura')
SSIS:
SSIS
SSDT
SSISDB
Execute SQL Task
Data Flow Task
OLE DB Source
OLE DB Destination
SQL Server Destination
SQL Server Agent SSIS Packages
SSIS solution
I think this requires a very simple SSIS package to be achieved:
Create two OLEDB Connection manager; one for each server
Add a data flow task
Inside the Data flow task addan OLEDB Source and OLEDB destination
In the OLEDB source (172.16.50.98 connection manager) select SQL command as Access mode and use the following command:
SELECT IdUtente,Longitudine,Latitudine,Stato,DataCreazione
FROM Quote.dbo.Marcatura
where DataCreazione >'2019-01-08 18:37:28.773'
Map the source columns to the OLEDB destination (172.16.50.96 connection manager)
Helpful links
Extract Data by Using the OLE DB Source
SSIS OLEDB Source to OLE DB Destination example
I came across a good SSIS and SQL Problem. How do I in SSIS create a package that will execute a SQL query in management studio and grab the results of that query (the query results are "Insert INTO statements") and run that insert into statement query results into another sql database within SSIS that updates a table in another server? (The first query runs in one database and the second query runs in a different database)
First of all, sql queries execute on the database, not management studio. Management studio is visual interface for configuring,managing and administer databases.
To me it doesn't sound like there's any problem here at all. Create one connection manager for each DB. Then create two "Execute SQL Tasks", put your insert statements in them use use your connection manangers you've created.
Run the first query in an Execute SQL task and store the results in a string variable.
Then run a second Execute SQL task, using the variable as your SQL command.
Create Connection Managers for each of the databases you need, your source and both (or all) destinations.
Create a Data Flow Task.
In your OLEDB Source, execute your SELECT statement.
Pump the results into a MultiCast Transformation. This allows you to send the exact same result set to multiple destinations.
Create a Destination for each table you want to write to, and connect them to the MultiCast.
Bob's your uncle.
I have to port sql server (2008) database with t-sql scripts. I can generate "create" script per each database object (stored procedure, table) from Sql Server Management Studio (though it looks to take much time)
How do I port data for tables? I'd like to have scripts like that:
INSERT INTO ... VALUES(...)
INSERT INTO ... VALUES(...)
INSERT INTO ... VALUES(...)
...
Can I generate such scripts from Sql Server Management Studio or is there some free 3'rd party utility for that? (I guess there should be).
Thank you in advance!
The (free) SMSS Tool pack addin can generate insert scripts for a DB.
If you're going to be doing bulk inserts of data, I'd suggest using bulk insert. You can do the insert from T-SQL, but I prefer to use the bcp command line utility as I can do both the export and import with minimal change to the run line. Oh... and it runs a lot faster than a bunch of insert statements. Have a look at the documentation and see if it fits your purposes.
Is it possible to perform a bulk insert into an MS-SQL Server (2000, 2005, 2008) using the RODBC package?
I know that I can do this using freebcp, but I'm curious if the RODBC package implements this portion of the Microsoft SQL API and if not, how difficult it would be to implement it.
check out the new odbc and DBI packages. DBI::dbWriteTable writes around 20,000 records per second... Much much faster than the Row Inserts from RODBC::sqlSave()
You're probably looking for ?sqlSave which uses a parametrized INSERT INTO query (taking place in one operation) when you set Fast=True.
Now You can use dbBulkCopy from the new rsqlserver package:
A typical scenario:
You create a matrix
you save it as a csv file
You call dbBulkCopy to read fil and insert it using internally bcp tool of MS Sql server.
This assume that your table is already created in the data base:
dat <- matrix(round(rnorm(nrow*ncol),nrow,ncol)
id.file = "temp_file.csv"
write.csv(dat,file=id.file,row.names=FALSE)
dbBulkCopy(conn,'NEW_BP_TABLE',value=id.file)
Using RODBC, the fastest insert we've been able to create (260 million row insert) looks like the following (in R pseudo code):
ourDataFrame <- sqlQuery(OurConnection, "SELECT myDataThing1, myDataThing2
FROM myData")
ourDF <- doStuff(ourDataFrame)
write.csv(ourDF,ourFile)
sqlQuery(OurConnection, "CREATE TABLE myTable ( la [La], laLa [LaLa]);
BULK INSERT myTable FROM 'ourFile'
WITH YOURPARAMS=yourParams;")
If you're running this from between servers, you need a network drive that the R server can write to (e.g. one server with permissions for writing to the DB uses Rscript to productionalize the code), and the SQL Server can read from.
From everything I can find, there is NO solution for bulk insert to MySQL and nothing that works with SSIS which is why Microsoft is including in-database analytics with SQL Server 2016 after buying Revolution R Analytics.
I tried to comment on the previous answer but don't have the reputation to do it.
The rsqlserver package needs to run with rClr and neither of those packages are well-behaved, especially because rsqlserver's INSERT functions have poor data type handling. So if you use it, you'll have no idea what you're looking at in the SQL table as much of the information in your data.frame will have been transformed.
Considering the RODBC package has been around for 15 years, I'm pretty disappointed that no one has created a bulk insert function...
Our n2khelper package can use bcp (bulkcopy) when it is available. When not available it falls back to multiple INSERT statements.
You can find the package on https://github.com/INBO-Natura2000/n2khelper
Install it with devtools::install_git("INBO-Natura2000/n2khelper") and look for the odbc_insert() function.