Bulk Insert in multiple relational tables using Dapper .Net-using scope identity - sql

I need to import millions of records in multiple sql server relational tables.
TableA(Aid(pk),Name,Color)----return id using scope identity
TableB(Bid,Aid(fk),Name)---Here we need to insert Aid(pk) which we got using scocpe Identity
How I can do bulk insert of collection of millions of records using dapper in one single Insert statement

Dapper just wraps raw ADO.NET; raw ADO.NET doesn't offer a facility for this, therefore dapper does not. What you want is SqlBulkCopy. You could also use a table-valued-parameter, but this really feels like a SqlBulkCopy job.
In a pinch, you can use dapper here - Execute will unroll an IEnumerable<T> into a series of commands about T - but it will be lots of commands; and unless you explicitly enable async-pipelining, it will suffer from latency per-command (the pipelined mode avoids this, but it will still be n commands). But SqlBulkCopy will be much more efficient.
If the input data is an IEnumerable<T>, you might want to use ObjectReader from FastMember; for example:
IEnumerable<SomeType> data = ...
using(var bcp = new SqlBulkCopy(connection))
using(var reader = ObjectReader.Create(data, "Id", "Name", "Description"))
{
bcp.DestinationTableName = "SomeTable";
bcp.WriteToServer(reader);
}

Related

I am getting an exception with vb.net when certain punctuations are used [duplicate]

I am very new to working with databases. Now I can write SELECT, UPDATE, DELETE, and INSERT commands. But I have seen many forums where we prefer to write:
SELECT empSalary from employee where salary = #salary
...instead of:
SELECT empSalary from employee where salary = txtSalary.Text
Why do we always prefer to use parameters and how would I use them?
I wanted to know the use and benefits of the first method. I have even heard of SQL injection but I don't fully understand it. I don't even know if SQL injection is related to my question.
Using parameters helps prevent SQL Injection attacks when the database is used in conjunction with a program interface such as a desktop program or web site.
In your example, a user can directly run SQL code on your database by crafting statements in txtSalary.
For example, if they were to write 0 OR 1=1, the executed SQL would be
SELECT empSalary from employee where salary = 0 or 1=1
whereby all empSalaries would be returned.
Further, a user could perform far worse commands against your database, including deleting it If they wrote 0; Drop Table employee:
SELECT empSalary from employee where salary = 0; Drop Table employee
The table employee would then be deleted.
In your case, it looks like you're using .NET. Using parameters is as easy as:
string sql = "SELECT empSalary from employee where salary = #salary";
using (SqlConnection connection = new SqlConnection(/* connection info */))
using (SqlCommand command = new SqlCommand(sql, connection))
{
var salaryParam = new SqlParameter("salary", SqlDbType.Money);
salaryParam.Value = txtMoney.Text;
command.Parameters.Add(salaryParam);
var results = command.ExecuteReader();
}
Dim sql As String = "SELECT empSalary from employee where salary = #salary"
Using connection As New SqlConnection("connectionString")
Using command As New SqlCommand(sql, connection)
Dim salaryParam = New SqlParameter("salary", SqlDbType.Money)
salaryParam.Value = txtMoney.Text
command.Parameters.Add(salaryParam)
Dim results = command.ExecuteReader()
End Using
End Using
Edit 2016-4-25:
As per George Stocker's comment, I changed the sample code to not use AddWithValue. Also, it is generally recommended that you wrap IDisposables in using statements.
You are right, this is related to SQL injection, which is a vulnerability that allows a malicioius user to execute arbitrary statements against your database. This old time favorite XKCD comic illustrates the concept:
In your example, if you just use:
var query = "SELECT empSalary from employee where salary = " + txtSalary.Text;
// and proceed to execute this query
You are open to SQL injection. For example, say someone enters txtSalary:
1; UPDATE employee SET salary = 9999999 WHERE empID = 10; --
1; DROP TABLE employee; --
// etc.
When you execute this query, it will perform a SELECT and an UPDATE or DROP, or whatever they wanted. The -- at the end simply comments out the rest of your query, which would be useful in the attack if you were concatenating anything after txtSalary.Text.
The correct way is to use parameterized queries, eg (C#):
SqlCommand query = new SqlCommand("SELECT empSalary FROM employee
WHERE salary = #sal;");
query.Parameters.AddWithValue("#sal", txtSalary.Text);
With that, you can safely execute the query.
For reference on how to avoid SQL injection in several other languages, check bobby-tables.com, a website maintained by a SO user.
In addition to other answers need to add that parameters not only helps prevent sql injection but can improve performance of queries. Sql server caching parameterized query plans and reuse them on repeated queries execution. If you not parameterized your query then sql server would compile new plan on each query(with some exclusion) execution if text of query would differ.
More information about query plan caching
Two years after my first go, I'm recidivating...
Why do we prefer parameters? SQL injection is obviously a big reason, but could it be that we're secretly longing to get back to SQL as a language. SQL in string literals is already a weird cultural practice, but at least you can copy and paste your request into management studio. SQL dynamically constructed with host language conditionals and control structures, when SQL has conditionals and control structures, is just level 0 barbarism. You have to run your app in debug, or with a trace, to see what SQL it generates.
Don't stop with just parameters. Go all the way and use QueryFirst (disclaimer: which I wrote). Your SQL lives in a .sql file. You edit it in the fabulous TSQL editor window, with syntax validation and Intellisense for your tables and columns. You can assign test data in the special comments section and click "play" to run your query right there in the window. Creating a parameter is as easy as putting "#myParam" in your SQL. Then, each time you save, QueryFirst generates the C# wrapper for your query. Your parameters pop up, strongly typed, as arguments to the Execute() methods. Your results are returned in an IEnumerable or List of strongly typed POCOs, the types generated from the actual schema returned by your query. If your query doesn't run, your app won't compile. If your db schema changes and your query runs but some columns disappear, the compile error points to the line in your code that tries to access the missing data. And there are numerous other advantages. Why would you want to access data any other way?
In Sql when any word contain # sign it means it is variable and we use this variable to set value in it and use it on number area on the same sql script because it is only restricted on the single script while you can declare lot of variables of same type and name on many script. We use this variable in stored procedure lot because stored procedure are pre-compiled queries and we can pass values in these variable from script, desktop and websites for further information read Declare Local Variable, Sql Stored Procedure and sql injections.
Also read Protect from sql injection it will guide how you can protect your database.
Hope it help you to understand also any question comment me.
Old post but wanted to ensure newcomers are aware of Stored procedures.
My 10¢ worth here is that if you are able to write your SQL statement as a stored procedure, that in my view is the optimum approach. I ALWAYS use stored procs and never loop through records in my main code. For Example: SQL Table > SQL Stored Procedures > IIS/Dot.NET > Class.
When you use stored procedures, you can restrict the user to EXECUTE permission only, thus reducing security risks.
Your stored procedure is inherently paramerised, and you can specify input and output parameters.
The stored procedure (if it returns data via SELECT statement) can be accessed and read in the exact same way as you would a regular SELECT statement in your code.
It also runs faster as it is compiled on the SQL Server.
Did I also mention you can do multiple steps, e.g. update a table, check values on another DB server, and then once finally finished, return data to the client, all on the same server, and no interaction with the client. So this is MUCH faster than coding this logic in your code.
Other answers cover why parameters are important, but there is a downside! In .net, there are several methods for creating parameters (Add, AddWithValue), but they all require you to worry, needlessly, about the parameter name, and they all reduce the readability of the SQL in the code. Right when you're trying to meditate on the SQL, you need to hunt around above or below to see what value has been used in the parameter.
I humbly claim my little SqlBuilder class is the most elegant way to write parameterized queries. Your code will look like this...
C#
var bldr = new SqlBuilder( myCommand );
bldr.Append("SELECT * FROM CUSTOMERS WHERE ID = ").Value(myId);
//or
bldr.Append("SELECT * FROM CUSTOMERS WHERE NAME LIKE ").FuzzyValue(myName);
myCommand.CommandText = bldr.ToString();
Your code will be shorter and much more readable. You don't even need extra lines, and, when you're reading back, you don't need to hunt around for the value of parameters. The class you need is here...
using System;
using System.Collections.Generic;
using System.Text;
using System.Data;
using System.Data.SqlClient;
public class SqlBuilder
{
private StringBuilder _rq;
private SqlCommand _cmd;
private int _seq;
public SqlBuilder(SqlCommand cmd)
{
_rq = new StringBuilder();
_cmd = cmd;
_seq = 0;
}
public SqlBuilder Append(String str)
{
_rq.Append(str);
return this;
}
public SqlBuilder Value(Object value)
{
string paramName = "#SqlBuilderParam" + _seq++;
_rq.Append(paramName);
_cmd.Parameters.AddWithValue(paramName, value);
return this;
}
public SqlBuilder FuzzyValue(Object value)
{
string paramName = "#SqlBuilderParam" + _seq++;
_rq.Append("'%' + " + paramName + " + '%'");
_cmd.Parameters.AddWithValue(paramName, value);
return this;
}
public override string ToString()
{
return _rq.ToString();
}
}

SQL Server Compact fast insert of data information? c#

I am using SQL Server Compact Edition.
I have a FOR loop for build my queries to insert data so I download as 3000 rows from a full-blown SQL Server.
I need to insert these 3000 rows into SQL Server Compact Edition, and currently I do an insert for every row which takes a lot of time. I need to do this for 12 tables, and that would take a very long time... I need a faster method to do this in less than 5 minutes...
Do you know any way for to get a insert a lot of rows on SQL Server Compact?
I tried do insert of fifty on fifty but I got an ERROR...
insert into table(col1,col2)
values (val1,valo2) ;
insert into table(col1,col2)
values (val1,valo2)
I got error with 2 insert (so its not going to be able for to do 50 inserts),
Is it possible to do OPENROWSET in SQL Server Compact Edition?
MY QUESTION
How can I insert a lot of rows into a SQL Server Compact Edition database?
I get this data from a full-blown SQL Server database.
do you know any way for to get a insert a lot of rows on sql compact?
See below for what I think is the best solution, but if you don't like it (for whatever reason) and your source database is SQL Server you could consider making a linked server in SQL Server to your compact database and letting SQL Server insert the records. From my experience, SQL Server inserts into arbitrary databases pretty quickly.
SQL Compact Bulk Insert Library
.NET Library for loading data fast (doing bulk inserts) into a SQL Server Compact database file. Attempts to mimic the SQLClient SqlBulkCopy API.
Some timings from testing - load 2 column table with no constraints/indexes:
1,000,000 rows: 6 seconds = 166,666 rows/second
5,000,000 rows: 28 seconds = 178,000 rows/second
private static void DoBulkCopy(bool keepNulls, IDataReader reader)
{
SqlCeBulkCopyOptions options = new SqlCeBulkCopyOptions();
if (keepNulls)
{
options = options |= SqlCeBulkCopyOptions.KeepNulls;
}
using (SqlCeBulkCopy bc = new SqlCeBulkCopy(connectionString, options))
{
bc.DestinationTableName = "tblDoctor";
bc.WriteToServer(reader);
}
}
It should take less than a second to insert 3000 rows into 12 tables using plain SQL statements. If you are getting errors, or it is taking minutes, you are doing something wrong. This is a very basic task, if you are getting errors you should find out what is going wrong, not try using different tools.
The simplest way to insert records into a SQL CE database using C# is to use ADO.NET. Here is an example using parameterized queries:
using (var conn = new SqlCeConnection(YourCommandString))
{
var cmd = conn.CreateCommand();
cmd.CommandText = "INSERT INTO table (col1,col2) VALUES (#val1,#val2)";
var param1 = cmd.CreateParameter();
param1.ParameterName = "#val1";
// assign col1 types, constraints, lengths here.
cmd.Parameters.Add(param1);
var param2 = cmd.CreateParameter();
param2.ParameterName = "#val2";
// assign col2 types, constraints, lengths here.
cmd.Parameters.Add(param2);
cmd.Prepare();
conn.Open();
foreach (var s in sourceDataStructure)
{
param1.Value = s.sourceValue1;
param2.Value = s.sourceValue2;
cmd.ExecuteNonQuery();
}
cmd.Dispose();
}
if you do not know what a parameterized query is, you should learn.

Why do we always prefer using parameters in SQL statements?

I am very new to working with databases. Now I can write SELECT, UPDATE, DELETE, and INSERT commands. But I have seen many forums where we prefer to write:
SELECT empSalary from employee where salary = #salary
...instead of:
SELECT empSalary from employee where salary = txtSalary.Text
Why do we always prefer to use parameters and how would I use them?
I wanted to know the use and benefits of the first method. I have even heard of SQL injection but I don't fully understand it. I don't even know if SQL injection is related to my question.
Using parameters helps prevent SQL Injection attacks when the database is used in conjunction with a program interface such as a desktop program or web site.
In your example, a user can directly run SQL code on your database by crafting statements in txtSalary.
For example, if they were to write 0 OR 1=1, the executed SQL would be
SELECT empSalary from employee where salary = 0 or 1=1
whereby all empSalaries would be returned.
Further, a user could perform far worse commands against your database, including deleting it If they wrote 0; Drop Table employee:
SELECT empSalary from employee where salary = 0; Drop Table employee
The table employee would then be deleted.
In your case, it looks like you're using .NET. Using parameters is as easy as:
string sql = "SELECT empSalary from employee where salary = #salary";
using (SqlConnection connection = new SqlConnection(/* connection info */))
using (SqlCommand command = new SqlCommand(sql, connection))
{
var salaryParam = new SqlParameter("salary", SqlDbType.Money);
salaryParam.Value = txtMoney.Text;
command.Parameters.Add(salaryParam);
var results = command.ExecuteReader();
}
Dim sql As String = "SELECT empSalary from employee where salary = #salary"
Using connection As New SqlConnection("connectionString")
Using command As New SqlCommand(sql, connection)
Dim salaryParam = New SqlParameter("salary", SqlDbType.Money)
salaryParam.Value = txtMoney.Text
command.Parameters.Add(salaryParam)
Dim results = command.ExecuteReader()
End Using
End Using
Edit 2016-4-25:
As per George Stocker's comment, I changed the sample code to not use AddWithValue. Also, it is generally recommended that you wrap IDisposables in using statements.
You are right, this is related to SQL injection, which is a vulnerability that allows a malicioius user to execute arbitrary statements against your database. This old time favorite XKCD comic illustrates the concept:
In your example, if you just use:
var query = "SELECT empSalary from employee where salary = " + txtSalary.Text;
// and proceed to execute this query
You are open to SQL injection. For example, say someone enters txtSalary:
1; UPDATE employee SET salary = 9999999 WHERE empID = 10; --
1; DROP TABLE employee; --
// etc.
When you execute this query, it will perform a SELECT and an UPDATE or DROP, or whatever they wanted. The -- at the end simply comments out the rest of your query, which would be useful in the attack if you were concatenating anything after txtSalary.Text.
The correct way is to use parameterized queries, eg (C#):
SqlCommand query = new SqlCommand("SELECT empSalary FROM employee
WHERE salary = #sal;");
query.Parameters.AddWithValue("#sal", txtSalary.Text);
With that, you can safely execute the query.
For reference on how to avoid SQL injection in several other languages, check bobby-tables.com, a website maintained by a SO user.
In addition to other answers need to add that parameters not only helps prevent sql injection but can improve performance of queries. Sql server caching parameterized query plans and reuse them on repeated queries execution. If you not parameterized your query then sql server would compile new plan on each query(with some exclusion) execution if text of query would differ.
More information about query plan caching
Two years after my first go, I'm recidivating...
Why do we prefer parameters? SQL injection is obviously a big reason, but could it be that we're secretly longing to get back to SQL as a language. SQL in string literals is already a weird cultural practice, but at least you can copy and paste your request into management studio. SQL dynamically constructed with host language conditionals and control structures, when SQL has conditionals and control structures, is just level 0 barbarism. You have to run your app in debug, or with a trace, to see what SQL it generates.
Don't stop with just parameters. Go all the way and use QueryFirst (disclaimer: which I wrote). Your SQL lives in a .sql file. You edit it in the fabulous TSQL editor window, with syntax validation and Intellisense for your tables and columns. You can assign test data in the special comments section and click "play" to run your query right there in the window. Creating a parameter is as easy as putting "#myParam" in your SQL. Then, each time you save, QueryFirst generates the C# wrapper for your query. Your parameters pop up, strongly typed, as arguments to the Execute() methods. Your results are returned in an IEnumerable or List of strongly typed POCOs, the types generated from the actual schema returned by your query. If your query doesn't run, your app won't compile. If your db schema changes and your query runs but some columns disappear, the compile error points to the line in your code that tries to access the missing data. And there are numerous other advantages. Why would you want to access data any other way?
In Sql when any word contain # sign it means it is variable and we use this variable to set value in it and use it on number area on the same sql script because it is only restricted on the single script while you can declare lot of variables of same type and name on many script. We use this variable in stored procedure lot because stored procedure are pre-compiled queries and we can pass values in these variable from script, desktop and websites for further information read Declare Local Variable, Sql Stored Procedure and sql injections.
Also read Protect from sql injection it will guide how you can protect your database.
Hope it help you to understand also any question comment me.
Old post but wanted to ensure newcomers are aware of Stored procedures.
My 10¢ worth here is that if you are able to write your SQL statement as a stored procedure, that in my view is the optimum approach. I ALWAYS use stored procs and never loop through records in my main code. For Example: SQL Table > SQL Stored Procedures > IIS/Dot.NET > Class.
When you use stored procedures, you can restrict the user to EXECUTE permission only, thus reducing security risks.
Your stored procedure is inherently paramerised, and you can specify input and output parameters.
The stored procedure (if it returns data via SELECT statement) can be accessed and read in the exact same way as you would a regular SELECT statement in your code.
It also runs faster as it is compiled on the SQL Server.
Did I also mention you can do multiple steps, e.g. update a table, check values on another DB server, and then once finally finished, return data to the client, all on the same server, and no interaction with the client. So this is MUCH faster than coding this logic in your code.
Other answers cover why parameters are important, but there is a downside! In .net, there are several methods for creating parameters (Add, AddWithValue), but they all require you to worry, needlessly, about the parameter name, and they all reduce the readability of the SQL in the code. Right when you're trying to meditate on the SQL, you need to hunt around above or below to see what value has been used in the parameter.
I humbly claim my little SqlBuilder class is the most elegant way to write parameterized queries. Your code will look like this...
C#
var bldr = new SqlBuilder( myCommand );
bldr.Append("SELECT * FROM CUSTOMERS WHERE ID = ").Value(myId);
//or
bldr.Append("SELECT * FROM CUSTOMERS WHERE NAME LIKE ").FuzzyValue(myName);
myCommand.CommandText = bldr.ToString();
Your code will be shorter and much more readable. You don't even need extra lines, and, when you're reading back, you don't need to hunt around for the value of parameters. The class you need is here...
using System;
using System.Collections.Generic;
using System.Text;
using System.Data;
using System.Data.SqlClient;
public class SqlBuilder
{
private StringBuilder _rq;
private SqlCommand _cmd;
private int _seq;
public SqlBuilder(SqlCommand cmd)
{
_rq = new StringBuilder();
_cmd = cmd;
_seq = 0;
}
public SqlBuilder Append(String str)
{
_rq.Append(str);
return this;
}
public SqlBuilder Value(Object value)
{
string paramName = "#SqlBuilderParam" + _seq++;
_rq.Append(paramName);
_cmd.Parameters.AddWithValue(paramName, value);
return this;
}
public SqlBuilder FuzzyValue(Object value)
{
string paramName = "#SqlBuilderParam" + _seq++;
_rq.Append("'%' + " + paramName + " + '%'");
_cmd.Parameters.AddWithValue(paramName, value);
return this;
}
public override string ToString()
{
return _rq.ToString();
}
}

Large SQL inserts TVF vs BULK insert

What is the fastest way to insert a huge array (10M elements) from a C# application?
Till now, I used bulk insert. C# app generates a large textual file and I load it with BULK INSERT command . Out of curiosity I wrote a simple user defined CLR table value function.
[SqlFunction(Name = "getArray", FillRowMethodName = "FillRow")]
public static IEnumerable getArray(String name)
{
return my_arrays[name]; // returns the array I want to insert into db
}
public static void FillRow(Object o, out SqlDouble sdo)
{
sdo = new SqlDouble((double)o);
}
And this query:
INSERT INTO my_table SELECT data FROM dbo.getArray('x');
Works almost 2 times faster than bulk equivalent. The exact results are:
BULK - 330s (write to disk + insert)
TVF - 185s
Of course, this is due to write overhead, but I don't know if BULK insert have any in memory equivalent.
So my question is - is TVF better compering to the BULK (which is created for huge inserts), or am I missing something here. Is there any third alternative?
I use a SqlBulkCopy when I really need the very last drop of performance, that way you can skip the overhead of first putting it all on disk.
The SqlBulkCopy accepts an IDataReader that you have to implement, but only a few methods of the interface. What I always do is just create the class MyBulkCopySource : IDataReader, click 'Implement interface' and feed it to the BulkCopy as is to see wich method gets called. Implement that, try again etc. You only need to implement three of four of them, the rest never gets called.
AFAIK this is the fastest way to pump data from a C# program into a SqlDB.
GJ
Use SqlBulkCopy
From multiple threads with blocks like 30.000 rows each time.
NOT to the final table, but to a temporary table
From which you copy over, using a connection setting that does not honor locks.
This totally puts the smallest locking on the end table.

Hibernate and dry-running HQL queries statically

I'd like to "dry-run" Hibernate HQL queries. That is I'd like to know what actual SQL queries Hibernate will execute from given HQL query without actually executing the HQL query against real database.
I have access to hibernate mapping for tables, the HQL query string, the dialect for my database. I have also access to database if that is needed.
Now, how can I find out all the SQL queries Hibernate can generate from my HQL without actually executing the query against any database? Are there any tools for this?
Note, that many SQL queries can be generated from one HQL query and the set of generated SQL queries may differ based on the contents of database.
I am not asking how to log SQL queries while HQL query is executing.
Edit: I don't mind connecting to database to fetch some metadata, I just don't want to execute queries.
Edit: I also know what limits and offsets are applied to query. I also have the actual parameters that will be bind to query.
The short answer is "you can't". The long answer is below.
There are two approaches you can take:
A) Look into HQLQueryPlan class, particularly its getSqlStrings() method. It will not get you the exact SQL because further preprocessing is involved before query is actually executed (parameters are bound, limit / offset are applied, etc...) but it may be close enough to what you want.
The thing to keep in mind here is that you'll need an actual SessionFactory instance in order to construct HQLQueryPlan, which means you won't be able to do so without "connecting to any database". You can, however, use in-memory database (SqlLite and the likes) and have Hibernate auto-create necessary schema for it.
B) Start with ASTQueryTranslatorFactory and descend into AST / ANTLR madness. In theory you may be able to hack together a parser that would work without relying on metadata but I have a hardest time imagining what is it you're trying to do for this to be worth it. Perhaps you can clarify? There has to be a better approach.
Update: for an offline, dry-run of some HQL, using HQLQueryPlan directly is a good approach. If you want to intercept every query in the app, while it's running, and record the SQL, you'll have to use proxies and reflection as described below.
Take a look at this answer for Criteria Queries.
For HQL, it's the same concept - you have to cast to Hibernate implementation classes and/or access private members, so it's not a supported method, but it will work with a the 3.2-3.3 versions of Hibernate. Here is the code to access the query from HQL (query is the object returned by session.createQuery(hql_string):
Field f = AbstractQueryImpl.class.getDeclaredField("session");
f.setAccessible(true);
SessionImpl sessionImpl = (SessionImpl) f.get(query);
Method m = AbstractSessionImpl.class.getDeclaredMethod("getHQLQueryPlan", new Class[] { String.class, boolean.class });
m.setAccessible(true);
HQLQueryPlan plan = (HQLQueryPlan) m.invoke(sessionImpl, new Object[] { query.getQueryString(), Boolean.FALSE });
for (int i = 0; i < plan.getSqlStrings().length; ++i) {
sql += plan.getSqlStrings()[i];
}
I would wrap all of that in a try/catch so you can go on with the query if the logging doesn't work.
It's possible to proxy your session and then proxy your queries so that you can log the sql and the parameters of every query (hql, sql, criteria) before it runs, without the code that builds the query having to do anything (as long as the initial session is retrieved from code you control).