INSERTing data from a text file into SQL Server (speed? method?)

Got about a 400 MB .txt file here that is delimited by '|'. Using a Windows Form with C#, I'm inserting each row of the .txt file into a table in my SQL server database.
What I'm doing is simply this (shortened by "..." for brevity):
while ((line = file.ReadLine()) != null)
{
    string[] split = line.Split(new Char[] { '|' });
    SqlCommand cmd = new SqlCommand("INSERT INTO NEW_AnnualData VALUES (@YR1984, @YR1985, ..., @YR2012)", myconn);
    cmd.Parameters.AddWithValue("@YR1984", split[0]);
    cmd.Parameters.AddWithValue("@YR1985", split[1]);
    ...
    cmd.Parameters.AddWithValue("@YR2012", split[28]);
    cmd.ExecuteNonQuery();
}
Now, this is working, but it is taking a while. This is my first time doing anything with a huge amount of data, so I need to make sure that A) I'm doing this in an efficient manner, and B) my expectations aren't too high.
Using a SELECT COUNT() while the loop is going, I can watch the number go up and up over time. So I used a clock and some basic math to figure out the speed at which things are working. In 60 seconds, there were 73881 inserts - that's 1231 inserts per second. The question is: is this an average speed, or am I getting poor performance? If the latter, what can I do to improve it?
I did read something about SSIS being efficient for exactly this purpose. However, I need this action to come from clicking a button in a Windows Form, not going through SSIS.

Oooh - that approach is going to give you appalling performance. Try using BULK INSERT, as follows:
BULK INSERT MyTable
FROM 'e:\orders\lineitem.tbl'
WITH
(
FIELDTERMINATOR ='|',
ROWTERMINATOR ='\n'
)
This is the best solution in terms of performance. There is a drawback, in that the file must be present on the database server. There are two workarounds for this that I've used in the past, if you don't have access to the server's file system from where you're running the process. One is to install an instance of SQL Express on the workstation, add the main server as a linked server to the workstation instance, and then run "BULK INSERT MyServer.MyDatabase.dbo.MyTable...". The other option is to reformat the CSV file as XML, which can be processed very quickly, and then pass the XML to the query and process it using OPENXML. Both BULK INSERT and OPENXML are well documented on MSDN, and you'd do well to read through the examples.
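Since you need this to fire from a button click, note that you can send the BULK INSERT statement from your C# code like any other command. A minimal sketch, assuming an open SqlConnection named myconn as in the question (the path is the example one above and must be readable by the SQL Server service account):
using System.Data.SqlClient;

// Execute the BULK INSERT from the WinForms app; the server, not the app,
// reads the file, so the path must be visible to the SQL Server instance.
string bulkSql = @"BULK INSERT NEW_AnnualData
                   FROM 'e:\orders\lineitem.tbl'
                   WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n')";
using (SqlCommand cmd = new SqlCommand(bulkSql, myconn))
{
    cmd.CommandTimeout = 0; // a 400 MB load can exceed the default 30 seconds
    cmd.ExecuteNonQuery();
}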

Have a look at SqlBulkCopy on MSDN, or the nice blog post here. For me that goes up to tens of thousands of inserts per second.

I'd have to agree with Andomar. I really quite like SqlBulkCopy. It is really fast (you need to play around with BatchSize to make sure you find one that suits your situation).
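For reference, here is a minimal sketch of that approach, reusing the table and file layout from the question above (the connection string and file path are placeholders, and the parsing is deliberately naive):
using System.Data;
using System.Data.SqlClient;
using System.IO;

// Load the pipe-delimited file into a DataTable, then bulk-copy it in batches.
var table = new DataTable();
for (int yr = 1984; yr <= 2012; yr++)
    table.Columns.Add("YR" + yr, typeof(string));

foreach (string line in File.ReadLines(@"C:\data\annual.txt")) // placeholder path
    table.Rows.Add(line.Split('|'));

using (var conn = new SqlConnection(@"Data Source=.;Initial Catalog=MyDb;Integrated Security=True"))
{
    conn.Open();
    using (var bulk = new SqlBulkCopy(conn))
    {
        bulk.DestinationTableName = "NEW_AnnualData";
        bulk.BatchSize = 5000;    // the value worth experimenting with
        bulk.BulkCopyTimeout = 0; // no timeout for large loads
        bulk.WriteToServer(table);
    }
}
For a 400 MB file you would probably want to feed WriteToServer an IDataReader instead of materializing the whole DataTable in memory, but the shape of the call is the same.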
For a really in-depth article discussing the various options, check out Microsoft's "Data Loading Performance Guide":
http://msdn.microsoft.com/en-us/library/dd425070(v=sql.100).aspx
Also, take a look at the C# example using SqlBulkCopy with CSV Reader. It isn't free, but if you can write a fast and accurate parser in less time, then go for it. At least it'll give you some ideas.

I have found SSIS to be much faster than this type of method, but there are a bunch of variables that can affect performance.
If you want to experiment with SSIS, use the Import and Export Wizard in Management Studio to generate an SSIS package that will import a pipe-delimited file. You can save out the package and run it from a .NET application.
See this article: http://blogs.msdn.com/b/michen/archive/2007/03/22/running-ssis-package-programmatically.aspx for info on how to run an SSIS package programmatically. It includes options on how to run from the client, from the server, or wherever.
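As a rough sketch of what that looks like (this uses the Microsoft.SqlServer.Dts.Runtime API that the article describes; the package path here is hypothetical):
using System;
using Microsoft.SqlServer.Dts.Runtime; // reference Microsoft.SqlServer.ManagedDTS

// Load a saved .dtsx package and execute it in-process. The Application class
// is fully qualified to avoid clashing with System.Windows.Forms.Application.
var app = new Microsoft.SqlServer.Dts.Runtime.Application();
Package package = app.LoadPackage(@"C:\packages\ImportPipeFile.dtsx", null); // hypothetical path
DTSExecResult result = package.Execute();

if (result != DTSExecResult.Success)
{
    // Surface the package errors somewhere useful (log, message box, etc.)
    foreach (DtsError error in package.Errors)
        Console.WriteLine(error.Description);
}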
Also, take a look at this article for additional ways you can improve bulk insert performance in general. http://msdn.microsoft.com/en-us/library/ms190421.aspx

Related

What is the fastest way to write to a db table from R?

I have a data table in R with 1.5M rows. I want to export this to a MS SQL db table.
I know I can do it this way:
dbWriteTable(conn,"benefit_custom.Trial_set",trial_set )
But it's very slow.
The other option I've tried is to write to a flat file and then create an SSIS pkg to transfer it to the db. This is not a problem, but the issue is that I have string and numeric data in my data table, and when R writes to the file, everything is varchar and is enclosed within quotes.
FileLocation <-"\\Benefit_Analysis_Input.dat"
FileName<- paste( bcpWorkspace,FileLocation,sep = "")
write.table(trial_set,file =FileName,append = FALSE, sep = "\t",col.names = T, row.names = F)
The 1st method preserves the data types like I want, but the performance is very bad. Does anyone have anything else I can try?
So I guess the data types can't be preserved if I'm writing to a flat file, so I have to choose the data types when I'm importing the flat file into the db.
Answering your question: the fastest seems to be rsqlserver.
As of now I know about:
rsqlserver: using System.Data.SqlClient drivers, only on Windows
RSQLServer: using Java drivers to SQL Server from any OS, via RJDBC
RODBC: using ODBC drivers, easy setup only on Windows
Still, Microsoft SQL Server looks to be quite poorly supported from the R session perspective.
Here is an interesting benchmark by the rsqlserver project: https://github.com/agstudy/rsqlserver/wiki/benchmarking
Also important to note regarding rsqlserver: a Linux version using Mono is planned.
Finally, my recent presentation on Data Warehousing with R covers DBI, RJDBC, RODBC examples.
I think @rhealitycheck is on the right track - I would use the SQL Import and Export Data wizard to generate an SSIS package. I would save it and customize it later, for example adding an upstream Execute Process Task to call R and write the text file out.
The performance and flexibility of this solution is hard to beat.

Inserting multiple rows in Webmatrix database

I'm sure this is a comparatively simple question, but after several hours, I haven't been able to find the answer, and I'm still somewhat new to the wonderful world of databasing.
I'm starting to use Webmatrix and I'm trying to import a database with 13 columns and a little over 200 rows (just enough so I don't want to type it all out again).
All my work remains offline and won't go online for at least another few months. So far I have worked on a Mac (PHP, MySQL, etc.) and just switched to PC to try WebMatrix. Since I couldn't find a simple enough feature to import a database in WebMatrix, I figured the easiest way might be to export an SQL file from phpMyAdmin on my Mac and execute it once via a cshtml page (coding razor).
With a few adjustments I recreated my table in this way and I'm also able to insert values in individual rows, however, I can't seem to be able to insert more than one row at a time.
My intended code is basically:
var db = Database.Open("Database");
var insertQuery = "INSERT INTO table (field1, field2, field3) VALUES ('val1', 'val2', 'val3'),('val4','val5','val6'),(etc)";
db.Execute(insertQuery);
Could anyone shed some light on what might be going wrong here?
I also looked at other methods of importing databases, I read about MySQL benchmark, but admit that's going a bit over my head.
Thanks in advance.
Your options depend on the type of database you are using. If you are using the default WebMatrix database (SQL Server Compact Edition 4.0), you can save the PHP data to a CSV file and then, in WebMatrix, read it using File.ReadAllLines. That will give you an array of strings, each containing comma-separated values representing a row of data. You can use string.Split to create an array of individual values and insert them within a foreach loop:
var rows = File.ReadAllLines(path_to_csv_file);
foreach(var row in rows){
    var vals = row.Split(new[]{','});
    // @0, @1, @2 are WebMatrix positional parameters
    var sql = "INSERT INTO table(f1, f2, f3) VALUES (@0, @1, @2)";
    db.Execute(sql, vals[0], vals[1], vals[2]);
}
If you are using SQL Server, you can use BULK INSERT for a file. There are also a couple of other approaches for SQL CE: Bulk Insert In SQL Server CE

What problems may occur while querying SQL databases with big amount of data over internet

I have this big database on an MSSQL server that contains data indexed by a web crawler.
Every day I want to update the SOLR SearchEngine index using a DataImportHandler which is situated on another server in another network.
Solr DataImportHandler uses query to get data from SQL. For example this query
SELECT * FROM DB.Table WHERE DateModified > Config.LastUpdateDate
The ImportHandler does 8 selects of this type. Each select will get around 1000 rows from the database.
To connect to SQL Server I am using com.microsoft.sqlserver.jdbc.SQLServerDriver
The parameters I can add for connection are:
responseBuffering="adaptive/all"
batchSize="integer"
So my question is:
What can go wrong while doing these queries every day? (except network errors)
I want to know how SQL Server behaves in this context.
Furthermore, I have to take a decision regarding the way I will implement this importing and how to handle errors, but first I need to know what errors can arise.
Thanks!
Later edit
My problem is that I don't know how these SQL queries can fail. When I am calling this importer every day it does 10 queries to the database. If the 5th query fails I have two options:
rollback the entire transaction and do it again, or commit the data I got from the first 4 queries and somehow redo queries 5 to 10. But if these queries always fail because of some other problem, I need to think of another way to import this data.
Can these SQL queries over the internet fail because of timeouts or something like that?
The only problem I identified after working with this type of import is:
Network problems - if the network connection fails, SOLR rolls back any changes and the commit doesn't take place. In my program I identify this as an error and don't log the changes in the database.
Thanks @GuidEmpty for providing his comment and clarifying this for me.
There could be issues with permissions (not sure if you control these).
Might be a good idea to catch exceptions you can think of and include a catch all (Exception exp).
Then take the overall one as a worst case, roll back (where you can), and log the exception to examine later.
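A hedged sketch of that catch structure in C# (the post's context is Java/JDBC, but the pattern is the same; all names here are illustrative, and as noted below, the rollback only matters if the import modifies data):
using System;
using System.Data.SqlClient;

// Wrap the daily import in a transaction, catch the exceptions you expect,
// and include a catch-all for the worst case.
static void RunDailyImport(string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (SqlTransaction tx = conn.BeginTransaction())
        {
            try
            {
                // ... run the 8-10 import queries here, enlisted in tx ...
                tx.Commit();
            }
            catch (SqlException exp) // timeouts, permissions, dropped connections
            {
                tx.Rollback();
                Console.Error.WriteLine(exp); // log it to examine later
            }
            catch (Exception exp) // the catch-all worst case
            {
                tx.Rollback();
                Console.Error.WriteLine(exp);
            }
        }
    }
}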
You don't say what types you are selecting either; keep in mind that text/blob columns can take a lot more space and could cause issues internally if you buffer any data, etc.
Though on a quick re-read, you don't need to roll back if you are only selecting.
I think you would be better off having a think about what you are hoping to achieve and whether knowing all possible problems will help.
HTH

How to recover the old data from table

I have made an update statement in a table in SQL 2008 which updated the table with some wrong data.
I didn't have a backup for the DB.
It's some important dates which got updated.
Is there any way I can recover the old data from the table?
Thanks
SNA
Basically no, unless you want to use a commercial log reader and try to go through it with a fine-tooth comb. Having no backup of the database can be an 'update resume, leave town' scenario - harsh, but it just should not happen.
Andrew basically has called it. I just want to add a few ideas you can consider if you are desperate:
Are there any reports or printouts lying around? Perhaps you can reconstruct the data from there.
Was this data entered via a web application? If so, there is a remote chance you can find the original data in the web server logs, depending upon how the app was constructed, etc.
Does this app interface (pass data to) any other applications? They may have a buffered copy of data...
Can the data be derived from any other existing data? Is there an audit log table, or another date in your schema based on this one, from which you can reconstruct the original date?
Edit:
Some commenters are mentioning that it is a good idea to test your update/delete statements before running them. For this to become habit, it helps if you have an easy method. I usually create my DELETE statements like this:
--delete --select *
from [User]
where UserID=27
To run the select in order to test your query, highlight everything from select onwards. To then run the delete if you are satisfied with the filter criteria, highlight everything from delete onwards. The two dashes in front of delete are so that if the query accidentally gets run, it will just crash due to invalid syntax.
You can use a similar construct for UPDATE statements, although it is not quite as clean.
SQL Server keeps a log for every transaction, so you can recover your modified data from the log even without a backup.
Select [PAGE ID],[Slot ID],[AllocUnitId],[Transaction ID]
,[RowLog Contents 0], [RowLog Contents 1],[RowLog Contents 3],[RowLog Contents 4]
,[Log Record]
FROM sys.fn_dblog(NULL, NULL)
WHERE
AllocUnitId IN
(Select [Allocation_unit_id] from sys.allocation_units allocunits
INNER JOIN sys.partitions partitions ON (allocunits.type IN (1, 3)
AND partitions.hobt_id = allocunits.container_id) OR (allocunits.type = 2
AND partitions.partition_id = allocunits.container_id)
Where object_id=object_ID('' + 'dbo.student' + ''))
AND Operation in ('LOP_MODIFY_ROW','LOP_MODIFY_COLUMNS')
And [Context] IN ('LCX_HEAP','LCX_CLUSTERED')
Here is the article that explains, step by step, how to do it.
http://raresql.com/2012/02/01/how-to-recover-modified-records-from-sql-server-part-1/
Imran
Thanks for all the responses.
The problem was that I accidentally left out the WHERE condition in the update statement.
It was a quick 5-minute task (just changing the date to test one customer's data), so we didn't think of taking a backup.
Yes, of course, you are right. This is a lesson.
From now on I will be careful to write my update statements in a transaction, or to test my update statements first.
Thanks once again for spending your time to give some insight rather than ignoring the question since the only answer is "NO".
Thanks
SNA
Always take a backup before major UPDATE statements; even if it's not used, there's the peace of mind.
Especially with Red Gate's Object Level Restore, one can now restore an individual table/row given a backup file.
Good luck, I'd suggest finding an old copy elsewhere (DEV/QA) etc...
Isn't it possible to do a rollback on an UPDATE statement?
Late one but hopefully useful…
If database is in full recovery mode then all transactions are logged in transaction log and can be retrieved. Problem is that this is not natively supported because this is not the main purpose of the transaction log.
Options are:
Commercial tools such as ApexSQL Log (more expensive, more options) or Quest Toad (less expensive, fewer options for this purpose; its main focus is on SQL Server management)
Trying to do this yourself, like user1059637 pointed out. The problem with this approach is that it can’t read transaction log backups and is more tedious.
It comes down to how much your data is worth to you in terms of time and $.

Managing and debugging SQL queries in MS Access

MS Access has limited capabilities to manage raw SQL queries: the editor is quite bad, no syntax highlighting, it reformats your raw SQL into a long string and you can't insert comments.
Debugging complex SQL queries is a pain as well: either you have to split them into many smaller queries that become difficult to manage when your schema changes, or you end up with a giant query that is a nightmare to debug and update.
How do you manage your complex SQL queries in MS Access and how do you debug them?
Edit
At the moment, I'm mostly just using Notepad++ for some syntax colouring and SQL Pretty Printer for reformatting sensibly the raw SQL from Access.
Using an external repository is useful, but there's always the risk of the two versions getting out of sync, and you still have to remove comments before trying the query in Access...
For debugging, I edit them in a separate text editor that lets me format them sensibly. When I find I need to make changes, I edit the version in the text editor, and paste it back to Access, never editing the version in Access.
Still a major PITA.
I have a few tips that are specific to SQL in VBA.
Put your SQL code in a string variable. I used to do this:
Set RS = DB.OpenRecordset("SELECT ...")
That is hard to manage. Do this instead:
strSQL = "SELECT ..."
Set RS = DB.OpenRecordset(strSQL)
Often you can't fix a query unless you see just what's being run. To do that, dump your SQL to the Immediate Window just before execution:
strSQL = "SELECT ..."
Debug.Print strSQL
Stop
Set RS = DB.OpenRecordset(strSQL)
Paste the result into Access' standard query builder (you must use SQL view). Now you can test the final version, including code-handled variables.
When you are preparing a long query as a string, break up your code:
strSQL = "SELECT wazzle FROM bamsploot" _
& vbCrLf & "WHERE plumsnooker = 0"
I first learned to use vbCrLf when I wanted to prettify long messages to the user. Later I found it makes SQL more readable while coding, and it improves the output from Debug.Print. (Tiny other benefit: no space needed at end of each line. The new line syntax builds that in.)
(NOTE: You might think this will let you add comments to the right of the SQL lines. Prepare for disappointment.)
As said elsewhere here, trips to a text editor are a time-saver. Some text editors provide better syntax highlighting than the official VBA editor. (Heck, StackOverflow does better.) It's also efficient for deleting Access cruft like superfluous table references and piles of parentheses in the WHERE clause.
Work flow for serious trouble shooting:
VBA Debug.Print > (capture query during code operation)
query builder > (testing lab to find issues)
Notepad++ > (text editor for clean-up and review)
query builder > (checking, troubleshooting)
VBA
Of course, trouble shooting is usually a matter of reducing the complexity of a query until you're able to isolate the problem (or at least make it disappear!). Then you can build it back up to the masterpiece you wanted. Because it can take several cycles to solve a sticky problem, you are likely to use this work flow repeatedly.
I wrote Access SQL Editor-- an Add-In for Microsoft Access-- because I write quite a lot of pass-through queries, and more complex SQL within Access. This add-in has the advantage of being able to store formatted SQL (with comments!) within your Access application itself. When queries are copied to a new Access application, formatting is retained. When the built-in editor clobbers your formatting, the tool will show your original query and notify you of the difference.
It currently does not debug; if there was enough interest, I would pursue this-- but for the time being the feature set is intentionally kept small.
It is not free for the time being, but purchasing a license is very cheap. If you can't afford it, you can contact me. There is a free 14-day trial here.
Once it's installed, you can access it through your Add-Ins menu (In Access 2010 it's Database Tools->Add Ins).
Debugging is more of a challenge. If a single column is off, that's usually pretty easy to fix. But I'm assuming you have more complex debugging tasks that you need to perform.
When flummoxed, I typically start debugging with the FROM clause. I trace back to all the tables and sub-queries that comprise the larger query, and make sure that the joins are properly defined.
Then I check my WHERE clause. I run lots of simple queries on the tables, and on the sub-queries that I've already checked or that I already trust, and make sure that when I run the larger query, I'm getting what I expect with the WHERE conditions in place. I double-check the JOIN conditions at the same time.
I double-check my column definitions to make sure I'm retrieving what I really want to see, especially if the formulas involved are complicated or include something tricky like a correlated subquery in a column definition.
Then I check to see if I'm grouping data properly, making sure that DISTINCT and UNION without UNION ALL don't remove necessary duplicates.
I don't think I've ever encountered a SQL query that couldn't be broken down this way. I'm not always as methodical as this, but it's a good way to start breaking down a real stumper.
One thing I could recommend when you write your queries is this: Never use SELECT * in production code. Selecting all columns this way is a maintenance nightmare, and it leads to big problems when your underlying schemas change. You should always write out each and every column if you're writing SQL code that you'll be maintaining in the future. I saved myself a lot of time and worry just by getting rid of "SELECT *"'s in my projects.
The downside to this is that those extra columns won't appear automatically in queries that refer to "SELECT *" queries. But you should be aware of how your queries are related to each other, anyway, and if you need the extra columns, you can go back and add them.
There is some hassle involved in maintaining a code repository, but if you have versioning software, the hassle is more than worth it. I've heard of ways of versioning SQL code written in Access databases, but unfortunately, I've never used them.
If you're doing really complex queries in MS Access, I would consider keeping a repository of those queries somewhere outside of the Access database itself... for instance, in a .sql file that you can then edit in an editor like Intype that will provide syntax highlighting. It'll require you to update queries in both places, but you may end up finding it handy to have an "official" spot for it that is formatted and highlighted correctly.
Or, if at all possible, switch to SQL Server 2005 Express Edition, which is also free and will provide you the features you desire through the SQL Management Studio (also free).
Expanding on this suggestion from Smandoli:
NO: DoCmd.RunSQL ("SELECT ...")
YES: strSQL = "SELECT ..."
DoCmd.RunSQL (strSQL)
If you want to keep the SQL code in an external file, for editing with your favorite text editor (with syntax coloring and all that), you could do something like this:
' On initialization:
Public strSQL As String

Sub LoadSQL()
    Dim f As Integer
    f = FreeFile
    Open "strSQL.sql" For Input As #f
    strSQL = Input(LOF(f), f)
    Close #f
End Sub

' To do the select:
DoCmd.RunSQL strSQL
This may be a bit clunky -- maybe a lot clunky -- but it avoids the consistency issues of edit-copy-paste.
Obviously this doesn't directly address debugging SQL, but managing code in a readable way is a part of the problem.
Similar to recursive, I use an external editor to write my queries. I use Notepad++ with the Light Explorer extension for maintaining several scripts at a time, and Notepad2 for one-off scripts. (I'm kind of partial to Scintilla-based editors.)
Another option is to use the free SQL Server Management Studio Express, which comes with SQL Server Express. (EDIT: Sorry, EdgarVerona, I didn't notice you mentioned this already!) I normally use it to write SQL queries instead of using Access, because I typically use ODBC to link to a SQL Server back end anyway. Beware that the differences in the syntax of T-SQL, used by SQL Server, and Jet SQL, used by Access MDB's, are sometimes substantial.
Are you talking here about what MS Access calls 'queries' and SQL calls 'views', or about 'MS Access pass-through' queries, which are SQL queries? Someone could easily get lost! My solution is the following:
free SQL Server Management Studio Express, where I will elaborate and test my queries
a query table on the client side, with one field for the query name (id_Query) and another one (queryText, memo type) for the query itself.
I then have a small function getSQLQuery in my VBA code to be used when I need to execute a query (either returning a recordset or not):
Dim myQuery As String
Dim rsADO As ADODB.Recordset
Set rsADO = New ADODB.Recordset
myQuery = getSQLQuery(myId_Query)
' if my query returns a recordset
Set rsADO = myADOConnection.Execute(myQuery)
' or, if no recordset is to be returned
myADOConnection.Execute myQuery
For views, it is even possible to keep them on the server side and to refer to them from the client side
Set rsADO = myADOConnection.Execute("dbo.myViewName")
Well, to my knowledge there are 2 options:
Notepad++ with the Poor Man's T-SQL Formatter plugin. I know there is already a mention of SQL Pretty Printer, but I haven't used it. My workflow is: I create the query in Access, copy-paste it to Notepad++, format it, work on it, then go back to Access. The only issue is that in some cases it pads in spaces, so [Forms]![AForm].[Ctrl] becomes [Forms] ! [AForm].[Ctrl], but I am used to it and it doesn't bother me.
SoftTree SQL Assistant (http://www.softtreetech.com/sqlassist/index.htm) brings just about everything you could want in a SQL editor. I have worked with it a bit in the past (trial), but its price tag is a bit stiff.