I am asking a question that is related to "Execute multiple SQL commands in one round trip" but not exactly the same, because I am having this problem on a much bigger scale:
I have an input file with many different SQL commands (ExecuteNonQuery) that I have to process with a .NET application.
Example:
INSERT INTO USERS (name, password) VALUES (@name, @pw); @name="abc"; @pw="def";
DELETE FROM USERS WHERE name=@name; @name="ghi";
INSERT INTO USERS (name, password) VALUES (@name, @pw); @name="mno"; @pw="pqr";
All of the commands have parameters, so I would like to use the parameter mechanism that .NET provides. But my application has to read these statements and execute them within an acceptable time span. There might be several thousand statements in a single file.
My first thought was to use SqlCommand with parameters, since that would really be the way to do it properly (parameters are escaped by .NET), but I can't afford to wait 50 ms for each command to complete (network communication with the DB server, ...). I need a way to chain the commands.
My second thought was to escape and insert the parameters myself so I could combine multiple commands in one SqlCommand:
INSERT INTO USERS (name, password) VALUES ('abc', 'def'); DELETE FROM USERS WHERE name='ghi'; INSERT INTO USERS (name, password) VALUES ('mno', 'pqr');
However, I feel uneasy about this solution because I don't like escaping the input myself when there are predefined functions to do it.
What would you do? Thanks for your answers, Chris
Assuming everything in the input is valid, what I would do is this:
Parse out the parameter names and values
Rewrite the parameter names so they are unique across all queries (i.e., so you would be able to execute two queries with a @name parameter in the same batch)
Group together a bunch of queries into a single batch and run the batches inside a transaction
The reason why you (likely) won't be able to run this all in a single batch is that there is a limit of 2100 parameters per batch (at least there was when I did the same thing with SQL Server); depending on the performance you get, you'll want to tweak the batch size. 250-500 statements per batch worked best for my workload; YMMV.
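For illustration, one such batch built from the example input might look like this once the parameter names have been made unique (just a sketch; the numeric suffix is one possible convention):
INSERT INTO USERS (name, password) VALUES (@name1, @pw1);   -- was @name/@pw of statement 1
DELETE FROM USERS WHERE name = @name2;                      -- was @name of statement 2
INSERT INTO USERS (name, password) VALUES (@name3, @pw3);   -- was @name/@pw of statement 3
The client then adds @name1='abc', @pw1='def', @name2='ghi', @name3='mno', @pw3='pqr' to the single SqlCommand, so the values still go through the normal parameter mechanism.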
One thing I would not do is multi-thread this. If the input is arbitrary, the program has no idea if the order of the operations is important; therefore, you can't start splitting up the queries to run simultaneously.
Honestly, as long as you can get the queries to the server somehow, you're probably in good shape. With only "multiple thousands" of statements, the whole process won't take very long. (The application I wrote had to do this with several million statements.)
Interesting dilemma..
I would suggest any of these:
Do you have control over the SQL Server machine? Create a stored procedure that loads the file and does the work (see the sketch after this list).
Use SqlCommand, but cache the parameters, and read only the command type (DELETE, INSERT, etc.) and the values from each line to execute.
Use multiple threads: a parent thread to read the lines and hand them over to other threads, one to do the inserts, another to do the deletions, or as many as needed. Look at Tasks.
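For the first suggestion, a rough sketch of what the server-side load could look like (the file path, staging table and follow-up parsing procedure are assumptions for illustration; the file must be readable from the SQL Server machine):
-- Load the raw lines of the input file into a staging table on the server,
-- then parse and execute them in T-SQL instead of sending them one by one.
CREATE TABLE dbo.CommandStaging (RawLine NVARCHAR(MAX));

BULK INSERT dbo.CommandStaging
FROM 'C:\import\commands.txt'
WITH (ROWTERMINATOR = '\n');

-- A stored procedure would then read RawLine, pick out the values and apply
-- the corresponding INSERTs/DELETEs in set-based statements.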
I have a database with lots of text that is all in capital letters and should now be converted into normal lower/upper case writing. This applies to technical text, for example
CIRCUIT BREAKER(D)480Y/277VAC,1.5A,2POL
This should be written as circuit breaker(D)480Y/277VAC,1.5A,2POL.
I think the only approach is to have a list of words with the correct spelling and then do something like a search & replace.
Can anybody give me a clue how to approach this in MS SQL?
You could do one of two things -
Write a simple script (in whichever language you are comfortable with, e.g. PHP, Perl, Python, etc.) that reads the columns from the DB, does the case conversion and updates the values back into the DB.
The advantage of this would be that you have greater flexibility and control over what you want to modify and how you wish to do it.
For this solution to work, you may need to maintain a dict/hash in the script holding the mapping of upper-case to lower-case keywords.
The second possible solution, if you do not wish to write a separate script, would be to create a SQL function instead that reads the corresponding rows/columns that need to be updated, performs the case conversion and writes/updates them back into the DB.
This might be slightly inefficient, depending on how you implement the function, but it takes away the dependency on writing a separate script.
For this solution to work, you may need to maintain another table holding the mapping of upper-case to lower-case keywords.
Whichever you are more comfortable with.
Create a mapping table as a dictionary. It could be a temporary table within the session. Load the table with values and use this temporary table for your update. If you want this to be a permanent solution to handle new rows coming in, then use a permanent table.
CREATE TABLE #dictionaryMap(sourceValue NVARCHAR(4000), targetValue NVARCHAR(4000));
CREATE TABLE #tableForUpdate(Term NVARCHAR(4000));
INSERT INTO #dictionaryMap
VALUES ('%BREAKER%','breaker'),('%CIRCUIT%','circuit');
INSERT INTO #tableForUpdate
VALUES ('CIRCUIT BREAKER(D)480Y/277VAC,1.5A,2POL');
Perform the UPDATE of #tableForUpdate with a WHILE loop in T-SQL, driven by #dictionaryMap.
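A minimal sketch of that loop (it consumes the rows of #dictionaryMap as it goes, so copy the table first if you still need it; the '%' wildcards are stripped for REPLACE and reused for the LIKE match):
DECLARE @source NVARCHAR(4000), @target NVARCHAR(4000);

WHILE EXISTS (SELECT 1 FROM #dictionaryMap)
BEGIN
    -- take one mapping
    SELECT TOP (1) @source = sourceValue, @target = targetValue
    FROM #dictionaryMap;

    -- rewrite every term containing the keyword
    UPDATE #tableForUpdate
    SET Term = REPLACE(Term, REPLACE(@source, '%', ''), @target)
    WHERE Term LIKE @source;

    -- done with this mapping
    DELETE FROM #dictionaryMap WHERE sourceValue = @source;
END

SELECT Term FROM #tableForUpdate;   -- 'circuit breaker(D)480Y/277VAC,1.5A,2POL'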
When using SQL in conjunction with another language, what data must be escaped? I was just reading the question here, and my understanding was that only data from the user must be escaped.
Also, must all SQL statements be escaped, e.g. INSERT, UPDATE and SELECT?
EVERY type of query in SQL must be properly escaped. And not only "user" data. It's entirely possible to inject YOURSELF if you're not careful.
e.g. in pseudo-code:
$name = sql_get_query("SELECT lastname FROM sometable");
sql_query("INSERT INTO othertable (badguy) VALUES ('$name')");
That data never touched the 'user', it was never submitted by the user, but it's still a vulnerability - consider what happens if the user's last name is O'Brien.
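Assuming the code wraps the value in single quotes as shown above, that second statement then becomes:
-- Broken on purpose to show the problem: the apostrophe in O'Brien closes the
-- string literal early - a syntax error at best, an injection point at worst.
INSERT INTO othertable (badguy) VALUES ('O'Brien')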
Most programming languages provide code for connecting to databases in a uniform way (for example JDBC in Java and DBI in Perl). These provide automatic techniques for doing any necessary escaping using Prepared Statements.
All SQL queries should be properly sanitized and there are various ways of doing it.
You need to prevent the user from trying to exploit your code using SQL Injection.
Injections can be made in various ways, for example through user input, server variables and cookie modifications.
Given a query like:
"SELECT * FROM tablename WHERE username= <user input> "
If the user input is not escaped, the user could do something like
' or '1'='1
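If the application splices that input between single quotes when building the query string, the statement the server actually receives looks like this:
SELECT * FROM tablename WHERE username = '' or '1'='1'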
Executing the query with this input will actually make it always true, possibly exposing sensitive data to the attacker. But there are many other, much worse scenarios injection can be used for.
You should take a look at the OWASP SQL Injection Guide. They have a nice overview of how to prevent those situations and various ways of dealing with it.
I also think it largely depends on what you consider 'user data' to be, or indeed where it originates from. I personally consider user data to be any data available (even if only through exploits) in the public domain, i.e. data that can be changed by 'a' user even if it's not 'the' user.
Marc B makes a good point, however, that in certain circumstances you may dirty your own data, so I guess it's always better to be safe than sorry with regard to SQL data.
I would note that for direct user input (i.e. from web forms, etc.) you should always have an additional layer of server-side validation before the data even gets near an SQL query.
I read here that the maximum number of parameters that can be passed to a Stored Procedure is 2100.
I am just curious what kind of system would require a SP with 2100 parameters to be passed, and couldn't one split that into multiple SPs?
I thought that maybe an SP that calls multiple SPs would require a lot of params to be passed; I just can't fathom writing out that disgusting EXEC statement.
The limit on procedure parameters predates both the XML data type and table-valued parameters, so back in those days there was simply no alternative. Having 2100 parameters on a procedure does not necessarily mean a human wrote it, nor that a human will call it. It is quite common for generated code, like code created by tools and frameworks, to push such boundaries in any language, since maintenance and refactoring of the generated code occur in the generating tool, not in the resulting code.
If you have a stored procedure using 2100 parameters you most likely have some sort of design problem.
Passing in a CSV list of values in a single parameter (and using a table-valued split function to turn those into rows), or using a table-valued parameter, would be much easier than handling all of those input parameters.
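As a minimal sketch of the table-valued-parameter route (SQL Server 2008+; the type, procedure and table names are made up for illustration):
-- A reusable table type...
CREATE TYPE dbo.IntList AS TABLE (Value INT NOT NULL);
GO
-- ...and a procedure that takes the whole list as one parameter.
CREATE PROCEDURE dbo.GetOrdersByIds
    @Ids dbo.IntList READONLY
AS
    SELECT o.*
    FROM dbo.Orders AS o
    JOIN @Ids AS i ON i.Value = o.Id;
GO
The caller fills the single @Ids parameter with as many rows as it likes instead of declaring hundreds of scalar parameters.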
I had a situation where I had to run something quite like the following:
SELECT
...
WHERE
ID IN (?,?,?,?...)
The list of parameters featured all the entities the user had permission to use in the system (it was dynamically generated by some underlying framework). It turns out that the DBMS had a limitation on the number of parameters that can be passed like this, and it was below 2100 (IIRC, it was Oracle and the maximum was 999 parameters in the IN list).
This would be a nice example of a rather long list of parameters to something that had to be a stored procedure (we had more than 999 and less than 2100 parameters to pass).
I don't know if the 999 constraint applies to SQL Server, but it is definitely a situation where the long list would be useful...
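One common workaround when you hit such a per-list limit is to split the values into chunks and OR the IN predicates together; a sketch with a deliberately tiny chunk size of three (the table and values are invented):
SELECT *
FROM items
WHERE id IN (1, 2, 3)     -- first chunk, up to the per-list limit
   OR id IN (4, 5, 6)     -- second chunk
   OR id IN (7, 8);       -- last, shorter chunk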
I'm here to share a consolidated analysis for the following scenario:
I have an 'Item' table and a search SP for it. I want to be able to search for multiple ItemCodes, like:
- Table structure : Item(Id INT, ItemCode nvarchar(20))
- Filter query format: SELECT * FROM Item WHERE ItemCode IN ('xx','yy','zz')
I want to do this dynamically using a stored procedure. I'll pass an @ItemCodes parameter which will have comma (',') separated values, and the search should be performed as above.
Well, I've already visited a lot of posts/forums. My constraints:
Dynamic SQL might be the least complex way, but I don't want to consider it because of concerns like performance and security (SQL injection, etc.).
Other approaches like XML, etc. are also out if they make things complex.
And finally, no extra temp-table JOIN kind of performance-hitting tricks, please.
I have to manage the performance as well as the complexity.
Here are some threads I've looked at:
T-SQL stored procedure that accepts multiple Id values
Passing an "in" list via stored procedure
I've reviewed the above two posts and gone through some of the solutions provided; here are some limitations:
http://www.sommarskog.se/arrays-in-sql-2005.html
This will require me to 'declare' the parameter type while passing it to the SP, which distorts the abstraction (I don't set the type on any of my parameters because each of them is treated in a generic way).
http://www.sqlteam.com/article/sql-server-2008-table-valued-parameters
This is a structured approach, but it increases complexity, requires DB-structure-level changes, and is not as abstract as the above.
http://madprops.org/blog/splitting-text-into-words-in-sql-revisited/
Well, this seems to match up with my old solutions. Here's what I did in the past:
I created a SQL function: [GetTableFromValues] (it returns a table populated with each item (one per row) from the comma-separated @ItemCodes).
And here's how I use it in my WHERE clause filter in the SP:
SELECT * FROM Item WHERE ItemCode IN (SELECT * FROM [dbo].[GetTableFromValues](@ItemCodes))
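For reference, a minimal sketch of such a split function (my reconstruction from the description above, not the exact code; it assumes plain comma-separated values without quotes):
CREATE FUNCTION [dbo].[GetTableFromValues] (@ItemCodes NVARCHAR(MAX))
RETURNS @Result TABLE (Value NVARCHAR(20))
AS
BEGIN
    DECLARE @pos INT;
    SET @pos = CHARINDEX(',', @ItemCodes);
    WHILE @pos > 0
    BEGIN
        -- everything up to the next comma is one item code
        INSERT INTO @Result (Value) VALUES (LTRIM(RTRIM(LEFT(@ItemCodes, @pos - 1))));
        SET @ItemCodes = SUBSTRING(@ItemCodes, @pos + 1, LEN(@ItemCodes));
        SET @pos = CHARINDEX(',', @ItemCodes);
    END
    -- the remainder after the last comma
    IF LEN(@ItemCodes) > 0
        INSERT INTO @Result (Value) VALUES (LTRIM(RTRIM(@ItemCodes)));
    RETURN;
END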
This one is reusable, and it looks simple and short (comparatively, of course). Is there anything I've missed, or does any expert have a better solution (obviously 'within' the limitations of the points mentioned above)?
Thank you.
I think using dynamic T-SQL would be pragmatic in this scenario. If you are careful with the design, dynamic SQL works like a charm. I have leveraged it in countless projects where it was the right fit. With that said, let me address your two main concerns: performance and SQL injection.
With regard to performance, read the T-SQL reference on parameterized dynamic SQL and sp_executesql (instead of sp_execute). A combination of parameterized SQL and sp_executesql will get you out of the woods on performance by ensuring that query plans are reused and recompiles are avoided. I have used dynamic SQL even in real-time contexts and it works like a charm with these two items taken care of. For your own satisfaction you can run a loop of a million or so calls to the SP with and without the two optimizations, and use SQL Profiler to track SP:Recompile events.
Now, about SQL injection. This will be an issue if you use the wrong kind of user widget, such as a free-text textbox, to let the user enter the item codes. In that scenario it is possible that a hacker could write SELECT statements and try to extract information about your system. You can write code to prevent this, but I think going down that route is a trap. Instead, consider an appropriate widget such as a listbox (depending on your front-end platform) that allows multiple selection. The user then just selects from a list of "presented items" and your code generates the string containing the corresponding item codes. Basically, you never pass user-typed text to the dynamic SQL SP. You can even use slicker jQuery-based selection widgets, but the bottom line is that the user does not get to type any unacceptable text that reaches your data layer.
Next, you just need a simple stored procedure on the database that takes a parameter for the item codes (e.g. '''xyz''','''abc'''). Internally it should use sp_executesql with a parameterized dynamic query.
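A minimal sketch of the procedure described here (the procedure name, the optional scalar filter and the column names are assumptions; @ItemCodes is built only by your own code from the selected list items, never from free-form user text):
CREATE PROCEDURE dbo.SearchItems
    @ItemCodes NVARCHAR(MAX),        -- e.g. N'''xx'',''yy'',''zz'''
    @MaxId     INT = NULL            -- example of a filter that IS parameterized
AS
BEGIN
    DECLARE @sql NVARCHAR(MAX);
    SET @sql = N'SELECT Id, ItemCode FROM dbo.Item
                 WHERE ItemCode IN (' + @ItemCodes + N')
                   AND (@MaxId IS NULL OR Id <= @MaxId);';

    -- sp_executesql with a parameter list lets the plan be reused for a given
    -- shape of the IN list and avoids recompiles for the scalar parameter.
    EXEC sys.sp_executesql @sql, N'@MaxId INT', @MaxId = @MaxId;
END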
I hope this helps.
-Tabrez
Scenario:
I load some data into a local MySQL database each day, about 2 million rows;
I have to (have to - it's an audit/regulatory thing) move to a "properly" administered server, which currently looks to be Oracle 10g;
The server is in a different country: a network round trip currently takes 60-70 ms;
Input is a CSV file in denormalised form: I normalise the data before loading, and each line typically results in 3-8 INSERTs across up to 4 tables;
The load script is currently implemented in Ruby, using ActiveRecord and fastercsv. I've tried the ar-extensions gem, but it assumes that the MySQL-style multiple-VALUES clause will work. It doesn't.
EDIT: Extremely useful answers already - thank you! More about that pesky input file: the number of fields is variable and the positions have changed a few times, so my current script determines content by analysing the header row (well, fastercsv and a cunning converter do it). So a straight upload plus post-processing SQL wouldn't work without several versions of the load file, which is horrible. Also, it's a German CSV file: semicolon-delimited (no big deal) and decimals indicated by commas (a rather bigger deal unless we load them as VARCHAR and text-process afterwards - ugh).
The problem:
Loading 2 million rows at about 7/sec is going to take rather more than 24 hours! That's likely to be a drawback with a daily process, not to mention that the users would rather like to be able to access the data about 5 hours after it becomes available in CSV form!
I looked at applying multiple inserts per network trip: the rather ungainly INSERT ALL... syntax would be fine, except that at present I'm applying a unique id to each row using a sequence. It transpires that
INSERT ALL
INTO tablea (id,b,c) VALUES (tablea_seq.nextval,1,2)
INTO tablea (id,b,c) VALUES (tablea_seq.nextval,3,4)
INTO tablea (id,b,c) VALUES (tablea_seq.nextval,5,6)
SELECT 1 FROM dual;
(did I say it was ungainly?) tries to use the same id for all three rows. The Oracle docs appear to confirm this.
Latest attempt is to send multiple INSERTs in one execution, e.g.:
INSERT INTO tablea (id,b,c) VALUES (tablea_seq.nextval,1,2);
INSERT INTO tablea (id,b,c) VALUES (tablea_seq.nextval,3,4);
INSERT INTO tablea (id,b,c) VALUES (tablea_seq.nextval,5,6);
I haven't found a way to persuade Oracle to accept that.
The Question(s)
Have I missed something obvious? (I'd be so pleased if that turned out to be the case!)
If I can't send multiple inserts, what else could I try?
Why Accept That One?
For whatever reason, I prefer to keep my code as free from platform-specific constructs as possible: one reason this problem arose is that I'm migrating from MySQL to Oracle; it's possible another move could occur one day for geographical reasons, and I can't be certain about the platform. So getting my database library to the point where it can use a text SQL command to achieve reasonable scaling was attractive, and the PL/SQL block accomplishes that. Now if another platform does appear, the change will be limited to changing the adapter in code: a one-liner, in all probability.
How about shipping the CSV file to the Oracle DB server, using SQL*Loader to load it into a staging table, and then running a stored procedure to transform the data and INSERT it into the final tables?
You could use:
insert into tablea (id, b, c)
select tablea_seq.nextval, b, c
from ( select 1 as b, 2 as c from dual union all
       select 3, 4 from dual union all
       select 5, 6 from dual union all
       ...
     );
(The sequence has to be referenced in the outer query, because NEXTVAL is not allowed in a SELECT that is combined with UNION ALL.) This works up to about 1024 lines, if I remember correctly.
You could also send it as a PL/SQL batch instruction:
BEGIN
INSERT INTO tablea (id,b,c) VALUES (tablea_seq.nextval,1,2);
INSERT INTO tablea (id,b,c) VALUES (tablea_seq.nextval,3,4);
INSERT INTO tablea (id,b,c) VALUES (tablea_seq.nextval,5,6);
...
COMMIT;
END;
I would load the raw CSV file into a dedicated table in the database using SQL*Loader without normalising it, then run code against the data in that table to normalise it out into the various required tables. This minimises the round trips and is a pretty conventional approach in the Oracle community.
SQL*Loader can be a little challenging in terms of the initial learning curve, but you can always post again if you get stuck.
http://download.oracle.com/docs/cd/B19306_01/server.102/b14215/part_ldr.htm#i436326
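Once the raw rows are in a staging table, the normalisation step is plain set-based SQL run entirely on the server. A sketch with an invented staging table and column names, reusing tablea from the question (the REPLACE handles the German decimal comma):
INSERT INTO tablea (id, b, c)
SELECT tablea_seq.nextval,
       s.b,
       TO_NUMBER(REPLACE(s.c_text, ',', '.'))   -- ',' decimal separator -> '.'
FROM   staging_csv s;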
I have a quick suggestion. I'm from the MySQL world but was trained on Oracle; I think this will work.
In MySQL you can insert multiple records with a single insert statement. It looks like this:
INSERT INTO table_name (column_one, column_two, column_three, column_four)
VALUES
('a', 'one', 'alpha', 'uno'),        -- Row 1
('b', 'two', 'beta', 'dos'),         -- Row 2
('c', 'three', 'gamma', 'tres'),     -- etc.
....
('z', 'twenty-six', 'omega', 'veintiséis');
Now obviously you can only insert into one table at a time, and you wouldn't want to do 2 million records in one statement, but you could easily do 10 or 20 or 100 at a time (if you are allowed packets that big). You may have to generate this by hand; I don't know if the frameworks you are using (or any, for that matter) support making this kind of code for you.
In the MySQL world this DRAMATICALLY speeds up inserts. I assume it does all the index updates and such at the same time, but it also prevents it from having to re-parse the SQL on each insert.
If you combine this with prepared statements (so the SQL is cached and doesn't have to be parsed out each time) and transactions (to make sure things are always in a sane state when you have to do inserts across multiple tables), I think you will be doing pretty well.
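A tiny sketch of the transaction part for one CSV line (table and column names are invented): the related inserts are wrapped so the normalised tables never end up half-written.
START TRANSACTION;
INSERT INTO parent_table (code, label) VALUES ('a', 'one');
INSERT INTO child_table (parent_code, detail)
VALUES ('a', 'alpha'), ('a', 'uno');       -- multi-row insert, as above
COMMIT;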
NunoG is right that you can load the CSVs directly into Oracle. You may be best off reading in the input file, generating a normalized set of CSV files (one for each table), and then loading in each of those one at a time.
Instead of executing the SQL over the network, you could write the INSERTs to a text file, move that over the network and run it locally there.
SQL*Loader is an Oracle-supplied utility that allows you to load data from a flat file into one or more database tables. This will be 10-100 times faster than inserts using queries.
http://www.orafaq.com/wiki/SQL*Loader_FAQ
If SQL*Loader doesn't cut it, try a small preprocessor program that reformats the file into a SQL*Loader-readable format.