MS SQL - High performance data inserting with stored procedures - sql

I'm looking for a high-performance way to insert data into an MS SQL database.
The data is a (relatively big) graph of related objects. For security reasons I want to use stored procedures instead of direct table access.
Let's say I have a structure like this:
Document
  MetaData
    User
    Device
  Content
    ContentItem[0]
      SubItem[0]
      SubItem[1]
      SubItem[2]
    ContentItem[1]
      ...
    ContentItem[2]
      ...
Right now I'm thinking of creating one big query, something like this (just pseudo-code):
EXEC @DeviceID = CreateDevice ...;
EXEC @UserID = CreateUser ...;
EXEC @DocID = CreateDocument @DeviceID, @UserID, ...;
EXEC @ItemID = CreateItem @DocID, ...
EXEC CreateSubItem @ItemID, ...
EXEC CreateSubItem @ItemID, ...
EXEC CreateSubItem @ItemID, ...
...
But is this the best solution for performance? If not, what would be better?
Split it into more queries? Pass all the data to one big stored procedure to reduce the size of the query? Any other performance tips?
I also thought of passing multiple items to one stored procedure, but I don't think it's possible to pass a variable number of items to a stored procedure.
Since INSERT INTO A VALUES (B,C),(C,D),(E,F) is faster than three single inserts, I thought I could gain some performance here.
Thanks for any hints,
Marks

Use a single stored procedure as far as possible:
INSERT INTO MyTable (field1, field2)
SELECT 'firstValue', 'secondValue'
UNION ALL
SELECT 'anotherFirstValue', 'anotherSecondValue'
-- add further UNION ALL SELECT ... rows as needed
If you aren't sure how many items you're inserting, you can construct the SQL query within the sproc and then execute it. Here's a procedure I wrote to take a CSV list of groups and add their relationship to a user entity:
ALTER PROCEDURE [dbo].[UpdateUserADGroups]
    @username varchar(100),
    @groups varchar(5000)
AS
BEGIN
    DECLARE @pos int,
            @previous_pos int,
            @value varchar(50),
            @sql varchar(8000)
    SET @pos = 1
    SET @previous_pos = 0
    SET @sql = 'INSERT INTO UserADGroups(UserID, RoleName) '
    DECLARE @userID int
    SET @userID = (SELECT TOP 1 UserID FROM Users WHERE Username = @username)
    -- Walk the comma-separated list and emit one SELECT per group
    WHILE @pos > 0
    BEGIN
        SET @pos = CHARINDEX(',',@groups,@previous_pos+1)
        IF @pos > 0
        BEGIN
            SET @value = SUBSTRING(@groups,@previous_pos+1,@pos-@previous_pos-1)
            -- Double any embedded single quotes so the value stays inside its literal
            SET @sql = @sql + 'SELECT ' + cast(@userID as varchar(11)) + ',''' + REPLACE(@value,'''','''''') + ''' UNION ALL '
            SET @previous_pos = @pos
        END
    END
    -- Last item in the list (no trailing comma)
    IF @previous_pos < LEN(@groups)
    BEGIN
        SET @value = SUBSTRING(@groups,@previous_pos+1,LEN(@groups))
        SET @sql = @sql + 'SELECT ' + cast(@userID as varchar(11)) + ',''' + REPLACE(@value,'''','''''') + ''''
    END
    print @sql
    exec (@sql)
END
This is far faster than individual INSERTS.
Also, make sure you have just a single clustered index on the primary key; additional indexes slow the INSERT down because each of them has to be updated as well.
However, the more complex your dataset is, the less likely it is that you'll be able to do the above, so you will simply have to make logical compromises. I actually end up calling the above routine around 8000 times.
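On the question of passing a variable number of items to one procedure: on SQL Server 2008 or later, a table-valued parameter is another option that avoids building SQL strings. The following is only a minimal sketch under assumptions. The table dbo.SubItem, the type dbo.SubItemList, the procedure dbo.CreateSubItems, and their columns are made-up names for illustration, not part of the original schema.

```sql
-- Hypothetical target table, table type, and procedure; all names are assumptions.
CREATE TABLE dbo.SubItem
(
    SubItemID int IDENTITY(1,1) PRIMARY KEY,
    ItemID    int           NOT NULL,
    Name      nvarchar(100) NOT NULL
);
GO
CREATE TYPE dbo.SubItemList AS TABLE
(
    ItemID int           NOT NULL,
    Name   nvarchar(100) NOT NULL
);
GO
CREATE PROCEDURE dbo.CreateSubItems
    @SubItems dbo.SubItemList READONLY   -- table-valued parameters must be READONLY
AS
BEGIN
    SET NOCOUNT ON;
    -- One set-based INSERT for however many rows the caller supplied
    INSERT INTO dbo.SubItem (ItemID, Name)
    SELECT ItemID, Name
    FROM @SubItems;
END;
GO
-- Usage: fill a variable of the table type and pass it in one call
DECLARE @rows dbo.SubItemList;
INSERT INTO @rows (ItemID, Name)
VALUES (1, N'first'), (1, N'second'), (1, N'third');
EXEC dbo.CreateSubItems @SubItems = @rows;
```

The caller still goes through a stored procedure, so direct table access is not needed, and the whole batch of rows arrives in a single round trip.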

Related

Does sp_executesql support multiple values in one parameter and return multiple records?

I have created a stored procedure as shown below, but it's returning only one row instead of 3:
CREATE PROCEDURE [dbo].[tempsp]
(@RecycleIds NVARCHAR(MAX) = NULL)
AS
BEGIN
    DECLARE @Err INT
    DECLARE @WhereClause NVARCHAR(MAX)
    DECLARE @SQLText1 NVARCHAR(MAX)
    DECLARE @SQLText NVARCHAR(MAX)
    SET @SQLText1 = 'SELECT * FROM dbo.SKU '
    IF @RecycleIds IS NOT NULL
    BEGIN
        SET @SQLText = 'SELECT * FROM dbo.SKU WHERE SKU.SkuId IN (@RecycleIds)'
        EXEC sp_executesql @SQLText, N'@RecycleIds nvarchar', @RecycleIds
    END
    ELSE
    BEGIN
        EXEC(@SQLText1)
    END
    SET @Err = @@ERROR
    RETURN @Err
END
-- end of stored procedure
EXEC tempsp @RecycleIds = '5,6,7'
After running this SQL statement, it returns only one row instead of the 3 rows with the IDs 5, 6, and 7.
Can anyone tell me what I am doing wrong?
I wanted to use sp_executesql so that the query is strongly typed and safe against SQL injection.
Use a table type parameter, with a strongly typed column:
CREATE TYPE dbo.IDs AS table (ID int);
GO
CREATE PROCEDURE [dbo].[tempsp] @RecycleIds dbo.IDs READONLY AS
BEGIN
    IF EXISTS (SELECT 1 FROM @RecycleIds)
        SELECT * --Replace with needed columns
        FROM dbo.SKU S
        --Using EXISTS in case someone silly puts in the same ID twice.
        WHERE EXISTS (SELECT 1
                      FROM @RecycleIds R
                      WHERE R.ID = S.SkuID);
    ELSE
        SELECT * --Replace with needed columns
        FROM dbo.SKU S;
END;
GO
Then you could execute it like so:
EXEC dbo.tempsp; --All Rows
GO
DECLARE @RecycleIds dbo.IDs;
INSERT INTO @RecycleIds
VALUES(1),(40),(182);
EXEC dbo.tempsp @RecycleIds;
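As a side note on why the original call came back with exactly one row: a likely explanation, assuming the parameter really was declared as plain nvarchar with no length, is that the length then defaults to 1 and the value is silently truncated. The variable names below are made up for the demonstration.

```sql
-- Without an explicit length, nvarchar in a declaration means nvarchar(1)
DECLARE @NoLength   nvarchar      = N'5,6,7';
DECLARE @WithLength nvarchar(max) = N'5,6,7';

SELECT @NoLength   AS NoLength,    -- '5'
       @WithLength AS WithLength;  -- '5,6,7'
```

With the truncated value, IN (@RecycleIds) compares SkuId against the single value '5', which matches one row. With a proper length the whole string '5,6,7' would be compared as one value, which for an integer column should fail with a conversion error rather than return three rows, so a table type (or splitting the string) is still needed.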
I was trying to retrieve the rows whose Id matches a value in the IN clause.
SET @INClauseIds = '''' + REPLACE(@Ids, ',', ''',''') + ''''
The statement above converts @Ids = '1,2,3' to '1','2','3', which I can place directly in the IN clause.
SET @SQLText1 = N'EXEC(''SELECT Name, SEOFriendlyName FROM SKU WHERE Id IN ('' + @Ids + '')'')'
EXEC sp_executesql @SQLText1, N'@Ids nvarchar(max)', @Ids = @INClauseIds
If you want to avoid a temp table, which adds extra processing time, you can use this strategy to retrieve any number of records. Note that the list is still concatenated into the statement run by the inner EXEC, so sp_executesql alone does not protect it against SQL injection; validate the IDs before using it.
You cannot use IN. Or, more accurately, you have a string and you are confusing it with a list. One method is to instead use LIKE:
SET @SQLText = '
SELECT *
FROM dbo.SKU
WHERE CONCAT('','', @RecycleIds, '','') LIKE CONCAT(''%,'', SKU.SkuId, '',%'')
';
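To see why the extra commas matter, here is a small self-contained sketch of the same LIKE technique run against a table variable. The table variable @SKU and its sample rows are stand-ins invented for the demonstration; CONCAT requires SQL Server 2012 or later.

```sql
DECLARE @RecycleIds nvarchar(max) = N'5,6,7';

-- Hypothetical stand-in for dbo.SKU
DECLARE @SKU table (SkuId int PRIMARY KEY);
INSERT INTO @SKU (SkuId) VALUES (5), (6), (7), (56), (75);

-- ',5,6,7,' LIKE '%,5,%' matches, while ',5,6,7,' LIKE '%,56,%' does not
SELECT s.SkuId
FROM @SKU AS s
WHERE CONCAT(',', @RecycleIds, ',') LIKE CONCAT('%,', s.SkuId, ',%');
-- returns 5, 6 and 7, but not 56 or 75
```

Wrapping both sides in delimiters is what prevents partial matches such as 5 inside 56. The trade-off is that this predicate cannot use an index on SkuId, so it suits small tables better than large ones.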

Break from foreach loop in sql (dbo.sp_MsForEachDb)

I have several databases with the same schema. All of the databases have a table called Invoices. Every invoice has a unique ID because it's a GUID. Sometimes I am given an invoice ID and I would like to know which database it belongs to. To find out, I execute the following query:
DECLARE @sql NVARCHAR(2000)
SET @sql = '
IF OBJECT_ID(''[?].dbo.Invoices'') IS NOT NULL
begin
    declare @query NVARCHAR(255)
    Select @query = (Select [Id] from [?].dbo.Invoices where [Id] = ''XF4G-XF78-2156-7XH8'')
    IF @query IS NOT NULL
    begin
        print ''Database = '' + ''?''
    end
end
'
EXEC dbo.sp_MsForEachDb @sql
Basically, EXEC dbo.sp_MsForEachDb @sql executes the @sql query once per database, replacing ? with the database name. Once I reach the print statement because an invoice was found, I would like to stop execution. In other words, I was hoping to be able to do something like:
begin
print ''Database = '' + ''?''
return
end
If I debug my query I still see that the query gets executed on all databases even though I already found the invoice.
Unfortunately I don't think there is an easy way to do this.
A completely inelegant way would be to close the cursor that the system procedure uses. Instead of return use CLOSE hCForEachDatabase. It would kick you out of the loop, but with an error. Like I said, a horrible way of doing it.
A more elegant way would be to continue the loop through all the databases, but without doing any of your processing once a match has been found. Assuming the user has rights to create a temp table you could do something like this...
DECLARE @sql NVARCHAR(2000)
CREATE TABLE #StopProcess(STOP BIT NULL);
SET @sql = '
IF (SELECT COUNT(*) FROM #StopProcess) = 0
BEGIN
    IF OBJECT_ID(''[?].dbo.Invoices'') IS NOT NULL
    BEGIN
        declare @query NVARCHAR(255)
        Select @query = (Select [Id] from [?].dbo.Invoices where [Id] = ''XF4G-XF78-2156-7XH8'')
        IF @query IS NOT NULL
        BEGIN
            print ''Database = '' + ''?''
            INSERT INTO #StopProcess values (1);
        END
    END
END'
EXEC dbo.sp_MsForEachDb @sql
DROP TABLE #StopProcess;
If you absolutely need to stop processing through all the databases, the only alternative I can think of is to roll your own version of sp_MSforeachdb and sp_MSforeach_worker.
Hope this helps.
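The "roll your own" idea could look roughly like the sketch below, which loops over sys.databases with a cursor and leaves the loop with BREAK on the first hit. It is a minimal sketch, not a drop-in replacement for sp_MSforeachdb: it assumes every online database is accessible to the caller, and the cursor name db_cursor and the variables @InvoiceId, @tbl and @found are made up for illustration.

```sql
DECLARE @InvoiceId nvarchar(50) = N'XF4G-XF78-2156-7XH8';  -- sample id from the question
DECLARE @db sysname, @tbl nvarchar(300), @sql nvarchar(max), @found bit;

DECLARE db_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM sys.databases WHERE state_desc = N'ONLINE';

OPEN db_cursor;
FETCH NEXT FROM db_cursor INTO @db;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @found = 0;
    SET @tbl = QUOTENAME(@db) + N'.dbo.Invoices';

    -- The inner IF is only compiled when the table exists in this database
    SET @sql = N'IF OBJECT_ID(@tbl) IS NOT NULL
    BEGIN
        IF EXISTS (SELECT 1 FROM ' + @tbl + N' WHERE [Id] = @id)
            SET @found = 1;
    END';

    EXEC sp_executesql @sql,
         N'@tbl nvarchar(300), @id nvarchar(50), @found bit OUTPUT',
         @tbl = @tbl, @id = @InvoiceId, @found = @found OUTPUT;

    IF @found = 1
    BEGIN
        PRINT N'Database = ' + @db;
        BREAK;  -- stop at the first database that contains the invoice
    END

    FETCH NEXT FROM db_cursor INTO @db;
END

CLOSE db_cursor;
DEALLOCATE db_cursor;
```

Because the loop is under your own control, BREAK ends it cleanly, and the invoice id is passed as a parameter instead of being pasted into the SQL text.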

Pass string variable into procedure and add it to a query

I have an application written in C# that connects to a database and analyzes its data. The database stores information about the execution of automated tests, and I would like to retrieve the tests that fulfill the given conditions. We have several projects and will be supporting more, so I do not want to write a separate procedure for each one. Instead I want to pass the project name as the second parameter, @deploy, so that the query depends on the project and returns the data to the application, which then sends it in a report.
For the time being it looks like this:
CREATE PROCEDURE [dbo].[SuspectsForFalsePositive](@build_id INT, @deploy VARCHAR(25))
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @i int, @build int, @deployname varchar(25), @SQL varchar(max)
    DECLARE @result table (tc int, fp float)
    SET @i = 0
    SET @build = @build_id
    SET @deployname = @deploy
    SET @SQL = 'insert '+@result+'select testcase_id, fail_percentage FROM [BuildTestResults].[dbo].['+@deployname+'TestCaseExecution]
where build_id = @build and fail_percentage >= 70'
    --INSERT @result select testcase_id, fail_percentage FROM [BuildTestResults]
    --.[dbo].[ABCTestCaseExecution]
    --where build_id = @build and fail_percentage >= 70
    --commented works
    EXEC(@SQL)
    WHILE (@@rowcount = 0)
    BEGIN
        SET @build = @build - 1
        EXEC(@SQL)
        --INSERT @result select testcase_id, fail_percentage FROM [BuildTestResults].[dbo]. --[ABCTestCaseExecution]
        --where build_id = @build and fail_percentage >= 70
        --commented works
    END
    select * from @result order by fp DESC
END
GO
Thanks for any advice !
In your string you have @build, but inside the string it is just text. When you execute @SQL, the dynamic batch has no such variable, so you get a failure.
You need to concatenate the value directly (casting the int to a string first):
SET @SQL = 'insert '+@result+'select testcase_id, fail_percentage FROM [BuildTestResults].[dbo].['+@deployname+'TestCaseExecution]
where build_id = '+CAST(@build AS varchar(10))+' and fail_percentage >= 70'
You will need to rebuild the string between executions too.
There are a few issues with your example. There is, however, one overarching consideration.
Variables (table and/or scalar) are only visible in the stored procedure they are defined in, and calling EXEC(@SQL) is like calling another stored procedure. This means that neither your @result table nor your other variables are visible to the dynamic SQL you are executing.
For the table, you can get around that by creating a temp table instead. For the scalar variables, you can pass them in by using SP_EXECUTESQL instead of EXEC.
I don't have access to SQL Server at present, but maybe something like this can start you on your way...
CREATE PROCEDURE [dbo].[SuspectsForFalsePositive](@build_id INT, @deploy VARCHAR(25))
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE
        @i int,
        @build int,
        @deployname varchar(25),
        @SQL nvarchar(max)  -- sp_executesql requires a Unicode string
    -- A temp table, unlike a table variable, is visible to the dynamic SQL
    CREATE TABLE #result (
        tc int,
        fp float
    )
    SELECT
        @i = 0,
        @build = @build_id,
        @deployname = @deploy
    SET @SQL = N''
    SET @SQL = @SQL + N' INSERT INTO #result'
    SET @SQL = @SQL + N' SELECT testcase_id, fail_percentage'
    SET @SQL = @SQL + N' FROM [BuildTestResults].[dbo].['+@deployname+'TestCaseExecution]'
    SET @SQL = @SQL + N' WHERE build_id = @build and fail_percentage >= 70'
    EXEC sp_executesql
        @SQL,
        N'@build INT',
        @build
    WHILE (@@rowcount = 0)
    BEGIN
        SET @build = @build - 1
        EXEC sp_executesql
            @SQL,
            N'@build INT',
            @build
    END
    SELECT * FROM #result ORDER BY fp DESC
END
GO
It also occurs to me that @@rowcount may not see the rows being processed within SP_EXECUTESQL. In that case you may need to rearrange things a little (using an output parameter, or embedding the loop in the @SQL, etc.).
Overall it feels a bit clunky. With more information about your schema, etc., it may be possible to avoid the dynamic SQL. This will have several benefits, but one in particular:
- Right now you're open to SQL injection attacks on the @deploy parameter
Anyone who can execute this SP and/or control the value in the @deploy parameter could wreak havoc in your database.
For example... could you store all the TestCaseExecutions in the same table, but with an extra field: TestCaseID (or even TestCaseName)?
Then you wouldn't need to build dynamic SQL to control which data set you are processing. Instead you just add WHERE TestCaseID = @TestCaseID to your query...
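If the per-project tables have to stay, one way to narrow the injection risk is to check that the computed table name really exists and to quote it with QUOTENAME before it goes into the dynamic SQL. This is a minimal sketch under assumptions: the sample values for @deploy and @build and the helper variable @table are invented for illustration, and only the table and column names come from the question.

```sql
DECLARE @deploy varchar(25) = 'ABC';   -- sample project name
DECLARE @build  int         = 100;     -- hypothetical build id
DECLARE @table  sysname     = @deploy + 'TestCaseExecution';
DECLARE @sql    nvarchar(max);

-- Only proceed if the name resolves to an existing user table
IF OBJECT_ID(N'[BuildTestResults].[dbo].' + QUOTENAME(@table), N'U') IS NULL
BEGIN
    RAISERROR(N'Unknown deploy name: %s', 16, 1, @deploy);
END
ELSE
BEGIN
    -- QUOTENAME brackets the identifier; @build travels as a real parameter
    SET @sql = N'SELECT testcase_id, fail_percentage
                 FROM [BuildTestResults].[dbo].' + QUOTENAME(@table) + N'
                 WHERE build_id = @build AND fail_percentage >= 70;';
    EXEC sp_executesql @sql, N'@build int', @build = @build;
END
```

Only the table name is concatenated, and only after it has been matched against an existing object, so arbitrary text in @deploy ends in an error instead of being executed.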

How can I spot in what database is a stored procedure with name 'myStoredProcedure'?

There are a bunch of databases on the SQL Server I am connected to.
How should I query sysobjects to find out which database a stored procedure named 'myStoredProcedure' is located in?
The query should return the database name.
Thanks
I know you are not asking for this, but I'd really download RedGate's Sql Search add-in for SSMS and use that. It allows you to find any object (proc, table, view, column, etc) on any database easily.
And it's free!
I'd give this a try:
CREATE TABLE ##DatabaseList
(
DatabaseName varchar(50)
)
EXECUTE SP_MSForEachDB 'USE [?]; INSERT INTO ##DatabaseList SELECT DB_NAME() FROM [sys].[objects] WHERE name = ''MyStoredProcedure'' AND type_desc = ''SQL_STORED_PROCEDURE'''
SELECT * FROM ##DatabaseList
DROP TABLE ##DatabaseList
That's using the undocumented/ unsupported system stored procedure SP_MSForEachDb and writing any hits to a global temp table, then outputting the contents to the Results window before dropping the table. If you just need to know which database (or databases - there may of course be more than one) has an appropriately named SP, this should do it. If you want to use the output elsewhere as a parameter, it may take a little more work.
By the way, I'm only learning this stuff myself over the last few months so if anyone can critique the above and suggest a better way to go at it I'm happy to receive feedback. Equally, I can answer any further questions posted here to the best of my ability.
Cheers
So out of curiosity I decided to try writing this myself, especially since ADG mentioned that his solution uses an unsupported, undocumented procedure. This could also be expanded to take a second parameter, so that where it checks type = 'P' (stored procedure) you could probably change it to look for other things like views, tables, etc.
My solution is a bit long but here goes:
CREATE PROCEDURE spFindProceduresInDatabases
(
    @ProcedureName NVARCHAR(99)
)
AS
BEGIN
    -- Get all the database names and put them into a table
    DECLARE @Db TABLE (DatabaseName VARCHAR(99))
    INSERT INTO @Db SELECT name FROM sys.databases
    -- Declare a table to hold our results
    DECLARE @results TABLE (DatabaseName VARCHAR(99))
    -- Make a loop
    -- Declare a variable to be incremented
    DECLARE @count INT
    SET @count = 0
    -- Declare the end condition
    DECLARE @endCount INT
    SELECT @endCount = COUNT(*) FROM @Db
    -- Loop through the databases
    WHILE (@count < @endCount)
    BEGIN
        -- Get the database we are going to look into
        DECLARE @dbWeAreChecking VARCHAR(99)
        SELECT TOP 1 @dbWeAreChecking = DatabaseName FROM @Db
        DELETE FROM @Db WHERE DatabaseName = @dbWeAreChecking
        -- Create and execute our query (QUOTENAME copes with unusual database names)
        DECLARE @Query NVARCHAR(3000)
        SET @Query = N'SELECT @outParam = COUNT(*) FROM ' + QUOTENAME(@dbWeAreChecking) + N'.sys.sysobjects WHERE type = ''P'' and name = @ProcedureName'
        DECLARE @outParam INT
        PRINT (@Query)
        DECLARE @ParmDefinition NVARCHAR(500)
        SET @ParmDefinition = N'@ProcedureName VARCHAR(99),@outParam INT OUTPUT'
        EXECUTE sp_executesql
            @Query,
            @ParmDefinition,
            @ProcedureName,
            @outParam = @outParam OUTPUT
        -- If we have a result insert it into the results table
        IF (@outParam > 0)
        BEGIN
            INSERT INTO @results(DatabaseName) VALUES(@dbWeAreChecking)
        END
        -- Increment the counter
        SET @count = (@count + 1)
    END
    -- SELECT ALL OF THE THINGS!!!
    SELECT * FROM @results
END

SQL Server Multi-Step Stored Procedure

I've got a software suite that is based off of multiple libraries where:
1 library = 1 SQL Database.
Different users can have different access to different libraries.
In addition, the databases are named in a specific manner to help identify which are "mine" and which aren't.
I'd like to create a stored procedure that takes a variable called @UserName and returns the databases that have a name starting with MYDB, where @UserName is found in a table USERS.
I'm figuring that I'll start with EXEC sp_databases, but I'm unsure how to continue.
What I need to know is:
1. How do I iterate the results of sp_databases to pull out just the databases that have a name matching my pattern?
2. How do I then check for @UserName in the [USER NAME] column of the USERS table of each database returned from #1?
I'm guessing it has something to do with temp tables and cursors, but I'm not really sure where to start.
Any help?
Thanks!
Here is some proof of concept code to show you an approach. sys.databases contains a more accessible list of databases. You'll pretty much have to use dynamic sql at some point though.
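CREATE PROCEDURE MyDBs @userName VARCHAR(255)
AS
BEGIN
    DECLARE @max INT
    DECLARE @i INT
    DECLARE @sql NVARCHAR(500)  -- sp_executesql requires a Unicode string
    CREATE TABLE #SQL
    (
        rid int identity primary key clustered,
        query nvarchar(500)
    )
    -- Build one query per matching database (assumes USERS lives in the dbo schema)
    INSERT INTO #SQL(query)
    SELECT 'SELECT * FROM ' + QUOTENAME(name) + '.dbo.USERS WHERE [USER NAME] = @UserName'
    FROM master.sys.databases
    WHERE name LIKE '%yourpattern%'
    SELECT @max = @@rowcount, @i = 1
    WHILE @i <= @max
    BEGIN
        SELECT @sql = query FROM #SQL WHERE rid = @i
        -- Pass the user name as a parameter so the dynamic batch can see it
        EXEC sp_executesql @sql, N'@UserName VARCHAR(255)', @UserName = @userName
        SET @i = @i + 1
    END
    DROP TABLE #SQL
END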
For 1, just look at the sp_databases code, copy it, and modify it to your needs. For example (see the last two conditions of the WHERE clause; this is the actual code of the sp_databases stored proc, which you can look at in the master db):
declare @UserName varchar(50)='someuser'
select
DATABASE_NAME = db_name(s_mf.database_id),
DATABASE_SIZE = convert(int,
case -- more than 2TB(maxint) worth of pages (by 8K each) can not fit an int...
when convert(bigint, sum(s_mf.size)) >= 268435456
then null
else sum(s_mf.size)*8 -- Convert from 8192 byte pages to Kb
end),
REMARKS = convert(varchar(254),null)
from
sys.master_files s_mf
where
s_mf.state = 0 and -- ONLINE
has_dbaccess(db_name(s_mf.database_id)) = 1 and
--db_name(s_mf.database_id) like '%'+@UserName+'%' and -- you may or may not want to leave this condition here. You'll figure out what condition to use
exists (select 1 from databasename.dbo.Users where [UserName]=@UserName) -- databasename is a placeholder
group by s_mf.database_id
order by 1
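Putting the two steps from the question together, a rough sketch could collect the matching database names instead of returning one result set per database. It rests on assumptions: every database named MYDB... is online and has a dbo.USERS table with a [USER NAME] column, and the cursor name db_cursor and the variables @hit and @matches are made up for illustration. The body could be wrapped in a procedure that takes @UserName as its parameter.

```sql
DECLARE @UserName nvarchar(255) = N'someuser';   -- sample value
DECLARE @db sysname, @sql nvarchar(max), @hit bit;
DECLARE @matches table (DatabaseName sysname);

-- Step 1: only databases whose name starts with MYDB
DECLARE db_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT name
    FROM sys.databases
    WHERE name LIKE N'MYDB%'
      AND state_desc = N'ONLINE'
      AND HAS_DBACCESS(name) = 1;

OPEN db_cursor;
FETCH NEXT FROM db_cursor INTO @db;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Step 2: look for the user in this database's USERS table
    SET @hit = 0;
    SET @sql = N'IF EXISTS (SELECT 1 FROM ' + QUOTENAME(@db)
             + N'.dbo.USERS WHERE [USER NAME] = @UserName) SET @hit = 1;';

    EXEC sp_executesql @sql,
         N'@UserName nvarchar(255), @hit bit OUTPUT',
         @UserName = @UserName, @hit = @hit OUTPUT;

    IF @hit = 1
        INSERT INTO @matches (DatabaseName) VALUES (@db);

    FETCH NEXT FROM db_cursor INTO @db;
END

CLOSE db_cursor;
DEALLOCATE db_cursor;

SELECT DatabaseName FROM @matches ORDER BY DatabaseName;
```

Returning a single list of names keeps the caller simple, and the user name is passed as a parameter rather than concatenated into the query text.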