Stored Procedure vs Direct Query in Excel

I have an Excel file that will select roughly 1,100 rows with 5 columns of data. Most of the values are 5-digit integers. I am using a macro to connect to a SQL Server database and insert these rows into one, maybe two, tables. That is all it does, and then it closes the connection. So the user opens an Excel file that contains the rows, clicks a button, and the macro executes.
My question is: should the query be written in Excel, since it's simple and merely inserts the data into a few tables? Or is it more efficient to call a stored procedure, pass all of the values to it, and have it decide where the values go in the different tables? By efficient, I mean: which is quicker? I know this will probably take only a few seconds to complete. I just feel that going through a stored procedure is an extra point along the path the data has to travel before it reaches the tables. Am I wrong? Any thoughts?

There are some advantages to using stored procedures in SQL Server. One is that SQL Server precompiles and saves the query execution plan, which increases performance. With your current method, SQL Server will generally need to generate the execution plan each time. Stored procedures can also reduce client/server network traffic.
So, even though it may seem like an extra point along the path, it actually can be faster.
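For illustration only, a minimal sketch of what the stored-procedure version might look like (the procedure, table, and column names below are invented, and the real routing logic would depend on your schema):

CREATE PROCEDURE dbo.usp_InsertExcelRow
    @Col1 INT,
    @Col2 INT,
    @Col3 INT,
    @Col4 INT,
    @Col5 INT
AS
BEGIN
    SET NOCOUNT ON;

    -- Decide here which table(s) each value belongs to, so the
    -- spreadsheet never needs to know the table layout.
    INSERT INTO dbo.MainTable (Col1, Col2, Col3, Col4, Col5)
    VALUES (@Col1, @Col2, @Col3, @Col4, @Col5);
END

If a second table is involved, that INSERT also lives inside the procedure, so changing the table layout later never touches the spreadsheet.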

In addition to mark d.'s answer, another reason for using a stored procedure is security.
Your comment says that a customer is entering the data into Excel, so if you are putting direct SQL into your spreadsheet, then there is a risk that someone will open your spreadsheet and find out information about your database. But if you use a stored procedure then there is far less that can be learned.
Either way, make sure that you aren't hardcoding any connection string/account credentials into the spreadsheet.
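For what it's worth, if you do go the stored-procedure route, the only SQL the macro needs to send per row is a parameterized call like the sketch below (the procedure name is invented); the ? markers are bound to ADO command parameters rather than concatenating cell values into the string:

-- Command text sent once per row; parameters carry the cell values
EXEC dbo.usp_InsertExcelRow @Col1 = ?, @Col2 = ?, @Col3 = ?, @Col4 = ?, @Col5 = ?;

That keeps both the table names and the literal data out of the spreadsheet text.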

Related

Executing a T-SQL query from the hard drive

I have T-SQL queries stored on a hard drive: I:\queries\query1.sql and I:\queries\query2.sql.
My usual workflow is to execute a query from the drive, copy the results into Excel, and then work on them there.
My problem is that query1.sql is already long, and now I would like to extend it by taking the result of query2.sql and joining it with the result of query1.sql.
I could append the code from query2.sql to query1.sql, but then the query gets really long and hard to maintain.
I would like to do something like this:
SELECT * FROM ("Result of I:\queries\query1.sql") q1
LEFT JOIN ("Result of I:\queries\query2.sql") q2 ON q1.ID=q2.ID
Is there any way to write a query or stored procedure, again stored on the drive, that does this?
Basically, you need to ask your DBA for a database where you are able to store things. This can be on the same system where the data is stored, or it could be on a linked system. Gosh, you could even run SQL Server locally and store the information and data there.
Then, the queries that you are storing in files should be views in the database. You can then run the queries and store and combine the results locally.
You are essentially recreating database functionality using text files and data files -- going through a lot of effort when SQL Server already supports this functionality.
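As a rough sketch of what that looks like (the view names are made up, and it assumes the contents of query1.sql and query2.sql have already been saved as views), the combining query then stays short:

-- Each saved .sql file becomes a view; the join lives in one small query
SELECT *
FROM dbo.vw_Query1 AS q1
LEFT JOIN dbo.vw_Query2 AS q2
    ON q1.ID = q2.ID;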
To expand on Gordon's comment (+1): why are you running scripts off of a drive? Most DBAs I've known would threaten bodily harm over this, as executing code that they can't control, troubleshoot, or track in source control brings a whole host of security and supportability issues.
Far better to store this code in a stored procedure, which will have a saved query execution plan, can be tracked using various DMVs, and can have permissions assigned to it. Then your outside Excel doc can just open a connection and execute the SP.
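To make that concrete (the object and role names below are placeholders), locking the procedure down and tracking it might look something like this:

-- Reporting users can execute the procedure and nothing else
GRANT EXECUTE ON dbo.usp_Query1Report TO ReportUsers;

-- See how often it runs and how long it takes (times are in microseconds)
SELECT OBJECT_NAME(object_id, database_id) AS proc_name,
       execution_count,
       total_elapsed_time,
       last_execution_time
FROM sys.dm_exec_procedure_stats
ORDER BY last_execution_time DESC;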

SQL temp table join between servers

So I have a summary I need to return to the end-user application.
It should accept 3 parameters: DateType, StartDate, and EndDate.
DateType determines the date field I use to filter the data.
The way I accomplished this was by putting all the IDs of the records for a given DateType into a temp table and then joining my summary to that list of IDs.
This worked fine when running the query on the SQL Server that houses the data.
However, that is a replicated server, so when I compiled this into a stored proc on the server with the rest of the application data, the query slowed down, i.e. from 2 seconds to 50 seconds.
I think the cross-server join, from the temp table created on one SQL Server to the tables on the replication server, is causing the slowdown.
Are there any methods or techniques that I can use to get around this and build this all in one stored procedure?
If I create 3 stored procedures with their own date range, then they are fast again. However, this means maintaining multiple stored procs for the same thing.
First off, if you are running a version of SQL Server older than 2012 SP1, one problem is that users who aren't allowed to run DBCC SHOW_STATISTICS (which is most users who aren't sysadmins, see the "Permissions" section in the documentation) don't get access to statistics on remote tables. This can severely cripple the optimizer's ability to generate a good execution plan. Upgrading SQL Server or granting more permissions can help there.
If your query involves filtering or joining on a character column, make sure the remote server is flagged in the linked server options as "collation compatible". If this option is off, SQL Server can't assume strings can be compared across the servers and it will start pumping entire tables up and down just to make sure the data ends up where the comparison has to be made.
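For reference, that option is set through sp_serveroption; something like the following, assuming the linked server is called RemoteServer and you have verified the collations really do match:

EXEC sp_serveroption @server = N'RemoteServer',
                     @optname = N'collation compatible',
                     @optvalue = N'true';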
If the execution plan is as good as it gets and it's still not good enough, one general (lame) technique is to transfer all data locally first (SELECT * INTO #localtable FROM remote.db.schema.table), then run the query as a non-distributed query. Obviously, in order for this to work, the remote table cannot be "too big" and in some cases this actually has worse performance, depending on how many rows are involved. But it's always worth considering, because the optimizer does a better job with local tables.
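A hedged sketch of that pull-it-local approach (the server, database, table, and column names are all invented, and @StartDate/@EndDate stand in for the procedure's parameters):

-- Copy only the rows and columns the summary needs from the remote server
SELECT r.ID, r.OrderDate, r.Amount
INTO #LocalCopy
FROM [ReplicaServer].[AppDb].dbo.Orders AS r
WHERE r.OrderDate BETWEEN @StartDate AND @EndDate;

-- Join locally, where the optimizer has full statistics to work with
SELECT s.*
FROM dbo.Summary AS s
INNER JOIN #LocalCopy AS l
    ON l.ID = s.ID;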
Another approach that avoids pulling tables together across servers is packing up data in parameters to remote stored procedure calls. Entire tables can be passed as XML through an NVARCHAR(MAX) parameter, since neither XML columns nor table-valued parameters are supported in distributed queries. The basic idea is the same: avoid the need for the optimizer to figure out an efficient distributed query. The best approach greatly depends on your data and your query, obviously.
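A very rough sketch of that parameter-passing idea (everything here, including the server, procedure, and temp table names, is an assumption, and it relies on RPC Out being enabled for the linked server):

DECLARE @payload NVARCHAR(MAX);

-- Serialize the local ID list into XML text
SET @payload = CONVERT(NVARCHAR(MAX),
                   (SELECT ID FROM #FilteredIDs FOR XML PATH('row'), ROOT('ids')));

-- One remote call carries the whole set; the remote proc parses the XML
-- (e.g. CAST(@IdXml AS XML) plus .nodes()) and does the join on its side.
EXEC [RemoteServer].[RemoteDb].dbo.usp_SummarizeForIds @IdXml = @payload;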

Stored Procedure vs direct SQL command in SSIS data flow source

I'm providing maintenance support for some SSIS packages. The packages have some data flow sources with complex embedded SQL scripts that need to be modified from time to time. I'm thinking about moving those SQL scripts into stored procedures and call them from SSIS, so that they are easier to modify, test, and deploy. I'm just wondering if there is any negative impact for the new approach. Can anyone give me a hint?
Yes, there are issues with using stored procs as data flow sources (not with using them in Execute SQL Tasks in the control flow, though).
You might want to read this:
http://www.jasonstrate.com/2011/01/31-days-of-ssis-no-more-procedures-2031/
Basically, the problem is that SSIS cannot always figure out the result set, and thus the columns, from a stored proc. I have personally run into this when the stored proc uses a temp table.
I don't know that I would go as far as the author of the article and avoid procs entirely, but be careful that you are not trying to do too much with them; if you have to do something complicated, do it in an Execute SQL Task before the data flow.
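If you are on SQL Server 2012 or later, one hedged workaround for the metadata problem is to spell out the proc's result shape for SSIS with WITH RESULT SETS in the source's SQL command (the procedure and columns below are invented):

EXEC dbo.usp_ComplexExtract @StartDate = ?, @EndDate = ?
WITH RESULT SETS
(
    (
        CustomerID   INT,
        CustomerName NVARCHAR(100),
        TotalAmount  DECIMAL(18, 2)
    )
);

SSIS then gets the column list from the declared result set instead of trying to sniff it from the procedure body.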
I can honestly see nothing but improvements. Stored procedures will offer better security, the possibility for better performance due to cached execution plans, and easier maintenance, like you pointed out.
Refactor away!
You will not face issues using only simple stored procedures as a data source. If the procedure uses temp tables or CTEs, there is no guarantee you will not face issues. Even when you can preview results at design time, you may get errors at run time.
My experience has been that trying to get a sproc to function as a data source is just not worth the headache. Maybe some simple sprocs are fine, and in some cases TVFs will work well instead, but if you need to do some complex operations there's no alternative to a sproc.
The best workaround I've found is to create an output table for each sproc you need to use in SSIS.
Modify the sproc to truncate the new output table at the start, and to write its output there instead of (or in addition to) ending with a SELECT statement.
Call the sproc with an Exec SQL task before your data flow.
Have your data flow read from the output table - a much simpler task.
If you want to save space, truncate the output table again with another Exec SQL. I prefer to leave it, as it lets me examine the data later and lets me rerun the output data flow if it fails without calling the sproc again.
This is certainly less elegant than reading directly from a sproc's output, but it works. FWIW, this pattern follows the philosophy (obligatory in Oracle) that a sproc should not try to be a parameterized view.
Of course, all this assumes that you have privs to adjust the sproc in question. If necessary, you could write a new wrapper sproc which truncates the output table, then calls the old sproc and redirects its output to the new table.
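A minimal sketch of such a wrapper (all names are invented), using INSERT ... EXEC to capture the original proc's final result set; note that the output table's columns must match that result set, and INSERT ... EXEC cannot be nested:

CREATE PROCEDURE dbo.usp_MyReport_ToTable
AS
BEGIN
    SET NOCOUNT ON;

    -- Throw away the previous run's output
    TRUNCATE TABLE dbo.MyReportOutput;

    -- Capture the original procedure's result set into the output table
    INSERT INTO dbo.MyReportOutput (Col1, Col2, Col3)
    EXEC dbo.usp_MyReport;
END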

(SQL Server 2005) Good way to manage a large number of insert statements?

I have a stored procedure that takes a table name and writes out a series of INSERT statements, one for each row in the table. It's being used to provide sample, "real world" data for our test environment.
It works well, but some of these sample rowsets are 10–20k records. The stored proc writes them out using the PRINT statement, and it's hard to copy that many lines and paste them into Management Studio to run them. Is there a SQL redirect feature I might be able to use, perhaps writing this output to a table and then looping through and running each statement that way? Just a thought.
I'd like to do this all from within Management Studio and not have to write a C# program to create a dataset and loop over it, etc. I'm essentially looking for suggestions on a good approach. Thanks very much.
Use EXEC:
PRINT @INSERT_statement
EXEC (@INSERT_statement)
...to run the query.
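If you'd rather go the write-to-a-table-and-loop route you mentioned, a rough sketch looks like this (all names are invented, it assumes the generator proc is changed to SELECT its statements as a result set instead of PRINTing them, and it sticks to SQL Server 2005 syntax):

-- Capture the generated statements
CREATE TABLE #Statements (ID INT IDENTITY(1,1), Stmt NVARCHAR(MAX));

INSERT INTO #Statements (Stmt)
EXEC dbo.usp_GenerateInserts @TableName = N'dbo.SampleTable';

-- Run them one at a time
DECLARE @id INT, @max INT, @sql NVARCHAR(MAX);
SET @id = 1;
SELECT @max = MAX(ID) FROM #Statements;

WHILE @id <= @max
BEGIN
    SELECT @sql = Stmt FROM #Statements WHERE ID = @id;
    EXEC (@sql);
    SET @id = @id + 1;
END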
But I'd recommend looking at bulk insertion to make the data load faster:
BULK INSERT
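For reference, a hedged BULK INSERT example (the file path, table name, and file format are assumptions; the source file would be a plain delimited export of the sample rows):

-- Load the sample rows straight from a delimited file instead of
-- generating and running thousands of individual INSERT statements.
BULK INSERT dbo.SampleData
FROM 'C:\exports\sample_data.csv'
WITH
(
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    FIRSTROW = 2   -- skip the header row
);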
Where is your stored procedure getting this data?
You may want to look into importing it as a table and then running your stored procedure against that inserted table. SQL Server Management studio has many options for importing data.
If your stored proc is generating the data - then that's a whole other issue.

SQL Timeouts and SSIS

I've got an SSIS package that runs a stored proc for exporting to an Excel file. Everything worked like a champ until I needed to do a bit of rewriting on the stored proc. The proc now takes about 1 minute to run and the exported columns are different, so my problems are the following:
1) SSIS complains when I hit the preview button: "No column information returned by command"
2) It times out after about 30 seconds.
What I've done:
Tried to clean up/optimize the query. That helped a bit, but it still is doing some major calculations and it runs just fine in SSMS.
Changed the timeout values to 90 seconds. Didn't seem to help. Maybe someone here can?
Thanks,
Found this little tidbit which helped immensely.
No Column Names
Basically all you need to do is add the following to your SQL query text in SSIS.
SET FMTONLY OFF
SET NOCOUNT ON
Only problem now is it runs slow as molasses :-(
EDIT: It's running just too damn slow.
Changed from using #tempTable to tempTable and added the appropriate drop statements. Argh...
Although it appears you may have answered part of your own question, you are probably getting the "No column information returned by command" error because the table doesn't exist at the time it tries to validate the metadata. Creating the tables as non-temporary tables resolves this issue.
If you insist on using temporary tables, you can create the temporary tables in the step preceding the data flow. You would need to create it as a ## table and turn off connection sharing for the connection for this to work, but it is an alternative to creating permanent tables.
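For example, the Execute SQL Task preceding the data flow could run something like this (the table name and column list are placeholders), with the connection configured as described above so the ## table still exists when the data flow's source component runs:

-- Recreate the global temp table that the data flow will read from
IF OBJECT_ID('tempdb..##ExportStaging') IS NOT NULL
    DROP TABLE ##ExportStaging;

CREATE TABLE ##ExportStaging
(
    CustomerID   INT,
    CustomerName NVARCHAR(100),
    TotalAmount  DECIMAL(18, 2)
);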
A shot in the dark based on something obscure I hit years ago: When you modified the procedure, did you add a call to a second procedure? This might mess up SSIS's ability to determine the returned data set.
As for (2), does the procedure take 30+ or 90+ seconds to run in SSMS? If not, do you know that the query is actually getting into SQL from SSIS? Might be worth firing up SQL Profiler to see what's actually being sent to SQL Server. [Which was the way I found out my obscure factoid.]