Stored procedures vs. parameter binding - sql

I am using SQL Server and ODBC in Visual C++ to write to the database. Currently I am using parameter binding in my SQL queries (I fill the database with only 5-6 queries, and the same is true for retrieving data). I don't know much about stored procedures, and I am wondering how much of a performance increase, if any, stored procedures offer over parameter binding, given that with parameter binding we prepare the query only once and just execute it later in the program for different sets of variable values.

Stored procedures should be more performant for a few reasons:
Less network traffic - the query lives on the database server, so you send only a small command with parameters instead of the entire query text every time.
The query is precompiled on the server, and its execution plan can be cached by the DB as well.
Another advantage is that you can alter the query on the DB without having to recompile your code. This is an additional layer of abstraction that I find very useful.
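For illustration, here is a minimal T-SQL sketch of the stored procedure side; the table, column, and procedure names are hypothetical. The client prepares the short call once and binds new parameter values before each execution, just as it does today with the inline statement.

CREATE PROCEDURE dbo.usp_InsertReading
    @SensorId INT,
    @Reading  FLOAT
AS
BEGIN
    SET NOCOUNT ON;
    -- The same INSERT you would otherwise send as query text from the client.
    INSERT INTO dbo.Readings (SensorId, Reading)
    VALUES (@SensorId, @Reading);
END;
GO

-- From ODBC in C++, the code prepares this short call escape once and binds
-- new values to the two markers before each SQLExecute:
-- {CALL dbo.usp_InsertReading(?, ?)}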

Related

Generic SQL that both Access and ODBC/Oracle can understand

I have a MS Access query that is based on a linked ODBC table (Oracle).
I'm troubleshooting the poor performance of the query here: Access not properly translating TOP predicate to ODBC/Oracle SQL.
SELECT ri.*
FROM user1_road_insp AS ri
WHERE ri.insp_id = (
select
top 1 ri2.insp_id
from
user1_road_insp ri2
where
ri2.road_id = ri.road_id
and year(insp_date) between [Enter a START year:] and [Enter a END year:]
order by
ri2.insp_date desc,
ri2.length desc,
ri2.insp_id
);
The documentation says:
When you spot a problem, you can try to resolve it by changing the local query. This is often difficult to do successfully, but you may be able to add criteria that are sent to the server, reducing the number of rows retrieved for local processing.
In many cases you will find that, despite your best efforts, Office Access still retrieves some entire tables unnecessarily and performs final query processing locally.
However, it's occurred to me that I don't really understand what sort of SQL I should be writing to make both Access and ODBC/Oracle happy.
Should I be writing some sort of generic SQL that Access can understand in a local query AND that can be easily translated to ODBC/Oracle SQL? Is generic SQL a real thing?
What kind of SQL does the ODBC driver use? It depends: MS Access typically has three types of external data connections, each interfacing with a different SQL dialect through the ODBC API.
Linked tables act like local tables but are ODBC-connected data sources, not stored locally. Once they are incorporated into an Access app, these tables can only be queried with MS Access' SQL dialect. They can be joined with local tables or even with backend tables from other sources.
Hence, TOP is available here because the query is written in MS Access SQL, not Oracle SQL. You are essentially using Access SQL to manipulate Oracle data: ODBC serves as the origin point of the data, while Access' Jet/ACE SQL engine does the processing and result-set viewing in cached memory.
Pass-through queries do not see local tables or anything else in the local app's environment. Such queries use the SQL dialect of the connected database, here Oracle.
Hence, TOP is NOT available (Oracle does not support it), while double quotes are allowed around column identifiers; such quoting would fail in MS Access. Essentially, you are using Oracle SQL to manipulate Oracle data in an Access app. You can take the output of the sqlout.txt log and run it in a pass-through query ODBC-connected to your Oracle database (see the Oracle-dialect sketch at the end of this answer).
ADO/DAO recordsets are run entirely via code such as VBA; they are direct connections to data sources and use the connected database's dialect.
Here, you are using Oracle SQL to manipulate Oracle data in an Access app via the ODBC API.
In each one of these types, you will have to connect to a backend ODBC data source. You do not even need to use the GUI; you can use Access' object library to create linked tables (see DoCmd.TransferDatabase) and pass-through QueryDefs (see QueryDef.Connect or .Execute).
I suspect the sqlout.txt log you see contains translations of the Access ODBC calls into Oracle's native dialect.
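To make the pass-through point concrete, here is a hedged sketch of how the query might be rewritten in Oracle's dialect (ROW_NUMBER in place of TOP). The year values are hard-coded placeholders because a pass-through query cannot use the [Enter a START year:] prompts; you would embed literals or rebuild the QueryDef's SQL in VBA. Treat this as an assumption about your schema, not a drop-in replacement.

-- Oracle SQL for a pass-through query: pick the latest inspection per road.
SELECT insp_id, road_id, insp_date, length
FROM (
    SELECT ri.*,
           ROW_NUMBER() OVER (
               PARTITION BY ri.road_id
               ORDER BY ri.insp_date DESC, ri.length DESC, ri.insp_id
           ) AS rn
    FROM user1_road_insp ri
    WHERE EXTRACT(YEAR FROM ri.insp_date) BETWEEN 2015 AND 2018  -- placeholder years
)
WHERE rn = 1;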
To build on @Parfait's point #1:
From Microsoft Access Developer's Guide to SQL Server by Mary Chipman and Andy Baron:
Optimizing Access Queries:
There's a common misconception that the Jet engine always retrieves all the data in linked SQL Server tables and then processes the data locally. This is not usually true. Jet is perfectly capable of sending efficient queries to SQL Server over ODBC and retrieving only the rows required. However, in some cases, Jet will in fact be forced to fetch all the data in certain tables first and then process it. You should be aware of when you are forcing Jet to do this and be sure that it is justified. The following are some general guidelines to follow when creating your Access queries:
Using expressions that can't be evaluated by the server will cause Jet to retrieve all the data required to evaluate those expressions locally. The impact of using Access-specific expressions, such as domain aggregate functions, Access financial functions, or custom VBA functions will vary depending on where in your query the expressions are used. Using such an expression in the SELECT clause will usually not cause a problem because no extra data will be returned. However, if the expression is in the WHERE clause, that criterion cannot be applied on the server, and all the data evaluated by the expression will have to be returned.
With multiple criteria, as many as possible will be processed on the server. This means that even if you use criteria that you know include functions that will need to be processed by Jet, adding other criteria that can be handled by the server will reduce the number of records that Jet has to process. Adding criteria on indexed columns is especially helpful.
Query syntax that includes an Access-specific extension to SQL, not supported by the ODBC driver, may force processing to be done on the client by Access. For example, even though SELECT TOP 5 PERCENT is now supported by SQL Server, it is not supported by the ODBC driver. If you use that syntax in an Access query, Jet will need to retrieve all the records and calculate which ones are in the top 5 percent. On the other hand, even though crosstab queries are specific to Access, Jet will translate them into simple GROUP BY queries and fetch just the required data in one trip to the server unless problematic criteria is used.
Heterogeneous joins between local and remote tables or between remote tables that are in different data sources will, of course, have to be processed by Jet after the source data is retrieved. However, if the remote join field is indexed and the table is large, Jet will often use the index to retrieve only the required rows by making multiple calls to the remote table, one for each row required.
Jet allows you to mix data types within the columns of UNION queries and within expressions, but SQL Server doesn't. Such mixing of data types will force processing to be done locally.
Multiple outer joins in one query will be processed locally.
The most important factor is reducing the total number of records being fetched. Jet will retrieve multiple batches of records in the background until the result set is complete, so even though you may seem to get results back immediately, a continuing load is being placed on the server for large result sets.
Note: this book is quite old (published in 2000) and is in reference to Jet Engine. I imagine things might be slightly different in newer versions of Access which use ACE, although I don't have a source to back this up.
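To make the WHERE-clause guideline concrete, here is a small, hypothetical illustration (the linked table dbo_Orders and the function MyFiscalYear are made up):

-- A custom VBA function in the criteria cannot be sent to SQL Server, so Jet
-- must fetch every row of the linked table to evaluate the filter locally:
SELECT OrderID, OrderDate, Amount
FROM dbo_Orders
WHERE MyFiscalYear(OrderDate) = 2019;

-- Expressing the same filter with operators the server understands lets Jet
-- push the criterion through ODBC and retrieve only the matching rows
-- (assuming a fiscal year that starts on July 1):
SELECT OrderID, OrderDate, Amount
FROM dbo_Orders
WHERE OrderDate >= #7/1/2018# AND OrderDate < #7/1/2019#;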

SQL temp table join between servers

So I have a summary I need to return to the end-user application.
It should accept 3 parameters: DateType, StartDate, EndDate.
Date Type will determine the date field I use to filter the data.
The way I accomplished this was by putting all the IDs of the records for a date type into a TEMP table and then joining my summary to the list of IDs.
This worked fine when running the query on the SQL Server that houses the data.
However, that is a replicated server, so when I compiled this into a stored proc that lives on the server with the rest of the application data, the query slowed down, i.e. 2 seconds vs. 50 seconds.
I think the cross-server join, from the temp table created on one SQL Server to the tables on the replication server, is causing the slowdown.
Are there any methods or techniques that I can use to get around this and build this all in one stored procedure?
If I create 3 stored procedures with their own date range, then they are fast again. However, this means maintaining multiple stored procs for the same thing.
First off, if you are running a version of SQL Server older than 2012 SP1, one problem is that users who aren't allowed to run DBCC SHOW_STATISTICS (which is most users who aren't sysadmins, see the "Permissions" section in the documentation) don't get access to statistics on remote tables. This can severely cripple the optimizer's ability to generate a good execution plan. Upgrading SQL Server or granting more permissions can help there.
If your query involves filtering or joining on a character column, make sure the remote server is flagged in the linked server options as "collation compatible". If this option is off, SQL Server can't assume strings can be compared across the servers and it will start pumping entire tables up and down just to make sure the data ends up where the comparison has to be made.
If the execution plan is as good as it gets and it's still not good enough, one general (lame) technique is to transfer all data locally first (SELECT * INTO #localtable FROM remote.db.schema.table), then run the query as a non-distributed query. Obviously, in order for this to work, the remote table cannot be "too big" and in some cases this actually has worse performance, depending on how many rows are involved. But it's always worth considering, because the optimizer does a better job with local tables.
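For example, a rough sketch of that workaround (all object names, including the LINKEDSRV linked server, are made up):

-- Pull the remote rows across once; add a WHERE clause here if part of the
-- filter can be applied on the remote side to keep the copy small.
SELECT *
INTO #remote_copy
FROM LINKEDSRV.SalesDb.dbo.Orders;

-- The join then runs as an ordinary local, non-distributed query.
SELECT c.CustomerName, SUM(r.Amount) AS Total
FROM dbo.Customers AS c
JOIN #remote_copy AS r ON r.CustomerID = c.CustomerID
GROUP BY c.CustomerName;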
Another approach that avoids pulling tables together across servers is packing up data in parameters to remote stored procedure calls. Entire tables can be passed as XML through an NVARCHAR(MAX) parameter, since neither XML columns nor table-valued parameters are supported in distributed queries. The basic idea is the same: avoid making the optimizer figure out an efficient distributed query. The best approach greatly depends on your data and your query, obviously.
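A hedged sketch of that idea, with hypothetical names throughout: serialize the local rows to XML text, pass the string to a procedure that lives on the remote server, and let that procedure shred it back into rows on its own side.

DECLARE @payload NVARCHAR(MAX);

-- Serialize the local ID list (for example, the contents of a temp table)
-- into XML text; without the TYPE directive, FOR XML returns nvarchar.
SET @payload = (SELECT id FROM #ids FOR XML PATH('row'), ROOT('ids'));

-- The remote procedure receives plain NVARCHAR(MAX), CASTs it back to XML,
-- and shreds it with nodes()/value() locally on its own server.
EXEC LINKEDSRV.TargetDb.dbo.usp_SummaryForIds @idXml = @payload;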

Best practice: sending a stored procedure for "SQL Command from Variable" in OLE DB Source?

In a SSIS ETL, I have a query that I need to run on a server/db that does not allow us to create stored procedures.
I would normally use the stored procedure in my variable as the source for my OLE DB source:
However, since we can't put the stored procedure on this server, I was going to store the code for the stored procedure into a variable by executing a SQL statement, retrieving the text from our home database, then use the text stored in this variable as the SQL command for the source:
This way, I can still remotely change the SSIS OLE DB Source object WHERE clause (as long as I don't change the SELECT portion).
I can't imagine that this is very common, so I wanted to get some opinions - is there a better way to do this? I don't want to put all of the code for this SP into the OLE DB Source editor directly because we can't afford to redeploy in case of a WHERE clause update.
You've got the part down that many folks don't: using Variables to drive your package execution. You are further correct in that you can't exactly swap out your columns. To be pedantic, which I am, you can completely change out the query as long as the same metadata is presented.
So, then this question becomes how best to accomplish allowing a package to have a query's filter driven by an external force. Factoring in maintainability, ease of debugging, etc.
My gut reaction is 3 Variables
QueryBase: String. Hardcoded. SELECT * FROM MyTable except of course I'd enumerate my columns
Query: String. EvaluateAsExpression = True Expression: #[User::QueryBase] + #[User::QueryFilter]
QueryFilter: String
So, we use Query in the OLE DB Source, much as you have your longer variable name in there. The only downside to this approach, pre-SSIS 2012, is the limitation on string length in an expression. It was 4k, I believe. If you assign a value of 5k characters directly to a variable, that's fine; it's just that in the expression language, the result of adding two strings together can't exceed 4k.
I didn't specify what QueryFilter is going to have in it or the magic to get it there. That, I would base on the bigger picture of your environment, usage, etc. but the general concept is that it will eventually turn into WHERE Condition1 IS NOT NULL but maybe in a full reload situation, it becomes an empty string.
So, what are our options for changing the value of QueryFilter
/SET is an optional parameter passed to the invoking process (dtexec.exe) that makes SSIS packages go. If you have a very limited set of choices and aren't interested in building out additional infrastructure to support the parameters, just hard-code some examples. Approximately: dtexec /file p1.dtsx /set \Package.Variables[User::QueryFilter].Properties[Value];" WHERE Condition1 IS NOT NULL" Save them into .bat files, different SQL Agent jobs, whatever. Click and run and you're done.
Configuration approach. SSIS offers the native ability to use configurations from a SQL Server table, an XML file, the Registry, a Parent Package, or an Environment Variable, from 2005 to the current edition. The only downside to this approach is that it would not support concurrent execution with different parameters like the first would.
Environment approach. 2012 and 2014, with their new Project Deployment Model, give us the concept of Environments within the SSISDB catalog, which is similar to configuration with a SQL Server table, but it is done after development is complete and the packages are deployed. It's rather nice, as it builds out a history of the values used, so if someone asks why the data is all wrong, you can write a query to pull back the parameters used and see that, oh look, someone used the initial-load filter instead of the daily one. Whoopsidaisy. Same concern over concurrent execution and changing values.
Table-driven approach. Instead of using the Configuration with a SQL Server table backing it, you roll your own table and then add an Execute SQL Task to your package to retrieve the filter, Single row result set, into the QueryFilter variable (a sketch follows this list).
Script Task. Use whatever floats your boat to determine what the filter should be.
Message Queue. There is a built-in Message Queue Task, which might be of use here if you're already using message queues. Otherwise, it's too much effort to manage.
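For the table-driven approach above, a minimal sketch (the table and package names are hypothetical):

-- One row per package (or per package/environment) holding the filter text.
CREATE TABLE dbo.PackageFilter
(
    PackageName NVARCHAR(128) NOT NULL PRIMARY KEY,
    QueryFilter NVARCHAR(4000) NOT NULL
);

INSERT INTO dbo.PackageFilter (PackageName, QueryFilter)
VALUES (N'LoadMyTable', N' WHERE Condition1 IS NOT NULL');

-- The Execute SQL Task runs this with a single-row result set and maps the
-- QueryFilter column to the User::QueryFilter variable.
SELECT QueryFilter
FROM dbo.PackageFilter
WHERE PackageName = N'LoadMyTable';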

Stored Procedure vs Direct Query in Excel

I have an Excel file that will select roughly 1100 rows with 5 columns of data. Most columns are 5 digits long and are integers. I am using a macro to connect to a SQL Server database and insert these rows into one, maybe two, tables. This is all it's doing, and then it closes the connection. So the user opens an Excel file that has the rows, clicks a button, and it executes the macro.
My question is: should the query be written in Excel, since it's simple and merely inserts the data into a few tables? Or is it more efficient to call a stored procedure, pass all of the values to it, and have it decide where the values go in the different tables? By efficient, I mean: which is the quickest? I know this will probably take a few seconds to complete. I just feel that going through a stored procedure is an extra point along the path that the data has to travel before it reaches the tables. Am I wrong? Any thoughts?
There are some advantages to using stored procedures in SQL Server. One is that SQL Server precompiles and saves the query execution plan, which increases performance. With your current method, SQL Server will generally need to generate the execution plan each time. Stored procedures can also reduce client/server network traffic.
So, even though it may seem like an extra point along the path, it actually can be faster.
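To give a sense of what that looks like, here is a hedged T-SQL sketch; the table, column, and procedure names are made up, and the macro would call it once per row (or in batches) over the same connection it uses today.

CREATE PROCEDURE dbo.usp_InsertExcelRow
    @PartNo   INT,
    @Station  INT,
    @Reading1 INT,
    @Reading2 INT,
    @Reading3 INT
AS
BEGIN
    SET NOCOUNT ON;

    -- The procedure decides which tables the values land in.
    INSERT INTO dbo.Measurements (PartNo, Station, Reading1, Reading2, Reading3)
    VALUES (@PartNo, @Station, @Reading1, @Reading2, @Reading3);

    INSERT INTO dbo.MeasurementLoadLog (PartNo, LoadedAt)
    VALUES (@PartNo, SYSDATETIME());
END;

-- Example call from the macro's connection:
-- EXEC dbo.usp_InsertExcelRow 10001, 3, 512, 498, 505;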
In addition to @mark d.'s answer, another reason for using a stored procedure is security.
Your comment says that a customer is entering the data into Excel, so if you are putting direct SQL into your spreadsheet, then there is a risk that someone will open your spreadsheet and find out information about your database. But if you use a stored procedure then there is far less that can be learned.
Either way, make sure that you aren't hardcoding any connection string/account credentials into the spreadsheet.

Stored Procedure vs direct SQL command in SSIS data flow source

I'm providing maintenance support for some SSIS packages. The packages have some data flow sources with complex embedded SQL scripts that need to be modified from time to time. I'm thinking about moving those SQL scripts into stored procedures and call them from SSIS, so that they are easier to modify, test, and deploy. I'm just wondering if there is any negative impact for the new approach. Can anyone give me a hint?
Yes, there are issues with using stored procs as data sources (though not with using them in Execute SQL Tasks in the control flow).
You might want to read this:
http://www.jasonstrate.com/2011/01/31-days-of-ssis-no-more-procedures-2031/
Basically, the problem is that SSIS cannot always figure out the result set, and thus the columns, from a stored proc. I personally have run into this when the stored proc uses a temp table.
I don't know that I would go as far as the author of the article and not use procs at all, but be careful that you are not trying to do too much with them, and if you have to do something complicated, do it in an Execute SQL Task before the data flow.
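As an aside (this is not from the linked article, so treat it as an assumption about your environment), on SQL Server 2012 or later you can sometimes work around the metadata problem by declaring the result shape explicitly in the source query, so SSIS doesn't have to sniff it from the proc. The procedure, column names, and types below are placeholders.

-- OLE DB Source query: WITH RESULT SETS tells the engine (and therefore SSIS)
-- exactly what columns come back, even if the proc builds them in a temp table.
EXEC dbo.usp_GetOrderExtract @RunDate = ?
WITH RESULT SETS
(
    (OrderID   INT,
     OrderDate DATE,
     Amount    DECIMAL(10, 2))
);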
I can honestly see nothing but improvements. Stored procedures will offer better security, the possibility for better performance due to cached execution plans, and easier maintenance, like you pointed out.
Refactor away!
You will not face issues using only simple stored procedures as a data source. If a procedure uses temp tables or CTEs, there is no guarantee you will not face issues. Even when you can preview results at design time, you may get errors at run time.
My experience has been that trying to get a sproc to function as a data source is just not worth the headache. Maybe some simple sprocs are fine, and in some cases TVFs will work well instead, but if you need to do some complex operations there's no alternative to a sproc.
The best workaround I've found is to create an output table for each sproc you need to use in SSIS.
Modify the sproc to truncate the new output table at start, and to write its output to this instead of (or in addition to) ending with a SELECT statement.
Call the sproc with an Exec SQL task before your data flow.
Have your data flow read from the output table - a much simpler task.
If you want to save space, truncate the output table again with another Exec SQL. I prefer to leave it, as it lets me examine the data later and lets me rerun the output data flow if it fails without calling the sproc again.
This is certainly less elegant than reading directly from a sproc's output, but it works. FWIW, this pattern follows the philosophy (obligatory in Oracle) that a sproc should not try to be a parameterized view.
Of course, all this assumes that you have privs to adjust the sproc in question. If necessary, you could write a new wrapper sproc which truncates the output table, then calls the old sproc and redirects its output to the new table.
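A rough sketch of that wrapper idea, with hypothetical names (note that INSERT ... EXEC will not work if the old sproc itself already uses INSERT ... EXEC internally):

CREATE PROCEDURE dbo.usp_OldReport_ToTable
AS
BEGIN
    SET NOCOUNT ON;

    TRUNCATE TABLE dbo.OldReportOutput;

    -- Capture the old sproc's final SELECT into the output table.
    INSERT INTO dbo.OldReportOutput (Col1, Col2, Col3)
    EXEC dbo.usp_OldReport;
END;

-- In SSIS: call the wrapper in an Exec SQL task, then point the data flow
-- at dbo.OldReportOutput.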