Best practice between these two queries


I was in a user group meeting yesterday and they pointed out that using parameterized queries is better than hardcoding the query. That got me thinking: does this do anything beneficial (obviously on a much bigger scale than this, though)?
DECLARE @Client1 UNIQUEIDENTIFIER,
        @Client2 UNIQUEIDENTIFIER
SET @Client1 = '41234532-2342-3456-3456-123434543212';
SET @Client2 = '12323454-3432-3234-5334-265456787654';
SELECT ClientName
FROM dbo.tblclient
WHERE id IN (@Client1, @Client2)
As opposed to:
SELECT ClientName
FROM dbo.tblclient
WHERE id IN ('41234532-2342-3456-3456-123434543212','12323454-3432-3234-5334-265456787654')

Parameterized queries and the IN clause are not trivial to combine if your IN list changes from time to time.
Read this SO question and its answers: Parameterize an SQL IN clause
Parameters, by design, hold one value only. Anything beyond that must be implemented manually, keeping security issues such as SQL injection in mind.
From a performance perspective, you will generally get better performance from parameterized queries, especially if the same query is run repeatedly with different parameter values, because the cached plan can be reused. However, if you have a dynamic IN list (sometimes 2 items, sometimes 3), you might not get the advantage of using parameterized queries.
Do not lose hope, though. Some folks have managed to combine parameterized queries with an IN clause. Again, it's not trivial.
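One common workaround is to generate one placeholder per list item and still bind every value. The following is only a minimal sketch using Python's built-in sqlite3 module for illustration; the sample rows and the helper name clients_by_ids are assumptions and not part of the question. Each distinct list length produces different SQL text, which is why a varying IN list gets less plan reuse.

```python
import sqlite3

def clients_by_ids(conn, ids):
    """Return client names for the given ids, binding one parameter per id."""
    if not ids:
        return []  # an empty IN list is rejected by many databases, so skip the query
    placeholders = ", ".join("?" for _ in ids)  # e.g. "?, ?" for two ids
    sql = (
        "SELECT ClientName FROM tblclient "
        f"WHERE id IN ({placeholders}) ORDER BY ClientName"
    )
    return [row[0] for row in conn.execute(sql, list(ids))]

# Hypothetical sample data, only there to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblclient (id TEXT PRIMARY KEY, ClientName TEXT)")
conn.executemany(
    "INSERT INTO tblclient VALUES (?, ?)",
    [
        ("41234532-2342-3456-3456-123434543212", "Acme"),
        ("12323454-3432-3234-5334-265456787654", "Globex"),
        ("00000000-0000-0000-0000-000000000000", "Initech"),
    ],
)

names = clients_by_ids(conn, [
    "41234532-2342-3456-3456-123434543212",
    "12323454-3432-3234-5334-265456787654",
])
print(names)
```

Only the placeholders are generated as text; the ids themselves never become part of the SQL string.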

On huge databases and complex queries with many joins, the database can spend significant time building an execution plan. With parameterized queries, the execution plan stays in the database's plan cache for some time, so it can be reused when the query is called multiple times with different parameters.

It shouldn't hurt, but you're going to get the most effect from prepared statements when you use queries that are generated by user input. If they're clicking a button to "show all", it's not a big deal; however, if you're prompting for a user to enter their name, you seriously need to parameterize the input before inserting/updating/selecting/etc.
For example, if I entered my name as "Mike DROP TABLE MASTER);" (or whatever a big table name is in your DB), it could get really ugly for you. Better safe than sorry, right?
EDIT: OP commented here and asked a question. Updated with a code example.
public int myNum;
// Declare the parameter as an INT and bind the value to it
SqlParameter spNum = new SqlParameter("@myNum", SqlDbType.Int);
spNum.Value = myNum;
//you can also check for null here (but not really relevant in this case)
command.Parameters.Add(spNum);
string sql = "INSERT INTO Table(myNum)";
sql += " VALUES(@myNum)";
command.CommandText = sql;
int resultsCt = command.ExecuteNonQuery();
See how the code is forcing the input to be an integer BEFORE it does any work with the database? That way if anybody tries any shenanigans it's rejected before it can do harm to the DB.
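To illustrate the same point with a string instead of an integer, here is a small sketch using Python's built-in sqlite3 module. The people table and the hostile input are made up for the example. The bound value is stored as plain data and never parsed as SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")  # hypothetical table

# Input of the kind described above, crafted to break out of a quoted string.
hostile_name = "Mike'); DROP TABLE people;--"

# The value travels separately from the SQL text, so it is treated as data only.
conn.execute("INSERT INTO people (name) VALUES (?)", (hostile_name,))

stored = conn.execute("SELECT name FROM people").fetchall()
table_still_exists = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'people'"
).fetchone()[0] == 1
print(stored, table_still_exists)
```

Had the same string been concatenated into the statement, its quote and semicolon would have become part of the SQL text.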

Related

Performance in these two SQL scenarios

I had assumed the first call below was more efficient, because it only makes this check once to see if @myInputParameter is less than 5000.
If the check fails, I avoid a query altogether. However, I've seen other people code like the second example, saying it's just as efficient, if not more.
Can anyone tell me which is quicker? It seems like the second one would be much slower, especially if the call is combing through a large data set.
First call:
IF (@myInputParameter < 5000)
BEGIN
    SELECT
        @myCount = COUNT(1)
    FROM myTable
    WHERE someColumn = @someInputParameter
        AND someOtherColumn = 'Hello'
    --and so on
END
Second call:
SELECT
    @myCount = COUNT(1)
FROM myTable
WHERE someColumn = @someInputParameter
    AND someOtherColumn = 'Hello'
    AND @myInputParameter < 5000
--and so on
Edit: I'm using SQL Server 2008 R2, but I'm really asking to get a feel for which query is "best practice" for SQL. I'm sure the difference in query-time between these two statements is a thousandth of a second, so it's not THAT critical. I'm just interested in writing better SQL code in general. Thanks

Sometimes, SQL Server is clever enough to transform the latter into the former. This manifests itself as a "startup predicate" on some plan operator like a filter or a loop-join. This causes the query to evaluate very quickly in small, constant time. I just tested this, btw.
You can't rely on this for all queries but once having verified that it works for a particular query through testing I'd rely on it.
If you use OPTION (recompile) it becomes even more reliable because this option inlines the parameter values into the query plan causing the entire query to turn into a constant scan.

SQL Server is well designed. It will evaluate literals without actually scanning the table/indexes if it doesn't need to. So, I would expect them to perform basically identically.
From a practice standpoint, I think one should use the if statement, especially if it wraps multiple statements. But, this really is a matter of preference to me. For me, code that can't be executed would logically be faster than code that "should" execute without actually hitting the data.
Also, there is the possibility that SQL Server could create a bad plan and actually hit the data. I've never seen this specific scenario with literals, but I have seen bad execution plans get created.
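The two forms can be compared side by side in a small sketch. It uses Python's built-in sqlite3 module rather than SQL Server, so it says nothing about plans or timing; the sample rows and function names are assumptions. It does show one behavioral difference: when the check fails, the guarded form runs no query at all, while the predicate form still returns a count of 0 (in T-SQL this would mean @myCount keeps its previous value in the first form and is set to 0 in the second).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (someColumn INTEGER, someOtherColumn TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [(1, "Hello"), (1, "Hello"), (2, "Hello"), (1, "Bye")],  # made-up rows
)

def count_with_guard(my_input, some_input):
    """First form: test the parameter up front and skip the query when it fails."""
    if not (my_input < 5000):
        return None  # no query runs at all
    row = conn.execute(
        "SELECT COUNT(1) FROM myTable "
        "WHERE someColumn = ? AND someOtherColumn = 'Hello'",
        (some_input,),
    ).fetchone()
    return row[0]

def count_with_predicate(my_input, some_input):
    """Second form: fold the parameter test into the WHERE clause."""
    row = conn.execute(
        "SELECT COUNT(1) FROM myTable "
        "WHERE someColumn = ? AND someOtherColumn = 'Hello' AND ? < 5000",
        (some_input, my_input),
    ).fetchone()
    return row[0]

print(count_with_guard(100, 1), count_with_predicate(100, 1))
print(count_with_guard(6000, 1), count_with_predicate(6000, 1))
```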

is "where (ParamID = @ParamID) OR (@ParamID = -1)" a good practice in sql selection

I used to write SQL statements like
select * from teacher where (TeacherID = @TeacherID) OR (@TeacherID = -1)
and pass @TeacherID = -1 to select all teachers.
Now I'm worried about the performance.
Can you tell me whether that is a good practice or a bad one?
Many thanks

If TeacherID is indexed and you are passing a value other than -1 as TeacherID to search for details of a specific teacher then this query will end up doing a full table scan rather than the potentially far more efficient option of seeking into the index to retrieve the details of the specific teacher...
... Unless you are on SQL 2008 SP1 CU5 and later and use the OPTION (RECOMPILE) hint. See Dynamic Search Conditions in T-SQL for the definitive article on the topic.

We use this in a very limited fashion in stored procedures.
The problem is that the database engine isn't able to keep a good query plan for it. When dealing with a lot of data this can have a serious negative performance impact.
However, for smaller data sets (I'd say less than 1000 records, but that's a guess) it should be fine. You'll have to test in your particular environment.
If it's in a stored procedure, you might want to include something like a WITH RECOMPILE option so that the plan is regenerated on each execution. This adds (slightly) to the time for each run, but over several runs can actually reduce the average execution time. Also, this allows the database to inspect the actual query and "short circuit" the parts that aren't necessary on each call.
If you are directly creating your SQL and passing it through, then I'd suggest you make the part that builds your sql a little smarter so that it only includes the part of the where clause you actually need.
Another path you might consider is using UNION ALL queries as opposed to optional parameters. For example:
SELECT * FROM Teacher WHERE (TeacherId = @TeacherID)
UNION ALL
SELECT * FROM Teacher WHERE (@TeacherID = -1)
This actually accomplishes the exact same thing; however, the query plan is cacheable. We've used this method in a few places as well and saw performance improvements over using WITH RECOMPILE. We don't do this everywhere because some of our queries are extremely complicated and I'd rather have a performance hit than to complicate them further.
Ultimately though, you need to do a lot of testing.
There is a second part here that you should reconsider. SELECT *. It is ALWAYS preferable to actually name the columns you want returned and to make sure that you are only returning the ones you will actually need. Moving data across network boundaries is very expensive and you can generally get a fair amount of performance boost simply by specifying exactly what you want. In addition if what you need is very limited you can sometimes do covering indexes so that the database engine doesn't even have to touch the underlying tables to get the data you want.
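As a quick sanity check that the UNION ALL form returns the same rows as the OR form, here is a small sketch using Python's built-in sqlite3 module. The sample teachers are made up, and the equivalence assumes no real teacher has the ID -1 (otherwise that row would be returned twice by the UNION ALL version). It only compares results, not query plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Teacher (TeacherID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO Teacher VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace"), (3, "Edsger")],  # hypothetical rows
)

OR_FORM = """
SELECT TeacherID, Name FROM Teacher
WHERE (TeacherID = :id) OR (:id = -1)
ORDER BY TeacherID
"""

UNION_FORM = """
SELECT TeacherID, Name FROM Teacher WHERE TeacherID = :id
UNION ALL
SELECT TeacherID, Name FROM Teacher WHERE :id = -1
ORDER BY TeacherID
"""

def run(sql, teacher_id):
    """Execute one of the two query forms with the given teacher id."""
    return conn.execute(sql, {"id": teacher_id}).fetchall()

print(run(OR_FORM, 2), run(UNION_FORM, 2))    # one teacher from each form
print(run(OR_FORM, -1) == run(UNION_FORM, -1))  # both return every teacher
```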

If you're really worried about performance, you could break up your procedure to call on two different procs: one for all records, and one based on the parameter.
IF @TeacherID = -1
    EXEC proc_Get_All_Teachers
ELSE
    EXEC proc_Get_Teacher_By_TeacherID @TeacherID
Each one can be optimized individually.
It's your system, so compare the performance. Consider optimizing for the most popular choice. If most users are going to select a single record, why hinder their performance just to accommodate the few that select all teachers (and should have a reasonable expectation of performance)?
I know a single select query is easier to maintain, but at some point ease of maintenance eventually gives way to performance.

Keeping dynamic out of SQL while using specifications with stored procedures

A specification essentially is a text string representing a "where" clause created by an end user.
I have stored procedures that copy a set of related tables and records to other places. The operation is always the same, but dependent on some crazy user requirements like "products that are frozen and blue and on sale on Tuesday".
What if we fed the user specification (or string parameter) to a scalar function that returned true/false and executed the specification as dynamic SQL, or just exec (@variable)?
It could tell us whether those records exist. We could add the result of the function to our copy products where clause.
It would keep us from recompiling the copy script each time our where clauses changed. Plus it would isolate the product selection into a single function.
Anyone ever do anything like this or have examples? What bad things could come of it?
EDIT:
This is the specification I simply added to the end of each insert/select statement:
and exists (
select null as nothing
from SameTableAsOutsideTable inside
where inside.ID = outside.id and -- Join operations to outside table
inside.page in (6, 7) and -- Criteria 1
inside.dept in (7, 6, 2, 4) -- Criteria 2
)
It would be great to feed a parameter into a function that produces records based on the user criteria, so all that above could be something like:
and dbo.UserCriteria( @page="6,7", @dept="7,6,2,4")

Dynamic Search Conditions in T-SQL
When optimizing SQL the important thing is optimizing the access path to data (ie. index usage). This trumps code reuse, maintainability, nice formatting and just about every other development perk you can think of. This is because a bad access path will cause the query to perform hundreds of times slower than it should. The article linked sums up very well all the options you have, and your envisioned function is nowhere on the radar. Your options will gravitate around dynamic SQL or very complicated static queries. I'm afraid there is no free lunch on this topic.

It doesn't sound like a very good idea to me. Even supposing that you had proper defensive coding to avoid SQL injection attacks it's not going to really buy you anything. The code still needs to be "compiled" each time.
Also, it's pretty much always a bad idea to let users create free-form WHERE clauses. Users are pretty good at finding new and innovative ways to bring a server to a grinding halt.
If you or your users or someone else in the business can't come up with some concrete search requirements then it's likely that someone isn't thinking about it hard enough and doesn't really know what they want. You can have pretty versatile search capabilities without letting the users completely loose on the system. Alternatively, look at some of the BI tools out there and consider creating a data mart where they can do these kinds of ad hoc searches.
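One way to offer versatile search without free-form WHERE clauses is to accept a structured specification and translate it into parameterized SQL against a whitelist of columns. The sketch below uses Python's built-in sqlite3 module; the products table, the ALLOWED_COLUMNS set, and the build_criteria helper are all hypothetical and only illustrate the idea.

```python
import sqlite3

ALLOWED_COLUMNS = {"page", "dept"}  # hypothetical whitelist of searchable columns

def build_criteria(spec):
    """Turn {"page": [6, 7], ...} into a WHERE fragment plus its bound values."""
    clauses, params = [], []
    for column, values in spec.items():
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"column not searchable: {column}")
        if not values:
            raise ValueError(f"no values given for: {column}")
        placeholders = ", ".join("?" for _ in values)
        clauses.append(f"{column} IN ({placeholders})")
        params.extend(values)
    # With no criteria at all, fall back to a condition that matches every row.
    return " AND ".join(clauses) or "1 = 1", params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, page INTEGER, dept INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, 6, 7), (2, 6, 9), (3, 8, 2), (4, 7, 4)],  # made-up rows
)

where_sql, where_params = build_criteria({"page": [6, 7], "dept": [7, 6, 2, 4]})
ids = [
    row[0]
    for row in conn.execute(
        f"SELECT id FROM products WHERE {where_sql} ORDER BY id", where_params
    )
]
print(where_sql, ids)
```

Column names come only from the whitelist and values are always bound, so the user never supplies SQL text.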

How about this:
You create another stored procedure (instead of a function) and pass the right condition to it.
Based on that condition, it dumps the record ids to a temp table.
Next, your move procedure reads the ids from that table and does whatever is needed.
Or you could create a user-defined function that returns a table containing just the ids of the records that match your (dynamic) criteria.
If I am totally off, then please clarify.
Hope this helps.

If you are forced to use dynamic queries and you don't have any solid, predefined search requirements, it is strongly recommended to use sp_executesql instead of EXEC. It supports parameterized queries, which helps prevent SQL injection attacks (to some extent), and it lets execution plans be reused to speed up performance. (More info)

SQL Dynamic query for searching

I am working on a problem that I'm certain someone has seen before, but all I found across the net was how not to do it.
Fake table example and dynamic searching.
(Due to my low rating I cannot post images. I know I should be ashamed!!)
Clicking the add button automatically creates another row for adding more criteria choices.
(Note: My table is most definitely more complex)
Now to my issue, I thought I knew how to handle the SQL for this task, but I really don't. The only examples of what I should do are not meant for this sort of dynamic table querying. The examples didn't have the ability to create as many search filters as a user pleases (or perhaps my understanding was lacking).
Please let me know if my uploaded image is not of good enough quality or if I have not given enough information.
I'm really curious about the best practice for this situation. Thank you in advance.

I had a similar question. You can use dynamic sql with the sp_executesql stored proc where you actually build your select statement as a string and pass it in.
Or you might be able to write a stored proc kinda like the one I created where you have all of the conditions in the where clause but the NULL values are ignored.
Here's the stored proc I came up with for my scenario:
How do I avoid dynamic SQL when using an undetermined number of parameters?
The advantage with the parameterized stored proc I wrote is that I'm able to avoid the SQL injection risks associated with dynamic SQL.

Two main choices:
Linq to Sql allows you to compose a query, add to it, add to it again, and it won't actually compile and execute a SQL statement until you iterate the results.
Or you can use dynamic SQL. The trick to making this easy is the "WHERE (1=1)" technique, but you do have to be careful to use parameters (to avoid SQL injection attacks) and build your sql statements carefully.
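A minimal sketch of the "WHERE (1=1)" technique with bound parameters, using Python's built-in sqlite3 module. The people table and the search_people function are assumptions made up for the example. Every supplied filter appends " AND ..." without worrying about whether it is the first condition, and values are never concatenated into the SQL text.

```python
import sqlite3

def search_people(conn, name=None, age=None):
    """Append one parameterized condition per filter the caller supplied."""
    sql = "SELECT name, age FROM people WHERE (1=1)"
    params = []
    if name:
        sql += " AND name = ?"
        params.append(name)
    if age is not None:
        sql += " AND age = ?"
        params.append(age)
    sql += " ORDER BY name, age"
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ann", 30), ("Bob", 30), ("Ann", 41)],  # made-up rows
)

print(search_people(conn))                      # no filters: every row
print(search_people(conn, name="Ann"))          # one filter
print(search_people(conn, name="Ann", age=41))  # two filters
```

The same loop-and-append pattern extends to any number of user-added filter rows, as long as the column names come from a fixed list rather than from user input.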

The original post:
Write a sql for searching with multiple conditions
select * from thetable
where (@name='' or [name]=@name) and (@age=0 or age=@age)
However, the above query forces a table scan. For better performance and more complex scenarios (I guess you simplified the question in your original post), consider using dynamic SQL. By the way, Linq to SQL can help you build dynamic SQL very easily, like the following:
IQueryable<Person> persons = db.Persons;
if (!string.IsNullOrEmpty(name)) persons = persons.Where(p=>p.Name==name);
if (age != 0) persons = persons.Where(p=>p.Age==age);

Check out SqlBuilder, a utility for Dynamic SQL.

SQL Query theory question - single-statement vs multi-statement queries

When I write SQL queries, I find myself often thinking that "there's no way to do this with a single query". When that happens I often turn to stored procedures or multi-statement table-valued functions that use temp tables (of one sort or another) and end up simply combining the results and returning the result table.
I'm wondering if anyone knows, simply as a matter of theory, whether it should be possible to write ANY query that returns a single result set as a single query (not multiple statements). Obviously, I'm ignoring relevant points such as code readability and maintainability, maybe even query performance/efficiency. This is more about theory - can it be done... and don't worry, I certainly don't plan to start forcing myself to write a single-statement query when multi-statement will better suit my purpose in all cases, but it might make me think twice or a little bit longer on whether there is a viable way to get the result from a single query.
I guess a few parameters are in order - I'm thinking of a relational database (such as MS SQL) with tables that follow common best practices (such as all tables having a primary key and so forth).
Note: in order to win 'Accepted Answer' on this, you'll need to provide a definitive proof (reference to web material or something similar.)

I believe it is possible. I've worked with very difficult queries, very long queries, and often, it is possible to do it with a single query. But most of the time, it's harder to maintain, so if you do it with a single query, make sure you comment your query carefully.
I've never encountered something that could not be done in a single query.
But sometimes it's best to do it in more than one query.

At least with a recent version of Oracle, it is absolutely possible. It has a 'model clause' which makes SQL Turing complete ( http://blog.schauderhaft.de/2009/06/18/building-a-turing-engine-in-oracle-sql-using-the-model-clause/ ). Of course, this is all with the usual limitation that we don't really have unlimited time and memory.
For a normal SQL dialect without these abominations, I don't think it is possible.
A task that I can't see how to implement in 'normal sql' would be:
Assume a table with a single column of type integer
For every row
'take the value at the current row and go that many rows back, fetch that value, go that many rows back, and continue until you fetch the same value twice consecutively and return that as the result.'

I can't prove it, but I believe the answer is a cautious yes - provided your database design is done properly. Usually being forced to write multiple statements to get a certain result is a sign that your schema may need some improvements.

I'd say "yes" but can't prove it. However, my main thought process:
Any select should be a set based operation
Your assumption is that you are dealing with mathematically correct sets (ie normalised correctly)
Set theory should guarantee it's possible
Other thoughts:
Multiple SELECT statements often load temp tables/table variables. These can be replaced by derived tables or separated into CTEs.
Any RBAR processing (for good or bad) can now be dealt with using CROSS/OUTER APPLY onto derived tables
UDFs would be classed as "cheating" in this context I feel, because it allows you to put a SELECT into another module rather than in your single one
No writes allowed in your "before" sequence of DML: this changes state from SELECT to SELECT
Have you seen some of the code in our shop?
Edit, glossary
RBAR = Row By Agonising Row
CTE = Common Table Expression
UDF = User Defined Function
Edit: APPLY: cheating?
SELECT
*
FROM
MyTable1 t1
CROSS APPLY
(
SELECT * FROM MyTable2 t2
WHERE t1.something = t2.something
) t2

In theory yes, if you use functions or a torturous maze of OUTER APPLYs or sub-queries; however, for readability and performance, we have always ended up going with temp tables and multi-statement stored procedures.
As someone above commented, this is usually a sign that your data structure is starting to smell; not that it's bad, but that maybe it's time to denormalise for performance reasons (happens to the best of us), or maybe put a denormalised querying layer in front of your normalised "real" data.
