Linq to SQL Performance using contains - vb.net

I'm overloading a vb.net search procedure which queries a SQL database.
One of the older methods i'm using as a comparison uses a Stored Procedure to perform the search and return the query.
My new method uses linq.
I'm slightly concerned about the performance when using Contains queries with LINQ. I'm looking at equally comparable queries using both methods, each with a single WHERE clause filtering on the name column.
Here are some profiler results:
Where name = "ber10rrt1"
LINQ query: 24 reads
Stored proc query: 111 reads
Where name = "%ber10%"
LINQ query: 53174 reads
Stored proc query: 23386 reads
Setting aside the indexes for a moment (it's not my database)... the fact of the matter is that both methods are fundamentally performing the same query (albeit the stored procedure references a view for some of the tables).
Is this consistent with other people's experience of LINQ to SQL?
Also, interestingly enough, using LIKE "BER10%":
resultset.Where(Function(c) c.ci.Name.StartsWith(name))
Results in the stored proc using 13125 reads and LINQ using 8172 reads.

I'm not sure there is enough there for a complete analysis... I'm assuming we are talking about string.Contains/string.StartsWith here (not List<T>.Contains).
If the generated TSQL is similar, then the results should be comparable. There are a few caveats to this - for example, is the query column a computed and persisted value? If so, the SET options must match exactly for it to be usable "as is" (otherwise it has to be recalculated per row).
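For illustration, a rough sketch of such a column (the table and column names here are invented): if the session's SET options (ANSI_NULLS, QUOTED_IDENTIFIER, ARITHABORT, etc.) don't match the ones required for computed columns, SQL Server can't use the persisted value or the index built on it and recalculates the expression per row.

-- hypothetical table with a computed, persisted column and an index on it
CREATE TABLE dbo.Customer
(
    Id        int IDENTITY(1,1) PRIMARY KEY,
    FirstName nvarchar(50) NOT NULL,
    LastName  nvarchar(50) NOT NULL,
    FullName  AS (FirstName + N' ' + LastName) PERSISTED
);
CREATE INDEX IX_Customer_FullName ON dbo.Customer (FullName);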
So: what is the TSQL from the SP and LINQ? Are they directly comparable?
You mention a VIEW - I'm guessing this could make a big difference if (for example) it filters out data (either via a WHERE or an INNER JOIN).
Also, LIKE clauses starting with % are rarely a good idea - not least because they can't make effective use of any index. You might have better performance using "full text search"; but this isn't directly available via LINQ, so you'll have to wrap it in an SP and expose the SP via the LINQ data-context (just drag the SP into the designer).
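As a rough sketch of that wrapping approach (all object names below are made up, and a full-text index on the Name column is assumed to already exist), the SP might look something like this; once dragged into the designer it can be called like any other method on the data context:

-- hypothetical stored procedure wrapping a full-text search
CREATE PROCEDURE dbo.SearchCustomersByName
    @search nvarchar(200)
AS
BEGIN
    SET NOCOUNT ON;

    SELECT c.Id, c.Name
    FROM dbo.Customer AS c
    WHERE CONTAINS(c.Name, @search);   -- e.g. pass '"ber10*"' for a prefix search
END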
My money is on the VIEW (and the other code in the SP) being the main difference here.

Related

Rails - find_by_sql vs active record querying

I would like to use the find_by_sql method in place of the active record query interface because of the flexibility I get in writing my own SQL.
For example,
It is considered a good practice in SQL to order the tables in your query from smallest to largest. When using the active record query interface, you will have to start with the result model which could be the largest table.
I can also easily avoid the N+1 problem by including the target table in the join itself and including the required columns from multiple tables.
I would like to know if there are reasons I should not be using the find_by_sql option, especially when I will be writing ANSI SQL that should be compatible with most, if not all, databases.
Thanks!
Writing SQL code directly is normally discouraged because it prevents you from using features such as lazy execution of SQL queries and can potentially lead to invalid or unsafe queries.
In fact, ActiveRecord automatically converts Ruby objects into the corresponding SQL fragments (such as ranges into BETWEEN or arrays into IN), escapes strings to protect against SQL injection, and ActiveRecord::Relation provides lazy query execution.
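For example, the SQL that ends up being sent for a Ruby range or array looks roughly like this (the posts table and the values are purely illustrative):

-- Post.where(created_at: (start_date..end_date)) becomes roughly:
SELECT "posts".* FROM "posts" WHERE "posts"."created_at" BETWEEN '2015-01-01' AND '2015-01-31';

-- Post.where(id: [1, 2, 3]) becomes roughly:
SELECT "posts".* FROM "posts" WHERE "posts"."id" IN (1, 2, 3);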
That said, if you know what you are doing, or if a specific query would be extremely complex to achieve with ActiveRecord, there is nothing wrong with writing your own SQL.
My suggestion is to avoid doing that for queries that can easily be reproduced in ActiveRecord and AREL.

Sql Join on User Defined Function: how to optimize

I'm trying to optimize a query in a database. That query is similar to the following:
select * from Account
inner join udf_Account('user') udfAccount
on Account.Id = udfAccount.AccountId
Actually the real query is much longer, but the most important point is that it contains a few inner joins on user-defined functions (UDFs) which depend on the user id. (So this is a constant parameter which does not change during query evaluation.)
Due to a large amount of data, my query takes approximately 20 seconds on a production database, which is not acceptable.
I have already seen that storing the results of the functions in temporary tables and using those tables in the query greatly reduces the query duration.
I'm asking the following questions:
Can I avoid the temporary tables? Is there a way to tell SQL that the function only needs to be evaluated once? Using temporary tables would imply some significant changes in my code, which is why I would be happy to find another solution.
Are there any other ways to optimize my query ?
In SQL Server, if your functions are inline rather than multi-statement, SQL Server expands them (macro-like) into your queries. It's just as if they become sub-queries in your main query.
This notionally allows the optimiser to make a 'better' execution plan.
For example, provided that the fields you are joining on are directly derived from their source tables, this should make indexes on those fields available.
Without looking at the whole query and your individual functions, it appears that you're already in a good place with regards to your syntax. The next place to look is at the indexes that exist, and aim for index-seeks rather than table-scans or index-scans.
(That's all a bit simplistic, but it's a good start for query optimisation, which is an immense topic.)
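To make the inline vs. multi-statement distinction concrete, here is a sketch of the same function written both ways (the Account.Owner column and the function names are made up); only the inline form can be expanded, macro-like, into the calling query:

-- multi-statement TVF: the optimiser sees an opaque table variable
CREATE FUNCTION dbo.udf_Account_Multi (@user varchar(50))
RETURNS @result TABLE (AccountId int)
AS
BEGIN
    INSERT INTO @result (AccountId)
    SELECT a.Id FROM dbo.Account AS a WHERE a.Owner = @user;
    RETURN;
END
GO

-- inline TVF: a single RETURN SELECT that SQL Server can expand into the calling query
CREATE FUNCTION dbo.udf_Account_Inline (@user varchar(50))
RETURNS TABLE
AS
RETURN
(
    SELECT a.Id AS AccountId
    FROM dbo.Account AS a
    WHERE a.Owner = @user
);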
Another option is to look at using CROSS APPLY with your inline table valued functions.
(Available in SQL Server 2005 onwards)
This allows the values from tables in your queries to be used as parameters to your functions. Again, provided that the functions are inline, SQL Server expands the function inline when building the execution plan.
An example could be...
SELECT
    Account.AccountID,
    SubAccount.AccountID AS SubAccountID,
    Balance.currentAvailable AS SubAccountBalance
FROM
    Account
CROSS APPLY
    dbo.getSubAccounts('User', Account.AccountID) AS SubAccount
CROSS APPLY
    dbo.getCurrentBalance(SubAccount.AccountID) AS Balance
WHERE
    Account.AccountID = 1234
I believe you want to define what MySQL calls a "deterministic" function. Depending on your flavor of SQL this will have different syntax. But ultimately the biggest optimisation would be to not use a function at all, but simply add an account column to the user table.
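For reference, the MySQL syntax alluded to looks roughly like this (a made-up function; DETERMINISTIC is the developer's assertion that the same arguments always yield the same result):

DELIMITER //
CREATE FUNCTION account_for_user(uid INT)
RETURNS INT
DETERMINISTIC
READS SQL DATA
BEGIN
    RETURN (SELECT MIN(a.id) FROM account a WHERE a.user_id = uid);
END //
DELIMITER ;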

SQL Dynamic query for searching

I am working on a problem that I'm certain someone has seen before, but all I found across the net was how not to do it.
Fake table example and dynamic searching.
(Due to my low rating I cannot post images. I know I should be ashamed!!)
Clicking the add button automatically creates another row for adding more criteria choices.
(Note: My table is most definitely more complex)
Now to my issue: I thought I knew how to handle the SQL for this task, but I really don't. The only examples of what I should do are not meant for this sort of dynamic table querying. The examples didn't have the ability to create as many search filters as a user pleases (or perhaps my understanding was lacking).
Please let me know if my uploaded image is not of good enough quality or if I have not given enough information.
I'm really curious about the best practice for this situation. Thank you in advance.
I had a similar question. You can use dynamic SQL with the sp_executesql stored proc, where you actually build your SELECT statement as a string and pass it in.
Or you might be able to write a stored proc kinda like the one I created where you have all of the conditions in the where clause but the NULL values are ignored.
Here's the stored proc I came up with for my scenario:
How do I avoid dynamic SQL when using an undetermined number of parameters?
The advantage with the parameterized stored proc I wrote is that I'm able to avoid the SQL injection risks associated with dynamic SQL.
Two main choices:
Linq to Sql allows you to compose a query, add to it, add to it again, and it won't actually compile and execute a SQL statement until you iterate the results.
Or you can use dynamic SQL. The trick to making this easy is the "WHERE (1=1)" technique, but you do have to be careful to use parameters (to avoid SQL injection attacks) and build your SQL statements carefully.
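A minimal sketch of that second option (the Person table and its columns are illustrative): each clause is appended only when its filter is supplied, yet the values still travel as parameters through sp_executesql rather than being concatenated into the string.

CREATE PROCEDURE dbo.SearchPeople
    @name varchar(50) = NULL,
    @age  int         = NULL
AS
BEGIN
    -- start with a clause that is always true, so every further filter can begin with AND
    DECLARE @sql nvarchar(max) = N'SELECT * FROM dbo.Person WHERE (1=1)';

    IF @name IS NOT NULL SET @sql = @sql + N' AND [name] = @name';
    IF @age  IS NOT NULL SET @sql = @sql + N' AND age = @age';

    EXEC sp_executesql @sql,
         N'@name varchar(50), @age int',
         @name = @name, @age = @age;
END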
The original post:
Write a sql for searching with multiple conditions
select * from thetable
where (@name='' or [name]=@name) and (@age=0 or age=@age)
However, the above query forces a table scan. For better performance and more complex scenarios (I guess you simplified the question in your original post), consider using dynamic SQL. By the way, LINQ to SQL can help you build dynamic queries very easily, like the following:
IQueryable<Person> persons = db.Persons;
if (!string.IsNullOrEmpty(name)) persons = persons.Where(p=>p.Name==name);
if (age != 0) persons = persons.Where(p=>p.Age==age);
Check out SqlBuilder, a utility for Dynamic SQL.

What do we consider as a dynamic sql statement?

a) What do we consider as a dynamic sql statement?
Any SQL statement that dynamically adds a clause (or even just part of a clause) to a SQL string?
b) Aren't parameterized strings that use placeholders for dynamically supplied values then also considered dynamic SQL statements?
thanx
A dynamic SQL statement is a statement that is built at execution time. The emphasis lies on statement. So, it isn't dynamic SQL if you just supply a value at execution time.
Dynamic SQL statements generally refers to those that are constructed using string concatenation.
"SELECT name FROM names WHERE id=" + this.id;
"SELECT name FROM names WHERE id=" + this.id + " AND age=" this.age;
Parameterized queries are also dynamic, but not in terms of construction. You can only change parameters, but you can't change the structure of the statement, i.e. add WHERE clauses.
Parameterized queries are often at the database level so the database can cache the execution plan of the query and use it over and over. Not quite possible in the first case since a simple change in the text or the order of the where clauses can cause the database to not recognize the previously cached execution plan and start over.
The first construct is also vulnerable to SQL injection since it is hard to validate input values for attempts to inject rogue SQL.
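For comparison, the parameterized counterpart in T-SQL terms would be something like the following; the statement text never changes, so the cached plan can be reused for any id/age values and the values are never spliced into the SQL string:

EXEC sp_executesql
     N'SELECT name FROM names WHERE id = @id AND age = @age',
     N'@id int, @age int',
     @id = 42, @age = 30;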
Certainly anything involving EXEC (@sql) or EXEC sp_executesql @sql, ... (i.e. dynamic at the database itself) would qualify, but I guess you could argue that any SQL generated at runtime (rather than fixed at build / install) would qualify.
So yes, you could argue that a runtime-generated, yet correctly parameterized query is "dynamic" (for example, LINQ-to-SQL generated queries), but to be honest as long as it doesn't expose you to injection attacks I don't care about the name ;-p
Dynamic SQL is basically just any SQL that is not fully constructed until runtime. It's generated on the fly by concatenating runtime values into a statement. It could be any part of a SQL statement.
A. Anything that will cause the DB server to evaluate strings as SQL.
B. No, as they still go through the DB driver/provider and get cleaned up.
For point b) You already know the statement and you pass in known parameters (hopefully type safe and not string literals).
I consider a dynamic SQL statement to be one that accepts new values at runtime in order to return a different result. "New values", by my reckoning, can be a different ORDER BY, new WHERE criteria, different field selections, etc.
a) What do we consider as a dynamic sql statement?
Any SQL statement that dynamically adds a clause (or even just part of a clause) to a SQL string?
Both - any query altered/tailored prior to execution.
b) Aren't parameterized strings that use placeholders for dynamically supplied values then also considered dynamic SQL statements?
Parameterized queries, AKA using bind variables, are about supplying different filter criteria to the query. You can use (my_variable IS NULL OR ...), but OR performance is generally terrible on any db, and the approach destroys sargability.
Dynamic SQL generally deals with tailoring the query to include other logic, like JOINs that only need to be included if a specific parameter is set. However, there are limitations, like the IN clause not supporting a comma-delimited string as its list of options - for this you would have to use dynamic SQL, or handle the comma-delimited list in another fashion (CLR, pipelined into a temp table, etc.).
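As a sketch of that last point (the Person table is illustrative, and STRING_SPLIT assumes SQL Server 2016 or later; older versions need a custom split function or CLR), a comma-delimited list can be handled without dynamic SQL:

DECLARE @ids varchar(200) = '3,7,19';   -- the comma-delimited input

SELECT p.*
FROM dbo.Person AS p
JOIN STRING_SPLIT(@ids, ',') AS s
    ON p.Id = CAST(s.value AS int);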
I see where you're going with this, but one somewhat objective criteria which defines a particular situation as Dynamic SQL vs. say a prepared statement is...
...the fact that dynamic statements cause the SQL server to fully evaluate the query, to define a query plan etc.
With prepared statements, SQL can (and does unless explicitly asked) cache the query plan (and in some cases, even gather statistics about the returns etc.).
[Edit: effectively, even dynamic SQL statements are cached, but such cached plans have a much smaller chance of being reused because the exact same query would need to be received anew for this to happen (unlike with parametrized queries where the plan is reused even with distinct parameter values and of course, unless "WITH RECOMPILE")]
So in the cases from the question, both a) and b) would be considered dynamic SQL, because, in the case of b), the substitution takes place outside of SQL. This is not a prepared statement. SQL will see it as a totally novel statement, as it doesn't know that you are merely changing the search values.
Now... beyond the SQL-centric definition of dynamic SQL, it may be useful to distinguish between various forms of dynamic SQL, such as say your a) and b) cases. Dynamic SQL has had a bad rep for some time, some of it related to SQL injection awareness. I've seen applications where dynamic SQL is produced, and executed, within a Stored Procedure, and because it is technically "SQL-side" some people tended to accept it (even though SQL-injecting in particular may not have been addressed)...
The point I'm trying to make here is that building a query dynamically from various contextual elements is often needed, and one may be better advised to implement this in "application" layers (well, call them application-side layers, for indeed, this can and should be separate from the application per se), where the programming language and associated data structures are typically easier and more expressive than, say, T-SQL. So jamming this into SQL for the sake of calling it "data-side" isn't a good thing, in my opinion.

How to create dynamic and safe queries

A "static" query is one that remains the same at all times. For example, the "Tags" button on Stackoverflow, or the "7 days" button on Digg. In short, they always map to a specific database query, so you can create them at design time.
But I am trying to figure out how to do "dynamic" queries where the user basically dictates how the database query will be created at runtime. For example, on Stackoverflow, you can combine tags and filter the posts in ways you choose. That's a dynamic query albeit a very simple one since what you can combine is within the world of tags. A more complicated example is if you could combine tags and users.
First of all, when you have a dynamic query, it sounds like you can no longer use the substitution api to avoid sql injection since the query elements will depend on what the user decided to include in the query. I can't see how else to build this query other than using string append.
Secondly, the query could potentially span multiple tables. For example, if SO allows users to filter based on Users and Tags, and these probably live in two different tables, building the query gets a bit more complicated than just appending columns and WHERE clauses.
How do I go about implementing something like this?
The first rule is that users are allowed to specify values in SQL expressions, but not SQL syntax. All query syntax should be literally specified by your code, not user input. The values that the user specifies can be provided to the SQL as query parameters. This is the most effective way to limit the risk of SQL injection.
Many applications need to "build" SQL queries through code, because as you point out, some expressions, table joins, order by criteria, and so on depend on the user's choices. When you build a SQL query piece by piece, it's sometimes difficult to ensure that the result is valid SQL syntax.
I worked on a PHP class called Zend_Db_Select that provides an API to help with this. If you like PHP, you could look at that code for ideas. It doesn't handle any query imaginable, but it does a lot.
Some other PHP database frameworks have similar solutions.
Though not a general solution, here are some steps that you can take to mitigate the dynamic yet safe query issue.
A criterion in which a column value must belong to a set of values of arbitrary cardinality does not need to be dynamic. Consider using either the instr function or a special filtering table which you join against (see the sketch after this list). This approach can easily be extended to multiple columns as long as the number of columns is known. Filtering on users and tags could easily be handled with this approach.
When the number of columns in the filtering criteria is arbitrary yet small, consider using different static queries for each possibility.
Only when the number of columns in the filtering criteria is arbitrary and potentially large should you consider using dynamic queries. In which case...
To be safe from SQL injection, either build or obtain a library that defends against that attack. Though more difficult, this is not an impossible task. This is mostly about escaping SQL string delimiters in the values to filter for.
To be safe from expensive queries, consider using views that are specially crafted for this purpose and some up front logic to limit how those views will get invoked. This is the most challenging in terms of developer time and effort.
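A rough sketch of the filtering-table idea mentioned in the first step (all table names below are invented): the user's chosen values are inserted through ordinary parameterized INSERTs, and the query joins against that table instead of concatenating an IN list.

CREATE TABLE #TagFilter (TagName varchar(50) NOT NULL);

-- each value would be inserted via a parameterized INSERT from application code
INSERT INTO #TagFilter (TagName) VALUES ('sql'), ('linq');

SELECT DISTINCT p.*
FROM dbo.Post AS p
JOIN dbo.PostTag AS pt ON pt.PostId = p.Id
JOIN dbo.Tag     AS t  ON t.Id = pt.TagId
JOIN #TagFilter  AS f  ON f.TagName = t.Name;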
If you were using python to access your database, I would suggest you use the Django model system. There are many similar apis both for python and for other languages (notably in ruby on rails). I am saving so much time by avoiding the need to talk directly to the database with SQL.
From the example link:
from django.db import models

# Model definition
class Blog(models.Model):
    name = models.CharField(max_length=100)
    tagline = models.TextField()

    def __unicode__(self):
        return self.name
Model usage (this is effectively an insert statement)
from mysite.blog.models import Blog
b = Blog(name='Beatles Blog', tagline='All the latest Beatles news.')
b.save()
The queries get much more complex - you pass around a query object and you can add filters / sort elements to it. When you are finally ready to use the query, Django creates an SQL statement that reflects all the ways you adjusted the query object. I think that it is very cute.
Other advantages of this abstraction
Your models can be created as database tables with foreign keys and constraints by Django
Many databases are supported (PostgreSQL, MySQL, SQLite, etc.)
Django analyses your models and creates an automatic admin site from them.
Well the options have to map to something.
A SQL query string CONCAT isn't a problem if you still use parameters for the options.