Query fast without search, slow with search, but the same search query fast in SSMS - asp.net-core

I have an action that reads data from the database and also supports search. The problem is that when I search through Entity Framework it's slow, but if I take the same query from the log and run it in SSMS it's fast. I should also mention that there are a lot of movies: 388,262. I also tried adding an index on the title column of Movie, but it didn't help.
Query I use in SSMS:
SELECT *
FROM Movie
WHERE title LIKE '%pirate%'
ORDER BY @@ROWCOUNT
OFFSET 0 ROWS FETCH NEXT 30 ROWS ONLY
Entity Framework code (_movieRepository.GetAll() returns an IQueryable, not all movies):
public IActionResult Index(MovieIndexViewModel vm) {
    IQueryable<Movie> query = _movieRepository.GetAll().AsNoTracking();
    if (!string.IsNullOrWhiteSpace(vm.Search)) {
        query = query.Where(m => m.title.ToLower().Contains(vm.Search.ToLower()));
    }
    vm.TotalItemCount = query.Count();
    vm.Movies = query.Skip(_pageSize * (vm.Page - 1)).Take(_pageSize);
    vm.PageSize = _pageSize;
    return View(vm);
}

Caveat: I don't have much experience with Entity Framework.
However, you might find useful debugging tips in the Entity Framework performance article from Simple Talk. Looking at what you've posted, you might be able to improve your query performance by:
Selecting only the specific columns you're interested in (it sounds like you only need the 'Title' column); a sketch follows below.
Paying special attention to your data types. You might want to convert your NVARCHAR variables to VARCHAR(40) (or some appropriate character limit).
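For the projection point, a minimal sketch of what that could look like against the question's IQueryable (the property names come from the posted code; MovieListItem is a hypothetical DTO, and the m.id key is assumed since only m.title appears in the question):

// Hypothetical DTO holding only the columns the page actually renders.
public class MovieListItem {
    public int Id { get; set; }
    public string Title { get; set; }
}

// Project before paging so EF only selects the needed columns.
var pageItems = query
    .OrderBy(m => m.title)              // Skip/Take needs a stable order
    .Select(m => new MovieListItem { Id = m.id, Title = m.title })
    .Skip(_pageSize * (vm.Page - 1))
    .Take(_pageSize)
    .ToList();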

Try removing all of the ToLower() stuff:
if (!string.IsNullOrWhiteSpace(vm.Search)) {
    query = query.Where(m => m.title.Contains(vm.Search));
}
SQL Server (unlike C#) is not case sensitive by default (though you can configure it to be). Your query is forcing SQL Server to lower-case every record in the table and then do the comparison.
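If you still want an explicit substring match, another option (assuming EF Core, which the asp.net-core tag suggests) is EF.Functions.Like, which translates to a plain LIKE and lets the column's collation handle case sensitivity:

// Requires: using Microsoft.EntityFrameworkCore;
if (!string.IsNullOrWhiteSpace(vm.Search)) {
    query = query.Where(m => EF.Functions.Like(m.title, "%" + vm.Search + "%"));
}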

Related

How to limit SQL query results for any query from code

I need to write a function in my app that takes an SQL query as input and limits the results to 10 rows.
It needs to work with as many edge cases as possible.
What I had in mind is (Pseudo code):
func String limitQuery(String query) {
    if (query.endsWith(';')) {
        // TODO remove trailing semicolon
    }
    query = query + " LIMIT 10;";
    return query;
}
There are many edge cases that this code is missing (like what if the query already has LIMIT at the end).
I am looking for suggestions on how to do this the right way, covering as many edge cases as possible, so that it also works with as many types of relational databases as possible and with very complicated queries.
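Not a complete answer, but one approach that avoids most of the string-editing edge cases is to wrap the incoming query as a derived table and put the row limit on the outer query. A minimal sketch in C# (LIMIT syntax is assumed here; SQL Server would need TOP or OFFSET/FETCH instead, and some constructs such as CTEs may not wrap cleanly):

// Illustrative only: wraps the incoming query instead of parsing and editing it.
static string LimitQuery(string query, int maxRows = 10)
{
    var inner = query.Trim().TrimEnd(';');
    return $"SELECT * FROM ({inner}) AS limited_query LIMIT {maxRows};";
}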

Parameterizing 'limit' and 'order' in sqlite3

I have a sqlite query that I'm looking to parameterize to avoid bad SQL injection things on the internet...
So things like:
Select * From myTable Where id = $id
are fine if I have $id defined somewhere and pass that as a parameter to my db calls.
parameters.$id = 150;
db.all(myQuery, parameters, function (err, rows) {
    results = rows;
});
I wonder if I need to go out of my way to also parameterize things that are sorted and paginated (both are inputs that users can give)...
I tried to do something like:
var sorter = JSON.parse(value);
parameters.$sortMethod = sorter.method;
parameters.$sortOrder = sorter.order;
sort_filter += 'ORDER BY $sortMethod $sortOrder';
No dice though. I'm guessing sqlite3 just doesn't let you parameterize things that are in ORDER BY, LIMIT and OFFSET. I thought there might be something really sneaky folks could do by ending a sqlite statement prematurely in the ORDER BY and then starting a new malicious statement, but maybe SQLite only lets you execute one statement at a time (http://www.qtcentre.org/threads/54748-Execute-multiple-sql-command-in-SQLITE3).
Should I not worry about parameterizing things in ORDER BY, LIMIT and OFFSET? For reference, I'm running this on node.js with this sqlite library: https://github.com/mapbox/node-sqlite3
Thanks much in advance!
SQLite (and any other database) allows you to parameterize expressions, that is, any numbers, strings, blobs, or NULL values that appear in a statement.
This includes the values in the LIMIT/OFFSET clauses.
Anything else cannot be parameterized.
This would be table and column names, operators, or any other keyword (like SELECT, ORDER BY, or ASC).
If you need to change any parts of your SQL statements that are not expressions, you have to create the statement on the fly.
(There is no danger of SQL injection as long as your code constructs the statement by itself, not using any unchecked user data.)
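A minimal sketch of that split, written here in C# with Microsoft.Data.Sqlite since the same rule applies to any SQLite binding (table, column, and input names are illustrative): the LIMIT/OFFSET values are bound as parameters, while the ORDER BY column and direction are chosen from a whitelist rather than taken from user input.

using System.Collections.Generic;
using Microsoft.Data.Sqlite;

// Illustrative inputs; in practice these come from the request.
var userColumn = "name";
var userOrder = "desc";
var pageSize = 30;
var page = 2;

// ORDER BY parts cannot be parameters, so map user input onto a whitelist.
var allowedColumns = new HashSet<string> { "name", "created_at" };
var column = allowedColumns.Contains(userColumn) ? userColumn : "name";
var direction = userOrder == "desc" ? "DESC" : "ASC";

using var conn = new SqliteConnection("Data Source=app.db");
conn.Open();

var cmd = conn.CreateCommand();
// LIMIT and OFFSET take expressions, so their values can be bound as parameters.
cmd.CommandText = $"SELECT * FROM myTable ORDER BY {column} {direction} LIMIT $limit OFFSET $offset";
cmd.Parameters.AddWithValue("$limit", pageSize);
cmd.Parameters.AddWithValue("$offset", pageSize * (page - 1));

using var reader = cmd.ExecuteReader();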

Query very fast but mapping slow with dapper

I'm using Dapper for a new project and love it, but I don't understand why my queries are really slow. The execution time is very fast, almost instant, but the connection stays open much longer while Dapper is mapping the result to my object, I guess.
Here is an example in Glimpse:
This query is just a SELECT of something like 15 fields with a WHERE on the primary key, so it's really fast to execute and it doesn't return much data.
My code to execute it is :
using (var conn = GetConnection())
{
    obj = conn.Get<T>(id);
}
And the object is a very basic POCO with strings and ints.
So why do I waste 220 ms doing this while the query execution itself takes 3 ms? Where is the difference?
Thanks for your help!
UPDATE
There was one field causing problems for me in the SELECT part of my SQL statement. I removed each field one by one and found the one that was causing the problem.
I had to cast one of my fields to nvarchar like this:
CAST(my_field AS nvarchar(max)) as my_field
ORIGINAL ANSWER
It has something to do with the mapping, because if I change it from "Strongly Typed" (which takes forever, almost 1 minute):
var products = connection.Query<Product>(sql).ToList();
to "Anonymous":
var products = connection.Query(sql).ToList();
then it executes really fast (1 second).
I tried executing the SQL statement directly in SQL Server Management Studio as a query and it finishes in less than 1 second.
So my suggestion is that you use the anonymous mapping until the Dapper guys fix this, if they are able to.
I had a similar experience with Dapper as I was trying to project from a View to a POCO object.
The problem for me ended up being that I did not have a column for each property on my object, so Convert.ChangeType() was very slow. I added a column to my View that would always return NULL, and the Query<T>() call sped up dramatically.
In my example, the database had an indexed column of type VARCHAR(10). I was attempting to filter via a Dapper parameter, like so:
DbConnection con = ...
string filterParam = "test";
var results = con.Query("SELECT IndexColumn, Column1, ... FROM MyTable WHERE IndexColumn = #filterParam", new { filterParam });
The issue was dapper (or possibly ADO.Net) converting my filterParam to NVARCHAR(MAX) data type. Sql Server then casts IndexColumn to NVARCHAR, and was doing a full table scan rather than indexed lookup. Code was fixed by casting the parameter before comparison:
var results = con.Query("SELECT IndexColumn, Column1, ... FROM MyTable WHERE IndexColumn = CAST(#filterParam AS VARCHAR(10))", new { filterParam });
In my case the poor performance seems to have been caused by the fact I was using an asterisk rather than a list of fields when doing the SELECT (i.e. SELECT * instead of SELECT Foo, Bar, Baz, ...).

CakePHP query additions in controller

I am migrating raw PHP code to CakePHP and have some problems. Since I have big problems with the query-to-ORM transformation, I temporarily use raw SQL. All is going nicely, but I ran into some ugly code and don't really know how to make it beautiful. I made a DealersController and added a function advanced($condition = null) (it will be called from AJAX with parameters 1-15 and 69). The function looks like:
switch ($condition) {
    case '1':
        $cond_query = ' AND ( (d.email = \'\' OR d.email IS NULL) )';
        break;
    case '2':
        $cond_query = ' AND (d.id IN (SELECT dealer_id FROM dealer_logo))';
        break;
    // There are many cases, some long, some like these two
}
if ($user_group == 'group_1') {
    $query = 'LONG QUERY WITH 6+ TABLES JOINING' . $cond_query;
} elseif ($user_group == 'group_2') {
    $query = 'A LITTLE BIT DIFFERENT LONG QUERY WITH 6+ TABLES JOINING' . $cond_query;
} else {
    $query = 'A LITTLE MORE DIFFERENT LONG QUERY WITH 10+ TABLES JOINING' . $cond_query;
}
// THERE IS $this->Dealer->query($query); and so on
So... as you can see, the code looks ugly. I have two variants:
1) Extract the query additions and make model methods for every condition, separating these conditions into functions. But this is not DRY, because the 3 main big queries are almost the same, and if I need to change something in one, I will need to change 16+ queries.
2) Make small reusable model methods/queries which pull small pieces of data out of the DB, then don't use raw SQL but combine those methods. That would be good, but the performance will be low, and I need it as high as possible.
Please give me advice. Thank you!
If you're concerned about how CakePHP makes a database query for every joined table, you might find that the Linkable behaviour can help you reduce the number of queries (where the joins are simple associations on the one table).
Otherwise, I find that creating simple database querying methods at the Model level to get your smaller pieces of information, and then combining them afterwards, is a good approach. It allows you to clearly outline what your code does (through inline documentation). If you can migrate to using CakePHP's find methods instead of raw queries, you will be using the conditions array syntax. So one way you could approach your problem is to have public functions on your Model classes which append their appropriate conditions to an inputted conditions array. For example:
class SomeModel extends AppModel {
    ...
    public function addEmailCondition(&$conditions) {
        $conditions['OR'] = array(
            'alias.email_address' => null,
            'alias.email_address =' => ''
        );
    }
}
You would call these functions to build up one large conditions array which you can then use to retrieve the data you want from your controller (or from the model if you want to contain it all at the model layer). Note that in the above example, the conditions array is being passed by reference, so it can be edited in place. Also note that any existing 'OR' conditions in the array will be overwritten by this function: your real solution would have to be smarter in terms of merging your new conditions with any existing ones.
Don't worry about 'hypothetical' performance issues - if you've tried the queries and they're too slow, then you can worry about how to increase performance. But for starters, try to write the code as cleanly as possible.
You also might want to consider splitting up that function advanced() call into multiple Controller Actions that are grouped by the similarity of their condition query.
Finally, in case you haven't already checked it out, here's the Book's entry on retrieving data from models. There might be some tricks you hadn't seen before: http://book.cakephp.org/view/1017/Retrieving-Your-Data
If the base part of the query is the same, you could have a function to generate that part of the query, and then use other small functions to append the different where conditions, etc.

How bad is my query?

Ok I need to build a query based on some user input to filter the results.
The query basically goes something like this:
SELECT * FROM my_table ORDER BY ordering_fld;
There are four text boxes in which users can choose to filter the data, meaning I'd have to dynamically build a "WHERE" clause into it for the first filter used and then "AND" clauses for each subsequent filter entered.
Because I'm too lazy to do this, I've just made every filter an "AND" clause and put a "WHERE 1" clause in the query by default.
So now I have:
SELECT * FROM my_table WHERE 1 {AND filters} ORDER BY ordering_fld;
So my question is, have I done something that will adversely affect the performance of my query or buggered anything else up in any way I should be remotely worried about?
MySQL will optimize your 1 away.
I just ran this query on my test database:
EXPLAIN EXTENDED
SELECT *
FROM t_source
WHERE 1 AND id < 100
and it gave me the following description:
select `test`.`t_source`.`id` AS `id`,`test`.`t_source`.`value` AS `value`,`test`.`t_source`.`val` AS `val`,`test`.`t_source`.`nid` AS `nid` from `test`.`t_source` where (`test`.`t_source`.`id` < 100)
As you can see, no 1 at all.
The documentation on WHERE clause optimization in MySQL mentions this:
Constant folding:
(a<b AND b=c) AND a=5
-> b>5 AND b=c AND a=5
Constant condition removal (needed because of constant folding):
(B>=5 AND B=5) OR (B=6 AND 5=5) OR (B=7 AND 5=6)
-> B=5 OR B=6
Note the 5 = 5 and 5 = 6 parts in the example above.
You can EXPLAIN your query:
http://dev.mysql.com/doc/refman/5.0/en/explain.html
and see if it does anything differently, which I doubt. I would use 1=1, just so it is more clear.
You might want to add LIMIT 1000 or something for when no parameters are used; when the table gets large, will you really want to return everything?
WHERE 1 is a constant, deterministic expression which will be "optimized out" by any decent DB engine.
If there is a good way in your chosen language to avoid building SQL yourself, use that instead. I like Python and Django, and the Django ORM makes it very easy to filter results based on user input.
If you are committed to building the SQL yourself, be sure to sanitize user inputs against SQL injection, and try to encapsulate SQL building in a separate module from your filter logic.
Also, query performance should not be your concern until it becomes a problem, which it probably won't until you have thousands or millions of rows. And when it does come time to optimize, adding a few indexes on columns used for WHERE and JOIN goes a long way.
To improve performance, use column indexes on the fields listed in the WHERE clause.
Standard SQL Injection Disclaimers here...
One thing you could do to avoid SQL injection, since you know it's only four parameters, is to use a stored procedure where you pass values for the fields or NULL. I am not sure of the MySQL stored proc syntax, but the query would boil down to
SELECT *
FROM my_table
WHERE Field1 = ISNULL(@Field1, Field1)
AND Field2 = ISNULL(@Field2, Field2)
...
ORDER BY ordering_fld
We did something similar not too long ago and there are a few things that we observed:
Setting up indexes on the columns we were (possibly) filtering on improved performance.
The WHERE 1 part can be left out completely if the filters are not used (not sure if it applies to your case). It doesn't make a difference, but it 'feels' right.
SQL injection shouldn't be forgotten
Also, if you 'only' have 4 filters, you could build up a stored procedure and pass in null values and check for them. (just like n8wrl suggested in the meantime)
That will work - some considerations:
About dynamically built SQL in general, some databases (Oracle at least) will cache execution plans for queries, so if you end up running the same query many times it won't have to completely start over from scratch. If you use dynamically built SQL, you are creating a different query each time so to the database it will look like 100 different queries instead of 100 runs of the same query.
You'd probably just need to measure the performance to find out if it works well enough for you.
Do you need all the columns? Explicitly specifying them is probably better than using * anyways because:
You can visually see what columns are being returned
If you add or remove columns to the table later, they won't change your interface
Not bad, I didn't know this snippet to get rid of the 'is it the first filter?' question.
Though you should be ashamed of your code ( ^^ ), it doesn't do anything to performance, as any DB engine will optimize it away.
The only reason I've used WHERE 1 = 1 is for dynamic SQL; it's a hack to make appending WHERE clauses easier by using AND .... It is not something I would include in my SQL otherwise - it does nothing to affect the query overall because it always evaluates as being true and does not hit the table(s) involved so there aren't any index lookups or table scans based on it.
I can't speak to how MySQL handles optional criteria, but I know that using the following:
WHERE (@param IS NULL OR t.column = @param)
...is the typical way of handling optional parameters. COALESCE and ISNULL are not ideal because the query is still utilizing indexes (or worse, table scans) based on a sentinel value. The example I provided won't hit the table unless a value has been provided.
That said, my experience with Oracle (9i, 10g) has shown that it doesn't handle [ WHERE (@param IS NULL OR t.column = @param) ] very well. I saw a huge performance gain by converting the SQL to be dynamic, and used CONTEXT variables to determine what to add. My impression of SQL Server 2005 is that these are handled better.
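To make the "dynamic SQL with WHERE 1 = 1" idea concrete, here is a minimal sketch in C# against SQL Server (table, column, and variable names are illustrative): clauses are appended only for filters that were actually supplied, and the values are still bound as parameters.

using System.Text;
using Microsoft.Data.SqlClient;

// Illustrative optional filters (null means "not supplied").
string field1 = "some value";
string field2 = null;

var sql = new StringBuilder("SELECT * FROM my_table WHERE 1 = 1");
var cmd = new SqlCommand();

// Append a clause only when the filter is present; bind the value as a parameter.
void AddFilter(string column, string paramName, object value)
{
    if (value == null) return;
    sql.Append($" AND {column} = {paramName}");
    cmd.Parameters.AddWithValue(paramName, value);
}

AddFilter("Field1", "@Field1", field1);
AddFilter("Field2", "@Field2", field2);

sql.Append(" ORDER BY ordering_fld");
cmd.CommandText = sql.ToString();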
I have usually done something like this:
for (int i = 0; i < numConditions; i++) {
    sql += (i == 0 ? " WHERE " : " AND ");
    sql += dbFieldNames[i] + " = " + safeVariableValues[i];
}
Makes the generated query a little cleaner.
One alternative I sometimes use is to build the where clause as an array and then join the pieces together:
my @wherefields;
foreach $c (@conditionfields) {
    push @wherefields, "$c = ?";
}
my $sql = "select * from table";
if (@wherefields) { $sql .= " WHERE " . join(" AND ", @wherefields); }
The above is written in Perl, but most languages have some kind of join function.