Query very fast but mapping slow with dapper - sql

I'm using Dapper for a new project and I love it, but I don't understand why my queries are so slow. The execution time is very fast, almost instant, but the connection stays open much longer, presumably while Dapper maps the result to my object.
Here is an example traced in Glimpse (timing screenshot not included):
The query is just a SELECT of roughly 15 fields with a WHERE on the primary key, so it executes very quickly and doesn't return much data.
My code to execute it is :
using (var conn = GetConnection())
{
    obj = conn.Get<T>(id);
}
The object is a very basic POCO with strings and ints.
So why do I spend 220 ms on this when the query execution itself takes 3 ms? Where does the difference come from?
Thanks for your help!

UPDATE
One field in the SELECT list of my SQL statement was causing the problem. I found it by removing the fields one by one until I isolated the culprit.
I had to cast one of my fields to nvarchar like this:
CAST(my_field AS nvarchar(max)) as my_field
ORIGINAL ANSWER
It has something to do with the mapping, because if I change it from "Strongly Typed" (which takes forever, almost 1 minute):
var products = connection.Query<Product>(sql).ToList();
to "Anonymous":
var products = connection.Query(sql).ToList();
then it executes really fast (1 second).
I also executed the SQL statement directly in SQL Server Management Studio, and it finishes in less than 1 second.
So my suggestion is to use the anonymous mapping until the Dapper team fixes this, if they are able to.

I had a similar experience with Dapper while trying to project from a view onto a POCO.
The problem turned out to be that the view did not have a column for each property on my object, so the Convert.ChangeType() fallback was very slow. I added a column to my view that always returns NULL, and the Query<T>() call sped up dramatically.
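As a minimal sketch of that workaround (assuming Dapper and an open connection conn; MyPoco, MyView, and the LegacyCode property are hypothetical names, not from the original post):
// MyPoco has a LegacyCode property with no matching column in the view, so the
// query returns a NULL column under that alias; every property then maps
// directly and Dapper avoids the slow conversion path.
var items = conn.Query<MyPoco>(
    "SELECT Id, Name, NULL AS LegacyCode FROM MyView").ToList();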

In my example, the database had an indexed column of type VARCHAR(10). I was attempting to filter via a Dapper parameter, like so:
DbConnection con = ...
string filterParam = "test";
var results = con.Query("SELECT IndexColumn, Column1, ... FROM MyTable WHERE IndexColumn = @filterParam", new { filterParam });
The issue was Dapper (or possibly ADO.NET) converting my filterParam to the NVARCHAR(MAX) data type. SQL Server then cast IndexColumn to NVARCHAR and did a full table scan rather than an index seek. The code was fixed by casting the parameter before the comparison:
var results = con.Query("SELECT IndexColumn, Column1, ... FROM MyTable WHERE IndexColumn = CAST(@filterParam AS VARCHAR(10))", new { filterParam });
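Another option, sketched here against the same (assumed) schema, is Dapper's DbString type, which marks the parameter as ANSI so the driver sends VARCHAR in the first place:
// IsAnsi = true makes ADO.NET send the value as VARCHAR instead of NVARCHAR,
// so the index on IndexColumn can be used without an implicit conversion.
var results = con.Query(
    "SELECT IndexColumn, Column1 FROM MyTable WHERE IndexColumn = @filterParam",
    new { filterParam = new DbString { Value = "test", IsAnsi = true, Length = 10 } });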

In my case the poor performance seems to have been caused by the fact that I was using an asterisk rather than a list of fields in the SELECT (i.e. SELECT * instead of SELECT Foo, Bar, Baz, ...).
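A minimal illustration (the Product type and the table/column names here are made up):
// SELECT * pulls back every column, including wide ones the POCO never maps;
// listing only the mapped columns keeps both the payload and the mapping small.
var rows = conn.Query<Product>(
    "SELECT Foo, Bar, Baz FROM Products WHERE Id = @id", new { id = 42 });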

Related

R function that takes in CL argument and queries SQL database

Brand new to SQL and SQLite here. I'm trying to create a function in RStudio that takes an argument from the command line and queries an SQLite database to see whether the record exists in one specific column, displaying a message to the user saying whether the record was found. (I have a table in this database, let's call it my_table, with 2 columns, column_1 and column_2. column_1 has ID numbers and column_2 has the names associated with those ID numbers; there are about 700 rows in total.)
So far I have something that looks like this:
my_function() <- function(CL_record) {
  db <- dbConnect(SQLite(), dbname = "mysql.db")
  dbGetQuery(db, sql("SELECT * FROM my_table WHERE column_2 == @CL_record"))
}
But this is obviously not the right way to go about it and I keep getting errors thrown regarding invalid (NULL) left side of assignment.
Any help here would be much appreciated.
I recommend using parameterized queries, something like:
library(DBI)      # dbConnect(), dbGetQuery(), dbDisconnect()
library(RSQLite)  # the SQLite() driver

my_function <- function(CL_record) {
  db <- dbConnect(SQLite(), dbname = "mysql.db")
  on.exit(dbDisconnect(db), add = TRUE)
  dbGetQuery(db, "SELECT * FROM my_table WHERE column_2 = ?",
             params = list(CL_record))
}
The params= argument does not need to be a list(.); it works just fine here as params=CL_record. But if you have two or more parameters, and especially if they are of different classes (strings vs. numbers), then you really should wrap them in a list.
You'll see many suggestions, and even howtos/tutorials, that recommend using paste or similar to splice parameters into the query. There are at least two reasons not to use paste or sprintf:
Whether malicious or inadvertent (e.g., typos), SQL injection is something that should be actively avoided. Even if your code is not public-facing, accidents happen.
Most (all?) DBMSes do query optimization/compiling. When the server receives a query it has not seen before, it parses and attempts to optimize it; when it sees a repeat query, it can reuse the previous optimization, saving time. When you paste or sprintf an argument into a query, each query is different from the previous one, so this reuse is lost. With bound parameters, the query text itself does not change (only its parameters do), so the server can take advantage of the compiled/optimized plan.

Performance issue when using bind variable for a large list inside the IN clause

I'm using Sybase and had some code that looked like this:
String[] ids = ... an array containing 80-90k strings, which is retrieved from another table and varies.
// wrap every id in single-quotes
for (int i = 0; i < ids.length; i++) {
    ids[i] = "'" + ids[i] + "'";
}
String idsAsString = String.join(",", ids);
String query = String.format("select * from someTable where idName in (%s)", idsAsString);
getNamedParameterJDBCTemplate().query(query, resultSetExtractor -> {
    // do stuff with results
});
I've timed how long it took to get to the inner body of the resultSetExtractor and it never took longer than 4 seconds.
But to secure the code, I tried going the bind variable route. Thus, that code looked like the following:
String[] ids = ... an array containing 80-90k strings, which is retrieved from another table and varies.
String query = "select * from someTable where idName in (:ids)";
Map<String, Object> params = new HashMap<>();
params.put("ids", Arrays.asList(ids));
getNamedParameterJDBCTemplate().query(query, params, resultSetExtractor -> {
    // do stuff with results
});
But doing it this way takes up to 4-5 minutes before finally spewing out the following exception:
21-10-2019 14:04:01 DEBUG DefaultConnectionTester:126 - Testing a Connection in response to an Exception:
com.sybase.jdbc4.jdbc.SybSQLException: The token datastream length was not correct. This is an internal protocol error.
I also have other bits of code where I pass in arrays of sizes 1-10 as bind variables and noticed that those queries went from being instantaneous to taking up to 10 seconds.
I'm surprised the bind-variable way is at all different, let alone that drastically different. Can someone explain what is going on here? Do bind variables do something different under the hood, as opposed to sending a formatted string through JDBC? Is there another way to secure my code without drastically slowing performance?
You should verify what's actually happening at the database end via a showplan/query plan, but an IN clause will usually do, at best, one index search for every value in it; 10 values do ten searches, and 80k values do 80k of them, which is massively slower. Oracle actually prohibits putting more than 1000 values in an IN clause, and while Sybase is not so restrictive, that doesn't mean it's a good idea. You risk stack and other issues in your database by passing massive numbers of values this way; I've seen such a query take out a production database instance with a stack failure.
It's much better to create a temporary table, load the 80k values into it, and do an inner join between the temporary table and the main table on the column you previously searched with the IN clause.
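A rough sketch of that pattern, written in C#/ADO.NET for brevity (the question uses Java/Spring, but the SQL is what matters; the VARCHAR(40) length is an assumption, and the table/column names follow the question):
// 1) Create a temp table keyed on the id column.
using (var create = conn.CreateCommand())
{
    create.CommandText = "CREATE TABLE #ids (idName VARCHAR(40) PRIMARY KEY)";
    create.ExecuteNonQuery();
}
// 2) Load the 80-90k ids with parameterized inserts (batch or bulk-copy APIs
//    are faster still; one-by-one is shown only for clarity).
foreach (var id in ids)
{
    using (var insert = conn.CreateCommand())
    {
        insert.CommandText = "INSERT INTO #ids (idName) VALUES (@id)";
        var p = insert.CreateParameter();
        p.ParameterName = "@id";
        p.Value = id;
        insert.Parameters.Add(p);
        insert.ExecuteNonQuery();
    }
}
// 3) Join instead of IN (...): one join rather than 80k index probes.
using (var select = conn.CreateCommand())
{
    select.CommandText =
        "SELECT t.* FROM someTable t INNER JOIN #ids i ON t.idName = i.idName";
    using (var reader = select.ExecuteReader())
    {
        // do stuff with results
    }
}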

Query fast without search, slow with search, but with search fast in SSMS

I have this function that takes data from the database and also supports search. The problem is that when I search with Entity Framework it's slow, but if I take the same query from the log and run it in SSMS it's fast. I should also say that there are a lot of movies: 388,262. I also tried adding an index on Movie.title, but it didn't help.
Query I use in SSMS:
SELECT *
FROM Movie
WHERE title LIKE '%pirate%'
ORDER BY @@ROWCOUNT
OFFSET 0 ROWS FETCH NEXT 30 ROWS ONLY
Entity Framework code (_movieRepository.GetAll() returns an IQueryable, not all movies):
public IActionResult Index(MovieIndexViewModel vm) {
    IQueryable<Movie> query = _movieRepository.GetAll().AsNoTracking();
    if (!string.IsNullOrWhiteSpace(vm.Search)) {
        query = query.Where(m => m.title.ToLower().Contains(vm.Search.ToLower()));
    }
    vm.TotalItemCount = query.Count();
    vm.Movies = query.Skip(_pageSize * (vm.Page - 1)).Take(_pageSize);
    vm.PageSize = _pageSize;
    return View(vm);
}
Caveat: I don't have much experience with Entity Framework.
However, you might find useful debugging tips in the Entity Framework Performance article from Simple Talk. Looking at what you've posted, you might be able to improve your query performance by:
Choosing only the specific columns you're interested in (it sounds like you only need the 'Title' column).
Paying special attention to your data types. You might want to convert your NVARCHAR variables to VARCHAR(40) (or some appropriate character limit).
Try removing all of the ToLower() stuff:
if (!string.IsNullOrWhiteSpace(vm.Search)) {
    query = query.Where(m => m.title.Contains(vm.Search));
}
SQL Server (unlike C#) is not case-sensitive by default (though you can configure it to be). Your query forces SQL Server to lower-case every record in the table and then do the comparison.
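Putting the projection advice and the ToLower() removal together, a hedged sketch (which columns the page actually needs, and the Id property, are assumptions):
// Let the server's case-insensitive collation handle matching, so it no longer
// lower-cases every row, and project only the columns the list page displays.
var movies = _movieRepository.GetAll().AsNoTracking()
    .Where(m => m.title.Contains(vm.Search))
    .Select(m => new { m.Id, m.title })
    .Skip(_pageSize * (vm.Page - 1))
    .Take(_pageSize)
    .ToList();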

Using SilverStripe's SQLQuery object without selecting * fields

Looking at the documentation, there are fairly good instructions on how to build up a SQL query.
My code looks like this:
$sqlQuery = new SQLQuery();
$sqlQuery->setFrom('PropertyPage')->selectField('PropertyType')->setDistinct(true);
I'm aiming to get the following SQL:
SELECT DISTINCT PropertyType FROM PropertyPage
but instead I'm getting back this:
SELECT DISTINCT *, PropertyType FROM PropertyPage
Even their own example seems to give back a 'SELECT DISTINCT *'. How can I avoid this?
Why do you use SQLQuery directly?
With pure ORM it should go like this:
$result = PropertyPage::get()->setQueriedColumns(array('PropertyType'))->distinct();
which returns a DataList you can loop over.
See setQueriedColumns() and distinct()
What version of the SilverStripe framework are you using? distinct() was added in 3.1.7, IIRC.
Just to add to @wmk's answer as well as directly address how to do it with SQLQuery: your call to selectField() is the reason that query happened. The documentation for selectField() shows that it adds an additional field to the selection; it does not restrict the list to just that field.
The reason their examples (A and B) also have the issue is that the first parameter of the SQLQuery constructor, the select list, has a default value of "*".
To still use your code as a base, replace your use of selectField with setSelect. Your query will then look like this:
SELECT DISTINCT PropertyType FROM PropertyPage
It isn't bad to query the database directly using SQLQuery, especially if you just want the raw data of a particular column, or the result cannot be expressed against a DataObject (e.g. when using column aliases). You also get a small performance improvement, since PHP does not have to instantiate a DataObject.
That said, it can be far more useful to have a DataObject and the various functions it exposes per record.

Linq to SQL nvarchar problem

I have discovered a huge performance problem in LINQ to SQL.
When selecting from a table using strings, the parameters passed to SQL Server are always nvarchar, even when the column is a varchar. This results in table scans instead of seeks, a massive performance issue.
var q = (
    from a in tbl
    where a.index == "TEST"
    select a);
var qa = q.ToArray();
The parameter is passed through as nvarchar, which results in the entire index being converted from varchar to nvarchar before being used.
If the parameter is a varchar it's a very fast seek.
Is there any way to override or change this?
Thanks
Regards
Craig.
Hmmm. This was a known bug in pre-RTM builds of LINQ to SQL, but from what I read online it was fixed for equality comparisons in RTM (although it's still broken for Contains() comparisons).
Regardless, here's a thread on MSDN forums with some workarounds detailed:
http://social.msdn.microsoft.com/Forums/en-US/linqtosql/thread/4276ecd2-31ff-4cd0-82ea-7a22ce25308b
The workaround I like most is this one:
// define a query
IQueryable<Employee> emps = from emp in dc2.Employees
                            where emp.NationalIDNumber == "abc"
                            select emp;
// get hold of the SQL command translation of the query...
System.Data.Common.DbCommand command = dc2.GetCommand(emps);
// change param type from "string" (nvarchar) to "ansistring" (varchar)
command.Parameters[0].DbType = DbType.AnsiString;
command.Connection = dc2.Connection;
// run
IEnumerable<Employee> emps2 = dc2.Translate<Employee>(command.ExecuteReader());
BTW, another case where I saw this happen was a table with an odd distribution of values (e.g. 50% of the table had the same value), meaning that, since the parameter is unknown to SQL Server at plan-compilation time, a table scan was the best plan available. If your distribution is also unusual, the workaround above won't help, since the scan won't be coming from the missing conversion but from the parameterization itself. In that case, the only workaround I know of is to use an OPTIMIZE FOR hint and specify the SQL manually.
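A hedged sketch of that last suggestion, using LINQ to SQL's ExecuteQuery to run hand-written SQL (OPTIMIZE FOR UNKNOWN requires SQL Server 2008+; the table and column names follow the earlier example, and dc2 is the same DataContext):
// Hand-written SQL so a query hint can be attached; {0} is still sent as a
// real parameter, not spliced into the string. OPTIMIZE FOR UNKNOWN makes the
// optimizer plan for an "average" value instead of sniffing this parameter.
IEnumerable<Employee> emps = dc2.ExecuteQuery<Employee>(
    @"SELECT * FROM Employee
      WHERE NationalIDNumber = {0}
      OPTION (OPTIMIZE FOR UNKNOWN)", "abc");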