Why does limit make this query faster - optimization

Another question I asked here sparked another question.
I've got this query
SELECT `Qty`, `Invt`, `ClassNr`, `SubPartCode`, `Description`, `DesignCode`, `Measure`, `Remark`, `PartMnem`
FROM (`loodvrij_receptuur` lr)
JOIN `loodvrij_artikel` la ON `la`.`PartCode` = `lr`.`SubPartCode`
WHERE `lr`.`PartCode` = 'M2430A'
ORDER BY `SubPartCode`, `Qty` desc
The problem I had was that it executed faster in phpmyadmin than CodeIgniter. And I found out it had to do with phpmyadmin auto adding LIMIT 30
But what I wonder is, the query only produces 27 results. How does LIMIT 30 make the query faster? I can understand that a query that returns 30 or more results can be made faster with LIMIT 30, but how does a query with less then 30 results get faster when you add LIMIT 30?

I think you're assuming the limit 30 only comes into play at the end of the query. If I asked you to look at all the people in the phonebook who's last name started with the letter T, and lived in New York, you'd have to pull out 50,000 names and then look at which ones are in New York. If I told you I just needed 30, you'd not have to pull out 50,000 names, you could look at people in New York, and then pick the first 30 who's name starts with T.
The SQL query will be resolved generally in the most efficient manner, so by adding LIMIT 30, things under the hood will change around to make your query execute more quickly.

The question is not how many results the query returns but how many items the query have to looks over. With LIMIT 30, what you ask is to stop running through after getting your first 30 results.
If you dont limit the results, sql will still searching until it reaches the end of the base. So even if you get less than 30 results, it had to work harder.
I don't know if I make myself clear, sorry for my poor english.

Related

How to get a count of the number of times a sql statement has executed in X hours?

I'm using oracle db. I want to be able to count the number of times that a SQL statement was executed in X hours. For instance, how many times has the statement Select * From ExampleTable been executed in the past 5 hours?
I tried looking in V$SQL, V$SQLSTATS, V$SQLAREA, but they only keep a record of a statement's total amount of executions. They don't store what times the individual executions occurred. Is there any view I missed, or something else that does keep track of each individual statement execution + timestamp so that I can query by which have occurred X hours ago? Thanks for the help.
The views in the Active Workload Repository store historical SQL execution information, specifically the view DBA_HIST_SQLSTAT.
The view is not perfect; it contains a summary of the top SQL statements. This is almost perfect information for performance tuning - in practice, sampling will catch any performance problem. But if you're looking for a perfect record of every SQL execution, as far as I know the only way to get that information is through tracing, which is buggy and slow.
Hopefully this query is good enough:
select begin_interval_time, end_interval_time, executions_delta, dba_hist_sqlstat.*
from dba_hist_sqlstat
join dba_hist_snapshot
on dba_hist_sqlstat.snap_id = dba_hist_snapshot.snap_id
and dba_hist_sqlstat.instance_number = dba_hist_snapshot.instance_number
order by begin_interval_time desc, sql_id;
Apologies for putting this in an answer instead of a comment (I don't have the required reputation), but I think you may be out of luck. Here is an AskTOM asking basically the same question: AskTOM. Tom says unless you are using ASH that just isn't something the database is designed to do.

How to do server paging in SQL correct?

My situation: My application is slow. As slow as it gets... mostly because I have the feeling my Server paging for my dataTables / grids are wrongly implemented.
Let's start:
I have a SQL Server 2008 database, one table with all the information, 10 columns in it, at the moment 19K rows
My application is based on a JavaScript and ASP.Net backend code.
My SQL query is:
WITH Ordered AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY Created DESC) AS 'RowNumber'
FROM Meetings
WHERE State IN ('Appointed', 'Accepted')
AND [xxx] LIKE '%1%'
AND [yyy] LIKE '%2%'
)
SELECT *
FROM Ordered
WHERE RowNumber BETWEEN 1 AND 41;
So at the moment this query runs around 27 to 32 seconds, which means over 30 seconds I got a timeout... on 19k rows in 1 year... which means in 1 month latest every query will run against dead...
As far as I am understand the order for this query is the problem: No index done here.
Because the query first sorts, then selects all with a manual row number, then selects only 40... (of course on page 2 of my grid it gets Rows 41 to 81...)
I COULD do an Index on my "Created desc" and the query would be much much faster, BUT every column is sortable for my grid which means "Created desc" could be every other column of my table and of course desc and asc order!
So, how to improve this?
//Edit:
Sorry to forget that:
The inner query (Inner Select) runs 6 seconds, while the total query runs 31 seconds...
Which means the "WITH ORDERES AS" is the problem here!
First things first: you have a performance problem, approach it with a proper methodology and measure appropriately. The inner query (Inner Select) runs 6 seconds, while the total query runs 31 seconds... Which means ... is amateurism. Read How to analyse SQL Server performance for correct ways to measure performance. And before we continue, if you start from 6 seconds you have already lost the game.
Now, on to the question.
WHERE State in('Appointed','Accepted') AND [xxx] LIKE '%1%' AND [yyy] LIKE '%2%'
This expression is basically non-indexable. Even if you add an index on State it will not help because of the low cardinality (few values with many rows each). And like '% ... %' is unindexable because it searches for values in the middle of the text.
You could try to replace like '% ... %' with a full-text search like CONTAINS ... which will be faster, provider you search for specific enough terms. But it does require you to deploy and configure properly the full-text indexes.
As for the paging, I do not favor much the ROWNUMBER approach. Even when a sort column exists, it involves a scan and count to skip the number of rows and gets slower and slower as you go to higher pages. I much more recommend the key based approach:
SELECT TOP (page size) ...
WHERE keys > <last row>
ORDER BY...
but this approach is more difficult to implement as it requires keeping track of keys rather than the page number.
But expect no miracles. You are asking a relational OLTP system to do the work of an ElasticSearch/Solr. It will never work as you expect. Use a tool appropriate for the job (a Search engine). Also read Dynamic Search Conditions in T‑SQL for a more thorough discussion, but again, expect no miracles.

SQL Wildcard Search - Efficiency?

There has been a debate at work recently at the most efficient way to search a MS SQL database using LIKE and wildcards. We are comparing using %abc%, %abc, and abc%. One person has said that you should always have the wildcard at the end of the term (abc%). So, according to them, if we wanted to find something that ended in "abc" it'd be most efficient to use `reverse(column) LIKE reverse('%abc').
I set up a test using SQL Server 2008 (R2) to compare each of the following statements:
select * from CLMASTER where ADDRESS like '%STREET'
select * from CLMASTER where ADDRESS like '%STREET%'
select * from CLMASTER where ADDRESS like reverse('TEERTS%')
select * from CLMASTER where reverse(ADDRESS) like reverse('%STREET')
CLMASTER holds about 500,000 records, there are about 7,400 addresses that end "Street", and about 8,500 addresses that have "Street" in it, but not necessarily at the end. Each test run took 2 seconds and they all returned the same amount of rows except for %STREET%, which found an extra 900 or so results because it picked up addresses that had an apartment number on the end.
Since the SQL Server test didn't show any difference in execution time I moved into PHP where I used the following code, switching in each statement, to run multiple tests quickly:
<?php
require_once("config.php");
$connection = odbc_connect( $connection_string, $U, $P );
for ($i = 0; $i < 500; $i++) {
$m_time = explode(" ",microtime());
$m_time = $m_time[0] + $m_time[1];
$starttime = $m_time;
$Message=odbc_exec($connection,"select * from CLMASTER where ADDRESS like '%STREET%'");
$Message=odbc_result($Message,1);
$m_time = explode(" ",microtime());
$m_time = $m_time[0] + $m_time[1];
$endtime = $m_time;
$totaltime[] = ($endtime - $starttime);
}
odbc_close($connection);
echo "<b>Test took and average of:</b> ".round(array_sum($totaltime)/count($totaltime),8)." seconds per run.<br>";
echo "<b>Test took a total of:</b> ".round(array_sum($totaltime),8)." seconds to run.<br>";
?>
The results of this test was about as ambiguous as the results when testing in SQL Server.
%STREET completed in 166.5823 seconds (.3331 average per query), and averaged 500 results found in .0228.
%STREET% completed in 149.4500 seconds (.2989 average per query), and averaged 500 results found in .0177. (Faster time per result because it finds more results than the others, in similar time.)
reverse(ADDRESS) like reverse('%STREET') completed in 134.0115 seconds (.2680 average per query), and averaged 500 results found in .0183 seconds.
reverse('TREETS%') completed in 167.6960 seconds (.3354 average per query), and averaged 500 results found in .0229.
We expected this test to show that %STREET% would be the slowest overall, while it was actually the fastest to run, and had the best average time to return 500 results. While the suggested reverse('%STREET') was the fastest to run overall, but was a little slower in time to return 500 results.
Extra fun: A coworker ran profiler on the server while we were running the tests and found that the use of the double wildcard produced a significant increase CPU usage, while the other tests were within 1-2% of each other.
Are there any SQL Efficiency experts out that that can explain why having the wildcard at the end of the search string would be better practice than the beginning, and perhaps why searching with wildcards at the beginning and end of the string was faster than having the wildcard just at the beginning?
Having the wildcard at the end of the string, like 'abc%', would help if that column were indexed, as it would be able to seek directly to the records which start with 'abc' and ignore everything else. Having the wild card at the beginning means it has to look at every row, regardless of indexing.
Good article here with more explanation.
Only wildcards at the end of a Like character string will use an index.
You should look at using FTS Contains if you want to improve speed of wildcards at the front and back of a character string. Also see this related SO post regarding Contains versus Like.
From Microsoft it is more efficient to leave the closing wildcard because it can, if one exists, use an index rather than performing a scan. Think about how the search might work, if you have no idea what's before it then you have to scan everything, but if you are only searching the tail end then you can order the rows and even possible (depending on what you're looking for) do a quasi-binary search.
Some operators in joins or predicates tend to produce resource-intensive operations. The LIKE operator with a value enclosed in wildcards ("%a value%") almost always causes a table scan. This type of table scan is a very expensive operation because of the preceding wildcard. LIKE operators with only the closing wildcard can use an index because the index is part of a B+ tree, and the index is traversed by matching the string value from left to right.
So, the above quote also explains why there was a huge processor spike when running two wildcards. It completed faster only by happenstance because there is enough horsepower to cover up the inefficiency. When trying to determine performance on a query you want to look at the execution of the query rather than the resources of the server because those can be misleading. If I have a server with enough horsepower to serve a weather vain and I'm running queries on tables as small as 500,000 rows the results are going to be misleading.
Less the fact that Microsoft quoted your answer, when doing performance analysis, consider taking the dive into learning how to read the execution plan. It's an investment and very dry, but it will be worth it in the long run.
In short though, whoever was indicating that the trailing wildcard only is more efficient, is correct.
In MS SQL, if you want to have the names those are ending with 'ABC', then u can have the query like below(suppose table name is student)
select * from student where student_name like'%[ABC]'
so it will give those names which ends with 'A' ,'B','C'.
2) if u want to have names which are starting with 'ABC' means-
select * from student where student_name like '[ABC]%'
3) if u want to have names which in middle have 'ABC'
select * from student where student_name like '%[ABC]%'

Need help in optimizing the query in ruby on rails

I have some 30,000 records in my Raw_deals table and some raw_cities table has some 30 records and each deal is linked with some 5-8 cities.
Now i want to fetch any random deal within some specific cities.
List of those cities can be fetched like this:
#raw_cities = RawCity.where('disabled = ?', 0).map(&:id)
Now i need a deal. I wrote a query but its taking too much time.
#raw_deal = RawDeal.order("RAND()").find(:first,:joins=>[:raw_cities], :conditions=>["raw_cities.id IN (?)",#raw_cities])
The order("RAND()") is probably what's slowing your query down, and since you're only looking for one single deal, you can use a combination of limit and offset to simulate a random order.
Try something like this:
#raw_deal = RawDeal.offset(rand(RawDeal.count)).
joins(:raw_cities).
where(raw_cities: #raw_cities).
first

SQL or Ruby on Rails question: Limit results from a search to the greater of two conditions

Is it possible to impose conditions on a SQL query to the greater of two conditions?
For example, imagine a site that has Posts. Is there a way to request posts such that the result will be guaranteed to contain both of at least all posts made in the past 24 hours, and at least 10 posts, but not unnecessarily exceeding either limit?
I.e., if it already has all posts made in the past 24 hours, but not 10 posts, it will continue adding posts until it reaches 10, or if it already has 10 posts, but not all posts made in the past 24 hours, it will continue until it covers the past 24 hours, but it will not ever have both posts from 25 hours ago and more than 10 posts.
Edit: In response to request to post my model. I'm not exactly sure how to post the model, since my models are created by Rails. However, in short, a Post has a created_at datetime row, and there is an index on it.
I think LIMIT requires a constant, but if you have an index on the created field, pulling a few records from one end will be pretty efficient, right? What about something like (in SQL-ish pseudocode):
select * from posts
where created > min(one_day_ago,
(select created from posts
order by created desc
limit 1 offset 10));
Probably the clearest solution is to make a select that returns the most recent 10, another select that returns everything for the last day and use the union of these two selects.
One of these unioned selects is always going to completely contain the other, so it may not be the most efficient, but I think it's what I would do.