I have about 30,000 records in my Raw_deals table, my raw_cities table has about 30 records, and each deal is linked to 5-8 cities.
Now I want to fetch a random deal within some specific cities.
The list of those cities can be fetched like this:
@raw_cities = RawCity.where('disabled = ?', 0).map(&:id)
Now I need a deal. I wrote a query, but it's taking too much time.
@raw_deal = RawDeal.order("RAND()").find(:first, :joins => [:raw_cities], :conditions => ["raw_cities.id IN (?)", @raw_cities])
The order("RAND()") is probably what's slowing your query down, and since you're only looking for one single deal, you can use a combination of limit and offset to simulate a random order.
Try something like this:
@raw_deal = RawDeal.offset(rand(RawDeal.count)).
              joins(:raw_cities).
              where(raw_cities: { id: @raw_cities }).
              first
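One thing to watch for: rand(RawDeal.count) is based on the whole table, so once the city filter removes rows the offset can point past the end of the result and return nil. A rough sketch of a safer variant (untested, Rails 4+ syntax; with only ~30,000 deals, pulling just the matching ids is cheap):

deal_ids = RawDeal.joins(:raw_cities)
                  .where(raw_cities: { id: @raw_cities })
                  .distinct
                  .pluck(:id)
@raw_deal = RawDeal.find(deal_ids.sample) if deal_ids.any?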
I'm using Active Record to send a few queries to a postgres database.
I have a User model, a Business model and a joiner named UserBiz.
The queries I mentioned go through the entire UserBiz collection and then keep only the businesses that match user-provided categories and searches.
if !params[:category].blank?
  dataset = UserBiz.all.includes(:business, :user).select { |ub| ub.business.category.downcase == params[:category].downcase }
else
  dataset = UserBiz.all.includes(:business, :user)
end

if !params[:search].blank?
  dataset = dataset.select { |ub| (ub.business.summary.downcase.include? params[:search].downcase) || (ub.business.name.downcase.include? params[:search].downcase) }
end
These "work" but the problem is when I threw a quarter million UserBizs into my database to see what happens, one search or category change takes 15 seconds. How can I make these queries faster?
The select in your code is Ruby's Enumerable#select: it loads everything into memory and filters in Ruby, which is very bad for performance when dealing with a lot of records.
You have to do the filtering in the database, with something like this:
UserBiz
.includes(:business, :user)
.where("LOWER(businesses.category) = LOWER(?)", params[:category])
It's slow because you're selecting all the data from UserBiz.
Try pagination: pagy, will_paginate, etc.
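For example, with the will_paginate gem (a sketch; the params[:page] key and the 20-per-page value are just placeholders):

dataset = UserBiz.includes(:business, :user)
                 .paginate(page: params[:page], per_page: 20)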
So, here is my problem.
I've got a database that imports data from a huge CSV file. It contains around 32,000 entries but has around 200 columns, so a standard select is slow.
When I do:
MyModel.all or MyModel.eager_load.all, it takes anywhere from 45 seconds up to a minute to load all the entries.
The idea was to use limit to pull maybe 1000 entries like:
my_model = MyModel.limit(1000)
This way I can get the last id like:
last_id = my_model.last.id
To load the next 1000 records I literally use:
my_model = MyModel.where('id > ?', last_id).limit(1000)
# then I set last_id again and keep repeating the process
last_id = my_model.last.id
But this seems like overkill and doesn't seem right.
Is there any better or easier way to do this?
Thank you in advance.
Ruby on Rails has the find_each method that does exactly what you're trying to do manually: it loads records from the database in batches of 1000.
MyModel.find_each do |instance|
  # do something with this instance, for example, write it into the CSV file
end
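If 1000 rows at a time is too heavy for 200-column records, the batch size is configurable (the 500 below is just an example value):

MyModel.find_each(batch_size: 500) do |instance|
  # process each record; rows are fetched 500 at a time
end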
Rails has an offset method that you can combine with limit.
my_model = MyModel.limit(1000).offset(1000)
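In a loop, that might look roughly like this (a sketch, not tested against your schema; note that limit/offset paging needs a stable order so rows aren't skipped or repeated):

page = 0
loop do
  records = MyModel.order(:id).limit(1000).offset(page * 1000)
  break if records.empty?
  # process records here
  page += 1
end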
You can see the API documentation here: https://apidock.com/rails/v6.0.0/ActiveRecord/QueryMethods/offset
Hope that helps :)
I have a listing of ~10,000 apps and I'd like to order them by certain columns, but I want to give certain columns more "weight" than others.
For instance, each app has overall_ratings and current_ratings. If the app has a lot of overall_ratings, that's worth 1.5, but the number of current_ratings would be worth, say 2, since the number of current_ratings shows the app is active and currently popular.
Right now there are probably 4-6 of these variables I want to take into account.
So, how can I pull that off? In the query itself? After the fact using just Ruby (remember, there are over 10,000 rows that would need to be processed here)? Something else?
This is a Rails 3.2 app.
Sorting 10,000 objects in plain Ruby doesn't seem like a good idea, especially if you just want the first 10 or so.
You can try to put your math formula in the query (using the order method from Active Record).
However, my favourite approach would be to create a float attribute to store the score and update that value with a before_save method.
I would read about dirty attributes so you only perform this scoring when one of your criteria is updated.
You may also create a rake task that re-scores your current objects.
This way you would keep the scoring functionality in Ruby (you could test it easily) and you could add an index to your float attribute so database queries have better performance.
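A rough sketch of that approach (the score column, the overall_ratings/current_ratings counter columns, and the weights below are all illustrative; Rails 3.2 syntax as per the question):

class App < ActiveRecord::Base
  before_save :compute_score, if: :score_inputs_changed?

  # weighted sum of the rating counters; tune the weights to taste
  def compute_score
    self.score = 1.5 * overall_ratings.to_i + 2.0 * current_ratings.to_i
  end

  # dirty tracking: only re-score when an input column changed
  def score_inputs_changed?
    overall_ratings_changed? || current_ratings_changed?
  end
end

# With an index on apps.score, the listing becomes a cheap query:
App.order("score DESC").limit(10)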
One attempt would be to let the DB do this work for you, with a query something like this (I can't really test it without the DB schema):
ActiveRecord::Base.connection.execute("SELECT *,
  (1.5 * (SELECT COUNT(*) FROM overall_ratings
          WHERE app_id = a.id) +
   2.0 * (SELECT COUNT(*) FROM current_ratings
          WHERE app_id = a.id))
  AS rating FROM apps a
  HAVING rating > 3 ORDER BY rating DESC")
The idea is to count the overall and current ratings for a specific app id with the subqueries, weight each count as desired, and sum them.
There has been a debate at work recently about the most efficient way to search an MS SQL database using LIKE and wildcards. We are comparing %abc%, %abc, and abc%. One person has said that you should always have the wildcard at the end of the term (abc%), so, according to them, if we wanted to find something that ended in "abc" it would be most efficient to use reverse(column) LIKE reverse('%abc').
I set up a test using SQL Server 2008 (R2) to compare each of the following statements:
select * from CLMASTER where ADDRESS like '%STREET'
select * from CLMASTER where ADDRESS like '%STREET%'
select * from CLMASTER where ADDRESS like reverse('TEERTS%')
select * from CLMASTER where reverse(ADDRESS) like reverse('%STREET')
CLMASTER holds about 500,000 records; there are about 7,400 addresses that end in "Street", and about 8,500 addresses that have "Street" in them but not necessarily at the end. Each test run took 2 seconds, and they all returned the same number of rows except for %STREET%, which found an extra 900 or so results because it picked up addresses that had an apartment number on the end.
Since the SQL Server test didn't show any difference in execution time I moved into PHP where I used the following code, switching in each statement, to run multiple tests quickly:
<?php
require_once("config.php");
$connection = odbc_connect($connection_string, $U, $P);

for ($i = 0; $i < 500; $i++) {
    $m_time = explode(" ", microtime());
    $m_time = $m_time[0] + $m_time[1];
    $starttime = $m_time;

    $Message = odbc_exec($connection, "select * from CLMASTER where ADDRESS like '%STREET%'");
    $Message = odbc_result($Message, 1);

    $m_time = explode(" ", microtime());
    $m_time = $m_time[0] + $m_time[1];
    $endtime = $m_time;

    $totaltime[] = ($endtime - $starttime);
}

odbc_close($connection);

echo "<b>Test took an average of:</b> ".round(array_sum($totaltime)/count($totaltime),8)." seconds per run.<br>";
echo "<b>Test took a total of:</b> ".round(array_sum($totaltime),8)." seconds to run.<br>";
?>
The results of this test were about as ambiguous as the results when testing in SQL Server.
%STREET completed in 166.5823 seconds (.3331 average per query), and averaged 500 results found in .0228 seconds.
%STREET% completed in 149.4500 seconds (.2989 average per query), and averaged 500 results found in .0177 seconds. (Faster time per result because it finds more results than the others in similar time.)
reverse(ADDRESS) like reverse('%STREET') completed in 134.0115 seconds (.2680 average per query), and averaged 500 results found in .0183 seconds.
reverse('TEERTS%') completed in 167.6960 seconds (.3354 average per query), and averaged 500 results found in .0229 seconds.
We expected this test to show that %STREET% would be the slowest overall, but it actually had the best average time to return 500 results, while the suggested reverse(ADDRESS) like reverse('%STREET') was the fastest to run overall but a little slower in time per result returned.
Extra fun: a coworker ran Profiler on the server while we were running the tests and found that the double wildcard produced a significant increase in CPU usage, while the other tests were within 1-2% of each other.
Are there any SQL efficiency experts out there who can explain why having the wildcard at the end of the search string would be better practice than at the beginning, and perhaps why searching with wildcards at both the beginning and end of the string was faster than having the wildcard just at the beginning?
Having the wildcard at the end of the string, like 'abc%', would help if that column were indexed, as it would be able to seek directly to the records which start with 'abc' and ignore everything else. Having the wild card at the beginning means it has to look at every row, regardless of indexing.
Good article here with more explanation.
Only wildcards at the end of a Like character string will use an index.
You should look at using FTS Contains if you want to improve speed of wildcards at the front and back of a character string. Also see this related SO post regarding Contains versus Like.
According to Microsoft, it is more efficient to have only the closing wildcard because the query can then, if an index exists, use it rather than performing a scan. Think about how the search might work: if you have no idea what comes before the search string, you have to scan everything, but if only the tail end is unknown, you can order the rows and possibly (depending on what you're looking for) do a quasi-binary search.
Some operators in joins or predicates tend to produce resource-intensive operations. The LIKE operator with a value enclosed in wildcards ("%a value%") almost always causes a table scan. This type of table scan is a very expensive operation because of the preceding wildcard. LIKE operators with only the closing wildcard can use an index because the index is part of a B+ tree, and the index is traversed by matching the string value from left to right.
So, the above quote also explains why there was a huge processor spike when running two wildcards. It completed faster only by happenstance, because there is enough horsepower to cover up the inefficiency. When trying to determine performance on a query, you want to look at the execution of the query rather than the resources of the server, because those can be misleading. If I have a server with enough horsepower to serve a weather vane and I'm running queries on tables as small as 500,000 rows, the results are going to be misleading.
Setting aside the fact that Microsoft's documentation backs up your answer: when doing performance analysis, consider taking the dive into learning how to read the execution plan. It's an investment and very dry, but it will be worth it in the long run.
In short though, whoever was indicating that the trailing wildcard only is more efficient, is correct.
In MS SQL, [ABC] inside a LIKE pattern is a character class that matches a single 'A', 'B', or 'C'. So (supposing the table name is student):
1) If you want the names that end with 'A', 'B', or 'C':
select * from student where student_name like '%[ABC]'
2) If you want the names that start with 'A', 'B', or 'C':
select * from student where student_name like '[ABC]%'
3) If you want the names that have an 'A', 'B', or 'C' anywhere in them:
select * from student where student_name like '%[ABC]%'
Another question I asked here sparked another question.
I've got this query
SELECT `Qty`, `Invt`, `ClassNr`, `SubPartCode`, `Description`, `DesignCode`, `Measure`, `Remark`, `PartMnem`
FROM (`loodvrij_receptuur` lr)
JOIN `loodvrij_artikel` la ON `la`.`PartCode` = `lr`.`SubPartCode`
WHERE `lr`.`PartCode` = 'M2430A'
ORDER BY `SubPartCode`, `Qty` desc
The problem I had was that it executed faster in phpMyAdmin than in CodeIgniter, and I found out it had to do with phpMyAdmin automatically adding LIMIT 30.
But what I wonder is: the query only produces 27 results. How does LIMIT 30 make the query faster? I can understand that a query that returns 30 or more results can be made faster with LIMIT 30, but how does a query with fewer than 30 results get faster when you add LIMIT 30?
I think you're assuming the LIMIT 30 only comes into play at the end of the query. If I asked you to look at all the people in the phone book whose last name starts with the letter T and who live in New York, you'd have to pull out 50,000 names and then look at which ones are in New York. If I told you I just needed 30, you wouldn't have to pull out all 50,000 names; you could look at people in New York and pick the first 30 whose name starts with T.
The SQL query will be resolved generally in the most efficient manner, so by adding LIMIT 30, things under the hood will change around to make your query execute more quickly.
The question is not how many results the query returns but how many rows the query has to look over. With LIMIT 30, you're asking it to stop running through the table after it finds its first 30 results.
If you don't limit the results, SQL will keep searching until it reaches the end of the table. So even if you get fewer than 30 results, it had to work harder.
I don't know if I'm making myself clear; sorry for my poor English.