In which order does Rails run the DB queries - sql

In "Select n objects randomly with condition in Rails", Anurag kindly proposed this answer to randomly select n posts with votes >= x:
Post.all(:conditions => ["votes >= ?", x], :order => "rand()", :limit => n)
My concern is that the number of posts with more than x votes is very large.
In what order does the DB apply these criteria to the query?
Does it
(a) select n posts with votes > x and then randomise them? or
(b) select all posts with votes > x, randomise them, and then take the first n?
(c) something else?

The recommendation to check the development log is very useful.
However, in this case, the randomisation is happening on the MySQL end, not inside Active Record. In order to see how the query is being run inside MySQL, you can copy the query from the log and paste it into your MySQL tool of choice (console, GUI, whatever) and add "EXPLAIN" to the front of it.
You should end up with something like:
EXPLAIN SELECT * FROM posts WHERE votes >= 'x' ORDER BY rand() LIMIT n
When I try a similar query in MySQL, I am told:
Select Type: SIMPLE
Using where; Using temporary; Using filesort
Then you should do a search for some of the excellent advice on SO on how to optimise MySQL queries. If there is an issue, adding an index on the votes column may improve the situation.
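For reference, a minimal sketch of adding such an index (MySQL syntax; the posts table is taken from the query above, the index name is made up):

ALTER TABLE posts ADD INDEX idx_posts_votes (votes);

Note that ORDER BY rand() itself can never use an index, so this only speeds up the votes >= x filter; the sort cost still grows with the number of matching rows.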

As Toby already pointed out, this is purely up to the SQL server, since everything is being done in the query itself.
However, I am afraid you can't get truly randomised output unless the database fetches the whole result set first and then randomises it. You should check the EXPLAIN output anyway, though.

Look in development.log for the generated query; it should give you a clue.

Related

Querying time higher with 'Where' than without it

I have what I think is a strange issue. Normally, I would expect a query to take less time if I add a restriction (so that fewer rows are processed). But for some reason that is not the case here. Maybe I'm doing something wrong, but I get no error; the query just seems to run 'till infinity'.
This is the query:
SELECT
    A.ENTITYID AS ORG_ID,
    A.ID_VALUE AS LEI,
    A.MODIFIED_BY,
    A.AUDITDATETIME AS LAST_DATE_MOD
FROM (
    SELECT
        CASE WHEN IFE.NEWVALUE IS NOT NULL
             THEN EXTRACTVALUE(xmltype(IFE.NEWVALUE), '/DocumentElement/ORG_IDENTIFIERS/ID_TYPE')
             ELSE NULL
        END AS ID_TYPE,
        CASE WHEN IFE.NEWVALUE IS NOT NULL
             THEN EXTRACTVALUE(xmltype(IFE.NEWVALUE), '/DocumentElement/ORG_IDENTIFIERS/ID_VALUE')
             ELSE NULL
        END AS ID_VALUE,
        (SELECT u.username FROM admin.users u WHERE u.userid = ife.analystuserid) AS Modified_by,
        ife.*
    FROM ife.audittrail ife
    WHERE
        --IFE.AUDITDATETIME >= '01-JUN-2016' AND
        attributeid = 499
        AND ROWNUM <= 10000
        AND (CASE WHEN IFE.NEWVALUE IS NOT NULL THEN EXTRACTVALUE(xmltype(IFE.NEWVALUE), '/DocumentElement/ORG_IDENTIFIERS/ID_TYPE') ELSE NULL END) = '38'
) A
--WHERE A.AUDITDATETIME >= '01-JUN-2016';
I tried the query with each of the two commented clauses enabled (one at a time, of course).
The same thing happens with both: the query runs for so long that I have to abort it.
Do you know why this could be happening? How could I apply the restriction, perhaps in a different way?
The values of the AUDITDATETIME field look like '06-MAY-2017', in exactly that format.
Thank you very much in advance.
I think you may misunderstand how databases work.
Firstly, read up on EXPLAIN - you can find out exactly what is taking time, and why, by learning to read its output.
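In Oracle (which this appears to be, given EXTRACTVALUE, xmltype, and ROWNUM), a minimal sketch of getting a plan for a simplified version of the query might be:

EXPLAIN PLAN FOR
SELECT * FROM ife.audittrail WHERE attributeid = 499;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);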
Secondly - the performance characteristics of any given query are determined by a whole range of things, but usually the biggest effort goes not into processing rows, but into finding them.
Without an index, the database has to look at every row in the table and compare it to your WHERE clause. It's the equivalent of searching the phone book for a phone number rather than a name (the phone book is indexed on "last name").
You can improve this by creating indexes - for instance, on columns "AUDITDATETIME" and "attributeid".
Unlike the phone book, a database server can support multiple indexes - and if those indexes match your where clause, your query will be (much) faster.
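As a sketch (Oracle syntax; the index name is made up, the table and columns are taken from the query above):

CREATE INDEX ix_audittrail_attr_date
    ON ife.audittrail (attributeid, auditdatetime);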
Finally, using an XML string extraction for a comparison in the where clause is likely to be extremely slow unless you've got an index on that XML data.
This is the equivalent of searching the phone book and translating the street address from one language to another - not only do you have to inspect every address, you have to execute an expensive translation step for each item.
You probably need index(es)... We can all make guesses as to which indexes you already have and which you need to add, but most DBMSs have built-in query optimizers.
If you are using MS SQL Server, you can execute the query with its execution plan included, which will tell you what index you need to add to optimize this particular query. It will even let you copy/paste the command to create it.

Pagination with SQLite using LIMIT

I'm writing my own SQLiteBrowser, and I have one final problem, which apparently is discussed quite often on the web but doesn't seem to have a good general solution.
Currently I store the SQL the user entered. Whenever I need to fetch rows, I execute the SQL, appending "LIMIT n, m" to the end of it.
For normal SQL statements, which I mostly use, this seems good enough. However, if I want to use LIMIT myself in the query, this will obviously cause an error, because the resulting SQL can look like this:
select * from table limit 30 limit 1,100
which is obviously wrong. Is there some better way to do this?
My idea was that I could scan the SQL and check whether a LIMIT clause is already present, and then ignore it. Of course, it's not as simple as that, because with an SQL statement like this:
select * from a where a.b = ( select x from z limit 1)
it obviously should still apply my limit, so I could scan the string from the end and look for a LIMIT there. My question is how feasible this is. As I don't know how the SQL parser works, I'm not sure whether LIMIT has to be at the end of the SQL, or whether other clauses can follow it.
I tested it with ORDER BY and GROUP BY, and I get SQL errors if LIMIT is not at the end, so my assumption seems to be true.
I have now found a much better solution, which is quite simple and doesn't require me to parse the SQL.
The user can enter an arbitrary SQL statement. The result is loaded into a table. Since we don't want to load the whole result at once, as it can contain millions of records, only N records are retrieved. When the user scrolls to the bottom of the table, the next N items are fetched and loaded into the table.
The solution is to wrap the SQL in an outer SQL statement with my page-size limits:
select * from (arbitrary UserSQL) limit PageSize offset CurrentOffset
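For example, with a user query that contains its own LIMIT (the posts table and votes column here are made up; only the wrapping pattern matters), a page of 50 rows starting at offset 100 becomes:

SELECT * FROM (
    SELECT * FROM posts WHERE votes >= 10 LIMIT 1000  -- the user's own LIMIT stays inside the subquery
) LIMIT 50 OFFSET 100;  -- the pager's window is applied outside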
I tested it with the SQL statements I regularly use, and this seems to work quite nicely; it is also fast enough for my purposes.
However, I don't know whether SQLite has a mechanism to fetch the next rows faster, or whether the SQL has to be rerun every time. In the latter case this might not be a good solution for really complex queries with a long response time.

Active Record: delete_all with limit

Trying to get a definitive answer on whether it's possible to limit a delete_all to X number of records.
I'm trying the following:
Model.where(:account_id => account).order(:id).limit(1000).delete_all
but it doesn't seem to respect the limit; instead it just deletes all Model records where :account_id => account.
I would expect it to generate the following:
delete from model where account_id = ? order by id limit 1000
This seems to work fine when using destroy_all but I want to delete in bulk.
This one also worked pretty well for me (and my needs):
Model.connection.exec_delete('DELETE FROM models ORDER BY id LIMIT 10000', 'DELETE', [])
I know it might seem a bit cumbersome, but it'll return the number of affected rows AND also log the query through the Rails logger. ;)
Try:
Model.delete(Model.where(:account_id => account).order(:id).limit(1000).pluck(:id))
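That issues two statements: a SELECT to collect the ids, then a DELETE against that list. Roughly (assuming a models table; the id list is illustrative):

SELECT id FROM models WHERE account_id = ? ORDER BY id LIMIT 1000;
DELETE FROM models WHERE id IN (1, 2, 3 /* ... */);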
This was the best solution I used to delete millions of rows:
sql = %{ DELETE FROM model WHERE where_clause LIMIT 1000 }  # where_clause is a placeholder for the real condition
results = 1
while results > 0 do
  # exec_delete returns the number of rows deleted, so the loop stops once nothing is left
  results = ActiveRecord::Base.connection.exec_delete(sql)
end
This performed much faster than deleting in batches where the IDs were being used in the SQL.
ActiveRecord::Base.connection.send(:delete_sql, 'delete from table where account_id = <account_id> limit 1000')
You have to use send because 'delete_sql' is protected, but this works.
I found that removing the 'order by' significantly sped it up too.
I do think it's weird that using .limit works with destroy_all but not delete_all.
Or Model.where(:account_id => account).order(:id).limit(1000).map(&:delete), although it is not the best approach if you have thousands of records to delete/destroy.
Model.delete_all() seems to be the best option, as it delegates to SQL the task of selecting the records and mass-deleting them.

activerecord equivalent to SQL 'minus'

What's the Rails way to subtract one query result from another? A database-specific SQL example would be:
SELECT Date FROM Store_Information
MINUS
SELECT Date FROM Internet_Sales
I'll throw this into the mix - not a solution, but it might help with progress:
Best I can think of is to use NOT IN:
StoreInformation.where('date NOT IN (?)', InternetSale.all)
That's Rails 3 - Rails 2 would be:
StoreInformation.all(:conditions => ['date NOT IN(?)', InternetSale.all])
But both of these will first select everything from internet_sales; what you really want is a nested query that does the whole thing in the database engine. For this, I think you'll have to drop down to find_by_sql and write the nested query yourself.
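The nested query would look something like this (table and column names assumed from the Rails models above):

SELECT date FROM store_informations
WHERE date NOT IN (SELECT date FROM internet_sales);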
Obviously this assumes you're using MySQL! HTH.
Late answer, but I think you meant:
activities = Activity.all
searches = activities.where(key: 'search')
(activities - searches).each do |anything_but_search|
  p anything_but_search
end
You can subtract two ActiveRecord::Relation objects and get the MINUS result, just like in SQL.
I am using Rails 4.2, so anything beyond that version should do the trick.

MS Access Limit

What's the equivalent of MySQL's LIMIT in MS Access? TOP is not sufficient, since I'm going to use it for pagination.
Thanks
There isn't one. Your best bet is to add an ID column as a primary key (if you don't already have one) and chunk output by looping through:
SELECT * FROM table
WHERE id >= offset AND id <= offset + chunk_size - 1
until you get all the rows.
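For instance, with a chunk size of 25, the third chunk would be fetched with the values filled in by hand:

SELECT * FROM table
WHERE id >= 51 AND id <= 75;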
Curiously, there are a few references in Microsoft documentation to a LIMIT TO nn ROWS syntax for the Access Database Engine:
ACC2002: Setting ANSI 92 Compatibility in a Database Does Not Allow DISTINCT Keyword in Aggregate Functions
About ANSI SQL query mode (MDB)
However, actual testing seems to confirm that this syntax has never existed in a release version of the Access Database Engine. Perhaps this is one of those features that the SQL Server team wanted to put into Jet 4.0 but were ordered to roll back by the Windows team? Whatever the case, it seems we must simply put it down to a documentation error that Microsoft won't take the time to correct :(
If you need to do pagination on the server** side then I suggest you consider a more capable, modern SQL product with better documentation ;)
** conceptually, that is: the Access Database Engine is not a server DBMS.
Since it doesn't appear that you have any kind of sequential unique key number for these rows, you'll need to create a ranking column: How to Rank Records Within a Query
You need to determine how many rows at a time you will return, N (e.g. 10, 25, 100).
You need to keep track of which "page" the user is on, along with the values of the first and last rank.
Then, when you make the call for the next page, you fetch the next N rows that are > or < the first and last ranks (depending on whether the user is going to the previous or the next page).
I'm sure there is a way to calculate the last page, first page, etc.
The only way to achieve paging SQL similar to a LIMIT statement using the TOP keyword is as follows:
First Step:
sql = "select top "&LESS_COUNT&" * from (SELECT top "&(PAGE_COUNT*getPage)&" * FROM (SELECT "&COLUMNS&" FROM "&TABLENAME&") AS TBL "&getWhere&getOrderby("asc")&") as TBL "&getOrderby("desc")
Second step:
sql = "SELECT TOP "&PAGE_COUNT&" * FROM (" & sql & ") as TBL "&getOrderby("asc")
To summarize: you re-order the results, flipping them upside down, across three nested queries.
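Expanded into plain Access SQL, the pattern looks like this (a sketch assuming a page size of 10, fetching page 3 of a table t ordered by id; all names are made up):

SELECT TOP 10 * FROM (
    SELECT TOP 10 * FROM (
        SELECT TOP 30 * FROM t ORDER BY id ASC
    ) AS t1 ORDER BY id DESC
) AS t2 ORDER BY id ASC;

The innermost query takes the first 30 rows, the middle one keeps the last 10 of those, and the outermost restores ascending order.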
Port your project to PHP & MySQL. They have better support for these types of actions and queries, and much, much better online documentation. As a 16-year veteran DB developer, I have grown to despise MS Access and MS SQL with a passion unmatched by anything else, due exclusively to their lack of support and documentation.