I'm not particularly familiar with Ruby on Rails, but I'm troubleshooting an issue we're experiencing with a rake job that is supposed to be cleaning database tables. The tables grow very large very quickly, and the query generated by ActiveRecord doesn't seem to be efficient enough to handle it.
The Ruby calls look like this:
Source.where("id not IN (#{Log.select('DISTINCT source_id').to_sql})").delete_all
and this:
Log.joins(:report).where(:report_id => Report.where(cond)).delete_all
I'm trying to get at the SQL so we can have our DBAs attempt to optimize it. I've noticed that if I drop the ".delete_all" I can add a ".to_sql", which gives me the SELECT statement of the query prior to the call to ".delete_all". I'd like to see what SQL is generated by the delete_all method itself, though.
Is there a way to do that?
Another option is to use raw Arel syntax, similar to a simplified version of what ActiveRecord::Relation#delete_all does.
relation = Model.where(...)
arel = relation.arel
stmt = Arel::DeleteManager.new
stmt.from(arel.join_sources.empty? ? Model.arel_table : arel.source)
stmt.wheres = arel.constraints
sql = Model.connection.to_sql(stmt, relation.bound_attributes)
print sql
This will give you the generated DELETE SQL. Here's an example using Postgres as the SQL adapter:
relation = User.where('email ilike ?', '%@gmail.com')
arel = relation.arel
stmt = Arel::DeleteManager.new
stmt.from(arel.join_sources.empty? ? User.arel_table : arel.source)
stmt.wheres = arel.constraints
sql = User.connection.to_sql(stmt, relation.bound_attributes)
=> DELETE FROM "users" WHERE (email ilike '%@gmail.com')
From the fine manual:
delete_all(conditions = nil)
Deletes the records matching conditions without instantiating the records first, and hence not calling the destroy method nor invoking callbacks. This is a single SQL DELETE statement that goes straight to the database, much more efficient than destroy_all.
So a Model.delete_all(conditions) ends up as
delete from models where conditions
When you say Model.where(...).delete_all, the conditions for the delete_all come from the where calls so these are the same:
Model.delete_all(conditions)
Model.where(conditions).delete_all
Applying that to your case:
Source.where("id not IN (#{Log.select('DISTINCT source_id').to_sql})").delete_all
you should see that you're running:
delete from sources
where id not in (
select distinct source_id
from logs
)
If you run your code in a development console you should see the SQL in the console or the Rails logs but it will be as above.
As far as optimization goes, my first step would be to drop the DISTINCT. DISTINCT usually isn't cheap, and IN doesn't care about duplicates anyway, so not in (select distinct ...) is probably pointless busy work. Then an index on source_id might help: the query optimizer may be able to read the source_id list straight out of the index without having to do a table scan. Of course, query optimization is a bit of a dark art, so these simple steps may or may not work.
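To illustrate why the DISTINCT is redundant, here is a small plain-Ruby sketch that models NOT IN as a membership test on in-memory arrays. The method name orphaned_ids and the sample IDs are made up for illustration, and the model ignores NULL handling, which a real database also has to consider.

```ruby
# Model of: SELECT id FROM sources WHERE id NOT IN (<referenced_ids>)
def orphaned_ids(source_ids, referenced_ids)
  source_ids.reject { |id| referenced_ids.include?(id) }
end

# Hypothetical data: primary keys of sources, and the source_id values
# found in logs (with duplicates, as a logs table would normally have).
source_ids     = [1, 2, 3, 4, 5]
log_source_ids = [2, 2, 3, 3, 3, 5]

# Without DISTINCT: duplicates are present in the list being tested against.
p orphaned_ids(source_ids, log_source_ids)

# With DISTINCT: duplicates removed first; the result is the same set of ids.
p orphaned_ids(source_ids, log_source_ids.uniq)
```

Both calls select the same rows, so the de-duplication step only adds work.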
ActiveRecord::Base.logger = Logger.new(STDOUT) should show you all the SQL generated by Rails on your console.
We have to work with an older version of an ERP system (1993).
It has multiple modules. These modules have windows (tabs), and the tabs have columns.
In these tabs the user can create a "new column", which works like a subquery. The query can only be used inside parentheses ().
I'm curious whether a user could use this to perform an injection.
e.g.:
--basic query (self join)
(select i.my_col from my_table i where my_pk = i.pk)
--illustrating
(select replace(i.my_col, 'UPDATE...') from my_table i where my_pk = i.pk)
Is there any way to make the second query work? I mean, can the user somehow update columns with this method?
How can I test it?
Dynamic values in a WHERE condition can be handled with a PreparedStatement and setParameter; unfortunately, that option is not available for dynamic column selection.
The best you can do is check the column name against the list of all possible/applicable column names before you pass it to the query.
// check that my_col is one of the allowed values; otherwise throw an error.
(select replace(i.my_col, 'UPDATE...') from my_table i where my_pk = i.pk)
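The allow-list check described above could look roughly like the following Ruby sketch. ALLOWED_COLUMNS and build_subquery are hypothetical names invented for this example; a real system would take the permitted column names from its own data dictionary.

```ruby
# Hypothetical allow-list of column names the user may select.
ALLOWED_COLUMNS = %w[my_col other_col].freeze

# Builds the correlated subquery only if the column is on the allow-list.
# Anything else (expressions, function calls, extra statements) is rejected.
def build_subquery(column)
  unless ALLOWED_COLUMNS.include?(column)
    raise ArgumentError, "column not allowed: #{column.inspect}"
  end
  "(select i.#{column} from my_table i where my_pk = i.pk)"
end

puts build_subquery('my_col')

begin
  build_subquery("replace(i.my_col, 'UPDATE...')")
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

Because the input is compared against fixed names rather than escaped, nothing the user types is ever executed unless it is exactly one of the known columns.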
Avoiding SQL injection is down to the mechanism which turns the user's input into executable statements. The actual example you posted won't run, but I can think of ways it might be possible to hijack the SELECT to run malicious DML. It depends on the framework: given that the underlying software is ancient I suspect it might be extremely vulnerable.
Generally speaking, if you're worried about SQL injection you should investigate using Oracle's built-in DBMS_ASSERT package to verify your SQL strings.
I am new to learning Documentum and we came across this query being run by the system that we are looking at how to potentially speed up:
SELECT ALL dm_document.r_object_id
FROM dm_document_sp dm_document
WHERE (
dm_document.object_name = :"SYS_B_0"
AND dm_document.r_object_id IN (
SELECT r_object_id
FROM dm_sysobject_r
WHERE i_folder_id = :"SYS_B_1"
)
)
AND (
dm_document.i_has_folder = :"SYS_B_2"
AND dm_document.i_is_deleted = :"SYS_B_3"
)
We looked at adding an index or using a SQL profile. However, the index would be somewhat large and will continue to grow. The SQL profile also would need to be re-examined periodically.
We thought it would be better to look at re-writing the SQL itself. Is there a way to override the system to use custom SQL (i.e. SQL written by the developers) for specific queries that Documentum auto-generates?
Unfortunately, there is no way to alter Documentum's default behavior of translating DQL into the resulting SQL.
But you can execute SQL directly in your custom applications, jobs, BOFs, components, etc. using JDBC. For queries other than SELECT, you can also use the DQL EXECUTE statement like this:
EXECUTE exec_sql WITH query = 'sql_query'
Another option is to register specific *_s or *_r tables and access them directly in DQL. For example you can register dm_sysobject_s like this:
REGISTER TABLE dm_dbo.dm_sysobject_s ("r_object_id" CHAR(16))
And then you can use it in DQL:
SELECT object_name FROM dm_sysobject_s
And you can also normally join the registered table with Documentum types in DQL, for example:
SELECT object_name FROM dm_sysobject_s s, dmi_queue_item q WHERE s.r_object_id = q.item_id
Keep in mind that Documentum does not recommend accessing its internal tables directly, but when you really need to speed up your application you may have to use alternative approaches.
In any case, I would recommend trying indexes first; if that is not sufficient, you can continue with the steps described above.
I'm working on a basic Rails 4.0 app to learn how it works, and I've run into something that I can't seem to figure out. I've been doing queries to the default Sqlite DB via ActiveRecord, and for most queries, according to the debug output, it seems to generate parameterized queries, like so:
2.0.0-p247 :070 > file.save
(0.2ms) begin transaction
SQL (0.6ms) UPDATE "rep_files" SET "report_id" = ?, "file_name" = ?, "updated_at" = ?
WHERE "rep_files"."id" = 275 [["report_id", 3], ["file_name", "hello.jpg"],
["updated_at", Mon, 09 Sep 2013 04:30:19 UTC +00:00]]
(28.8ms) commit transaction
However, whenever I do a query using find_by, it seems to just stick the provided parameters into the generated SQL:
2.0.0-p247 :063 > file = RepFile.find_by(report_id: "29", file_name: "1.png")
RepFile Load (6.2ms) SELECT "rep_files".* FROM "rep_files" WHERE
"rep_files"."report_id" = 29 AND "rep_files"."file_name" = '1.png' LIMIT 1
It does seem to be escaping the parameters properly to prevent SQL injection:
2.0.0-p247 :066 > file = RepFile.find_by(report_id: "29", file_name: "';")
RepFile Load (0.3ms) SELECT "rep_files".* FROM "rep_files" WHERE
"rep_files"."report_id" = 29 AND "rep_files"."file_name" = ''';' LIMIT 1
However, it was my understanding that sending parameterized queries to the database was considered a better option than trying to escape query strings, since the parameterized option will cause the query data to bypass the database's parsing engine entirely.
So what's going on here? Is this some oddity in the Sqlite adapter or the way that the debug output is generated? If ActiveRecord is actually working like this, is there some reason for it? I can't find anything about this anywhere I've looked. I've started looking through the ActiveRecord code, but haven't figured anything out yet.
If we look at find_by in the source, we see this:
def find_by(*args)
where(*args).take
end
The take just tacks the limit 1 onto the query so we're left with where. The where method can deal with arguments in various forms with various placeholder formats, in particular, you can call where like this:
where('c = :pancakes', :pancakes => 6)
Using named placeholders is quite nice when you have a complicated query that is best expressed with an SQL snippet or a query that uses the same value several times so named placeholders are quite a valuable feature. Also, you can apply where to the ActiveRecord::Relation that you got from a where call and you can build the final query in pieces spread across several methods and scopes that don't know about each other. So, where has a problem: multiple things that don't know about each other can use the same named placeholder and conflicts can arise. One way around this problem would be to rename the named placeholders to ensure uniqueness, another way is to manually fill in the placeholders through string wrangling. Another problem is that different databases support different placeholder syntaxes. ActiveRecord has chosen to manually fill in the placeholders.
Summary: find_by doesn't use placeholders because where doesn't and where doesn't because it is easier to build the query piecemeal through string interpolation than it is to keep track of all the placeholders and database-specific syntaxes.
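As a rough illustration of "filling in the placeholders through string wrangling", here is a plain-Ruby sketch. It is not ActiveRecord's actual implementation: quote_value and fill_named_placeholders are hypothetical helpers, only integers, nil and strings are handled, and the simple regular expression would also match colons that appear inside string literals in the SQL snippet.

```ruby
# Quote a Ruby value as an SQL literal (simplified: integers, nil, strings).
def quote_value(value)
  case value
  when Integer then value.to_s
  when nil     then 'NULL'
  else "'#{value.to_s.gsub("'", "''")}'" # double any embedded single quotes
  end
end

# Replace each :name in the template with the quoted value from `binds`.
def fill_named_placeholders(template, binds)
  template.gsub(/:(\w+)/) do
    name = Regexp.last_match(1).to_sym
    raise ArgumentError, "missing value for :#{name}" unless binds.key?(name)
    quote_value(binds[name])
  end
end

puts fill_named_placeholders('c = :pancakes', pancakes: 6)
# prints: c = 6
puts fill_named_placeholders('file_name = :name', name: "';")
# prints: file_name = ''';'
```

The second call produces the same escaped literal as in the log output from the question, where the single quote in the value is doubled before being wrapped in quotes.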
.Where(x => !x.Rated)
This creates sql that looks like:
not (cdrcalltmp0_.Rated=1)
Our DBA says I have to remove the not for a filtered index to work.
.Where(x => x.Rated == false)
This creates sql that looks like:
cdrcalltmp0_.Rated=@p2 order by cdrcalltmp0_.Created asc'
This doesn't work because of the parameter.
He would like this sql:
cdrcalltmp0_.Rated=0 order by cdrcalltmp0_.Created asc'
Is it possible to make nhibernate not use parameters?
So that a filtered index works.
Preface:
The following answer assumes you are using SQL Server 2008. If you are not, then it is quite possible that the database technology in question does not support indexes when using the NOT operator. So, if you're using SQL Server 2008...
Your DBA doesn't know what he's talking about.
The following syntax
NOT ( SomeTableAlias.SomeTableColumn = 1 )
will absolutely be understood by the SQL Server Query Analyzer. I've got queries from NHibernate that look exactly like the above syntax and they are indeed using the proper indexes.
And to answer your question, no. NHibernate always uses parameters when it creates the SQL for you. Parameterized queries are extremely commonplace, even when using traditional ADO.NET yourself.
The only way to get NHibernate to not use parameters is if you supply the SQL it needs to execute yourself using the session.CreateSQLQuery() method.
At any rate, the above line you posted:
.Where(x => x.Rated == false)
This creates sql that looks like:
cdrcalltmp0_.Rated=@p2 order by cdrcalltmp0_.Created asc'
is completely valid. When SQL Server receives the parameterized query, it will use whatever index is on your Rated column.
If your DBA still doubts you, tell him to run the query in SQL Server Management Studio with the "Display Estimated Execution Plan" feature on. That will show whether the query is using the index.
The following SQL I am trying to run is returning sql_string of "SELECT id FROM people WHERE id IN ("16")":
@ids = ["1", "6"]
sql_string = <<-SQL
SELECT id
FROM people
WHERE id IN ("#{@ids}")
SQL
Can someone please help modify the above query so it will create the sql_string of "SELECT id FROM people WHERE id IN (1, 6)"
Just throwing @ids in the query will concatenate the array and give you "16". You'll want to run @ids.join(',') to comma separate them. Plus you need to wrap the expression part of the string in #{}. Otherwise it will treat it as literal.
@ids = ["1", "6"]
sql_string = <<-SQL
SELECT id
FROM people
WHERE id IN (#{@ids.join(',')})
SQL
P.S. There are very few valid reasons for manually writing a whole SQL query in Rails. You should look into using ActiveRecord to do something like People.find_all_by_id(@ids) instead.
With the code fragment in one of the answers above, "@ids" is not sanitised. This is fine if your code 'knows' that "@ids" contains only valid integer IDs, but very dangerous if any ID came in from user input or a URL. See:
http://api.rubyonrails.org/classes/ActiveRecord/Base.html#M001831
...for a possible solution. This is a protected method so we have to call via 'send' to demonstrate its use at the console:
>> ActiveRecord::Base.send(:sanitize_sql_for_conditions, { :id => [1,6] }, :people)
=> "people.\"id\" IN (1,6)"
...i.e. insert the above result after the SQL WHERE keyword. As the previous answer says, unless you have a really complex case which can't be built up using standard Rails calls (which is indeed the case for Coderama but may not be for future readers), you should always try to avoid writing SQL by hand.
Bearing this in mind, an alternative way to build up complex queries is the "ez_where" plugin which is worth a look if anyone reading is thinking of resorting to SQL:
http://github.com/ezmobius/ez-where
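If the SQL really does have to be written by hand, one way to address the sanitisation concern raised above is to force every ID through an integer conversion before joining. The following is a minimal plain-Ruby sketch; people_by_ids_sql is a hypothetical helper name, and it assumes the IDs are always base-10 integers.

```ruby
# Builds the IN (...) query only from values that parse as base-10 integers.
# Integer() raises ArgumentError for anything else, for example
# "1); DROP TABLE people; --", so such input never reaches the SQL string.
def people_by_ids_sql(ids)
  raise ArgumentError, 'ids must not be empty' if ids.empty?

  id_list = ids.map { |id| Integer(id.to_s, 10) }.join(', ')
  "SELECT id FROM people WHERE id IN (#{id_list})"
end

puts people_by_ids_sql(["1", "6"])
# prints: SELECT id FROM people WHERE id IN (1, 6)
```

Rejecting bad input outright, rather than trying to escape it, keeps the helper small; the empty-list check is there because IN () is not valid SQL in most databases.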