Rails 3: SQL execution with hash substitution like .where()

With a simple model like this
class Model < ActiveRecord::Base
  # ...
end
we can do queries like this:
Model.where(["name = :name AND updated_at >= :D",
             { :D => (Date.today - 1.day).to_datetime, :name => "O'Connor" }])
The values in the hash are substituted into the final SQL statement, with proper escaping depending on the underlying database engine.
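The same substitution and escaping also works with positional placeholders:
# Positional (?) placeholders are substituted and escaped the same way.
Model.where(["name = ? and updated_at >= ?",
             "O'Connor", (Date.today - 1.day).to_datetime])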
I would like a similar feature for raw SQL execution, something like:
ActiveRecord::Base.connection.execute(
  ["update models set name = :name, hired_at = :D where id = :id;",
   { :id => 73465, :D => DateTime.now, :name => "O'My God" }]
) # THIS CODE IS A FANTASY. IT DOES NOT WORK.
(Please do not solve the example by loading a Model object, modifying it and then saving! The example is only an illustration of the feature I would like to have or know about. Concentrate on the subject!)
The original problem is that I want to insert a large amount of data (many thousands of rows) into the database. I want to use some of the SQL abstraction features of the ActiveRecord framework, but I don't want to use model objects based on ActiveRecord::Base because they are damn slow! (8 queries per second for my current problem.)

query = ActiveRecord::Base.connection.raw_connection.prepare("INSERT INTO users (name) VALUES(:name)")
query.execute(:name => 'test_name')
query.close
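Note that the placeholder syntax accepted by prepare belongs to the underlying driver, not to ActiveRecord, so it varies by adapter. A minimal sketch assuming the mysql2 driver, whose prepared statements use positional ? placeholders:
raw  = ActiveRecord::Base.connection.raw_connection # a Mysql2::Client here (an assumption)
stmt = raw.prepare("INSERT INTO users (name) VALUES (?)")
stmt.execute('test_name') # the driver binds and escapes the value
stmt.close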

Extending @peufeu's solution with a concrete code example for bulk inserts:
users_places = []
users_values = []
timestamp = Time.now.strftime('%Y-%m-%d %H:%M:%S')
params[:users].each do |user|
  users_places << "(?,?,?,?)"
  users_values << user[:name] << user[:punch_line] << timestamp << timestamp
end
bulk_insert_users_sql_arr = ["INSERT INTO users (name, punch_line, created_at, updated_at) VALUES #{users_places.join(", ")}"] + users_values
begin
  sql = ActiveRecord::Base.send(:sanitize_sql_array, bulk_insert_users_sql_arr)
  ActiveRecord::Base.connection.execute(sql)
rescue => e
  Rails.logger.error "Something went wrong with the bulk insert SQL query: #{e.message}"
end
See the sanitize_sql_array method in ActiveRecord::Base: it generates the proper query string by escaping the single quotes in the strings. For example, the punch line "Don't let them get you down" will become "Don\'t let them get you down" (with the MySQL adapter; other adapters escape differently).
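For instance, calling it directly (a sketch; the escaped output shown assumes the MySQL adapter, while most other adapters double the quote instead):
sql = ActiveRecord::Base.send(:sanitize_sql_array,
                              ["punch_line = ?", "Don't let them get you down"])
# => "punch_line = 'Don\'t let them get you down'" (MySQL-style escaping)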

Yes, you could do raw SQL, but check out the ar-extensions gem, which helps with batch inserts:
https://github.com/zdennis/ar-extensions
Here's a post on it, and various other techniques:
http://www.coffeepowered.net/2009/01/23/mass-inserting-data-in-rails-without-killing-your-performance/
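For reference, a sketch of the bulk import API in the style those libraries expose (shown as in activerecord-import, the successor of ar-extensions; exact options may differ by version):
columns = [:name, :punch_line]
values  = params[:users].map { |u| [u[:name], u[:punch_line]] }
# Builds one multi-row INSERT instead of thousands of single-row ones.
User.import columns, values, :validate => false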

For INSERTs, batching them using a long VALUES clause (as shown by Simon's link) is the fastest way (unless you want to generate a text file and load it in your database with MySQL's LOAD DATA INFILE). But you have to be very careful about escaping your text values (which is not done in the example).
I was asking "what database are you using" because it does matter for mass UPDATEs.
For instance, you can do this on Postgres (and I believe on SQL Server with a JOIN, changing "columnX" to "colX"):
UPDATE foo
SET bar = v.column2
FROM (VALUES (1,2),(3,4),... long list) v
WHERE foo.id = v.column1
And you can update a load of rows using a single statement, very fast.
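A sketch of issuing such an update from ActiveRecord (the pairs variable and the foo/bar names are illustrative assumptions; values are escaped through connection.quote):
pairs = [[1, "new value"], [3, "other"]] # [id, new_bar] pairs
conn = ActiveRecord::Base.connection
values_sql = pairs.map { |id, bar| "(#{id.to_i}, #{conn.quote(bar)})" }.join(",")
conn.execute("UPDATE foo SET bar = v.column2 FROM (VALUES #{values_sql}) v WHERE foo.id = v.column1")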
If you don't need Ruby to perform some Ruby-specific magic on your data, the fastest way to transfer data from one DB to a different one is to export as a text file (CSV or tab separated), load it on the other DB (LOAD DATA INFILE on MySQL), perhaps in a temporary table, and bulk process using SQL.
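A sketch of that export/load round trip for MySQL (assumes a users table and LOCAL INFILE being enabled on the server):
require 'csv'
# Dump the rows to a flat file...
CSV.open("/tmp/users.csv", "w") do |csv|
  params[:users].each { |u| csv << [u[:name], u[:punch_line]] }
end
# ...and let MySQL bulk-load it.
ActiveRecord::Base.connection.execute(
  "LOAD DATA LOCAL INFILE '/tmp/users.csv' INTO TABLE users " \
  "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' " \
  "(name, punch_line)"
)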
EDIT: Here's how I do this in Python:
sql = "INSERT INTO foo (column list) VALUES "  # "(column list)" left generic, as above
placeholders = []
values = []
for row in tuple_list:
    placeholders.append("(?,?,?,?)")
    values.extend(row)
sql += ",".join(placeholders)
This gives you "INSERT INTO foo (column list) VALUES (?,?,?,?),(?,?,?,?),(?,?,?,?)" with the "(?,?,?,?)" repeated as many times as you have lines to insert.
Then "values" contains a flat list (a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3), with an,bn,cn,dn being the tuple you want to insert for line n. Each one corresponds to a placeholder in the SQL string.
Then pass this to the usual "execute query with parameters" function, which will handle quoting and escaping as usual.

I encountered a similar issue recently when trying to insert 100K+ records into a MySQL database for a Rails 4 app using the mysql2 gem. The data included characters that had to be sanitized prior to insert.
The solution I ended up going with was a slightly modified version of Option 3 described at https://www.coffeepowered.net/2009/01/23/mass-inserting-data-in-rails-without-killing-your-performance/
Here's the relevant code block from the above link:
TIMES = 10000
inserts = []
TIMES.times do
  inserts.push "(3.0, '2009-01-23 20:21:13', 2, 1)"
end
sql = "INSERT INTO user_node_scores (`score`, `updated_at`, `node_id`, `user_id`) VALUES #{inserts.join(", ")}"
The modification I made was using the public method ActiveRecord::Base.sanitize() on the values that required it:
inserts = []
created = Time.now.strftime "%Y-%m-%d %H:%M:%S"
params[:audits].each do |audit|
  inserts.push "(#{audit.user_id}, '#{created}', #{ActiveRecord::Base.sanitize(audit.comment)}, #{audit.status})"
end
sql = "INSERT INTO user_audits (`user_id`, `created_at`, `comment`, `status`) VALUES #{inserts.join(", ")}"

Related

SQL statement to convert jsonb hash to json string

I have a Rails data migration (Postgres DB) where I have to use pure SQL to convert the data due to some model restrictions. The data is stored as JSON in a string column, but I need it to be a usable hash for other purposes.
My migration works to convert it to the hash. However, my down method ends up just deleting the data or leaving it as an empty {}. Btw, to clear up any confusion: my column is actually named data, in the games table.
Based on my up method, how would I properly reverse the migration using SQL only?
class ConvertGamesDataToJson < ActiveRecord::Migration[6.0]
  def up
    statement = <<~SQL
      update games set data = regexp_replace(trim(both '"' from data::text), '\\\\"', '"', 'g')::jsonb;
    SQL
    ActiveRecord::Base.connection.execute(statement)
    # this part works!
  end

  def down
    statement = <<~SQL
      update games set data = to_json(data::text)::jsonb;
    SQL
    ActiveRecord::Base.connection.execute(statement)
  end
end
Here is how it looks after properly converting it:
data: {
"id"=>"d092a-f2323",
"recent"=>'yes',
"note"=>"some text",
"order"=>1
}
how it is before the migration and what it needs to roll back to:
data:
"{
\"id\":\"d092a-f2323\",
\"recent\":\"yes\",
\"note\":\"some text\",
\"order\":1,
}"
If you're displaying a data structure in the Rails console, those \" aren't really there. They're just formatting, because the console has wrapped the string in double quotes. For example...
[2] pry(main)> %{"up": "down"}
=> "\"up\": \"down\""
But if we print it...
[3] pry(main)> puts %{"up": "down"}
"up": "down"
Given that it is a JSON string, you can simply change the type of the column to jsonb and be done with it.
-- up
alter table games alter column data type jsonb USING data::jsonb;
-- down
alter table games alter column data type text;
Postgres doesn't know how to automatically cast text to jsonb, so we need to tell it: using data::jsonb does a simple cast of the text to jsonb. Casting jsonb back to text works fine without it.
You can do this in a migration with change_column.
def up
  change_column :games, :data, :jsonb, using: 'data::jsonb'
end

def down
  change_column :games, :data, :text
end

SQL injections in Rails 4 issue

I'm trying to learn about SQL injections and have tried to implement these, but when I put this code in my controller:
params[:username] = "johndoe') OR admin = 't' --"
@user_query = User.find(:first, :conditions => "username = '#{params[:username]}'")
I get the following error:
Couldn't find all Users with 'id': (first, {:conditions=>"username = 'johndoe') OR admin = 't' --'"}) (found 0 results, but was looking for 2)
I have created a User Model with the username "johndoe", but I am still getting no proper response. BTW I am using Rails 4.
You're using an ancient Rails syntax. Don't use
find(:first, :conditions => <condition>) ...
Instead use
User.where(<condition>).first
find accepts a list of IDs to look up records for. You're giving it an ID of :first and an ID of :conditions => ..., which aren't going to match any records.
User.where(attr1: value, attr2: value2)
or for single items
User.find_by(attr1: value, attr2: value)
Bear in mind that while doing all this, it is valuable to check what the actual SQL statement is by appending to_sql to the end of the query method. (From what I remember, find_by just adds LIMIT 1.)
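For example, inspecting the relation shows the injection attempt arriving as a harmless escaped literal (output sketched for Postgres-style quoting; the exact string depends on your adapter):
User.where(username: params[:username]).to_sql
# => SELECT "users".* FROM "users" WHERE "users"."username" = 'johndoe'') OR admin = ''t'' --'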

Doctrine2 fetch Count more optimized and faster way Or Zf2 library

I am using Doctrine2 and ZF2. When I need to fetch a count of rows, I have the following two ways to do it. My worry is which is the more optimized and faster way, as in the future there will be more than 50k rows. Any suggestions, or any other ways to fetch the count? Is there a function to get a count that can be used with findBy?
Or should I use the plain ZF2 database library to fetch the count? I have read that an ORM is not preferred for fetching results when the data is huge. Any help would be appreciated.
$members = $this->getEntityManager()->getRepository('User\Entity\Members')->findBy(array('id' => $id, 'status' => '1'));
$membersCnt = sizeof($members);
or
$qb = $this->getEntityManager()->createQueryBuilder();
$qb->select('count(p)')
   ->from('User\Entity\Members', 'p')
   ->where('p.id = ' . $id)
   ->andWhere('p.status = 1');
$membersCnt = $qb->getQuery()->getSingleScalarResult();
Comparison
1) Your EntityRepository::findBy() approach will do this:
Query the database for the rows matching your criteria. The database will return the complete rows.
The database result is then transformed (hydrated) into full PHP objects (entities).
2) Your EntityManager::createQueryBuilder() approach will do this:
Query the database for the number of rows matching your criteria. The database will return a simple number (actually a string representing a number).
The database result is then transformed from a string to a PHP integer.
You can safely conclude that option 2 is far more efficient than option 1:
The database can optimize the query for counting, which might make the query faster (take less time).
Far less data is returned from the database.
No entities are hydrated (only a simple string to integer cast).
All in all less processing power and less memory will be used.
Security comment
Never concatenate values into a query!
This can make you vulnerable to SQL injection attacks when those values are (derived from) user-input.
Also, Doctrine2 can't make use of prepared statements / parameter binding, which can lead to some performance loss when the same query is used often (with or without different parameters).
In other words, replace this:
->where('p.id = '.$id)
->andWhere('p.status = 1')
with this:
->where('p.id = :id')
->andWhere('p.status = :status')
->setParameters(array('id' => $id, 'status' => 1))
or:
->where($qb->expr()->andX(
    $qb->expr()->eq('p.id', ':id'),
    $qb->expr()->eq('p.status', ':status')
))
->setParameters(array('id' => $id, 'status' => 1))
Additionally
For this particular query, there's no need to use the QueryBuilder; you can use straight DQL instead:
$dql = 'SELECT COUNT(p) FROM User\Entity\Members p WHERE p.id = :id AND p.status = :status';
$q = $this->getEntityManager()->createQuery($dql);
$q->setParameters(array('id' => $id, 'status' => 1));
$membersCnt = $q->getSingleScalarResult();
You should totally go with the DQL version of the count.
With the first method you hydrate (convert from DB result set to objects) each of the rows as a single object, put them in one array, and then count the items in that array. That is a total waste of memory and cycles if the only objective is to know the number of elements in the result set.
With the second method the DQL is gracefully converted to a plain SELECT COUNT(*) ... SQL statement, which retrieves the count directly from the DB.
The comment that an ORM is not preferred when the data to retrieve is huge is true: in big batch processes you should paginate your queries instead of retrieving everything at once, to avoid memory overruns. But in this case you are only retrieving a single number, the total count, so that rule doesn't apply.
The QueryBuilder is slow; use DQL for a faster select:
$query = $this->getEntityManager()->createQuery("SELECT count(m) FROM User\Entity\Members m WHERE m.status = 1 AND m.id = :id");
$query->setParameter(':id', $id);
You need setParameter to prevent SQL injection.
A stored procedure would be fastest, but that depends on your DB.
Also, make all of the entity's relations lazy.

How to Write a Where Clause to Handle Either a Specific Value or Any Value

I have a report with a table in Rails where users can optionally set filters like selecting a location or picking a range of dates and update the table via an ajax request.
Can I write this where clause so that it handles any date (including blanks) or all locations?
@orders = Order.where('created_at <= ? AND ? <= created_at AND location_id = ?', date_order_start, date_order_end, loc_filter)
The query above fails on blanks (e.g., ""), and if I pass nils they translate to NULLs in the SQL.
To solve this problem right now I have a bunch of conditional statements that check whether each value is present in the ajax request and then build a different where clause depending on the case. My current conditionals are unwieldy, error-prone and not scalable.
Searches on things like "wildcard sql" keep leading me to text searches (i.e., %), which I don't think fit this case.
I am running on Rails 3.2 with postgresql.
I sometimes use an array of query statements and arguments like this:
queries = []
args = []
if some_condition
  queries.push("created_at <= ?")
  args.push(whatever_date)
end
if another_condition
  queries.push("created_at >= ?")
  args.push(another_date)
end
@orders = Order.where(queries.join(" AND "), *args)
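Applied to the location filter from the question, the same pattern skips blank values (a sketch; loc_filter is assumed to come straight from the ajax params):
if loc_filter.present?
  queries.push("location_id = ?")
  args.push(loc_filter)
end
@orders = Order.where(queries.join(" AND "), *args)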

doctrine native sql not accepting parameter list

I'm trying to do native SQL in Doctrine. Basically I have 2 parameters:
CANDIDATE_ID - the user whose entries we delete,
a list of FILE_IDs to keep
So I do:
$this->getEntityManager()->getConnection()->executeUpdate(
    "DELETE FROM FILE WHERE CANDIDATE_ID = :ID AND NOT ID IN :KEEPID",
    array(
        "ID" => $candidate->id,
        "KEEPID" => array(2)
    )
);
But Doctrine fails:
Notice: Array to string conversion in D:\xampp\htdocs\azk\vendor\doctrine\dbal\lib\Doctrine\DBAL\Connection.php on line 786
Is this a bug in Doctrine? I'm doing a select with IN somewhere else, but with the QueryBuilder, and it works. Maybe someone could suggest a better way of deleting entries, with the QueryBuilder for example?
$stmt = $conn->executeQuery('SELECT * FROM articles WHERE id IN (?)',
array(array(1, 2, 3, 4, 5, 6)),
array(\Doctrine\DBAL\Connection::PARAM_INT_ARRAY)
);
From Doctrine's documentation.
You can't pass an array of IDs as a single parameter that way. You can do this for scalar values, but even if the array had a toString, it wouldn't be what you want.
String concatenation is one method:
"DELETE FROM FILE WHERE CANDIDATE_ID = :ID AND NOT ID IN (" . implode(",", $list_of_ids) . ")"
But this method goes straight around parameters, so it suffers in terms of safety and readability, and it is limited to a maximum statement length, which can vary between databases.
Another approach is to write a function returning a table result, which takes a string of IDs as a parameter.
You could also solve this with a join to a table containing the IDs to keep.
It's a problem I've seen many times with few good answers, but it's usually caused by a misunderstanding of the way the database is modelled. This is a 'code smell' for database access.