Reading/Writing from/to the same table/DB in SQLite from multiple nodes - sql

I used R to create a SQLite database, and now I want to read from it (in parallel, from multiple cores that have access to the same SQLite DB) multiple times and write to another DB multiple times, ~1,000 or more times in parallel. However, when I try such an operation I get the following error:
sqliteFetch(rs, n = -1, ...) :
RSQLite driver: (RS_SQLite_fetch: failed first step: database is locked)
In my script I am running the following two commands that I think are giving the error (not sure if it comes from the reading or the writing):
dbGetQuery(db1, sql.query)
# later on...
if (dbExistsTable(db2, table.name)) {
  dbWriteTable(db2, table.name, my.df, append = T)
} else {
  dbWriteTable(db2, table.name, my.df)
}
Do you know if such an operation is possible? If so, is there any way to do it and avoid this error? I asked this question before and was referred to the ACID design of databases, which makes me think such an operation should be possible, but somehow it's not working.
I am also open to suggestions along the lines of "you could use MySQL for this, it should work better", etc.
Thanks!

Related

Invoking a large set of SQL from a Rails 4 application

I have a Rails 4 application that I use in conjunction with sidekiq to run asynchronous jobs. One of the jobs I normally run outside of my Rails application is a large set of complex SQL queries that cannot really be modeled by ActiveRecord. The connection this set of SQL queries has with my Rails app is that it should be executed anytime one of my controller actions is invoked.
Ideally, I'd queue a job from my Rails application within the controller for Sidekiq to go ahead and run the queries. Right now they're stored in an external file, and I'm not entirely sure of the best way to have Rails run that SQL.
Any solutions are appreciated.
I agree with Sharagoz: if you just need to run a specific query, the best way is to send the query string directly into the connection, like:
ActiveRecord::Base.connection.execute(File.read("myquery.sql"))
If the query is not static and you have to compose it, I would use Arel, it's already present in Rails 4.x:
https://github.com/rails/arel
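For illustration, a minimal sketch of composing a query with Arel and handing the generated SQL to the connection (the users table and its columns here are made up):

users = Arel::Table.new(:users)
query = users
  .project(users[:id], users[:email])        # SELECT id, email
  .where(users[:created_at].gt(1.week.ago))  # WHERE created_at > ...
  .order(users[:created_at].desc)            # ORDER BY created_at DESC

ActiveRecord::Base.connection.execute(query.to_sql)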
You didn't say what database you are using, so I'm going to assume MySQL.
You could shell out to the mysql binary to do the work:
result = `mysql -u #{user} --password=#{password} #{database} < #{huge_sql_filename}`
Or use ActiveRecord::Base.connection.execute(File.read("huge.sql")), but it won't work out of the box if you have multiple SQL statements in your SQL file.
In order to run multiple statements you will need to create an initializer that monkey patches the ActiveRecord::Base.mysql2_connection to allow setting MySQL's CLIENT_MULTI_STATEMENTS and CLIENT_MULTI_RESULTS flags.
Create a new initializer config/initializers/mysql2.rb
module ActiveRecord
  class Base
    # Overriding ActiveRecord::Base.mysql2_connection
    # method to allow passing options from database.yml
    #
    # Example of database.yml
    #
    #   login: &login
    #     socket: /tmp/mysql.sock
    #     adapter: mysql2
    #     host: localhost
    #     encoding: utf8
    #     flags: 131072
    #
    # @param [Hash] config hash that you define in your
    #   database.yml
    # @return [Mysql2Adapter] new MySQL adapter object
    #
    def self.mysql2_connection(config)
      config[:username] = 'root' if config[:username].nil?

      if Mysql2::Client.const_defined? :FOUND_ROWS
        config[:flags] = config[:flags] ? config[:flags] | Mysql2::Client::FOUND_ROWS : Mysql2::Client::FOUND_ROWS
      end

      client = Mysql2::Client.new(config.symbolize_keys)
      options = [config[:host], config[:username], config[:password], config[:database], config[:port], config[:socket], 0]
      ConnectionAdapters::Mysql2Adapter.new(client, logger, options, config)
    end
  end
end
Then update config/database.yml to add flags:
development:
  adapter: mysql2
  database: app_development
  username: user
  password: password
  flags: <%= 65536 | 131072 %>
I just tested this on Rails 4.1 and it works great.
Source: http://www.spectator.in/2011/03/12/rails2-mysql2-and-stored-procedures/
Executing one query is - as outlined by other people - quite simply done through
ActiveRecord::Base.connection.execute("SELECT COUNT(*) FROM users")
You are talking about a 20,000-line SQL script with multiple queries. Assuming you have the file somewhat under control, you can extract the individual queries from it.
script = Rails.root.join("lib").join("script.sql").read # ah, Pathnames

# this needs to match the delimiter of your queries
STATEMENT_SEPARATOR = ";\n\n"

ActiveRecord::Base.transaction do
  script.split(STATEMENT_SEPARATOR).each do |stmt|
    ActiveRecord::Base.connection.execute(stmt)
  end
end
If you're lucky, the query delimiter is ";\n\n", but this of course depends on your script. In another case we had "\x0" as the delimiter. The point is that you split the script into individual queries and send them to the database. I wrapped it in a transaction to let the database know that more than one statement is coming; the block commits when no exception is raised while sending the script's queries.
If you do not have the script file under control, talk to whoever controls it to agree on a reliable delimiter. If it's not under your control and you cannot talk to the person who controls it, you probably shouldn't be executing it at all, I guess :-).
UPDATE
This is a generic way to solve this. For PostgreSQL, you don't need to split the statements manually. You can just send them all at once via execute. For MySQL, there seem to be solutions to get the adapter into a CLIENT_MULTI_STATEMENTS mode.
If you want to execute raw SQL through active record you can use this API:
ActiveRecord::Base.connection.execute("SELECT COUNT(*) FROM users")
If you are running a big SQL job every time, I suggest you create a SQL view for it; that will boost execution time. The other thing is, if possible, try to split those SQL queries so that they can be executed in parallel instead of sequentially, and then push them to the Sidekiq queue. A rough sketch of the view idea is shown below.
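As an illustration only (the view, table and column names here are invented), a view can be created once, e.g. in a migration, and then queried like an ordinary table:

class CreateWeeklyReportView < ActiveRecord::Migration
  def up
    # the heavy aggregation now lives inside the database
    execute <<-SQL
      CREATE VIEW weekly_report AS
      SELECT user_id, COUNT(*) AS orders_count
      FROM orders
      GROUP BY user_id
    SQL
  end

  def down
    execute "DROP VIEW weekly_report"
  end
end

# later, from a job or a controller:
# ActiveRecord::Base.connection.execute("SELECT * FROM weekly_report")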
You have to use ActiveRecord::Base.connection.execute or ModelClass.find_by_sql to run custom SQL.
Also, keep an eye on ROLLBACK transactions; you will find many places where you don't need the ROLLBACK feature. If you avoid it, the query will run faster, but it is dangerous.
That's all I can suggest.
use available database tools to handle the complex queries, such as views, stored procedures, etc., and call them as other people already suggested (ActiveRecord::Base.connection.execute and ModelClass.find_by_sql, for example) - it might very well cut down significantly on query preparation time in the DB and make your code easier to handle
http://dev.mysql.com/doc/refman/5.0/en/create-view.html
http://dev.mysql.com/doc/connector-cpp/en/connector-cpp-tutorials-stored-routines-statements.html
abstract your query input parameters into a hash so you can pass it on to Sidekiq; don't send SQL strings, as this will probably degrade performance (due to query preparation time) and make your life more complicated due to funny SQL driver parsing bugs (see the sketch at the end of this answer)
run your complex queries in a dedicated named queue and set concurrency to a value that will prevent your database from getting overwhelmed by the queries, as they sound like they could be pretty DB heavy
https://github.com/mperham/sidekiq/wiki/API
https://github.com/mperham/sidekiq/wiki/Advanced-Options
have a look at Squeel, it's a great addition to AR; it might be able to pull off some of the things you are doing
https://github.com/activerecord-hackery/squeel
http://railscasts.com/episodes/354-squeel
I'll assume you use MySQL for now, but your mileage will vary depending on the DB type that you use. For example, Oracle has some good gems for handling stored procedures, views, etc., for example https://github.com/rsim/ruby-plsql
Let me know if some of this stuff doesn't fit your use case and I'll expand
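To make the hash-arguments and dedicated-queue suggestions concrete, here is a rough sketch; the worker name, queue name, query and parameters are all hypothetical:

class ComplexReportWorker
  include Sidekiq::Worker
  # dedicated queue; run it with its own low-concurrency process,
  # e.g. `bundle exec sidekiq -q heavy_sql -c 2`
  sidekiq_options queue: :heavy_sql

  def perform(params)
    # params arrives from Redis as a plain hash with string keys
    sql = ActiveRecord::Base.send(
      :sanitize_sql_array,
      ["SELECT * FROM orders WHERE created_at BETWEEN ? AND ?",
       params["from"], params["to"]]
    )
    result = ActiveRecord::Base.connection.execute(sql)
    # do something with result ...
  end
end

# enqueue from a controller action:
ComplexReportWorker.perform_async("from" => "2014-01-01", "to" => "2014-01-31")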
I see this post is kind of old, but I would like to add my solution to it. I was in a similar situation; I also needed a way to force-feed "PRAGMA foreign_keys = on;" into my SQLite connection (I could not find a previous post that spelled out how to do it). Anywho, this worked like a charm for me. It allowed me to write "pretty" SQL and still get it executed. Blank statements are skipped by the unless check.
# DB_NAME and DDL_NAME are defined elsewhere in my setup
conn = ActiveRecord::Base.establish_connection(adapter: 'sqlite3', database: DB_NAME)
sqls = File.read(DDL_NAME).split(';')
# re-append the ';' that split removed and skip blank statements
sqls.each { |sql| conn.connection.execute(sql << ';') unless sql.strip.size == 0 }
conn.connection.execute('PRAGMA foreign_keys = on;')
I had the same problem with a set of SQL statements that I needed to execute all in one call to the server. What worked for me was to set up an initializer for the Mysql2 adapter (as explained in infused's answer), but also do some extra work to process multiple results. A direct call to ActiveRecord::Base.connection.execute would only retrieve the first result and issue an Internal Error.
My solution was to get the Mysql2 adapter and work directly with it:
client = ActiveRecord::Base.connection.raw_connection
Then, as explained here, execute the query and loop through the results:
client.query(multiple_stms_query)
while client.next_result
  result = client.store_result
  # do something with it ...
end

Oracle error: Application failover does not support non-single-SELECT statement

We are getting the following error using Oracle:
[Oracle JDBC Driver]Application failover does not support non-single-SELECT statement
The error occurs when we try to run a delete or insert over a large number of rows (tens of millions of rows).
I know that the script works, because it had been working for almost a year before these error messages started to pop up.
We know that no one changed any database configuration, so we figure that the problem must be the volume of processed data (the row count keeps growing as time goes by...).
But we have never seen that kind of error before! What does it mean? It seems that a failover engine tries to recover from an error, but when Oracle is 'taken over' by this engine, it enters a more restricted state where some kinds of queries do not work (like Windows Safe Mode...).
Well, if this is what is happening, how can I get the real error message? The one that triggered the failover mechanism?
BTW, below is one of the deletes that triggers the error:
delete from odf_ca_rnv_av_snapshot_week
(we tried this one just to test the simplest delete we could think of... a truncate won't help us with the real deal :) )
check this link
The error seems to come not from Oracle or the JDBC layer itself, but from "Progress" (the driver vendor). It means that the failover can only recover from SELECT statements and not from DML.
You'll have to figure out why the failover occurs in the first place.

What problems may occur while querying SQL databases with big amount of data over internet

I have this big database on an MS SQL server that contains data indexed by a web crawler.
Every day I want to update the Solr search engine index using the DataImportHandler, which is situated on another server and another network.
Solr's DataImportHandler uses a query to get data from SQL Server. For example, this query:
SELECT * FROM DB.Table WHERE DateModified > Config.LastUpdateDate
The import handler does 8 selects of this type. Each select gets around 1,000 rows from the database.
To connect to SQL Server I am using com.microsoft.sqlserver.jdbc.SQLServerDriver.
The parameters I can add for connection are:
responseBuffering="adaptive/all"
batchSize="integer"
So my question is:
What can go wrong while running these queries every day? (except network errors)
I want to know how SQL Server behaves in this context.
Furthermore, I have to make a decision regarding the way I will implement this import and how to handle errors, but first I need to know what errors can arise.
Thanks!
Later edit
My problem is that I don't know how these SQL queries can fail. When I call this importer every day, it runs 10 queries against the database. If the 5th query fails, I have two options:
roll back the entire transaction and do it again, or commit the data I got from the first 4 queries and somehow redo queries 5 to 10. But if these queries always fail because of some other problem, I need to think of another way to import this data.
Can these SQL queries over the internet fail because of timeouts or something like that?
The only problem I identified after working with this type of import is:
Network problems - if the network connection fails, Solr rolls back any changes and the commit doesn't take place. In my program I identify this as an error and don't log the changes in the database.
Thanks @GuidEmpty for providing his comment and clarifying this for me.
There could be issues with permissions (not sure if you control these).
It might be a good idea to catch the exceptions you can think of and include a catch-all (Exception exp).
Then treat the overall one as the worst case, roll back (where you can) and log the exception to look into later on.
You don't say what types you are selecting either; keep in mind that text/blob columns can take a lot more space and could cause issues internally if you buffer any data, etc.
Though on a quick re-read, you don't need to roll back if you are only selecting.
I think you would be better off having a think about what you are hoping to achieve and whether knowing all possible problems will help.
HTH

Is there a maximum number of inserts that can be run in a batch SQL script?

I have a series of simple "INSERT INTO" type statements, but after running about 3 or 4 of them the script stops and I get empty sets when I try selecting from the appropriate tables. Aside from my specific code, I wonder whether there is an ideal way of running multiple insert-type queries.
Right now I just have a text file saved as a.sql with normal SQL commands separated by ";".
No, there is not. However, if it stops after 3 or 4 inserts, it's a good bet there's an error in the 3rd or 4th insert. Depending on which SQL engine you use, there are different ways of making it report errors during and after operations.
Additionally, if you have lots of inserts, it's a good idea to wrap them inside a transaction - this basically buffers all the insert commands until it sees the end command for the transaction, and then commits everything to your table. That way, if something goes wrong, your database doesn't get polluted with data that first has to be deleted again. More importantly, every insert outside a transaction counts as a single transaction of its own, which makes inserts really slow - doing 100 inserts inside one transaction can be as fast as doing two or three standalone inserts.
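Purely as an illustration of the transaction idea (the question doesn't say which engine or how the script is run, so this sketch assumes Ruby with the sqlite3 gem and made-up file names):

require "sqlite3"

db  = SQLite3::Database.new("a.db")
sql = File.read("a.sql")

# BEGIN is issued here; the block commits at the end,
# or rolls everything back if any statement raises
db.transaction do
  sql.split(";").each do |stmt|
    db.execute(stmt) unless stmt.strip.empty?
  end
end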
Maximum Capacity Specifications for SQL Server
Max Batch size = 65,536 * Network Packet Size
However, I doubt that the maximum batch size is your problem.

Problem with a MS Access query after a "Compact and repair" operation

I have an Access application that uses the classic front-end/back-end approach. Yesterday, the back end got corrupted for a reason I don't know. So I opened the back end with Access 2003 and Access asked me if I wanted to repair the file; I said yes and it seemed to work.
I can open the database, see the tables' contents and run most of the queries.
However, there is an Access query that doesn't work with a specific WHERE clause.
Example:
-- This works in the original DB, but not in the compacted one:
SELECT a, b, c
FROM tbl1 INNER JOIN tbl2 ON tbl1.d = tbl2.d
WHERE e = 3 AND tbl2.f = 1;

-- This works in both the original and the compacted one:
SELECT a, b, c
FROM tbl1 INNER JOIN tbl2 ON tbl1.d = tbl2.d
WHERE e = 3;
When I try to run the queries, nothing happens. The Access process starts to use most of the CPU and the GUI stops responding. If I run the query from the query editor, I can use Ctrl+Break to stop the execution. I tried giving the query a lot of time and it didn't help.
I've checked the execution plan in showplan.out and it seems correct (at least it should not take forever to execute).
I tried to compact the DB again. I tried to import the tables into a new DB. I even tried to import the tables and their data into an .mdb file that was in a known good state (from a backup).
Anyone have an idea?
Sounds like an index was corrupted and when that happens, it's dropped during the compact. Check for a system table called MSysCompactErrors -- you'll have to show hidden objects and/or system objects in Tools | Options | VIEW.
Never compact a Jet MDB without making a backup beforehand. Because of that rule, the COMPACT ON CLOSE function is completely useless, as it's not cancellable, so you should always make sure it's turned off in all MDBs.
I don't know what type of meta data Access brings along when it imports a table from one database into another one. If the meta data is corrupted, importing the table to another database wouldn't necessarily resolve the problem. If practical, you might try creating the tables from scratch in a brand new database and then just exporting and importing (or copying and paste appending) the data into the new database.
I've never seen a table get corrupted like this in such a small database, although with Access anything is possible. Could there be something wrong with the data?
I'd try recreating the query fresh (new name, etc.), and see what happens.
You could even try copying it (even within the same DB or to a brand new one). If that works, the worst case scenario is you have to copy all the objects across to a new DB.
Is there an index on the field tbl2.f?
Also try going into that table in datasheet view, sort tbl2.f in ascending sequence and see if there is anything really strange in the first or last records.
Do you have access to a SQL Server installation? You could use the Upsizing Wizard under the Tools -> Database Utilities menu to copy the data to SQL Server, and see if you get the same problem there.