How do I completely clear a SQLite3 database without deleting the database file?

For unit testing purposes I need to completely reset/clear SQLite3 databases. All databases are created in memory rather than on the file system when running the test suite so I can't delete any files. Additionally, several instances of a class will be referencing the database simultaneously, so I can't just create a new database in memory and assign it to a variable.
Currently my workaround for clearing a database is to read all the table names from sqlite_master and drop them. This is not the same as completely clearing the database, though, since metadata and other things I don't understand will probably remain.
Is there a clean and simple way, like a single query, to clear a SQLite3 database? If not, what would have to be done to an existing database to make it identical to a completely new database?
In case it's relevant, I'm using Ruby 2.0.0 with sqlite3-ruby version 1.3.7 and SQLite3 version 3.8.2.

This works without deleting the file and without closing the db connection:
PRAGMA writable_schema = 1;
DELETE FROM sqlite_master;
PRAGMA writable_schema = 0;
VACUUM;
PRAGMA integrity_check;
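For example, from sqlite3-ruby the same statements could be run against the shared in-memory connection roughly like this (a minimal sketch; the wipe! helper name is mine, and execute_batch simply runs the multi-statement string):

require "sqlite3"

# Wipe the schema of an already-open database handle without closing it.
def wipe!(db)
  db.execute_batch(<<-SQL)
    PRAGMA writable_schema = 1;
    DELETE FROM sqlite_master;
    PRAGMA writable_schema = 0;
    VACUUM;
    PRAGMA integrity_check;
  SQL
end

wipe!(db)  # db is the shared SQLite3::Database instance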
Another option, if you can call the C API directly, is to use SQLITE_DBCONFIG_RESET_DATABASE:
sqlite3_db_config(db, SQLITE_DBCONFIG_RESET_DATABASE, 1, 0);
sqlite3_exec(db, "VACUUM", 0, 0, 0);
sqlite3_db_config(db, SQLITE_DBCONFIG_RESET_DATABASE, 0, 0);
See the SQLite documentation for SQLITE_DBCONFIG_RESET_DATABASE for the reference.

The simple and quick way
If you use an in-memory database, the fastest and most reliable way is to close and re-establish the SQLite connection. That flushes all database data and also resets any per-connection settings.
If you want to have some kind of "reset" function, you must assume that no other threads can interrupt that function - otherwise any method will fail. Therefore, even if you have multiple threads working on that database, there needs to be a "stop the world" mutex (or something like that) so the reset can be performed. And while you have exclusive access to the database connection - why not close and re-open it?
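As a rough illustration (a sketch only, assuming all threads go through a small wrapper class instead of holding the raw handle themselves - which is not something the question guarantees):

require "sqlite3"

class SharedDb
  def initialize
    @mutex = Mutex.new
    @db = SQLite3::Database.new(":memory:")
  end

  # All database work goes through here, holding the "stop the world" mutex.
  def with_db
    @mutex.synchronize { yield @db }
  end

  # Reset = close the connection and open a brand-new, empty in-memory database.
  def reset!
    @mutex.synchronize do
      @db.close
      @db = SQLite3::Database.new(":memory:")
    end
  end
end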
The hard way
If there are some other limitations and you cannot do it the way above, then you were already pretty close to a complete solution. If your threads don't touch pragmas explicitly, then only the "schema_version" pragma can be changed silently; but if your threads can change pragmas, then you have to go through the list at http://sqlite.org/pragma.html#toc and write a "reset" function that sets each and every pragma back to its initial value (you need to read the default values at the beginning).
Note that pragmas in SQLite can be divided into three groups:
defined initially, immutable, or very limited mutability
defined dynamically, per connection, mutable
defined dynamically, per database, mutable
Group 1 includes, for example, page_size, page_count, encoding, etc. Those are defined at database creation time and usually cannot be modified later, with some exceptions. For example, page_size can be changed prior to a "VACUUM", so the new page size will be set then. The page_count cannot be changed by the user, but it changes automatically when adding data (obviously). The encoding is defined at creation time and cannot be modified later.
You should not need to reset pragmas from group 1.
Group 2 includes, for example, cache_size, recursive_triggers, journal_mode, foreign_keys, busy_timeout, etc. These pragmas are always set to their defaults when opening a new connection to the database. If you don't disconnect, you will need to reset them to their defaults manually.
Group 3 includes, for example, schema_version and user_version; there may be some others, so you need to look them up. Those will also need a manual reset. If you disconnect from an in-memory database, the database gets destroyed, so then you don't need to reset them.
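A minimal sketch of such a reset function for groups 2 and 3, run against the existing connection; the values below are common SQLite defaults but are assumptions - read the real defaults from a fresh connection on your own build at startup and replay those instead:

def reset_pragmas(db)
  # group 2: per-connection settings
  db.execute("PRAGMA cache_size = -2000")
  db.execute("PRAGMA recursive_triggers = OFF")
  db.execute("PRAGMA foreign_keys = OFF")
  db.execute("PRAGMA busy_timeout = 0")
  # group 3: per-database settings
  db.execute("PRAGMA user_version = 0")
end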

Create an empty memory database.
Use the backup API to copy that database over the actual database.
In the case of sqlite3-ruby, see test/test_backup.rb for an example.
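A minimal sketch of that approach in Ruby, assuming the SQLite3::Backup wrapper from the sqlite3-ruby gem (db is the shared in-memory database to be cleared):

require "sqlite3"

def reset_via_backup(db)
  blank  = SQLite3::Database.new(":memory:")            # pristine, empty database
  backup = SQLite3::Backup.new(db, "main", blank, "main")
  backup.step(-1)                                        # copy everything in one pass
  backup.finish
  blank.close
end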

Get the list of tables with
SELECT * FROM dbname.sqlite_master WHERE type='table';
and then issue a
DROP TABLE
statement for each one.
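In sqlite3-ruby that could look roughly like this (a sketch; as the question notes, this leaves some metadata behind, so it is not a full reset):

def drop_all_tables(db)
  db.execute("SELECT name FROM sqlite_master WHERE type = 'table'").flatten.each do |name|
    next if name.start_with?("sqlite_")   # internal tables cannot be dropped
    db.execute("DROP TABLE IF EXISTS #{name}")
  end
end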

Related

How to force Nextflow process to recalculate and ignore cache in resumed workflow

I have a series of processes in a Nextflow pipeline, employing multiple heavy computing steps and database (SQL) insertion/fetching. I need to insert certain (intermediate) process results into the DB and fetch them later for further processing (within the same pipeline). In the most simplified form it will be something like:
process1 (fetch data from DB)
process2 (analyze process1.out)
process3 (inserts process2.out to DB)
The problem is that when any values are changed in the DB, the output from process1 is still cached (when using the -resume flag), so the changes in the DB are not reflected here at all.
Is there any way to force process1 to be reprocessed while using -resume, ignoring its cache?
So far, I have been manually deleting the respective work folder, or adding a dummy line to process1, but that is an extremely inefficient solution.
Thanks for any help here.
Result caching is enabled by default, but this feature can be disabled using the cache directive by setting the value to false. For example:
process process1 {
    cache false
    ...
}
Not sure if we have the full picture here, but updating a database with some set of process results just to fetch them again later on seems wasteful. Or maybe I've just misunderstood. I would instead try to separate the heavy computational work (hours) from the database transactions (minutes) if at all possible.
Note that if you need to make per-process database transactions, you might be able to achieve this using the beforeScript and afterScript directives (which can be enabled/disabled using a nextflow.config profile, for example). For example, a beforeScript could be used to create a database object that is updated (using an afterScript) once the process has completed. Since both of these scripts are run from inside the workDir, you could use the basename of the current/working directory (i.e. the task UUID) as a key.

Using H2 1.4 database can I write new rows if reading other rows

Using the H2 1.4 database, can I write new rows while reading other rows?
I.e. if I have 1000 rows in a table, and have a SELECT query running that is getting primary keys 1-10, would it be possible for an INSERT query to insert some new rows at the same time, or would it have to wait for (all) the SELECT queries on that table to finish?
What is the situation with an UPDATE of rows that are in the table but not being retrieved by any SELECT query?
I ask because with H2 1.3 I noticed that my application threads that accessed the database seemed to spend a lot of time blocking; it seems better now that I have upgraded to 1.4. But in my application, which is multithreaded, the threads are always dealing with different rows, so it is important for me to better understand how locking works in H2 (with the MVStore; I was previously using the PageStore with 1.3), and whether H2 can lock individual rows when updating or whether it has to lock the whole table.
It depends on the storage engine that you choose. All information below applies to the most recent version (1.4.199); old versions have some differences.
With the default MVStore engine, data modification operations and SELECT … FOR UPDATE lock the modified (or selected) rows. Other transactions can't modify locked rows in parallel, but can read their values. Note that the read committed isolation level is used by default and other isolation levels are not really supported by this engine. With read committed, other transactions will not see the concurrently modified values; they will see the old ones. New values will be visible only when that transaction commits its work. With this engine the database runs in multi-threaded mode by default, so a long-running command will not block other sessions.
With the legacy PageStore engine (add ;MV_STORE=FALSE to the connection URL if you want to create a database with this engine), whole tables are locked for writing. It means that you really need to lock the tables in the same order (alphabetical or some other) in all your transactions, otherwise a deadlock is possible. With this engine the database runs in single-threaded mode by default; you can enable multi-threaded mode explicitly, but it is not safe with this engine. Different sessions can't do their work concurrently, and a long-running command will block all other sessions.
Databases are not converted from the old (PageStore) format to the new (MVStore) format when you open them with a new version of H2; you have to do it yourself. Also, old databases may have serious problems with new versions, so it's recommended to export them to SQL with the old version of H2 using the SCRIPT TO 'filename.sql' command and load this script into a new database with a new version of H2 using the RUNSCRIPT FROM 'filename.sql' command. You need to do this even if you choose to use the old engine. If you have persistent databases, don't forget to create regular backup copies (with the BACKUP TO 'filename.zip' command, for example).
You can find more details in the documentation:
https://h2database.com/html/advanced.html#mvcc
https://h2database.com/html/features.html#multiple_connections

handling long running large transactions with perl dbi

I've got a large transaction that consists of getting lots of data from database A, doing some manipulations with this data, then inserting the manipulated data into database B. I've only got permissions to select in database A, but I can create tables and insert/update etc. in database B.
The manipulation and insertion part is written in perl and already in use for loading data into database B from other data sources, so all that's required is to get the necessary data from database A and using it to initialize the perl classes.
How can I go about doing this so I can easily track back and pick up from where the error happened if any error occurs during the manipulation or insertion procedures (database disconnection, problems with class initialization because of invalid values, hard disk failure, etc.)? Doing the transaction in one go doesn't seem like a good option because the amount of data from database A means it would take at least a day or two for the data manipulation and insertion into database B.
The data from database A can be grouped into around 1000 groups using unique keys, with each key containing thousands of rows. One way I thought of is to write a script that commits per group, which means I have to track which groups have already been inserted into database B. The only way I can think of to track which groups have been processed is either a log file or a table in database B. A second way I thought could work is to dump all the necessary fields needed for loading the classes for manipulation and insertion into a flat file, read the file to initialize the classes, and insert into database B. This also means I have to do some logging, but it should narrow things down to the exact row in the flat file if any error occurs. The script will look something like this:
use strict;
use warnings;
use DBI;

#connect to database A
my $dbh = DBI->connect('dbi:Oracle:my_db', $user, $password, { RaiseError => 1, AutoCommit => 0 });

#statement to get data based on group unique key
my $sth = $dbh->prepare($my_sql);

my @groups; #I have a list of this already

open my $fh, '>>', 'my_logfile' or die "can't open logfile $!";

eval {
    foreach my $g (@groups){
        #subroutine to check if group has already been processed, either from log file or from database table
        next if is_processed($g);

        $sth->execute($g);
        my $data = $sth->fetchall_arrayref;

        #manipulate $data, then use it to load perl classes for insertion into database B
        #.
        #.
        #.

        #log the group as processed once it has been inserted into database B
        print $fh "$g\n";
    }
};
if ($@){
    $dbh->rollback;
    die "something wrong...rollback";
}
So if any errors do occur, I can just run this script again and it should skip the groups or rows that have been processed and continue.
Both of these methods are just variations on the same theme, and both require going back to where I've been tracking my progress (in a table or file), skipping the ones that have been committed to database B, and processing the remaining data.
I'm sure there's a better way of doing this but am struggling to think of other solutions. Is there another way of handling large transactions between databases that require data manipulation between getting data out from one and inserting into another? The process doesn't need to be all in Perl, as long as I can reuse the perl classes for manipulating and inserting the data into the database.
Sorry to say so, but I really don't see how you could possibly solve this problem by taking a shortcut. To me it sounds like you've thought about the most reasonable ways:
Save the state in some temp table/file (I'd look into "perldoc -f tie", or sqlite) at each step
Handle errors properly (TryCatch.pm, eval, or whatever you prefer)
Log your errors properly, i.e. structured logs you can read in
Add some "resume" flag to your script which reads in previous log and data and tries again
This is probably along the lines you've been thinking, but as I said, I don't think there's a general "right" way to handle your problem.

HSQLDB clear table data after restart

I want to save some temporary data in memory, which should be removed after the server shuts down.
There are temporary tables in HSQLDB, but their data is removed immediately after the transaction is committed, which is too short for me. On the other hand, a memory table keeps a script log file and restores the data when the server starts again. It takes time and space to maintain such script logs, which are useless in my situation.
What I need is just a type of table where only the table structure is persisted on disk, while the data and the data operations are performed only in memory. Otherwise, why would I need an in-memory DB instead of MySQL?
Is there such type of table in HSQLDB?
thanks
Create the file: database and then create the tables. Perform SHUTDOWN. Edit the .properties file for the database, add the setting below, and save.
files_readonly=true
When you perform your tests with this database, no data is written to disk.
Alternatively, with the latest versions of HSQLDB 2.2.x, you can specify this property on the connection URL during the tests. For example:
jdbc:hsqldb:file:myfilepath;files_readonly=true

Doctrine schema changes while keeping data?

We're developing a Doctrine backed website using YAML to define our schema. Our schema changes regularly (including fk relations) so we need to do a lot of:
Doctrine::generateModelsFromYaml(APPPATH . 'models/yaml', APPPATH . 'models', array('generateTableClasses' => true));
Doctrine::dropDatabases();
Doctrine::createDatabases();
Doctrine::createTablesFromModels();
We would like to keep existing data and store it back in the re-created database. So I copy the data into a temporary database before the main db is dropped.
How do I get the data from the "old-scheme DB copy" to the "new-scheme DB"? (the new scheme only contains NEW columns, NO COLUMNS ARE REMOVED)
NOTE:
This obviously doesn't work because the column count doesn't match.
SELECT * FROM copy.Table INTO newscheme.Table
This obviously does work, however this is consuming too much time to write for every table:
SELECT old.col, old.col2, old.col3,'somenewdefaultvalue' FROM copy.Table as old INTO newscheme.Table
Have you looked into Migrations? They allow you to alter your database schema in a programmatic way. Without losing data (unless you remove columns, of course).
How about writing a script (using the Doctrine classes for example) which parses the yaml schema files (both the previous version and the "next" version) and generates the sql scripts to run? It would be a one-time job and not require that much work. The benefit of generating manual migration scripts is that you can easily store them in the version control system and replay version steps later on. If that's not something you need, you can just gather up changes in the code and do it directly through the database driver.
Of course, the fancier your schema changes become, the harder the maintenance will get, i.e. column name changes, null to not null, etc.