HSQLDB + SQuirreL: reading data by block - hsqldb

I'm running an instance of HSQLDB from inside a Java class: an instance of org.hsqldb.Server is initialized and set to be only in memory, no other configuration; then, it's used filling it with data accessible from outside the running jvm.
Using SQuirreL set to "Read on, Block size", I connect to HSQLDB server and query for data: it seems like all returning rows from the query are loaded in client memory and then displayed by block size. Instead, using Oracle (by example) I see the client downloading only displayed rows, others are downloaded only when the list is scrolled down. Is it possible to force HSQLDB client to act in the same way?

The query is performed using a java.sql.Statement object. This has a setFetchSize(n) method that indicates the number of rows that are fetched at one time. HSQLDB supports this when it is used in Server mode. It returns the rows in chunks containing the indicated fetch size.
The application program, in this case SQuirrel, should explicitly call setFetchSize(n) on the Statement object.

Related

Using H2 1.4 database can I write new rows if reading other rows

Using H2 1.4 database can I write new rows if reading other rows?
i.e if have 1000 rows in table, and have a SELECT query running that is getting primary key 1-10 would it be possible for an INSERT query to insert some new rows at same time, or would it have to wait for (all) the SELECT query on that table to finish?
What is the situation with an UPDATE of rows in table table but not being retrieved by any SELECT query?
I ask because with H2 1.3 I noticed that my application threads that accessed database seemed to spend a lot of time blocking, it seems better now I have upgraded to 1.4. But in my application that is multithreaded the threads are always dealing with different rows so it is important for me to better understanding how locking works in H2 (with the MV store, was previously using PAGE store with 1.3), and whether H2 can just lock individual rows when UPDATING or if it has to lock whole table.
It depends on storage engine that you choose. All information below applies to the most recent version (1.4.199), old versions have some differences.
With default MVStore engine data modification operations and SELECT … FOR UPDATE lock modified (or selected) rows. Other transactions can't modify locked rows in parallel, but can read their values. Note that read committed isolation level is used by default and other isolation levels are not really supported by this engine. With read committed isolation level other transactions will not see the concurrently modified values, they will see old ones. New values will be visible only when that transaction commits its work. With this engine database runs in multi-threaded mode by default, so a long-running command will not block other sessions.
With legacy PageStore engine (add ;MV_STORE=FALSE to the connection URL if you want to create a database with this engine) the whole tables are locked for writing. It means that you really need to lock the tables in the same order (alphabetical or some other) in all your transactions, otherwise a deadlock is possible. With this engine database runs in single-threaded mode by default, you can enable multi-threaded mode explicitly, but it is not safe with this engine. Different sessions can't do their work concurrently, long-running command will block all other sessions.
Databases are not converted from old (PageStore) format to a new (MVStore) format when you open them with a new version of H2, you have to do it by yourself. Also old databases may have serious problems with new versions, it's recommended to export them to SQL with old version of H2 using the SCRIPT TO 'filename.sql' command and load this script into new database with a new version of H2 using the RUNSCRIPT FROM 'filename.sql' command. You need to do it even if you choose to use the old engine. If you have persistent databases don't forget to create regular backup copies (with BACKUP TO 'filename.zip' command, for example).
You can find more details in the documentation:
https://h2database.com/html/advanced.html#mvcc
https://h2database.com/html/features.html#multiple_connections

Variable values stored outside of SSIS

This is merely a SSIS question for advanced programmers. I have a sql table that holds clientid, clientname, Filename, Ftplocationfolderpath, filelocationfolderpath
This table holds a unique record for each of my clients. As my client list grows I add a new row in my sql table for that client.
My question is this: Can I use the values in my sql table and somehow reference each of them in my SSIS package variables based on client id?
The reason for the sql table is that sometimes we get request to change the delivery or file name of a file we send externally. We would like to be able to change those things dynamically on the fly within the sql table instead of having to export the package each time and manually change then re-import the package. Each client has it's own SSIS package
let me know if this is feasible..I'd appreciate any insight
Yes, it is possible. There are two ways to approach this and it depends on how the job runs. First is if you are running for a single client for a single job run or if you are running for multiple clients for a single job run.
Either way, you will use the Execute SQL Task to retrieve data from the database and assign it to your variables.
You are running for a single client. This is fairly straightforward. In the Result Set, select the option for Single Row and map the single row's result to the package variables and go about your processing.
You are running for multiple clients. In the Result Set, select Full Result Set and assign the result to a single package variable that is of type Object - give it a meaningful name like ObjectRs. You will then add a ForEachLoop Enumerator:
Type: Foreach ADO Enumerator
ADO object source variable: Select the ObjectRs.
Enumerator Mode: Rows in all the tables (ADO.NET dataset only)
In Variable mappings, map all of the columns in their sequential order to the package variables. This effectively transforms the package into a series of single transactions that are looped.
Yes.
I assume that you run your package once per client or use some loop.
At the beginning of the "per client" code read all required values from the database into SSIS varaibles and the use these variables to define what you need. You should not hardcode client specific information in the package.

Spark Streaming - Cached table registered in memory created from each RDD of a windowed stream

I am using spark 1.5.2, my application has StreamingContext created on files and runs a windowed operation with 30 sec window and 10 sec slide interval.
I use foreachRDD for converting each RDD obtained from windowstream to a dataframe and further registered it as table using registerTempTable with name "T_Subs_Rev".
On running the application, spark GUI shows, "T_Subs_Rev" getting created again and again on each sliding interval(when using CACHE TABLE). And entry keeps on adding for each new table created with its respective size in memory.
Q1) Want to know as table name is same "T_Subs_Rev", what is actually happening.. will new entry will replace the older table in memory? Or still all tables with same name keeps on created in the memory.
Q2) Can we control how long a table can be cached in the memory, like with some time parameter and without having to explicitly call UNCACHE.

How to delete whole set from Aerospike namespace?

Is there any way to delete a set from namespace (Aerospike) from aql or CLI ???
My set also contains Ldts .
Please suggest me a way to delete whole Set from LDT
You can delete a set by using
asinfo -v "set-config:context=namespace;id=namespace_name;set=set_name;set-delete=true;"
This link explains more about how the set is deleted
http://www.aerospike.com/docs/operations/manage/sets/#deleting-a-set-in-a-namespace
There is a new and better way to do this as of Aerospike Server version 3.12.0, released in March 2017:
asinfo -v "truncate:namespace=namespace_name;set=set_name"
This previous command has been DEPRECATED, and works only up to Aerospike 3.12.1, released in April 2017:
asinfo -v "set-config:context=namespace;id=namespace_name;set=set_name;set-delete=true;"
The new command is better in several ways:
Can be issued during migrations
Can be issued while data is being written to the set
It is sufficient to run it on just one node of the cluster
I used it under those conditions (during migration, while data was being written, on 1 node) and it ran very quickly. A set with 30 million records was reduced to 1000 records in about 6 seconds. (Those 1000 records were presumably the ones written during those 6 seconds)
Details here
As of Aerospike 3.12, which was released in March 2017, the feature of deleting all data in a set or namespace is now supported in the database.
Please see the following documentation:
http://www.aerospike.com/docs/reference/info#truncate
Which provides the command line command that looks like:
asinfo truncate:namespace=;set=;lut=
It can also be truncated from the client APIs, here is the java documentation:
http://www.aerospike.com/apidocs/java/com/aerospike/client/AerospikeClient.html
and scroll down to the "truncate" method.
Please note that this feature can, optionally, take a time specification. This allows you to delete records, then start inserting new records. As timestamps are used, please make certain your server clocks are well synchronized.
You can also delete a set with the Java client as follows:
(1) Use the client "execute" method, which applies a UDF on all queried rows
AerospikeClient.execute(WritePolicy policy, Statement statement, String packageName, String functionName, Value... functionArgs) throws AerospikeException
(2) Define the statement to include all rows of the given set:
Statement statement = new Statement();
statement.setNamespace("my_namespace");
statement.setSetName("my_set");
(3) Specify a UDF that deletes the given record:
function delete_rec(rec)
aerospike:remove(rec)
end
(4) Call the method:
ExecuteTask task = AerospikeClient.execute(null, statement, "myUdf", "delete_rec")
task.waitTillComplete(timeout);
Is it performant? Unclear, but my guess is asinfo is better. But it's very convenient for testing/debugging/setup.
You can't delete a set but you can delete all records that exist in the set by scanning all the records and deleting then one by one. Sample C# code that will do the trick:
AerospikeClient.ScanAll(null, AerospikeNameSpace, category, DeleteAllRecordsCallBack);
Here DeleteAllRecordsCallBack is a callback function where in you can delete records one by one. This callback function gets called for all records.
private void DeleteAllRecordsCallBack(Key key, Record record)
{
AerospikeClient.Delete(null, key);
}
Using AQL:
TRUNCATE namespace_name.set_name
https://www.aerospike.com/docs/tools/aql/aql-help.html
Take care that you'll need to restart your nodes (one by one to avoid downtime) because you'll not retrieve bins left space instead.
I mean maximum limit of bins in Aerospike is 32,767. If you just delete and recreate several times your set, if it's create for example 10000 bins each time, you'll not be able to create more than 2,767 bins the 4th time because bins counter is kept in ram.
If you restart you're cluster, it will be released.
You can't dynamically delete a set from namespace like "drop table" in RDMS.
The following command using asinfo only lazily delete data inside a set:
asinfo -v "set-config:context=namespace;id=namespace_name;set=set_name;set-delete=true;"
There is a post about it in aerospike discuss site and I didn't see progress about this issue yet.
In our production experience special java deletion utility works quite well even without recently introduced durable deletes. You build it from sources, put somewhere near the cluster and run this way:
java -jar delete-set-1.0.0-jar-with-dependencies.jar -h <aerospike_host> -p 3000 -s <set_to_delete> -n <namespace_name>
In our prod environment cold restarts are quite rare events, basically when aerospike crashes. And the data flow is quite high so defragmentation kicks in earlier and we don't even have zombie record issue.
BTW asinfo way mentioned earlier didn't work for us. The records stayed there for couple of days so we use delete-set utility which worked right away.

How do I completely clear a SQLite3 database without deleting the database file?

For unit testing purposes I need to completely reset/clear SQLite3 databases. All databases are created in memory rather than on the file system when running the test suite so I can't delete any files. Additionally, several instances of a class will be referencing the database simultaneously, so I can't just create a new database in memory and assign it to a variable.
Currently my workaround for clearing a database is to read all the table names from sqlite_master and drop them. This is not the same as completely clearing the database though, since meta data and other things I don't understand will probably remain.
Is there a clean and simple way, like a single query, to clear a SQLite3 database? If not, what would have to be done to an existing database to make it identical to a completely new database?
In case it's relevant, I'm using Ruby 2.0.0 with sqlite3-ruby version 1.3.7 and SQLite3 version 3.8.2.
This works without deleting the file and without closing the db connection:
PRAGMA writable_schema = 1;
DELETE FROM sqlite_master;
PRAGMA writable_schema = 0;
VACUUM;
PRAGMA integrity_check;
Another option, if possible to call the C API directly, is by using the SQLITE_DBCONFIG_RESET_DATABASE:
sqlite3_db_config(db, SQLITE_DBCONFIG_RESET_DATABASE, 1, 0);
sqlite3_exec(db, "VACUUM", 0, 0, 0);
sqlite3_db_config(db, SQLITE_DBCONFIG_RESET_DATABASE, 0, 0);
Here is the reference
The simple and quick way
If you use in-memory database, the fastest and most reliable way is to close and re-establish sqlite connection. It flushes any database data and also per-connection settings.
If you want to have some kind of "reset" function, you must assume that no other threads can interrupt that function - otherwise any method will fail. Therefore even you have multiple threads working on that database, there need to be a "stop the world" mutex (or something like that), so the reset can be performed. While you have exclusive access to the database connection - why not closing and re-opening it?
The hard way
If there are some other limitations and you cannot do it the way above, then you were already pretty close to have a complete solution. If your threads don't touch pragmas explicitly, then only "schema_version" pragma can be changed silently, but if your threads can change pragmas, well, then you have to go through the list on http://sqlite.org/pragma.html#toc and write "reset" function which will set each and every pragma value to it's initial value (you need to read default values at the begining).
Note, that pragmas in SQLite can be divided to 3 groups:
defined initially, immutable, or very limited mutability
defined dynamically, per connection, mutable
defined dynamically, per database, mutable
Group 1 are for example page_size, page_count, encoding, etc. Those are definied at database creation moment and usualle cannot be modified later, with some exceptions. For example page_size can be changed prior to "VACUUM", so the new page size will be set then. The page_count cannot be changed by user, but it changes automatically when adding data (obviously). The encoding is defined at creation time and cannot be modified later.
You should not need to reset pragmas from group 1.
Group 2 are for example cache_size, recursive_triggers, jurnal_mode, foreign_keys, busy_timeout, etc. These pragmas are always set to defaults when opening new connection to the database. If you don't disconnect, you will need to reset those to defaults manually.
Group 3 are for example schema_version, user_version, maybe some others, you need to look it up. Those will also need manual reset. If you disconnect from in-memory database, the database gets destroyed, so then you don't need to reset those.
Create an empty memory database.
Use the backup API to copy that database over the actual database.
In the case of sqlite3-ruby, see test/test_backup.rb for an example.
SELECT * FROM dbname.sqlite_master WHERE type='table';
and
DROP TABLE