Grails/Hibernate Batch Insert

I am using STS + Grails 1.3.7 and doing a batch insert of thousands of instances of a domain class.
It is very slow because Hibernate issues a separate JDBC call for every single SQL statement instead of batching them.
How can I combine them into one large batch?

What you can do is flush the Hibernate session every 20 inserts, like this:
int cpt = 0
mycollection.each { obj ->
    cpt++
    if (cpt % 20 == 0) {
        obj.save(flush: true)
    }
    else {
        obj.save()
    }
}
Flushing the Hibernate session executes the pending SQL statements every 20 inserts.
This is the easiest method, but you can find a more interesting way to do it on Tomas Lin's blog, which explains exactly what you want to do: http://fbflex.wordpress.com/2010/06/11/writing-batch-import-scripts-with-grails-gsql-and-gpars/
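For reference, the same batching pattern at the plain Hibernate level; a minimal Java sketch, assuming an org.hibernate.Session obtained from a sessionFactory and a Player domain class. Pairing the periodic flush with a clear keeps the first-level cache from growing unbounded:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
int i = 0;
for (Player p : players) {
    session.save(p);
    if (++i % 20 == 0) {
        session.flush(); // push the batched INSERTs to the database
        session.clear(); // evict saved instances so the session doesn't grow
    }
}
tx.commit();
session.close();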

Using the withTransaction() method on the domain classes makes the inserts much faster for batch scripts. You can build up all of the domain objects in one collection, then insert them in one block.
For example:
Player.withTransaction {
    for (p in players) {
        p.save()
    }
}

You can see this line in the Hibernate docs:
Hibernate disables insert batching at the JDBC level transparently if you use an identity identifier generator.
When I changed the type of generator, it worked.
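For example, switching from an IDENTITY to a sequence-based generator keeps JDBC batching enabled. A minimal sketch in JPA/Hibernate annotation terms (entity and sequence names here are illustrative; in Grails 1.3.x the equivalent is an id generator: 'sequence' entry in the GORM mapping block), to be combined with a hibernate.jdbc.batch_size setting:
import javax.persistence.*;

@Entity
public class Player {

    // IDENTITY would transparently disable Hibernate's insert batching;
    // a sequence lets Hibernate pre-fetch ids and batch the INSERTs.
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "player_seq")
    @SequenceGenerator(name = "player_seq", sequenceName = "player_seq", allocationSize = 50)
    private Long id;

    private String name;
}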

java/jdbc timeout in clojure

I am trying to add a timeout to jdbc/query and jdbc/execute!. Somewhere on the web I found that both functions take :timeout as an option. The documentation also says the options are passed on to prepare-statement, which takes :timeout as an option.
My function calls look like,
(jdbc/query db-read-spec query {:timeout 2})
(jdbc/execute! db-write-spec query {:timeout 2})
Is this how it is done? If yes, how do I test it?
If there is a different way of doing this that is testable, that works too.
The :timeout option causes .setQueryTimeout to be called on the PreparedStatement used under the hood of clojure.java.jdbc. It is in seconds, not milliseconds, so {:timeout 2} means two seconds; if you had been thinking in milliseconds and passed 2,000, your query would have to be extremely slow for a timeout of 2,000 seconds (just over half an hour) to take effect.
JDBC supports several different timeouts across several of its classes. For example, javax.sql.DataSource supports .setLoginTimeout (also in seconds), as does java.sql.DriverManager.
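To make the units concrete, here is a minimal sketch of those two timeouts at the raw JDBC level (the URL, credentials, and the pg_sleep query are placeholders):
import java.sql.*;

public class TimeoutDemo {
    public static void main(String[] args) throws SQLException {
        // Login timeout: applies while establishing the connection (seconds).
        DriverManager.setLoginTimeout(5);
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "pass");
             PreparedStatement ps = conn.prepareStatement("select pg_sleep(10)")) {
            // Query timeout: applies per statement execution (seconds).
            // This is the call that clojure.java.jdbc's :timeout maps to.
            ps.setQueryTimeout(2);
            ps.executeQuery(); // aborts with an exception once the timeout fires
        }
    }
}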
There are also database-specific options you can add to the connection string (which you can add as additional key/value pairs in your "db-spec") to control lower-level timeouts. For example, MySQL supports connectionTimeout and socketTimeout in the connection string -- and both of those are in milliseconds. clojure.java.jdbc allows for those to be provided in your "db-spec" hash map as :connectTimeout and :socketTimeout keys respectively.
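For illustration, the same two MySQL driver options expressed directly in a JDBC URL (host, database, credentials, and values are placeholders; note both are in milliseconds):
import java.sql.*;

public class MySqlTimeoutsDemo {
    public static void main(String[] args) throws SQLException {
        // Equivalent to the :connectTimeout / :socketTimeout db-spec keys.
        String url = "jdbc:mysql://localhost:3306/test"
                + "?connectTimeout=5000"   // give up connecting after 5 s
                + "&socketTimeout=10000";  // give up on a blocked read after 10 s
        Connection conn = DriverManager.getConnection(url, "user", "pass");
        conn.close();
    }
}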
Note that clojure.java.jdbc is considered "Stable" at this point, and all current and future development effort is focused on next.jdbc. next.jdbc makes it easier to use the loginTimeout since it operates on JDBC objects directly, so the whole (Java) API is available as well. It also has built-in support for connection pooling and is, overall, simpler and faster than clojure.java.jdbc.
You can leverage the MAX_EXECUTION_TIME query hint on MySQL SELECT queries (time in ms):
SELECT /*+ MAX_EXECUTION_TIME(1000) */ * FROM t1 INNER JOIN t2 WHERE....
then you can just wrap your queries:
(defn timed-query [db query t]
  (j/query db [(str (subs query 0 6)
                    (format " /*+ MAX_EXECUTION_TIME(%s) */ " t)
                    (subs query 7))]))
and test:
(deftest test-query-timeout
  (is (thrown? Exception
               (timed-query db "select * from Employees where id>5" 1))))
You will need a reasonably complex query for this to trigger with a 1 ms limit.
I figured out a workaround to test this. Since I use Postgres, I could leverage select pg_sleep(time-in-seconds).
My test looks like:
(is (thrown-with-msg? PSQLException #"ERROR: canceling statement due to user request"
      (fetch-or-save "select pg_sleep(3)")))

How to control flow of a transactional stream with Spring Data R2DBC?

Support for transactional streams seems to have been implemented only recently, and due to its newness there are not many code examples.
Could someone show an example of a transactional stream that does a series of database inserts and then returns some value on success, but with a midstream checkpoint in between inserts that tests some condition and might roll back the transaction and return different values depending on the checkpoint result?
Reactive transactions follow the same pattern as imperative ones:
1. A transaction is started before running any user-space commands.
2. The user-space commands run.
3. The transaction is committed (or rolled back).
A few aspects to note here: a connection is always associated with a materialization of a reactive sequence. What a Thread-bound connection bound to an execution is in imperative programming, a connection bound to a materialization is in reactive programming.
So each (concurrent) execution gets its own connection.
Spring Data R2DBC has no support for savepoints. Take a look at the following code example, which illustrates a decision to either commit or roll back:
DatabaseClient databaseClient = DatabaseClient.create(connectionFactory);
TransactionalOperator transactionalOperator = TransactionalOperator
        .create(new R2dbcTransactionManager(connectionFactory));

transactionalOperator.execute(tx -> {

    Mono<Void> insert = databaseClient.execute("INSERT INTO legoset VALUES(…)")
            .then();

    Mono<Long> select = databaseClient.execute("SELECT COUNT(*) FROM legoset")
            .as(Long.class)
            .fetch()
            .first();

    return insert.then(select.handle((count, sink) -> {
        if (count > 10) {
            tx.setRollbackOnly();
        }
    }));
}).as(StepVerifier::create).verifyComplete();
Notable aspects here are:
We're using TransactionalOperator instead of @Transactional.
The code in .handle() calls setRollbackOnly() to roll back the transaction.
Using @Transactional, you would typically use exceptions to signal a rollback condition.

Data is not properly stored to hsqldb when using pooled data source by dbcp

I'm using hsqldb to create cached tables and indexed tables.
The data being stored arrives at a pretty high frequency, so I need to use a connection pool.
Also, because there is a lot of data, I do not call checkpoint on every commit; rather, I expect the data to be flushed after every 50,000 rows inserted.
The thing is, I can see the .data file growing, but when I connect with an hsqldb client I don't see the tables or the data.
So I ran two simple tests: one inserted a single row, and one inserted 60,000 rows into a new table. In both cases I couldn't see the result in any hsqldb client.
(Note that I use shutdown=true)
When I add a checkpoint after each commit, it solves the problem.
Specifying in the connection string to use the log also solves the problem (I don't want the log in production, though). Not using a pooled connection solved it as well, and so did using the pooled data source but explicitly closing it before shutdown.
So I guess that some connections in the connection pool are not being closed, preventing the db from committing the changes and making them available to the client. But then, why couldn't I see the result even with 60,000 rows?
I would also expect the pool to be closed automatically...
What am I doing wrong? What is happening behind the scenes?
The code to get the data source looks like this:
Class.forName("org.hsqldb.jdbcDriver");
String url = "jdbc:hsqldb:" + m_dbRoot + dbName + "/db" + ";hsqldb.log_data=false;shutdown=true;hsqldb.nio_data_file=false";
ConnectionFactory connectionFactory = new DriverManagerConnectionFactory(url, user, password);
GenericObjectPool connectionPool = new GenericObjectPool();
KeyedObjectPoolFactory stmtPool = new GenericKeyedObjectPoolFactory(null);
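// Note: this constructor call wires the factory into connectionPool as a side
// effect, which is why its return value is discarded (DBCP 1.x idiom).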
new PoolableConnectionFactory(connectionFactory, connectionPool, stmtPool, null, false, true);
DataSource ds = new PoolingDataSource(connectionPool);
And I'm using this Pooled data source to create table:
Connection c = m_dataSource.getConnection();
Statement st = c.createStatement();
String script = String.format("CREATE CACHED TABLE IF NOT EXISTS %s (id %s NOT NULL, entity %s NOT NULL, PRIMARY KEY (id));", m_tableName, m_idGenerator.getIdType(), TABLE_ENTITY_TYPE);
st.execute(script);
st.close();
c.close();
And insert rows:
Connection c = m_dataSource.getConnection();
c.setAutoCommit(false);
PreparedStatement stmt = c.prepareStatement(m_sqlInsert);
stmt.setObject(1, id);
stmt.setBinaryStream(2, Serializer.Helper.serialize(m_serializer, entity));
stmt.executeUpdate();
stmt.close();
c.commit();
c.close();
So the above seems to add data, but it cannot be seen.
When I explicitly called
connectionPool.close();
then, and only then, could I see the result.
I also tried to use JDBCDataSource and it worked as well.
So what is going on? And what is the right way to do this?
Your method of accessing the database from outside your application process is simply wrong.
Only one Java process at a time is supposed to connect to a file: database.
In order to achieve your aim, launch an HSQLDB server within your application, using exactly the same JDBC URL. Then connect to this server from the external client.
See the Guide:
http://www.hsqldb.org/doc/2.0/guide/listeners-chapt.html#lsc_app_start
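A minimal sketch of that approach, reusing the m_dbRoot and dbName variables from the question (the server alias "db" is illustrative):
// Start an HSQLDB listener inside the application process so external
// clients can connect while the application holds the file: database.
org.hsqldb.Server server = new org.hsqldb.Server();
server.setDatabaseName(0, "db");
server.setDatabasePath(0, "file:" + m_dbRoot + dbName + "/db;hsqldb.log_data=false");
server.start();
// External clients then connect with jdbc:hsqldb:hsql://localhost/db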
Update: The OP commented that the external client was used after the application had stopped. Because you have turned the log off with hsqldb.log_data=false, nothing is persisted permanently. You need to perform an explicit CHECKPOINT or SHUTDOWN when your application completes its work. You cannot rely on shutdown=true at all, even without connection pooling.
See the Guide:
http://www.hsqldb.org/doc/2.0/guide/deployment-chapt.html#dec_bulk_operations
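For example, a sketch of an explicit CHECKPOINT issued through the question's m_dataSource before the application exits (SHUTDOWN would additionally close the database cleanly):
// With hsqldb.log_data=false nothing is durable until a CHECKPOINT (or
// SHUTDOWN) writes the .data file, so issue one before the process exits.
try (Connection c = m_dataSource.getConnection();
     Statement st = c.createStatement()) {
    st.execute("CHECKPOINT"); // or "SHUTDOWN" to close the database cleanly
}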

Testing XQuery and Marklogic transactions

We have some business requirements that call for versioning. We chose to use MarkLogic library services for that. We have an issue testing our code with XRAY and using transactions.
Our test is as follows:
declare function should-save-with-version-when-releasing() {
  declare option xdmp:transaction-mode "update";
  let $uri := '/some-document-uri.xml'
  let $document := fn:doc($uri)
  let $pre-release-version := c:get-latest-version($uri)
  let $post-release-version := c:get-latest-version($uri)
  let $result := mut:release($document) (: this should version up :)
  return (
    assert:not-empty($pre-release-version),
    assert:not-empty($result),
    assert:not-equal($pre-release-version, $post-release-version),
    xdmp:rollback()
  )
};
The test will pass no matter what, and as it turns out ML rollback demolishes all the variables.
How do we test it using transactions?
Any help greatly appreciated,
im
With MarkLogic an entire XQuery update normally acts like a single transaction. When mut:release adds an update to the transaction's stack, the rest of the query will not see that update until after it commits. From the point of view of the query, this normally happens after the entire query finishes, and is not visible to the query.
The docs have something useful to add about what http://docs.marklogic.com/xdmp:rollback does:
When a transaction is rolled back, the current statement immediately
terminates, updates made by any statement in the transaction are
discarded, and the transaction terminates.
So it isn't that the variables are demolished: it's that your program is over.
I think http://docs.marklogic.com/guide/app-dev/transactions#id_15746 has an example that is fairly close to your use-case: "Example: Multi-statement Transactions and Same-statement Isolation". It demonstrates how to xdmp:eval or xdmp:invoke to update a document and see the results within the same query.
Test it to see that it works, then replace the xdmp:commit with an xdmp:rollback. For me the example still works. Start replacing the rest of the logic with your unit test logic, and you should be on your way.

ScalaQuery's query/queryNA several times slower than JDBC?

In the following performance tests of many queries, this timed JDBC code takes 500-600ms:
val ids = queryNA[String]("select id from account limit 1000").list
val stmt = session.conn.prepareStatement("select * from account where id = ?")
debug.time() {
  for (id <- ids) {
    stmt.setString(1, id)
    stmt.executeQuery().next()
  }
}
However, when using ScalaQuery, the time goes to >2s:
val ids = queryNA[String]("select id from account limit 1000").list
implicit val gr = GetResult(r => ())
val q = query[String, Unit]("select * from account where id = ?")
debug.time() {
  for (id <- ids) {
    q.first(id)
  }
}
After debugging with server logs, this turns out to be due to the fact that the PreparedStatements are being repeatedly prepared and not reused.
This is in fact a performance issue that we've been hitting in our application code, so we're wondering if we're missing something regarding how to reuse prepared statements properly in ScalaQuery, or if dropping down to JDBC is the suggested workaround.
Got an answer from the ScalaQuery mailing list. This is just how ScalaQuery is designed: it assumes that you're using something that provides statement pooling underneath:
Nowadays ScalaQuery always requests a new PreparedStatement from the Connection. There used to be a cache for PreparedStatements in early versions but I removed it because there are already good solutions for this problem. Every decent connection pool should have an option for PreparedStatement pooling. If you're using a Java EE server, it should have an integrated connection pool. For standalone applications, you can use something like http://sourceforge.net/projects/c3p0/
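As an illustration, a minimal c3p0 setup with statement pooling enabled (driver class, URL, and credentials are placeholders), so the statements ScalaQuery keeps re-requesting are served from the pool's cache instead of being re-prepared:
import com.mchange.v2.c3p0.ComboPooledDataSource;

public class PooledSource {
    public static javax.sql.DataSource create() throws Exception {
        ComboPooledDataSource cpds = new ComboPooledDataSource();
        cpds.setDriverClass("org.postgresql.Driver");        // placeholder driver
        cpds.setJdbcUrl("jdbc:postgresql://localhost/test"); // placeholder URL
        cpds.setUser("user");
        cpds.setPassword("pass");
        // PreparedStatement pooling: cached statements are reused instead of
        // being re-prepared on every request.
        cpds.setMaxStatements(200);             // global statement cache size
        cpds.setMaxStatementsPerConnection(20); // per-connection cache size
        return cpds;
    }
}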