Batch insert support - r2dbc

I am trying to perform a batch insert using R2DBC.
I have seen that it is not yet possible with DatabaseClient from Spring Boot.
So I tried to do it using the R2DBC SPI Statement and its add method, like this:
Mono.from(this.factory.create())
    .map(connection -> connection.createStatement(insertSQL))
    .map(statement -> {
        lines.forEach(line -> {
            statement
                .add()
                .bind(0, line.getId());
        });
        return statement;
    })
    .flatMap(statement -> Mono.from(statement.execute()));
I can see in the logs that two insert requests are executed.
2019-12-18 16:27:36.092 DEBUG [] 4133 --- [tor-tcp-epoll-1] io.r2dbc.postgresql.QUERY : Executing query: insert into table(id) values ($1)
2019-12-18 16:27:36.116 DEBUG [] 4133 --- [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient : Request: Bind{}
2019-12-18 16:27:36.126 DEBUG [] 4133 --- [tor-tcp-epoll-1] io.r2dbc.postgresql.QUERY : Executing query: insert into table(id) values ($1)
2019-12-18 16:27:36.130 DEBUG [] 4133 --- [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient : Request: Bind{}
Does add perform a batch update, or does it just run two separate requests?
Thanks.

A little bit late, but better than never, I guess.
I would say that your code issues two separate requests: add() saves the current binding and starts a new one, and your log shows the Postgres driver sending one Bind per binding.
Using the R2DBC Batch API (the surrounding support, including DatabaseClient, has matured since the question was asked):
Mono.from(connectionFactory.create())
    .map(connection -> connection.createBatch())
    .map(batch -> {
        batch.add("delete from table1");
        batch.add("delete from table2");
        batch.add("delete from table3");
        return batch.execute();
    })
// Etc
When running the above, the following is logged:
Query:["delete from table1","delete from table2","delete from table3"] Bindings:[]
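For the original parameterized insert, here is a minimal sketch of the bind()/add() pattern on a single Statement, reusing the factory, insertSQL and lines from the question (lines is assumed to be a List). Whether the driver collapses the bindings into one round trip or executes them one by one is driver-specific, as the question's log suggests.
import io.r2dbc.spi.Statement;
import reactor.core.publisher.Mono;

// Sketch only: bind() sets the values for the current row, add() saves that
// binding and starts the next one, so the last row needs no trailing add().
Mono.from(factory.create())
    .flatMapMany(connection -> {
        Statement statement = connection.createStatement(insertSQL);
        for (int i = 0; i < lines.size(); i++) {
            statement.bind(0, lines.get(i).getId());
            if (i < lines.size() - 1) {
                statement.add();
            }
        }
        return statement.execute(); // one execute() covering all bindings
    })
    .subscribe(); // connection cleanup omitted for brevity
Compared with a Batch, this keeps a single parameterized statement instead of concatenating values into the SQL text.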

Related

Handling RuntimeException errors in a BigQuery pipeline

When we use a BigQueryIO transform to insert rows, we have an option called:
.withCreateDisposition(CreateDisposition.CREATE_NEVER)
which instructs the pipeline to NOT attempt to create the table if the table doesn't already exist. In my scenario, I want to trap all errors. I attempted to use the following:
var write = mypipline.apply("Write table", BigQueryIO
    .<Employee>write()
    .to(targetTableName_notpresent)
    .withExtendedErrorInfo()
    .withFormatFunction(new EmployeeToTableRow())
    .withSchema(schema)
    .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors())
    .withTableDescription("My Test Table")
    .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
    .withCreateDisposition(CreateDisposition.CREATE_NEVER)
    .withWriteDisposition(WriteDisposition.WRITE_APPEND));
which tried to insert rows into a non-existent table. What I found was a RuntimeException. Where I am stuck is that I don't know how to handle RuntimeException problems. I don't believe there is anything here I can surround with a try/catch.
This question is similar to this one:
Is it possible to catch a missing dataset java.lang.RuntimeException in a Google Cloud Dataflow pipeline that writes from Pub/Sub to BigQuery?
but I don't think it got a working answer, and it was focused on a missing dataset rather than a table.
My exception from the fragment above is:
org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.RuntimeException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 404 Not Found
POST https://bigquery.googleapis.com/bigquery/v2/projects/XXXX/datasets/jupyter/tables/not_here/insertAll?prettyPrint=false
{
  "code" : 404,
  "errors" : [ {
    "domain" : "global",
    "message" : "Not found: Table XXXX:jupyter.not_here",
    "reason" : "notFound"
  } ],
  "message" : "Not found: Table XXXX:jupyter.not_here",
  "status" : "NOT_FOUND"
}
at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:373)
at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:341)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:218)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:67)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:323)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:309)
at .(#126:1)
You can't add a try/catch directly around the BigQueryIO in the Beam job if the destination table doesn't exist.
I think it's better to delegate this responsibility outside of Beam, or to launch the job only if your table exists.
Usually a tool like Terraform is responsible for creating the infrastructure before resources are deployed and Beam jobs are run.
If it's mandatory for you to check the existence of the table, you can create:
A shell script with the bq and gcloud CLIs to check the existence before launching the job
A Python script to check the existence before launching the job
Python script:
For Python, there is the BigQuery Python client:
from google.cloud import bigquery
from google.cloud.exceptions import NotFound

client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to determine existence.
# table_id = "your-project.your_dataset.your_table"

try:
    client.get_table(table_id)  # Make an API request.
    print("Table {} already exists.".format(table_id))
except NotFound:
    print("Table {} is not found.".format(table_id))
BQ shell script:
bq show <project_id>:<dataset_id>.<table_id>
If the table doesn't exist, catch the error and do not start the Dataflow job.
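Since the pipeline itself is written in Java, the same pre-flight check can also be done with the BigQuery Java client before calling pipeline.run(). This is a sketch, assuming the google-cloud-bigquery library and default application credentials; the project, dataset and table names are placeholders.
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Table;
import com.google.cloud.bigquery.TableId;

public class TableExistsCheck {
    public static void main(String[] args) {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        // Placeholder identifiers; replace with your own project, dataset and table.
        TableId tableId = TableId.of("your-project", "your_dataset", "your_table");

        // getTable() returns null when the table does not exist.
        Table table = bigquery.getTable(tableId);
        if (table != null) {
            System.out.println("Table exists; safe to launch the Dataflow job.");
        } else {
            System.out.println("Table not found; do not start the Dataflow job.");
        }
    }
}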

GraphDB Running Queries not showing up in the Monitor Tab in Workbench and not executing

I am using GraphDB 10.0 and trying to run an INSERT statement in Workbench. The query starts running. Then I switch to the Monitor tab, which has no records, so I never see the query there. When I return to the SPARQL tab that contains my INSERT statement, I no longer see the progress circle; instead I get the following info message:
'No results from previous run. Click Run or press Ctrl/Cmd-Enter to
execute the current query or update.'
The query eventually never gets executed.
In the logs, it shows up in the query-log but not in the slow-query-log, where I do see the other, successful queries.
Can someone suggest why I see this behaviour?
Here is the query:
PREFIX dsr: <https://sigir.com/model/>
insert {
  GRAPH <https://sigir.com/datafactory/measures> {
    ?ntwgiri a dsr:NetWeight ;
             dsr:value ?ntwg_float .
  }
} where {
  service <repository:measures> {
    SELECT distinct ?ntwgiri ?ntwg_float
    where {
      ?ntwgiri a dsr:NetWeight ;
               dsr:value ?ntwg_float .
    }
  }
}

PHPUnit giving error at second test function

I am trying to write tests for my database with PHPUnit, and I run the migrations against an in-memory database.
The first test runs just fine:
/** @test */
public function it_fetches_a_single_ano_letivo()
{
    $this->makeAnoLetivo();
    $this->getJson('/v1/anos-letivos');
    $this->assertResponseOk();
}
but the second test fails, and it is exactly the same as the first one:
/** @test */
public function it_fetches_anos_letivos()
{
    $this->makeAnoLetivo();
    $this->getJson('/v1/anos-letivos');
    $this->assertResponseOk();
}
Here is the makeAnoLetivo function:
private function makeAnoLetivo($anoLetivoFields = [])
{
    while ($this->times--) {
        $ano1 = $this->fake->year;
        $anoLetivo = array_merge([
            'ano1' => $ano1 + 0,
            'ano2' => $ano1 + 1
        ], $anoLetivoFields);
        AnoLetivo::create($anoLetivo);
    }
}
and here is the PHPUnit output:
Configuration read from {{PATH_TO_PROJECT}}/phpunit.xml
..E
Time: 2.62 seconds, Memory: 23.25Mb
There was 1 error:
1) AnosLetivosTest::it_fetches_anos_letivos
Illuminate\Database\QueryException: SQLSTATE[23000]: Integrity constraint violation: 19 anos_letivos.id may not be NULL (SQL: insert into "anos_letivos" ("ano1", "ano2", "updated_at", "created_at") values (2009, 2010, 2015-03-27 18:41:59, 2015-03-27 18:41:59))
{{PATH_TO_PROJECT}}/vendor/laravel/framework/src/Illuminate/Database/Connection.php:620
{{PATH_TO_PROJECT}}/vendor/laravel/framework/src/Illuminate/Database/Connection.php:576
{{PATH_TO_PROJECT}}/vendor/laravel/framework/src/Illuminate/Database/Connection.php:359
{{PATH_TO_PROJECT}}/vendor/laravel/framework/src/Illuminate/Database/Connection.php:316
{{PATH_TO_PROJECT}}/vendor/laravel/framework/src/Illuminate/Database/Query/Builder.php:1702
{{PATH_TO_PROJECT}}/vendor/laravel/framework/src/Illuminate/Database/Eloquent/Builder.php:933
{{PATH_TO_PROJECT}}/vendor/laravel/framework/src/Illuminate/Database/Eloquent/Model.php:1603
{{PATH_TO_PROJECT}}/vendor/laravel/framework/src/Illuminate/Database/Eloquent/Model.php:1603
{{PATH_TO_PROJECT}}/vendor/laravel/framework/src/Illuminate/Database/Eloquent/Model.php:1501
{{PATH_TO_PROJECT}}/vendor/laravel/framework/src/Illuminate/Database/Eloquent/Model.php:544
{{PATH_TO_PROJECT}}/tests/AnosLetivosTest.php:50
{{PATH_TO_PROJECT}}/tests/AnosLetivosTest.php:32
phar:///usr/local/bin/phpunit/phpunit/TextUI/Command.php:152
phar:///usr/local/bin/phpunit/phpunit/TextUI/Command.php:104
Caused by
PDOException: SQLSTATE[23000]: Integrity constraint violation: 19 anos_letivos.id may not be NULL
{{PATH_TO_PROJECT}}/vendor/laravel/framework/src/Illuminate/Database/Connection.php:358
{{PATH_TO_PROJECT}}/vendor/laravel/framework/src/Illuminate/Database/Connection.php:612
{{PATH_TO_PROJECT}}/vendor/laravel/framework/src/Illuminate/Database/Connection.php:576
{{PATH_TO_PROJECT}}/vendor/laravel/framework/src/Illuminate/Database/Connection.php:359
{{PATH_TO_PROJECT}}/vendor/laravel/framework/src/Illuminate/Database/Connection.php:316
{{PATH_TO_PROJECT}}/vendor/laravel/framework/src/Illuminate/Database/Query/Builder.php:1702
{{PATH_TO_PROJECT}}/vendor/laravel/framework/src/Illuminate/Database/Eloquent/Builder.php:933
{{PATH_TO_PROJECT}}/vendor/laravel/framework/src/Illuminate/Database/Eloquent/Model.php:1603
{{PATH_TO_PROJECT}}/vendor/laravel/framework/src/Illuminate/Database/Eloquent/Model.php:1603
{{PATH_TO_PROJECT}}/vendor/laravel/framework/src/Illuminate/Database/Eloquent/Model.php:1501
{{PATH_TO_PROJECT}}/vendor/laravel/framework/src/Illuminate/Database/Eloquent/Model.php:544
{{PATH_TO_PROJECT}}/tests/AnosLetivosTest.php:50
{{PATH_TO_PROJECT}}/tests/AnosLetivosTest.php:32
phar:///usr/local/bin/phpunit/phpunit/TextUI/Command.php:152
phar:///usr/local/bin/phpunit/phpunit/TextUI/Command.php:104
FAILURES!
Tests: 3, Assertions: 5, Errors: 1.
So the first function runs just fine, but the second one, which is identical, fails...
Also, if I create a third identical test, only the first one will pass.
EDIT 1:
So it inserts fine in the first test, the database is rolled back and migrated again for the next test, and then the insert fails saying the ID may not be NULL. It seems the create method no longer knows how to insert into the database after the first test. I still don't know what causes this; the migration is correct and the rollback works fine too.
EDIT 2:
I tried running the tests against the production database and they work just fine, so the problem must be with the in-memory database or its configuration. But I don't know what the problem is, because the first test is green and inserts the data without any problem; I can even insert 10 items in the first test and it does what it should. The second test, however, shows the error above.
It looks like the database insert command is failing after the first test, which could happen for a number of reasons.
I think you should consider using https://github.com/laracasts/TestDummy - it is designed to let you create fake data for all your tests, and it will also automatically reset your database between tests (using transactions).
It's a wonderful tool - give it a go.
So the solution was to write these lines in the setUp() method:
AnoLetivo::flushEventListeners();
AnoLetivo::boot();
The problem may be in the Laravel framework.

Unable to deserialize ActorRef to send result to different Actor

I am starting to use Spark Streaming to process a real-time data feed I am getting. My scenario is: I have an Akka actor receiver using "with ActorHelper", then my Spark job does some mappings and transformations, and then I want to send the result to another actor.
My issue is with the last part. When trying to send to another actor, Spark raises an exception:
15/02/20 16:43:16 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, localhost): java.lang.IllegalStateException: Trying to deserialize a serialized ActorRef without an ActorSystem in scope. Use 'akka.serialization.Serialization.currentSystem.withValue(system) { ... }'
The way I am creating this last actor is the following:
val actorSystem = SparkEnv.get.actorSystem
val lastActor = actorSystem.actorOf(MyLastActor.props(someParam), "MyLastActor")
And then using it like this:
result.foreachRDD(rdd => rdd.foreachPartition(lastActor ! _))
I am not sure where or how to apply the advice "Use 'akka.serialization.Serialization.currentSystem.withValue(system) { ... }'". Do I need to set anything special through configuration, or create my actor differently?
Look at the following example to access an actor outside of the Spark domain.
/*
 * Following is the use of actorStream to plug in custom actor as receiver
 *
 * An important point to note:
 * Since Actor may exist outside the spark framework, It is thus user's responsibility
 * to ensure the type safety, i.e type of data received and InputDstream
 * should be same.
 *
 * For example: Both actorStream and SampleActorReceiver are parameterized
 * to same type to ensure type safety.
 */
val lines = ssc.actorStream[String](
  Props(new SampleActorReceiver[String]("akka.tcp://test@%s:%s/user/FeederActor".format(
    host, port.toInt))), "SampleReceiver")
I found that if I collect before I send to the actor, it works like a charm:
result.foreachRDD(rdd => rdd.collect().foreach(producer ! _))

Grails/Hibernate Batch Insert

I am using STS + Grails 1.3.7 and doing batch insertion of thousands of instances of a domain class.
It is very slow because Hibernate sends the SQL statements as individual JDBC calls instead of combining them into one batch.
How can I make them into one large statement?
What you can do is flush the Hibernate session every 20 inserts, like this:
int cpt = 0
mycollection.each {
    cpt++
    if (cpt % 20 == 0) {
        it.save(flush: true)  // flush the session every 20 inserts
    }
    else {
        it.save()
    }
}
Flushing the Hibernate session executes the pending SQL statements every 20 inserts.
This is the easiest method, but you can find a more interesting way to do it in Tomas Lin's blog, where he explains exactly what you want to do: http://fbflex.wordpress.com/2010/06/11/writing-batch-import-scripts-with-grails-gsql-and-gpars/
Using the withTransaction() method on the domain classes makes the inserts much faster for batch scripts. You can build up all of the domain objects in one collection, then insert them in one block.
For example:
Player.withTransaction {
    for (p in players) {
        p.save()
    }
}
You can see this line in the Hibernate docs:
Hibernate disables insert batching at the JDBC level transparently if you use an identity identifier generator.
When I changed the type of generator, it worked.