Performance in Entity Framework foreach Add - SQL

I have a program that adds points of interest and their guides from a web service to a database. It looks like this:
// For all the cities (like 1M)
foreach (City city in ListOfCities) {
    try {
        AddCity(city);
    } catch (Exception ex) {
        _logger.Error(ex.Message);
        continue;
    }
}
// Save the points of interest of a city to the database
public void AddCity(City city) {
    using (WEntities context = new WEntities()) {
        // For all the points of interest
        foreach (PointOfInterest point in city) {
            try {
                // Search all the guides and add them to the point of interest
                List<Guide> listGuides = _webservice.GetAllGuidesForPoint(point);
                foreach (Guide guide in listGuides) {
                    point.Guides.Add(guide);
                }
                // Add the point to the context and save it to the database
                context.PointOfInterest.AddObject(point);
                context.ObjectStateManager.ChangeObjectState(point, System.Data.EntityState.Added);
                context.SaveChanges();
            } catch (Exception ex) {
                _logger.Error(ex.Message);
                continue;
            }
        }
    }
}
The problem is that, given a number of cities, the speed of each iteration drops significantly: at first an iteration may take 1 second, and towards the end it can take more than 30 minutes.
What am I doing wrong? Can I do something to make all the iterations take the same time (the short one, if I can choose)?
P.S.: What's more, CPU and RAM usage increase over time.
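A likely cause (a guess from the symptoms, not confirmed): the ObjectContext tracks every entity added to it, and change detection runs on each AddObject/SaveChanges, so each save gets more expensive as the number of tracked entities grows. A minimal sketch of the usual fix, saving in batches and replacing the context periodically so it never tracks more than a handful of entities; batchSize is a hypothetical value to tune:

public void AddCity(City city) {
    const int batchSize = 100; // hypothetical value, tune for your workload
    int pending = 0;
    // try/finally instead of "using", because the context variable is reassigned below
    WEntities context = new WEntities();
    try {
        foreach (PointOfInterest point in city) {
            foreach (Guide guide in _webservice.GetAllGuidesForPoint(point)) {
                point.Guides.Add(guide);
            }
            context.PointOfInterest.AddObject(point);
            pending++;
            if (pending % batchSize == 0) {
                context.SaveChanges();
                context.Dispose();          // discard the tracked entities so
                context = new WEntities(); // change tracking stays cheap
            }
        }
        context.SaveChanges(); // flush the last partial batch
    } finally {
        context.Dispose();
    }
}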

Related

Merging Mono and Flux in Spring WebFlux

Let's say I have a method store(Flux<DataBuffer> bufferFlux) which receives some data as a flux of DataBuffers, calculates an identifier, creates an AsynchronousFileChannel, and then uses DataBufferUtils to write the data to the channel.
I started like this. Please note that the following code will not work; it should just illustrate how I create a FileChannel and how I would like to write the data, while releasing the used buffers and closing the channel afterwards.
public Mono<Void> store(Flux<DataBuffer> bufferFlux) {
    var channelMono = Mono.defer(() -> {
        try {
            log.info("opening file {}", filePath);
            return Mono.just(AsynchronousFileChannel
                    .open(filePath, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE));
        } catch (IOException ex) {
            log.error("error opening file", ex);
            return Mono.error(ex);
        }
    });

    // calculate identifier
    // store buffers to AsynchronousFileChannel
    return DataBufferUtils
            .write(bufferFlux, fileChannel)
            .doOnNext(DataBufferUtils.releaseConsumer())
            .doFinally(f -> {
                try {
                    fileChannel.close();
                } catch (IOException ioException) {
                    log.error("error closing file channel", ioException);
                }
            })
            .then();
}
The problem is that I just started with reactive programming and have no clue how to bring these two building blocks together, so that:
the data is written to the channel
all buffers are gracefully released
the channel is closed after writing the data
the whole operation just signals complete or error (I guess this is what Mono<Void> is used for)
Can anyone help me choose the right operators or point me to a conceptual problem (perhaps there is a good reason why I cannot find a suitable operator)? :)
Thank you!
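One possible way to wire these pieces together (a sketch under the stated requirements, not a definitive answer): Reactor's Mono.using ties the channel's lifecycle to the subscription, which covers all four points at once. This assumes the same filePath field and log as above:

public Mono<Void> store(Flux<DataBuffer> bufferFlux) {
    return Mono.using(
            // open the channel lazily, once per subscription
            () -> AsynchronousFileChannel.open(filePath,
                    StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE),
            // write the flux to the channel, releasing each buffer after it is written
            channel -> DataBufferUtils.write(bufferFlux, channel)
                    .doOnNext(DataBufferUtils.releaseConsumer())
                    .then(),
            // close the channel on complete, error, or cancellation
            channel -> {
                try {
                    channel.close();
                } catch (IOException e) {
                    log.error("error closing file channel", e);
                }
            });
}

If the identifier calculation doesn't need the channel, there is also (since Spring 5.1, if I recall correctly) DataBufferUtils.write(bufferFlux, filePath, StandardOpenOption.CREATE_NEW), which returns a Mono<Void> and handles writing, releasing, and closing internally.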

How to force Zipkin/Brave/Spring-Cloud-Sleuth span to be exportable?

How can I force a Zipkin span to be exportable?
In the code below, spans are sometimes exportable and sometimes not, in a non-repeatable manner.
It seems to me that if I comment out the first scopedSpan, then the second, manually created spanInScope is exportable. But how can the first scopedSpan prevent the second spanInScope from being exportable? How do they interfere?
@SneakyThrows
private void debugScopedSpan(final String label) {
    ScopedSpan scopedSpan = tracer.startScopedSpan(label + "_1").tag("type", "manual");
    try {
        log.info("===== debugScopedSpan_1 {}", label);
    } catch (RuntimeException | Error e) {
        scopedSpan.error(e);
        throw e;
    } finally {
        scopedSpan.finish();
    }

    // Why can't both the scopedSpan above and the spanInScope below be exportable at the same time? How do they interfere with each other?
    Span trace = tracer.nextSpan().name(label + "_2").tag("type", "manual").start();
    final Tracer.SpanInScope spanInScope = tracer.withSpanInScope(trace);
    log.info("===== debugScopedSpan_2 {}", label);
    spanInScope.close();
    trace.finish();
}
It's because of sampling. Create a Sampler bean that always samples (Sampler.ALWAYS_SAMPLE) or set the sampling probability property to 1.0.
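For illustration, a minimal configuration sketch, assuming Spring Cloud Sleuth with Brave on the classpath:

import brave.sampler.Sampler;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SamplerConfig {

    // Sample every trace, so all spans become exportable.
    @Bean
    public Sampler defaultSampler() {
        return Sampler.ALWAYS_SAMPLE;
    }
}

The property-based equivalent in Sleuth 2.x would be spring.sleuth.sampler.probability=1.0.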

Repast: Query() method runs greatly slower than manual iteration

Recently I found a big problem using the Repast query() method: it is somehow significantly slower than using a simple manual iteration to get the specific agent set. Take a package object querying for the hub with the matching "hub_code" as an example. I tested both the query and the manual iteration approaches:
public void arrival_QueryApproach() {
    try {
        if (this.getArr_time() == this.getGs().getTick()) {
            Query<Object> hub_query = new PropertyEquals<Object>(context, "hub_code", this.getSrc());
            for (Object o : hub_query.query()) {
                if (o instanceof Hub) {
                    ((Hub) o).getDepature_queue().add(this);
                    this.setStatus(3);
                    this.setCurrent_hub(this.getSrc());
                    break;
                }
            }
        }
    } catch (Exception e) {
        System.out.println("No hub identified: " + this.getSrc());
    }
}
public void arrival_ManualApproach() {
    try {
        if (this.getArr_time() == this.getGs().getTick()) {
            for (Hub o : gs.getHub_list()) {
                if (o.getHub_code().equals(this.getSrc())) {
                    o.getDepature_queue().add(this);
                    this.setStatus(3);
                    this.setCurrent_hub(this.getSrc());
                    break;
                }
            }
        }
    } catch (Exception e) {
        System.out.println("No hub identified: " + this.getSrc());
    }
}
The execution speed is dramatically different. There are 50,000 package and 350 hub objects in my model. It took on average 1 minute and 40 seconds to run 1600 ticks when using the built-in query function, but only 5 seconds when using the manual iteration approach. What causes this dramatic difference, and why does query() run so slowly? Logically it should run much faster.
Another issue associated with the query methods is that "PropertyGreaterThanEquals" and "PropertyLessThanEquals" run much slower than "PropertyEquals". Below is another simple example that queries for a suitable dock for a truck to unload goods.
public void match_dock() {
    // Query<Object> pre_fit = new PropertyGreaterThanEquals(context, "unload_speed", 240);
    // Query<Object> pre_fit = new PropertyLessThanEquals(context, "unload_speed", 240);
    Query<Object> pre_fit = new PropertyEquals(context, "unload_speed", 240);
    for (Object o : pre_fit.query()) {
        if (o instanceof Dock) {
            System.out.println("this dock's id is: " + ((Dock) o).getId());
        }
    }
}
There are only 3 dock and 17 truck objects in the model. It took less than one second to run a total of 1920 ticks when using "PropertyEquals"; however, it took more than 1 minute to run the same 1920 ticks when using the "PropertyGreaterThanEquals" or "PropertyLessThanEquals" query methods. In this sense, do I have to again loop through all the objects (docks) and do the greater-than comparison manually, as sketched below? This appears to be another issue that greatly affects the model's execution speed.
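For illustration, a sketch of that manual workaround; the getDock_list() and getUnload_speed() accessors are assumptions here, by analogy with the getHub_list() pattern used above:

public void match_dock_manual() {
    // Iterate the docks directly and compare the property by hand,
    // mirroring the arrival_ManualApproach() pattern above.
    for (Dock d : gs.getDock_list()) {        // hypothetical accessor
        if (d.getUnload_speed() >= 240) {     // hypothetical getter for "unload_speed"
            System.out.println("this dock's id is: " + d.getId());
        }
    }
}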
I am using java version "11.0.1" 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
My Eclipse compiler level is 10. Installed JREs (default): JDK 11.
Thanks for any helpful advice.

Removing objects from an ArrayList is doing strange things

In the code below, I want the balls to move from the ArrayList ballList to another ArrayList oldBalls as soon as their age exceeds a threshold value.
This code should be very simple, but I can't figure out why balls whose age equals the threshold (is not larger than it) disappear and then come back 2 frames later.
I have checked related questions about using iterators for ArrayLists in Java, but I think there should be a way to do this in Processing without plain Java.
Also, I can't seem to post any question on the Processing forum even though I can sign in; no idea why...
I have reduced the code to the minimum able to reproduce the error.
ArrayList<Ball> ballList;
ArrayList<Ball> oldBalls;
int threshold = 4;
PFont font;

void setup() {
    size(400, 400);
    frameRate(0.5);
    font = createFont("Georgia", 18);
    textFont(font);
    textAlign(CENTER, CENTER);
    ballList = new ArrayList<Ball>();
    oldBalls = new ArrayList<Ball>();
    noFill();
    strokeWeight(2);
}

void draw() {
    background(0);
    Ball b = new Ball(new PVector(10, random(height/10, 9*height/10)), new PVector(10, 0), 0);
    ballList.add(b);
    stroke(0, 0, 200);
    for (int i = 0; i < oldBalls.size(); i++) {
        Ball bb = oldBalls.get(i);
        bb.update();
        bb.render();
        text(bb.age, bb.pos.x, bb.pos.y);
    }
    stroke(200, 0, 0);
    for (int i = 0; i < ballList.size(); i++) {
        Ball bb = ballList.get(i);
        bb.update();
        bb.render();
        bb.migrate();
        text(bb.age, bb.pos.x, bb.pos.y);
    }
}
class Ball {
    PVector pos;
    PVector vel;
    int age;

    Ball(PVector _pos, PVector _vel, int _age) {
        pos = _pos;
        vel = _vel;
        age = _age;
    }

    void migrate() {
        if (age > threshold) {
            oldBalls.add(this);
            ballList.remove(this);
        }
    }

    void update() {
        pos.add(vel);
        age += 1;
    }

    void render() {
        ellipse(pos.x, pos.y, 24, 24);
    }
}
Note how balls labelled with age=threshold suddenly disappear...
I guess the problem is here:
for (int i = 0; i < ballList.size(); i++) {
    Ball bb = ballList.get(i);
    bb.update();
    bb.render();
    // add this
    if (bb.migrate())
        i--;
    text(bb.age, bb.pos.x, bb.pos.y);
}
and
boolean migrate() {
    if (age > threshold) {
        oldBalls.add(this);
        ballList.remove(this);
        // and this
        return true;
    }
    return false;
}
migrate() will remove the object from ballList and reduce its size by 1.
What looks like it is happening here is that you're altering the lists while iterating through them. Consider this for loop you have:
for (int i = 0; i < ballList.size(); i++) {
    Ball bb = ballList.get(i);
    bb.update();
    bb.render();
    bb.migrate();
    text(bb.age, bb.pos.x, bb.pos.y);
}
Say ballList has 2 balls in it, both age 3. The first pass gets ball[0] and then removes it from the list; i increments, and the loop immediately exits because ballList.size() is now 1, so the second ball (now at index 0) is skipped that frame. So it's not the ball which gets to age 4 that vanishes but the subsequent one.
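An alternative sketch that avoids the manual index adjustment altogether: iterate backwards, so removing the current element never shifts the position of elements that haven't been visited yet.

// Walk the list from the end so removals don't affect unvisited indices.
for (int i = ballList.size() - 1; i >= 0; i--) {
    Ball bb = ballList.get(i);
    bb.update();
    bb.render();
    bb.migrate();   // may remove bb from ballList; earlier indices are unaffected
    text(bb.age, bb.pos.x, bb.pos.y);
}

The drawing order is reversed, which doesn't matter for this sketch.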

Using java.util.UUID.randomUUID for setInsertId and no new data after some time

I'm setting the insertId with randomUUID at row level, and after some time I see that no rows are being introduced to BigQuery. I've instrumented the code to capture failures, and even though no failure is causing a retry, it is not streaming data into BigQuery. One thing perhaps I should mention is that we maintain a queue of connections to avoid the warm-up period and so on. I suspect that somehow it is identifying incoming rows as duplicates!
Populating the batch of rows as:
rowList.add(new Rows().setJson(this.row).setInsertId(UUID.randomUUID().toString()));
Then calling the InsertBatch method:
TableDataInsertAllRequest content = new TableDataInsertAllRequest().setRows(rowList);
Throwable cause = null;
try {
    while (retryStrategy.retriesRemaining()) {
        try {
            @SuppressWarnings("unused")
            TableDataInsertAllResponse response = bq.tabledata().insertAll(bqProjectId, DataSetId, TableId, content).execute();
            if (response.getInsertErrors() != null) {
                warn("Inserting One of the rows has failed");
                statsKeeper.post(BigQueryStat.REMOTE_SERVICE_UNAVAILABLE.getId(), 1L);
                throw new SocketTimeoutException();
            } else {
                return true;
            }
        } catch (Throwable e) {
            cause = e;
            if (!shouldRetry(e, retryStrategy)) {
                statsKeeper.post(BigQueryStat.SERVICE_DOWN_REPORT.getId(), 1L);
                throw e;
            }
        }
    }
    if (!retryStrategy.retriesRemaining())
        statsKeeper.post(BigQueryStat.SERVICE_DOWN_REPORT.getId(), 1L);
} catch (Exception e) {
    throw Throwables.propagate(e);
}
Is there any obvious reason why the Google BigQuery APIs would determine those rows to be duplicates?
Can you provide the times/tables where you're encountering this behavior? There is a 1-2 minute warm-up period before rows potentially appear, and occasionally rows take longer if the associated backend system encounters data availability issues.
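As a debugging aid (a sketch, not part of the original exchange), one way to gather that information is to log the per-row errors the API actually returns instead of collapsing them all into a single retry:

// Log each failed row's index and error details before deciding to retry.
// InsertErrors is the nested type returned by TableDataInsertAllResponse.getInsertErrors().
if (response.getInsertErrors() != null) {
    for (TableDataInsertAllResponse.InsertErrors rowError : response.getInsertErrors()) {
        warn("row " + rowError.getIndex() + " failed: " + rowError.getErrors());
    }
}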