I have used hibernate-search, where I have annotated the domain classes with @Indexed, @Field, and more.
My project is based on multiple microservices:
Search Service - which, upon starting, reads the data from the DB and creates the indexes (code below):
@Transactional(readOnly = true)
public void initializeHibernateSearch() {
    try {
        // Rebuild the whole index from the database using the mass indexer.
        FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(entityManager);
        fullTextEntityManager.createIndexer().startAndWait();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
Microservice 1 - which updates the index when any domain entity is inserted or updated.
The issue I am facing is that when the Search Service starts and creates the indexes, it takes locks on the index files and never releases them.
When Microservice 1 then tries to update the index upon an insert or update, it throws the exception below:
org.apache.lucene.store.LockObtainFailedException: Lock held by another program: /root/data/index/default/Event/write.lock
at org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:118)
at org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:41)
at org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:45)
at org.apache.lucene.index.IndexWriter.&lt;init&gt;(IndexWriter.java:776)
at org.hibernate.search.backend.impl.lucene.IndexWriterHolder.createNewIndexWriter(IndexWriterHolder.java:126)
at org.hibernate.search.backend.impl.lucene.IndexWriterHolder.getIndexWriter(IndexWriterHolder.java:92)
at org.hibernate.search.backend.impl.lucene.AbstractWorkspaceImpl.getIndexWriter(AbstractWorkspaceImpl.java:117)
at org.hibernate.search.backend.impl.lucene.AbstractWorkspaceImpl.getIndexWriterDelegate(AbstractWorkspaceImpl.java:203)
at org.hibernate.search.backend.impl.lucene.LuceneBackendQueueTask.applyUpdates(LuceneBackendQueueTask.java:81)
at org.hibernate.search.backend.impl.lucene.LuceneBackendQueueTask.run(LuceneBackendQueueTask.java:46)
at org.hibernate.search.backend.impl.lucene.SyncWorkProcessor$Consumer.applyChangesets(SyncWorkProcessor.java:166)
at org.hibernate.search.backend.impl.lucene.SyncWorkProcessor$Consumer.run(SyncWorkProcessor.java:152)
at java.lang.Thread.run(Thread.java:748)
Could you please let me know the right approach to using hibernate-search in a microservices architecture?
There are several options. My recommendation is the first one, as it best fits the architectural spirit of microservices:
Each microservice has an independent index: don't share the index directory.
Use the master/slave architecture described in the docs, so that a single service writes to the index - the other services have to delegate to that single writer. Indexes can be replicated over a network-based filesystem or using Infinispan.
Disable exclusive_index_use, a configuration property (also described in the docs). I'm listing this for completeness, but it is generally a bad idea: it will be much slower, and it becomes your responsibility to ensure that one service busy writing won't time out another service needing to write. A configuration sketch for the first and third options follows below.
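To make the first and third options concrete, here is a rough sketch of how the relevant Hibernate Search 5.x properties could be passed when building the EntityManagerFactory; the persistence-unit name ("searchPU") and the index path are placeholders to adapt to your setup:

import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class SearchConfig {

    public static EntityManagerFactory build() {
        Map<String, Object> props = new HashMap<>();
        // Option 1: give this microservice its own private index directory,
        // so no other service ever competes for the same write.lock.
        props.put("hibernate.search.default.indexBase", "/var/data/index/search-service");
        // Option 3 (generally discouraged): release the IndexWriter between operations
        // so another process can acquire the lock - slower and still contention-prone.
        // props.put("hibernate.search.default.exclusive_index_use", "false");
        return Persistence.createEntityManagerFactory("searchPU", props);
    }
}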
Lately I have been thinking about the proper use of loggers in our applications.
For example, I have a controller that returns a stream of users, but in the log I see that "Fetch users.." is logged by a different thread than the one on the processing pipeline. Is that a good approach?
@Slf4j
class AwesomeController {
    @GetMapping(path = "/users")
    public Flux<User> getUsers() {
        log.info("Fetch users..");
        return Flux.just(...).subscribeOn(Schedulers.newParallel("my-custom"));
    }
}
In this case two threads are used, which from my perspective is not a good option, but I can't find good practices for loggers in reactive applications. I think the approach below is better, because the memory allocation happens on the processing thread rather than on the Spring WebFlux thread, which the logger could potentially block.
@GetMapping(path = "/users")
public Flux<User> getUsers() {
    return Flux.defer(() ->
            Mono.fromCallable(() -> {
                log.info("Fetch users..");
                // ...
            })
    ).subscribeOn(Schedulers.newParallel("my-custom"));
}
The normal thing to do would be to configure the logger as asynchronous (this usually has to be explicit, as per the comments, but all modern logging frameworks support it) and then just include it "normally": either as a separate line, as you have there, or in a side-effect operator such as doOnNext() if you want it halfway through the reactive chain.
If you want to be sure that the logger's call isn't blocking, then use BlockHound to check (which is never a bad idea anyway). But in any case, I can't see a use case for your second example; it makes the code rather difficult to follow with no real advantage.
One final thing to watch out for: remember that if you include the logging statement separately, as you have above, rather than as part of the reactive chain, then it executes at call time rather than at subscription time. That may not matter in scenarios like this, where the two happen nearly simultaneously, but it would be rather confusing if, for example, you return a publisher that may be subscribed to multiple times; in that case you would only ever see the "Fetch users.." statement once, which isn't obvious when glancing through the code.
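If you do want the statement tied to each subscription, a minimal sketch is to move it into the chain with doOnSubscribe(); userService.findAll() is a hypothetical stand-in for whatever actually produces the Flux<User> in your controller:

@GetMapping(path = "/users")
public Flux<User> getUsers() {
    // The log call runs once per subscriber, at subscription time
    // (here on the "my-custom" scheduler, because of subscribeOn).
    return userService.findAll()
            .doOnSubscribe(subscription -> log.info("Fetch users.."))
            .subscribeOn(Schedulers.newParallel("my-custom"));
}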
If I push a Runnable to a Redisson distributed executor service, what rules am I required to abide by?
Surely I cannot have free rein; I do not see how that is possible. Yet it is not mentioned in the docs at all, nor are any rules apparently enforced by the API, such as R extends Serializable or similar.
If I pass this runnable:
Runnable task = () -> {
    // What can I access here, and have it be recreated in whatever server instance picks it up later for execution?
    // newlyCreatedInstanceCreatedJustBeforeThisRunnableWasCreated.isAccessible(); // ?
    // newlyComplexInstanceSuchAsADatabaseDriverThatIsAccessedHere.isAccessible(); // ?
    // transactionalHibernateEntityContainingStaticReferencesToComplexObjects....
    // I think you get the point.
    // Does Redisson serialize everything within this scope?
    // When it is recreated later, surely I cannot have access to those exact objects, unless they run on the same server, right?
    // If the server goes down and comes back up, or another server executes this runnable, then what happens?
    // What rules do we have to abide by here?
};
Also, what rules do we have to abide by when pushing something to an RQueue, an RBlockingDeque, or Redisson live objects?
It is not clear from the docs.
Also, it would be great if a link to a single-page documentation site could be provided. The one here requires a lot of clicking and navigation:
https://github.com/redisson/redisson/wiki/Table-of-Content
https://github.com/redisson/redisson/wiki/9.-distributed-services#933-distributed-executor-service-tasks
You can have access to the RedisClient and the taskId. The full state of the task object will be serialized.
A TaskRetry setting is applied to each task: if a task isn't executed within 5 minutes of starting, it will be requeued.
I agree that the documentation is lacking some "under the hood" explanations.
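To make the serialization rule concrete, here is a hedged sketch of a task written under the assumption that the whole task object is serialized: it carries only plain serializable state and re-acquires heavyweight resources on the executing node (ReindexTask and the bucket name are made-up names for illustration):

import java.io.Serializable;
import org.redisson.api.RedissonClient;
import org.redisson.api.annotation.RInject;

public class ReindexTask implements Runnable, Serializable {

    @RInject
    private transient RedissonClient redisson; // injected by Redisson on the executing node

    private final String entityId; // plain serializable state, shipped with the task

    public ReindexTask(String entityId) {
        this.entityId = entityId;
    }

    @Override
    public void run() {
        // Re-create expensive resources (DB connections, entity managers, drivers) here,
        // on the executing node; do not capture them from the submitting JVM.
        redisson.getBucket("lastProcessedEntity").set(entityId);
    }
}

Submitting it would then look something like redisson.getExecutorService("reindexExecutor").submit(new ReindexTask("42"));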
I was able to execute DB reads and inserts through the Callable/Runnable that was submitted to the remote ExecutorService.
I configured a single Redis instance on a remote VM, with the database and the app running locally on my laptop.
The tasks were executed without any errors.
I am trying to integrate my application with Spring Sleuth.
I was able to do a successful integration and I can see spans getting exported to Zipkin.
I am exporting to Zipkin over HTTP.
Spring boot version - 1.5.10.RELEASE
Sleuth - 1.3.2.RELEASE
Cloud- Edgware.SR2
But now I need to do this in a more controlled way, as the application is already running in production and people are worried about the overhead that Sleuth can add through @NewSpan on the methods.
I need to decide at runtime whether the trace should be added or not (not talking about exporting). For example, for actuator endpoints the trace is not added at all, and I assume that has no overhead on the application. Setting X-B3-Sampled = 0 stops exporting but still adds the tracing information. I am looking for something like the skipPattern property, but at runtime.
Always export the trace if the service exceeds a certain threshold or in case of an exception.
If I am not exporting spans to Zipkin, will there be any overhead from the tracing information?
What about this solution? I guess this will work for sampling specific requests at runtime.
@Bean
public Sampler customSampler() {
    return new Sampler() {
        @Override
        public boolean isSampled(Span span) {
            logger.info("Inside sampling " + span.getTraceId());
            HttpServletRequest httpServletRequest = HttpUtils.getRequest();
            // Sample only requests under /test.
            return httpServletRequest != null
                    && httpServletRequest.getServletPath().startsWith("/test");
        }
    };
}
people are worried about the overhead that Sleuth can add through @NewSpan on the methods.
Do they have any information about the overhead? Have they turned it on and seen the application start to lag significantly? What are they scared of? Is this a high-frequency trading application you're building, where every microsecond counts?
I need to decide at runtime whether the trace should be added or not (not talking about exporting). For example, for actuator endpoints the trace is not added at all, and I assume that has no overhead on the application. Setting X-B3-Sampled = 0 stops exporting but still adds the tracing information. I am looking for something like the skipPattern property, but at runtime.
I don't think that's possible. The instrumentation is set up by adding interceptors, aspects, etc., and they are started upon application initialization.
Always export the trace if the service exceeds a certain threshold or in case of an exception.
With the new Brave tracer instrumentation (Sleuth 2.0.0) you will be able to do this in a much easier way. Prior to that version you would have to implement your own SpanReporter that checks the tags (e.g. whether the span contains an error tag) and, only if so, sends it to Zipkin.
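As a rough illustration of that pre-2.0.0 approach, here is a hedged sketch against the Sleuth 1.x SpanReporter interface; the wrapped delegate (your actual Zipkin reporter) and the literal "error" tag key are assumptions you would need to verify against your version:

import org.springframework.cloud.sleuth.Span;
import org.springframework.cloud.sleuth.SpanReporter;

public class ErrorOnlySpanReporter implements SpanReporter {

    private final SpanReporter delegate; // e.g. the Zipkin reporter already on the classpath

    public ErrorOnlySpanReporter(SpanReporter delegate) {
        this.delegate = delegate;
    }

    @Override
    public void report(Span span) {
        // Forward only spans that the instrumentation tagged as errors.
        if (span.tags().containsKey("error")) {
            delegate.report(span);
        }
    }
}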
If I am not exporting spans to Zipkin, will there be any overhead from the tracing information?
Yes, there is, because you still need to pass the tracing data around. However, the overhead is small.
I have to upgrade an app that is using an old version of Hazelcast to one of the more recent versions. There was some Hazelcast lock functionality which has since been deprecated and removed altogether from the API. In particular, the old lock functionality worked like this:
Hazelcast.getLock(myString);
The getLock function was a static method on Hazelcast. Now it is to be replaced with something like:
hazelcastInstance.getLock(myString);
...where the lock comes from one of the instances in the cluster.
My question is, can I use any one of the instances in the hazelcast cluster to get the lock? And if so, will this lock all instances?
Q1: Yes, you can use any one of the instances in the Hazelcast cluster to get the lock (ILock).
You can think of ILock in the Hazelcast framework as the distributed implementation of java.util.concurrent.locks.Lock. For more details, please see
http://docs.hazelcast.org/docs/3.5/javadoc/com/hazelcast/core/ILock.html
Q2: If you lock the critical section using an ILock, then the guarded critical section is guaranteed to be executed by only one thread in the entire cluster at any given point in time. Thus, once the lock() method is called by, say, Thread1 on one of the nodes, other threads (on other nodes as well) will wait until the lock is released.
Sample code:
HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance();
Lock testLock = hazelcastInstance.getLock("testLock");
testLock.lock();
try {
    // critical section code
} finally {
    testLock.unlock();
}
Why is it necessary to configure .Sagas()? I ask because I had been running a saga with Raven persistence for the last few months before I noticed that Sagas() was not in my Configure.With(); in fact, I realized I was missing a bit of the RavenPersistence configuration as well. Yet, as far as I know, sagas have been working 98% of the time and persisting to Raven. So I wonder what the Sagas() configuration does differently compared to not configuring it.
The reason I say 98% of the time is that I do notice random messages falling out of a method and not sending the next message they are designated to send in the saga. I am curious whether not having the proper configuration is the cause of this.
_logger.InfoFormat("1.1 - Preparing Saga for; File: {0}", message.FileNumber);
//Creates Saga information
SetupSaga(uploads,
message.Documents,
message.ProcedureID.GetValueOrDefault(0),
file.Client.Id,
message.FileNumber,
message.Stage,
user);
_logger.InfoFormat("1.2 - Upload Saga Unique ID; File: {0}, UniqueID: {1}", message.FileNumber, Data.UniqueID);
Bus.SendLocal(new GetLoanInformation {
UniqueID = Data.UniqueID
});
The NServiceBus Host does a lot of configuration automatically based on roles and profiles. Both the Sagas configuration and the Raven persistence are handled for you automatically. You would only need to do this manually if you were going to run a Saga when self-hosting, which would be somewhat rare.
For a better idea of what happens as a result of all the different roles and profiles, check out All About NServiceBus Host Profiles and Roles. (Disclaimer: This is my blog post.)
The problem you're mentioning is due to something else, but a lot more information would be required to diagnose it.