Inconsistent behavior of Quartz2 scheduler in Apache Camel - apache

I have an Apache Camel project that is using Quartz2 as the scheduler. The requirement is to make it a cluster. The code is deployed to weblogic 12c. the quartz is configured as per many samples with clustering enabled.
This is my properties file (without the datasource)
org.quartz.scheduler.instanceName = MyScheduler
org.quartz.scheduler.instanceId = AUTO
org.quartz.scheduler.skipUpdateCheck = true
org.quartz.scheduler.jobFactory.class = org.quartz.simpl.SimpleJobFactory
org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount = 10
org.quartz.threadPool.threadPriority = 5
org.quartz.jobStore.misfireThreshold = 60000
org.quartz.jobStore.class=org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.oracle.OracleDelegate
org.quartz.jobStore.useProperties=true
org.quartz.JobBuilder.requestRecovery=true
org.quartz.jobStore.isClustered = true
org.quartz.jobStore.clusterCheckinInterval = 20000
When I deploy and start both nodes I see that the QRTZ_SCHEDULER_STATE table has extra entry for one of the nodes:
MyScheduler-routerContext server_node21567108546690
MyScheduler-routerContext-1 server_node11565896495100
MyScheduler-routerContext-1 server_node11567108547295
And I am guessing because of that the one node is being called once in a while while the other node gets called all the time (so occasionally both nodes are invoked at the same time).
I have tried to do a clean restart of weblogic nodes but the issue is still there
This is how my route(s) look like:
from("quartz2://provRegGroup/createUsersTrigger?cron={{create_users_cron}}&job.name=createUsersJob")
.routeId("createUsersRB")
.log("**** starting check for create users");
//where
//create_users_cron=0+0,5,10,15,20,25,30,35,40,45,50,55+*+*+*+?
//expecting one node being called by the scheduler at a time..

I figured out what caused the issue. apparently there were orphan weblogic processes that were running on one (or even both nodes) - this would be a question to our tech archs - why this was such a mess.. ps was showing two weblogic servers running on a node - one that I started recently and one that was there for say a month..
expecting this would never happen to production environment I assume the issue has been resolved..

Related

Why Apache Ignite Cache.replace-K-V-V api call performing slow?

We are running Ignite cluster with 12 nodes running on Ignite 2.7.0 on openjdk
1.8 at RHEL platform.
Seeing heavy cputime spent with https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/IgniteCache.html#replace-K-V-V-
We are witnessing slowness with one of our process and when we tried to drill it
further by profiling the JVM, the main culprit (taking ~78% of total time)
seems to be coming from Ignite cache.repalce(K,V,V) api call.
Out of 77.9 by replace, 39% is taken by GridCacheAdapater.equalVal and 38.5%
by GridCacheAdapter.put
Cache is Partitioned and ATOMIC with readThrough,writeThrough,writeBehindEnabled set to True.
Attaching the profiling snapshot of one node(similar is the profiling result on other nodes), Can someone please check and suggest what
could be the cause OR some known performance issue with this Ignite version related to cache.replace(k,v,v) api ?
JVM Prolfiling Snapshot of one node
I guess that it can be related to next issue:
https://issues.apache.org/jira/browse/IGNITE-5003
The problem there related to the operations for the same key before the previous batch of updates (that contains this key) will be stored in the database.
As I see it should be added to Ignite 2.8.
Update:
I tested putAll operation. From the next two pictures you can see that putAll waiting for GridCacheWriteBehindStore.write (two different threads) that contains updateCache:
public void write(Entry<? extends K, ? extends V> entry) {
try {
if (log.isDebugEnabled())
log.debug(S.toString("Store put",
"key", entry.getKey(), true,
"val", entry.getValue(), true));
updateCache(entry.getKey(), entry, StoreOperation.PUT);
}
And provided issue can affect your put operations (or replace as well).

How to start and stop multiple weblogic managed servers at one go through WLST

I am writing a code to start , stop, undeploy and deploy my application on weblogc.
My components need to be deployed on few managed servers.
When I do new deployments manually I can start and stop the servers in parallel, by ticking multiple boxes and selecting start and stop from the dop down. See below.
but when trying from WLST, i could do that in one server at a time.
ex:
start(name='ServerX',type='Server',block='true')
start(name='ServerY',type='Server',block='true')
shutdown(name='ServerX',entityType='Server',ignoreSessions='true',timeOut=600,force='true',block='true')
shutdown(name='ServerY',entityType='Server',ignoreSessions='true',timeOut=600,force='true',block='true')
Is there a way I can start stop multiple servers in once command?
Instead of directly starting and stopping servers, you create tasks, then wait for them to complete.
e.g.
tasks = []
for server in cmo.getServerLifeCycleRuntimes():
# to shut down all servers
if (server.getName() != ‘AdminServer’ and server.getState() != ‘RUNNING’ ):
tasks.append(server.start())
#or to start them up:
#if (server.getName() != ‘AdminServer’ and server.getState() != ‘SHUTDOWN’ ):
# tasks.append(server.shutdown())
#wait for tasks to complete
while len(tasks) > 0:
for task in tasks:
if task.getStatus() != ‘TASK IN PROGRESS’ :
tasks.remove(task)
java.lang.Thread.sleep(5000)
I know this is an old post, today I was reading this book "Advanced WebLogic Server Automation" written by Martin Heinzl so in the page 282 I found this.
def startCluster(clustername):
try:
start(clustername, 'Cluster')
except Exception, e:
print 'Error while starting cluster', e
dumpStack()
I tried it and it started managed servers in parallel.
Just keep in mind the AdminServer must be started first and your script must connect to the AdminServer before trying it.
Perhaps this would not be useful for you as the servers should be in a cluster, but I wanted to share this :)

Apache Ignite - Distibuted Queue and Executors

I am planning to use Apache Ignite Distributed Queue.
I am using Ignite with a spring boot application. So, on bootup, I will be adding 20 names in a queue. But, since there are 3 servers in a cluster, the same 20 names gets added 3 times. But, i want to add them only once in the queue.
Ignite ignite = Ignition.ignite();
IgniteQueue<String> queue = ignite.queue(
"queueName", // Queue name.
0, // Queue capacity. 0 for unbounded queue.
null // Collection configuration.
);
Distributed executors, will be able to poll from the queue and run the task. Here, the executor is expected to poll, run the task and then add the same name to the queue. Trying to achieve round robin here.
Only one executor should be running the same task at any point of time, though there are multiple servers in a cluster.
Any suggestion for this.
You can launch ignite cluster singleton service https://apacheignite.readme.io/docs/cluster-singletons which will fill data to queue. Also you can adding data from coordinator node (oldest node in cluster) ignite.cluster().forOldest().node().isLocal()
I fixed bootup time duplicate cache loading issue this way:
final IgniteAtomicLong cacheLoadCnt = ignite.atomicLong(cacheName + "Cnt", 0, true);
if (cacheLoadCnt.get() == 0) {
loadCache();
cacheLoadCnt.addAndGet(1);
}

How to monitor GlassFish thread pool via asadmin interface

I'm trying to use the asadmin interface to monitor a thread-pool on GlassFish 3.1.1. I'm executing the following command:
asadmin get -m server.network.my-listener.thread-pool.*
and I'm getting data back, but most of it has lastsampletime = -1 (so the related data is zero; and is worthless).
Note: I've also tried the REST interface, which I believe asadmin delegates to, and the JMX interface. Same problem: much of the data has lastsampletime = -1.
I've already turned monitoring to HIGH for all modules. What am I missing?
It seems like redeploying my application was necessary for the monitoring to actually get values. Perhaps I interpreted the manual incorrectly but it seems to suggest that a restart/redeploy wouldn't be required:
Oracle GlassFish Server 3.1 Administration Guide
Also, it is weird that the following shows there is no monitoring data:
asadmin get -m server.thread-pools.thread-pool.http-thread-pool.*
Instead you must go through a specific network listener like:
asadmin get -m server.network.http-listener-2.thread-pool.*
It also took me by surprise that enabling thread-pool monitoring IS NOT enough to see thread pool statistics. You must also enable http-service monitoring:
asadmin enable-monitoring
asadmin set server.monitoring-service.module-monitoring-levels.thread-pool=HIGH
asadmin set server.monitoring-service.module-monitoring-levels.http-service=HIGH
That's all you should need to do.
Enable monitoring, set to HIGH, for the http-service module on the DAS, stand-alone instance, or cluster you want to monitor.
Deploy an app to the DAS, stand-alone instance, or cluster and make http-requests.
asadmin get -m *instancename*.network.*listener*.thread-pool.*
Looks like you are monitoring DAS, since you are using asadmin get -m server.network.my-listener.thread-pool.*.
I deployed a simple war to DAS and made a bunch of http requests. I see the corethreads-count and maxthreads-count have last sample time as -1. And the remaining statistics have actual last sample times.
asadmin get -m "server.network.http-listener-1.thread-pool.*"
server.network.http-listener-1.thread-pool.corethreads-count = 0
server.network.http-listener-1.thread-pool.corethreads-description = Core number of threads in the thread pool
server.network.http-listener-1.thread-pool.corethreads-lastsampletime = -1
server.network.http-listener-1.thread-pool.corethreads-name = CoreThreads
server.network.http-listener-1.thread-pool.corethreads-starttime = 1320764890444
server.network.http-listener-1.thread-pool.corethreads-unit = count
server.network.http-listener-1.thread-pool.currentthreadcount-count = 5
server.network.http-listener-1.thread-pool.currentthreadcount-description = Provides the number of request processing threads currently in the listener thread pool
server.network.http-listener-1.thread-pool.currentthreadcount-lastsampletime = 1320765351708
server.network.http-listener-1.thread-pool.currentthreadcount-name = CurrentThreadCount
server.network.http-listener-1.thread-pool.currentthreadcount-starttime = 1320764890445
server.network.http-listener-1.thread-pool.currentthreadcount-unit = count
server.network.http-listener-1.thread-pool.currentthreadsbusy-count = 0
server.network.http-listener-1.thread-pool.currentthreadsbusy-description = Provides the number of request processing threads currently in use in the listener thread pool serving requests
server.network.http-listener-1.thread-pool.currentthreadsbusy-lastsampletime = 1320765772814
server.network.http-listener-1.thread-pool.currentthreadsbusy-name = CurrentThreadsBusy
server.network.http-listener-1.thread-pool.currentthreadsbusy-starttime = 1320764890445
server.network.http-listener-1.thread-pool.currentthreadsbusy-unit = count
server.network.http-listener-1.thread-pool.dotted-name = server.network.http-listener-1.thread-pool
server.network.http-listener-1.thread-pool.maxthreads-count = 0
server.network.http-listener-1.thread-pool.maxthreads-description = Maximum number of threads allowed in the thread pool
server.network.http-listener-1.thread-pool.maxthreads-lastsampletime = -1
server.network.http-listener-1.thread-pool.maxthreads-name = MaxThreads
server.network.http-listener-1.thread-pool.maxthreads-starttime = 1320764890443
server.network.http-listener-1.thread-pool.maxthreads-unit = count
server.network.http-listener-1.thread-pool.totalexecutedtasks-count = 31
server.network.http-listener-1.thread-pool.totalexecutedtasks-description = Provides the total number of tasks, which were executed by the thread pool
server.network.http-listener-1.thread-pool.totalexecutedtasks-lastsampletime = 1320765772814
server.network.http-listener-1.thread-pool.totalexecutedtasks-name = TotalExecutedTasksCount
server.network.http-listener-1.thread-pool.totalexecutedtasks-starttime = 1320764890444
server.network.http-listener-1.thread-pool.totalexecutedtasks-unit = count
Command get executed successfully.
To instantly enable monitoring without restart use enable-monitoring command
enable-monitoring
enable-monitoring --modules jvm=LOW
enable-monitoring --modules thread-pool=HIGH
enable-monitoring --modules http-service=HIGH
enable-monitoring --modules jdbc-connection-pool=HIGH
The trick is that thread-pool and http-service modules must have high level to get monitoring info.
For more info refer https://docs.oracle.com/cd/E26576_01/doc.312/e24928/monitoring.htm#GSADG00558

Start a service from another computer

I have a project in which we would like to do the following :
Install a service that perform several tasks. this would be put on Computer A and B
Another computer C serves as a witness;
At start, only A is running because the work it performs cannot be duplicated;
Should A fail, then B must start. C should be the one that verify is one is running or not;
Sholud A return back up after a fail, then B still runs and A is on stand-by;
Should B then fail, C start A services,
And So On.
Is it possible, if so how ? Both A and B have a SQL server 2008 on them but this part is taken care of for us.
Thanks a lot.
EDIT : I tried stopping a service (that I know is running) and it dosn't seem to work :
Dim path As ManagementPath = New ManagementPath
path.Server = System.Environment.MachineName
path.NamespacePath = "root\CIMV2"
path.RelativePath = "Win32_service.Name='" + strServiceName + "'"
Dim service As ManagementObject = New ManagementObject(path)
Dim temp As ManagementBaseObject = service.InvokeMethod("StopService", Nothing, Nothing)
In this case, strServiceName is "CommunicationInterface" which is a service I recently add and started manually.
I am running under windows 7.
Just set the services to not start automatically and then have the C computer start them as needed using the WMI class Win32_Service. You can also use this class to query if the services are running or not.
Look at the StartService and StopService methods as well as the State property.
Here's some sample code: Service Management in VB.NET
You should look into Windows Server 2008 Failover Clusters. From that page:
A failover cluster is a group of
independent computers that work
together to increase the availability
of applications and services. The
clustered servers (called nodes) are
connected by physical cables and by
software. If one of the cluster nodes
fails, another node begins to provide
service (a process known as failover).
Users experience a minimum of
disruptions in service.