I am getting the following alert, and before I take any action I need to know whether this is actually excessive resource usage or whether I should just increase my notification threshold.
lfd on sv1.server.com: Excessive resource usage: tendes (3222 (Parent PID:3222))
Time: Thu Oct 16 10:00:25 2014 -0400
Account: tendes
Resource: Process Time Exceeded: 64257 > 7200 (seconds)
Executable: /usr/bin/php
Command Line: /usr/bin/php
PID: 3222 (Parent PID:3222)
Killed: No
Yes, I can see that PHP scripts owned by your tendes user are taking a long time to execute, which is why you are getting this alert from your server firewall. You will have to check that user's scripts, or add the user to the /etc/csf/csf.pignore file so that you no longer get alerts for this user.
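To judge whether the alert is worth acting on, it helps to put the two numbers in perspective. Below is a minimal Python sketch, assuming the alert keeps the `Process Time Exceeded: <actual> > <limit> (seconds)` shape shown above. The function name is illustrative and not part of csf/lfd.

```python
import re

def parse_process_time(alert_line):
    """Return (actual_seconds, limit_seconds) from an lfd 'Process Time Exceeded' line."""
    match = re.search(r"Process Time Exceeded:\s*(\d+)\s*>\s*(\d+)", alert_line)
    if match is None:
        raise ValueError("not a Process Time Exceeded line")
    return int(match.group(1)), int(match.group(2))

# Resource line copied from the alert above.
line = "Resource: Process Time Exceeded: 64257 > 7200 (seconds)"
actual, limit = parse_process_time(line)
hours = actual / 3600   # reported process time in hours
ratio = actual / limit  # how many times over the threshold
print(f"{hours:.1f} hours, {ratio:.1f}x the limit")
```

For this alert the reported process time is close to 18 hours against a 2-hour limit, so a small increase of the threshold would not silence it. Whether that figure is elapsed time or CPU time is worth confirming in the csf documentation for the process tracking settings before deciding.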
I have a WebSphere Portal Version 8.5 cluster on AIX 7.1 with multiple Virtual Portals, working with managed pages. Each Virtual Portal has its own libraries, plus one shared library for all VPs that is syndicated to each VP.
I successfully created the syndication pair between the syndicator (WAS base portal) and the subscriber (Virtual Portal) and tested the connection between them, and all is good (which makes sense, since the VPs are local on the same server). However, when I try to syndicate the library content it stays in Queued status, and in SystemOut.log I see the following errors:
[4/25/17 9:33:53:201 IDT] 00004163 PackageConsum E Unexpected exception thrown while updating subscription: [IceId: Current State: ], exception: com.ibm.workplace.wcm.services.WCMServiceRuntimeException: code: 400
com.ibm.workplace.wcm.services.WCMServiceRuntimeException: code: 400
at com.aptrix.syndication.business.subscriber.CatalogRetrieverTask.getSourceCatalog(CatalogRetrieverTask.java:330)
at com.aptrix.syndication.business.subscriber.CatalogRetrieverTask.process(CatalogRetrieverTask.java:144)
at com.aptrix.syndication.business.subscriber.PackageConsumerTask.processPackage(PackageConsumerTask.java:513)
at com.aptrix.syndication.business.subscriber.PackageConsumerTask.processUpdate(PackageConsumerTask.java:267)
at com.aptrix.syndication.business.subscriber.PackageConsumerTask$1.run(PackageConsumerTask.java:183)
at com.ibm.wps.ac.impl.UnrestrictedAccessImpl.run(UnrestrictedAccessImpl.java:84)
at com.ibm.wps.command.ac.ExecuteUnrestrictedCommand.execute(ExecuteUnrestrictedCommand.java:90)
at com.aptrix.syndication.business.subscriber.PackageConsumerTask.doManagedWork(PackageConsumerTask.java:195)
at com.aptrix.syndication.business.ManagedTask.runWork(ManagedTask.java:62)
at com.ibm.workplace.wcm.services.workmanager.AbstractWcmWork.runImpl(AbstractWcmWork.java:162)
at com.ibm.workplace.wcm.services.workmanager.AbstractWcmSystemWork.access$001(AbstractWcmSystemWork.java:40)
at com.ibm.workplace.wcm.services.workmanager.AbstractWcmSystemWork$1.run(AbstractWcmSystemWork.java:92)
at com.ibm.wps.ac.impl.UnrestrictedAccessImpl.run(UnrestrictedAccessImpl.java:84)
at com.ibm.wps.command.ac.ExecuteUnrestrictedCommand.execute(ExecuteUnrestrictedCommand.java:90)
at com.ibm.workplace.wcm.services.repository.PACServiceImpl.runAsPrivileged(PACServiceImpl.java:1878)
at com.ibm.workplace.wcm.services.workmanager.AbstractWcmSystemWork.runImpl(AbstractWcmSystemWork.java:87)
at com.ibm.workplace.wcm.services.workmanager.AbstractWcmWork.run(AbstractWcmWork.java:146)
at com.ibm.wps.services.workmanager.impl.WasWorkWrapper.run(WasWorkWrapper.java:44)
at com.ibm.ws.asynchbeans.J2EEContext$RunProxy.run(J2EEContext.java:271)
at java.security.AccessController.doPrivileged(AccessController.java:274)
at com.ibm.ws.asynchbeans.J2EEContext.run(J2EEContext.java:797)
at com.ibm.ws.asynchbeans.WorkWithExecutionContextImpl.go(WorkWithExecutionContextImpl.java:222)
at com.ibm.ws.asynchbeans.ABWorkItemImpl.run(ABWorkItemImpl.java:206)
at java.lang.Thread.run(Thread.java:804)
[4/25/17 9:33:53:222 IDT] 00004163 SyndicationEx W Unsuccessful request to send summary: 400
com.aptrix.deployment.wizard.SyndicatorCommunicationException: Unsuccessful request to send summary: 400
at com.ibm.workplace.wcm.api.syndication.SyndicationExtensionsServiceImpl.sendSummaryToSyndicator(SyndicationExtensionsServiceImpl.java:293)
at com.ibm.workplace.wcm.api.syndication.SyndicationExtensionsServiceImpl.processSubscriberCompleting(SyndicationExtensionsServiceImpl.java:246)
at com.aptrix.syndication.business.subscriber.SubscriberTaskManager.processFailedUpdate(SubscriberTaskManager.java:405)
at com.aptrix.syndication.business.subscriber.PackageConsumerTask.processUpdate(PackageConsumerTask.java:400)
at com.aptrix.syndication.business.subscriber.PackageConsumerTask$1.run(PackageConsumerTask.java:183)
at com.ibm.wps.ac.impl.UnrestrictedAccessImpl.run(UnrestrictedAccessImpl.java:84)
at com.ibm.wps.command.ac.ExecuteUnrestrictedCommand.execute(ExecuteUnrestrictedCommand.java:90)
at com.aptrix.syndication.business.subscriber.PackageConsumerTask.doManagedWork(PackageConsumerTask.java:195)
at com.aptrix.syndication.business.ManagedTask.runWork(ManagedTask.java:62)
at com.ibm.workplace.wcm.services.workmanager.AbstractWcmWork.runImpl(AbstractWcmWork.java:162)
at com.ibm.workplace.wcm.services.workmanager.AbstractWcmSystemWork.access$001(AbstractWcmSystemWork.java:40)
at com.ibm.workplace.wcm.services.workmanager.AbstractWcmSystemWork$1.run(AbstractWcmSystemWork.java:92)
at com.ibm.wps.ac.impl.UnrestrictedAccessImpl.run(UnrestrictedAccessImpl.java:84)
at com.ibm.wps.command.ac.ExecuteUnrestrictedCommand.execute(ExecuteUnrestrictedCommand.java:90)
at com.ibm.workplace.wcm.services.repository.PACServiceImpl.runAsPrivileged(PACServiceImpl.java:1878)
at com.ibm.workplace.wcm.services.workmanager.AbstractWcmSystemWork.runImpl(AbstractWcmSystemWork.java:87)
at com.ibm.workplace.wcm.services.workmanager.AbstractWcmWork.run(AbstractWcmWork.java:146)
at com.ibm.wps.services.workmanager.impl.WasWorkWrapper.run(WasWorkWrapper.java:44)
at com.ibm.ws.asynchbeans.J2EEContext$RunProxy.run(J2EEContext.java:271)
at java.security.AccessController.doPrivileged(AccessController.java:274)
at com.ibm.ws.asynchbeans.J2EEContext.run(J2EEContext.java:797)
at com.ibm.ws.asynchbeans.WorkWithExecutionContextImpl.go(WorkWithExecutionContextImpl.java:222)
at com.ibm.ws.asynchbeans.ABWorkItemImpl.run(ABWorkItemImpl.java:206)
at java.lang.Thread.run(Thread.java:804)
[4/25/17 9:33:53:227 IDT] 00004163 syndication I Syndication Summary - Subscriber
Syndicator: IntShared_Syn, URL=http://'Was_Server':10039/wps/wcm/connect?MOD=Synd
Subscriber: IntShared_Sub, URL=http://'Was_Server':10039/wps/wcm/connect/'VP_URL_Context'?MOD=Subs
Status: FAILED
Failure Detail: Update failed on subscriber
Unexpected exception thrown while updating subscription: [IceId: Current State: ], exception: com.ibm.workplace.wcm.services.WCMServiceRuntimeException: code: 400
Update Type: REBUILD
Start Date: Tue Apr 25 09:33:53 IDT 2017
Finished Date: Tue Apr 25 09:33:53 IDT 2017
Duration:
Total: 0
Total Failed: 0
[4/25/17 9:33:54:613 IDT] 00000136 syndication I Syndication Summary - Syndicator
Syndicator: IntShared_Syn, URL=http://'Was_Server':10039/wps/wcm/connect?MOD=Synd
Subscriber: IntShared_Sub, URL=http://'VP_HostName':10039/wps/wcm/connect?MOD=Subs
Status: FAILED
Failure Detail: Terminated without confirmation
Returned non-confirmed response: Not confirmed. Unable to contact subscriber. Check the subscriber to ensure it is active and error free. Also review your network connections and your syndication configuration to ensure the subscriber details are correct.
Update Type: REBUILD
Start Date: Tue Apr 25 09:33:53 IDT 2017
Finished Date: Tue Apr 25 09:33:54 IDT 2017
Duration: 1 second
Total: 0
Total Failed: 0
WCM syndication requires HTTP Basic Authentication to be configured and working.
I then needed to make sure that trust association is enabled in the WAS console under Security -> Global Security -> Web and SIP security -> Trust association.
I confirmed that the Enable trust association box is checked.
I also ensured that the interceptor com.ibm.portal.auth.tai.HTTPBasicAuthTAI is created and that its configuration is correct.
The cause of the error was that the urlBlackList and urlWhiteList fields used the variable ${WpsContextRootPath}, which I found is not set anywhere, so I changed it to /wps. The fields are now as follows:
urlBlackList = /wps/myportal*
urlWhiteList = /wps/mycontenthandler*
After restarting the server and retrying syndication, it works!
You may also follow the directions in this link:
https://developer.ibm.com/answers/questions/206675/why-do-i-see-occasionally-see-a-popup-box-with-a-t.html
but setting these parameters disabled the servlet for viewing all items in the libraries.
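A quick way to catch this class of problem is to scan the interceptor's custom properties for placeholders that were never substituted. The sketch below is plain Python working on a dictionary you fill in by hand from the WAS console. It does not call any WebSphere API, the function name is hypothetical, and the "before" values are assumed for illustration since the original field contents were not posted in full.

```python
import re

# Matches ${Name} style placeholders such as ${WpsContextRootPath}.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")

def unresolved_placeholders(properties):
    """Map each property name to the placeholder names still present in its value."""
    found = {}
    for name, value in properties.items():
        names = PLACEHOLDER.findall(value)
        if names:
            found[name] = names
    return found

# Assumed shape of the values before the fix (not taken from the post).
before = {
    "urlBlackList": "/${WpsContextRootPath}/myportal*",
    "urlWhiteList": "/${WpsContextRootPath}/mycontenthandler*",
}
# Values after the fix, as posted above.
after = {
    "urlBlackList": "/wps/myportal*",
    "urlWhiteList": "/wps/mycontenthandler*",
}

print(unresolved_placeholders(before))
print(unresolved_placeholders(after))
```

A placeholder is only a problem when nothing defines it, so any hit still needs to be checked against the variables actually configured in the cell.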
You can try using the IP address instead of the hostname, or try adding the VP context to the syndicator/subscriber URLs.
I am using the Cumulocity java agent (7.38.0) and it apparently lost communication with the server somehow and never recovered. The admin interface says:
LAST COMMUNICATION
November 22, 2016 2:25 AM
and the last Cumulocity record in the device syslog was:
Nov 22 01:25:47 localhost root: 01:25:47.166 [CumulocityLongPollingTransport-scheduler-2] WARN c.c.s.c.n.ConnectionHeartBeatWatcher - canceling the long poll request because of inactivity
(There was a 1-hour time difference due to a device configuration problem.)
The process still appears to be running:
ps -ef | grep -i c8y
root 1341 1257 0 Nov19 ? 00:00:00 /bin/sh ./c8y-agent.sh
root 1342 1341 0 Nov19 ? 00:00:00 /bin/sh ./c8y-agent.sh
root 1344 1342 0 Nov19 ? 00:25:39 java -cp cfg/*:lib/* -Dlogback.configurationFile=cfg/logback.xml c8y.lx.agent.Agent
Has anyone seen this problem before?
We had it once or twice when people were connecting to Cumulocity through a firewall or VPN. The result was exactly as you described: the polling gets stuck after some time, as if connections were being blocked. In other words, I would suspect that a proxy is blocking the reconnect.
I have had a node web service running successfully on an aws ubuntu server for over a month, with the requests cached using redis.
Yesterday I started getting the following error from some of my routes:
MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.
I was able to stop the error occurring by using:
config set stop-writes-on-bgsave-error no
as suggested in the answers to this question, but it doesn't actually solve the underlying problem.
To find the underlying problem I checked the logs and found the following had started happening:
[1105] 09 Aug 13:17:14.800 - 0 clients connected (0 slaves), 797680 bytes in use
[1105] 09 Aug 13:17:15.101 * 1 changes in 900 seconds. Saving...
[1105] 09 Aug 13:17:15.101 * Background saving started by pid 28090
[28090] 09 Aug 13:17:15.101 # Failed opening .rdb for saving: Permission denied
[1105] 09 Aug 13:17:15.201 # Background saving error
Over the weekend no one had been using the server, but before the weekend the logs were fine, and we were getting no errors:
[12521] 06 Aug 04:49:27.308 - 0 clients connected (0 slaves), 803352 bytes in use
[12521] 06 Aug 04:49:29.012 * 1 changes in 900 seconds. Saving...
[12521] 06 Aug 04:49:29.012 * Background saving started by pid 26663
[26663] 06 Aug 04:49:29.014 * DB saved on disk
[26663] 06 Aug 04:49:29.014 * RDB: 2 MB of memory used by copy-on-write
[12521] 06 Aug 04:49:29.112 * Background saving terminated with success
As I said, no one has touched this server in the intervening time.
Looking around for people having the same problem I found this question. I checked the ownership and permissions on the directory and db file as suggested in the answers there:
drwxr-xr-x 2 redis redis 26 Aug 6 06:55 redis
-rw-r--r-- 1 redis redis 18 Aug 6 06:55 dump-6379.rdb
The permissions and ownership both look ok to me, but I have noticed that the date on the file and folder is between the last time I saw the service working and the first time it failed. Unfortunately that hasn't really helped me with what to do next and I am at a bit of a loss.
I am looking for suggestions for next steps to find the cause of the problem, or at least a way of making redis able to write again.
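One next step is to confirm where Redis is actually trying to write, since the directory inspected above may not be the one the running server uses. `redis-cli config get dir` and `redis-cli config get dbfilename` report the live values. The server PID also differs between the two log excerpts (12521 before, 1105 after), which suggests Redis was restarted in between, possibly with a different working directory or user. The Python sketch below, with a hypothetical helper name, checks whether the current user can create files in a given directory. Run it as the user Redis runs as, against the directory Redis reports.

```python
import os
import tempfile

def check_rdb_dir(directory):
    """Return a list of problems that would stop the current user saving an RDB file here."""
    problems = []
    if not os.path.isdir(directory):
        problems.append(f"{directory} is not a directory")
        return problems
    # Redis writes a temporary file in the target directory and renames it,
    # so the directory itself must be writable and searchable.
    if not os.access(directory, os.W_OK | os.X_OK):
        problems.append(f"{directory} is not writable by uid {os.getuid()}")
    return problems

# Demonstration against the system temp directory; substitute the value
# reported by CONFIG GET dir on the affected server.
result = check_rdb_dir(tempfile.gettempdir())
print(result)
```

An empty list means the directory permissions are not the obstacle for that user, which would point toward a different `dir` value or a different runtime user than expected.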
I want to use monit to kill a process that uses more than X% CPU for more than N seconds.
I'm using stress to generate load to try a simple example.
My .monitrc:
check process stress
matching "stress.*"
if cpu usage > 95% for 2 cycles then stop
I start monit (I checked syntax with monit -t .monitrc):
monit -c .monitrc -d 5
And I launch stress:
stress --cpu 1 --timeout 60
Stress shows in top as using 100 %CPU.
I'd expect monit to kill stress in about 10 seconds, but stress completes successfully. What am I doing wrong?
I also tried monit procmatch "stress.*", which shows two matches for some reason. Maybe that's relevant?
List of processes matching pattern "stress.*":
stress --cpu 1 --timeout 60
stress --cpu 1 --timeout 60
Total matches: 2
WARNING: multiple processes matched the pattern. The check is FIRST-MATCH based, please refine the pattern
EDIT: Tried e.lopez's method
I had to remove the start statement from .monitrc because it was causing an error in monit ('stress' failed to start (exit status -1) -- Program /usr/bin/stress timed out) and then a zombie process.
So I launched stress manually:
stress -c 1
stress: info: [8504] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
The .monitrc:
set daemon 5
check process stress
matching "stress.*"
stop program = "/usr/bin/pkill stress"
if cpu > 5% for 2 cycles then stop
Launched monit:
monit -Iv -c .monitrc
Starting Monit 5.11 daemon
'xps13' Monit started
'stress' process is running with pid 8504
'stress' zombie check succeeded [status_flag=0000]
'stress' cpu usage check skipped (initializing)
'stress'
'stress' process is running with pid 8504
'stress' zombie check succeeded [status_flag=0000]
'stress' cpu usage check succeeded [current cpu usage=0.0%]
'stress' process is running with pid 8504
'stress' zombie check succeeded [status_flag=0000]
'stress' cpu usage check succeeded [current cpu usage=0.0%]
'stress' process is not running
'stress' trying to restart
'stress' start skipped -- method not defined
Monit sees the right process (pids match), but sees 0% usage (stress is using 1 cpu at 100% per top). I killed stress manually, which is when monit says the process is not running (at the end, above). So, monit is monitoring the process fine, but isn't seeing the right cpu usage.
Any ideas?
Note that if your system has many cores, loading just one of them (--cpu 1) will not load the whole system. In my tests with an i7 processor, stressing one CPU to 95% only brought the total system load to 12.5%.
Depending on the number of cores, you might want to use:
stress -c X
where X is the number of cores you want to stress.
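The 12.5% figure follows from simple arithmetic if the CPU percentage is scaled against all cores, as the i7 test suggests: one fully busy core out of eight is 100 / 8 = 12.5% of the machine. A small Python sketch of that relationship follows. The function name is illustrative, and the count of eight cores is an assumption consistent with the 12.5% reading.

```python
def system_wide_percent(per_core_percent, busy_cores, total_cores):
    """CPU share of the whole machine when some cores run at a given load."""
    if total_cores <= 0 or not 0 <= busy_cores <= total_cores:
        raise ValueError("invalid core counts")
    return per_core_percent * busy_cores / total_cores

# One core fully busy on a CPU with eight hardware threads.
print(system_wide_percent(100.0, 1, 8))   # 12.5
# All eight busy reaches the full 100%.
print(system_wide_percent(100.0, 8, 8))   # 100.0
```

With a threshold of 95% and a single worker, the check could never trigger on such a machine under this scaling, which is why the thresholds need to be chosen with the core count in mind.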
But this is not your main issue. Your problem is that you do not provide monit with a stop instruction for the stress program. Look at this:
check process stress
matching "stress.*"
start program = "/usr/bin/stress -c 1" with timeout 10 seconds
stop program = "/usr/bin/pkill stress"
if cpu > 5% for 2 cycles then stop
You are missing at least the "stop" line, where you define the command which will be used by monit to actually stop the process. As stress is not a service, you might want to use the pkill instruction in order to kill the process.
I tested the above configuration successfully. Output of the monit.log:
[CET Nov 5 09:03:02] info : 'stress' start action done
[CET Nov 5 09:03:02] info : 'Overlord' start action done
[CET Nov 5 09:03:12] info : Awakened by User defined signal 1
[CET Nov 5 09:03:22] error : 'stress' cpu usage of 12.5% matches resource limit [cpu usage<5.0%]
[CET Nov 5 09:03:32] error : 'stress' cpu usage of 12.4% matches resource limit [cpu usage<5.0%]
[CET Nov 5 09:03:32] info : 'stress' stop: /usr/bin/pkill
So: assuming you just want to test, and the exact CPU usage is therefore not relevant, use the config I provided above. Once you are sure your config works, adjust the resource limits for the processes you would like to monitor in a production environment.
Always have at hand: https://mmonit.com/monit/documentation/
Hope it helps.
Regards
I think the reason you're seeing 0% CPU is that stress -c 1 creates two processes: one "worker" process that generates the load, and a second, mostly idle background process (open htop and filter for stress to see the second process).
If a regex matches more than one process, monit will pick the process with the longest uptime (check the monit docs). For me, the background process always had a longer uptime than the "worker" process.
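To illustrate why the CPU check can read 0% under this explanation, here is a small Python model of the selection rule as described here (the longest uptime wins). It is not monit's code. PID 8504 is the one monit reported in the question, while the second PID, the uptimes and the CPU figures are made up for the example.

```python
import re

def pick_monitored(procs, pattern):
    """Pick the matching process with the longest uptime (the rule described in this answer)."""
    matches = [p for p in procs if re.search(pattern, p["cmdline"])]
    if not matches:
        return None
    return max(matches, key=lambda p: p["uptime_s"])

procs = [
    # Mostly idle background process, started first.
    {"pid": 8504, "cmdline": "stress -c 1", "uptime_s": 31, "cpu": 0.0},
    # Worker process that generates the load (hypothetical PID).
    {"pid": 8505, "cmdline": "stress -c 1", "uptime_s": 30, "cpu": 100.0},
]

chosen = pick_monitored(procs, r"stress.*")
print(chosen["pid"], chosen["cpu"])
```

Both entries match the pattern, so the idle one is selected and its CPU usage is what gets compared against the threshold.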
You can mitigate this by using stress-ng. Here the "worker" process has a distinct name so there is no ambiguity when matching.
stress-ng -c 1
works with the following .monitrc file
set daemon 5
check process stress
matching "stress-ng-cpu"
stop program = "/usr/bin/pkill stress-ng"
if cpu > 5% for 2 cycles then stop
I have installed OpenShift Origin V3 on AWS EC2 (Fedora 19) using oo-install. The setup is one broker plus one node.
I was making some modifications to the security groups to make them more restrictive,
and this ended up causing some issues in the mongo service.
1. service mongod does not start up and the status shows failed.
The /var/log/mongodb/mongodb.log says
Thu Mar 6 11:24:08.189 [initandlisten] ERROR: listen(): bind() failed errno:99 Cannot assign requested address for socket: :27017
Thu Mar 6 11:24:08.189 [initandlisten] now exiting
Running oo-accept-broker -v says
FAIL: error logging into mongo db: MOPED: Retrying connection to primary for replica set :27017">]>: MOPED: Retrying connection to primary for replica set :27017">]>/MOPED: --username Retrying, exit code: 1
Any pointers on how to resolve this will be greatly appreciated.
Thanks
Shabna
I would try rolling back your changes to the security groups first and then make the changes one by one and see which one causes the issue, then post that to stack and see if anyone can comment on the specific change that is affecting mongodb.
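It may also help to narrow down the bind failure itself. errno 99 is `EADDRNOTAVAIL` on Linux, meaning the address mongod was told to bind to is not assigned to any local interface. On EC2 this can happen, for instance, when a service is configured with an address that is no longer present on the instance; whether that applies here is an assumption. Below is a small Python sketch to check whether a given address can be bound on the host (the helper name is hypothetical):

```python
import errno
import socket

def try_bind(host, port):
    """Attempt a TCP bind and return (ok, error_name)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True, None
    except OSError as exc:
        # Translate the numeric errno into its symbolic name where known.
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()

# Port 0 asks the kernel for any free port, so this only tests the address.
print(try_bind("127.0.0.1", 0))
```

Running it with the address from mongod's bind_ip setting and port 27017 while mongod is stopped shows whether the address is usable. An `EADDRNOTAVAIL` result would point at the configured bind address or hostname resolution on the instance.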