Could not create file store directory exception in grinder - jython

I am trying to set up an environment consisting of 10 agents, with 1 process per agent and 10 threads per process.
The issue occurs when I open 10 terminals, start all 10 agents, and then click "start worker processes" in the Grinder console. All but one or two of the agents exit automatically with the exception below. I understand this happens because they all start simultaneously, but is there anything I can do to make it work?
Here is the exception:
2013-10-31 14:36:41,469 INFO agent: communication shut down
2013-10-31 14:36:41,475 ERROR agent: Could not create file store directory
net.grinder.engine.agent.FileStore$FileStoreException: Could not create file store directory
at net.grinder.engine.agent.FileStore.getDirectory(FileStore.java:111) ~[grinder-core-3.11.jar:na]
at net.grinder.engine.agent.AgentImplementation.run(AgentImplementation.java:188) ~[grinder-core-3.11.jar:na]
at net.grinder.Grinder.run(Grinder.java:124) ~[grinder-core-3.11.jar:na]
at net.grinder.Grinder.main(Grinder.java:67) ~[grinder-core-3.11.jar:na]
Caused by: net.grinder.util.Directory$DirectoryException: Could not delete 'C:\Users\reddys.ADS\.\HYDCNU304BWC0-file-store\current\a360utility$py.class'
at net.grinder.util.Directory.deleteContents(Directory.java:257) ~[grinder-core-3.11.jar:na]
at net.grinder.util.Directory.copyTo(Directory.java:473) ~[grinder-core-3.11.jar:na]
at net.grinder.engine.agent.FileStore.getDirectory(FileStore.java:101) ~[grinder-core-3.11.jar:na]
... 3 common frames omitted

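Try the following settings in grinder.properties:
grinder.processes = 10
grinder.threads = 10
Then start just one agent using the startAgent.cmd or .sh script. A single agent launches all 10 worker processes itself, so several agents on the same machine are no longer competing to delete and recreate the same file store directory.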

Found error 'CRASH REPORT Process' on RabbitMQ every 10 minutes

I see the following error on RabbitMQ every 10 minutes. Please help me investigate this problem.
Error message:
2021-09-09 13:25:30.084 [error] <0.14464.32> CRASH REPORT Process <0.14464.32> with 0 neighbours crashed with reason: no function clause matching rabbit_mgmt_wm_node:find_type(rabbit@controller1, []) line 79
2021-09-09 13:25:30.085 [error] <0.14457.32> Ranch listener rabbit_web_dispatch_sup_15672, connection process <0.14457.32>, stream 1 had its request process <0.14464.32> exit with reason function_clause and stacktrace [{rabbit_mgmt_wm_no
I had the same issue with Zabbix monitoring the RabbitMQ server every minute, which generated a crash error at the same frequency.
The URL used by Zabbix included a domain part in the node name, i.e. rabbit@my_host.zzz.aws instead of the actual node name as displayed by the console, rabbit@my_host. This explains why rabbit_mgmt_wm_node:find_type failed and crashed.
I verified this using curl:
curl -v -u user:passwd 'http://127.0.0.1:15672/api/nodes/rabbit@my_host?memory=true'
This returned a valid response, HTTP/1.1 200 OK, when the node name matched, and produced the crash error when it did not.
Please refer to this thread:
https://groups.google.com/g/rabbitmq-users/c/N0EgrLn55XQ
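A minimal Python sketch of the same idea, for anyone generating the monitoring URL from a host name. It assumes node names have the usual rabbit@host form and that the broker uses short host names, which is what the console showed in this case; the helper names are hypothetical. The authoritative node name is always the one the management console displays.

```python
from urllib.parse import quote


def short_node_name(node: str) -> str:
    """Drop any domain part from the host half of a node name.

    Assumption: the broker was started with short host names, so
    'rabbit@my_host.zzz.aws' should be queried as 'rabbit@my_host'.
    """
    prefix, sep, host = node.partition("@")
    if not sep:
        return node  # no host part, nothing to strip
    return f"{prefix}@{host.split('.', 1)[0]}"


def node_api_url(base_url: str, node: str) -> str:
    """Build the per-node management API URL, percent-encoding the node name."""
    return f"{base_url.rstrip('/')}/api/nodes/{quote(node, safe='')}?memory=true"


if __name__ == "__main__":
    name = short_node_name("rabbit@my_host.zzz.aws")
    print(node_api_url("http://127.0.0.1:15672", name))
```

Percent-encoding the name keeps characters such as @ from being interpreted by the HTTP client or an intermediate proxy.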

Datanode error : NameSystem.getDatanode

Help please, folks.
I am trying to set up a Hadoop multi-node environment (1 master, 1 secondary and 3 slaves; Hadoop 2.7.1 on Ubuntu 14 on AWS) and I am getting a "NameSystem.getDatanode" ERROR message. I have searched, read and tried various things, but I have reached my limits. Could you at least point me in some direction?
Logs (extract) from the master; xxx-141/142/143 are the IPs of the slaves:
'''''''''''''''''''''''''''''''''''
Line 134: 2016-01-23 17:36:19,432 ERROR org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node DatanodeRegistration(XXX.XX.XX.143:50010, datanodeUuid=6826238d-9213-4b19-a6eb-13115e3bea8d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-57295bbd-e78e-4265-99f7-fdacccbcb33a;nsid=1674724909;c=0) is attempting to report storage ID 6826238d-9213-4b19-a6eb-13115e3bea8d. Node 172.31.22.141:50010 is expected to serve this storage.
Line 135: 2016-01-23 17:36:19,457 ERROR org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node DatanodeRegistration(XXX.XX.XX.142:50010, datanodeUuid=6826238d-9213-4b19-a6eb-13115e3bea8d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-57295bbd-e78e-4265-99f7-fdacccbcb33a;nsid=1674724909;c=0) is attempting to report storage ID 6826238d-9213-4b19-a6eb-13115e3bea8d. Node 172.31.22.141:50010 is expected to serve this storage.
Line 159: 2016-01-23 17:36:20,988 ERROR org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node DatanodeRegistration(XXX.XX.XX.141:50010, datanodeUuid=6826238d-9213-4b19-a6eb-13115e3bea8d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-57295bbd-e78e-4265-99f7-fdacccbcb33a;nsid=1674724909;c=0) is attempting to report storage ID 6826238d-9213-4b19-a6eb-13115e3bea8d. Node XXX.XX.XX.143:50010 is expected to serve this storage.
Extract From SLAVE2 SERVER logs
'''''''''''''''''''''''''''''''''''''
2016-01-23 17:36:14,812 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
`2016-01-23 17:36:18,607 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Unsuccessfully sent block report 0x3c90bbfe60c, containing 1 storage report(s), of which we sent 0. The reports had 1 total blocks and used 0 RPC(s). This took 4 msec to generate and 144 msecs for RPC and NN processing. Got back no commands.
2016-01-23 17:36:18,608 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-1050309752-MAST.XX.XX.169-1453113991010 (Datanode Uuid 6826238d-9213-4b19-a6eb-13115e3bea8d) service to master/MAST.XX.XX.169:9000 is shutting down
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.UnregisteredNodeException): Data node DatanodeRegistration(1XX.XX.XX.142:50010, datanodeUuid=6826238d-9213-4b19-a6eb-13115e3bea8d, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-57295bbd-e78e-4265-99f7-fdacccbcb33a;nsid=1674724909;c=0) is attempting to report storage ID 6826238d-9213-4b19-a6eb-13115e3bea8d. Node 1XX.XX.XX.141:50010 is expected to serve this storage.
at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanode(DatanodeManager.java:495)
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1791)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1315)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:163)
at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28543)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
at org.apache.hadoop.ipc.Client.call(Client.java:1476)
at org.apache.hadoop.ipc.Client.call(Client.java:1407)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy13.blockReport(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:199)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:463)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:688)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:823)
at java.lang.Thread.run(Thread.java:745)
2016-01-23 17:36:18,610 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool BP-1050309752-MAST.XX.XX.169-1453113991010 (Datanode Uuid 6826238d-9213-4b19-a6eb-13115e3bea8d) service to master/MAST.XX.XX.169:9000
2016-01-23 17:36:18,611 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool BP-1050309752-MAST.XX.XX.169-1453113991010 (Datanode Uuid 6826238d-9213-4b19-a6eb-13115e3bea8d)
2016-01-23 17:36:18,611 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing block pool BP-1050309752-MAST.XX.XX.169-1453113991010
2016-01-23 17:36:20,611 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2016-01-23 17:36:20,613 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2016-01-23 17:36:20,614 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at ip-SLAV-XX-XX-142/1XX.XX.XX.142
************************************************************/`
It looks like you have three slaves:
172.31.22.141:50010
172.31.22.142:50010
172.31.22.143:50010
and that you created two of them by cloning the first slave after it had already joined the cluster.
The two clones therefore already have a copy of the DFS directory and use the same storage ID as the first slave. The NameNode expects each storage ID to be served by only one DataNode, and it is trying to tell you this by logging:
[...] is attempting to report storage ID [...].
Node [...]:50010 is expected to serve this storage.
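You can try removing the dfs directory on two of the slaves, then restarting them.
That is, stop the DataNodes on those slaves and run rm -rf on the dfs directory (the path below is an example; use whatever your dfs.datanode.data.dir or hadoop.tmp.dir setting points to):
rm -rf /tmp/hadoop-hadoop/dfs/
Each DataNode then generates a fresh storage ID when it starts. You can then restart and check that all slaves connect, and test file replication, e.g. by setting the replication level of some files to 3, one replica per slave (with three DataNodes a higher target can never be reached, so -w would wait indefinitely):
hdfs dfs -setrep -R -w 3 /user/somedir
The -w option causes the command to wait until replication has succeeded.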
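To confirm the diagnosis before deleting anything, you can scan the NameNode log for a datanodeUuid that is reported from more than one address. This is a rough Python sketch, assuming log lines in the DatanodeRegistration(...) format quoted in the question; the function name and the sample addresses in the demo are made up for illustration.

```python
import re
from collections import defaultdict

# Matches the address and UUID inside "DatanodeRegistration(addr, datanodeUuid=...".
REGISTRATION = re.compile(
    r"DatanodeRegistration\((?P<addr>[^,]+), datanodeUuid=(?P<uuid>[0-9a-f-]+)"
)


def duplicate_uuids(log_lines):
    """Return {uuid: [addresses]} for every UUID seen from more than one address."""
    seen = defaultdict(set)
    for line in log_lines:
        match = REGISTRATION.search(line)
        if match:
            seen[match.group("uuid")].add(match.group("addr"))
    return {uuid: sorted(addrs) for uuid, addrs in seen.items() if len(addrs) > 1}


if __name__ == "__main__":
    # Hypothetical sample lines; in practice, read the NameNode log file instead.
    sample = [
        "ERROR ... Data node DatanodeRegistration(10.0.0.141:50010, "
        "datanodeUuid=6826238d-9213-4b19-a6eb-13115e3bea8d, infoPort=50075) ...",
        "ERROR ... Data node DatanodeRegistration(10.0.0.142:50010, "
        "datanodeUuid=6826238d-9213-4b19-a6eb-13115e3bea8d, infoPort=50075) ...",
    ]
    for uuid, addrs in duplicate_uuids(sample).items():
        print(uuid, "is reported by", ", ".join(addrs))
```

Any UUID listed by this check is shared between machines, which is the situation the NameNode is rejecting.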

WSO2 Samples failing

I'm working on setting up and understanding WSO2 ESB, and I was going through the samples and setup.
I was looking at this sample, but every one of the first four that I tried failed:
http://docs.wso2.org/wiki/display/ESB451/Sample+3%3A+Local+Registry+Entry+Definitions%2C+Reusable+Endpoints+and+Sequences
I start the ESB and that works fine (the Management Console is running). I can build the SimpleStockQuoteService and start the sample Axis2 server. I can open the WSDL from the browser, so that part looks fine.
When I run the client code from the command line:
ant stockquote -Daddurl=http://localhost:9000/services/SimpleStockQuoteService -Dtrpurl=http://localhost:8280/
The request reaches the Axis2 server (I can see it in the logs: "Mon Mar 11 16:53:37 CET 2013 samples.services.SimpleStockQuoteService :: Generating quote for : IBM") and it reaches the ESB (I can see it in the log, too), but when the ESB tries to forward or pass on the message, the connection is suddenly dropped. This is what I see in the log:
[2013-03-11 16:53:37,701] INFO - LogMediator Text = Sending quote request, version = 0.1, direction = incoming
[2013-03-11 16:53:37,830] ERROR - SourceHandler I/O error: A l├®tezo kapcsolatot a t├ívoli ├íllom├ís k├®nyszer├¡tetten bez├írta
java.io.IOException: A l├®tezo kapcsolatot a t├ívoli ├íllom├ís k├®nyszer├¡tetten bez├írta
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:25)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
at sun.nio.ch.IOUtil.read(IOUtil.java:206)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
at org.apache.http.impl.nio.reactor.SessionInputBufferImpl.fill(SessionInputBufferImpl.java:93)
at org.apache.http.impl.nio.codecs.AbstractMessageParser.fillBuffer(AbstractMessageParser.java:113)
at org.apache.http.impl.nio.DefaultNHttpServerConnection.consumeInput(DefaultNHttpServerConnection.java:150)
at org.apache.http.impl.nio.DefaultServerIOEventDispatch.inputReady(DefaultServerIOEventDispatch.java:154)
at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:158)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:340)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:318)
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:278)
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:542)
at java.lang.Thread.run(Thread.java:619)
The error text in the log is badly encoded Hungarian ("A létező kapcsolatot a távoli állomás kényszerítetten bezárta"); it means "An existing connection was forcibly closed by the remote host".
I don't really know what is going wrong. Any ideas?
I'm on Windows 7 and have downloaded the latest WSO2 ESB (wso2esb-4.6.0.zip).

DCOM and OpenMPI

I ran DCOMCNFG and granted my local logon both launch and remote access permissions on each node. OpenMPI_v1.6.1-x64 is installed on the root and remote machines, and I have specified the path of the .exe on the target node. But when I run the .exe from the root node with mpirun, I get the following error:
D:\x64\Release>mpirun -np 2 -hostfile myhostfile.txt MPISample.exe
connecting to n1234
username:toney.mathew
password:********
Save Credential?(Y/N) n
[n1205:04420] Could not connect to namespace cimv2 on node n1234. Error code =-2147023174
--------------------------------------------------------------------------
mpirun was unable to start the specified application as it encountered an error.
More information may be available above.
--------------------------------------------------------------------------
[n1205:04420] [[28225,0],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ..\..\..\openmpi-1.6.1\orte\mca\rml\oob\rml_oob_send.c at line 145
[n1205:04420] [[28225,0],0] attempted to send to [[28225,0],1]: tag 1
[n1205:04420] [[28225,0],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ..\..\..\openmpi-1.6.1\orte\orted\orted_comm.c at line 126
To be more specific, I am using Windows 7 64-bit on both nodes, with the same user logged in.
I turned off the firewall, and it worked smoothly.
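The error code in the mpirun output fits that outcome. It is a signed 32-bit HRESULT, and decoding it gives Win32 error 1722, "The RPC server is unavailable", which is what you would expect when a firewall blocks the WMI/DCOM connection to the remote node. A small Python sketch of the decoding (the helper name is just for illustration):

```python
def decode_hresult(signed_code: int):
    """Split a signed 32-bit HRESULT into (hex string, facility, error code)."""
    unsigned = signed_code & 0xFFFFFFFF   # reinterpret as unsigned 32-bit
    facility = (unsigned >> 16) & 0x1FFF  # 7 means FACILITY_WIN32
    code = unsigned & 0xFFFF              # underlying Win32 error number
    return f"0x{unsigned:08X}", facility, code


if __name__ == "__main__":
    print(decode_hresult(-2147023174))
```

With facility 7 the low 16 bits are a plain Win32 error number, which can be looked up with `net helpmsg 1722` on Windows.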

How do I use Nagios to monitor a log file

We are using Nagios to monitor our network with great success. However, we have a syslog for critical application errors, and while I set up check_log, it doesn't seem to work as well as monitoring a device.
The issues are:
It only shows the last entry
There doesn't seem to be a way to acknowledge the critical error and return the monitor to a good state
Is Nagios the wrong tool, or are we just not setting up the service monitoring right?
Here are my entries:
# log file
define command{
    command_name    check_log
    command_line    $USER1$/check_log -F /var/log/applications/appcrit.log -O /tmp/appcrit.log -q ?
}
# Define the log monitoring service
define service{
    name                    logfile-check ;
    use                     generic-service ;
    check_period            24x7 ;
    max_check_attempts      1 ;
    normal_check_interval   5 ;
    retry_check_interval    1 ;
    contact_groups          admins ;
    notification_options    w,u,c,r ;
    notification_period     24x7 ;
    register                0 ;
}
define service{
    use                     logfile-check
    host_name               localhost
    service_description     CritLogFile
    check_command           check_log
}
For monitoring logs with Nagios, typically the log checker will return a warning only for newly discovered error messages each time it is invoked (so it must retain some state in order to know to ignore them on subsequent runs). Therefore I usually set:
max_check_attempts 1
is_volatile 1
This causes Nagios to send out the alert immediately, but only once, and then go back to normal.
My favorite log checker is logwarn, but I'm biased because I wrote it myself after not finding any existing ones that I liked. The logwarn package includes a Nagios plugin.
Nothing in your config jumps out at me as being misconfigured.
By design, check_log will only show either an OK message, or the last log entry that triggered an alert. If you need to see multiple entries, you'll need to modify the plugin.
However, I find it somewhat odd that you're not getting recoveries. Given the way check_log works (by comparing the current log to the previous version), you should get a recovery on the very next service check, unless additional matching entries have been added to the log since the last check.
Does forcing another service check (or several) cause it to recover?
Also, I don't intend this in a mean way, but make sure it's really malfunctioning.
Is your log getting additional matching entries in between checks, causing it not to recover? Your check is matching "?" which will match anything new in the log. Is something else (a non-error) being added to the log and inadvertently causing a match?
If none of the above are the issue, I would suggest narrowing it down by taking Nagios out of the equation. Try running check_log manually (from the command line, but as the same user Nagios runs as), and with a different oldlog. It should go something like this:
run check with a new "oldlog" - get initialization message
run check - check OK
make change to log
run check - check fails
run check - check OK
If this doesn't work, then you know to focus on the log, the oldlog, and how the check_log is doing the check.
If it works, then it points more towards a problem with your nagios configuration.
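To make the expected sequence concrete, here is a simplified Python model of a check that compares the current log with a saved copy. It is not the real check_log plugin, which is a shell script built around diff; the function name, messages and the assumption that the log only ever grows are mine.

```python
import re
import shutil
import tempfile
from pathlib import Path

OK, CRITICAL = 0, 2  # Nagios plugin exit codes


def check_log(logfile: Path, oldlog: Path, pattern: str):
    """Return (status, message) based on lines added since the previous run."""
    current = logfile.read_text().splitlines()
    if not oldlog.exists():
        shutil.copyfile(logfile, oldlog)
        return OK, "Log check data initialized"
    previous = oldlog.read_text().splitlines()
    # Simplification: assume the log only grows, so new lines are the tail.
    new_lines = current[len(previous):]
    shutil.copyfile(logfile, oldlog)  # remember this state for the next run
    matches = [line for line in new_lines if re.search(pattern, line)]
    if matches:
        return CRITICAL, f"({len(matches)}) {matches[-1]}"
    return OK, "Log check ok - 0 pattern matches found"


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    log, old = workdir / "appcrit.log", workdir / "appcrit.old"
    log.write_text("service started\n")
    print(check_log(log, old, "ERROR"))  # first run: initialization
    print(check_log(log, old, "ERROR"))  # no change: OK
    with log.open("a") as fh:
        fh.write("ERROR disk full\n")
    print(check_log(log, old, "ERROR"))  # new matching line: CRITICAL
    print(check_log(log, old, "ERROR"))  # nothing new since last run: OK
```

The model also shows why only the last matching entry is reported, and why a pattern that matches every line keeps the service critical for as long as anything at all is being written to the log.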
There is a Nagios plugin that you can use to check the log files: it's called check_logfiles and it's used to scan the lines of a file for regular expressions.
The following link shows how to install and configure check_logfiles for Nagios and Opsview:
https://www.opsview.com/resources/nagios-alternative/blog/syslog-monitoring-nagios-opsview
As there are many ways to achieve a goal, there is also a nice plugin from Consol available:
https://labs.consol.de/lang/en/nagios/check_logfiles/
It supports regular expressions and log rotation.
To use it, you need a cfg file. This is an example for Oracle databases:
@searches = ({
tag => 'oraalerts',
options => 'sticky=28800',
logfile => '/u01/app/oracle/diag/rdbms/davmdkp/DAVMDKP1/trace/alert_DAVMDKP1.log',
criticalpatterns => [
'ORA\-0*204[^\d]', # error in reading control file
'ORA\-0*206[^\d]', # error in writing control file
'ORA\-0*210[^\d]', # cannot open control file
'ORA\-0*257[^\d]', # archiver is stuck
'ORA\-0*333[^\d]', # redo log read error
'ORA\-0*345[^\d]', # redo log write error
'ORA\-0*4[4-7][0-9][^\d]',# ORA-0440 - ORA-0485 background process failure
'ORA\-0*48[0-5][^\d]',
'ORA\-0*6[0-3][0-9][^\d]',# ORA-0600 - ORA-0639 internal errors
'ORA\-0*1114[^\d]', # datafile I/O write error
'ORA\-0*1115[^\d]', # datafile I/O read error
'ORA\-0*1116[^\d]', # cannot open datafile
'ORA\-0*1118[^\d]', # cannot add a data file
'ORA\-0*1122[^\d]', # database file 16 failed verification check
'ORA\-0*1171[^\d]', # datafile 16 going offline due to error advancing checkpoint
'ORA\-0*1201[^\d]', # file 16 header failed to write correctly
'ORA\-0*1208[^\d]', # data file is an old version - not accessing current version
'ORA\-0*1578[^\d]', # data block corruption
'ORA\-0*1135[^\d]', # file accessed for query is offline
'ORA\-0*1547[^\d]', # tablespace is full
'ORA\-0*1555[^\d]', # snapshot too old
'ORA\-0*1562[^\d]', # failed to extend rollback segment
'ORA\-0*162[89][^\d]', # ORA-1628 - ORA-1632 maximum extents exceeded
'ORA\-0*163[0-2][^\d]',
'ORA\-0*165[0-6][^\d]', # ORA-1650 - ORA-1656 tablespace is full
'ORA\-16014[^\d]', # log cannot be archived, no available destinations
'ORA\-16038[^\d]', # log cannot be archived
'ORA\-19502[^\d]', # write error on datafile
'ORA\-27063[^\d]', # number of bytes read/written is incorrect
'ORA\-0*4031[^\d]', # out of shared memory.
'No space left on device',
'Archival Error',
],
warningpatterns => [
'ORA\-0*3113[^\d]', # end of file on communication channel
'ORA\-0*6501[^\d]', # PL/SQL internal error
'ORA\-0*1140[^\d]', # follows WARNING: datafile #20 was not in online backup mode
'Archival stopped, error occurred. Will continue retrying',
]
});
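The patterns all follow the same shape: optional leading zeros, the error number, then one non-digit character so that a longer number is not matched by accident. check_logfiles evaluates them as Perl regular expressions; the Python sketch below only illustrates the matching behaviour, which is the same for these constructs, and the sample log lines are invented.

```python
import re

# Same shape as the 'snapshot too old' entry in the config above.
SNAPSHOT_TOO_OLD = re.compile(r"ORA\-0*1555[^\d]")


def is_hit(line: str) -> bool:
    """True if the line contains the error number followed by a non-digit."""
    return SNAPSHOT_TOO_OLD.search(line) is not None


if __name__ == "__main__":
    for line in (
        "ORA-01555: snapshot too old",  # zero-padded form
        "ORA-1555: snapshot too old",   # unpadded form
        "ORA-15550: some other error",  # longer number, must not match
        "ORA-01555",                    # nothing after the number
    ):
        print(f"{line!r}: {is_hit(line)}")
```

One consequence of the trailing `[^\d]` is that an error number sitting at the very end of a line, with no character after it, is not matched. Oracle normally writes a colon after the number, so this rarely matters in practice.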
I believe there's now a real Nagios plugin that monitors logs effectively.
http://support.nagios.com/forum/viewtopic.php?f=6&t=8851&p=42088&hilit=unixautomation#p42088
The home page of the Nagios plugin on that page is Nagios Log Monitor
Your commands.cfg file will contain:
define command {
    command_name    NagiosLogMonitor
    command_line    $USER1$/NagiosLogMonitor $HOSTNAME$ $ARG1$ $ARG2$ $ARG3$ $ARG4$ '$ARG5$' '$ARG6$' $ARG7$ $ARG8$ $ARG9$ $ARG10$
}
OR
define command {
    command_name    NagiosLogMonitor
    command_line    $USER1$/NagiosLogMonitor $HOSTADDRESS$ $ARG1$ $ARG2$ $ARG3$ $ARG4$ '$ARG5$' '$ARG6$' $ARG7$ $ARG8$ $ARG9$ $ARG10$
}
Your services.cfg file will look similar to:
define service {
    check_command           NagiosLogMonitor!logrobot!autofig!/var/log/proteus.log!15!500.html!500 Internal Server Error!1!2!-foundn
    max_check_attempts      1
    service_description     500_ERRORS_LOGCHECK
    host_name               sky.blat-01.net,sky.blat-02.net,sky.blat-03.net
    use                     fifteen-minute-interval
}
Nagios now has a solution that integrates tightly with Nagios Core, XI, etc.: Nagios Log Server, which can alert on any query against any log file on any system in your infrastructure.