asadmin list-instances cluster glassfish - glassfish

When i run this command asadmin list-instances i get this result, someone have an ideas what does this mean?
[glassfish#mydas]$ asadmin list-instances
I1 not running [pending config changes are: _deploy /opt/glassfish3/glassfish/domains/D/applications/__internal/admin-ear/admin-ear-13308077918078249404.0.ear; _deploy /opt/glassfish3/glassfish/domains/D/applications/__internal/comptabilite-ear/comptabilite-ear-12940026351961817647.0.ear; _deploy /opt/glassfish3/glassfish/domains/D/applications/__internal/comptabilite-ear/comptabilite-ear-11974752653489746292.0.ear; ]
I2 not running [pending config changes are: _deploy /opt/glassfish3/glassfish/domains/D/applications/__internal/admin-ear/admin-ear-13308077918078249404.0.ear; _deploy /opt/glassfish3/glassfish/domains/D/applications/__internal/comptabilite-ear/comptabilite-ear-12940026351961817647.0.ear; _deploy /opt/glassfish3/glassfish/domains/D/applications/__internal/comptabilite-ear/comptabilite-ear-11974752653489746292.0.ear; ]
Command list-instances executed successfully.
I know that i have two instances of my cluster and not running, but i mean this lines here:
[pending config changes are: _deploy
/opt/glassfish3/glassfish/domains/D/applications/__internal/admin-ear/admin-ear-13308077918078249404.0.ear;
_deploy /opt/glassfish3/glassfish/domains/D/applications/__internal/comptabilite-ear/comptabilite-ear-12940026351961817647.0.ear;
_deploy /opt/glassfish3/glassfish/domains/D/applications/__internal/comptabilite-ear/comptabilite-ear-11974752653489746292.0.ear;
]
I checked this file /opt/glassfish3/glassfish/domains/D/applications/__internal and i removed all the files but i get the same result.
And how can i empty all this to get a clear message like this :
I1 not running
I2 not running
Thank you.

The message means that you did some configuration changes for the instances via domain admin server (DAS), but the instances have not been started since then. That means that the remote instances don't know about those configuration changes and will trigger synchronization from the DAS to apply changes when started. Until they can connect to the DAS, those changes will not be applied.
In your case it seems that you have deployed 3 EARs, and you specified either to deploy them on all targets, or the deploy targets include the 2 instances. Therefore the EARs will be deployed to both instances once the config is synchronized (after you start the instances).
Files in applications/__internal are the files of the EAR applications, removing them only corrupts the applications, but doesn't undeploy them. Undeploy would be triggered only if you deployed the applications by dropping the to the autodeploy directory, but not if you deploy using asadmin or admin console. If you open config/domain.xml file, you should still be able to see references to all the 3 applications somewhere, even after you deleted the application files.
In order to hide the messages in list-instances, you should properly undeploy all the 3 applications to remove them from the configuration, or at least remove both instances from their deployment targets, so that they only remain deployed on the DAS (but that is probably not what you want usually).
If you want the application to be deployed on the instances, you need to start the instances to synchronize the configuration with the DAS.

Try the following:
asadmin start-instance --sync full I1
asadmin start-instance --sync full I2
This should resynchronize your instances with the DAS.
If this doesn't help you can try the following:
asadmin list-instances --long=true
This should list the failed commands in detail. You can connect to the specific instances via SSH and execute the commands manually, this should apply the pending changes. You may have to restart the instances afterwards to make them synchronize the status with the DAS.
See also:
Glassfish Admin Guide - Resynchronizing GlassFish Server Instances and the DAS
Glassfish Admin Guide - list-instances
GLASSFISH-13213 - failed commands should not be listed as pending changes

Related

RabbitMQ SSL Clustering on Windows Breaks rabbitmqctl

I'm trying to follow the steps in the RabbitMQ docs here to get clustering with SSL working on Windows. I'm noticing though that the "rabbitmqctl status" command starts failing after the environment variables defined in those steps are set. I'm getting the following error when executing "rabbitmqctl status":
Error: unable to connect to node 'rabbit#server1': nodedown
I've already configured RabbitMQ to use TLS 1.2 and have verified that it's working. I've ensured that my Erlang 18 cookie is the same in the user directory C:\users\me and C:\Windows on the machine, but the error persists, and is stopping other servers from clustering with it. The docs say that the Windows SSL Cluster setup is "Coming soon"... Here are the steps I've taken so far on server1. I think that Erlang wants forward slashes in the paths - this matches the rabbit.config SSL settings.
Combined the contents of my server\cert.pem and server\key.pem into rabbit.pem via the command "type server\cert.pem server\key.pem > server\rabbit.pem"
Created environment variable ERL_SSL_PATH and set to: "C:/Program
Files/erl7.0/lib/ssl-7.0/ebin"
Created environment variable RABBITMQ_CTL_ERL_ARGS and set to: -pa "%ERL_SSL_PATH%" -proto_dist inet_tls -ssl_dist_opt server_certfile C:/OpenSSL-Win64/server/rabbit.pem -ssl_dist_opt server_secure_renegotiate true client_secure_renegotiate true
Created environment variable RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS and set to same value as RABBITMQ_CTL_ERL_ARGS
Copied the erlang cookie at C:\Windows.erlang.cookie to my local user profile directory.
Restarted rabbit using rabbitmq-service start
At this point, on server1, "rabbitmqctl status" no longer works. Attempts to try to join server2 to server1 result in a "node down" error.
Edit 1: I can't get the initial step in the docs working to ask Erlang to report its SSL directory on Windows in order to set ERL_SSL_PATH correctly. Erlang is installed at C:\Program Files\erl7.0 on my server.
Edit 2: Using werl.exe (at C:\Program Files\erl7.0\bin\werl.exe), I was able to issue a command "Foo=io:format(code:lib_dir(ssl, ebin))." and it reported the path as: c:/Program Files/erl7.0/lib/ssl-7.0/ebin. However, this doesn't seem to be the cause of the this issue since that's already what I was using.
Thanks,
Andy
For environment changes to take effect on Windows, the service must be
re-installed. It is not sufficient to restart the service. This can be
done using the installer or on the command line with administrator
permissions
(source)
This will do:
rabbitmq-service.bat stop
rabbitmq-service.bat remove
rabbitmq-service.bat install
rabbitmq-service.bat start
Also, if while the node you're working on is down, the other cluster nodes were running, their state might be assumed to have gone out of sync. In that case, the node might fail to start up and you might need to:
rabbitmqctl force_boot
Check the logs to confirm. (at %RABBIT_BASE%\log\rabbit#server.log)
Late answer but, hopefully this could help a searcher...

Hosts in Nagios are disappearing

This may belong in ServerFault, but I wanted to approach this community first. If this is not correct, please move this thread or close and I will open on the correct thread.
PROBLEM:
Hosts, along with their associated services, disappear and reappear upon refresh (F5 / Ctrl+F5 / etc).
STEPS TO REPRODUCE:
1. Log into Nagios
2. Click Service Detail
3. See a breakdown of services but you don't see the last one you added.
4. Refresh screen by using F5 / Ctrl+F5 / etc and it doesn't show up still
5. Refresh screen by using F5 / Ctrl+F5 / etc and it doesn't show up still
6. Refresh screen and it will show up.
(!) - Steps 4-6 vary
WHAT I'VE TRIED:
Restarting Nagios service (service Nagios restart)
Restarting HTTPD service (service httpd restart)
Restarting VPS
Refresh browser including "Clear Cache and Hard Reload"
Tried different browsers
Tried different computers
Tried different networks
SCREENSHOTS:
GOOD
https://i.imgur.com/KUW5C6E.png
BAD
https://i.imgur.com/rWFLEaf.png
POSSIBLE CAUSE:
The reason we're in this situation now is because we had an intern add this latest host and its associated service. He added it correctly, and I even checked his work. He did the normal preflight but instead of issuing the reset command via SSH he issued the command on the Web interface itself by accessing "Process Info > Restart the Nagios process". Seems like it would work OK, but we've never restarted like this and is the only reason I suspect it's the culprit of the issue we are seeing. Is there something different that this restart does over the normal SSH restart?
EDIT: To add to all of this, we have updated a different file today, unrelated to this host or it's services and Nagios is not updating.
Thanks for helping!
Rich
EXTRA:
Here is a screenshot of the config file:
https://i.imgur.com/2UsYZcw.png
This can happen if you have multiple Nagios services running, There could be a secondary instance of the service running which hasn't been updated with the new configuration files as it technically hasn't been restarted. I've had this happen once or twice.
First, shut down Nagios
service nagios stop
Next, kill all remaining instances.
killall -9 nagios
Finally, start Nagios back up
service nagios start
That should fix your problem.

ERROR: The overall deployment failed because too many individual instances failed deployment

I'm trying to deploy using CircleCI -> S3 -> CodeDeploy -> EC2.
I was able to upload deploy image onto S3 from CircleCI, but unable to deploy S3 to EC2 instance. Here's the error.
The overall deployment failed because too many individual instances
failed deployment, too few healthy instances are available for
deployment, or some instances in your deployment group are
experiencing problems. (Error code: HEALTH_CONSTRAINTS)
The error was provided from CodeDeploy. I can't figure out why and how.
I'd appreciate if you give some advise.
If you are running on Ubuntu there might be plenty of reasons, here is a checklist can verify
Check code-deploy agent is installed on your EC2 Instance. Please refer this document to install code deploy agent.
https://docs.aws.amazon.com/codedeploy/latest/userguide/codedeploy-agent-operations-install-ubuntu.html
$ sudo service codedeploy-agent status
In case if you are running Ubuntu release 20.x and you get this error
./install:22:in block in method_missing': undefined method path' for
#<IO:> (NoMethodError)
try running the install file via this script
sudo ./install auto > /tmp/logfile
Check you have EC2 Instance Code Deploy Role -> Create a code deployment role and assign it to the Instance, https://docs.aws.amazon.com/codedeploy/latest/userguide/getting-started-create-service-role.html.
In case if you assign the EC2 Role after initiate, restart the server.
Check your appsec.yml file placement as per the top answer, try to avoid any long timeout in it.
Log into your instance check your error log
$ tail -f /var/log/aws/codedeploy-agent/codedeploy-agent.log
You should be able to figure out what caused the individual instances to fail by digging into the deployment instance details:
http://docs.aws.amazon.com/codedeploy/latest/userguide/how-to-view-instance-details.html
These should contain more detailed information about why your application was unable to be deployed.
This error is commonly due to problems in the configuration of the appSpec.yml or appSpec.json file (It depends on the format you are using).
"If you have any Hook I recommend that you remove them, check if it works, then you can add one by one (the Hooks) and so you can identify the error"
The appspec.yml file should be located at the root of your project:
│-- appspec.yml
│-- index.html
└-- scripts
│-- install_dependencies
│-- start_server
└-- stop_server
In the scripts folder you will have to place the processes that you want to be executed according to the Hook
Here is an example of the appspec.yml file
version: 0.0
os: linux
files:
- source: /index.html
destination: /var/www/html/
hooks:
BeforeInstall:
- location: scripts/install_dependencies
timeout: 300
runas: root
- location: scripts/start_server
timeout: 300
runas: root
ApplicationStop:
- location: scripts/stop_server
timeout: 300
runas: root
I hope I can help you 😃👻🕺🏾
Make sure the CodeDeploy Host Agent Service is running in your target EC2 instance.
The error you are facing is a generic error message thrown on any of the event failure which could be beforeblockTraffic, blockTraffic, ApplicationStop etc.
The first step in this case would be check whether code deploy agent is running or not if first event i.e. BeforeBlockTraffic event is failed.
As you can see in the screenshot below, the event failure message would tell you the exact error behind.
From the failed deployments, I can see all lifecycle events were skipped. Instance i-0bcc36e73851297f2 is currently in Stopped state but I can see the IAM instance profile is missing. Your Amazon EC2 instances need permission to access the Amazon S3 buckets or GitHub repositories where the applications that will be deployed by AWS CodeDeploy are stored. To launch Amazon EC2 instances that are compatible with AWS CodeDeploy, you must create an additional IAM role, an instance profile. 1
For such failures, you can always begin with a general troubleshooting checklist for a failed deployment 2 and then look for troubleshooting guides on Deployment Issues and Instance issues3.
1[http://docs.aws.amazon.com/codedeploy/latest/userguide/how-to-create-iam-instance-profile.html]1
2 [http://docs.aws.amazon.com/codedeploy/latest/userguide/troubleshooting-general.html]2
3 [http://docs.aws.amazon.com/codedeploy/latest/userguide/troubleshooting.html]3
Check the status of the Code Deploy Agent. In my case, the agent wasn't up.
Please check the role given to the ec2 machine(where the agent is running). It should have s3 access as well. This resolved my issue.
"The CodeDeploy agent did not find an AppSpec file within the unpacked revision directory at revision-relative path 'appspec.yml'"
Please place your appspec.yml file in your root folder to solve this error
To access your after script and before script
The overall deployment failed because too many individual instances failed deployment, too few healthy instances are available for deployment, or some instances in your deployment group are experiencing problems.

Solr issue: ClusterState says we are the leader, but locally we don't think so

So today we run into a disturbing solr issue.
After a restart of the whole cluster one of the shard stop being able to index/store documents.
We had no hint about the issue until we started indexing (querying the server looks fine).
The error is:
2014-05-19 18:36:20,707 ERROR o.a.s.u.p.DistributedUpdateProcessor [qtp406017988-19] ClusterState says we are the leader, but locally we don't think so
2014-05-19 18:36:20,709 ERROR o.a.s.c.SolrException [qtp406017988-19] org.apache.solr.common.SolrException: ClusterState says we are the leader (http://x.x.x.x:7070/solr/shard3_replica1), but locally we don't think so. Request came from null
at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:503)
at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:267)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:550)
at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:126)
at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:101)
at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:65)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
We run Solr 4.7 in Cluster mode (5 shards) on jetty.
Each shard run on a different host with one zookeeper server.
I checked the zookeeper log and I cannot see anything there.
The only difference is that in the /overseer_election/election folder I see this specific server repeated 3 times, while the other server are only mentioned twice.
45654861x41276x432-x.x.x.x:7070_solr-n_00000003xx
74030267x31685x368-x.x.x.x:7070_solr-n_00000003xx
74030267x31685x369-x.x.x.x:7070_solr-n_00000003xx
Not even sure if this is relevant. (Can it be?)
Any clue what other check can we do?
We've experienced this error under 2 conditions.
Condition 1
On a single zookeeper host there was an orphaned Zookeeper ephemeral node in
/overseer_elect/election. The session this ephemeral node was associated with no longer existed.
The orphaned ephemeral node cannot be deleted.
Caused by: https://issues.apache.org/jira/browse/ZOOKEEPER-2355
This condition will also be accompanied by a /overseer/queue directory that is clogged-up with queue items that are forever waiting to be processed.
To resolve the issue you must restart the Zookeeper node in question with the orphaned ephemeral node.
If after the restart you see Still seeing conflicting information about the leader of shard shard1 for collection <name> after 30 seconds
You will need to restart the Solr hosts as well to resolve the problem.
Condition 2
Cause: a mis-configured systemd service unit.
Make sure you have Type=forking and have PIDFile configured correctly if you are using systemd.
systemd was not tracking the PID correctly, it thought the service was dead, but it wasn't, and at some point 2 services were started. Because the 2nd service will not be able to start (as they both can't listen on the same port) it seems to just sit there in a failed state hanging, or fails to start the process but just messes up the other solr processes somehow by possibly overwriting temporary clusterstate files locally.
Solr logs reported the same error the OP posted.
Interestingly enough, another symptom was that zookeeper listed no leader for our collection in /collections/<name>/leaders/shard1/leader normally this zk node contains contents such as:
{"core":"collection-name_shard1_replica1",
"core_node_name":"core_node7",
"base_url":"http://10.10.10.21:8983/solr",
"node_name":"10.10.10.21:8983_solr"}
But the node is completely missing on the cluster with duplicate solr instances attempting to start.
This error also appeared in the Solr Logs:
HttpSolrCall null:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /roles.json
To correct the issue, killall instances of solr (or java if you know it's safe), and restart the solr service.
We figured out!
The issue was that jetty didn't really stop so we had 2 running processes, for whatever reason this was fine for reading but not for writing.
Killing the older java process solved the issue.

Deploy & Debug remote Jetty with IntelliJ 12

I've been hacking and googling for a while now, and I've found several statck overflow threads that seemed like they were written for older versions of intellij, with various application servers. Usually they tell you to enter
java -Xdebug -Xrunjdwp:transport=dt_socket,address=51887,suspend=n,server=y
One answer suggests using something like
-agentlib:jdwp:transport=dt_socket,address=51887,suspend=n,server=y
But then I get this:
Error occurred during initialization of VM
Could not find agent library: libjdwp:transport.jnilib (searched /Library/Java/JavaVirtualMachines/1.6.0_37-b06-434.jdk/Contents/Libraries:/System/Library/Java/Extensions:/Library/Java/Extensions:.)
Then after one or the other of the above they tell you something like "Edit Configurations> jetty > remote and enter localhost, 51887" (the port number varies)
However in 12, the page you land on after you select remote has a plethora of options, and is asking for JNDI ports, not jdwp ports on another tab it actually suggests the jdwp parameters above.
Researching the JNDI port bit, generally yields instructions to add args like this to your command line...
-Dcom.sun.management.jmxremote= \
-Dcom.sun.management.jmxremote.port=1099 \
-Dcom.sun.management.jmxremote.ssl=false \
-Dcom.sun.management.jmxremote.authenticate=false\
I've done that too and I can see port 1099 held by java (using lsof) and I can telnet to 1099, so I know the JVM is listening. (We'll try not to worry about the fact that that appears to say, open up a port by which anyone install arbitrary java code over the network to your computer without a password)
However, in Intellij whenever I try to deploy and debug it gives me the following message:
I can see java RMI communications over 1099 when I snoop port 1099 with wireshark (but they are illegible). Evidently, the communications are not satisfactory for Intellij, so I'm wondering if there's something I need to do to Jetty to get it to play nice. Note that changing the Jetty version is not presently an option, so let's not go there :).
I've also tried removing the artifact, disabling make, and trying to just connect the debugger, but it still gives me the same red baloon and error message, so evidently the JNDI (port 1099) part is required.
Does anyone see something I'm doing wrong, or know of something else I should do to get this to work?
(I'm wondering if it is something similar to this: http://youtrack.jetbrains.com/issue/IDEA-65746 jboss issue)
Edit: Thanks to this google groups post I've discovered that it is possible to get the debugger connected if you don't specify Edit Configurations> + > jetty > remote, but instead choose Edit Configurations > + > remote, but debug and deploy is what I'm after so that's only a half solution.
Jetty remote configuration requires several manual steps which are performed automatically when you start Jetty directly from IDEA using the local configuration instead.
If you absolutely must use the remote configuration, try the following steps:
In the Remote staging section of the Server tab of the IDEA Jetty remote run configuration:
specify Same file system for Type and Host
specify path to the <Jetty home>/contexts folder in the Local path field of the contexts section
(settings will differ if you have Jetty running on another machine than IDEA, but I assume it's the same machine in your case)
Pass the following VM parameters to the Jetty process:
-Dcom.sun.management.jmxremote=
-Dcom.sun.management.jmxremote.port=<JNDI port>
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-DOPTIONS=jmx
<JNDI port> value should be the same as specified in the JNDI port field of the IDEA Jetty run configuration
Pass the following configuration files to the Jetty process (in the command line):
etc/jetty-jmx.xml
etc/jetty.xml
If you need to debug, you should also pass to Jetty process VM parameters taken from IDEA Jetty run configuration: Startup/Connection tab, select Debug list item under the To debug remote server JVM ...
Here is the sample command line to start Jetty process with all the required options:
java -Xdebug -Xrunjdwp:transport=dt_socket,address=60208,suspend=n,server=y -DSTOP.PORT=0 -Dcom.sun.management.jmxremote= -Dcom.sun.management.jmxremote.port=1099 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -DOPTIONS=jmx -Dfile.encoding=UTF-8 -classpath start.jar etc/jetty-jmx.xml etc/jetty.xml