Issue with OpenShift Origin MongoDB service - openshift-origin

I have installed OpenShift Origin V3 on AWS EC2 (Fedora 19) using oo-install. The setup is one broker + one node.
I was modifying the security groups to make them more restrictive, and that ended up causing some issues with the mongo service.
1. The mongod service does not start, and its status shows failed.
The /var/log/mongodb/mongodb.log file says:
Thu Mar 6 11:24:08.189 [initandlisten] ERROR: listen(): bind() failed errno:99 Cannot assign requested address for socket: :27017
Thu Mar 6 11:24:08.189 [initandlisten] now exiting
Running oo-accept-broker -v reports:
FAIL: error logging into mongo db: MOPED: Retrying connection to primary for replica set :27017">]>: MOPED: Retrying connection to primary for replica set :27017">]>/MOPED: --username Retrying, exit code: 1
Any pointers on how to resolve this will be greatly appreciated.
Thanks
Shabna

I would first try rolling back your changes to the security groups, then reapply them one at a time to see which one causes the issue. Then post that on Stack Overflow and see if anyone can comment on how that specific change affects MongoDB.
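While reverting changes, note that errno 99 on bind() is reported by the host itself, not by the network path: on Linux it is EADDRNOTAVAIL, meaning mongod was asked to listen on an address that is not assigned to any local interface (for example, a bind_ip value naming an IP the instance does not hold; this is an assumption, since the question does not show mongod's config). A minimal Python sketch that repeats the same bind so you can test candidate addresses; the helper name bind_error is hypothetical.

```python
import errno
import socket
from typing import Optional


def bind_error(host: str, port: int) -> Optional[int]:
    """Hypothetical helper: try to bind a TCP socket to (host, port) like a server would.

    Returns None if the bind succeeds, otherwise the errno from bind().
    On Linux, errno 99 (EADDRNOTAVAIL) means the address is not assigned
    to any local interface.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return None
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()  # release the port immediately


if __name__ == "__main__":
    # Placeholder candidates: replace with the address(es) mongod is told to bind.
    # 192.0.2.1 is a reserved documentation address, so it is normally not local.
    for candidate in ("127.0.0.1", "192.0.2.1"):
        err = bind_error(candidate, 27017)
        if err is None:
            print(f"{candidate}: bind OK")
        elif err == errno.EADDRNOTAVAIL:
            print(f"{candidate}: not a local address (errno {err})")
        else:
            print(f"{candidate}: bind failed, errno {err} ({errno.errorcode.get(err, '?')})")
```

If mongod is already running, 127.0.0.1:27017 will report EADDRINUSE instead, which is expected.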

Related

Found error 'CRASH REPORT Process' on RabbitMQ every 10 minutes

I see an error on RabbitMQ every 10 minutes. Please help me investigate this problem.
Error message:
2021-09-09 13:25:30.084 [error] <0.14464.32> CRASH REPORT Process <0.14464.32> with 0 neighbours crashed with reason: no function clause matching rabbit_mgmt_wm_node:find_type(rabbit@controller1, []) line 79
2021-09-09 13:25:30.085 [error] <0.14457.32> Ranch listener rabbit_web_dispatch_sup_15672, connection process <0.14457.32>, stream 1 had its request process <0.14464.32> exit with reason function_clause and stacktrace [{rabbit_mgmt_wm_no
I had the same issue: Zabbix polled the RabbitMQ server every minute, which generated a crash error with the same frequency.
The URL Zabbix used for monitoring added a domain part to the node name, i.e. rabbit@my_host.zzz.aws instead of the actual node name displayed by the console, rabbit@my_host. This explains why rabbit_mgmt_wm_node:find_type failed and crashed.
This was verified using curl as shown below:
curl -v -u user:passwd 'http://127.0.0.1:15672/api/nodes/rabbit@my_host?memory=true'
which returned a valid response, HTTP/1.1 200 OK, when the node name matched and the crash/error when it did not.
Please refer to this thread:
https://groups.google.com/g/rabbitmq-users/c/N0EgrLn55XQ
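The fix comes down to requesting the node by its exact name. A minimal Python sketch, with hypothetical helper names, assuming you already have the list of real node names (for example from the management API's /api/nodes listing): it maps a fully qualified name such as rabbit@my_host.zzz.aws back to the real node name and percent-encodes the @ when building the /api/nodes/<name> URL used in the curl check.

```python
from typing import Iterable, Optional
from urllib.parse import quote


def resolve_node_name(configured: str, actual_nodes: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: map a possibly fully qualified node name to a real one.

    "rabbit@my_host.zzz.aws" -> "rabbit@my_host" if that node exists.
    Returns None when nothing matches.
    """
    nodes = list(actual_nodes)
    if configured in nodes:
        return configured  # exact match, nothing to fix
    prefix, sep, host = configured.partition("@")
    if not sep:
        return None  # not a name@host node name
    short = f"{prefix}@{host.split('.')[0]}"  # drop the domain part
    return short if short in nodes else None


def node_url(base: str, node: str) -> str:
    """Hypothetical helper: build the per-node management API URL, encoding '@' as %40."""
    return f"{base.rstrip('/')}/api/nodes/{quote(node, safe='')}?memory=true"


if __name__ == "__main__":
    real = ["rabbit@my_host"]
    fixed = resolve_node_name("rabbit@my_host.zzz.aws", real)
    print(fixed)
    print(node_url("http://127.0.0.1:15672", fixed))
```

The exact-match check runs first, so clusters that really use fully qualified node names are left alone.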

How to fix etcd cluster error "tls: first record does not look like a TLS handshake"

I created a three-node etcd cluster. Configuration and startup are already OK, but /var/log/messages shows:
etcd: rejected connection from "172.17.0.3:43192" (error "tls: first record does not look like a TLS handshake", ServerName "")
How can I fix it?
I have checked the health of etcd:
member 48b0dff99d5c867e is healthy: got healthy result from https://172.17.0.9:2379
member 646dab89331aabab is healthy: got healthy result from https://172.17.0.8:2379
member b45603216bfac234 is healthy: got healthy result from https://172.17.0.10:2379
That looks OK, but /var/log/messages keeps showing these errors:
Jan 12 20:08:57 master etcd: rejected connection from "172.17.0.3:43160" (error "tls: first record does not look like a TLS handshake", ServerName "")
Jan 12 20:08:57 master etcd: rejected connection from "172.17.0.3:43162" (error "tls: oversized record received with length 21536", ServerName "")
I got this message for etcd peer communication when switching it from http to https. Apparently etcd keeps persistent peer information that overrides the command-line options, so it continued to use http for peer communication in spite of those options.
In the end, since this was a test cluster, I nuked /var/lib/etcd and the new CLI configuration took hold.
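Both log errors are consistent with plain HTTP reaching a TLS port. A TLS server reads the first 5 bytes of a connection as a record header (1-byte content type, 2-byte version, 2-byte length). With a plain "POST " request line, the length bytes are "T" and " " (0x54 0x20), so the length is 0x5420 = 21536, the number in the log. With "GET /", the content type is "G", which is not a handshake record. A minimal Python sketch of that parsing; it is a simplified model, and the check order is an assumption based on older Go crypto/tls releases (etcd is written in Go), so newer Go versions may report a different message for the same bytes.

```python
from typing import Tuple

# Record content types from the TLS specification.
RECORD_TYPE_ALERT = 21
RECORD_TYPE_HANDSHAKE = 22
# Largest record length accepted (maxCiphertext in Go's crypto/tls, TLS 1.2 and earlier).
MAX_CIPHERTEXT = 16384 + 2048


def parse_record_header(data: bytes) -> Tuple[int, int, int]:
    """Split the first 5 bytes into (content type, version, length)."""
    if len(data) < 5:
        raise ValueError("a TLS record header is 5 bytes long")
    content_type = data[0]
    version = (data[1] << 8) | data[2]
    length = (data[3] << 8) | data[4]
    return content_type, version, length


def classify_first_record(data: bytes) -> str:
    """Simplified model of a TLS server's checks on the first record.

    Assumption: the length check runs before the "first record" check,
    as in older Go crypto/tls releases; newer releases may differ.
    """
    content_type, _version, length = parse_record_header(data)
    if length > MAX_CIPHERTEXT:
        return f"tls: oversized record received with length {length}"
    if content_type not in (RECORD_TYPE_ALERT, RECORD_TYPE_HANDSHAKE):
        return "tls: first record does not look like a TLS handshake"
    return "ok"


if __name__ == "__main__":
    # Illustrative plain-HTTP request lines (the paths are arbitrary).
    print(classify_first_record(b"POST /example HTTP/1.1\r\n"))
    print(classify_first_record(b"GET /example HTTP/1.1\r\n"))
```

The POST line yields the oversized-record message with length 21536, and the GET line (length field 0x202F = 8239, under the limit) yields the "first record" message, matching the two log entries.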
I don't have a complete solution for your issue, but I've found a couple of links that might help with further investigation. Read them carefully and try the solutions; I hope you will resolve the problem.
GitHub issue #9917: check the ETCDCTL_API variable, and especially make sure --endpoints is configured with https.
Runtime reconfiguration: try to reconfigure your etcd cluster by updating, removing, or adding etcd members.
nginx ingress: if you are using nginx, check your nginx ingress annotations.
Google Groups TLS handshake topic: check this topic, especially the comments about the VAULT_ADDR variable. I'll paste the last comment from the thread here:
We were able to get everything to work, after understanding the permission issues.
You asked: "Please confirm if you are seeing server error messages before initializing Vault" Upon further examination, I did determine that the errors were not happening before initializing the Vault.
The problem ended up not being related to VAULT_ADDR, and we used the value: "http://127.0.0.1:8200"
I have the setup operation scripted, and it appears that not everything was being run at the proper permissions. At first I was running the scripts using the "sudo" command, which resulted in the failures. I discovered that the permissions for the certificate key were restricted and the file could not be accessed by my user. There may have been other permission issues as well. But once I switched user to root, and ran the script, everything behaved correctly.
Thanks
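Following the first link's advice to make sure --endpoints uses https, a quick sanity check is to scan every URL list you pass to etcd or etcdctl for non-https entries. A minimal Python sketch; the helper name non_https_urls is hypothetical, and it only understands plain comma-separated URL lists and the name=URL pairs used by --initial-cluster.

```python
from typing import List
from urllib.parse import urlsplit


def non_https_urls(url_list: str) -> List[str]:
    """Hypothetical helper: return entries of a comma-separated etcd URL list that are not https.

    Handles plain lists ("https://a:2379,https://b:2379") and the
    name=URL pairs used by --initial-cluster ("n1=https://a:2380,...").
    """
    offenders = []
    for item in url_list.split(","):
        item = item.strip()
        if not item:
            continue
        # Drop an optional "name=" prefix (--initial-cluster style).
        _name, sep, rest = item.partition("=")
        url = rest if sep else item
        if urlsplit(url).scheme.lower() != "https":
            offenders.append(url)
    return offenders


if __name__ == "__main__":
    # Example input only; paste the value of your own --endpoints or --initial-cluster.
    print(non_https_urls("n1=https://172.17.0.9:2380,n2=http://172.17.0.8:2380"))
```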

WebLogic Admin Server start issue - DataSource passwords expired

We are trying to use an existing WebLogic 12c domain, and its DataSource passwords have expired.
Since the AdminServer was not responding correctly, I tried to restart it as well. I have now changed the DB passwords and wanted to set the new passwords by starting the AdminServer, but I can't start it: it fails, complaining that the passwords have expired. (I could have gotten away with this issue if I had kept the AdminServer running and set the new passwords.)
I can see the DataSources are targeted to the AdminServer, and I thought that if I untargeted them from the AdminServer, it would start correctly. Hence I removed the AdminServer as a target in config.xml and tried to start it, but it still fails, complaining that the passwords have expired. Is the config cached anywhere? It looks like the AdminServer is still using the old config file. By the way, I have tried removing the tmp folder as well.
Also, I tried encrypting the new password and placing it in the JDBC config files; probably the way I encrypted it was wrong. These are the steps I used to encrypt it:
1. Connect to WLST offline (because the AdminServer is not up)
2. Read the domain
3. Call the encrypt function for the new password
4. Print the encrypted password
Is anything wrong? I'd appreciate any suggestions to resolve this issue.
The error starts like this:
Jun 22, 2015 4:38:04 PM oracle.security.jps.JpsStartup start
INFO: Jps initializing.
Jun 22, 2015 4:38:07 PM org.hibernate.validator.util.Version <clinit>
INFO: Hibernate Validator 12.1.3.0.0
Jun 22, 2015 4:38:07 PM org.hibernate.validator.engine.resolver.DefaultTraversableResolver detectJPA
INFO: Instantiated an instance of org.hibernate.validator.engine.resolver.JPATraversableResolver.
[EL Severe]: ejb: 2015-06-22 16:38:11.173--ServerSession(143991231)--Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.5.2.v20140319-9ad6abd): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.SQLException: ORA-01017: invalid username/password; logon denied
Error Code: 1017
Jun 22, 2015 4:38:11 PM oracle.security.jps.internal.common.config.AbstractSecurityStore getSecurityStoreVersion
WARNING: Unable to get the Version from Store returning the default oracle.security.jps.service.policystore.PolicyStoreException: javax.persistence.PersistenceException: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.5.2.v20140319-9ad6abd): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.SQLException: ORA-01017: invalid username/password; logon denied
Error Code: 1017
at oracle.security.jps.internal.policystore.rdbms.JpsDBDataManager.processJPAException(JpsDBDataManager.java:2180)
at oracle.security.jps.internal.policystore.rdbms.JpsDBDataManager.init(JpsDBDataManager.java:1028)
at oracle.security.jps.internal.policystore.rdbms.JpsDBDataManager.jpsObjectBaseQuery(JpsDBDataManager.java:3089)
at oracle.security.jps.internal.policystore.rdbms.JpsDBDataManager.queryBaseObjects(JpsDBDataManager.java:5761)
at oracle.security.jps.internal.common.config.AbstractSecurityStore.getSecurityStoreVersion(AbstractSecurityStore.java:211)
at oracle.security.jps.internal.common.config.AbstractSecurityStore.getSecurityStoreVersion(AbstractSecurityStore.java:195)
at oracle.security.jps.internal.common.config.AbstractSecurityStore.<init>(AbstractSecurityStore.java:99)
at oracle.security.jps.internal.credstore.AbstractCredentialStore.<init>(AbstractCredentialStore.java:104)
at oracle.security.jps.internal.credstore.ldap.LdapCredentialStore.<init>(LdapCredentialStore.java:130)
at oracle.security.jps.internal.credstore.ldap.LdapCredentialStoreProvider.getInstance(LdapCredentialStoreProvider.java:235)
at oracle.security.jps.internal.credstore.rdbms.DbmsCredentialStoreProvider.getInstance(DbmsCredentialStoreProvider.java:101)
at oracle.security.opss.internal.runtime.ServiceContextManagerImpl.createContextInternal(ServiceContextManagerImpl.java:432)
Thanks.
First, take a backup of the complete config folder inside the domain. It looks like you are using an RDBMS policy store inside the domain, so check the security-realm tag in config.xml. There you will find the encrypted password; replace it with the newly encrypted password, and your AdminServer should start.
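The back-up-then-replace step can be scripted. A minimal Python sketch, not an Oracle tool: the function name is hypothetical, it assumes the encrypted value sits in an element named password-encrypted (the name used in JDBC module descriptors; check which element your config.xml or JDBC file actually uses), and it assumes you already produced the new encrypted string with WLST's encrypt. Run it only while the AdminServer is stopped.

```python
import re
import shutil
from pathlib import Path
from typing import Union


def replace_encrypted_password(xml_path: Union[str, Path], new_value: str,
                               tag: str = "password-encrypted") -> int:
    """Hypothetical helper: back up xml_path to <name>.bak, then set the text of every <tag>.

    A regex is used instead of an XML parser so the rest of the file
    (namespaces, comments, formatting, line endings) is left unchanged.
    Returns the number of elements that were replaced.
    """
    path = Path(xml_path)
    shutil.copy2(path, path.with_name(path.name + ".bak"))  # keep the original
    text = path.read_bytes().decode("utf-8")  # bytes keep \r\n line endings intact
    pattern = re.compile(r"(<{0}>)(.*?)(</{0}>)".format(re.escape(tag)), re.DOTALL)
    # A function replacement stops re from interpreting backslashes in new_value.
    new_text, count = pattern.subn(lambda m: m.group(1) + new_value + m.group(3), text)
    path.write_bytes(new_text.encode("utf-8"))
    return count
```

Point it at the file holding the expired password, passing the string printed by WLST's encrypt; a return value of 0 means the assumed element name is not the one your file uses.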

How can I change which address Datastax agent will try to connect to?

My Cassandra instances are not listening on 127.0.0.1. When I start datastax-agent, I find this in the logs:
# tail -n 100 /var/log/datastax-agent/agent.log
...
ERROR [Initialization] 2015-05-19 22:35:04,064 Can't connect to Cassandra, retrying soon.
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (com.datastax.driver.core.TransportException: [/127.0.0.1:9042] Cannot connect))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:220)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:78)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1231)
at com.datastax.driver.core.Cluster.init(Cluster.java:158)
at com.datastax.driver.core.Cluster.connect(Cluster.java:246)
at clojurewerkz.cassaforte.client$connect_or_close.doInvoke(client.clj:149)
at clojure.lang.RestFn.invoke(RestFn.java:410)
at clojurewerkz.cassaforte.client$connect.invoke(client.clj:165)
at opsagent.cassandra$setup_cassandra$fn__8157.invoke(cassandra.clj:344)
at again.core$with_retries_STAR_$fn__8013.invoke(core.clj:98)
at again.core$with_retries_STAR_.invoke(core.clj:97)
at opsagent.cassandra$setup_cassandra.invoke(cassandra.clj:339)
at opsagent.opsagent$setup_cassandra.invoke(opsagent.clj:153)
at opsagent.jmx$determine_ip.invoke(jmx.clj:276)
at opsagent.jmx$setup_jmx$fn__8438.invoke(jmx.clj:293)
at clojure.lang.AFn.run(AFn.java:24)
at java.lang.Thread.run(Thread.java:745)
How can I change which address the Datastax Agent connects to? I have tried setting local_interface in the agent's address.yaml (and restarting the agent), but that doesn't seem to work.
The solution was to set rpc_address (in cassandra.yaml) to 0.0.0.0. Credit to LHWizard for pointing this out.
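The log shows the agent dialing 127.0.0.1:9042 (Cassandra's native protocol port), so before and after changing rpc_address it can help to check which addresses actually accept connections on that port. A minimal Python sketch with hypothetical helper names; it only tests TCP reachability, not whether the CQL protocol works.

```python
import socket
from typing import Dict, Iterable


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Hypothetical helper: True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, ...
        return False


def probe(hosts: Iterable[str], port: int = 9042) -> Dict[str, bool]:
    """Hypothetical helper: check the same port on several candidate addresses."""
    return {host: port_is_open(host, port) for host in hosts}
```

For example, probe(["127.0.0.1", "<node IP>"]) should show 127.0.0.1 as reachable once Cassandra listens on all interfaces; "<node IP>" is a placeholder for the address Cassandra was listening on before.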

Apache mod_proxy error OS 10060 and returning 503?

Can't get to my site. Apache gives the following error message:
[Fri Sep 05 08:47:42 2008] [error] (OS 10060)A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. : proxy: HTTP: attempt to connect to 10.10.10.1:80 (10.10.10.1) failed
Can you connect to the proxied host (10.10.10.1) directly? Is it functioning normally?
http://www.checkupdown.com/status/E503.html
Your Web server is effectively 'closed for repair'. It is still functioning minimally because it can at least respond with a 503 status code, but full service is impossible i.e. your Web site is simply unavailable. There are a myriad of possible reasons for this, but generally it is because of some human intervention by the operators of your Web server machine. You can usually expect that someone is working on the problem, and normal service will resume as soon as possible.
You need to restart the web server, then figure out why it shut itself down.
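OS 10060 is the Windows Sockets error WSAETIMEDOUT: Apache's connection attempt to 10.10.10.1:80 timed out, and mod_proxy then answered the client with 503. To check whether the proxied host responds when contacted directly, run a probe from the Apache machine itself. A minimal Python sketch with a hypothetical function name; it separates a timeout (the same symptom as 10060), a refused connection, and a real HTTP response.

```python
import http.client
import socket
from typing import Optional, Tuple


def probe_backend(host: str, port: int = 80, path: str = "/",
                  timeout: float = 5.0) -> Tuple[str, Optional[int]]:
    """Hypothetical helper: send one HEAD request straight to the backend and classify the outcome.

    Returns ("http", status) on any HTTP response, or
    ("timeout" | "refused" | "error", None) when no usable response arrives.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return "http", conn.getresponse().status
    except (socket.timeout, TimeoutError):
        return "timeout", None  # same symptom as OS 10060 in the Apache log
    except ConnectionRefusedError:
        return "refused", None  # host reachable, nothing listening on the port
    except (OSError, http.client.HTTPException):
        return "error", None    # e.g. no route to host, malformed response
    finally:
        conn.close()
```

For example, probe_backend("10.10.10.1", 80) returning ("timeout", None) would point at the backend host or the network path to it rather than at Apache.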