Ambari registration phase fails for SSL on EC2 - apache

I am trying to use Apache Ambari to configure a Hadoop cluster on EC2.
During the registration phase I get this error:
Command start time 2016-11-23 20:25:12
('Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 312, in <module>
main(heartbeat_stop_callback)
File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 248, in main
stop_agent()
File "/usr/lib/python2.6/site-packages/ambari_agent/main.py", line 198, in stop_agent
sys.exit(1)
SystemExit: 1
INFO 2016-11-23 20:25:18,716 ExitHelper.py:53 - Performing cleanup before exiting...
INFO 2016-11-23 20:25:18,907 main.py:74 - loglevel=logging.INFO
INFO 2016-11-23 20:25:18,907 DataCleaner.py:39 - Data cleanup thread started
INFO 2016-11-23 20:25:18,908 DataCleaner.py:120 - Data cleanup started
INFO 2016-11-23 20:25:18,909 DataCleaner.py:122 - Data cleanup finished
INFO 2016-11-23 20:25:18,930 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 2016-11-23 20:25:18,931 main.py:289 - Connecting to Ambari server at https://IPADDRESS.us-west-2.compute.internal:8440 (172.31.37.172)
INFO 2016-11-23 20:25:18,931 NetUtil.py:59 - Connecting to https://IPADDRESS.us-west-2.compute.internal:8440/ca
ERROR 2016-11-23 20:25:18,983 NetUtil.py:77 - [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)
ERROR 2016-11-23 20:25:18,983 NetUtil.py:78 - SSLError: Failed to connect. Please check openssl library versions.
Refer to: https://bugzilla.redhat.com/show_bug.cgi?id=1022468 for more details.
WARNING 2016-11-23 20:25:18,983 NetUtil.py:105 - Server at https://IPADDRESS.us-west-2.compute.internal:8440 is not reachable, sleeping for 10 seconds...
', None)
Connection to IPADDRESS.us-west-2.compute.internal closed.
SSH command execution finished
host=IPADDRESS.us-west-2.compute.internal, exitcode=0
Command end time 2016-11-23 20:25:21
Registering with the server...
Registration with the server failed.
I think it is something basic, but I was not able to solve it.
The OpenSSL version is 1.0.2g.
Any advice?
Thank you

This seems to be a known issue related to the JDK used on the host machine running the Ambari server.
The post here mentions that Oracle JDK should be used to get past this problem.

If this is not the JDK issue mentioned here, there may be a mismatch between the Python versions used to launch ambari-agent and ambari-server. Make sure that both use the same version (e.g. Python 2.7) and restart them.
P.S. I struggled with this issue for hours; in my case it was caused by ambari-server running on Python 2.6 while the agent was running on Python 2.7.
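To compare the two sides, a small script run with each interpreter can print what that interpreter is actually using. This is only a sketch: which interpreter launches ambari-server and ambari-agent on a given host has to be checked locally, and the function name is made up for illustration.

```python
import ssl
import sys


def describe_runtime():
    """Return the Python and OpenSSL versions of the running interpreter."""
    python_version = ".".join(str(part) for part in sys.version_info[:3])
    # ssl.OPENSSL_VERSION is the OpenSSL build this interpreter is linked
    # against, which can differ from the `openssl` binary on the PATH.
    # Very old interpreters (e.g. Python 2.6) do not expose it.
    openssl_version = getattr(ssl, "OPENSSL_VERSION", "unknown")
    return "python=%s openssl=%s" % (python_version, openssl_version)


if __name__ == "__main__":
    print(describe_runtime())
```

Running it once with the interpreter used by the server and once with the one used by the agent should make a 2.6 versus 2.7 mismatch visible.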

Related

Action "apache2ctl graceful" failed | Let's Encrypt on Raspbian with Owncloud

Currently I can't install an SSL certificate on my local Raspberry Pi 4B with Owncloud installed on it.
I tried installing the certificate by following a (German) tutorial.
But every time I type in sudo letsencrypt -d srvschneg.ddns.net --redirect -m {MY MAIL}, the following error is thrown:
What would you like to do?
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1: Attempt to reinstall this existing certificate
2: Renew & replace the cert (limit ~5 per 7 days)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Select the appropriate number [1-2] then [enter] (press 'c' to cancel): 1
Keeping the existing certificate
Deploying Certificate to VirtualHost /etc/apache2/sites-enabled/000-default-le-ssl.conf
Error while running apache2ctl graceful.
httpd not running, trying to start
Action 'graceful' failed.
The Apache error log may have more information.
Unable to restart apache using ['apache2ctl', 'graceful']
Rolling back to previous server configuration...
Error while running apache2ctl graceful.
httpd not running, trying to start
Action 'graceful' failed.
The Apache error log may have more information.
Unable to restart apache using ['apache2ctl', 'graceful']
Encountered exception during recovery:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/certbot_apache/configurator.py", line 2185, in _reload
util.run_script(self.option("restart_cmd"))
File "/usr/lib/python3/dist-packages/certbot/util.py", line 86, in run_script
raise errors.SubprocessError(msg)
certbot.errors.SubprocessError: Error while running apache2ctl graceful.
httpd not running, trying to start
Action 'graceful' failed.
The Apache error log may have more information.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/certbot/client.py", line 526, in deploy_certificate
self.installer.restart()
File "/usr/lib/python3/dist-packages/certbot_apache/configurator.py", line 2175, in restart
self._reload()
File "/usr/lib/python3/dist-packages/certbot_apache/configurator.py", line 2203, in _reload
raise errors.MisconfigurationError(error)
certbot.errors.MisconfigurationError: Error while running apache2ctl graceful.
httpd not running, trying to start
Action 'graceful' failed.
The Apache error log may have more information.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/certbot_apache/configurator.py", line 2185, in _reload
util.run_script(self.option("restart_cmd"))
File "/usr/lib/python3/dist-packages/certbot/util.py", line 86, in run_script
raise errors.SubprocessError(msg)
certbot.errors.SubprocessError: Error while running apache2ctl graceful.
httpd not running, trying to start
Action 'graceful' failed.
The Apache error log may have more information.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/certbot/error_handler.py", line 108, in _call_registered
self.funcs[-1]()
File "/usr/lib/python3/dist-packages/certbot/client.py", line 626, in _rollback_and_restart
self.installer.restart()
File "/usr/lib/python3/dist-packages/certbot_apache/configurator.py", line 2175, in restart
self._reload()
File "/usr/lib/python3/dist-packages/certbot_apache/configurator.py", line 2203, in _reload
raise errors.MisconfigurationError(error)
certbot.errors.MisconfigurationError: Error while running apache2ctl graceful.
httpd not running, trying to start
Action 'graceful' failed.
The Apache error log may have more information.
Error while running apache2ctl graceful.
httpd not running, trying to start
Action 'graceful' failed.
The Apache error log may have more information.
I have already tried this many times and I really don't know what else to do.
I hope someone is able to help me.
Matthias
Update:
I finally got it working with these three lines in default-ssl.conf:
SSLCertificateChainFile /etc/letsencrypt/live/mydom.tld/fullchain.pem
SSLCertificateKeyFile /etc/letsencrypt/live/mydom.tld/privkey.pem
SSLCertificateFile /etc/letsencrypt/live/mydom.tld/cert.pem
https://doc.owncloud.com/server/next/admin_manual/installation/letsencrypt/apache.html#create-an-ssl-virtualhost-configuration
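Since Apache refuses to start when one of these files cannot be read, it may help to check the paths before reloading. The following is a rough sketch, not part of certbot or Apache: the helper name is hypothetical, and it only looks at the three directives shown above.

```python
import os
import re

# Matches SSLCertificateFile, SSLCertificateKeyFile and SSLCertificateChainFile
# lines and captures the directive name and its path argument.
DIRECTIVE = re.compile(
    r"^\s*(SSLCertificate(?:Chain|Key)?File)\s+(\S+)", re.MULTILINE
)


def missing_cert_files(conf_text):
    """Return (directive, path) pairs whose target file does not exist."""
    missing = []
    for name, raw_path in DIRECTIVE.findall(conf_text):
        path = raw_path.strip('"')  # Apache allows quoted arguments
        if not os.path.isfile(path):
            missing.append((name, path))
    return missing


if __name__ == "__main__":
    # Path assumed from the question; adjust to the vhost file actually in use.
    with open("/etc/apache2/sites-enabled/000-default-le-ssl.conf") as handle:
        for name, path in missing_cert_files(handle.read()):
            print("%s points to a missing file: %s" % (name, path))
```

The files under /etc/letsencrypt/live are normally readable only by root, so the check needs to run with the same privileges as the Apache start-up.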

ZooKeeper gives exiting JVM with error code 2 error

I have OpenJDK 16.0.1, ZooKeeper and Kafka on my machine.
When I run the command "zkServer.cmd start" it gives me the error below.
ERROR [main:QuorumPeerMain@99] - Invalid config, exiting abnormally
org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException: Error processing C:\Users\****\Desktop\apache-zookeeper-3.7.0-bin\bin\..\conf\zoo.cfg
at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:198)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:125)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:91)
Caused by: java.lang.IllegalArgumentException: Malformed \uxxxx encoding.
at java.base/java.util.Properties.loadConvert(Properties.java:672)
at java.base/java.util.Properties.load0(Properties.java:456)
at java.base/java.util.Properties.load(Properties.java:408)
at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:185)
... 2 more
Invalid config, exiting abnormally
2021-04-30 17:25:17,422 [myid:] - INFO [main:ZKAuditProvider@42] - ZooKeeper audit is disabled.
2021-04-30 17:25:17,428 [myid:] - ERROR [main:ServiceUtils@42] - Exiting JVM with code 2
Is there any solution for this? My zoo.cfg:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=C:\Users\myUserName\Desktop\apache-zookeeper-3.7.0-bin\data
clientPort=2181
I have now moved the ZooKeeper folder so that it sits directly under C:
dataDir=C:\apache-zookeeper-3.7.0-bin\data
However, there is still an error. This time:
ERROR [main:ZooKeeperServerMain@70] - Invalid arguments, exiting abnormally
java.lang.NumberFormatException: For input string: "C:\apache-zookeeper-3.7.0-bin\bin\..\conf\zoo.cfg"
at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
at java.base/java.lang.Integer.parseInt(Integer.java:660)
at java.base/java.lang.Integer.parseInt(Integer.java:778)
at org.apache.zookeeper.server.ServerConfig.parse(ServerConfig.java:78)
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:110)
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:68)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:141)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:91)
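Regarding the first error only (Malformed \uxxxx encoding): zoo.cfg is read through java.util.Properties, which treats a backslash as an escape character and requires \u to be followed by four hex digits. A Windows path with a component starting with a lowercase u can therefore trip the parser. The sketch below flags such lines and rewrites the value with forward slashes; the function names are hypothetical, and whether this is the cause here is an assumption, since the real user name is masked in the question.

```python
import re

HEX4 = re.compile(r"[0-9a-fA-F]{4}")


def find_malformed_unicode_escapes(line):
    """Return offsets of backslash-u escapes not followed by four hex digits."""
    bad = []
    i = 0
    while i < len(line):
        if line[i] == "\\" and i + 1 < len(line):
            if line[i + 1] == "u" and not HEX4.match(line, i + 2):
                bad.append(i)
            i += 2  # skip the escaped character, so a doubled backslash is safe
        else:
            i += 1
    return bad


def to_forward_slashes(value):
    """Rewrite a Windows path so it contains no backslash escapes."""
    return value.replace("\\", "/")


if __name__ == "__main__":
    # Example path is made up for illustration.
    line = r"dataDir=C:\Users\user1\zookeeper\data"
    print(find_malformed_unicode_escapes(line))
    print(to_forward_slashes(line))
```

Doubling every backslash in the value is the other common way to keep the path intact in a properties file.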

ambari-agent can not reach ambari-server

After installing ambari-server with a local httpd repository, I got to the Confirm Hosts step in the web UI and received the following errors:
INFO 2018-05-27 15:39:16,776 NetUtil.py:70 - Connecting to https://master:8440/ca
ERROR 2018-05-27 15:39:16,787 NetUtil.py:96 - [Errno 8] _ssl.c:493: EOF occurred in violation of protocol
ERROR 2018-05-27 15:39:16,788 NetUtil.py:97 - SSLError: Failed to connect.Please check openssl library versions.
Refer to: https://bugzilla.redhat.com/show_bug.cgi?id=1022468 for more details.
WARNING 2018-05-27 15:39:16,789 NetUtil.py:124 - Server at https://master:8440 is not reachable, sleeping for 10 seconds...
INFO 2018-05-27 15:39:26,793 NetUtil.py:70 - Connecting to https://master:8440/ca
ERROR 2018-05-27 15:39:26,799 NetUtil.py:96 - [Errno 8] _ssl.c:493: EOF occurred in violation of protocol
ERROR 2018-05-27 15:39:26,799 NetUtil.py:97 - SSLError: Failed to connect. Please check openssl library versions.Refer to: https://bugzilla.redhat.com/show_bug.cgi?id=1022468 for more details.
WARNING 2018-05-27 15:39:26,801 NetUtil.py:124 - Server at https://master:8440 is not reachable, sleeping for 10 seconds...
My environment is as follows:
CentOS Linux release 7.5.1804 (Core)
Python2.7.5
Java1.8.0_171
OpenSSL1.0.2k
Ambari2.6.2.0
HDP-2.6.5.0
From my other ambari-agent nodes, I can reach the master on port 8440:
[root@slave2 ~]# telnet master 8440
Trying 192.168.17.128...
Connected to master.
Escape character is '^]'.
Please give me some help, thanks a lot!
I was getting the same issue, and this worked for me.
In /etc/ambari-agent/conf/ambari-agent.ini, add this line under the [security] section:
force_https_protocol=PROTOCOL_TLSv1_2
In /etc/python/cert-verification.cfg, change verify from its default value to disable:
[https]
verify=disable
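If the change has to be made on many agents, the edit can be scripted. The following is a minimal sketch using Python 3's configparser; the helper name is made up, and it assumes the file parses as a plain INI file. Note that rewriting the file this way drops any comments it contained, so keep a backup.

```python
import configparser


def set_ini_option(path, section, key, value):
    """Set key=value in the given section, creating the section if needed."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the original case of option names
    parser.read(path)
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, key, value)
    with open(path, "w") as handle:
        parser.write(handle)  # comments in the original file are not preserved


if __name__ == "__main__":
    set_ini_option(
        "/etc/ambari-agent/conf/ambari-agent.ini",
        "security",
        "force_https_protocol",
        "PROTOCOL_TLSv1_2",
    )
```

The agent has to be restarted afterwards for the setting to take effect.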
Please check JAVA_HOME and the OpenSSL version in your setup.

Hiveserver2 does not start after installing HDP 2.6.4.0-91 using cloudbreak on AWS

Hiveserver2 does not start after installing HDP 2.6.4.0-91 using cloudbreak on AWS.
I started HiveServer2 from the Ambari UI and checked the contents of /var/log/hive/hiveserver2.log.
Below is the error log.
Any help would be appreciated.
Contents of hiveserver2.log
2018-03-08 04:41:53,345 WARN [main-EventThread]: server.HiveServer2 (HiveServer2.java:process(343)) - This instance of HiveServer2 has been removed from the list of server instances available for dynamic service discovery. The last client session has ended - will shutdown now.
2018-03-08 04:41:53,347 INFO [main]: zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x16203aad5af0040 closed
2018-03-08 04:41:53,347 INFO [main]: server.HiveServer2 (HiveServer2.java:removeServerInstanceFromZooKeeper(361)) - Server instance removed from ZooKeeper.
2018-03-08 04:41:53,348 INFO [main-EventThread]: server.HiveServer2 (HiveServer2.java:stop(405)) - Shutting down HiveServer2
2018-03-08 04:41:53,348 INFO [main-EventThread]: server.HiveServer2 (HiveServer2.java:removeServerInstanceFromZooKeeper(361)) - Server instance removed from ZooKeeper.
2018-03-08 04:41:53,348 INFO [main-EventThread]: zookeeper.ClientCnxn (ClientCnxn.java:run(524)) - EventThread shut down
2018-03-08 04:41:53,348 WARN [main]: server.HiveServer2 (HiveServer2.java:startHiveServer2(508)) - Error starting HiveServer2 on attempt 1, will retry in 60 seconds
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1520480101488_0046 failed 2 times due to AM Container for appattempt_1520480101488_0046_000002 exited with exitCode: -1000
For more detailed output, check the application tracking page: http://ip-10-0-91-7.ap-northeast-2.compute.internal:8088/cluster/app/application_1520480101488_0046 Then click on links to logs of each attempt.
Diagnostics: ExitCodeException exitCode=2: tar: Removing leading `/' from member names
tar: Skipping to next header
gzip: /hadoopfs/fs1/yarn/nodemanager/filecache/60_tmp/tmp_tez.tar.gz: invalid compressed data--format violated
tar: Exiting with failure status due to previous errors
Failing this attempt. Failing the application.
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:699)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:218)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:116)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.startPool(TezSessionPoolManager.java:76)
at org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:488)
at org.apache.hive.service.server.HiveServer2.access$700(HiveServer2.java:87)
at org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:720)
at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:593)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
I had exactly the same issue with HDP on AWS. In my case it was HDP version 2.6.4.5-2, so I'll show the fix for that version, which is the latest at the time of writing.
As the error log shows, tez.tar.gz is corrupted, so YARN cannot decompress it in the YARN container.
This tez.tar.gz file is copied from hdfs:///hdp/apps/<hdp_version>/tez/tez.tar.gz.
To reproduce the error and confirm that the file is corrupted, run the following commands:
sudo su
su hdfs
hdfs dfs -get /hdp/apps/2.6.4.5-2/tez/tez.tar.gz
tar -xvzf tez.tar.gz
You will get the following error:
gzip: stdin: invalid compressed data--format violated
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
The fix is pretty simple: replace the HDFS file with the copy on your local file system by running the following commands:
hdfs dfs -rm /hdp/apps/2.6.4.5-2/tez/tez.tar.gz
hdfs dfs -put /usr/hdp/current/tez-client/lib/tez.tar.gz /hdp/apps/2.6.4.5-2/tez/tez.tar.gz
Now restart the HiveServer2 service and you are done!
NOTE: If something similar happens with other services you can do the same thing. Please check the following link that has more details: https://community.hortonworks.com/articles/30096/foxing-broken-targz-and-jar-files-in-hdp-24.html
Hope this helps!
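Before uploading the replacement, it can be worth confirming that the local copy is itself readable, since putting a second broken archive into HDFS would reproduce the same failure. A minimal sketch with Python's tarfile module follows; the function name is hypothetical, and reading every member is just one way to force the whole gzip stream to be decompressed.

```python
import tarfile
import zlib


def is_readable_tar_gz(path):
    """Return True if every member of the .tar.gz can be read to the end."""
    try:
        with tarfile.open(path, "r:gz") as archive:
            for member in archive:
                if not member.isfile():
                    continue
                stream = archive.extractfile(member)
                # Read in 1 MiB chunks so large members do not fill memory.
                while stream.read(1 << 20):
                    pass
        return True
    except (tarfile.TarError, OSError, EOFError, zlib.error):
        # Truncated or corrupted archives surface as one of these errors.
        return False


if __name__ == "__main__":
    print(is_readable_tar_gz("/usr/hdp/current/tez-client/lib/tez.tar.gz"))
```

The same check can be run on the file fetched with hdfs dfs -get to confirm the corruption without unpacking anything to disk.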

celerybeat shutdown - initscript order?

I'm trying to setup rabbitmq/celery/django-celery/django so that it is "rebootproof", i.e. it all just comes back up by itself. Everything seems to work fine except this:
When I reboot, all services get started, but it seems celeryd is started before rabbitmq, and celerybeat gets subsequently terminated because it can't connect (?):
[2011-06-14 00:48:35,128: WARNING/MainProcess] celery@inquire has started.
[2011-06-14 00:48:35,130: INFO/Beat] child process calling self.run()
[2011-06-14 00:48:35,131: INFO/Beat] Celerybeat: Starting...
[2011-06-14 00:48:35,134: ERROR/MainProcess] Consumer: Connection Error: [Errno 111] Connection refused. Trying again in 2 seconds...
[2011-06-14 00:48:35,688: INFO/Beat] process shutting down
[2011-06-14 00:48:35,689: WARNING/Beat] Process Beat:
[2011-06-14 00:48:35,689: WARNING/Beat] Traceback (most recent call last):
...
[2011-06-14 00:48:35,756: WARNING/Beat] File "/home/inquire/inquire.env/lib/python2.6/site-packages/amqplib/client_0_8/transport.py", line 220, in create_transport
[2011-06-14 00:48:35,760: WARNING/Beat] return TCPTransport(host, connect_timeout)
[2011-06-14 00:48:35,761: WARNING/Beat] File "/home/inquire/inquire.env/lib/python2.6/site-packages/amqplib/client_0_8/transport.py", line 58, in __init__
[2011-06-14 00:48:35,761: WARNING/Beat] self.sock.connect((host, port))
[2011-06-14 00:48:35,761: WARNING/Beat] File "<string>", line 1, in connect
[2011-06-14 00:48:35,761: WARNING/Beat] error: [Errno 111] Connection refused
[2011-06-14 00:48:35,761: INFO/Beat] process exiting with exitcode 1
[2011-06-14 00:48:37,137: ERROR/MainProcess] Consumer: Connection Error: [Errno 111] Connection refused. Trying again in 4 seconds...
On Ubuntu, I installed rabbitmq-server with apt, django-celery with pip into my virtualenv, then I symlinked the "celeryd" initscript I got from https://github.com/ask/celery/tree/master/contrib/debian/init.d in /etc/init.d, configured it in /etc/default/celeryd to use the django celeryd from my virtualenv, and made it "rebootproof" via (maybe "defaults" is the problem?)
update-rc.d celeryd defaults
Rather than running celeryd and celerybeat with separate initscripts, I just configured celeryd to include Beat (maybe that's the problem?):
CELERYD_OPTS="-v 2 -B -s celery -E"
Any pointers how to solve this issue?
If I
sudo /etc/init.d/celeryd restart
there are no complaints:
[2011-06-14 00:54:29,157: WARNING/MainProcess] celery@inquire has started.
[2011-06-14 00:54:29,161: INFO/Beat] child process calling self.run()
[2011-06-14 00:54:29,162: INFO/Beat] Celerybeat: Starting...
but I need to eliminate the need for any manual steps.
celerybeat's dependency on the broker service was indeed the issue.
Rather than installing the initscript with
update-rc.d celeryd defaults
I had to account for the rabbitmq-server script being installed with sequence number 20 for both start and kill. celerybeat's dependency is resolved by explicitly starting it after (and killing it before) rabbitmq-server, using
update-rc.d celeryd defaults 21 19
NB: I've actually opted for the separate celerybeat service instead of the -B invocation, and only did 21 19 for that script, i.e. the one with the problem.
I think the problem is not in celery itself but in your script: probably the broker is not listening yet when celeryd starts.
I'm using almost the same command as you and I don't have any issues; launching the celeryd script with the -B option is not wrong.
I think your boot script has to wait until rabbitmq has completely started before launching celeryd, perhaps with a connection test too.
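The connection test mentioned above could look roughly like this. It is only a sketch: the function name is made up, and it assumes the broker listens on localhost at the default AMQP port 5672, which should be adjusted to the actual setup.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Exit status 0 once the broker accepts connections, 1 on timeout.
    raise SystemExit(0 if wait_for_port("localhost", 5672) else 1)
```

An initscript could run this before starting celeryd and only proceed when it exits with status 0. An open port does not guarantee that the broker is fully ready, so celery's own retry logic is still useful.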