Ambari shows zeppelin server not started but the server is actually up and running - ambari

I am using HDP 2.4.2 and I had previously installed the zeppelin server. It was working fine but today when i restarted the cluster ( AWS nodes were restarted), Ambari shows that Zeppelin server is not running and fails to start the server with the following error:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stacks/HDP/2.4/services/ZEPPELIN/package/scripts/master.py", line 235, in <module>
Master().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
method(env)
File "/var/lib/ambari-agent/cache/stacks/HDP/2.4/services/ZEPPELIN/package/scripts/master.py", line 169, in start
+ params.zeppelin_log_file, user=params.zeppelin_user)
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 158, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 121, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 238, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of '/usr/hdp/current/zeppelin-server/lib/bin/zeppelin-daemon.sh start >> /var/log/zeppelin/zeppelin-setup.log' returned 1. /usr/hdp/current/zeppelin-server/lib/bin/zeppelin-daemon.sh: line 187: /var/run/zeppelin-notebook/zeppelin-zeppelin-ip-10-0-0-11.eu-west-1.compute.internal.pid: Permission denied
cat: /var/run/zeppelin-notebook/zeppelin-zeppelin-ip-10-0-0-11.eu-west-1.compute.internal.pid: No such file or directory
In the zeppelin logs:
ERROR [2016-06-06 03:20:36,714] ({main} VFSNotebookRepo.java[list]:140) - Can't read note file:///usr/hdp/current/zeppelin-server/lib/notebook/screenshots java.io.IOException: file:///usr/hdp/current/zeppelin-server/lib/notebook/screenshots/note.json not found
ERROR [2016-06-06 03:34:12,795] ({main} Notebook.java[loadNoteFromRepo]:330) - Failed to load 2BHU1G67J java.io.IOException: file:///usr/hdp/current/zeppelin-server/lib/notebook/2BHU1G67J is not a directory
But for some reason, the zeppelin port is listening and despite these errors, the zeppelin server is running fine and executing all the queries. Please advice on how to correct the issue in Ambari and start the service without error from ambari.

The problem is with the PID file for the zeppelin service. It's either owned by the wrong user or has the wrong permissions. Manually stop the zeppelin service then delete the pid file locate at: /var/run/zeppelin-notebook/zeppelin-zeppelin-ip-10-0-0-11.eu-west-1.compute.internal.pid. Double check the owner/permissions on the /var/run/zeppelin-notebook folder as well. You should then be able to restart the service in the Ambari UI.

Related

Redis: Redisinsight : Error while connecting : "Something went wrong adding the database. Please try again"

I have a Redis server that I am trying to connect to for Mule 4 applications.
My objective is :
Connect with Redis using Mule 4 app : Success
Connect with Redis using Redisinsight to visualise the data -> Problem
While connecting using Redisinsight I do the following :
Launch the Redisinsight tool. It starts the tool at : http://localhost:8001/
Click on "I already have a database"
Click on "Connect to a Redis Database"
Here I provide the host, port, name (which as per documentation I provide anything say redis_test) and password.
I get the error message : "Something went wrong adding the database. Please try again"
Interestingly while connecting by mule, I just need to provide host, port and password and it works.
Please help. Thanks in advance
From the redisinsight logs :
ERROR 2021-02-09 14:14:20,123 django.request Internal Server Error: /api/instance/
Traceback (most recent call last):
File "django\core\handlers\exception.py", line 34, in inner
File "django\core\handlers\base.py", line 115, in _get_response
File "django\core\handlers\base.py", line 113, in _get_response
File "django\views\decorators\csrf.py", line 54, in wrapped_view
File "django\views\generic\base.py", line 71, in view
File "rest_framework\views.py", line 495, in dispatch
File "rest_framework\views.py", line 455, in handle_exception
File "rest_framework\views.py", line 492, in dispatch
File "redisinsight\core\views\instance.py", line 208, in post
File "redisinsight\core\views\instance.py", line 147, in _save_redis_instance
File "redisinsight\core\services\database\_routines.py", line 80, in _wrapped_add_db_func
File "redisinsight\core\services\database\_routines.py", line 765, in add_redis_database
File "redisinsight\core\services\database\_routines.py", line 809, in add_standalone_db
File "redisinsight\core\services\database\_routines.py", line 576, in _add_standalone_db
File "redisinsight\core\services\database\_routines.py", line 190, in _assert_db_type
File "redisinsight\core\services\database\_routines.py", line 175, in _probe_db_type
File "redis\client.py", line 1281, in info
File "redis\client.py", line 878, in execute_command
File "redis\client.py", line 892, in parse_response
File "redis\connection.py", line 752, in read_response
redis.exceptions.ResponseError: unknown command `INFO`, with args beginning with:
It looks like the INFO command is disabled on your Redis server. RedisInsight needs basic commands like INFO and PING to be enabled.
To enable INFO command
Edit the Redis config file:
sudo nano /etc/redis/redis.conf
search for the INFO command
something like:
rename-command INFO ""
Comment the line and restart redis:
systemctl restart redis

Ambari cluster restart error: Timeline Service V2.0 Reader not restarting

Attempting to restart an Ambari-managed cluster and getting errors related to the Timeline Service V2.0 Reader service starting:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/timelinereader.py", line 108, in <module>
ApplicationTimelineReader().execute()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 353, in execute
method(env)
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/timelinereader.py", line 51, in start
hbase(action='start')
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/hbase_service.py", line 80, in hbase
createTables()
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/hbase_service.py", line 147, in createTables
logoutput=True)
File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
self.env.run()
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 263, in action_run
returns=self.resource.returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 308, in _call
raise ExecuteTimeoutException(err_msg)
resource_management.core.exceptions.ExecuteTimeoutException: Execution of 'ambari-sudo.sh su yarn-ats -l -s /bin/bash -c 'export PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/texlive/2016/bin/x86_64-linux:/usr/local/texlive/2016/bin/x86_64-linux:/usr/local/texlive/2016/bin/x86_64-linux:/usr/lib64/qt-3.3/bin:/usr/local/texlive/2016/bin/x86_64-linux:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/opt/maven/bin:/root/bin:/opt/maven/bin:/opt/maven/bin:/var/lib/ambari-agent'"'"' ; sleep 10;export HBASE_CLASSPATH_PREFIX=/usr/hdp/3.0.0.0-1634/hadoop-yarn/timelineservice/*; /usr/hdp/3.0.0.0-1634/hbase/bin/hbase --config /usr/hdp/3.0.0.0-1634/hadoop/conf/embedded-yarn-ats-hbase org.apache.hadoop.yarn.server.timelineservice.storage.TimelineSchemaCreator -Dhbase.client.retries.number=35 -create -s'' was killed due timeout after 300 seconds
I have not changed any configs or installed anything new between the restart attempt; simply stopped the cluster services and attempted to restart them. Not sure what this error message means. Any debugging tips or fixes?
Found the solution on another community post.
navigate to the host where Timeline Reader is installed and Install Hbase Client in that host
Here is how I installed HBase Client from via the Ambari UI...
In the Ambari UI, go to Hosts then click the host you want to install the hbase client component on
In the list on components, you will have option to add more, see...
From here I installed the HBase client
Then stopped and restarted the cluster via Ambari UI (got notification of stale configs (though not sure if this was my problem all along or if installing the HBase Client reaised the stale configs alert))

LetsEncrypt Certbot-Auto freezes when trying to run any command on Apache

I am trying to get LetsEncrypt SSL certificates installed on a Centos 6 server using Cerbot-Auto, however no matter what I try, it just hangs on:
Apache version is 2.2.15
Command
./certbot-auto -v
When I press CTRL + C to exit the program, it takes about 15 seconds and then exits with a stack trace:
Exiting abnormally:
Traceback (most recent call last):
File "/opt/eff.org/certbot/venv/bin/letsencrypt", line 9, in <module>
load_entry_point('letsencrypt==0.7.0', 'console_scripts', 'letsencrypt')()
File "/opt/eff.org/certbot/venv/lib64/python3.4/site-packages/certbot/main.py", line 1240, in main
return config.func(config, plugins)
File "/opt/eff.org/certbot/venv/lib64/python3.4/site-packages/certbot/main.py", line 981, in run
installer, authenticator = plug_sel.choose_configurator_plugins(config, plugins, "run")
File "/opt/eff.org/certbot/venv/lib64/python3.4/site-packages/certbot/plugins/selection.py", line 189, in choose_configurator_plugins
authenticator = installer = pick_configurator(config, req_inst, plugins)
File "/opt/eff.org/certbot/venv/lib64/python3.4/site-packages/certbot/plugins/selection.py", line 25, in pick_configurator
(interfaces.IAuthenticator, interfaces.IInstaller))
File "/opt/eff.org/certbot/venv/lib64/python3.4/site-packages/certbot/plugins/selection.py", line 77, in pick_plugin
verified.prepare()
File "/opt/eff.org/certbot/venv/lib64/python3.4/site-packages/certbot/plugins/disco.py", line 248, in prepare
return [plugin_ep.prepare() for plugin_ep in six.itervalues(self._plugins)]
File "/opt/eff.org/certbot/venv/lib64/python3.4/site-packages/certbot/plugins/disco.py", line 248, in <listcomp>
return [plugin_ep.prepare() for plugin_ep in six.itervalues(self._plugins)]
File "/opt/eff.org/certbot/venv/lib64/python3.4/site-packages/certbot/plugins/disco.py", line 130, in prepare
self._initialized.prepare()
File "/opt/eff.org/certbot/venv/lib64/python3.4/site-packages/certbot_apache/configurator.py", line 225, in prepare
self.parser = self.get_parser()
File "/opt/eff.org/certbot/venv/lib64/python3.4/site-packages/certbot_apache/override_centos.py", line 39, in get_parser
self.version, configurator=self)
File "/opt/eff.org/certbot/venv/lib64/python3.4/site-packages/certbot_apache/override_centos.py", line 47, in __init__
super(CentOSParser, self).__init__(*args, **kwargs)
File "/opt/eff.org/certbot/venv/lib64/python3.4/site-packages/certbot_apache/parser.py", line 74, in __init__
if self.find_dir("Define", exclude=False):
File "/opt/eff.org/certbot/venv/lib64/python3.4/site-packages/certbot_apache/parser.py", line 401, in find_dir
"%s//*[self::directive=~regexp('%s')]" % (start, regex))
File "/opt/eff.org/certbot/venv/lib64/python3.4/site-packages/augeas.py", line 413, in match
ctypes.byref(array))
KeyboardInterrupt
Please see the logfiles in /var/log/letsencrypt for more details.
I thought it may be a python version issue but when checked, the server is running Python 2.6.6, which, according to the Certbot System Requirements is acceptable.
Letsencrypt.log
When I checked the log, it is exactly the same stacktrace as was reported by the script previously.
Any ideas?

Kivy - OSError, Still Apps Run Successfully?

Every time I run a Kivy app I see OSError (see it in last line of my given example). Even though my app runs successfully. What could be the cause of this error?
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/lib/python2.7/dist-packages/kivy/input/providers/mtdev.py", line 197, in _thread_run
_device = Device(_fn)
File "/usr/lib/python2.7/dist-packages/kivy/lib/mtdev.py", line 131, in __init__
self._fd = os.open(filename, os.O_NONBLOCK | os.O_RDONLY)
OSError: [Errno 13] Permission denied: '/dev/input/event5'
This error is not important, it just means that kivy checked the possible input providers in your OS and found that this one is forbidden.

Using redis with bokeh-server. Permission denied: '/bokehpids.json'

I'm trying to run bokeh-server with supervisor with redis as a backend and I get this error message on startup:
Traceback (most recent call last):
File "/usr/share/nginx/test-status/flask/bin/bokeh-server", line 7, in <module>
bokeh.server.run()
File "/usr/share/nginx/test-status/flask/lib/python2.7/site-packages/bokeh/server/__init__.py", line 175, in run
start_server(args)
File "/usr/share/nginx/test-status/flask/lib/python2.7/site-packages/bokeh/server/__init__.py", line 179, in start_server
start.start_simple_server(args)
File "/usr/share/nginx/test-status/flask/lib/python2.7/site-packages/bokeh/server/start.py", line 54, in start_simple_server
start_redis()
File "/usr/share/nginx/test-status/flask/lib/python2.7/site-packages/bokeh/server/start.py", line 40, in start_redis
save=redis_save)
File "/usr/share/nginx/test-status/flask/lib/python2.7/site-packages/bokeh/server/services.py", line 81, in start_redis
stdin=subprocess.PIPE
File "/usr/share/nginx/test-status/flask/lib/python2.7/site-packages/bokeh/server/services.py", line 32, in __init__
self.add_to_pidfile()
File "/usr/share/nginx/test-status/flask/lib/python2.7/site-packages/bokeh/server/services.py", line 46, in add_to_pidfile
with open(self.pidfilename, "w+") as f:
IOError: [Errno 13] Permission denied: '/bokehpids.json'
Note that I can run the server with supervisor if I use memory as the backend, and I can run bokeh-server manually with redis as a backend just fine. Does anyone know where the permissions I should change lie?
Turns out it was trying to access the pidfile in the root directory...
I solved this by changing the directory in the supervisor config file:
[program:bokeh]
...
directory=/usr/share/nginx/test-status
...