Ambari unable to run custom hook for modifying user hive - ambari
Attempting to add a client node to the cluster via Ambari (v2.7.3.0, HDP 3.1.0.0-78) and seeing an odd error:
stderr:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py", line 38, in <module>
BeforeAnyHook().execute()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 352, in execute
method(env)
File "/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py", line 31, in hook
setup_users()
File "/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/shared_initialization.py", line 51, in setup_users
fetch_nonlocal_groups = params.fetch_nonlocal_groups,
File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
self.env.run()
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/ambari-agent/lib/resource_management/core/providers/accounts.py", line 90, in action_create
shell.checked_call(command, sudo=True)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 314, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'usermod -G hadoop -g hadoop hive' returned 6. usermod: user 'hive' does not exist in /etc/passwd
Error: Error: Unable to run the custom hook script ['/usr/bin/python', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-632.json', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-632.json', 'INFO', '/var/lib/ambari-agent/tmp', 'PROTOCOL_TLSv1_2', '']
2019-11-25 13:07:58,000 - Reporting component version failed
Traceback (most recent call last):
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 363, in execute
self.save_component_version_to_structured_out(self.command_name)
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 223, in save_component_version_to_structured_out
stack_select_package_name = stack_select.get_package_name()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 109, in get_package_name
package = get_packages(PACKAGE_SCOPE_STACK_SELECT, service_name, component_name)
File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 223, in get_packages
supported_packages = get_supported_packages()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 147, in get_supported_packages
raise Fail("Unable to query for supported packages using {0}".format(stack_selector_path))
Fail: Unable to query for supported packages using /usr/bin/hdp-select
stdout:
2019-11-25 13:07:57,644 - Stack Feature Version Info: Cluster Stack=3.1, Command Stack=None, Command Version=None -> 3.1
2019-11-25 13:07:57,651 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2019-11-25 13:07:57,652 - Group['livy'] {}
2019-11-25 13:07:57,654 - Group['spark'] {}
2019-11-25 13:07:57,654 - Group['ranger'] {}
2019-11-25 13:07:57,654 - Group['hdfs'] {}
2019-11-25 13:07:57,654 - Group['zeppelin'] {}
2019-11-25 13:07:57,655 - Group['hadoop'] {}
2019-11-25 13:07:57,655 - Group['users'] {}
2019-11-25 13:07:57,656 - User['yarn-ats'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None}
2019-11-25 13:07:57,658 - User['hive'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None}
2019-11-25 13:07:57,659 - Modifying user hive
Error: Error: Unable to run the custom hook script ['/usr/bin/python', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-632.json', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-632.json', 'INFO', '/var/lib/ambari-agent/tmp', 'PROTOCOL_TLSv1_2', '']
2019-11-25 13:07:57,971 - The repository with version 3.1.0.0-78 for this command has been marked as resolved. It will be used to report the version of the component which was installed
2019-11-25 13:07:58,000 - Reporting component version failed
Traceback (most recent call last):
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 363, in execute
self.save_component_version_to_structured_out(self.command_name)
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 223, in save_component_version_to_structured_out
stack_select_package_name = stack_select.get_package_name()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 109, in get_package_name
package = get_packages(PACKAGE_SCOPE_STACK_SELECT, service_name, component_name)
File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 223, in get_packages
supported_packages = get_supported_packages()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 147, in get_supported_packages
raise Fail("Unable to query for supported packages using {0}".format(stack_selector_path))
Fail: Unable to query for supported packages using /usr/bin/hdp-select
Command failed after 1 tries
The problem appears to be
resource_management.core.exceptions.ExecutionFailed: Execution of 'usermod -G hadoop -g hadoop hive' returned 6. usermod: user 'hive' does not exist in /etc/passwd
caused by
2019-11-25 13:07:57,659 - Modifying user hive
Error: Error: Unable to run the custom hook script ['/usr/bin/python', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-632.json', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-632.json', 'INFO', '/var/lib/ambari-agent/tmp', 'PROTOCOL_TLSv1_2', '']
This is further reinforced by the fact that manually adding the ambari-hdp-1.repo and yum-installing hdp-select before adding the host to the cluster shows the same error messages, just truncated up to the parts of stdout/err shown here.
When running
[root@HW001 .ssh]# /usr/bin/hdp-select versions
3.1.0.0-78
from the ambari server node, I can see the command runs.
Looking at what the hook script is trying to run/access, I see
[root@client001 ~]# ls -lha /var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py
-rw-r--r-- 1 root root 1.2K Nov 25 10:51 /var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py
[root@client001 ~]# ls -lha /var/lib/ambari-agent/data/command-632.json
-rw------- 1 root root 545K Nov 25 13:07 /var/lib/ambari-agent/data/command-632.json
[root@client001 ~]# ls -lha /var/lib/ambari-agent/cache/stack-hooks/before-ANY
total 0
drwxr-xr-x 4 root root 34 Nov 25 10:51 .
drwxr-xr-x 8 root root 147 Nov 25 10:51 ..
drwxr-xr-x 2 root root 34 Nov 25 10:51 files
drwxr-xr-x 2 root root 188 Nov 25 10:51 scripts
[root@client001 ~]# ls -lha /var/lib/ambari-agent/data/structured-out-632.json
ls: cannot access /var/lib/ambari-agent/data/structured-out-632.json: No such file or directory
[root@client001 ~]# ls -lha /var/lib/ambari-agent/tmp
total 96K
drwxrwxrwt 3 root root 4.0K Nov 25 13:06 .
drwxr-xr-x 10 root root 267 Nov 25 10:50 ..
drwxr-xr-x 6 root root 4.0K Nov 25 13:06 ambari_commons
-rwx------ 1 root root 1.4K Nov 25 13:06 ambari-sudo.sh
-rwxr-xr-x 1 root root 1.6K Nov 25 13:06 create-python-wrap.sh
-rwxr-xr-x 1 root root 1.6K Nov 25 10:50 os_check_type1574715018.py
-rwxr-xr-x 1 root root 1.6K Nov 25 11:12 os_check_type1574716360.py
-rwxr-xr-x 1 root root 1.6K Nov 25 11:29 os_check_type1574717391.py
-rwxr-xr-x 1 root root 1.6K Nov 25 13:06 os_check_type1574723161.py
-rwxr-xr-x 1 root root 16K Nov 25 10:50 setupAgent1574715020.py
-rwxr-xr-x 1 root root 16K Nov 25 11:12 setupAgent1574716361.py
-rwxr-xr-x 1 root root 16K Nov 25 11:29 setupAgent1574717392.py
-rwxr-xr-x 1 root root 16K Nov 25 13:06 setupAgent1574723163.py
Notice that /var/lib/ambari-agent/data/structured-out-632.json does not exist (ls: cannot access /var/lib/ambari-agent/data/structured-out-632.json: No such file or directory). I'm not sure whether this is normal, though.
Does anyone know what could be causing this, or have any debugging hints from this point?
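For anyone repeating this check, here is a minimal sketch that does the same thing as the ls calls in one pass: it takes the argument list from the "Unable to run the custom hook script" message and reports which absolute paths are absent. The helper name missing_paths is made up for illustration; it says nothing about whether a missing file is expected by Ambari.

```python
import os


def missing_paths(args):
    """Return the absolute-path arguments that do not exist on disk."""
    return [a for a in args if a.startswith("/") and not os.path.exists(a)]


# Argument list copied from the "Unable to run the custom hook script" message.
hook_args = [
    "/usr/bin/python",
    "/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py",
    "ANY",
    "/var/lib/ambari-agent/data/command-632.json",
    "/var/lib/ambari-agent/cache/stack-hooks/before-ANY",
    "/var/lib/ambari-agent/data/structured-out-632.json",
    "INFO",
    "/var/lib/ambari-agent/tmp",
    "PROTOCOL_TLSv1_2",
    "",
]

if __name__ == "__main__":
    for path in missing_paths(hook_args):
        print("missing:", path)
```

Non-path arguments such as 'ANY' and 'INFO' are skipped because they do not start with a slash.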
UPDATE 01:
Adding some log printing lines near the offending final line in the error trace, i.e. File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 147, in get_supported_packages, I print the return code and stdout:
2
ambari-python-wrap: can't open file '/usr/bin/hdp-select': [Errno 2] No such file or directory
So what the heck? It wants hdp-select to already be there, but ambari add-host UI complains if I manually install that binary myself beforehand. When I do manually install it (using the same repo file as in the rest of the existing cluster nodes) all I see is...
0
Packages:
accumulo-client
accumulo-gc
accumulo-master
accumulo-monitor
accumulo-tablet
accumulo-tracer
atlas-client
atlas-server
beacon
beacon-client
beacon-server
druid-broker
druid-coordinator
druid-historical
druid-middlemanager
druid-overlord
druid-router
druid-superset
falcon-client
falcon-server
flume-server
hadoop-client
hadoop-hdfs-client
hadoop-hdfs-datanode
hadoop-hdfs-journalnode
hadoop-hdfs-namenode
hadoop-hdfs-nfs3
hadoop-hdfs-portmap
hadoop-hdfs-secondarynamenode
hadoop-hdfs-zkfc
hadoop-httpfs
hadoop-mapreduce-client
hadoop-mapreduce-historyserver
hadoop-yarn-client
hadoop-yarn-nodemanager
hadoop-yarn-registrydns
hadoop-yarn-resourcemanager
hadoop-yarn-timelinereader
hadoop-yarn-timelineserver
hbase-client
hbase-master
hbase-regionserver
hive-client
hive-metastore
hive-server2
hive-server2-hive
hive-server2-hive2
hive-webhcat
hive_warehouse_connector
kafka-broker
knox-server
livy-client
livy-server
livy2-client
livy2-server
mahout-client
oozie-client
oozie-server
phoenix-client
phoenix-server
pig-client
ranger-admin
ranger-kms
ranger-tagsync
ranger-usersync
shc
slider-client
spark-atlas-connector
spark-client
spark-historyserver
spark-schema-registry
spark-thriftserver
spark2-client
spark2-historyserver
spark2-thriftserver
spark_llap
sqoop-client
sqoop-server
storm-client
storm-nimbus
storm-slider-client
storm-supervisor
superset
tez-client
zeppelin-server
zookeeper-client
zookeeper-server
Aliases:
accumulo-server
all
client
hadoop-hdfs-server
hadoop-mapreduce-server
hadoop-yarn-server
hive-server
Command failed after 1 tries
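The 2 printed in the first attempt of this update matches how a Python interpreter exits when it cannot open the script file it is given. The sketch below reproduces that with the current interpreter and a path that does not exist. It assumes that ambari-python-wrap behaves like a plain Python interpreter here, which the format of the error message suggests but which I have not verified. The function name run_script and the path are made up for illustration.

```python
import subprocess
import sys


def run_script(interpreter, script_path):
    """Run `interpreter script_path` and return (exit code, combined output)."""
    proc = subprocess.run(
        [interpreter, script_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # Hypothetical missing path, standing in for an absent /usr/bin/hdp-select.
    code, out = run_script(sys.executable, "/nonexistent/hdp-select")
    print(code)
    print(out)
```

So a return code of 2 together with "can't open file" points at the selector script simply not being on the host yet, rather than at the script failing.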
UPDATE 02:
Printing some custom logging from File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 322 (printing the values of err_msg, code, out, err), ie.
....
312 if throw_on_failure and not code in returns:
313 err_msg = Logger.filter_text("Execution of '{0}' returned {1}. {2}".format(command_alias, code, all_output))
314
315 #TODO remove
316 print("\n----------\nMY LOGS\n----------\n")
317 print(err_msg)
318 print(code)
319 print(out)
320 print(err)
321
322 raise ExecutionFailed(err_msg, code, out, err)
323
324 # if separate stderr is enabled (by default it's redirected to out)
325 if stderr == subprocess32.PIPE:
326 return code, out, err
327
328 return code, out
....
I see
Execution of 'usermod -G hadoop -g hadoop hive' returned 6. usermod: user 'hive' does not exist in /etc/passwd
6
usermod: user 'hive' does not exist in /etc/passwd
Error: Error: Unable to run the custom hook script ['/usr/bin/python', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-816.json', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-816.json', 'INFO', '/var/lib/ambari-agent/tmp', 'PROTOCOL_TLSv1_2', '']
2019-11-26 10:25:46,928 - The repository with version 3.1.0.0-78 for this command has been marked as resolved. It will be used to report the version of the component which was installed
So it seems like it is failing to create the hive user (even though it seems to have no problem creating the yarn-ats user before that)
After just giving in and trying to manually create the hive user myself, I see
[root@airflowetl ~]# useradd -g hadoop -s /bin/bash hive
useradd: user 'hive' already exists
[root@airflowetl ~]# cat /etc/passwd | grep hive
<nothing>
[root@airflowetl ~]# id hive
uid=379022825(hive) gid=379000513(domain users) groups=379000513(domain users)
The fact that this existing user's uid looks like this and is not in the /etc/passwd file made me think that there is some existing Active Directory user (which this client node syncs with via installed SSSD) that already has the name hive. Checking our AD users, this turned out to be true.
Temporarily stopping the SSSD service (service sssd stop) to stop the sync with AD before rerunning the client host add in Ambari fixed the problem for me. (I'm not sure whether you can get a server to ignore AD syncs on an individual user basis.)
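A quick way to spot this situation before adding a host is to compare what the system resolves for a user name with what is actually listed in /etc/passwd. The sketch below is only an illustration of that check, not anything Ambari does itself. classify_user and local_user_names are made-up names, and the lookup parameter exists only so the logic can be tried without a real directory service.

```python
import pwd


def local_user_names(passwd_path="/etc/passwd"):
    """Return the set of user names defined in the local passwd file."""
    names = set()
    with open(passwd_path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            names.add(line.split(":", 1)[0])
    return names


def classify_user(name, passwd_path="/etc/passwd", lookup=pwd.getpwnam):
    """Classify a user as 'local', 'non-local' or 'missing'.

    'non-local' means the name resolves (e.g. through SSSD/AD) but has no
    entry in the local passwd file, which is the case usermod rejects.
    """
    if name in local_user_names(passwd_path):
        return "local"
    try:
        lookup(name)  # raises KeyError when the name cannot be resolved
    except KeyError:
        return "missing"
    return "non-local"


if __name__ == "__main__":
    for user in ("hive", "yarn-ats"):
        print(user, classify_user(user))
```

On the client node described here, hive would be expected to come back as non-local, which matches useradd reporting that the user already exists while usermod cannot find it in /etc/passwd.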
Related
Permission denied creating group ckan 2.9.5
I'm having an issue with my ckan 2.9.5 instance while doing anything (creating groups or organizations, uploading files...). Every time I try to do something I get a Permission denied: '/var/lib/ckan/storage/uploads/group', even as a sysadmin user. I tried giving full permissions to /var/lib/ckan/storage but nothing changes. These are the permissions of the folder. And this is the error log:
File "/usr/lib/ckan/venv/lib/python3.8/site-packages/flask/app.py", line 2449, in wsgi_app
response = self.handle_exception(e)
File "/usr/lib/ckan/venv/lib/python3.8/site-packages/flask/app.py", line 1866, in handle_exception
reraise(exc_type, exc_value, tb)
File "/usr/lib/ckan/venv/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/usr/lib/ckan/venv/lib/python3.8/site-packages/flask/app.py", line 2446, in wsgi_app
response = self.full_dispatch_request()
File "/usr/lib/ckan/venv/lib/python3.8/site-packages/flask/app.py", line 1951, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/lib/ckan/venv/lib/python3.8/site-packages/flask/app.py", line 1820, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/lib/ckan/venv/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/usr/lib/ckan/venv/lib/python3.8/site-packages/flask/app.py", line 1949, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/lib/ckan/venv/lib/python3.8/site-packages/flask_debugtoolbar/__init__.py", line 125, in dispatch_request
return view_func(**req.view_args)
File "/usr/lib/ckan/venv/lib/python3.8/site-packages/flask/views.py", line 89, in view
return self.dispatch_request(*args, **kwargs)
File "/usr/lib/ckan/venv/lib/python3.8/site-packages/flask/views.py", line 163, in dispatch_request
return meth(*args, **kwargs)
File "/usr/lib/ckan/venv/src/ckan/ckan/views/group.py", line 859, in post
group = _action(u'group_create')(context, data_dict)
File "/usr/lib/ckan/venv/src/ckan/ckan/logic/__init__.py", line 504, in wrapped
result = _action(context, data_dict, **kw)
File "/usr/lib/ckan/venv/src/ckan/ckan/logic/action/create.py", line 871, in group_create
return _group_or_org_create(context, data_dict)
File "/usr/lib/ckan/venv/src/ckan/ckan/logic/action/create.py", line 701, in _group_or_org_create
upload = uploader.get_uploader('group')
File "/usr/lib/ckan/venv/src/ckan/ckan/lib/uploader.py", line 60, in get_uploader
upload = Upload(upload_to, old_filename)
File "/usr/lib/ckan/venv/src/ckan/ckan/lib/uploader.py", line 126, in __init__
os.makedirs(self.storage_path)
File "/usr/lib/python3.8/os.py", line 223, in makedirs
mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/var/lib/ckan/storage/uploads/group'
Thanks for any help.
If you're doing this as the 'ckan' user, I think you're getting this error because the storage folder is probably owned by 'root'. You should give folder owner to user 'ckan'.
I had the same problem with a Docker instance of ckan. Solution (do this in the ckan container as root user):
$ cd /var/lib/ckan
$ ls -l
total 8
drwxr-xr-x 3 root root 4096 Sep 7 17:17 storage
drwxr-xr-x 5 ckan ckan 4096 Sep 7 17:18 webassets
$ chown -R ckan.ckan storage
$ ls -l
total 8
drwxr-xr-x 3 ckan ckan 4096 Sep 7 17:17 storage
drwxr-xr-x 5 ckan ckan 4096 Sep 7 17:18 webassets
Now CKAN works smoothly.
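To confirm an ownership problem like this from the process's point of view, a small sketch such as the following can be run as the same user that runs CKAN. It only reports owner, group, mode and whether the current user may write to the path. describe_access is a made-up helper name, and the path in the example is the one from the question.

```python
import os
import stat


def describe_access(path):
    """Report owner uid, group gid, mode string and write access for a path."""
    st = os.stat(path)
    return {
        "owner_uid": st.st_uid,
        "group_gid": st.st_gid,
        "mode": stat.filemode(st.st_mode),  # e.g. 'drwxr-xr-x'
        "writable_by_me": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    target = "/var/lib/ckan/storage"
    if os.path.exists(target):
        print(describe_access(target))
    else:
        print("path not found:", target)
```

If writable_by_me is False while the directory is owned by root, the chown step above is the likely fix.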
Databricks considering files as directory
We are facing an issue on the Databricks filesystem, which treats some files as directories, so we are unable to read them with Pandas. The files exist in the Azure Storage Explorer, where they are shown as files. We have mounted the storage with OAuth 2.0. On Databricks, %sh ls -al '<path_to_files>' returns the following:
total 1127
drwxrwxrwx 2 root root 4096 Jan 29 09:26 .
drwxrwxrwx 2 root root 4096 Jan 9 13:47 ..
drwxrwxrwx 1 root root 136705 Jan 28 16:35 AAAA_2019-10-01_2019-12-27.csv
drwxrwxrwx 1 root root 183098 Jan 28 16:35 BBBB_2019-10-01_2019-12-27.csv
-rwxrwxrwx 1 root root 313120 Jan 28 16:35 CCCC_2019-10-01_2019-12-27.csv
-rwxrwxrwx 1 root root 212935 Jan 29 09:26 df_cube.csv
-rwxrwxrwx 1 root root 298228 Jan 29 09:26 df_other_cube.csv
The thing is, the first two csv files are not directories at all. We can download them and read them as csv, but we cannot load them into a Pandas dataframe.
df = pd.read_csv(rootname_source_test + r'AAAA_2019-10-01_2019-12-27.csv',header=0,sep="|",engine='python')
>>> IsADirectoryError: [Errno 21] Is a directory: '/dbfs/mnt/<path>/AAA_2019-10-01_2019-12-27.csv'
They are generated the same way the third csv is generated, and the third one is loadable in pandas. Sometimes they appear as files, sometimes as directories, and we are having trouble reproducing and solving this consistently.
Cluster config: Runtime 6.2 ML (includes Apache Spark 2.4.4, Scala 2.11)
Any help will be very appreciated.
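Since the behaviour is intermittent, it may help to log how the mount presents each csv entry right before reading it. The sketch below only diagnoses; it does not explain why the mount reports a file as a directory. csv_entries_by_type is a made-up name, and the directory passed in would be the mounted path from the question.

```python
import os


def csv_entries_by_type(directory):
    """Split the *.csv entries of a directory into files and directories."""
    result = {"files": [], "directories": []}
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".csv"):
            continue
        full = os.path.join(directory, name)
        key = "directories" if os.path.isdir(full) else "files"
        result[key].append(name)
    return result


if __name__ == "__main__":
    # The current directory is used only so the sketch runs anywhere.
    print(csv_entries_by_type("."))
```

Any name that shows up under directories at the moment pd.read_csv fails is one the filesystem layer is currently presenting as a directory.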
Eprints3: 'ast' is not a valid repository identifier
Our production EPrints (3.3.15) instance has recently failed. Nothing has changed regarding either the EPrints config or the Apache config, but on Sunday some errors appeared in the Apache error.log file:
------------------------------------------------------------------
---------------- EPrints System Error ----------------------------
------------------------------------------------------------------
Can't read cfg.d config files from /opt/eprints3/archives/test/cfg/cfg.d: No such file or directory
------------------------------------------------------------------
EPrints System Error inducing stack dump at /opt/eprints3/perl_lib/EPrints.pm line 145
EPrints::abort() called at /opt/eprints3/perl_lib/EPrints/Config.pm line 252
EPrints::Config::load_repository_config_module('test') called at /opt/eprints3/perl_lib/EPrints/Repository.pm line 447
EPrints::Repository::load_config('EPrints::Repository=HASH(0x7fe00c0d1688)') called at /opt/eprints3/perl_lib/EPrints/Repository.pm line 153
EPrints::Repository::new('EPrints::Repository', 'test', 'db_connect', 0) called at /opt/eprints3/perl_lib/EPrints.pm line 491
EPrints::repository('EPrints=HASH(0x7fe00ce7b428)', 'test', 'db_connect', 0) called at /opt/eprints3/perl_lib/EPrints.pm line 581
EPrints::load_repositories('EPrints=HASH(0x7fe00ce7b428)') called at /opt/eprints3/perl_lib/EPrints.pm line 397
EPrints::post_config_handler('APR::Pool=SCALAR(0x7fe00c829928)', 'APR::Pool=SCALAR(0x7fe00c8297c0)', 'APR::Pool=SCALAR(0x7fe00c829880)', 'Apache2::ServerRec=SCALAR(0x7fe00c8297f0)') called at -e line 0
eval {...} called at -e line 0
[Sun Sep 17 06:25:16 2017] [notice] Apache/2.2.16 (Debian) mod_auth_kerb/5.4 mod_ssl/2.2.16 OpenSSL/0.9.8o Phusion_Passenger/3.0.12 mod_perl/2.0.4 Perl/v5.10.1 configured -- resuming normal operations
(then there are a few hundred of these errors)
------------------------------------------------------------------
---------------- EPrints System Error ----------------------------
------------------------------------------------------------------
'ast' is not a valid repository identifier: PerlSetVar EPrints_ArchiveID ast
------------------------------------------------------------------
EPrints System Error inducing stack dump at /opt/eprints3/perl_lib/EPrints.pm line 145
EPrints::abort('EPrints') called at /opt/eprints3/perl_lib/EPrints/Apache/Rewrite.pm line 62
EPrints::Apache::Rewrite::handler('Apache2::RequestRec=SCALAR(0x7fe00c8297c0)') called at -e line 0
eval {...} called at -e line 0
[Sun Sep 17 06:25:32 2017] [error] [client 46.229.168.67] File does not exist: (null)
I have no idea what happened. I just find it strange that the first error relates to it trying to load config for the test archive, which also exists beside our ast archive. Thereafter it's not recognising our ast archive.
The issue was that there was an empty 'test' folder beside our 'ast' archive. The removal of this folder fixed the issue. The reason why this behavior only started now is still a bit of a mystery.
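A stray folder like this can be spotted by listing every directory under the archives root that lacks a cfg/cfg.d subdirectory, which is the path the first error complains about. This is a minimal sketch under that assumption. archives_missing_cfgd is a made-up name, and the root path is taken from the error log in the question.

```python
import os


def archives_missing_cfgd(archives_root):
    """Return archive directory names that have no cfg/cfg.d subdirectory."""
    missing = []
    for name in sorted(os.listdir(archives_root)):
        archive_dir = os.path.join(archives_root, name)
        if not os.path.isdir(archive_dir):
            continue  # ignore plain files next to the archives
        if not os.path.isdir(os.path.join(archive_dir, "cfg", "cfg.d")):
            missing.append(name)
    return missing


if __name__ == "__main__":
    root = "/opt/eprints3/archives"
    if os.path.isdir(root):
        print(archives_missing_cfgd(root))
    else:
        print("archives root not found:", root)
```

With an empty 'test' folder next to a complete 'ast' archive, this would list only 'test'.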
Resource manager is not starting
Starting the ResourceManager from Ambari is not working, although services like App Timeline Server, NodeManager and YARN client have started. The status of the NodeManagers is: n/a active / n/a lost / n/a unhealthy / n/a rebooted / n/a decommissioned
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/resourcemanager.py", line 304, in <module>
Resourcemanager().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 314, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/resourcemanager.py", line 124, in start
self.wait_for_dfs_directories_created(params.entity_groupfs_store_dir, params.entity_groupfs_active_dir)
File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/resourcemanager.py", line 261, in wait_for_dfs_directories_created
self.wait_for_dfs_directory_created(dir_path, ignored_dfs_dirs)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/decorator.py", line 55, in wrapper
return function(*args, **kwargs)
File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/resourcemanager.py", line 291, in wait_for_dfs_directory_created
raise Fail("DFS directory '" + dir_path + "' does not exist !")
resource_management.core.exceptions.Fail: DFS directory '/ats/done/' does not exist !
The service start script is looking for the HDFS path /ats/done. Check whether that path exists with the proper ownership and permissions, as indicated below.
[hdfs@vp-solr2 ~]$ hdfs dfs -ls / | grep ats
drwxr-xr-x - yarn hadoop 0 2017-03-27 15:12 /ats
[hdfs@vp-solr2 ~]$ hdfs dfs -ls /ats | grep done
drwx------ - yarn hadoop 0 2017-06-19 08:33 /ats/done
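If you want to compare your own listing against the expected owner and permissions in a script, the lines can be split into fields. The sketch below assumes the eight-column layout visible in the two listing lines in this answer (permissions, replication, owner, group, size, date, time, path). parse_ls_line is a made-up helper, not part of any Hadoop API.

```python
def parse_ls_line(line):
    """Split one `hdfs dfs -ls` style line into named fields.

    Assumes eight whitespace-separated columns, as in the lines shown above.
    """
    parts = line.split(None, 7)
    if len(parts) != 8:
        raise ValueError("unexpected listing line: %r" % line)
    perms, replication, owner, group, size, date, time, path = parts
    return {
        "permissions": perms,
        "owner": owner,
        "group": group,
        "path": path.strip(),
    }


if __name__ == "__main__":
    entry = parse_ls_line("drwx------ - yarn hadoop 0 2017-06-19 08:33 /ats/done")
    print(entry["owner"], entry["group"], entry["permissions"], entry["path"])
```

A mismatch against yarn / hadoop / drwx------ for /ats/done would be the thing to correct before retrying the start.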
Apache Ambari : Datanode installation failed while installing in existing cluster
I have created a Hadoop cluster using Apache Ambari 2.1.0 with 3 datanodes. Now, when I try to add another datanode to the existing cluster, it throws this error:
resource_management.core.exceptions.Fail: Execution of '/usr/bin/yum -d 0 -e 0 -y install 'hadoop_2_3_*'' returned 1. No Presto metadata available for base
Delta RPMs reduced 3.6 M of updates to 798 k (78% saved)
Here is my web UI console log:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py", line 153, in <module>
DataNode().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 218, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py", line 34, in install
self.install_packages(env, params.exclude_packages)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 376, in install_packages
Package(name)
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 157, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 45, in action_install
self.install_package(package_name, self.resource.use_repos, self.resource.skip_repos)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/yumrpm.py", line 49, in install_package
shell.checked_call(cmd, sudo=True, logoutput=self.get_logoutput())
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of '/usr/bin/yum -d 0 -e 0 -y install 'hadoop_2_3_*'' returned 1. No Presto metadata available for base
Delta RPMs reduced 3.6 M of updates to 798 k (78% saved)
Error downloading packages:
hadoop_2_3_4_0_3485-yarn-proxyserver-2.7.1.2.3.4.0-3485.el6.x86_64: [Errno 256] No more mirrors to try.
This looks like there are two issues with yum and your repositories. First I see the message:
No Presto metadata available for base
Delta RPMs reduced 3.6 M of updates to 798 k (78% saved)
Try running the following command on the host that you are trying to add as a datanode to fix the first issue:
sudo yum clean all
Then see if you can perform this command successfully:
sudo yum -v install hadoop_2_3_*
If you get to the prompt that asks if you want to install (y/n), then it was successful; choose the no option, and retry the add datanode action from Ambari. If you get an error or some failure, take a look at the verbose output to troubleshoot the problem further.