tidb-ansible error in ansible-playbook bootstrap.yml command - error-handling

I am trying to install TiDB through TiDB Ansible, following the documentation at https://pingcap.com/docs/v1.0/op-guide/ansible-deployment/. In the process, the command ansible-playbook bootstrap.yml fails with the error below. Although the disk is not full, I am getting a disk-full error. Please let me know the steps to fix the error. Thank you!
[root@fm42cephnode005 tidb-ansible-master]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 94G 0 94G 0% /dev
tmpfs 94G 0 94G 0% /dev/shm
tmpfs 94G 12M 94G 1% /run
tmpfs 94G 0 94G 0% /sys/fs/cgroup
/dev/sdg5 239G 5.2G 222G 3% /
/dev/sdg3 123G 1.8G 115G 2% /home
/dev/sdg2 1014M 257M 758M 26% /boot
/dev/sdg1 200M 12M 189M 6% /boot/efi
tmpfs 19G 20K 19G 1% /run/user/42
tmpfs 19G 0 19G 0% /run/user/0
/dev/nvme3n1 1.5T 77M 1.4T 1% /home/tidbdeployment
[root@fm42cephnode005 tidb-ansible-master]# ansible-playbook bootstrap.yml
PLAY [initializing deployment target] **********************************************************************************************************************************************************************
TASK [check_config_static : Ensure TiDB host exists] *******************************************************************************************************************************************************
TASK [check_config_static : Ensure PD host exists] *********************************************************************************************************************************************************
TASK [check_config_static : Ensure TiKV host exists] *******************************************************************************************************************************************************
TASK [check_config_static : Check ansible_user variable] ***************************************************************************************************************************************************
TASK [check_config_static : Ensure timezone variable is set] ***********************************************************************************************************************************************
TASK [check_config_static : Close old SSH control master processes] ****************************************************************************************************************************************
ok: [localhost]
TASK [check_config_static : Check ansible version] *********************************************************************************************************************************************************
TASK [check_config_static : Check if jmespath installed] ***************************************************************************************************************************************************
changed: [localhost]
TASK [check_config_static : Check if jinja2 installed] *****************************************************************************************************************************************************
changed: [localhost]
TASK [check_config_static : Preflight check - Fail when jmespath or jinja2 isn't installed] ****************************************************************************************************************
TASK [check_config_static : Get jmespath info] *************************************************************************************************************************************************************
changed: [localhost]
TASK [check_config_static : Get jmespath version] **********************************************************************************************************************************************************
ok: [localhost]
TASK [check_config_static : Get jinja2 info] ***************************************************************************************************************************************************************
changed: [localhost]
TASK [check_config_static : Get jinja2 version] ************************************************************************************************************************************************************
ok: [localhost]
TASK [check_config_static : Preflight check - Fail when the versions of jmespath and jinja2 doesn't meet the requirements] *********************************************************************************
TASK [check_config_static : Check inventory configuration] *************************************************************************************************************************************************
changed: [localhost]
TASK [check_config_static : Preflight check - If the inventory configuration is correct] *******************************************************************************************************************
PLAY [check node config] ***********************************************************************************************************************************************************************************
TASK [pre-ansible : disk space check - fail when disk is full] *********************************************************************************************************************************************
fatal: [localhost]: FAILED! =>
msg: |-
The conditional check ' '100%' in disk_space_st.stdout ' failed. The error was: error while evaluating conditional ( '100%' in disk_space_st.stdout ): Unable to look up a name or access an attribute in template string ({% if '100%' in disk_space_st.stdout %} True {% else %} False {% endif %}).
Make sure your variable name does not contain invalid characters like '-': argument of type 'StrictUndefined' is not iterable
to retry, use: --limit @/home/tidb-ansible-master/retry_files/bootstrap.retry
PLAY RECAP *************************************************************************************************************************************************************************************************
localhost : ok=8 changed=5 unreachable=0 failed=1
ERROR MESSAGE SUMMARY **************************************************************************************************************************************************************************************
[localhost]: Ansible Failed! ==>
msg: |-
The conditional check ' '100%' in disk_space_st.stdout ' failed. The error was: error while evaluating conditional ( '100%' in disk_space_st.stdout ): Unable to look up a name or access an attribute in template string ({% if '100%' in disk_space_st.stdout %} True {% else %} False {% endif %}).
Make sure your variable name does not contain invalid characters like '-': argument of type 'StrictUndefined' is not iterable

This check uses the df -h | tail -n1 command to determine the space usage of the home directory partition corresponding to the ansible_user on all machines in inventory.ini. Partition usage may be 100% when this error occurs. If you have confirmed that there is space left, please try again.
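If the check still fails even though space is available, one way to see exactly what the playbook sees is to re-run the same df check by hand against every host in the inventory. This is only a sketch; the inventory path assumes a standard tidb-ansible layout:

# run the disk check manually for the ansible_user's home partition on every host
# (inventory path is an assumption; adjust to your deployment directory)
ansible -i inventory.ini all -m shell -a 'df -h ~ | tail -n 1'

If this comes back empty or fails for some host, the disk_space_st variable is never registered on that host, which matches the StrictUndefined error above.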

Related

Error running tez in hive. Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask

Hadoop 3.3.5
Hive 3.1.3
Tez 0.10.2
I followed the instructions in this link to build Tez 0.10.2 for Hadoop 3.3.5: https://tez.apache.org/install.html
The database is stored on an S3 bucket, and I am able to run 'select count(*) from m1.t1' using hive.execution.engine=mr.
When I set hive.execution.engine=tez and run the same query, I get this error immediately:
2023-02-15T21:21:09,208 INFO [a6e2cd1a-b2c9-42d8-9568-8e0b64677f77 main] client.TezClient: App did not succeed. Diagnostics: Application application_1676506240754_0019 failed 2 times due to AM Container for appattempt_1676506240754_0019_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2023-02-15 21:21:08.730]Exception from container-launch.
Container id: container_1676506240754_0019_02_000001
Exit code: 1
[2023-02-15 21:21:08.732]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.tez.dag.app.DAGAppMaster
[2023-02-15 21:21:08.733]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.tez.dag.app.DAGAppMaster
If I set tez.use.cluster.hadoop-libs to true in tez-site.xml, YARN runs but the job fails with an AWS credential error, even though I have set the fs.s3a credentials in Hadoop's core-site.xml, Hive's hive-site.xml, and .bashrc environment variables.
The keys are masked; these are samples only:
echo $AWS_ACCESS_KEY_ID
I9U996400005XXXXXXXX
echo $AWS_SECRET_KEY
mPY8GiU6NegNWoVnaODXXXXXXXXXXXXXXXXXXXX
hive> set hive.execution.engine=tez;
hive> select count(*) from m1.t1;
Query ID = hdp-user_20230215210146_62ed9fab-5d4a-42a9-bf54-5fb6f84a9048
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1676506240754_0015)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 container INITIALIZING -1 0 0 -1 0 0
Reducer 2 container INITED 1 0 0 1 0 0
----------------------------------------------------------------------------------------------
VERTICES: 00/02 [>>--------------------------] 0% ELAPSED TIME: 2.03 s
----------------------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1676506240754_0015_3_00, diagnostics=[Vertex vertex_1676506240754_0015_3_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: t1 initializer failed, vertex=vertex_1676506240754_0015_3_00 [Map 1], java.nio.file.AccessDeniedException: s3a://hadoop-cluster/warehouse/tablespace/managed/hive/m1.db/t1: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
I tried adding all fs.s3a properties from core-site.xml to tez-site.xml, and setting fs.s3a.access.key and fs.s3a.secret.key inside the Hive session, but I still get the same error.
org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
Question: according to the Tez install instructions:
Ensure tez.use.cluster.hadoop-libs is not set in tez-site.xml, or if it is set, the value should be false
But when it is set to false, Tez cannot run.
When it is set to true, I get the AWS credential error even though I set the credentials in every possible location and environment variable.
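For reference, this is the tez-site.xml property I am toggling (shown here with true, the setting that at least gets the job onto YARN):

<property>
  <name>tez.use.cluster.hadoop-libs</name>
  <value>true</value>
</property>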
==========================================================
Update:
I am not sure if this is the right answer to the problem, but I finally got it working by adding this property to hive-site.xml:
<property>
  <name>hive.conf.hidden.list</name>
  <value>javax.jdo.option.ConnectionPassword,hive.server2.keystore.password,fs.s3a.proxy.password,dfs.adls.oauth2.credential,fs.adl.oauth2.credential</value>
</property>
By default, all fs.s3a credentials are hidden configuration values even if you don't set this property. I explicitly added the property and removed all fs.s3a credential-related entries from the value.
Now, I can run select count(*) with tez.
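With the keys no longer hidden, a hedged sketch of what setting them in the Hive session looks like (the key values are the masked samples from above):

hive> set fs.s3a.access.key=I9U996400005XXXXXXXX;
hive> set fs.s3a.secret.key=mPY8GiU6NegNWoVnaODXXXXXXXXXXXXXXXXXXXX;
hive> set hive.execution.engine=tez;
hive> select count(*) from m1.t1;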

Cannot install kernel-devsrc

I'm trying to set up my environment to use Yocto's generated SDK to compile my out-of-tree module, but for some reason, I'm getting an error.
cp: cannot stat 'arch/arm/kernel/module.lds': No such file or directory
I'm using the Poky distribution and meta-raspberrypi, which is needed because I'm using the RPi Zero W board.
Apart from this everything works fine. I'm able to compile the entire image and load it on the board.
Here is the line I've added to local.conf
TOOLCHAIN_TARGET_TASK_append = " kernel-devsrc"
as I've found in the documentation.
Also below you can find the whole log from the compilation.
DEBUG: Executing python function extend_recipe_sysroot
NOTE: Direct dependencies are ['virtual:native:/home/pp/yocto-hh/sources/poky/meta/recipes-bsp/u-boot/u-boot-tools_2020.07.bb:do_populate_sysroot', '/home/pp/yocto-hh/sources/poky/meta/recipes-devtools/binutils/binutils-cross_2.35.1.bb:do_populate_sysroot', '/home/pp/yocto-hh/sources/poky/meta/recipes-kernel/kmod/kmod-native_git.bb:do_populate_sysroot', '/home/pp/yocto-hh/sources/poky/meta/recipes-devtools/gcc/gcc-cross_10.2.bb:do_populate_sysroot', 'virtual:native:/home/pp/yocto-hh/sources/poky/meta/recipes-support/gmp/gmp_6.2.0.bb:do_populate_sysroot', 'virtual:native:/home/pp/yocto-hh/sources/poky/meta/recipes-devtools/pkgconfig/pkgconfig_git.bb:do_populate_sysroot', 'virtual:native:/home/pp/yocto-hh/sources/poky/meta/recipes-extended/xz/xz_5.2.5.bb:do_populate_sysroot', 'virtual:native:/home/pp/yocto-hh/sources/poky/meta/recipes-connectivity/openssl/openssl_1.1.1k.bb:do_populate_sysroot', 'virtual:native:/home/pp/yocto-hh/sources/poky/meta/recipes-devtools/bison/bison_3.7.2.bb:do_populate_sysroot', '/home/pp/yocto-hh/sources/poky/meta/recipes-devtools/quilt/quilt-native_0.66.bb:do_populate_sysroot', 'virtual:native:/home/pp/yocto-hh/sources/poky/meta/recipes-extended/bc/bc_1.07.1.bb:do_populate_sysroot', '/home/pp/yocto-hh/sources/poky/meta/recipes-core/glibc/glibc_2.32.bb:do_populate_sysroot', 'virtual:native:/home/pp/yocto-hh/sources/poky/meta/recipes-devtools/patch/patch_2.7.6.bb:do_populate_sysroot', '/home/pp/yocto-hh/sources/poky/meta/recipes-kernel/kern-tools/kern-tools-native_git.bb:do_populate_sysroot', 'virtual:native:/home/pp/yocto-hh/sources/poky/meta/recipes-devtools/pseudo/pseudo_git.bb:do_populate_sysroot', '/home/pp/yocto-hh/sources/poky/meta/recipes-devtools/gcc/gcc-runtime_10.2.bb:do_populate_sysroot']
NOTE: Installed into sysroot: []
NOTE: Skipping as already exists in sysroot: ['u-boot-tools-native', 'binutils-cross-arm', 'kmod-native', 'gcc-cross-arm', 'gmp-native', 'pkgconfig-native', 'xz-native', 'openssl-native', 'bison-native', 'quilt-native', 'bc-native', 'glibc', 'patch-native', 'kern-tools-native', 'pseudo-native', 'gcc-runtime', 'python3-native', 'gnu-config-native', 'autoconf-native', 'libtool-native', 'gtk-doc-native', 'automake-native', 'zlib-native', 'texinfo-dummy-native', 'readline-native', 'flex-native', 'attr-native', 'libmpc-native', 'mpfr-native', 'linux-libc-headers', 'gettext-minimal-native', 'libgcc', 'libnsl2-native', 'gdbm-native', 'libffi-native', 'bzip2-native', 'util-linux-native', 'sqlite3-native', 'libtirpc-native', 'm4-native', 'ncurses-native', 'libcap-ng-native', 'libpcre2-native']
DEBUG: Python function extend_recipe_sysroot finished
DEBUG: Executing shell function do_install
cp: cannot stat 'arch/arm/kernel/module.lds': No such file or directory
WARNING: exit code 1 from a shell command.
ERROR: Execution of '/home/pp/yocto-hh/build/tmp/work/hhctrl-poky-linux-gnueabi/kernel-devsrc/1.0-r0/temp/run.do_install.109942' failed with exit code 1:
cp: cannot stat 'arch/arm/kernel/module.lds': No such file or directory
WARNING: exit code 1 from a shell command
The command I am using to produce the SDK is:
bitbake name-of-my-image -c populate_sdk
What could be the problem here? Or how should I debug it? I've found a few threads touching on the subject, and it seems this should already be fixed, but for some reason it still does not work in my environment.
The module.lds file is missing in the latest kernel. Apply the following as a patch to the kernel and rebuild the image.
diff -Naur a/arch/arm/kernel/module.lds b/arch/arm/kernel/module.lds
--- a/arch/arm/kernel/module.lds 1970-01-01 05:30:00.000000000 +0530
+++ b/arch/arm/kernel/module.lds 2020-02-28 21:53:45.000000000 +0530
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+SECTIONS {
+ .plt : { BYTE(0) }
+ .init.plt : { BYTE(0) }
+}
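One way to carry this patch is through a bbappend for the kernel recipe. This is only a sketch: the layer name (meta-mylayer) and the recipe name (linux-raspberrypi, as provided by meta-raspberrypi) are assumptions you should adjust to your setup:

# layer and recipe names are assumptions; adjust to your own layer and kernel recipe
mkdir -p meta-mylayer/recipes-kernel/linux/linux-raspberrypi
cp module.lds.patch meta-mylayer/recipes-kernel/linux/linux-raspberrypi/
cat > meta-mylayer/recipes-kernel/linux/linux-raspberrypi_%.bbappend <<'EOF'
FILESEXTRAPATHS_prepend := "${THISDIR}/${PN}:"
SRC_URI += "file://module.lds.patch"
EOF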

Sqoop Import Error: User null does not belong to Hadoop at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:89)

I am using the HDP 2.6 Sandbox. I have created a user space with user root under the hdfs group, and while executing the following Sqoop Hive import I am encountering the following two errors:
Failed with exception org.apache.hadoop.security.AccessControlException: User null does not belong to Hadoop at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:89)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
However, the data was imported correctly into the Hive table.
Please help me understand the significance of this error and how I can overcome it.
[root@sandbox-hdp ~]# sqoop import \
> --connect jdbc:mysql://sandbox.hortonworks.com:3306/retail_db \
> --username retail_dba \
> --password hadoop \
> --table departments \
> --hive-home /apps/hive/warehouse \
> --hive-import \
> --create-hive-table \
> --hive-table retail_db.departments \
> --target-dir /user/root/hive_import \
> --outdir java_files
Warning: /usr/hdp/2.6.3.0-235/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/01/14 09:42:38 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.3.0-235
18/01/14 09:42:38 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/01/14 09:42:38 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
18/01/14 09:42:38 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
18/01/14 09:42:38 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/01/14 09:42:38 INFO tool.CodeGenTool: Beginning code generation
18/01/14 09:42:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1
18/01/14 09:42:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1
18/01/14 09:42:39 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.6.3.0-235/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/e1ec5b443f92219f1f061ad4b64cc824/departments.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/01/14 09:42:40 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/e1ec5b443f92219f1f061ad4b64cc824/departments.jar
18/01/14 09:42:40 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/01/14 09:42:40 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/01/14 09:42:40 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/01/14 09:42:40 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/01/14 09:42:40 INFO mapreduce.ImportJobBase: Beginning import of departments
18/01/14 09:42:41 INFO client.RMProxy: Connecting to ResourceManager at sandbox-hdp.hortonworks.com/172.17.0.2:8032
18/01/14 09:42:42 INFO client.AHSProxy: Connecting to Application History server at sandbox-hdp.hortonworks.com/172.17.0.2:10200
18/01/14 09:42:46 INFO db.DBInputFormat: Using read commited transaction isolation
18/01/14 09:42:46 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`department_id`), MAX(`department_id`) FROM `departments`
18/01/14 09:42:46 INFO db.IntegerSplitter: Split size: 1; Num splits: 4 from: 2 to: 7
18/01/14 09:42:46 INFO mapreduce.JobSubmitter: number of splits:4
18/01/14 09:42:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1515818851132_0050
18/01/14 09:42:47 INFO impl.YarnClientImpl: Submitted application application_1515818851132_0050
18/01/14 09:42:47 INFO mapreduce.Job: The url to track the job: http://sandbox-hdp.hortonworks.com:8088/proxy/application_1515818851132_0050/
18/01/14 09:42:47 INFO mapreduce.Job: Running job: job_1515818851132_0050
18/01/14 09:42:55 INFO mapreduce.Job: Job job_1515818851132_0050 running in uber mode : false
18/01/14 09:42:55 INFO mapreduce.Job: map 0% reduce 0%
18/01/14 09:43:05 INFO mapreduce.Job: map 25% reduce 0%
18/01/14 09:43:09 INFO mapreduce.Job: map 50% reduce 0%
18/01/14 09:43:12 INFO mapreduce.Job: map 75% reduce 0%
18/01/14 09:43:14 INFO mapreduce.Job: map 100% reduce 0%
18/01/14 09:43:14 INFO mapreduce.Job: Job job_1515818851132_0050 completed successfully
18/01/14 09:43:16 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=682132
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=481
HDFS: Number of bytes written=60
HDFS: Number of read operations=16
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters
Launched map tasks=4
Other local map tasks=4
Total time spent by all maps in occupied slots (ms)=44760
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=44760
Total vcore-milliseconds taken by all map tasks=44760
Total megabyte-milliseconds taken by all map tasks=11190000
Map-Reduce Framework
Map input records=6
Map output records=6
Input split bytes=481
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=1284
CPU time spent (ms)=5360
Physical memory (bytes) snapshot=561950720
Virtual memory (bytes) snapshot=8531210240
Total committed heap usage (bytes)=176685056
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=60
18/01/14 09:43:16 INFO mapreduce.ImportJobBase: Transferred 60 bytes in 34.7351 seconds (1.7274 bytes/sec)
18/01/14 09:43:16 INFO mapreduce.ImportJobBase: Retrieved 6 records.
18/01/14 09:43:16 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
18/01/14 09:43:16 WARN mapreduce.PublishJobData: Unable to publish import data to publisher org.apache.atlas.sqoop.hook.SqoopHook
java.lang.ClassNotFoundException: org.apache.atlas.sqoop.hook.SqoopHook
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.sqoop.mapreduce.PublishJobData.publishJobData(PublishJobData.java:46)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:284)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:127)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:507)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
18/01/14 09:43:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `departments` AS t LIMIT 1
18/01/14 09:43:16 INFO hive.HiveImport: Loading uploaded data into Hive
Logging initialized using configuration in jar:file:/usr/hdp/2.6.3.0-235/hive/lib/hive-common-1.2.1000.2.6.3.0-235.jar!/hive-log4j.properties
OK
Time taken: 10.427 seconds
Loading data to table retail_db.departments
Failed with exception org.apache.hadoop.security.AccessControlException: User null does not belong to Hadoop at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setOwner(FSDirAttrOp.java:89) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setOwner(FSNamesystem.java:1873) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setOwner(NameNodeRpcServer.java:828)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setOwner(ClientNamenodeProtocolServerSideTranslatorPB.java:476)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
The first error
WARN mapreduce.PublishJobData: Unable to publish import data to publisher org.apache.atlas.sqoop.hook.SqoopHook java.lang.ClassNotFoundException: org.apache.atlas.sqoop.hook.SqoopHook
You need to check whether the Sqoop binaries are OK. It is easier to copy them again than to check file by file.
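If you'd rather verify than recopy blindly, a rough check (the jar path is a guess for an HDP sandbox) is to look for the Atlas hook class on Sqoop's lib path:

# jar path is a guess for the HDP sandbox; adjust to your installation
for j in /usr/hdp/current/sqoop-client/lib/*.jar; do
  jar -tf "$j" | grep -q 'org/apache/atlas/sqoop/hook/SqoopHook' && echo "$j"
done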
The second error
Failed with exception org.apache.hadoop.security.AccessControlException: User null does not belong to Hadoop
This is because you are executing Sqoop as the "root" user. Change it to a user that exists in the Hadoop cluster.
Two ideas
ClassNotFoundException: org.apache.atlas.sqoop.hook.SqoopHook
There is a class missing somewhere.
And I see you're trying to run your sqoop command using the root account under Linux. Make sure root belongs to the hdfs group; I'm not sure root is included by default.
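A quick sketch of checking and fixing that (the group name may be hdfs or hadoop depending on your setup):

# see which groups root currently belongs to
id root
# add root to the hdfs group (use hadoop instead if that is what your cluster expects)
usermod -a -G hdfs root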
Sometimes null values are not handled by Sqoop while importing data into Hive from an RDBMS, so you should handle them explicitly using the following options:
--null-string and --null-non-string
The complete command is:
sqoop import --connect jdbc:mysql://sandbox.hortonworks.com:3306/retail_db --username retail_dba --password hadoop --table departments --hive-home /apps/hive/warehouse --null-string 'na' --null-non-string 'na' --hive-import --create-hive-table --hive-table retail_db.departments --target-dir /user/root/hive_import
Its occurrence is due to this setting in /etc/hive/conf/hive-site.xml:
<name>hive.warehouse.subdir.inherit.perms</name>
<value>true</value>
Set the value to false and try to run the same query.
Otherwise, give the --target-dir /user/root/hive_import directory read/write access, or remove the option so the import falls back to the Hive home directory.
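A rough sketch of the second option (the permission bits here are just an example; choose what fits your cluster):

# make sure the target directory exists and is writable before re-running the import
hdfs dfs -mkdir -p /user/root/hive_import
hdfs dfs -chmod -R 775 /user/root/hive_import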

flink on yarn - Could not initialize class org.apache.hadoop.fs.s3a.S3AFileSystem

I'm trying to run a Flink job on yarn:
$ export HADOOP_CONF_DIR=/etc/hadoop/conf
$ ./${FLINK_HOME}/bin/yarn-session.sh ...
$ ./${FLINK_HOME}/bin/flink run \
/home/clsadmin/messagehub-to-s3-1.0-SNAPSHOT.jar \
--kafka-brokers ${KAFKA_BROKERS} \
...
--output-folder s3://${S3_BUCKET}/${S3_FOLDER}
The startup output shows the hadoop classpath getting added to flink:
Using the result of 'hadoop classpath' to augment the Hadoop classpath: /usr/hdp/2.6.2.0-205/hadoop/conf:/usr/hdp/2.6.2.0-205/hadoop/lib/*:/usr/hdp/2.6.2.0-205/hadoop/.//*:/usr/hdp/2.6.2.0-205/hadoop-hdfs/./:/usr/hdp/2.6.2.0-205/hadoop-hdfs/lib/*:/usr/hdp/2.6.2.0-205/hadoop-hdfs/.//*:/usr/hdp/2.6.2.0-205/hadoop-yarn/lib/*:/usr/hdp/2.6.2.0-205/hadoop-yarn/.//*:/usr/hdp/2.6.2.0-205/hadoop-mapreduce/lib/*:/usr/hdp/2.6.2.0-205/hadoop-mapreduce/.//*::mysql-connector-java.jar:/home/common/lib/dataconnectorStocator/*:/usr/hdp/2.6.2.0-205/tez/*:/usr/hdp/2.6.2.0-205/tez/lib/*:/usr/hdp/2.6.2.0-205/tez/conf
However, the job fails to start:
01/29/2018 16:11:04 Job execution switched to status RUNNING.
01/29/2018 16:11:04 Source: Custom Source -> Map -> Sink: Unnamed(1/1) switched to SCHEDULED
01/29/2018 16:11:04 Source: Custom Source -> Map -> Sink: Unnamed(1/1) switched to DEPLOYING
01/29/2018 16:11:05 Source: Custom Source -> Map -> Sink: Unnamed(1/1) switched to RUNNING
01/29/2018 16:11:05 Source: Custom Source -> Map -> Sink: Unnamed(1/1) switched to FAILED
java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.fs.s3a.S3AFileSystem
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2134)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2099)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.createHadoopFileSystem(BucketingSink.java:1196)
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initFileSystem(BucketingSink.java:411)
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initializeState(BucketingSink.java:355)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:259)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:694)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:682)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
at java.lang.Thread.run(Thread.java:745)
However, if I look at the classpath reported by Flink:
$ grep org.apache.hadoop.fs.s3a.S3AFileSystem /usr/hdp/2.6.2.0-205/hadoop/.//*
I can see the hadoop-aws.jar is on the classpath:
...
Binary file /usr/hdp/2.6.2.0-205/hadoop/.//hadoop-aws-2.7.3.2.6.2.0-205.jar matches
Binary file /usr/hdp/2.6.2.0-205/hadoop/.//hadoop-aws.jar matches
...
Investigating further I can see the class exists in the jar:
jar -tf /usr/hdp/2.6.2.0-205/hadoop/hadoop-aws.jar | grep org.apache.hadoop.fs.s3a.S3AFileSystem
returns
org/apache/hadoop/fs/s3a/S3AFileSystem$1.class
org/apache/hadoop/fs/s3a/S3AFileSystem$2.class
org/apache/hadoop/fs/s3a/S3AFileSystem$3.class
org/apache/hadoop/fs/s3a/S3AFileSystem$WriteOperationHelper.class
org/apache/hadoop/fs/s3a/S3AFileSystem$4.class
org/apache/hadoop/fs/s3a/S3AFileSystem.class
If I submit an example app:
./flink-1.4.0/bin/flink run \
./flink-1.4.0/examples/batch/WordCount.jar \
--input "s3://${S3_BUCKET}/LICENSE-2.0.txt" \
--output "s3://${S3_BUCKET}/license-word-count.txt"
This runs without a problem.
My jar file doesn't contain any hadoop classes:
$ jar -tf /home/clsadmin/messagehub-to-s3-1.0-SNAPSHOT.jar | grep hadoop
<<not found>>
Does anyone have any idea why the Flink example runs OK, but Flink has an issue loading the S3AFileSystem with my code?
Update:
I fixed the problem by:
rm -f flink-1.4.0/lib/flink-shaded-hadoop2-uber-1.4.0.jar
Is this a safe solution?
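For context, before deleting the jar you can check whether it bundles its own copies of the Hadoop filesystem classes, and can therefore collide with the cluster-provided hadoop-aws jar when the class is initialized. This is only a diagnostic sketch:

# count Hadoop filesystem classes bundled inside Flink's uber jar (run before removing it)
jar -tf flink-1.4.0/lib/flink-shaded-hadoop2-uber-1.4.0.jar | grep -c '^org/apache/hadoop/fs/'
# compare with the cluster jar that actually provides S3AFileSystem
jar -tf /usr/hdp/2.6.2.0-205/hadoop/hadoop-aws.jar | grep -c '^org/apache/hadoop/fs/s3a/'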

Unable to install httpd service on chef client(node)

I am new to Chef and following the "Learning Chef" book from O'Reilly to learn the Chef basics.
In chapter 07 it describes installing the httpd service on a Chef client (node) from the Chef host using a cookbook.
This is what my .kitchen.yaml file looks like:
---
driver:
  name: vagrant
provisioner:
  name: chef_zero
platforms:
  - name: centos_apache
    driver:
      box: learningchef/centos65
      boxurl: learningchef/centos65
suites:
  - name: default
    run_list:
      - recipe[my_apache::default]
    attributes:
The recipe for installing the httpd service looks like:
#
# Cookbook Name:: my_apache
# Recipe:: default
#
# Copyright (c) 2015 The Authors, All Rights Reserved.
#
yum_package 'httpd' do
  source "/home/vipul/Downloads/httpd-2.2.15-39.el6.centos.x86_64.rpm"
  action :install
end
And this is the log I get after executing the command "kitchen converge":
-----> Starting Kitchen (v1.4.0)
-----> Converging <default-centos-apache>...
Preparing files for transfer
Preparing dna.json
Preparing current project directory as a cookbook
Removing non-cookbook files before transfer
Preparing validation.pem
Preparing client.rb
-----> Chef Omnibus installation detected (install only if missing)
Transferring files to <default-centos-apache>
Starting Chef Client, version 12.4.0
[2015-07-08T12:56:06+00:00] WARN: Child with name 'dna.json' found in multiple directories: /tmp/kitchen/dna.json and /tmp/kitchen/dna.json
resolving cookbooks for run list: ["my_apache::default"]
Synchronizing Cookbooks:
- my_apache
Compiling Cookbooks...
Converging 1 resources
Recipe: my_apache::default
================================================================================
Error executing action `install` on resource 'yum_package[httpd]'
================================================================================
Mixlib::ShellOut::ShellCommandFailed
------------------------------------
Expected process to exit with [0], but received '1'
---- Begin output of /usr/bin/python /opt/chef/embedded/apps/chef/lib/chef/provider/package/yum-dump.py --options --installed-provides --yum-lock-timeout 30 ----
STDOUT: [option installonlypkgs] kernel kernel-bigmem installonlypkg(kernel-module) installonlypkg(vm) kernel-enterprise kernel-smp kernel-debug kernel-unsupported kernel-source kernel-devel kernel-PAE kernel-PAE-debug
Could not retrieve mirrorlist http://mirrorlist.centos.org/?release=6&arch=x86_64&repo=os error was
14: PYCURL ERROR 7 - "Failed to connect to 2a02:2498:1:3d:5054:ff:fed3:e91a: Network is unreachable"
STDERR: yum-dump Repository Error: Cannot find a valid baseurl for repo: base
---- End output of /usr/bin/python /opt/chef/embedded/apps/chef/lib/chef/provider/package/yum-dump.py --options --installed-provides --yum-lock-timeout 30 ----
Ran /usr/bin/python /opt/chef/embedded/apps/chef/lib/chef/provider/package/yum-dump.py --options --installed-provides --yum-lock-timeout 30 returned 1
Resource Declaration:
---------------------
# In /tmp/kitchen/cache/cookbooks/my_apache/recipes/default.rb
8: yum_package 'httpd' do
9: source "/home/vipul/Downloads/httpd-2.2.15-39.el6.centos.x86_64.rpm"
10: action :install
11: end
Compiled Resource:
------------------
# Declared in /tmp/kitchen/cache/cookbooks/my_apache/recipes/default.rb:8:in `from_file'
yum_package("httpd") do
action :install
retries 0
retry_delay 2
default_guard_interpreter :default
package_name "httpd"
source "/home/vipul/Downloads/httpd-2.2.15-39.el6.centos.x86_64.rpm"
flush_cache {:before=>false, :after=>false}
declared_type :yum_package
cookbook_name "my_apache"
recipe_name "default"
end
Running handlers:
[2015-07-08T12:56:14+00:00] ERROR: Running exception handlers
Running handlers complete
[2015-07-08T12:56:14+00:00] ERROR: Exception handlers complete
Chef Client failed. 0 resources updated in 11.596431753 seconds
[2015-07-08T12:56:14+00:00] FATAL: Stacktrace dumped to /tmp/kitchen/cache/chef-stacktrace.out
[2015-07-08T12:56:14+00:00] ERROR: yum_package[httpd] (my_apache::default line 8) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of /usr/bin/python /opt/chef/embedded/apps/chef/lib/chef/provider/package/yum-dump.py --options --installed-provides --yum-lock-timeout 30 ----
STDOUT: [option installonlypkgs] kernel kernel-bigmem installonlypkg(kernel-module) installonlypkg(vm) kernel-enterprise kernel-smp kernel-debug kernel-unsupported kernel-source kernel-devel kernel-PAE kernel-PAE-debug
Could not retrieve mirrorlist http://mirrorlist.centos.org/?release=6&arch=x86_64&repo=os error was
14: PYCURL ERROR 7 - "Failed to connect to 2a02:2498:1:3d:5054:ff:fed3:e91a: Network is unreachable"
STDERR: yum-dump Repository Error: Cannot find a valid baseurl for repo: base
---- End output of /usr/bin/python /opt/chef/embedded/apps/chef/lib/chef/provider/package/yum-dump.py --options --installed-provides --yum-lock-timeout 30 ----
Ran /usr/bin/python /opt/chef/embedded/apps/chef/lib/chef/provider/package/yum-dump.py --options --installed-provides --yum-lock-timeout 30 returned 1
[2015-07-08T12:56:14+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
>>>>>> Converge failed on instance <default-centos-apache>.
>>>>>> Please see .kitchen/logs/default-centos-apache.log for more details
>>>>>> ------Exception-------
>>>>>> Class: Kitchen::ActionFailed
>>>>>> Message: SSH exited (1) for command: [sh -c '
sudo -E /opt/chef/bin/chef-client --local-mode --config /tmp/kitchen/client.rb --log_level auto --force-formatter --no-color --json-attributes /tmp/kitchen/dna.json --chef-zero-port 8889
']
>>>>>> ----------------------
I want to install the httpd service using the local RPM package. The Chef client is already installed on the virtual machine.
I have tried various steps, but I always get the same output.
Update: So I ran yum update on both my host and the client.
After that the output log changed. It now says that it couldn't find the package at the defined source, even though it is present there. Please suggest:
-----> Starting Kitchen (v1.4.0)
-----> Converging <default-centos-apache>...
Preparing files for transfer
Preparing dna.json
Preparing current project directory as a cookbook
Removing non-cookbook files before transfer
Preparing validation.pem
Preparing client.rb
-----> Chef Omnibus installation detected (install only if missing)
Transferring files to <default-centos-apache>
Starting Chef Client, version 12.4.0
[2015-07-09T14:16:57+00:00] WARN: Child with name 'dna.json' found in multiple directories: /tmp/kitchen/dna.json and /tmp/kitchen/dna.json
resolving cookbooks for run list: ["my_apache::default"]
Synchronizing Cookbooks:
- my_apache
Compiling Cookbooks...
Converging 1 resources
Recipe: my_apache::default
================================================================================
Error executing action `install` on resource 'yum_package[httpd]'
================================================================================
Chef::Exceptions::Package
-------------------------
Package httpd not found: /home/vipul/Downloads/httpd-2.2.15-39.el6.centos.x86_64.rpm
Resource Declaration:
---------------------
# In /tmp/kitchen/cache/cookbooks/my_apache/recipes/default.rb
8: package "httpd" do
9: source "/home/vipul/Downloads/httpd-2.2.15-39.el6.centos.x86_64.rpm"
10: action :install
11: end
Compiled Resource:
------------------
# Declared in /tmp/kitchen/cache/cookbooks/my_apache/recipes/default.rb:8:in `from_file'
yum_package("httpd") do
action :install
retries 0
retry_delay 2
default_guard_interpreter :default
package_name "httpd"
source "/home/vipul/Downloads/httpd-2.2.15-39.el6.centos.x86_64.rpm"
flush_cache {:before=>false, :after=>false}
declared_type :package
cookbook_name "my_apache"
recipe_name "default"
end
Running handlers:
[2015-07-09T14:17:01+00:00] ERROR: Running exception handlers
Running handlers complete
[2015-07-09T14:17:01+00:00] ERROR: Exception handlers complete
Chef Client failed. 0 resources updated in 6.822340816 seconds
[2015-07-09T14:17:01+00:00] FATAL: Stacktrace dumped to /tmp/kitchen/cache/chef-stacktrace.out
[2015-07-09T14:17:01+00:00] ERROR: yum_package[httpd] (my_apache::default line 8) had an error: Chef::Exceptions::Package: Package httpd not found: /home/vipul/Downloads/httpd-2.2.15-39.el6.centos.x86_64.rpm
[2015-07-09T14:17:01+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
>>>>>> Converge failed on instance <default-centos-apache>.
>>>>>> Please see .kitchen/logs/default-centos-apache.log for more details
>>>>>> ------Exception-------
>>>>>> Class: Kitchen::ActionFailed
>>>>>> Message: SSH exited (1) for command: [sh -c '
sudo -E /opt/chef/bin/chef-client --local-mode --config /tmp/kitchen/client.rb --log_level auto --force-formatter --no-color --json-attributes /tmp/kitchen/dna.json --chef-zero-port 8889
']
>>>>>> ----------------------
You might want to try following the steps outlined here if you can't work out your proxy/network issues: http://xmodulo.com/how-to-fix-yum-errors-on-centos-rhel-or-fedora.html
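If the guest really cannot reach the mirrors because of a proxy, a hypothetical quick test (the proxy URL is a placeholder) is to point yum at the proxy inside the VM and retry:

# proxy URL is a placeholder; run inside the Test Kitchen guest
echo 'proxy=http://proxy.example.com:3128' | sudo tee -a /etc/yum.conf
sudo yum clean all && sudo yum makecache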
The answer to the problem is in the comments on the question, by Mark, so I am just pasting it here:
Terminal proxies are not enough. Kitchen is running the Chef client within a virtual machine. See: docs.chef.io/config_yml_kitchen.html#work-with-proxies
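Per that doc page, the proxy settings can be declared in .kitchen.yaml itself so they reach the chef-client run inside the VM. A hedged sketch to merge into the existing provisioner block (the proxy URL is a placeholder):

provisioner:
  name: chef_zero
  http_proxy: http://proxy.example.com:3128
  https_proxy: http://proxy.example.com:3128
  no_proxy: localhost,127.0.0.1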