I have a table orders_1 with 300 records and I want to load data into hive table.
When I ran the sqoop hive-import command, the table data was loaded into the /hive/warehouse/ location, but the user ownership changed and I am now facing an issue when I query the Hive table to view the loaded data: it displays all the columns with NULL values.
sqoop-import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --table orders_1 --hive-import --hive-database my_sqoop_db --hive-table hive_orders1 --hive-overwrite --fields-terminated-by '\t' --lines-terminated-by '\n' -m 1;
Output:
INFO hive.HiveImport: Loading uploaded data into Hive
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-0.13.1-cdh5.3.0.jar!/hive-log4j.properties
OK
Time taken: 1.039 seconds
Loading data to table my_sqoop_db.hive_orders1
chgrp: changing ownership of 'hdfs://quickstart.cloudera:8020/user/hive/warehouse/my_sqoop_db.db/hive_orders1/part-m-00000': User does not belong to hive
Table my_sqoop_db.hive_orders1 stats: [numFiles=1, numRows=0, totalSize=12466, rawDataSize=0]
OK
FYI, with the same user root I was able to do a sqoop hive-import the first time, view the data in Hive, and perform Hive operations.
I could not figure out what went wrong. Can anybody please help me resolve this issue?
HIVE CREATE TABLE not working in CLOUDERA 5.8
CREATE EXTERNAL TABLE IF NOT EXISTS retail_db.products (product_id INT,product_category_id INT ,product_name STRING, product_description STRING ,product_price DECIMAL, product_image STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/user/cloudera/retail_db/products'
It takes a lot of time and never completes.
The Sqoop Hive import is also not completing.
sqoop import --connect jdbc:mysql://localhost/retail_db --username root --password cloudera --target-dir /user/cloudera/retail_db/products_test --table products --hive-import --create-hive-table --hive-table retail_db.products
The script stops at:
Log:
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=173993
17/07/03 06:31:48 INFO mapreduce.ImportJobBase: Transferred 169.915 KB in 2,340.3572 seconds (74.3446 bytes/sec)
17/07/03 06:31:48 INFO mapreduce.ImportJobBase: Retrieved 1345 records.
17/07/03 06:31:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `products` AS t LIMIT 1
17/07/03 06:31:49 INFO hive.HiveImport: Loading uploaded data into Hive
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-1.1.0-cdh5.8.0.jar!/hive-log4j.properties
I tried restarting the Hive services.
But CREATE TABLE tablename works in Impala.
We are facing the following issues; details are below. Please share your inputs.
1) Issue with the --validate option in Sqoop
If we run the sqoop command directly, without creating a job for it, --validate works. But if we first create a job with the --validate option, validation does not seem to run.
Works with:
sqoop import --connect "DB connection" --username $USER --password-file $File_Path --warehouse-dir $TGT_DIR --as-textfile --fields-terminated-by '|' --lines-terminated-by '\n' --table emp_table -m 1 --outdir $HOME/javafiles --validate
Does not work with:
sqoop job --create Job_import_emp -- import --connect "DB connection" --username $USER --password-file $File_Path --warehouse-dir $TGT_DIR --as-textfile --fields-terminated-by '|' --lines-terminated-by '\n' --table emp_table -m 1 --outdir $HOME/javafiles --validate
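For completeness, the saved job above is then executed with sqoop job --exec; a minimal sketch, assuming the job name created above:
sqoop job --exec Job_import_emp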
2) Issue with Hive import
If we are importing data into Hive for the first time, we have to create the Hive table (a Hive internal table), so we keep the "--create-hive-table" option in the sqoop command.
Even though we keep the "--create-hive-table" option, is there any way to skip the create-table step while importing if the table already exists?
Thanks
Sheik
Sqoop allows the --validate option only for the sqoop import and sqoop export commands.
From the official Sqoop User Guide, validation has these limitations; it does not cover the following:
all-tables option
free-form query option
Data imported into Hive or HBase table
import with --where argument
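For illustration, a plain (non-job) export with validation might look like the sketch below; the connection string, credentials, table name, and export directory are all placeholders:
sqoop export --connect "DB connection" --username $USER --password-file $File_Path --table emp_table --export-dir $TGT_DIR/emp_table -m 1 --validate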
No, the table check cannot be skipped if the --create-hive-table option is set; the job will fail if the target table already exists.
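As a rough sketch of the alternative, if the Hive table already exists you can drop --create-hive-table and let --hive-import load into the existing table; the connection string, credentials, database, and table names below are placeholders:
sqoop import --connect "DB connection" --username $USER --password-file $File_Path --table emp_table --hive-import --hive-database my_db --hive-table emp_table --fields-terminated-by '|' -m 1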
I have a parquet table formatted as follows:
.impala_insert_staging
yearmonth=2013-04
yearmonth=2013-05
yearmonth=2013-06
...
yearmonth=2016-04
Underneath each of these directories are my parquet files. I need to get them into another table which just has a
.impala_insert_staging
file.
Please help.
The best approach I found was to pull the files down locally and Sqoop them back up into a text table.
To pull the parquet table down I performed the following:
impala-shell -i <ip-addr> -B -q "use default; select * from <table>" -o filename '--output_delimiter=\x1A'
Unfortunately this adds the yearmonth value as another column in my table. So I either go into my 750GB file and sed/awk out that last column, or use mysqlimport (since I'm using MySQL as well) to import only the columns I'm interested in.
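For the sed/awk route, a minimal sketch assuming GNU awk, the \x1A delimiter used above, and that the appended yearmonth value is the last field (filename is the impala-shell output file, filename_trimmed is just a placeholder for the result):
awk 'BEGIN { FS = OFS = "\x1a" } { NF--; print }' filename > filename_trimmed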
Finally I'll sqoop up the data to a new text table.
sqoop import --connect jdbc:mysql://<mysqlip> --table <mysql_table> -uroot -p<pass> --hive-import --hive-table <new_db_text>
I am trying to load a Hive table using the Sqoop import command. But when I run it, it says that Sqoop doesn't support the SEQUENCE file format while loading into Hive.
Is this correct? I thought Sqoop had matured to handle all the formats present in Hive. Can anyone guide me on this, and on the standard procedure to load Hive tables that use the SEQUENCE file format with Sqoop?
Currently, importing sequence files directly into Hive is not supported yet. But you can import the data --as-sequencefile into HDFS and then create an external table on top of that. As you are saying you are getting exceptions even with this approach, please paste your sample code & logs so that I can help you.
Please find the code below:
sqoop import --connect jdbc:mysql://xxxxx/Emp_Details --username xxxx --password xxxx --table EMP --as-sequencefile --hive-import --target-dir /user/cloudera/emp_2 --hive-overwrite
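For reference, the two-step approach suggested above (import as sequence files into HDFS, then create an external table on top) might look roughly like the sketch below; the first command runs from the shell, the second in the Hive CLI, and the column list and /user/cloudera/emp_seq target directory are assumptions:
sqoop import --connect jdbc:mysql://xxxxx/Emp_Details --username xxxx --password xxxx --table EMP --as-sequencefile --target-dir /user/cloudera/emp_seq -m 1
CREATE EXTERNAL TABLE emp_seq (emp_id INT, emp_name STRING, emp_salary DECIMAL) STORED AS SEQUENCEFILE LOCATION '/user/cloudera/emp_seq';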
Please tell me what is the difference between the 2 commands below
sqoop import --connect jdbc:mysql://localhost:3306/db1
--username root --password password
--table tableName --hive-table tableName --create-hive-table --hive-import;
sqoop create-hive-table --connect jdbc:mysql://localhost:3306/db1
--username root --password password;
What is the difference between using --create-hive-table in the first command and the create-hive-table tool in the second?
Consider the two commands:
1) When --create-hive-table is used, the contents of the RDBMS table are copied to the location given by --target-dir (an HDFS location). Sqoop then checks whether the table sqoophive.emp exists in Hive or not.
If the table does not exist in Hive, the data is moved from the HDFS location into the Hive table and everything goes well.
If the table (sqoophive.emp) already exists in Hive, an error is thrown: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. AlreadyExistsException(message:Table emp already exists)
Example:
sqoop import \
--connect jdbc:mysql://jclient.ambari.org/sqoop \
--username hdfs -P \
--table employee \
--target-dir /user/hive/sqoop/employee \
--delete-target-dir \
--hive-import \
--hive-table sqoophive.emp \
--create-hive-table \
--fields-terminated-by ',' \
--num-mappers 3
2) When create-hive-table is used without hive-import:
The schema of sqoop.employee (in the RDBMS) is fetched and, using that, a table is created under the default database in Hive (default.employee). But no data is transferred.
Example (a modified form of the one given in Hadoop: The Definitive Guide by Tom White):
sqoop create-hive-table \
--connect jdbc:mysql://jclient.ambari.org/sqoop \
--username hdfs -P \
--table employee \
--fields-terminated-by ','
Now the question is when to use which. The former is used when the data is present only in the RDBMS and we need to not only create but also populate the table in Hive in one go.
The latter is used when the table has to be created in Hive but not populated, or when the data already exists in HDFS and is to be used to populate the Hive table.
sqoop-import --connect jdbc:mysql://localhost:3306/db1 \
--username root --password password \
--table tableName --hive-table tableName --create-hive-table --hive-import
The above command will import data from the DB into Hive with Hive's default settings, and if the table is not already present, it will create a table in Hive with the same schema as in the DB.
sqoop create-hive-table --connect jdbc:mysql://localhost:3306/db1 \
--username root --password password
The create-hive-table tool will create a table in the Hive metastore, with a definition based on a database table previously imported into HDFS, or one planned to be imported (it will pick it up from the Sqoop job). This effectively performs the --hive-import step of sqoop-import without running the preceding import.
For example, suppose you have imported table1 from db1 into HDFS using Sqoop. If you then execute create-hive-table, it will create a table in the Hive metastore with the schema of table1 from db1. That is useful when you want to load data into the table later, whenever needed.
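A rough sketch of that sequence, reusing the connection details from the commands above; table1 and /user/cloudera/table1 are placeholder names, and the matching --fields-terminated-by on both steps is an assumption to keep the imported files and the table definition consistent:
sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table table1 --target-dir /user/cloudera/table1 --fields-terminated-by ',' -m 1
sqoop create-hive-table --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table table1 --hive-table table1 --fields-terminated-by ','
hive -e "LOAD DATA INPATH '/user/cloudera/table1' INTO TABLE table1"
The first command imports the data into HDFS only, the second creates the Hive table definition from the same source schema, and the last line shows one way the already-imported data could later be loaded into that table from Hive.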