Sqoop hive import from mysql to hive is failing - hive

I am trying to load a table from MySQL into Hive using --hive-import in Parquet format, and we want to do incremental updates of the Hive table. When we run the command below, it fails with the error shown after it. Can anybody please help?
sqoop job --create users_test_hive -- import --connect 'jdbc:mysql://dbhost/dbname?characterEncoding=utf8&dontTrackOpenResources=true&defaultFetchSize=1000&useCursorFetch=true&useUnicode=yes&characterEncoding=utf8' --table users --incremental lastmodified --check-column n_last_updated --username username --password password --merge-key user_id --mysql-delimiters --as-parquetfile --hive-import --warehouse-dir /usr/hive/warehouse/ --hive-table users_test_hive
Error while running it:
16/02/27 21:33:17 INFO mapreduce.Job: Task Id : attempt_1454936520418_0239_m_000000_1, Status : FAILED
Error: parquet.column.ParquetProperties.newColumnWriteStore(Lparquet/schema/MessageType;Lparquet/column/page/PageWriteStore;I)Lparquet/column/ColumnWriteStore;
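For reference, a job saved with sqoop job --create is run in a separate step; the failure above would presumably come from executing it, along the lines of the sketch below (job name taken from the command above; sqoop job --show just prints the saved definition):
sqoop job --show users_test_hive
sqoop job --exec users_test_hive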

Related

sqoop export for hive views

I am trying to sqoop a Hive view to a SQL Server database, however I'm getting an "object not found" error. Does sqoop export work for Hive views?
sqoop export --connect 'jdbc:jtds:sqlserver:<Connection String>' --table 'tax_vw' --hcatalog-database default --hcatalog-table tax_vw --connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.sourceforge.jtds.jdbc.Driver --username XXX --password YYY --update-mode allowinsert
INFO hive.metastore: Connected to metastore.
ERROR tool.ExportTool: Encountered IOException running export job: java.io.IOException: NoSuchObjectException(message:default.tax_vw table not found)
Need help on this.
Unfortunately, this is not possible with sqoop export. Even when --hcatalog-table is specified it works only with tables; outside HCatalog mode it supports exporting only from directories, and sqoop-export does not support queries either.
You can load your view's data into a table:
create table tax_table as select * from default.tax_vw;
and then use --hcatalog-table tax_table.
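Putting it together, the export then points at the materialized table instead of the view; a sketch reusing the flags from the question (connection string, credentials and target table are the placeholders from the question):
sqoop export --connect 'jdbc:jtds:sqlserver:<Connection String>' --table 'tax_vw' --hcatalog-database default --hcatalog-table tax_table --connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.sourceforge.jtds.jdbc.Driver --username XXX --password YYY --update-mode allowinsert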

sqoop import of all tables from mysql to Hive

I am trying to import all tables from a MySQL schema into Hive using the sqoop query below:
sqoop import-all-tables --connect jdbc:mysql://ip-172-31-20-247:3306/retail_db --username sqoopuser -P --hive-import --create-hive-table -m 3
It is saying:
18/09/01 09:24:52 ERROR tool.ImportAllTablesTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://ip-172-31-35-141.ec2.internal:8020/user/kumarrupesh2389619/categories already exists
Your command is failing because the output directory already exists. Run the command below to remove it:
hdfs dfs -rmr /user/kumarrupesh2389619/categories
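Note that import-all-tables writes one output directory per table under your HDFS home directory, so if other tables were partially imported in an earlier run you may have to clean those up too. A sketch, using the path from the error (hdfs dfs -rm -r is the current spelling of the deprecated -rmr):
hdfs dfs -ls /user/kumarrupesh2389619
hdfs dfs -rm -r /user/kumarrupesh2389619/categories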

sqoop import to hive failing with java.net.UnknownHostException

I have a sqoop import script to load data from Oracle to Hive.
Query
sqoop-import -D mapred.child.java.opts="-Djava.security.egd=file:/dev/../dev/abc" -D mapreduce.job.queuename="queue_name" \
--connect ${jdbc_url} \
--username ${username} \
--password ${password} \
--query "$query" \
--target-dir ${target_hdfs_dir} \
--delete-target-dir \
--fields-terminated-by ${hiveFieldsDelimiter} \
--hive-import ${o_write} \
--null-string '' \
--hive-table ${hiveTableName} \
$split_opt \
$numMapper
Logs
18/02/01 12:38:45 DEBUG hive.TableDefWriter: Create statement: CREATE TABLE IF NOT EXISTS db_test.test ( USR_ID STRING, ENT_TYPE STRING, VAL STRING) COMMENT 'Imported by sqoop on 2018/02/01 12:38:45' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054' LINES TERMINATED BY '\012' STORED AS TEXTFILE
18/02/01 12:38:45 DEBUG hive.TableDefWriter: Load statement: LOAD DATA INPATH 'hdfs://nn-sit/user/test_tmp' OVERWRITE INTO TABLE db_test.test
18/02/01 12:38:45 INFO hive.HiveImport: Loading uploaded data into Hive
18/02/01 12:38:45 DEBUG hive.HiveImport: Using in-process Hive instance.
18/02/01 12:38:45 DEBUG util.SubprocessSecurityManager: Installing subprocess security manager
Logging initialized using configuration in jar:file:/path/demoapp/lib/demoapp-0.0.1-SNAPSHOT.jar!/hive-log4j.properties
OK
Time taken: 2.189 seconds
Loading data to table db_test.test
Failed with exception java.net.UnknownHostException: nn-dev
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
Observations:
We recently migrated to a new cluster. The new namenode is nn-sit; earlier it was nn-dev. Can somebody enlighten me as to where Sqoop is picking up the old namenode name nn-dev, as shown in the error message:
Failed with exception java.net.UnknownHostException: nn-dev
The import from Oracle into the target HDFS path hdfs://nn-sit/user/test_tmp succeeds; it is the load into the Hive table that fails.
Run on its own from Beeline, the command below succeeds:
LOAD DATA INPATH 'hdfs://nn-sit/user/test_tmp' OVERWRITE INTO TABLE db_test.test
Any help would be greatly appreciated.
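One thing worth checking, purely as a guess: if db_test.test was created before the migration, the table's LOCATION stored in the Hive metastore may still point at hdfs://nn-dev, and the in-process Hive used by Sqoop would then try to resolve that host during the load. The statements below are a sketch for inspecting and, if needed, correcting it from Beeline; the warehouse path in the ALTER TABLE is a made-up example, not a value from the question:
DESCRIBE FORMATTED db_test.test;
ALTER TABLE db_test.test SET LOCATION 'hdfs://nn-sit/apps/hive/warehouse/db_test.db/test';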

Sqoop import from oracle to hdfs: No more data to read from socket

I'm trying to import data from Oracle to HDFS using Sqoop. Oracle version: 10.2.0.2
The table has no constraints. When I specify the number of mappers (-m) and the --split-by parameter, it fails with the error: No more data to read from socket. If I set -m 1 (a single mapper), it runs, but takes too much time.
Sqoop command:
sqoop import --connect jdbc:oracle:thin:@host:port:SID --username uname --password pwd --table abc.market_price --target-dir /ert/etldev/etl/market_price -m 4 --split-by MNTH_YR
Please help me.
Instead of specifying the number of mappers, why don't you try using --direct? What does it show then?
sqoop import --connect jdbc:oracle:thin:@host:port:SID --username uname --password pwd --table abc.market_price --target-dir /ert/etldev/etl/market_price --direct
or
sqoop import --connect jdbc:oracle:thin:@host:port:SID --username uname --password pwd --table abc.market_price --target-dir /ert/etldev/etl/market_price --split-by MNTH_YR --direct
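If --direct works, you could also try adding the parallelism back on top of it; a rough sketch combining the mapper count and split column from the question (whether --direct is honoured depends on the Oracle connector available in your Sqoop build):
sqoop import --connect jdbc:oracle:thin:@host:port:SID --username uname --password pwd --table abc.market_price --target-dir /ert/etldev/etl/market_price --direct --split-by MNTH_YR -m 4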

Sqoop command-How to give schema name for Import All Tables

I am importing all tables from an RDBMS into Hive using the sqoop command (v1.4.6).
Below is the command:
sqoop-import-all-tables --verbose --connect jdbcconnection --username user --password pass --hive-import -m 1
This command works fine and loads all the tables into the default schema. Is there a way to load the tables into a particular schema?
Regards
Prakash
Use --hive-database <db name> in your import query.
Modified command:
sqoop-import-all-tables --verbose --connect jdbcconnection --username user --password pass --hive-import --hive-database new_db -m 1
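Once the import finishes, a quick way to confirm the tables landed in the intended database is to list them from Hive; a minimal check, assuming the new_db name from the modified command:
hive -e "SHOW TABLES IN new_db;"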