I am getting null rows in Hive after a Sqoop import from Oracle into Hive, even though the Sqoop --query includes a WHERE pk IS NOT NULL condition.
Sqoop command:
sqoop import \
--connect "${SQOOP_CONN_STR}" \
--connection-manager "${SQOOP_CONNECTION_MANAGER}" \
--username ${SQOOP_USER} \
--password ${SQOOP_PASSWORD} \
--fields-terminated-by ${SQOOP_DELIM} \
--null-string '' \
--null-non-string '' \
--query \""${SQOOP_QUERY}"\" \
--target-dir "${SQOOP_OP_DIR}" \
--split-by ${SQOOP_SPLIT_BY} \
-m ${SQOOP_NUM_OF_MAPPERS} 1> ${SQOOP_TEMP_LOG}
It is due to a field-delimiter mismatch.
When Sqoop imports into HDFS without an explicit field delimiter, it uses a comma by default, while the Hive table you created might use Ctrl-A (the Hive default) as its field delimiter.
Make the delimiter Sqoop writes with (${SQOOP_DELIM} in your script) and the delimiter the Hive table expects the same, and it should work.
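For example, if ${SQOOP_DELIM} is a comma, the Hive table has to be declared with the same delimiter. A minimal sketch, with a hypothetical table name and columns:
CREATE TABLE my_imported_table (
  pk STRING,
  some_col STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Alternatively, keep the Hive table at its defaults and pass --fields-terminated-by '\001' (Ctrl-A) to the Sqoop import so both sides agree.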
I want to import a Netezza table in a specific non-default schema with the following command:
sqoop import \
--connect jdbc:netezza://netezza-host-name:5480/NZDATABASE \
--table MY_SCHEMA.MY_TABLE \
--username user \
-P \
--hive-import \
--hive-database demo \
--create-hive-table \
--hive-table MY_NEW_TABLE
This however fails because it checks only the default schema "ADMIN":
org.netezza.error.NzSQLException: ERROR: relation does not exist NZDATABASE.ADMIN.MY_SCHEMA.MY_TABLE
Versions:
Sqoop 1.4.7
nzjdbc.jar driver, release 7.2.1.8
I don't have an environment to test this, but here are two more options you can try:
Specify the schema name in the JDBC connection string:
jdbc:netezza://netezza-host-name:5480/NZDATABASE?currentSchema=MY_SCHEMA
or
jdbc:netezza://netezza-host-name:5480/NZDATABASE?searchpath=MY_SCHEMA
or
jdbc:netezza://netezza-host-name:5480/NZDATABASE;schema=MY_SCHEMA
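Whichever of these parameters your nzjdbc driver release accepts (the name varies between versions, so treat this as a sketch to verify), the import would then look like:
sqoop import \
--connect "jdbc:netezza://netezza-host-name:5480/NZDATABASE?currentSchema=MY_SCHEMA" \
--table MY_TABLE \
--username user \
-P \
--hive-import \
--hive-database demo \
--create-hive-table \
--hive-table MY_NEW_TABLE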
Pass the --schema argument to the underlying connector:
`-- --schema MY_SCHEMA` - two sets of '--' are needed there, and everything after the standalone '--' is handed to the connector, so it must come after all the other Sqoop arguments.
So the full command will be the following:
sqoop import \
--connect jdbc:netezza://netezza-host-name:5480/NZDATABASE \
--table MY_TABLE \
--username user \
-P \
--hive-import \
--hive-database demo \
--create-hive-table \
--hive-table MY_NEW_TABLE \
-- --schema MY_SCHEMA
sqoop import-all-tables into Hive with the default database works fine, but import-all-tables into a specified Hive database is not working.
As --hive-database is deprecated, how do I specify the database name?
sqoop import-all-tables \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username root \
--password XXX \
--hive-import \
--create-hive-table
The above command creates the tables in /user/hive/warehouse/, i.e. the default directory.
How can I import all tables into /user/hive/warehouse/retail.db/?
You can set the HDFS path of your database using the --warehouse-dir option.
The following example worked for me:
sqoop import-all-tables \
--connect jdbc:mysql://localhost:3306/retail_db \
--username user \
--password password \
--warehouse-dir /apps/hive/warehouse/lina_test.db \
--autoreset-to-one-mapper
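Applied to the database in the question, the same approach would be roughly the sketch below (following the pattern above, i.e. without --hive-import, so the table files simply land as subdirectories under that path):
sqoop import-all-tables \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username root \
--password XXX \
--warehouse-dir /user/hive/warehouse/retail.db \
--autoreset-to-one-mapper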
I am facing issues while importing a table from PostgreSQL to Hive. The command I am using is:
sqoop import \
--connect jdbc:postgresql://IP:5432/PROD_DB \
--username ABC_Read \
--password ABC#123 \
--table vw_abc_cust_aua \
-- --schema ABC_VIEW \
--target-dir /tmp/hive/raw/test_trade \
--fields-terminated-by "\001" \
--hive-import \
--hive-table vw_abc_cust_aua \
--m 1
The error I am getting:
ERROR tool.ImportTool: Error during import: No primary key could be found for table vw_abc_cust_aua. Please specify one with --split-by or perform a sequential import with '-m 1'.
Please let me know what is wrong with my command.
I am assuming -- --schema ABC_VIEW is intentional; note that arguments after the standalone -- are passed to the connector, so they must come at the very end of the command, after all the other Sqoop arguments (as in the Netezza example above).
The other issue is that the option to set the number of mappers is either -m or --num-mappers, not --m.
Solution
In your script, change --m to -m (or --num-mappers) and move -- --schema ABC_VIEW to the end of the command.
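Under those assumptions, the corrected command would be roughly the following sketch:
sqoop import \
--connect jdbc:postgresql://IP:5432/PROD_DB \
--username ABC_Read \
--password ABC#123 \
--table vw_abc_cust_aua \
--target-dir /tmp/hive/raw/test_trade \
--fields-terminated-by "\001" \
--hive-import \
--hive-table vw_abc_cust_aua \
-m 1 \
-- --schema ABC_VIEW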
I'm trying to do a Sqoop incremental import to a Hive table using --incremental append.
I did an initial Sqoop import and then created a job for the incremental imports.
Both run successfully and new files have been added to the original Hive table's directory in HDFS, but when I check my Hive table, the imported rows are not there; the Hive table looks the same as before the incremental import.
How can I solve that?
I have about 45 Hive tables and would like to update them daily and automatically after the Sqoop incremental import.
First Sqoop Import:
sqoop import \
--connect jdbc:db2://... \
--username root \
--password 9999999 \
--class-name db2fcs_cust_atu \
--query "SELECT * FROM db2fcs.cust_atu WHERE \$CONDITIONS" \
--split-by PTC_NR \
--fetch-size 10000 \
--delete-target-dir \
--target-dir /apps/hive/warehouse/fcs.db/db2fcs_cust_atu \
--hive-import \
--hive-table fcs.cust_atu \
-m 64;
Then I run Sqoop incremental import:
sqoop job \
--create cli_atu \
-- import \
--connect jdbc:db2://... \
--username root \
--password 9999999 \
--table db2fcs.cust_atu \
--target-dir /apps/hive/warehouse/fcs.db/db2fcs_cust_atu \
--hive-table fcs.cust_atu \
--split-by PTC_NR \
--incremental append \
--check-column TS_CUST \
--last-value '2018-09-09'
It might be difficult to understand and answer your question without looking at your full query, because the outcome also depends on your choice of arguments and directories. Would you mind sharing your query?
sqoop import \
--connect jdbc:oracle:thin:@db1icvp.supermedia.com:1521:ICVP \
--username=USER --password=password \
--table table_name \
--split-by col_name --verbose \
--hive-import \
--hive-overwrite \
--hive-table schema.table_name \
--hive-home /path
Using the above command I can only create an internal (managed) table.
But my requirement is to create an external table at a specific location using the above command.
Please suggest.
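For reference, one common workaround is to skip --hive-import, import into an explicit HDFS directory, and then declare an external Hive table over that location. A sketch only, with a hypothetical target path and column list:
sqoop import \
--connect jdbc:oracle:thin:@db1icvp.supermedia.com:1521:ICVP \
--username USER --password password \
--table table_name \
--split-by col_name \
--fields-terminated-by ',' \
--target-dir /path/to/external/location \
-m 4
CREATE EXTERNAL TABLE schema.table_name (
  col_name STRING,
  other_col STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/path/to/external/location';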