Import Data from PostgreSQL to Hive

I am facing issues while importing a table from PostgreSQL to Hive. The command I am using is:
sqoop import \
--connect jdbc:postgresql://IP:5432/PROD_DB \
--username ABC_Read \
--password ABC#123 \
--table vw_abc_cust_aua \
-- --schema ABC_VIEW \
--target-dir /tmp/hive/raw/test_trade \
--fields-terminated-by "\001" \
--hive-import \
--hive-table vw_abc_cust_aua \
--m 1
The error I am getting is:
ERROR tool.ImportTool: Error during import: No primary key could be found for table vw_abc_cust_aua. Please specify one with --split-by or perform a sequential import with '-m 1'.
Please let me know what is wrong with my command.

I am assuming `-- --schema ABC_VIEW` is intended to select the schema; note that the Netezza answer below passes the same option as `-- --schema <name>`, with two sets of `--`, at the end of the command.
The other issue is that the option for specifying the number of mappers is either `-m` or `--num-mappers`, not `--m`.
Solution
In your script, change `--m` to `-m` or `--num-mappers`; a corrected command is sketched below.
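For reference, a corrected command might look like the sketch below (untested; the values are copied from the question, `-m 1` replaces `--m 1`, and the PostgreSQL-specific `--schema` argument is kept after a standalone `--` at the very end of the command, the same placement used in the Netezza answer further down):
sqoop import \
--connect jdbc:postgresql://IP:5432/PROD_DB \
--username ABC_Read \
--password ABC#123 \
--table vw_abc_cust_aua \
--target-dir /tmp/hive/raw/test_trade \
--fields-terminated-by "\001" \
--hive-import \
--hive-table vw_abc_cust_aua \
-m 1 \
-- --schema ABC_VIEW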

Related

Sqoop import from NON-DEFAULT schema Netezza

I want to import a Netezza table in a specific non-default schema with the following command:
sqoop import \
--connect jdbc:netezza://netezza-host-name:5480/NZDATABASE \
--table MY_SCHEMA.MY_TABLE \
--username user \
-P \
--hive-import \
--hive-database demo \
--create-hive-table \
--hive-table MY_NEW_TABLE
This however fails because it checks only the default schema "ADMIN":
org.netezza.error.NzSQLException: ERROR: relation does not exist NZDATABASE.ADMIN.MY_SCHEMA.MY_TABLE
versions:
Sqoop 1.4.7
nzjdbc.jar release 7.2.1.8 driver
I don't have an environment to test this, but here are two more options you can try:
Specify the schema name in the JDBC connection string:
jdbc:netezza://netezza-host-name:5480/NZDATABASE?currentSchema=MY_SCHEMA
or
jdbc:netezza://netezza-host-name:5480/NZDATABASE?searchpath=MY_SCHEMA
or
jdbc:netezza://netezza-host-name:5480/NZDATABASE;schema=MY_SCHEMA
Pass the `--schema` argument through to the underlying connector:
`-- --schema MY_SCHEMA` - two sets of '--' are needed there
so the full command will be the following (note that the connector-specific `-- --schema` part goes at the end, because everything after the standalone `--` is handed to the connector):
sqoop import \
--connect jdbc:netezza://netezza-host-name:5480/NZDATABASE \
--table MY_TABLE \
--username user \
-P \
--hive-import \
--hive-database demo \
--create-hive-table \
--hive-table MY_NEW_TABLE \
-- --schema MY_SCHEMA
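Equivalently, if you go with the first option instead, the schema can be set in the connection string rather than via the connector argument (again an untested sketch; which property name and separator the nzjdbc driver accepts may depend on its version):
sqoop import \
--connect "jdbc:netezza://netezza-host-name:5480/NZDATABASE;schema=MY_SCHEMA" \
--table MY_TABLE \
--username user \
-P \
--hive-import \
--hive-database demo \
--create-hive-table \
--hive-table MY_NEW_TABLE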

Hive table outdated after Sqoop incremental import

I'm trying to do a Sqoop incremental import to a Hive table using "--incremental append".
I did an initial Sqoop import and then created a job for the incremental imports.
Both executed successfully, and new files were added to the same original Hive table directory in HDFS, but when I check my Hive table, the newly imported rows are not there; the Hive table looks the same as it did before the incremental import.
How can I solve that?
I have about 45 Hive tables and would like to update them daily automatically after the Sqoop incremental import.
First Sqoop Import:
sqoop import \
--connect jdbc:db2://... \
--username root \
-password 9999999 \
--class-name db2fcs_cust_atu \
--query "SELECT * FROM db2fcs.cust_atu WHERE \$CONDITIONS" \
--split-by PTC_NR \
--fetch-size 10000 \
--delete-target-dir \
--target-dir /apps/hive/warehouse/fcs.db/db2fcs_cust_atu \
--hive-import \
--hive-table fcs.cust_atu \
-m 64;
Then I run Sqoop incremental import:
sqoop job \
-create cli_atu \
--import \
--connect jdbc:db2://... \
--username root \
--password 9999999 \
--table db2fcs.cust_atu \
--target-dir /apps/hive/warehouse/fcs.db/db2fcs_cust_atu \
--hive-table fcs.cust_atu \
--split-by PTC_NR \
--incremental append \
--check-column TS_CUST \
--last-value '2018-09-09'
It might be difficult to understand/answer your question without looking at your full query, because the outcome also depends on your choice of arguments and directories. Would you mind sharing your query?

Where is the default Sqoop Hive-import destination directory? Is it controllable?

I need to do a Sqoop import of all tables from an existing MySQL database to Hive; the first table is categories.
The command is as below:
sqoop import-all-tables -m 1 \
--connect=jdbc:mysql://ms.itversity.com/retail_db \
--username=retail_user \
--password=itversity \
--hive-import \
--hive-overwrite \
--create-hive-table \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--outdir java_output0322
It failed for the following reason:
Output directory hdfs://nn01.itversity.com:8020/user/paslechoix/categories already exists
I am wondering how I can import them into /apps/hive/warehouse/paslechoix.db/ instead (paslechoix is the Hive database name).
Update 1 on 20180323, addressed to Bala, who commented first:
I've updated the script to:
sqoop import-all-tables -m 1 \
--connect=jdbc:mysql://ms.itversity.com/retail_db \
--username=retail_user \
--password=itversity \
--hive-import \
--hive-overwrite \
--create-hive-table \
--hive-database paslechoix_new \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--outdir java_output0323
I added what you suggested: --hive-database paslechoix_new, where paslechoix_new is a newly created Hive database.
I still receive the error:
AlreadyExistsException: Output directory hdfs://nn01.itversity.com:8020/user/paslechoix/categories already exists
Now, this is really interesting: why does it keep referring to paslechoix? I already indicate in the script that the Hive database is paslechoix_new, so why isn't it recognized?
Update 2 on 20180323:
I took the other suggestion in Bala's comment:
sqoop import-all-tables -m 1 \
--connect=jdbc:mysql://ms.itversity.com/retail_db \
--username=retail_user \
--password=itversity \
--hive-import \
--hive-overwrite \
--create-hive-table \
--hive-database paslechoix_new \
--warehouse-dir /apps/hive/warehouse/paslechoix_new.db \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--outdir java_output0323
So now the import doesn't throw an error any more; however, when I checked the Hive database, all the tables are created but contain no data.
Add the option --warehouse-dir to import into a specific directory:
--warehouse-dir /apps/hive/warehouse/paslechoix.db/
If you want to import into a specific Hive database, then use:
--hive-database paslechoix
A command combining both options is sketched below.
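For completeness, a full command with both options might look like the following sketch (untested; connection details are reused from the question, and it closely mirrors the Update 2 command above, pointed at the paslechoix database):
sqoop import-all-tables -m 1 \
--connect=jdbc:mysql://ms.itversity.com/retail_db \
--username=retail_user \
--password=itversity \
--hive-import \
--hive-overwrite \
--create-hive-table \
--hive-database paslechoix \
--warehouse-dir /apps/hive/warehouse/paslechoix.db/ \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--outdir java_output0322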

Can we do a Sqoop import into a Hive external table?

sqoop import \
--connect jdbc:oracle:thin:@db1icvp.supermedia.com:1521:ICVP \
--username=USER --password=password \
--table table_name \
--split-by col_name --verbose \
--hive-import \
--hive-overwrite \
--hive-table schema.table_name \
--hive-home /path
With the above command I can only create an internal (managed) table.
But my requirement is to create an external table at a specific location using the above command.
Please suggest.

How to stop nulls from Sqoop import (Oracle to Hive)

I am getting null rows in Hive after a Sqoop import from Oracle to Hive.
In the Sqoop --query, I already added the condition "where pk is not null".
The Sqoop command:
sqoop import \
--connect "${SQOOP_CONN_STR}" \
--connection-manager "${SQOOP_CONNECTION_MANAGER}" \
--username ${SQOOP_USER} \
--password ${SQOOP_PASSWORD} \
--fields-terminated-by ${SQOOP_DELIM} \
--null-string '' \
--null-non-string '' \
--query \""${SQOOP_QUERY}"\" \
--target-dir "${SQOOP_OP_DIR}" \
--split-by ${SQOOP_SPLIT_BY} \
-m ${SQOOP_NUM_OF_MAPPERS} 1> ${SQOOP_TEMP_LOG}
This is due to a mismatch in the field delimiter.
You are importing into HDFS without specifying an explicit field delimiter, so Sqoop will use its default, a comma.
The Hive table you created probably has Ctrl-A (the Hive default) as its field delimiter.
Bring the two in sync and it should work; one way to do that is sketched below.
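For example (a sketch, not something tested against your environment), you could keep the Hive table's default Ctrl-A delimiter and have Sqoop write Ctrl-A-delimited fields by making the delimiter explicit, i.e. ensure ${SQOOP_DELIM} resolves to '\001' or hard-code it:
sqoop import \
--connect "${SQOOP_CONN_STR}" \
--connection-manager "${SQOOP_CONNECTION_MANAGER}" \
--username ${SQOOP_USER} \
--password ${SQOOP_PASSWORD} \
--fields-terminated-by '\001' \
--null-string '' \
--null-non-string '' \
--query \""${SQOOP_QUERY}"\" \
--target-dir "${SQOOP_OP_DIR}" \
--split-by ${SQOOP_SPLIT_BY} \
-m ${SQOOP_NUM_OF_MAPPERS}
Alternatively, keep the comma-delimited files and recreate the Hive table with a matching ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' clause.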