Extra backslash in Sqoop import result - hive

Currently I'm using Sqoop to import data from an HP Vertica database into Hive. For some columns with special characters, the result differs from the data in the Vertica DB. Here is the command:
sqoop import --driver com.vertica.jdbc.Driver --connect jdbc:vertica://db.foo.com/corp \
--username xx --P --where 'SRC_SYS_CD=xxx' --null-string '\\N' --null-non-string '\\N' \
--m 1 --fields-terminated-by '\001' --hive-drop-import-delims --table addr \
--target-dir /xxxx/addr
Data in the Vertica DB:
SRC_SYS_CD CTRY_CD ADDR_ID ADDR_TYP_CD ADDR_STR_1_LG_NM
123456 NZ 107560 NULL C\ - 108 Waiatarua Road
Data shown in Hive:
SRC_SYS_CD CTRY_CD ADDR_ID ADDR_TYP_CD ADDR_STR_1_LG_NM
123456 NZ 107560 NULL C\\ - 108 Waiatarua Road
The only difference is in the column ADDR_STR_1_LG_NM: after the Sqoop import, one extra backslash (\) was added, while columns that do not contain a backslash (\) were unchanged.
Since there are NULL values in Vertica, we must use --null-string '\\N' --null-non-string '\\N'.
I've tried some other options like:
--escaped-by \\ --optionally-enclosed-by '\"'
But that doesn't work.

For databases that Sqoop supports with a direct connector, adding --direct and removing --hive-drop-import-delims will import the data as-is.
This link lists the databases that Sqoop supports with direct connect.
I've confirmed that Vertica works with direct connect even though it is not listed.
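For illustration, a minimal sketch of the command from the question adjusted that way (--direct added, --hive-drop-import-delims removed), assuming your Sqoop/Vertica setup honors --direct as described above:
sqoop import --direct --driver com.vertica.jdbc.Driver --connect jdbc:vertica://db.foo.com/corp \
--username xx --P --where 'SRC_SYS_CD=xxx' --null-string '\\N' --null-non-string '\\N' \
--m 1 --fields-terminated-by '\001' --table addr \
--target-dir /xxxx/addr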

Related

SQL queries in BASH script

I am using Hadoop to execute my queries.
What I want is to use BASH variables within my query. Here is an example:
export month="date +%m"
export year="date +%Y"
beeline -u 'jdbc:hive2://clustername.azurehdinsight.net:443/tab' \
-n myname -e "select * from mytable where month = '$month' and year = '$year';"
But the query comes back empty, even though
select * from mytable where month = '$month' and
year = '$year';
is not an empty query in Hive.
Is there a problem in my bash script?
You need to execute the date command using $(); change
export month="date +%m"
export year="date +%Y"
to
export month=$(date +%m)
export year=$(date +%Y)
You can then pass the variables to the query using --hivevar arguments with beeline:
beeline -u jdbc:hive2://clustername.azurehdinsight.net:443/tab \
-n myname \
--hivevar month=$month \
--hivevar year=$year \
-e "select * from mytable where month = '${hivevar:month}' and year = '${hivevar:year}';"

unable to sqoop data from oracle data source using split by on date column

query:
sqoop import --connect "*****" \
--username **** \
-P ****** \
--query "select * from table_name where trunc(date_column)>=ADD_MONTHS(TRUNC(sysdate,'YEAR'),-12) and \$CONDITIONS" \
--split-by date_column \
-m 4
Error:
The import fails with java.sql.SQLDataException: ORA-01861: literal does not match format string (a screenshot of the full error was attached as an image).
Looks like an issue with the format of the split-by column. Try converting the split-by column to a number:
--split-by "to_number(to_char(date_column, 'YYYYMMDDHHMISS'))"
or whatever format is required.
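For illustration, a minimal sketch of the full import with that split-by expression; the connection details stay masked as in the question, and the --target-dir path is an assumed placeholder (free-form --query imports require an explicit target directory):
sqoop import --connect "*****" \
--username **** -P \
--query "select * from table_name where trunc(date_column)>=ADD_MONTHS(TRUNC(sysdate,'YEAR'),-12) and \$CONDITIONS" \
--split-by "to_number(to_char(date_column, 'YYYYMMDDHHMISS'))" \
--target-dir /tmp/table_name_import \
-m 4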

No data in the Hive table created from an avsc schema

I imported data from MySQL to HDFS in Avro format. I moved the .avsc file to HDFS and created the Hive table using that .avsc file. Please see the details below:
.AVSC file : -rw-r--r-- 3 jonnavithulasivakrishna hdfs 1041 2017-09-13 00:05 hdfs://nn01.itversity.com:8020/user/jonnavithulasivakrishna/products.avsc
Table created :
hive (siv_sqoop_import)> CREATE EXTERNAL TABLE Products_1
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> location '/user/jonnavithulasivakrishna/products'
> TBLPROPERTIES('avro.schema.url'='hdfs://nn01.itversity.com:8020/user/jonnavithulasivakrishna/products.avsc');
OK
Time taken: 0.155 seconds
hive (siv_sqoop_import)> select * from Products_1 limit 10;
OK
Time taken: 0.294 seconds
As you can see, the table was created but contains no data. Could you please help me understand why I'm not getting data in this table?
Please find the sqoop command below:
sqoop import \
--connect "jdbc:mysql://nn01.itvserity.com:3306/retail_db" \
--username retail_dba -P \
--table products \
--as-avrodatafile \
--num-mappers 6 \
--target-dir "/user/jonnavithulasivakrishna/products" \

Does HCatalog support incremental import with dynamic partitions in the target table in Sqoop?

If I execute the below command, it works fine:
sqoop import --connect jdbc:mysql://sandbox.hortonworks.com:3306/test \
--driver com.mysql.jdbc.Driver \
--username xxxx --password xxxx \
--query 'select execution_id, project_id, project_name, flow_name, job_id, start_time, end_time, update_time FROM metrics WHERE $CONDITIONS' \
--split-by project_id \
--hcatalog-table metrics;
But when I include the incremental parameters:
sqoop import --connect jdbc:mysql://sandbox.hortonworks.com:3306/test \
--driver com.mysql.jdbc.Driver \
--username xxxx --password xxxx \
--query 'select execution_id, project_id, project_name, flow_name, job_id, start_time, end_time, update_time FROM metrics WHERE $CONDITIONS' \
--check-column update_time \
--incremental lastmodified \
--last-value 0 \
--split-by project_id \
--hcatalog-table metrics;
It gives the below error message:
ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.hadoop.fs.FileSystem.fixRelativePart(FileSystem.java:2207)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1310)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
at org.apache.sqoop.tool.ImportTool.initIncrementalConstraints(ImportTool.java:320)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:488)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:228)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:283)
at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
at org.apache.sqoop.Sqoop.main(Sqoop.java:244)

Sqoop Incremental Import Error using Date Column

I'm trying to do an incremental import using Sqoop. It works for the id column, but when I try the same thing on the date column it shows an error; the date column in MySQL is in varchar format.
code :
sqoop import --connect jdbc:mysql://$hostName/$dbName \
--username $userName --password $pass \
--query "select * , 'username' as user_name, 'date' as created_date from $tableName WHERE $cloumnName between ${values[0]} and ${values[1]} AND \$CONDITIONS" \
--target-dir outputPath \
--append \
--m mapperNo \
--split-by splitByValue \
--check-column cloumnName \
--incremental $imode
Error :
16/12/20 01:18:46 ERROR tool.ImportTool: Error during import: Character column (date_field) can not be used to determine which rows to incrementally import.