sqoop-export is failing when I have \N as data - hive

I am getting the below error when I run my sqoop export command.
This is the content to be exported by the sqoop command:
00001|Content|1|Content-article|\N|2015-02-18 15:16:04|2015-02-18 15:16:04|1 |\N|\N|\N|\N|\N|\N|\N|\N|\N
00002|Content|1|Content-article|\N|2015-02-18 15:16:04|2015-02-18 15:16:04|1 |\N|\N|\N|\N|\N|\N|\N|\N|\N
Sqoop command:
sqoop export --connect jdbc:postgresql://10.11.12.13:1234/db --table table1 --username user1 --password pass1 --export-dir /hivetables/table/ --fields-terminated-by '|' --lines-terminated-by '\n' -- --schema schema
15/06/09 08:05:16 INFO mapreduce.Job: Task Id : attempt_1431442954745_1210_m_000001_0, Status : FAILED
Error: java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.RuntimeException: Can't parse input data: '\N'
at duser.__loadFromFields(duser.java:690)
at duser.parse(duser.java:558)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
Caused by: java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
at java.sql.Timestamp.valueOf(Timestamp.java:202)
at duser.__loadFromFields(duser.java:627)
Can you help me resolve it?

Try adding these arguments to the export statement:
--input-null-string "\\\\N" --input-null-non-string "\\\\N"
From the documentation:
If --input-null-string is not specified, then the string "null" will be interpreted as null for string-type columns. If --input-null-non-string is not specified, then both the string "null" and the empty string will be interpreted as null for non-string columns.
If you don't add those arguments, Sqoop won't be able to understand that the \N in your data is actually null.
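Putting it together, the export from the question would look roughly like the sketch below. This is a minimal sketch reusing the connection string, table, and paths from the question; adjust them to your environment:
# Export with \N treated as NULL for both string and non-string columns
sqoop export --connect jdbc:postgresql://10.11.12.13:1234/db --table table1 \
  --username user1 --password pass1 --export-dir /hivetables/table/ \
  --fields-terminated-by '|' --lines-terminated-by '\n' \
  --input-null-string "\\\\N" --input-null-non-string "\\\\N" \
  -- --schema schema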

The problem seems to be the order in which the columns are being exported. Sqoop doesn't automatically work out the column mapping, so try using the --columns argument to specify the order in which the columns appear. Here's how to use it:
sqoop export --connect jdbc:postgresql://10.11.12.13:5432/reports ... --columns col1,col2,col3,...
See http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_purpose_4 for documentation on how to use --columns.
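For the data shown in the question, which has 17 pipe-delimited fields, the export could spell the target columns out explicitly. A sketch with hypothetical column names; substitute the real column names of table1 in the order the fields appear in the file:
# Map the 17 pipe-delimited fields onto explicitly named target columns
sqoop export --connect jdbc:postgresql://10.11.12.13:1234/db --table table1 \
  --username user1 --password pass1 --export-dir /hivetables/table/ \
  --fields-terminated-by '|' \
  --columns "id,content_type,version,content_name,description,created_at,updated_at,status_flag,attr1,attr2,attr3,attr4,attr5,attr6,attr7,attr8,attr9" \
  -- --schema schema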

Related

Sqoop Fails to Import from Postgres to S3

In my daily operation I import data from PostgreSQL to HDFS and then from HDFS to S3 (sqoop import [Postgres to HDFS] & distcp [HDFS to S3]).
I wanted to remove the intermediate step (HDFS) and import the data directly into an S3 bucket using Sqoop.
However, the same Sqoop command fails at the end of the import operation.
sqoop import \
  -Dmapreduce.map.memory.mb="8192" \
  -Dmapreduce.map.java.opts="-Xmx7200m" \
  -Dmapreduce.task.timeout=0 \
  -Dmapreduce.task.io.sort.mb="2400" \
  --connect $conn_string$ \
  --fetch-size=20000 \
  --username $user_name$ \
  --p $password$ \
  --num-mappers 20 \
  --query "SELECT * FROM table1 WHERE table1.id > 10000000 and table1.id < 20000000 and \$CONDITIONS" \
  --hcatalog-database $schema_name$ \
  --hcatalog-table $table_name$ \
  --hcatalog-storage-stanza "STORED AS PARQUET LOCATION s3a://path/to/destination" \
  --split-by table1.id
I also tried --target-dir s3a://path/to/destination instead of ....... LOCATION s3a://path/to/destination
After "mapping: %100 completed" it throws error message below:
Error: java.io.IOException: Could not clean up TaskAttemptID:attempt_1571557098082_15536_m_000004_0#s3a://path/to/destination_DYN0.6894861001907555/ingest_day=__HIVE_DEFAULT_PARTITION__
at org.apache.hive.hcatalog.mapreduce.TaskCommitContextRegistry.commitTask(TaskCommitContextRegistry.java:83)
at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitTask(FileOutputCommitterContainer.java:145)
at org.apache.hadoop.mapred.Task.commit(Task.java:1200)
at org.apache.hadoop.mapred.Task.done(Task.java:1062)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.io.IOException: Could not rename
s3a://path/to/destination/_DYN0.6894861001907555/ingest_day=20180522/_temporary/1/_temporary/attempt_1571557098082_15536_m_000004_0
to
s3a://path/to/destination/_DYN0.6894861001907555/ingest_day=20180522/_temporary/1/task_1571557098082_15536_m_000004
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:579)
at org.apache.hadoop.mapred.FileOutputCommitter.commitTask(FileOutputCommitter.java:172)
at org.apache.hadoop.mapred.OutputCommitter.commitTask(OutputCommitter.java:343)
at org.apache.hive.hcatalog.mapreduce.DynamicPartitionFileRecordWriterContainer$1.commitTask(DynamicPartitionFileRecordWriterContainer.java:125)
at org.apache.hive.hcatalog.mapreduce.TaskCommitContextRegistry.commitTask(TaskCommitContextRegistry.java:80)
... 9 more
Sqoop tries to rename the S3 folder after the import job is completed, which is not possible due to the architecture of S3, which is object storage.
I found out that this issue has been resolved in Sqoop 1.4.7.
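Until you can run a fixed version, the two-step flow described in the question remains a workaround. A minimal sketch, assuming placeholder paths and credentials, with the query and mapper settings simplified from the question:
# 1) Import from Postgres into a staging directory on HDFS
sqoop import --connect $conn_string$ --username $user_name$ --password $password$ \
  --query "SELECT * FROM table1 WHERE table1.id > 10000000 and table1.id < 20000000 and \$CONDITIONS" \
  --split-by table1.id --num-mappers 20 \
  --target-dir hdfs:///staging/table1
# 2) Copy the completed output from HDFS to S3
hadoop distcp hdfs:///staging/table1 s3a://path/to/destination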

How to locate/export Hive query?

I am new to Hive and am attempting to export the results of a Hive query to a local file on my computer so that I can import the results into Excel.
When I do the following from inside Hive:
hive -e select * from TABLE limit 10'>output.txt;
I get "FAILED: ParseException line 1:0 cannot recognize input near 'hive' '-' 'e'"
When I do
hive -S -e "USE DATABASE; select * from TABLE limit 10" > /tmp/test/test.csv;
from the shell, OR
insert overwrite local directory '/tmp/hello'
select * from TABLE limit 10;
It goes to the HDFS filesystem used by Hive -- how do I get this to my local machine?
You can export the query results to a CSV file like this:
hive -e 'select * from your_Table' > /home/yourfile.csv
To get this file to your local machine, you should use HDFS:
hdfs dfs -get /tmp/hello /PATHinLocalMachine
Check out this Question
You are seeing the error because you are running the hive -e command inside the Hive REPL, as shown below:
hive (venkat)> hive -e 'select * from a';
NoViableAltException(26@[])
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1084)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:437)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:320)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1219)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1260)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1156)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1146)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:168)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:379)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:739)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
FAILED: ParseException line 1:0 cannot recognize input near 'hive' '-' 'e'
You have to run it from the OS shell, as shown below:
[venkata_udamala@gw02 ~]$ hive -e 'use database_name;select * from table_name;' > temp.txt
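Since the end goal is to open the result in Excel, the tab-delimited output of hive -e can also be converted to CSV on the way out. A minimal sketch, assuming placeholder database/table names, a Linux sed that understands \t, and that no field itself contains a comma or tab:
# Run the query from the OS shell and turn tabs into commas
hive -S -e 'use database_name; select * from table_name limit 10;' | sed 's/\t/,/g' > /tmp/test/test.csv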

Import data from sqoop to hive

sqoop import --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=retail_dba --password=cloudera --table export1 --hive-import \
--hive-table export_3 --create-hive-table --fields-terminated-by "|" \
--lines-terminated-by "\n" --null-string nvl --null-non-string -2 --outdir java_files
If I use the above command, it gives an error that says
either use split by or -m 1 for sequential import
When I used --split-by, it ignored the null values and imported the rest into Hive.
Can you explain the reason?
Thanks
Varun
The NULL value issues you are getting are not related to --split-by.
By default, Sqoop imports NULL values as the string null. Hive, however, uses the string \N to denote NULL values, so predicates dealing with NULL (such as IS NULL) will not work correctly. You should append the parameters --null-string and --null-non-string for an import job, or --input-null-string and --input-null-non-string for an export job, if you wish to preserve NULL values properly. Because Sqoop uses those parameters in generated code, you need to properly escape the value \N as \\N:
$ sqoop import ... --null-string '\\N' --null-non-string '\\N'
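Applied to the command from the question, the import would look roughly like the sketch below. This is a minimal sketch that keeps the original options, adds -m 1 for a sequential import (as the error message suggests) instead of --split-by, and preserves NULLs with the escaped \N value:
# Import with Hive-style NULL handling; use --split-by <column> instead of -m 1 to run in parallel
sqoop import --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=retail_dba --password=cloudera --table export1 --hive-import \
--hive-table export_3 --create-hive-table --fields-terminated-by "|" \
--lines-terminated-by "\n" --null-string '\\N' --null-non-string '\\N' \
-m 1 --outdir java_files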

Using Sqoop1 with SAP Hana using a table name that contains forward slash '/' causes error

I am trying to import data from SAP HANA using a table name that contains a forward slash '/'. I am not sure whether escaping the '/' will work.
My connection attempt:
sqoop import --connect jdbc:sap://mysaphost:30015 --driver com.sap.db.jdbc.Driver --username xxxxxx --password xxxx --table xxx./xxx/xxx
Produces the following error:
2016-05-20 13:12:23,098 ERROR - [main:] ~ Error executing statement: com.sap.db.jdbc.exceptions.JDBCDriverException: SAP DBTech JDBC: [257]: sql syntax error: incorrect syntax near "/": line 1 col 24 (at pos 24) (SqlManager:43)
com.sap.db.jdbc.exceptions.JDBCDriverException: SAP DBTech JDBC: [257]: sql syntax error: incorrect syntax near "/": line 1 col 24 (at pos 24)
In order to use object names with slashes (or other special characters), you need to enclose them in double quotation marks (").
As you seem to be using a command-line interface and want to pass the table name as an argument, you most likely have to escape those quotation marks.
Try something like
sqoop import --connect jdbc:sap://mysaphost:30015 --driver com.sap.db.jdbc.Driver
--username xxxxxx --password xxxx --table \"xxx./xxx/xxx\"
(still just one line!) and see how that goes.
Not sure, but maybe you can just try to enclose the table name in double quotation marks ("):
sqoop import --connect jdbc:sap://mysaphost:30015 --driver com.sap.db.jdbc.Driver --username xxxxxx --password xxxx --table xxx."/xxx/xxx"
Can you try something like --table "XXX".\" XYZ\"
Updated to the latest version of the JDBC driver for SAP HANA.

Sqoop Export with Missing Data

I am trying to use Sqoop to export data from HDFS into PostgreSQL. However, I receive an error partway through the export saying that it can't parse the input. I manually went into the file I was exporting and saw that the failing row had two columns missing. I have tried a bunch of different arguments with the Sqoop command, but cannot get it to work. Here is what I have been running thus far:
sqoop export --connect jdbc:postgresql://localhost:5432/XX -username XX -password XX --table XX --input-fields-terminated-by "\t" --input-lines-terminated-by "\n" --input-null-string '\n' --input-null-non-string '\n' -m 1 --export-dir /user/dan/output
I have also tried it without the "--input-null-string" and "--input-null-non-string" args and got the same result. My table has 6 columns and the file I am reading has tab separated values that are inserted into the table if all 6 are there. Any help would be appreciated.
I solved the problem by changing my reduce function so that, if a row did not have the correct number of fields, it output a placeholder value; I was then able to pass that value to --input-null-non-string and the export worked.
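For reference, the matching export would look something like the sketch below, where MISSING is a hypothetical placeholder value that the reduce step writes whenever a field is absent:
# Treat the MISSING placeholder as SQL NULL during the export
sqoop export --connect jdbc:postgresql://localhost:5432/XX --username XX --password XX \
  --table XX --input-fields-terminated-by "\t" --input-lines-terminated-by "\n" \
  --input-null-string 'MISSING' --input-null-non-string 'MISSING' \
  -m 1 --export-dir /user/dan/output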