Getting error message Unexpected character ' ' when running LOAD command in PIG - apache-pig

I have a file stored in HDFS at this path: /user/hdfs/countries
(the file is in comma separated format).
To import this HDFS data into PIG I ran the below command in PIG:
test = load ‘/ user/hdfs/countries’ using PigStorage(',') as (id:int, Name:chararray, Language:chararray);
where,
id is the primary key column in the HDFS file, and
Name and Language are the other column names in the HDFS file.
I get the error below when I run the above Pig command:
Pig Stack Trace
ERROR 1200: <line 1, column 12> Unexpected character ''
Failed to parse: <line 1, column 12> Unexpected character ''
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:243)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:179)
at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1648)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1621)
at org.apache.pig.PigServer.registerQuery(PigServer.java:575)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1093)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:541)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Can someone please help me with this? Is my command incorrect, or is a jar file missing?
Thank you in advance!

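The error tells you exactly where the problem is: the typographic quotes ‘ and ’ around the path must be replaced by the straight single quote ', which is not the same character.
Also, the space after the leading / looks like a mistake; the path given in the question is /user/hdfs/countries.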
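One way to catch this kind of character before Pig does is to scan the statement for anything outside ASCII. The following is a minimal Python sketch, not part of Pig; the helper names find_non_ascii and straighten_quotes are made up for illustration.

```python
# Hypothetical helpers (not part of Pig) for spotting typographic quotes
# that were pasted into a Pig Latin statement.
CURLY_TO_STRAIGHT = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}


def find_non_ascii(statement):
    """Return (0-based offset, character) pairs for every non-ASCII character."""
    return [(i, ch) for i, ch in enumerate(statement) if ord(ch) > 127]


def straighten_quotes(statement):
    """Replace typographic quotes with their plain ASCII equivalents."""
    return statement.translate(str.maketrans(CURLY_TO_STRAIGHT))


if __name__ == "__main__":
    stmt = ("test = load \u2018/ user/hdfs/countries\u2019 using PigStorage(',') "
            "as (id:int, Name:chararray, Language:chararray);")
    for offset, ch in find_non_ascii(stmt):
        print(offset, hex(ord(ch)))
    print(straighten_quotes(stmt))
```

For the statement in the question, the first offset reported is 12, which matches the column in the Pig error. The stray space after the leading / is not touched by this sketch and still has to be removed by hand.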

Spark parquet reading error

I am working on a Spark project. I have a file in Parquet format, and when I try to load it from Java I get the error below. When I load the same file in Hive from the same path and run select * from table_name, it works fine and the data comes back properly. Please help me with this issue.
java.io.IOException: Could not read footer: java.lang.RuntimeException: corrupted file: the footer index is not within the file
at org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:247)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$28.apply(ParquetRelation.scala:754)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$28.apply(ParquetRelation.scala:743)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:710)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:710)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: corrupted file: the footer index is not within the file
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:427)
at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:237)
at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:233)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
You can try the options below:
1) sqlContext.read.parquet("path")
2) sqlContext.read.format(fileFormat)
.option("header", header) // Use first line of all files as header
.option("inferSchema", inferSchema) // Automatically infer data types
.load(source)
If that does not resolve the issue, please post a sample of your code.
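The message "the footer index is not within the file" concerns the tail of the file: a Parquet file begins with the 4-byte magic PAR1 and ends with a 4-byte little-endian footer length followed by PAR1 again. As a rough way to test whether a given file is plausibly intact Parquet, the Python sketch below checks those bytes on a local copy. The function name check_parquet_footer is hypothetical, and the check only approximates what the Parquet reader does.

```python
import os
import struct

MAGIC = b"PAR1"


def check_parquet_footer(path):
    """Return (ok, reason) for a local file, using only magic bytes and footer length."""
    size = os.path.getsize(path)
    # Smallest possible layout: leading magic + footer length + trailing magic.
    if size < len(MAGIC) + 4 + len(MAGIC):
        return False, "file too small to be Parquet"
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(size - 8)
        footer_len = struct.unpack("<I", f.read(4))[0]
        tail = f.read(4)
    if head != MAGIC or tail != MAGIC:
        return False, "missing PAR1 magic bytes"
    # The footer must start after the leading magic bytes.
    footer_start = size - 8 - footer_len
    if footer_start < len(MAGIC):
        return False, "footer index is not within the file"
    return True, "footer looks plausible"
```

This works on a local file only; for a file stored in HDFS, copy it to the local filesystem first (for example with hdfs dfs -get). If the path is a directory containing several files, each data file would need to be checked separately.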

how to remove headers using piggybank?

I have a directory that contains 10 files, and I want to remove the headers from all of them. When I run the script below using piggybank, I get an error. Is there any other way to remove the header from all the files in a directory? My code is:
REGISTER /usr/lib/pig/piggybank.jar;
input = LOAD 'insurance_data' using CSVExcelStorage(
',','default','NOCHANGE','SKIP_INPUT_HEADER')
as (population:int, private:int,public:int,uninsecured:int);
dump input;
The error I am getting is:
2016-09-13 14:01:48,239 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. mismatched input 'input' expecting EOF
Failed to parse: mismatched input 'input' expecting EOF
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:241)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:179)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1688)
at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1421)
at org.apache.pig.PigServer.parseAndBuild(PigServer.java:354)
at org.apache.pig.PigServer.executeBatch(PigServer.java:379)
at org.apache.pig.PigServer.executeBatch(PigServer.java:365)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:769)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:613)
at org.apache.pig.Main.main(Main.java:158)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
2016-09-13 14:01:48,250 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: mismatched input 'input' expecting EOF
Details at logfile: /home/cloudera/pig_1473800504430.log
'input' is a reserved keyword in Pig Latin. Change the name of the relation 'input' to something else.
REGISTER /usr/lib/pig/piggybank.jar;
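-- Assuming there is no DEFINE or udf.import.list entry for piggybank, the loader needs its fully qualified class name.
A = LOAD 'insurance_data' USING org.apache.pig.piggybank.storage.CSVExcelStorage(',','default','NOCHANGE','SKIP_INPUT_HEADER') AS (population:int, private:int, public:int, uninsecured:int);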
DUMP A;

Something strange in Pig, Cloudera QuickStart

I do not understand why, when I run a Pig script in the editor,
a workflow is created in Oozie along with three jobs, as in the image, rather than the script simply running as it does in Hive.
Image
entrada = LOAD '/user/cloudera/Divisas/Barril-WTI.csv' using PigStorage (',') AS (Fecha:chararray, Valor: float);
entrada_sin_cabecera = filter entrada by Fecha != 'Date';
orden = order entrada_sin_cabecera by Valor;
dump orden;
It also gives me the following error when I run: pig -f WTI.pig
Error before Pig is launched
ERROR 2997: Encountered IOException. File WTI.pig does not exist
java.io.FileNotFoundException: File WTI.pig does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:424)
at org.apache.pig.impl.io.FileLocalizer.fetchFilesInternal(FileLocalizer.java:747)
at org.apache.pig.impl.io.FileLocalizer.fetchFile(FileLocalizer.java:688)
at org.apache.pig.Main.run(Main.java:424)
at org.apache.pig.Main.main(Main.java:158)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

sqoop-export is failing when I have \N as data

I am getting the error below when I run my sqoop export command.
This is my content to be exported by sqoop command
00001|Content|1|Content-article|\N|2015-02-1815:16:04|2015-02-1815:16:04|1 |\N|\N|\N|\N|\N|\N|\N|\N|\N
00002|Content|1|Content-article|\N|2015-02-1815:16:04|2015-02-1815:16:04|1 |\N|\N|\N|\N|\N|\N|\N|\N|\N
sqoop command
sqoop export --connect jdbc:postgresql://10.11.12.13:1234/db --table table1 --username user1 --password pass1 --export-dir /hivetables/table/ --fields-terminated-by '|' --lines-terminated-by '\n' -- --schema schema
15/06/09 08:05:16 INFO mapreduce.Job: Task Id : attempt_1431442954745_1210_m_000001_0, Status : FAILED
Error: java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.RuntimeException: Can't parse input data: '\N'
at duser.__loadFromFields(duser.java:690)
at duser.parse(duser.java:558)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
Caused by: java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
at java.sql.Timestamp.valueOf(Timestamp.java:202)
at duser.__loadFromFields(duser.java:627)
Can you help me resolve it ?
Try adding these arguments to the export statement
--input-null-string "\\\\N" --input-null-non-string "\\\\N"
From the documentation:
If --input-null-string is not specified, then the string "null" will be interpreted as null for string-type columns. If --input-null-non-string is not specified, then both the string "null" and the empty string will be interpreted as null for non-string columns.
If you don't add those arguments, it won't be able to understand that the \N in your data is actually null.
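The number of backslashes can look arbitrary. The value passes through the shell first, and Sqoop is then understood to treat what it receives as an escaped string (an assumption here, based on the '\\N' form the Sqoop user guide shows for Hive data). A small Python sketch using the standard shlex module, which mimics POSIX shell quoting, shows what the shell step leaves behind:

```python
import shlex


def shell_args(command_line):
    """Split a command line roughly the way a POSIX shell would."""
    return shlex.split(command_line)


# The same option written with double quotes and with single quotes.
double_quoted = shell_args(r'--input-null-string "\\\\N"')
single_quoted = shell_args(r"--input-null-string '\\N'")

# Both spellings hand the same argument to the program: backslash, backslash, N.
print(double_quoted[1])
print(single_quoted[1])
print(double_quoted == single_quoted)
```

So "\\\\N" inside double quotes and '\\N' inside single quotes should be interchangeable on the command line; use whichever is easier to read.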
The problem seems to be the order in which the columns are being exported. Sqoop doesn't automatically understand the column mapping. Try using the --columns argument to specify the order in which the columns appear. Here's how to use it:
sqoop export --connect jdbc:postgresql://10.11.12.13:5432/reports ... --columns col1,col2,col3,...
See http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_purpose_4 for documentation on how to use --columns.

pig file load error

I am trying to run this command in the Pig environment.
grunt> A = LOAD inp;
But I am getting this error in the log files:
Pig Stack Trace:
ERROR 1200: mismatched input 'inp' expecting QUOTEDSTRING
Failed to parse: mismatched input 'inp' expecting QUOTEDSTRING
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:226)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:168)
at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1565)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1538)
at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:970)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:490)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
And in the console I get this:
grunt> A = LOAD inp;
2012-10-26 12:18:34,627 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: mismatched input 'inp' expecting QUOTEDSTRING
Details at logfile: /usr/local/hadoop/pig_1351232517175.log
Can anybody provide an appropriate solution for this?
The LOAD syntax is wrong: the file name must be a quoted string. See the correct example from the documentation below.
http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#LOAD
Suppose we have a data file called myfile.txt. The fields are tab-delimited. The records are newline-separated.
1 2 3
4 2 1
8 3 4
In this example the default load function, PigStorage, loads data from myfile.txt to form relation A. The two LOAD statements are equivalent. Note that, because no schema is specified, the fields are not named and all fields default to type bytearray.
A = LOAD 'myfile.txt';
A = LOAD 'myfile.txt' USING PigStorage('\t');
DUMP A;
(1,2,3)
(4,2,1)
(8,3,4)
Example from http://pig.apache.org/docs
I believe the error log is self-explanatory: it says "expecting QUOTEDSTRING".
Put the file name in single quotes to solve this issue.