Error during parsing. Encountered " <IDENTIFIER> "Error "" at line 1, column 2 - apache-pig

I have a simple Pig script that loads a data file.
My data file is located in /home/fs188.
It is a CSV file that contains data like this:
011958029,00000024,,1,20100209,1
011951228,00000036,,1,20100209,1
011964431,00000814,,1,20100227,1
003526500,00000863,,1,20080122,1
011950864,00001478,,1,20100209,1
011999168,00002495,X0,1,20100331,0
001684881,00002641,,1,19861126,1
001677981,00003165,,1,19861119,1
001677457,00003311,,1,19870114,1
001677161,00003440,,1,19870116,1
002594705,00003475,,1,19870122,1
011958074,00004327,,1,20100210,1
I just want to run my Pig script, named PigScript.pig, and test it in local mode.
It contains this code:
ENEE_ENR_FILTER = LOAD '/home/fs188/DataExempleUdf.csv' USING PigStorage(',') AS (idt_gcp:chararray,idt_ent_pse:chararray,cd_not:chararray,idc_pse_pci:chararray,da_pram_ett:chararray,idc_cd_not:chararray);
DUMP ENEE_ENR_FILTER;
So I run my script:
pig -x local PigScript.pig
I get this error:
2019-08-07 12:03:14,277 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " "-x "" at line 1, column 2.
This is weird because I can't find any syntax error in the script.

Related

Load File delimited by double colon :: in pig

The following is a sample line from a dataset delimited by a double colon (::).
1::Toy Story (1995)::Animation|Children's|Comedy
I want to extract three fields from the above dataset as movieID, title and genre. I have written the following code for that:
movies = LOAD 'location/of/dataset/on/hdfs '
using PigStorage('::')
as
(MovieID:int,title:chararray,genre:chararray);
But I am getting the following error:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
<file script.pig, line 1, column 9> pig script failed to validate:
java.lang.RuntimeException: could not instantiate 'PigStorage' with arguments '[::]'
Use MyRegExLoader. You will need piggybank.jar for this.
REGISTER '/path/to/piggybank.jar';
A = LOAD '/path/to/dataset' USING org.apache.pig.piggybank.storage.MyRegExLoader('([^\\:]+)::([^\\:]+)::([^\\:]+)')
as (movieid:int, title:chararray, genre:chararray);
Output:
(1,Toy Story (1995),Animation|Children's|Comedy)
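To see what the three capture groups in that pattern pick out, here is a small Python sketch run on the sample line. It only mirrors the regex: Python's re module stands in for the Java regex engine that Pig uses, and the name split_movie_line is made up for this illustration. It assumes the Pig string '\\:' reaches the regex engine as \:, which is just a colon, so the sketch writes [^:] directly.

```python
import re

# Same shape as the MyRegExLoader pattern: three runs of non-colon
# characters separated by a literal "::".
MOVIE_PATTERN = re.compile(r"([^:]+)::([^:]+)::([^:]+)")


def split_movie_line(line):
    """Return (movie_id, title, genre), or None when the line does not match."""
    match = MOVIE_PATTERN.search(line)
    if match is None:
        return None
    movie_id, title, genre = match.groups()
    return int(movie_id), title, genre


if __name__ == "__main__":
    print(split_movie_line("1::Toy Story (1995)::Animation|Children's|Comedy"))
```

One thing to watch: because every group excludes colons, a title that itself contains a single colon will not match this pattern at all.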

Unable to extract data with double pipe delimiter in Pig Script

I am trying to load data that is delimited by a double pipe (||) in Pig. This is my command:
L = LOAD 'entirepath_in_HDFS/b.txt/part-m*' USING PigStorage('||');
I am getting the following error:
2016-08-04 23:58:21,122 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
<line 1, column 4> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'PigStorage' with arguments '[||]'
My sample input file has exactly 5 lines, as follows:
POS_TIBCO||HDFS||POS_LOG||1||7806||2016-07-18||1||993||0
POS_TIBCO||HDFS||POS_LOG||2||7806||2016-07-18||1||0||0
POS_TIBCO||HDFS||POS_LOG||3||7806||2016-07-18||1||0||5
POS_TIBCO||HDFS||POS_LOG||4||7806||2016-07-18||1||0||0
POS_TIBCO||HDFS||POS_LOG||5||7806||2016-07-18||1||0||19.99
I tried several options, such as putting a backslash before the delimiter (\||, \|\|), but everything failed. I also tried with a schema but got the same error. I am using Hortonworks (HDP 2.2.4) and Pig 0.14.0.
Any help is appreciated. Please let me know if you need any further details.
I have faced this case, and from reading the PigStorage source code, I think the PigStorage argument has to resolve to a single character.
So we can use this code instead:
L0 = LOAD 'entirepath_in_HDFS/b.txt/part-m*' USING PigStorage('|');
L = FOREACH L0 GENERATE $0,$2,$4,$6,$8,$10,$12,$14,$16;
This works if you know how many columns you have, and it should not affect performance because the projection runs on the map side.
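The idea behind this workaround can be checked outside Pig: splitting on a single '|' turns every '||' into one empty field, so the real values end up at the even positions ($0, $2, ... $16). A minimal Python sketch using one of the sample rows from the question; the helper name is hypothetical.

```python
def split_double_pipe_via_single(line):
    """Split on a single '|' and keep every second field.

    Mirrors LOAD ... PigStorage('|') followed by GENERATE $0,$2,...:
    each '||' produces one empty field at an odd position.
    """
    fields = line.split("|")
    return fields[0::2]


if __name__ == "__main__":
    row = "POS_TIBCO||HDFS||POS_LOG||1||7806||2016-07-18||1||993||0"
    print(split_double_pipe_via_single(row))
```

Note that a value which itself contains a '|' would shift the positions, so this only holds when the pipe character appears in the data purely as part of the delimiter.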
When you load data using PigStorage, it expects a single character as the delimiter.
However, if you still want to do this, you can use MyRegExLoader:
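REGISTER '/path/to/piggybank.jar';
A = LOAD '/path/to/dataset' USING org.apache.pig.piggybank.storage.MyRegExLoader('([^|]*)[|]{2}([^|]*)[|]{2}([^|]*)[|]{2}([^|]*)[|]{2}([^|]*)[|]{2}([^|]*)[|]{2}([^|]*)[|]{2}([^|]*)[|]{2}([^|]*)');
The pattern needs one capture group per field; the nine groups here match the sample rows, and without an AS clause the fields are referenced as $0 to $8.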
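As far as I understand MyRegExLoader, it produces one field per capture group, so a pattern for the sample rows in the question needs nine groups separated by a literal "||". Here is a rough Python sketch that builds such a pattern and tries it on a sample row. The names FIELD_COUNT, PATTERN and parse_row are made up for this example, and Python's re module only stands in for the Java regex engine.

```python
import re

FIELD_COUNT = 9  # taken from the sample rows in the question

# One capture group per field, joined by a literal "||".
# [|]{2} avoids backslash escaping, which differs between languages.
PATTERN = re.compile("[|]{2}".join(["([^|]*)"] * FIELD_COUNT))


def parse_row(line):
    """Return the fields as a list, or None if the line has another shape."""
    match = PATTERN.fullmatch(line)
    return list(match.groups()) if match else None


if __name__ == "__main__":
    print(PATTERN.pattern)
    print(parse_row("POS_TIBCO||HDFS||POS_LOG||5||7806||2016-07-18||1||0||19.99"))
```

Printing PATTERN.pattern gives the regex text itself, which uses only character classes and a counted repetition, so it should be usable as the loader argument without extra escaping.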

Google BigQuery - failed bq load displays only a file number; how to get the file name

I'm running the following bq command:
bq load --source_format=CSV --skip_leading_rows=1 --max_bad_records=1000 --replace raw_data.order_20150131 gs://raw-data/order/order/2050131/* order.json
and I get the following message when loading data into BigQuery:
*************************************
Waiting on bqjob_r4ca10491_0000014ce70963aa_1 ... (412s) Current status: DONE
BigQuery error in load operation: Error processing job
'orders:bqjob_r4ca10491_0000014ce70963aa_1': Too few columns: expected
11 column(s) but got 1 column(s). For additional help: http://goo.gl/RWuPQ
Failure details:
- File: 844 / Line:1: Too few columns: expected 11 column(s) but got
1 column(s). For additional help: http://goo.gl/RWuPQ
**********************************
The message displays only the file number.
I checked the content of the files and most of them are good.
gsutil ls and the Cloud Console, on the other hand, display file names.
How can I tell which file corresponds to that file number?
There seems to be some weird spacing introduced in the question, but if the desired path to ingest is "*/order.json", that won't work: you can only use "*" at the end of the path when ingesting data into BigQuery.

End of line anchor $ throwing error in Hue Pig Editor

When I use the regex below from Hue's Pig editor, I get an error.
REGEX_EXTRACT(cs, '((;|^)u=(.*?)([^0-9a-z-A-Z]|\\$))', 3)
It looks like an old issue with a prior version of Hue, but it is reported as resolved in version 3.6.
https://issues.cloudera.org/browse/HUE-1958
I am running Hue 3.6 and Pig 0.12 but I am still getting the same error. Can someone help? Thanks.
Error:
2014-11-20 08:18:29,282 [main] ERROR org.apache.pig.impl.PigContext - Encountered " <OTHER> ")=$ "" at line 1, column 1.
Was expecting one of:
<EOF>
<IDENTIFIER> ...
<COMMENT> ...
2014-11-20 08:18:29,289 [main] ERROR org.apache.pig.Main - ERROR 2997: Encountered IOException. org.apache.pig.tools.parameters.ParseException: Encountered " <OTHER> ")=$ "" at line 1, column 1.
Was expecting one of:
<EOF>
<IDENTIFIER> ...
<COMMENT> ...

Syntax error when storing Pig output

I am having some issues with storing my Pig output to a file. This is what I am using to store:
STORE rel INTO 'simple';
If I DUMP rel I get:
(car,0.5,(door,tire,jello,truck,random))
(toy,0.5,(jeep,bunny toy))
(door,0.5,(car,jello,random))
(jeep,0.5,(toy,bunny toy))
What I get in the file is:
Yulias-MacBook-Pro:~ yuliatolskaya$ /Users/yuliatolskaya/Documents/misc/pig_clustering/simple/part-r-00000 ; exit;
/Users/yuliatolskaya/Documents/misc/pig_clustering/simple/part-r-00000: line 1: syntax error near unexpected token `('
/Users/yuliatolskaya/Documents/misc/pig_clustering/simple/part-r-00000: line 1: `car 0.5 (door,tire,jello,truck,random)'
logout
[Process completed]
I am really not sure what the problem is, as there are no errors in the log files. Please help!
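The shell messages above suggest the part file was run as a script rather than read as text, and the line the shell quotes looks like the stored record itself. Here is a rough Python sketch of reading such a record, assuming PigStorage's default tab delimiter and that the tuple field is written in parentheses with comma-separated items. The function name and the sample string are only for illustration.

```python
def parse_stored_line(line):
    """Split one tab-delimited record into (key, score, members)."""
    key, score, members = line.rstrip("\n").split("\t")
    # The tuple field is written like "(a,b,c)"; strip the parentheses.
    items = members.strip("()").split(",")
    return key, float(score), items


if __name__ == "__main__":
    sample = "car\t0.5\t(door,tire,jello,truck,random)\n"
    print(parse_stored_line(sample))
```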