Pig problem loading a file with a complicated name - apache-pig

I need to load a file in Pig which has a long and complicated name:
dealnews-2011-04-01T12:00:00:00.211-02:00.csv
Pig complained:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. java.net.URISyntaxException: Relative path in absolute URI:
Does anyone know what the problem is? Thanks.

If Pig is forming a URI from that name, the : is a reserved character.
Think about it: in file://a:b, the a:b part would be taken as login credentials, the way an FTP URL embeds them.
Your error message seems to complain that what is left after the string is parsed is a relative path (I guess the 00.csv after the last colon), which is obviously no longer the whole filename.
You will need to escape any reserved characters in the filename before forming a URI.
You could do this on the command line, for example:
ls | sed -e 's/:/%3A/g'
to transform the colons in the filename.
Or you could rename any files in the directory whose names contain any of the reserved characters ;?:#&=+,$
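As a concrete sketch of the rename approach (assuming bash, that the files sit in the current directory, and that dashes are an acceptable replacement):
# replace every colon in matching filenames with a dash
for f in *:*.csv; do
  [ -e "$f" ] || continue   # skip if the glob matched nothing
  mv -- "$f" "${f//:/-}"
done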

Not exactly the same case, but we got:
ERROR 2999: Unexpected internal error. java.net.URISyntaxException cannot be cast to java.lang.Error
java.lang.ClassCastException: java.net.URISyntaxException cannot be cast to java.lang.Error
for everything we tried to load. The problem was that the PIG_CONF_DIR environment variable pointed to a folder that did not exist. We reset it in .bash_profile to a folder containing valid core-site.xml and mapred-site.xml, and everything's good now.
export PIG_CONF_DIR=/my_good_folder
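A quick sanity check (assuming the same variable) is to verify that the folder and the Hadoop config files actually exist before starting Pig:
ls "$PIG_CONF_DIR"/core-site.xml "$PIG_CONF_DIR"/mapred-site.xml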

Related

Azure Synapse Lookup UserErrorFileNotFound with wildcard path

I am facing an odd issue where my lookup returns a file-not-found error when I use a wildcard path. If I specify an exact file path, the lookup runs without error. However, if I replace the filename with a *, I get a file-not-found error.
The file is Data_643.json, located in my Azure Data Lake Storage Gen2, under the labournavigatorfile system. The exact file path is:
labournavigatorfile/raw_data/Scraped/HeadHunter/Saudi_Arabia/Data_643.json.
If I put this exact path into the Integration dataset configuration, the pipeline runs without issue. However, as soon as I replace the 'Data_643.json' with a '*', the pipeline crashes with a filenotfound error.
What am I doing wrong? Many thanks for any support. This must be something very simple that I am missing.
Exact path works:
Wildcard path throws an error:
I have 3 files in my container as file1.json, file2.json, file3.json as shown below:
The following is how I configured my dataset to read using a wildcard, with the same configuration as in the image provided in the question.
When I used this in the lookup, I got the same error:
To overcome this, go to your lookup activity. When you want to use wildcards to read a file or files, check the Wildcard file path option, then specify the folder structure and use wildcards where required. The following is an image for reference.
The following is the debug output when I run the pipeline (Each of my files had 10 rows):

error: failed to encode '---------_dict.sql' from UTF-8 to Windows-1250

When I download the repository I get this error from git:
error: failed to encode '---------_dict.sql' from UTF-8 to Windows-1250.
Then when I want to commit and push I get the same error for the same files with the .sql extension. Does anyone have an idea? Has someone had a similar problem? Could it be related to the .gitattributes file, which has:
*.sql text working-tree-encoding=Windows-1250
This error message means that some part of the conversion failed, most likely because the contents of the file cannot be converted to windows-1250. It's likely that the file contains UTF-8 sequences corresponding to Unicode characters that have no representation in windows-1250.
You should contact the author of the repository, notify them of this problem, and ask them to fix it. On your local system, you can add a .git/info/attributes file with the following contents to force the files to UTF-8 instead:
*.sql text working-tree-encoding=UTF-8
Note that if you do this, you must ensure that the files you check in are actually UTF-8 and not windows-1250.
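A minimal sketch of that local override, assuming you run it from the repository root (dict.sql stands in for one of the affected files):
mkdir -p .git/info
echo '*.sql text working-tree-encoding=UTF-8' >> .git/info/attributes
git check-attr working-tree-encoding -- dict.sql
# expected output: dict.sql: working-tree-encoding: UTF-8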

Unexpected error running Liquibase: Unknown parameter: '#Liquibase.properties

I am setting up a new user for liquibase (3.5.3). When we run the following command:
liquibase --defaultsFile=Config/Liquibase.properties --logLevel=Info --contexts=initial update
We get this error message:
Unexpected error running Liquibase: Unknown parameter: '#Liquibase.properties'
SEVERE 2/7/17 11:39 AM: liquibase: Unknown parameter: '#Liquibase.properties'
liquibase.exception.CommandLineParsingException: Unknown parameter: '#Liquibase.properties'
        at liquibase.integration.commandline.Main.parsePropertiesFile(Main.java:476)
        at liquibase.integration.commandline.Main.run(Main.java:164)
        at liquibase.integration.commandline.Main.main(Main.java:103)
For more information, use the --logLevel flag
I thought there may have been a funny character in the file, so we recreated it, but still received the same error. We also took a working copy of a properties file from another project and modified it. This also produced the same result.
Any ideas on what is going wrong or thoughts on how to fix it, would be greatly appreciated.
The invisible character in front of '#Liquibase.properties' in the error message is a UTF-8 byte order mark (BOM for short). Some text editors write one by default when saving in UTF-8 encoding, even though most programs do not understand it.
In your case, Liquibase seems to be one of the programs that do not understand the BOM, so it treats it as the beginning of a parameter. To fix this, make sure you save the file as UTF-8 without BOM if your editor supports this option, or alternatively as ASCII or ISO 8859 (ANSI) if you only use characters defined in ASCII.
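To confirm the file really starts with a BOM and to strip it in place (a sketch assuming GNU sed; the BOM is the three bytes EF BB BF):
head -c 3 Liquibase.properties | xxd    # a BOM shows up as "efbb bf"
sed -i '1s/^\xEF\xBB\xBF//' Liquibase.properties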

LESS Compiler: Unexpected token u

When I attempt to compile a LESS template in Visual Studio using Web Essentials, I receive an error that says "Unexpected token u" with no file name, no line number, and no column number. Why is this happening?
Go to %USERPROFILE%\AppData\Local\Microsoft\VisualStudio\12.0\Extensions, which is the folder where per-user Visual Studio extensions reside. Web Essentials will be located in a subfolder with a randomly generated name.
From inside the WebEssentials folder, open up the file Resources\nodejs\tools\server\services\srv-less.js and go to line 65, which reads:
map = JSON.parse(output.map);
The problem is that the source-map output may be undefined. JSON.parse can only parse strings, so it coerces the value to the string "undefined" before parsing, but that is not a valid JSON token; JSON understands the null value, not the undefined value. Hence the complaint about the unexpected token u, the first character of "undefined".
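You can reproduce the error in isolation (assuming Node.js is on your PATH; the exact message wording varies between Node versions):
node -e 'try { JSON.parse(undefined); } catch (e) { console.log(e.message); }'
# prints something like: Unexpected token u in JSON at position 0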
So... change line 65 to read:
map = JSON.parse(output.map || "null");
And voilà; LESS compilation on files with empty output works again.
Source:
https://github.com/madskristensen/WebEssentials2013/issues/1696
In my experience, this error occurs when LESS attempts to output a CSS file from a LESS file and the resulting CSS file is empty. In my case, this happened after removing some font-face declarations, which left the resulting CSS file empty. LESS would not compile until I added a class that produced output in the CSS file.
Details may be found here: https://github.com/madskristensen/WebEssentials2013/issues/1696
I'm adding this to Stack Overflow because I'm unable to access GitHub at my workplace. I hope this helps someone.
You can also add an important comment (/**/) or @charset "utf-8"; to your LESS file, as described here: https://github.com/madskristensen/WebEssentials2013/issues/1696

How to force STORE (overwrite) to HDFS in Pig?

When developing Pig scripts that use the STORE command, I have to delete the output directory before every run, or the script stops and offers:
2012-06-19 19:22:49,680 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 6000: Output Location Validation Failed for: 'hdfs://[server]/user/[user]/foo/bar More info to follow:
Output directory hdfs://[server]/user/[user]/foo/bar already exists
So I'm searching for an in-Pig solution to automatically remove the directory, one that also doesn't choke if the directory is non-existent at call time.
In the Pig Latin Reference I found the shell command invoker fs. Unfortunately, a Pig script breaks whenever anything it invokes produces an error, so I can't use
fs -rmr foo/bar
(i.e. remove recursively), since it breaks if the directory doesn't exist. For a moment I thought I might use
fs -test -e foo/bar
which is a test and shouldn't break, or so I thought. However, Pig again interprets test's return code on a non-existing directory as a failure code and breaks.
There is a JIRA ticket for the Pig project addressing my problem and suggesting an optional OVERWRITE or FORCE_WRITE parameter for the STORE command. However, I'm using Pig 0.8.1 out of necessity, and there is no such parameter there.
At last I found a solution on grokbase. Since finding the solution took too long, I will reproduce it here and add to it.
Suppose you want to store your output using the statement
STORE Relation INTO 'foo/bar';
Then, in order to delete the directory, you can call this at the start of the script:
rmf foo/bar
No ";" or quotations required since it is a shell command.
I cannot reproduce it now but at some point in time I got an error message (something about missing files) where I can only assume that rmf interfered with map/reduce. So I recommend putting the call before any relation declaration. After SETs, REGISTERs and defaults should be fine.
Example:
SET mapred.fairscheduler.pool 'inhouse';
REGISTER /usr/lib/pig/contrib/piggybank/java/piggybank.jar;
%default name 'foobar'
rmf foo/bar
Rel = LOAD 'something.tsv';
STORE Rel INTO 'foo/bar';
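A variant of the same idea (a sketch; out_dir is a parameter name I made up): since parameter substitution in Pig is plain textual preprocessing, you can route both the rmf call and the STORE target through one %default so the two paths cannot drift apart:
%default out_dir 'foo/bar'
-- the preprocessor turns the next line into: rmf foo/bar
rmf $out_dir
Rel = LOAD 'something.tsv';
STORE Rel INTO '$out_dir';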
Once you use the fs command, there are a lot of ways to do this. For an individual file, I wound up adding this to the beginning of my scripts:
-- Delete a file (won't work for output, which will be a directory,
-- but will work for a file that gets copied or moved during
-- the script.)
fs -touchz top_100
rm top_100
For a directory
-- Delete dir
fs -rm -r out
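On newer Hadoop versions (a hedged note; the flag does not exist in the old 0.20-era shell), -rm also accepts -f, which keeps a missing path from being reported as an error in the first place:
-- Delete dir, tolerating absence (newer Hadoop only)
fs -rm -r -f out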