Syntax error when I try to load a .log file - apache-pig

I have lines like the following:
(2013-03-21 14:36:21,686) WARN TP-Processor16 com.lolo.exc - fail load)
(2013-03-21 14:36:21,686) WARN TP-Processor11 com.lolo.exc - fail load)
It's a log4j log, and I have to process it with Pig.
I'm not sure which loader to use: TextLoader or PigStorage? Let me know your thoughts.
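For what it's worth, either loader can be made to work: TextLoader hands you each line as a single chararray that you can then split with REGEX_EXTRACT_ALL, whereas PigStorage(' ') would split on every space and scatter the timestamp across fields. As a sketch (the pattern below is an assumption based only on the two sample lines), the regex can be sanity-checked in Python before being embedded in a Pig script:

```python
import re

# Assumed shape of one log4j line, based on the two samples above:
# (timestamp) LEVEL thread logger - message
LOG_PATTERN = re.compile(
    r"\((\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\)\s+"  # (2013-03-21 14:36:21,686)
    r"(\w+)\s+"                                            # WARN
    r"(\S+)\s+"                                            # TP-Processor16
    r"(\S+)\s+-\s+"                                        # com.lolo.exc -
    r"(.*)"                                                # fail load)
)

line = "(2013-03-21 14:36:21,686) WARN TP-Processor16 com.lolo.exc - fail load)"
m = LOG_PATTERN.match(line)
print(m.groups())
# → ('2013-03-21 14:36:21,686', 'WARN', 'TP-Processor16', 'com.lolo.exc', 'fail load)')
```

In Pig, the equivalent would be to LOAD the file USING TextLoader() AS (line:chararray) and pass the same pattern (with whatever quoting Pig requires) to REGEX_EXTRACT_ALL in a FOREACH ... GENERATE.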

Related

How to specify the searchPath when using liquibase commandline liquibase.integration.commandline.Main

(Using liquibase 4.18.0 and also tried 4.19.0)
I want to add two additional parameters to my (working) Liquibase call:
--hub-mode=off
--searchPath="some/resources"
Working:
java liquibase.integration.commandline.Main --logLevel=info --defaultsFile=project.properties update
Not working:
java liquibase.integration.commandline.Main --logLevel=info --searchPath="some/resources" --defaultsFile=project.properties update
I always get:
Unknown option 'searchPath'
If I remove this option, I get the same error for hub-mode. If I remove both, the resource cannot be found and Liquibase tells me:
"More locations can be added with the 'searchPath' parameter."
I checked the declaredFields variable; the following options are defined there, and the two I need are missing:
runningFromNewCli
newCliChangelogParameters
outputStream
LOG
coreBundle
classLoader
driver
username
password
url
hubConnectionId
hubProjectId
hubProjectName
databaseClass
defaultSchemaName
outputDefaultSchema
outputDefaultCatalog
liquibaseCatalogName
liquibaseSchemaName
databaseChangeLogTableName
databaseChangeLogLockTableName
databaseChangeLogTablespaceName
defaultCatalogName
changeLogFile
overwriteOutputFile
classpath
contexts
labels
labelFilter
driverPropertiesFile
propertyProviderClass
changeExecListenerClass
changeExecListenerPropertiesFile
promptForNonLocalDatabase
includeSystemClasspath
defaultsFile
diffTypes
changeSetAuthor
changeSetContext
dataOutputDirectory
referenceDriver
referenceUrl
referenceUsername
referencePassword
referenceDefaultCatalogName
referenceDefaultSchemaName
currentDateTimeFunction
command
commandParams
logLevel
logFile
changeLogParameters
outputFile
excludeObjects
includeCatalog
includeObjects
includeSchema
includeTablespace
deactivate
outputSchemasAs
referenceSchemas
schemas
snapshotFormat
liquibaseProLicenseKey
liquibaseProLicenseValid
liquibaseHubApiKey
liquibaseHubUrl
managingLogConfig
outputsLogMessages
sqlFile
delimiter
rollbackScript
rollbackOnError
suspiciousCodePoints
Any idea how to specify the searchPath for the command-line executable?
I did read this post but the solution did not help.

DBT: How to fix Database Error Expecting Value?

I ran into trouble today while running Airflow and airflow-dbt-python. I tried to debug using the logs, and the error shown was this one:
[2022-12-27, 13:53:53 CET] {functions.py:226} ERROR - [0m12:53:53.642186 [error] [MainThread]: Encountered an error:
Database Error
Expecting value: line 2 column 5 (char 5)
Quite a weird one.
Check the credentials file that allows dbt to run queries on your database (in our case we run dbt against BigQuery); our credentials file turned out to be empty. We even tried running dbt directly in the worker instead of through Airflow, with exactly the same error as a result. Unfortunately, this error is not very explicit.
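As a side note, "Expecting value: line 2 column 5 (char 5)" is the wording of Python's json.JSONDecodeError, which suggests dbt tried to parse a JSON file (here, the BigQuery keyfile) and found it empty or truncated. A minimal sketch of that check, assuming the keyfile path is the one referenced from your profiles.yml:

```python
import json

def check_keyfile(path):
    """Return the JSON parse error for a credentials file, or None if it parses.

    An empty or truncated keyfile fails with the same 'Expecting value'
    message that dbt surfaces as a Database Error.
    """
    try:
        with open(path) as f:
            json.load(f)
        return None
    except json.JSONDecodeError as exc:
        return str(exc)
```

Running this against an empty file returns "Expecting value: line 1 column 1 (char 0)"; a file containing only whitespace or a truncated value reports a later position, like the one in the log above.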

PhantomJS Showing exit code 0 even after network error

I am using PhantomJS to print a PDF from my web page and then store the resulting PDF in S3 if it is generated successfully.
My problem is that PhantomJS returns exit code 0 (success) even after a network error occurs, and the resulting PDF is not the one I want.
So I want to know whether there is any way to make PhantomJS abort with a non-zero exit code when an error occurs.
Currently the error for which this happens is NETWORK Error: 101, but even then PhantomJS does not abort and returns exit code 0.
The only time PhantomJS exits on its own is when it encounters a syntax error in your script (PhantomJS 2.0.0 has a bug and simply freezes without printing anything at that point).
For everything else you need to call phantom.exit(), which takes an optional argument: the exit code. So when you detect an error, for example by checking the success argument of the page.open() callback, or when the onResourceError or onResourceTimeout events are triggered, you can exit PhantomJS with your own intended exit code, such as
phantom.exit(1);
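A minimal sketch of that wiring (assuming PhantomJS's webpage and phantom APIs; the exit-code decision is pulled into a plain function, and the URL and output path are placeholders):

```javascript
// Decide the process exit code from what the callbacks reported.
function exitCode(openStatus, resourceErrorCount) {
  // page.open reports 'success' or 'fail'; each onResourceError bumps the count.
  return (openStatus === 'success' && resourceErrorCount === 0) ? 0 : 1;
}

// PhantomJS-only wiring (run with `phantomjs script.js`):
// var page = require('webpage').create();
// var resourceErrors = 0;
// page.onResourceError = function (err) {
//   resourceErrors += 1;
//   console.error('Resource error ' + err.errorCode + ': ' + err.errorString);
// };
// page.open('http://example.com/report', function (status) {
//   if (exitCode(status, resourceErrors) === 0) {
//     page.render('report.pdf'); // only keep the PDF on a clean load
//   }
//   phantom.exit(exitCode(status, resourceErrors));
// });
```

With this, a NETWORK Error: 101 on any resource makes the script exit with code 1 instead of 0, so the caller can skip the S3 upload.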

Apache Pig 0.12.0 on Hue not preprocessing statements as expected

I'm using Hue for Pig scripts on Amazon EMR, with the %declare and %default statements as described in the documentation.
I have some %default and %declare statements, but it looks like they are not preprocessed within Hue. Although the parameters are defined in my script, the editor keeps popping up a parameter configuration window. If I leave the parameter blank, the job fails with an error.
Sample Script
%declare OUTPUT_FOLDER 'testingOutput01';
ts = LOAD 's3://testbucket1/input/testdata-00000.gz' USING PigStorage('\t');
STORE ts INTO 's3://testbucket1/$OUTPUT_FOLDER' USING PigStorage('\t');
Upon execution, a pop-up window asks for a value for OUTPUT_FOLDER. If I leave it blank, the job fails with the following error:
2015-06-23 20:15:54,908 [main] ERROR org.apache.pig.Main - ERROR 2997:
Encountered IOException. org.apache.pig.tools.parameters.ParseException:
Encountered "<EOF>" at line 1, column 12.
Was expecting one of:
<IDENTIFIER> ...
<OTHER> ...
<LITERAL> ...
<SHELLCMD> ...
Is that the expected behavior? Is this a known issue or am I missing something?
Configuration details:
AMI version: 3.7.0
Hadoop distribution: Amazon 2.4.0
Applications: Hive 0.13.1, Pig 0.12.0, Impala 1.2.4, Hue
The same behavior is seen with %default instead of %declare.
If you need any clarifications then please do comment on this question. I will update it as needed.
Hue does not support %declare with a default statement. It will be fixed with: https://issues.cloudera.org/browse/HUE-2508
The current temporary workaround is to put any value in the popup.

Error using CSVLoader from piggybank

I am trying to use CSVLoader from Piggybank. Below are the first two lines of my code:
register 'piggybank.jar' ;
define CSVLoader org.apache.pig.piggybank.storage.CSVLoader();
It throws the following error:
2013-10-24 14:26:51,427 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file
system at: file:///
2013-10-24 14:26:52,029 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve org.apache.pig.piggybank.storage.CSVLoader using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Can someone tell me what's going on? I am executing this script from the same folder where my piggybank.jar is located.
I ran into a similar problem when I was experimenting with Pig, although it was the XMLLoader for me. The solution that worked for me was to register the full path to the jar instead of the relative path. So if the jar is located at /usr/lib/pig/piggybank.jar, run the code as follows:
register '/usr/lib/pig/piggybank.jar' ;
define CSVLoader org.apache.pig.piggybank.storage.CSVLoader();
I checked out the code from http://svn.apache.org/repos/asf/pig/trunk/ and re-built the jar file. It works fine now. :)
The same works fine for me:
register 'piggybank.jar' ;
A = load '/xmlinput/demo.xml' using org.apache.pig.piggybank.storage.XMLLoader('property') as (x:chararray);
B = foreach A generate REPLACE(x,'[\n]','') as x;
C = foreach B generate REGEX_EXTRACT_ALL(x,'.(?:)([^<]).(?:)([^<]).*');
D =FOREACH C GENERATE FLATTEN (($0));
STORE D INTO 'xmlcsvpig' USING org.apache.pig.piggybank.storage.CSVExcelStorage();