Apache Pig 0.12.0 on Hue not preprocessing statements as expected

I'm using Hue for Pig scripts on Amazon EMR, and I am using the %declare and %default statements as described in the documentation.
I have some %default and %declare statements and it looks like they are
not preprocessed within Hue. Therefore, although the parameters are defined
in my script, the editor keeps popping up a parameter configuration window. If I leave the parameter blank, the job fails with an error.
Sample Script
%declare OUTPUT_FOLDER 'testingOutput01';
ts = LOAD 's3://testbucket1/input/testdata-00000.gz' USING PigStorage('\t');
STORE ts INTO 's3://testbucket1/$OUTPUT_FOLDER' USING PigStorage('\t');
Upon execution, a pop-up window appears asking for a value for OUTPUT_FOLDER. If I leave it blank, the job fails with the following error:
2015-06-23 20:15:54,908 [main] ERROR org.apache.pig.Main - ERROR 2997:
Encountered IOException. org.apache.pig.tools.parameters.ParseException:
Encountered "<EOF>" at line 1, column 12.
Was expecting one of:
<IDENTIFIER> ...
<OTHER> ...
<LITERAL> ...
<SHELLCMD> ...
Is that the expected behavior? Is this a known issue or am I missing something?
Configuration details:
AMI version: 3.7.0
Hadoop distribution: Amazon 2.4.0
Applications: Hive 0.13.1, Pig 0.12.0, Impala 1.2.4, Hue
The same behavior is seen with %default instead of %declare.
If you need any clarification, please comment on this question and I will update it as needed.

Hue does not support the %declare and %default statements yet. This will be fixed with: https://issues.cloudera.org/browse/HUE-2508
The current temporary workaround is to put any value in the popup.
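For reference, a minimal sketch of the same script run through the pig command line instead of the Hue editor (the pig CLI does perform the parameter preprocessing); the file name and parameter value here are just examples:
-- minimal sketch, assuming the script is saved as script.pig and run with the pig CLI,
-- which substitutes %default/%declare parameters; the value can also be overridden with:
--   pig -param OUTPUT_FOLDER=testingOutput01 script.pig
%default OUTPUT_FOLDER 'testingOutput01';
ts = LOAD 's3://testbucket1/input/testdata-00000.gz' USING PigStorage('\t');
STORE ts INTO 's3://testbucket1/$OUTPUT_FOLDER' USING PigStorage('\t');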

Related

DBT: How to fix Database Error Expecting Value?

I was running into trouble today while running Airflow and airflow-dbt-python. I tried to debug a bit using the logs, and the error shown in the logs was this one:
[2022-12-27, 13:53:53 CET] {functions.py:226} ERROR - 12:53:53.642186 [error] [MainThread]: Encountered an error:
Database Error
Expecting value: line 2 column 5 (char 5)
Quite a weird one.
Possibly check the credentials file that allows dbt to run queries on your DB (in our case we run dbt against BigQuery); in our case the credentials file was empty. We even tried running dbt directly in the worker instead of going through Airflow, and got exactly the same error. Unfortunately this error is not very explicit.
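One way to confirm this (a hedged sketch; the file paths below are assumptions, not from the original post) is to check that the keyfile referenced in profiles.yml is valid, non-empty JSON and then re-test the connection:
# hypothetical paths -- adjust to whatever your profiles.yml points at
cat ~/.dbt/profiles.yml                        # find the BigQuery keyfile path for your target
python -m json.tool < /path/to/keyfile.json    # fails loudly if the file is empty or not valid JSON
dbt debug                                      # re-validates the profile and the connection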

Why does dbt run in the CLI but throw an error in the cloud UI for the exact same model?

I am executing dbt run -s model_name on the CLI and the task completes successfully. However, when I run the exact same command in dbt Cloud, I get this error:
Syntax or semantic analysis error thrown in server while executing query.
Error message from server: org.apache.hive.service.cli.HiveSQLException:
Error running query: org.apache.spark.sql.AnalysisException: cannot
resolve '`pv.meta.uuid`' given input columns: []; line 6 pos 4;
\n'Project ['pv.meta.uuid AS page_view_uuid#249595,
'pv.user.visitorCookieId AS (80) (SQLExecDirectW)")
It looks like it fails to recognize the 'pv.meta.uuid' syntax, which extracts data from a JSON format. It is not clear to me what is going on. Any thoughts? Thank you!

Unable to extract data with double pipe delimiter in Pig Script

I am trying to extract data which is pipe delimited in Pig. The following is my command:
L = LOAD 'entirepath_in_HDFS/b.txt/part-m*' USING PigStorage('||');
I am getting the following error:
2016-08-04 23:58:21,122 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
<line 1, column 4> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'PigStorage' with arguments '[||]'
My input sample file has exactly 5 lines, as follows:
POS_TIBCO||HDFS||POS_LOG||1||7806||2016-07-18||1||993||0
POS_TIBCO||HDFS||POS_LOG||2||7806||2016-07-18||1||0||0
POS_TIBCO||HDFS||POS_LOG||3||7806||2016-07-18||1||0||5
POS_TIBCO||HDFS||POS_LOG||4||7806||2016-07-18||1||0||0
POS_TIBCO||HDFS||POS_LOG||5||7806||2016-07-18||1||0||19.99
I tried several options, like putting a backslash before the delimiter (\||, \|\|), but everything failed. I also tried with a schema but got the same error. I am using Hortonworks (HDP 2.2.4) and Pig (0.14.0).
Any help is appreciated. Please let me know if you need any further details.
I have faced this case, and after checking the PigStorage source code, I think the PigStorage argument is parsed as a single character only.
So we can use this code instead:
L0 = LOAD 'entirepath_in_HDFS/b.txt/part-m*' USING PigStorage('|');
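-- with a single '|' delimiter, every '||' yields an empty field in between, so the nine real columns land in $0, $2, ..., $16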
L = FOREACH L0 GENERATE $0,$2,$4,$6,$8,$10,$12,$14,$16;
It's helpful if you know how many columns you have, and it will not affect performance because it's done map-side.
When you load data using PigStorage, it only expects a single character as the delimiter.
However, if you still want to achieve this, you can use MyRegExLoader:
REGISTER '/path/to/piggybank.jar';
A = LOAD '/path/to/dataset' USING org.apache.pig.piggybank.storage.MyRegExLoader('||')
as (movieid:int, title:chararray, genre:chararray);

Hive error with the command show tables

I am using Apache Hadoop and Hive as a setup. Hive does connect to Hadoop, and tables are also created. But with the command show tables this exception occurs:
Failed with the exception java.io.IOException:org.apache.hadoop.mapred.InvalidInputException:Input Pattern file:/tmp/${hduser}/034cbea3-2b60-49f5-8284-d6fba957dda3/hive_2015-06-18_05-10-04_183_5811447541305606525-1/-local-10000 matches 0 files
What does this exception mean and how should I solve it? Please help me.
Please check the file $HIVE_HOME/conf/hive-site.xml (for example with vim $HIVE_HOME/conf/hive-site.xml) and look at the property whose <name> is system:user.name: its value should be hduser, not ${hduser}.
In other words, use the correct user name there.
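For illustration, the relevant hive-site.xml entry would look roughly like this (a sketch based on the answer above, not the poster's actual file):
<!-- sketch: the value must be the literal user name, not an unresolved ${hduser} placeholder -->
<property>
  <name>system:user.name</name>
  <value>hduser</value>
</property>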

Not able to generate liquibase changelog

I am trying to execute this command at my command prompt:
liquibase --driver=com.mysql.jdbc.Driver --classpath=E:\mysqljar\mysql.jar --changeLogFile=E:\1.xml --url="jdbc:mysql://localhost:3306/abc" --username=root --password=root generateChangeLog
But I am getting this error:
Liquibase Update Failed: Empty result set, expected one row
SEVERE 24/9/13 6:29 PM:liquibase: Empty result set, expected one row
liquibase.exception.DatabaseException: Error getting jdbc:mysql://localhost:3306/abc view with liquibase.statement.core.GetViewDefinitionStatement#53330681
at liquibase.snapshot.jvm.JdbcDatabaseSnapshotGenerator.readView(JdbcDatabaseSnapshotGenerator.java:168)
at liquibase.snapshot.jvm.JdbcDatabaseSnapshotGenerator.readViews(JdbcDatabaseSnapshotGenerator.java:304)
at liquibase.snapshot.jvm.JdbcDatabaseSnapshotGenerator.createSnapshot(JdbcDatabaseSnapshotGenerator.java:241)
at liquibase.snapshot.DatabaseSnapshotGeneratorFactory.createSnapshot(DatabaseSnapshotGeneratorFactory.java:69)
at liquibase.diff.Diff.compare(Diff.java:63)
at liquibase.integration.commandline.CommandLineUtils.doGenerateChangeLog(CommandLineUtils.java:145)
at liquibase.integration.commandline.Main.doMigration(Main.java:760)
at liquibase.integration.commandline.Main.main(Main.java:134)
Caused by: liquibase.exception.DatabaseException: Empty result set, expected one row
at liquibase.util.JdbcUtils.requiredSingleResult(JdbcUtils.java:124)
at liquibase.executor.jvm.JdbcExecutor.queryForObject(JdbcExecutor.java:159)
at liquibase.executor.jvm.JdbcExecutor.queryForObject(JdbcExecutor.java:167)
at liquibase.executor.jvm.JdbcExecutor.queryForObject(JdbcExecutor.java:163)
at liquibase.database.AbstractDatabase.getViewDefinition(AbstractDatabase.java:748)
at liquibase.snapshot.jvm.JdbcDatabaseSnapshotGenerator.readView(JdbcDatabaseSnapshotGenerator.java:166)
... 7 more
Could anyone help me to interpret this?
I had the same problem.
My fix:
There needs to be an entry in the database change log lock table; it needs to have id=1, locked=false and the rest of the values set to null.
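If that row is missing entirely, here is a sketch of how to create it by hand (assuming Liquibase's default lock table name DATABASECHANGELOGLOCK; adjust to your schema):
-- assumes the default lock table DATABASECHANGELOGLOCK; locked=false is 0 in MySQL
INSERT INTO DATABASECHANGELOGLOCK (ID, LOCKED, LOCKGRANTED, LOCKEDBY)
VALUES (1, 0, NULL, NULL);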
I'm getting the same error, also using MySQL, and tried v3.0.0 and v3.0.5 of Liquibase. The error was the same whether I ran migrate or generateChangeLog.
./liquibase --logLevel=debug --changeLogFile=./db.changelog-test-v0.1.xml --username=abc --password=abc99 --url="jdbc:mysql://localhost:3306/test" migrate
FYI, here is the select statement that it had trouble executing for the 'migrate' command:
select view_definition from information_schema.views where table_name='patient_info' and table_schema='test'
The information_schema.views table is empty.