Order by date time in pig script - apache-pig

After pig 0.11.0, datetime basic variable type is introduced for processing. In my case i have to order by date time. I used this way
data = LOAD 'database_name.table_name' USING org.apache.hcatalog.pig.HCatLoader() AS (id:chararray,name:chararray,birth_date_time:chararray);
selected_data = FOREACH data GENERATE id, name,ToDate(birth_date_time,'yyyy-MM-dd HH:mm:ss') AS birth_date_time;
ordered_data = ORDER selected_data BY birth_date_time DESC;
DUMP ordered_data;
But i t doesn't work. throws this error
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable
to open iterator for alias cba_ordered. Backend error : Unable to
recreate exception from backed error:
AttemptID:attempt_1452577118821_0005_m_000000_3 Info:Error:
org.joda.time.DateTime.compareTo(Lorg/joda/time/ReadableInstant;)I
at org.apache.pig.PigServer.openIterator(PigServer.java:872)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:607)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997:
Unable to recreate exception from backed error:
AttemptID:attempt_1452577118821_0005_m_000000_3 Info:Error:
org.joda.time.DateTime.compareTo(Lorg/joda/time/ReadableInstant;)I
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:217)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:151)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:429)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1324)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1309)
at org.apache.pig.PigServer.storeEx(PigServer.java:980)
at org.apache.pig.PigServer.store(PigServer.java:944)
at org.apache.pig.PigServer.openIterator(PigServer.java:857)
... 12 more
How can we order by date_time field?

I will suggest you to use ToUnixTime(datetime) function to order date column.
Please check below and do like this.
data = LOAD 'database_name.table_name' USING org.apache.hcatalog.pig.HCatLoader() AS (id:chararray,name:chararray,birth_date_time:chararray);
selected_data = FOREACH data {
bdt = (datetime)ToDate(birth_date_time,'yyyy-MM-dd HH:mm:ss');
GENERATE id, name, bdt AS birth_date_time, ToUnixTime(bdt) AS birth_date_time_unix;
};
ordered_data = ORDER selected_data BY birth_date_time_unix DESC;
DUMP ordered_data;
Please let me know if it works.

Related

How to use WHERE statement on JSON stored in Presto SQL column to filter?

In Presto, I have data for a column in a table is as follows:
header
header 2
{Data: [{'item1': 'stuff1', 'item2': 'stuff2', 'item3': 'stuff3'}, {...}]}
cell 2
{Data: [{'item1': 'stuff11', 'item2': 'stuff21', 'item3': 'stuff31'}, {...}]}
cell 4
I was able to SELECT using JSON syntax using:
SELECT header.Data[1].item1 FROM table
and returns:
header
stuff1
stuff11
However, if I want to filter the table using the WHERE statement:
SELECT * FROM table WHERE header.Data[1].item1 = 'stuff1'
The above statement threw an error and didn't work.
I would like to return something like
header
header 2
{Data: [{'item1': 'stuff1', 'item2': 'stuff2', 'item3': 'stuff3'}, {...}]}
cell 2
Any input would be helpful. Thanks
I've tried several other queries using SQL as well such as but all returned similar error:
WHERE header.Data[1].item1 = 'stuff1'
An example of the error:
Query:
`SELECT header.Data[1].item1 AS f FROM table WHERE f LIKE '%stuff%'
'''
An error occurred while calling o12.execute. : java.sql.SQLException: Query failed (#20220330_200148_01673_9bq5k): line 2:7: Column 'f' cannot be resolved at io.prestosql.jdbc.AbstractPrestoResultSet.resultsException(AbstractPrestoResultSet.java:1761) at io.prestosql.jdbc.PrestoResultSet.getColumns(PrestoResultSet.java:252) at io.prestosql.jdbc.PrestoResultSet.create(PrestoResultSet.java:54) at io.prestosql.jdbc.PrestoStatement.internalExecute(PrestoStatement.java:249) at io.prestosql.jdbc.PrestoStatement.execute(PrestoStatement.java:227) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381) at py4j.Gateway.invoke(Gateway.java:259) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:209) at java.lang.Thread.run(Thread.java:750) Caused by: io.prestosql.spi.PrestoException: line 2:7: Column 'f' cannot be resolved at io.prestosql.sql.analyzer.SemanticExceptions.semanticException(SemanticExceptions.java:48) at io.prestosql.sql.analyzer.SemanticExceptions.semanticException(SemanticExceptions.java:43) at io.prestosql.sql.analyzer.SemanticExceptions.missingAttributeException(SemanticExceptions.java:33) at io.prestosql.sql.analyzer.Scope.lambda$resolveField$7(Scope.java:228) at java.base/java.util.Optional.orElseThrow(Optional.java:408) at io.prestosql.sql.analyzer.Scope.resolveField(Scope.java:228) at io.prestosql.sql.analyzer.ExpressionAnalyzer$Visitor.visitIdentifier(ExpressionAnalyzer.java:438) at io.prestosql.sql.analyzer.ExpressionAnalyzer$Visitor.visitIdentifier(ExpressionAnalyzer.java:342) at io.prestosql.sql.tree.Identifier.accept(Identifier.java:72) at io.prestosql.sql.tree.StackableAstVisitor.process(StackableAstVisitor.java:27) at io.prestosql.sql.analyzer.ExpressionAnalyzer$Visitor.process(ExpressionAnalyzer.java:365) at io.prestosql.sql.analyzer.ExpressionAnalyzer$Visitor.visitLikePredicate(ExpressionAnalyzer.java:702) at io.prestosql.sql.analyzer.ExpressionAnalyzer$Visitor.visitLikePredicate(ExpressionAnalyzer.java:342) at io.prestosql.sql.tree.LikePredicate.accept(LikePredicate.java:76) at io.prestosql.sql.tree.StackableAstVisitor.process(StackableAstVisitor.java:27) at io.prestosql.sql.analyzer.ExpressionAnalyzer$Visitor.process(ExpressionAnalyzer.java:365) at io.prestosql.sql.analyzer.ExpressionAnalyzer.analyze(ExpressionAnalyzer.java:303) at io.prestosql.sql.analyzer.ExpressionAnalyzer.analyzeExpression(ExpressionAnalyzer.java:1691) at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.analyzeExpression(StatementAnalyzer.java:2606) at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.analyzeWhere(StatementAnalyzer.java:2465) at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.lambda$visitQuerySpecification$23(StatementAnalyzer.java:1528) at java.base/java.util.Optional.ifPresent(Optional.java:183) at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:1528) at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:322) at io.prestosql.sql.tree.QuerySpecification.accept(QuerySpecification.java:144) at io.prestosql.sql.tree.AstVisitor.process(AstVisitor.java:27) at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:339) at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:349) at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:1039) at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:322) at io.prestosql.sql.tree.Query.accept(Query.java:107) at io.prestosql.sql.tree.AstVisitor.process(AstVisitor.java:27) at io.prestosql.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:339) at io.prestosql.sql.analyzer.StatementAnalyzer.analyze(StatementAnalyzer.java:308) at io.prestosql.sql.analyzer.Analyzer.analyze(Analyzer.java:83) at io.prestosql.sql.analyzer.Analyzer.analyze(Analyzer.java:75) at io.prestosql.execution.SqlQueryExecution.analyze(SqlQueryExecution.java:256) at io.prestosql.execution.SqlQueryExecution.(SqlQueryExecution.java:182) at io.prestosql.execution.SqlQueryExecution$SqlQueryExecutionFactory.createQueryExecution(SqlQueryExecution.java:757) at io.prestosql.dispatcher.LocalDispatchQueryFactory.lambda$createDispatchQuery$0(LocalDispatchQueryFactory.java:123) at io.prestosql.$gen.Presto_343____20220330_135137_2.call(Unknown Source) at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69) at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829)
'''
Alias f introduced by SELECT header.Data[1].item1 AS f is not available in WHERE so you need to use the whole expression:
where header.Data[1].item1 LIKE '%stuff%'

TypeCast Exception while running Pig Script

Below is my pig script. Its very straightforward. Loading some data. Filtering the data by a column. Generating the schema with datatypes. Storing the data in a hive table.
When I am executing the data, its throwing
emp = load '/root/emp.nulls' using PigStorage(',');
filt = filter emp by $2 is not null;
f = foreach filt generate $0 as id:int, $1 as bdate:chararray, $2 as fname:chararray, $3 as lname:chararray, $4 as gender:chararray, $5 as hdate:chararray;
store f into 'emp_null' using org.apache.hive.hcatalog.pig.HCatStorer();
When I am executing the data, its throwing the below error
2017-09-15 11:21:04,523 [Thread-12] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local1554819907_0001
java.lang.Exception: java.io.IOException: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Integer
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.io.IOException: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Integer
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.StoreFuncDecorator.putNext(StoreFuncDecorator.java:83)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:144)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:658)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:275)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:65)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
Can someone help me?
EDIT:
If I generate the schema during loading itself, it works fine.
When you use the following syntax $0 as id:int you are not casting the field but using a new field to store the value in $0.The correct way to do this is to prefix the datatype in front of the field.This might have been fixed in the newer versions of Pig.Here is the issue being discussed to fix it.
f = foreach filt generate (int)$0 as id,
(chararray)$1 as bdate,
(chararray)$2 as fname,
(chararray)$3 as lname,
(chararray)$4 as gender,
(chararray)$5 as hdate;

Reference subquery fields in QueryDSL

I have to build a query in QueryDSL with a subquery, like this:
Expression<String> caseExpression = new CaseBuilder()
...;
Expression<?>[] queryProjection={
table.parameter1,
caseExpression
};
Expression<?>[] subqueryProjection={
table.parameter1.as("alias1"),
table.parameter2.as("alias2"),
table.parameter3.as("alias3"),
table.parameter4.as("alias4")
};
SQLSubQuery subQuery = new SQLSubQuery()
.from(table)
.where(...);
JPASQLQuery query = new JPASQLQuery(entityManager, ORACLE_TEMPLATE)
.from(subQuery.list(subqueryProjection));
query.list(queryProjection);
I am getting the following exception:
9/10/2014 09:45:55 AM org.apache.catalina.core.StandardWrapperValve invoke
GRAVE: Servlet.service() para servlet rtve-rest lanzó excepción
java.sql.SQLException: ORA-00904: "TABLE"."PARAMETER1": invalid identifier
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:112)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:331)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:288)
at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:745)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:219)
at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:813)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1049)
at oracle.jdbc.driver.T4CPreparedStatement.executeMaybeDescribe(T4CPreparedStatement.java:854)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1154)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3370)
at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3415)
at org.apache.tomcat.dbcp.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:93)
at sun.reflect.GeneratedMethodAccessor54.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.hibernate.engine.jdbc.internal.proxy.AbstractStatementProxyHandler.continueInvocation(AbstractStatementProxyHandler.java:122)
at org.hibernate.engine.jdbc.internal.proxy.AbstractProxyHandler.invoke(AbstractProxyHandler.java:81)
at $Proxy78.executeQuery(Unknown Source)
at org.hibernate.loader.Loader.getResultSet(Loader.java:1953)
at org.hibernate.loader.Loader.doQuery(Loader.java:829)
at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:289)
at org.hibernate.loader.Loader.doList(Loader.java:2438)
at org.hibernate.loader.Loader.doList(Loader.java:2424)
at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2254)
at org.hibernate.loader.Loader.list(Loader.java:2249)
at org.hibernate.loader.custom.CustomLoader.list(CustomLoader.java:331)
at org.hibernate.internal.SessionImpl.listCustomQuery(SessionImpl.java:1784)
at org.hibernate.internal.AbstractSessionImpl.list(AbstractSessionImpl.java:229)
at org.hibernate.internal.SQLQueryImpl.list(SQLQueryImpl.java:156)
at org.hibernate.ejb.QueryImpl.getResultList(QueryImpl.java:257)
at com.mysema.query.jpa.sql.AbstractJPASQLQuery.list(AbstractJPASQLQuery.java:145)
This is caused because the field in my queryProjection is not the same as in my subqueryProjection. This field has to be the same as in the subquery ("alias1").
How could I reference a field by its alias? Or how could I reference a field in a subquery from outside it?
Thank you very much in advance
You need to use a Path instance with alias1 as the name or get rid of the aliasing. The simplest way to create that Path is
Expressions.path(Object.class, "alias1")
Replace Object with a more specific type if needed.
I managed to do it with Expressions.stringTemplate:
Expression<?>[] queryProjection={
Expressions.stringTemplate("alias1"),
caseExpression
};

Hive throws ArrayIndexOutOfBoundsException when select count(1) on ORC table

I have a simple table with 9 fields, using ORCFile format (I followed the steps mentioned here). When I try to count the number of rows in that table (350 million rows, btw) by submitting:
select count(1) from my_orc_table;
I get an 'ArrayIndexOutOfBoundsException'. Let me copy the stack, just in case it provides more information:
Error: java.io.IOException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:304)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:197)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:183)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Thanks!!

HQL Select using Date as criteria error

In HQL, I am trying to use date as a criteria get some data but i am getting an error:
Code:
Date DatePlaced=new Date();
auction.setDatePlaced(DatePlaced);
auction.setUser((Users) getThreadLocalRequest().getSession(true).getAttribute("User"));
UpdateDatabase(auction);
Session session = gileadHibernateUtil.getSessionFactory().openSession();
Long UserId=(Long) getThreadLocalRequest().getSession(true).getAttribute("UserId");
String HQL="From Auction auction where User.UserId="+UserId +" and DatePlaced='"+DatePlaced+"'";
Auction A1=(Auction) session.createQuery(HQL).list().get(0);
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at com.BiddingSystem.server.ServiceImpl.UpdateAuction(ServiceImpl.java:543)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at net.sf.gilead.gwt.PersistentRemoteService.processCall(PersistentRemoteService.java:174)
What is happening here is mainly due to the date but because no data has been found because the date does not match but the data being used to search is the same date i have set above so 1 row should have been returned
try using named parameters instead of trying to build a query from a string
Auction A1=(Auction) session
.createQuery("From Auction auction where User.UserId=:userId and DatePlaced=:placed")
.setLong("userId",userId)
.setDate("placed",placed)
.list().get(0);