Pig Algebraic UDF with arguments: combiner optimizer error

I'm trying to build a Pig UDF that performs some aggregation on a variable of type double. To do so, I built an algebraic UDF called Aggreg. It is called in the following script:
REGISTER 'Test.jar';
DEFINE Aggreg com.pig.test.Aggreg();
records = LOAD '/tmp/Test.csv' USING PigStorage(',') AS (v1:chararray, v2:double);
grouped_rec = GROUP records ALL;
test = FOREACH grouped_rec GENERATE Aggreg(records.v2) AS val;
DUMP test;
This works fine as it is. Then I wanted to pass an argument to this UDF, so I added a public constructor with one String argument.
I just changed the DEFINE statement in the previous script and haven't yet used the argument in the UDF's Java code:
DEFINE Aggreg com.pig.test.Aggreg('Test');
And now I get the following error:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2018: Internal error. Unable to introduce the combiner for optimization.
Any ideas where this could come from?

When using the Algebraic interface, you must implement two constructors in each of the Initial, Intermed, and Final classes: the default constructor and a constructor taking the parameter you use (a sketch of the enclosing class follows the snippet below).
static public class Initial extends EvalFunc<Tuple> {
    public Initial() {}
    public Initial(String str) { Aggreg.string = str; }

    @Override
    public Tuple exec(Tuple input) throws IOException {
        ...
    }
}

static public class Intermed extends EvalFunc<Tuple> {
    public Intermed() {}
    public Intermed(String str) { Aggreg.string = str; }

    @Override
    public Tuple exec(Tuple input) throws IOException {
        ...
    }
}

static public class Final extends EvalFunc<Tuple> {
    public Final() {}
    public Final(String str) { Aggreg.string = str; }

    @Override
    public Tuple exec(Tuple input) throws IOException {
        ...
    }
}

public String getInitial() {
    return Initial.class.getName();
}

public String getIntermed() {
    return Intermed.class.getName();
}

public String getFinal() {
    return Final.class.getName();
}
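For reference, the enclosing UDF class needs the same pair of constructors. A minimal sketch under those assumptions (the static string field mirrors the snippet above; the exec body is only a placeholder, not the original aggregation logic):

import java.io.IOException;

import org.apache.pig.Algebraic;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class Aggreg extends EvalFunc<Double> implements Algebraic {

    // Field written by the String constructors of Aggreg, Initial, Intermed and Final.
    static String string;

    // Both constructors: the default one and the one matching the DEFINE argument.
    public Aggreg() {}

    public Aggreg(String str) {
        Aggreg.string = str;
    }

    @Override
    public Double exec(Tuple input) throws IOException {
        // Placeholder: the real aggregation over the bag of doubles goes here.
        return null;
    }

    // getInitial(), getIntermed(), getFinal() and the Initial/Intermed/Final
    // inner classes shown above complete the Algebraic implementation.
}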

Use of "RecordCondition.ExcludeIfMatchRegex"

Library version: v2.0.0.0
I would like to use ExcludeIfMatchRegex to exclude certain lines in the input file.
I have tested the following code, but the system displays the usual error message Object reference not set to an instance of an object.
If I remove the line containing "ConditionalRecord", the system reads the file and returns the usual validation messages.
using FileHelpers;
using System;
[IgnoreEmptyLines()]
[ConditionalRecord(RecordCondition.ExcludeIfMatchRegex, "[0-9 A-Za-z.,]{1}S[0-9 A-Za-z.,]{10}")]
[FixedLengthRecord(FixedMode.ExactLength)]
public sealed class PurchaseOrder : INotifyRead
{
    [FieldFixedLength(1)]
    [FieldTrim(TrimMode.Both)]
    public string C;

    [FieldFixedLength(1)]
    [FieldTrim(TrimMode.Both)]
    public string A;

    [FieldFixedLength(10)]
    [FieldTrim(TrimMode.Both)]
    public string item;

    public void AfterRead(EngineBase engine, string line)
    {
        // doesn't the "SkipThisRecord" property exist here??
    }
}
This looks like a small bug in the 2.0.0.0 library.
When the FileHelpers engine reads a file but ALL lines are excluded AND the class is decorated with INotifyRead, it throws the Object reference error.
However, you can work around it by using the AfterReadRecord event instead.
using System;
using System.Diagnostics;
using FileHelpers;

[IgnoreEmptyLines()]
[ConditionalRecord(RecordCondition.ExcludeIfMatchRegex, "[0-9 A-Za-z.,]{1}S[0-9 A-Za-z.,]{10}")]
[FixedLengthRecord(FixedMode.ExactLength)]
public sealed class PurchaseOrder
{
    [FieldFixedLength(1)]
    [FieldTrim(TrimMode.Both)]
    public string C;

    [FieldFixedLength(1)]
    [FieldTrim(TrimMode.Both)]
    public string A;

    [FieldFixedLength(10)]
    [FieldTrim(TrimMode.Both)]
    public string item;
}
internal class Program
{
    static void Main(string[] args)
    {
        FileHelperEngine engine = new FileHelperEngine(typeof(PurchaseOrder));
        // use the AfterReadRecord event instead of the INotifyRead interface
        engine.AfterReadRecord += Engine_AfterReadRecord;
        // The record will be skipped because of the Regex
        var records = engine.ReadString("0S0123456789");
        Debug.Assert(records.Length == 0);
        Console.Write("All OK. No records were imported.");
        Console.ReadKey();
    }

    // Define the event handler here instead of in your FileHelpers class
    private static void Engine_AfterReadRecord(EngineBase engine, AfterReadRecordEventArgs e)
    {
        // nothing to do here; the ConditionalRecord regex already skips the record
    }
}

How to see arguments when creating a new class?

When creating a new class or method, I used to be able to see the parameters needed. But now they don't come up anymore. How do I view parameters when creating a class?
I'm running the latest Windows version.
public class Main {
    public static void main(String args[]) {
        Case theCase = new Case("Default", "Corsair", "500W");
    }
}

public class Case {
    private String model;
    private String manufacturer;
    private String powerSupply;

    public Case(String model, String manufacturer, String powerSupply) {
        this.model = model;
        this.manufacturer = manufacturer;
        this.powerSupply = powerSupply;
    }

    public void pressPowerButton() {
        System.out.println("Power button pressed");
    }

    public String getModel() {
        return model;
    }

    public String getManufacturer() {
        return manufacturer;
    }

    public String getPowerSupply() {
        return powerSupply;
    }
}
When constructing theCase I can't see what the parameters are, and I have to switch back and forth to the Case class.
You can explicitly invoke the Parameter Info action, which is usually mapped to Ctrl+P (Cmd+P on macOS).
Never mind: to see the parameters as you type, you have to type them in the editor without moving your cursor away.

camel custom marshalling with dataFormat name in header

I have two routes in two separate projects.
The first route sets a header with a data format bean name as a constant:
setHeader("dataFormatBeanName", constant("myFirstList"))
First route:
public class MyTest {

    @Configuration
    public static class MyTestConfig extends CamelConfiguration {

        @Bean(name = "myFirstList")
        public DataFormat getMyFirstListDataFormat() {
            return new MyFirstListDataFormat();
        }

        @Bean(name = "mySecondList")
        public DataFormat getMySecondListDataFormat() {
            return new MySecondListDataFormat();
        }

        @Bean
        public RouteBuilder route() {
            return new RouteBuilder() {
                @Override
                public void configure() throws Exception {
                    from("direct:testFirstDataFormat").setHeader("dataFormatBeanName", constant("myFirstList")).to("direct:myRoute");
                    from("direct:testSecondDataFormat").setHeader("dataFormatBeanName", constant("mySecondList")).to("direct:myRoute");
                }
            };
        }
    }
}
The second route is supposed to retrieve the bean name from the header and use it as a custom marshaller. Something like:
custom(header("dataFormatBeanName"))
(doesn't compile)
Does anyone know how I'm supposed to get the bean name from the header and use it in the custom method?
@Component
public class MyRouteBuilder extends RouteBuilder {

    @Override
    public void configure() throws Exception {
        final RouteDefinition routedefinition = this.from("direct:myRoute");
        routedefinition.marshal().custom(??????????).to("netty4:tcp://{{route.address}}:{{port}}?textline=true&sync=true");
    }
}
After a few more hours of searching, here is the solution I found:
No changes in the first class.
The second class uses an anonymous DataFormat in which I retrieve the bean name from the header and look up the Spring bean in the Camel context before calling its marshal method.
The AbstractXxxDataFormat class belongs to project 2 and is inherited by the project 1 DataFormat.
@Override
public void configure() throws Exception {
    final RouteDefinition routedefinition = this.from("direct:myRoute");
    routedefinition.marshal(new DataFormat() {

        @Override
        public void marshal(final Exchange exchange, final Object graph, final OutputStream stream) throws Exception {
            AbstractXxxDataFormat myDataFormat = (AbstractXxxDataFormat) getContext().getRegistry()
                    .lookupByName(exchange.getIn().getHeader("dataFormatBeanName", String.class));
            myDataFormat.marshal(exchange, graph, stream);
        }

        @Override
        public Object unmarshal(final Exchange exchange, final InputStream stream) throws Exception {
            return null;
        }
    });
    routedefinition.to("netty4:tcp://{{route.address}}:{{port}}?textline=true&sync=true");
}
If there's any better solution available, I'll be interested.
Have you tried simple("${header.dataFormatBeanName}") to access the header?
Also, rather than passing the data format bean name in a header, why not factor each .marshal() call out into its own subroute (one for formatBeanA and one for formatBeanB) and call the appropriate subroute instead of setting the header in the first place? I believe this would be a cleaner approach.
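A rough sketch of that idea, assuming the two data format beans registered by the first route (the endpoint names here are invented for illustration):

import org.apache.camel.builder.RouteBuilder;

// Hypothetical subroutes: one per data format, so no header is needed.
public class SubrouteRouteBuilder extends RouteBuilder {

    @Override
    public void configure() throws Exception {
        // Callers send to whichever endpoint matches the format they need.
        from("direct:marshalFirstList")
                .marshal().custom("myFirstList")
                .to("netty4:tcp://{{route.address}}:{{port}}?textline=true&sync=true");

        from("direct:marshalSecondList")
                .marshal().custom("mySecondList")
                .to("netty4:tcp://{{route.address}}:{{port}}?textline=true&sync=true");
    }
}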
If you really need to get it in the route as a variable (as opposed to a predicate used in the builder API), you could use an inline processor to extract it:
public class MyRouteBuilder extends RouteBuilder {

    public void configure() throws Exception {
        from("someEndpoint")
            .process(new Processor() {
                public void process(Exchange exchange) throws Exception {
                    String beanName = exchange.getIn().getHeader("beanNameHeader", String.class);
                }
            });
    }
}
Just be careful about scope and concurrency when storing the extracted beanName, however.
A colleague of mine (thanks to him) found the definitive solution:
Set the bean name in the exchange properties:
exchange.setProperty("myDataFormat", "myDataFormatAutowiredBean");
Retrieve the dataFormat bean with the RecipientList pattern and (un)marshal:
routedefinition.recipientList(simple("dataformat:${property.myDataFormat}:marshal"));
routedefinition.recipientList(simple("dataformat:${property.myDataFormat}:unmarshal"));
Very concise and works just fine.
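Putting the pieces together, a sketch of what the second route could look like with this approach (copying the header into the exchange property inside the route is an assumption; it could just as well be set by the caller):

import org.apache.camel.builder.RouteBuilder;

public class MyRouteBuilder extends RouteBuilder {

    @Override
    public void configure() throws Exception {
        from("direct:myRoute")
                // Assumed step: copy the bean name from the header into an exchange property.
                .setProperty("myDataFormat", header("dataFormatBeanName"))
                // The dataformat component looks the bean up in the registry and marshals with it.
                .recipientList(simple("dataformat:${property.myDataFormat}:marshal"))
                .to("netty4:tcp://{{route.address}}:{{port}}?textline=true&sync=true");
    }
}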

Morphia Interface for List of enum does not work (unmarshalling)

I have the following interface
@JsonTypeInfo(use = JsonTypeInfo.Id.CLASS, include = JsonTypeInfo.As.PROPERTY, property = "className")
public interface InfoChartInformation {
    public String name();
}
And the following implementations (enums):
public class InfoChartSummary {
    public static enum Immobilien implements InfoChartInformation {
        CITY, CONSTRUCTION_DATE;
    }

    public static enum Cars implements InfoChartInformation {
        POWER, MILEAGE;
    }
}
Then I use all of it in the following entity:
@Entity(noClassnameStored = true)
@Converters(InfoChartInformationMorphiaConverter.class)
public class TestEntity {

    @Id
    public ObjectId id;

    @Embedded
    public List<InfoChartInformation> order;
}
Jackson, in order to detect the type at unmarshalling time, adds the className to every enum in the list.
I thought Morphia would do the same, but there is no className field in the list of enums, so the unmarshalling cannot be done correctly: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassCastException: java.lang.String cannot be cast to com.mongodb.DBObject
I guess the correct behavior would be to save the full enum path (package + name), not only the enum name; at least that way the unmarshalling could be performed. Is there a way Morphia supports that by default, or do I need to create my own converter (similar to this)?
I tried creating a Custom Converter:
public class InfoChartInformationMorphiaConverter extends TypeConverter {

    public InfoChartInformationMorphiaConverter() {
        super(InfoChartInformation.class);
    }

    @Override
    public Object decode(Class targetClass, Object fromDBObject, MappedField optionalExtraInfo) {
        if (fromDBObject == null) {
            return null;
        }
        String clazz = fromDBObject.toString().substring(0, fromDBObject.toString().lastIndexOf("."));
        String value = fromDBObject.toString().substring(fromDBObject.toString().lastIndexOf(".") + 1);
        try {
            return Enum.valueOf((Class) Class.forName(clazz), value);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    @Override
    public Object encode(final Object value, final MappedField optionalExtraInfo) {
        return value.getClass().getName() + "." + ((InfoChartInformation) value).name();
    }
}
Then I registered the converter with Morphia: morphia.getMapper().getConverters().addConverter(new InfoChartInformationMorphiaConverter());.
However, when serializing (marshalling) the object to save it into the database, the custom converter is ignored and the enum is saved using the default Morphia converter (only the enum name).
If I use a single InfoChartInformation attribute in the TestEntity class instead of the List<InfoChartInformation>, my custom converter works. However, I need support for List.
Use:
public class InfoChartInformationMorphiaConverter extends TypeConverter implements SimpleValueConverter
SimpleValueConverter is a marker interface required to make your converter work.
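A minimal sketch of the adjusted converter, assuming the org.mongodb.morphia package layout (older releases use com.google.code.morphia); the encode/decode bodies are the same as in the question:

import org.mongodb.morphia.converters.SimpleValueConverter;
import org.mongodb.morphia.converters.TypeConverter;
import org.mongodb.morphia.mapping.MappedField;

// Same converter as in the question; the only change is the added marker interface.
public class InfoChartInformationMorphiaConverter extends TypeConverter implements SimpleValueConverter {

    public InfoChartInformationMorphiaConverter() {
        super(InfoChartInformation.class);
    }

    @Override
    public Object decode(Class targetClass, Object fromDBObject, MappedField optionalExtraInfo) {
        if (fromDBObject == null) {
            return null;
        }
        String stored = fromDBObject.toString();
        String clazz = stored.substring(0, stored.lastIndexOf("."));
        String value = stored.substring(stored.lastIndexOf(".") + 1);
        try {
            return Enum.valueOf((Class) Class.forName(clazz), value);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    @Override
    public Object encode(final Object value, final MappedField optionalExtraInfo) {
        // Store "fully.qualified.EnumClass.CONSTANT" so decode() can rebuild the enum.
        return value.getClass().getName() + "." + ((InfoChartInformation) value).name();
    }
}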

the type must implement the inherited abstract method Reducer.reduce(Object, Iterator, OutputCollector, Reporter)

I am new to Hadoop, and this is my first Hadoop program.
I am trying to create a Mapper class called WordMapper, but it throws the error below.
The type WordMapper must implement the inherited abstract method Mapper.map(Object, Object, OutputCollector, Reporter)
public class WordMapper extends MapReduceBase implements Mapper
{
    public void map(WritableComparable key, Writable values, OutputCollector output, Reporter reporter) throws IOException
    {
        String line = values.toString();
        StringTokenizer tok = new StringTokenizer(line);
        while (tok.hasMoreTokens())
        {
            String t = tok.nextToken();
            output.collect(new Text(t), new IntWritable(1));
        }
    }
}
Can someone please tell me where I am going wrong and suggest how to overcome the problem?
Try filling in your Mapper type parameters like this:
public class WCMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>
public void map(LongWritable key, Text values, OutputCollector<Text, IntWritable> output, Reporter reporter)
You haven't provided any of the type parameters. Mapper is a generic interface; it's parameterized with type parameters for the input and output key and value types. Fill in K1, V1, K2, and V2 in the following code with the types you need (a concrete word-count version follows below):
public class WordMapper extends MapReduceBase implements Mapper<K1, V1, K2, V2> {
    public void map(K1 key,
                    V1 value,
                    OutputCollector<K2, V2> output,
                    Reporter reporter)
            throws IOException {
        whatever();
    }
}
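For example, a word-count mapper like the one in the question fills the parameters in as LongWritable/Text for the input pair and Text/IntWritable for the output pair (this sketch assumes the old org.apache.hadoop.mapred API that MapReduceBase belongs to):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WordMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        // Split the line into tokens and emit (word, 1) for each one.
        StringTokenizer tok = new StringTokenizer(value.toString());
        while (tok.hasMoreTokens()) {
            output.collect(new Text(tok.nextToken()), new IntWritable(1));
        }
    }
}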