I have a use case in which I need to take in a date and return the last date of the previous month.
Ex: input: 20150331, output: 20150228
I will be using this previous month's last date to filter a daily partition (in a Pig script).
B = filter A by daily_partition == GetPrevMonth(20150331);
I have created a UDF (GetPrevMonth) which takes the date and returns the previous month's last date, but I am unable to use it in the filter.
ERROR: Could not infer the matching function for GetPrevMonth as multiple or none of them fit. Please use an explicit cast.
My UDF takes a tuple as input.
Googling suggests that a UDF cannot be applied in filters.
Is there any workaround, or am I going wrong somewhere?
UDF:
public class GetPrevMonth extends EvalFunc<Integer> {
    public Integer exec(Tuple input) throws IOException {
        String getdate = (String) input.get(0);
        if (getdate != null) {
            try {
                // LOGIC to return prev month date
            }
Need help. Thanks in advance.
You can call a UDF in a FILTER, but you are passing a number to the function while you expect it to receive a String (chararray inside Pig):
String getdate = (String) input.get(0);
The simple solution would be to cast it to chararray when calling the UDF:
B = filter A by daily_partition == GetPrevMonth((chararray)20150331);
Generally, when you see an error like Could not infer the matching function for X as multiple or none of them fit, the usual reason is that the types of the values you are passing to the UDF do not match what it expects.
One last thing: even though it is not necessary here, in the future you might want to write a pure FILTER UDF. In that case, instead of inheriting from EvalFunc, you need to inherit from FilterFunc and return a Boolean value:
public class IsPrevMonth extends FilterFunc {
    @Override
    public Boolean exec(Tuple input) throws IOException {
        try {
            String getdate = (String) input.get(0);
            if (getdate != null) {
                // LOGIC to retrieve prevMonthDate
                if (getdate.equals(prevMonthDate)) {
                    return true;
                } else {
                    return false;
                }
            } else {
                return false;
            }
        } catch (ExecException ee) {
            throw ee;
        }
    }
}
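The date logic itself is left out of both UDFs above. As a rough sketch only, it could look like the following in plain Java, assuming Java 8 or later (for java.time) and dates written as yyyyMMdd. The class and method names are made up for illustration, and nothing here depends on Pig:

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Pig: computes the last day of the previous month.
final class PrevMonthDates {
    // BASIC_ISO_DATE reads and writes dates as yyyyMMdd, e.g. 20150331
    private static final DateTimeFormatter BASIC = DateTimeFormatter.BASIC_ISO_DATE;

    /** Returns the last day of the month before the given yyyyMMdd date, or null for null input. */
    static String lastDayOfPreviousMonth(String yyyymmdd) {
        if (yyyymmdd == null) {
            return null;
        }
        LocalDate date = LocalDate.parse(yyyymmdd, BASIC);
        // Step back one calendar month, then take that month's final day
        LocalDate lastDay = YearMonth.from(date).minusMonths(1).atEndOfMonth();
        return lastDay.format(BASIC);
    }
}
```

For the input from the question, PrevMonthDates.lastDayOfPreviousMonth("20150331") gives "20150228". Inside the UDF you would call it where the LOGIC comment sits and convert the result to whatever type exec returns.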
When we send a URL with request parameters that need to be converted to a date, in Spring MVC we can do something like the code below in the controller, and Spring does the conversion automatically!
public String getFare(@RequestParam(value = "flightDate") @DateTimeFormat(iso = ISO.DATE) LocalDate date)
But how can we achieve the same when we use a HandlerFunction (Spring WebFlux)? For example, in my HandlerFunction
public HandlerFunction<ServerResponse> getFare = serverRequest ->
{
    Optional<String> flightDate = serverRequest.queryParam("flightDate");
}
The code serverRequest.queryParam("flightDate") gives a String. Is it possible to get the same automatic conversion here?
No. (You can look at Spring's source code and see that there is no other way to get a query param than as an Optional<String>.)
You must convert the field to a Date yourself:
Date flightDate = serverRequest.queryParam("flightDate")
        .map(date -> {
            try {
                return new SimpleDateFormat("dd-MMM-yyyy").parse(date);
            } catch (ParseException e) {
                return null;
            }
        }).orElse(null);
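If you would rather end up with a LocalDate, as in the Spring MVC signature from the question, the same manual conversion can be sketched with java.time. This is only an illustration: the class and method names are hypothetical, it assumes the parameter arrives in ISO form (yyyy-MM-dd), and it takes the Optional<String> that queryParam returns rather than touching any Spring type:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical helper: converts a raw query parameter into a LocalDate.
final class QueryParamDates {
    /** Parses an optional ISO date (yyyy-MM-dd); empty when the parameter is absent or malformed. */
    static Optional<LocalDate> parseIsoDate(Optional<String> raw) {
        return raw.map(String::trim).flatMap(value -> {
            try {
                return Optional.of(LocalDate.parse(value, DateTimeFormatter.ISO_LOCAL_DATE));
            } catch (DateTimeParseException e) {
                // Malformed input is treated the same as a missing parameter
                return Optional.<LocalDate>empty();
            }
        });
    }
}
```

In the handler this would be called as QueryParamDates.parseIsoDate(serverRequest.queryParam("flightDate")), which keeps the missing and malformed cases explicit instead of passing null around.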
I am loading a CSV file with 56 fields. I want to apply TRIM() function in Pig for all fields in the tuple.
I tried:
B = FOREACH A GENERATE TRIM(*);
But it fails with the error below:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the matching function for org.apache.pig.builtin.TRIM as multiple or none of them fit. Please use an explicit cast.
Please help. Thank you.
To trim the fields of a tuple in Pig, you should create a UDF. Register the UDF and apply it in a FOREACH statement to each field of the tuple that you want to trim. Below is the code for a UDF that trims one field.
public class StrTrim extends EvalFunc<String> {
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;
        try {
            String str = (String) input.get(0);
            // A null field has nothing to trim
            return str == null ? null : str.trim();
        } catch (Exception e) {
            throw new IOException("Caught exception processing input row ", e);
        }
    }
}
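Since the UDF above handles one field per call, 56 fields means 56 calls in the GENERATE clause. An alternative is a UDF that loops over every field of the tuple it receives. The loop might look roughly like this. It is a plain-Java sketch in which a java.util.List stands in for the Pig Tuple so that it runs without Pig on the classpath, and the class and method names are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the body of a "trim every field" UDF.
final class FieldTrimmer {
    /** Returns a new list with every String field trimmed; nulls and non-strings pass through unchanged. */
    static List<Object> trimAll(List<Object> fields) {
        List<Object> out = new ArrayList<>(fields.size());
        for (Object field : fields) {
            out.add(field instanceof String ? ((String) field).trim() : field);
        }
        return out;
    }
}
```

Leaving non-string values alone means numeric columns survive unchanged if the CSV is loaded with a typed schema.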
I have a script which is loading some data about venues:
venues = LOAD 'venues_extended_2.csv' USING org.apache.pig.piggybank.storage.CSVLoader() AS (Name:chararray, Type:chararray, Latitude:double, Longitude:double, City:chararray, Country:chararray);
Then I want to create a UDF whose constructor accepts the venues relation.
So I tried to define the UDF like this:
DEFINE GenerateVenues org.gla.anton.udf.main.GenerateVenues(venues);
And here is the actual UDF:
public class GenerateVenues extends EvalFunc<Tuple> {
    TupleFactory mTupleFactory = TupleFactory.getInstance();
    BagFactory mBagFactory = BagFactory.getInstance();
    private static final String ALLCHARS = "(.*)";
    private ArrayList<String> venues;
    private String regex;

    public GenerateVenues(DataBag venuesBag) {
        Iterator<Tuple> it = venuesBag.iterator();
        venues = new ArrayList<String>((int) (venuesBag.size() + 1)); // possible fails!!!
        String current = "";
        regex = "";
        while (it.hasNext()) {
            Tuple t = it.next();
            try {
                current = "(" + ALLCHARS + t.get(0) + ALLCHARS + ")";
                venues.add((String) t.get(0));
            } catch (ExecException e) {
                throw new IllegalArgumentException("VenuesRegex: requires tuple with at least one value");
            }
            regex += current + (it.hasNext() ? "|" : "");
        }
    }

    @Override
    public Tuple exec(Tuple tuple) throws IOException {
        // expect one string
        if (tuple == null || tuple.size() != 2) {
            throw new IllegalArgumentException(
                    "BagTupleExampleUDF: requires two input parameters.");
        }
        try {
            String tweet = (String) tuple.get(0);
            for (String venue : venues) {
                if (tweet.matches(ALLCHARS + venue + ALLCHARS)) {
                    Tuple output = mTupleFactory.newTuple(Collections.singletonList(venue));
                    return output;
                }
            }
            return null;
        } catch (Exception e) {
            throw new IOException(
                    "BagTupleExampleUDF: caught exception processing input.", e);
        }
    }
}
When executed, the script fails at the DEFINE statement, just before (venues);:
2013-12-19 04:28:06,072 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <file script.pig, line 6, column 60> mismatched input 'venues' expecting RIGHT_PAREN
Obviously I'm doing something wrong. Can you help me figure out what it is?
Is it that the UDF cannot accept the venues relation as a parameter? Or is the relation not represented by a DataBag, as in public GenerateVenues(DataBag venuesBag)?
Thanks!
PS I'm using Pig version 0.11.1.1.3.0.0-107.
As @WinnieNicklaus already said, you can only pass strings to UDF constructors.
Having said that, the solution to your problem is to use the distributed cache: override public List<String> getCacheFiles() to return a list of filenames that will be made available via the distributed cache. With that, you can read the file as a local file and build your table.
The downside is that Pig has no initialization function, so you have to implement something like
private void init() {
    if (!this.initialized) {
        // read table
    }
}
and then call that as the first thing from exec.
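To make the lazy-initialization idea concrete, here is a rough plain-Java sketch that runs without Pig. Several things in it are assumptions for illustration only: the class name VenueMatcher, a file with one venue name per line (rather than the full CSV from the question), the loadCount field that exists purely to show the file is read once, and String.contains in place of the regex match used in the question. In a real UDF, find would be the body of exec and the file name would be one of those returned by getCacheFiles().

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a UDF that loads its lookup table on first use.
final class VenueMatcher {
    private final String venuesFile; // passed as a String, like a UDF constructor argument
    private List<String> venues;     // stays null until the first call to find()
    int loadCount = 0;               // demonstration only: counts how often the file is read

    VenueMatcher(String venuesFile) {
        this.venuesFile = venuesFile;
    }

    // Reads the table once; later calls return immediately.
    private void init() throws IOException {
        if (venues != null) {
            return;
        }
        List<String> loaded = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get(venuesFile), StandardCharsets.UTF_8)) {
            String name = line.trim();
            if (!name.isEmpty()) {
                loaded.add(name);
            }
        }
        venues = loaded;
        loadCount++;
    }

    /** Returns the first venue name contained in the tweet, or null if none matches. */
    String find(String tweet) throws IOException {
        init(); // first thing, as described above
        if (tweet == null) {
            return null;
        }
        for (String venue : venues) {
            if (tweet.contains(venue)) {
                return venue;
            }
        }
        return null;
    }
}
```

Checking the list for null plays the role of the initialized flag, so there is no second field to keep in sync.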
You can't use a relation as a parameter in a UDF constructor. Only strings can be passed as arguments, and if they are really of another type, you will have to parse them out in the constructor.
This is how I am looking to process my data from Pig:
A = Load 'data' ...
B = FOREACH A GENERATE my.udfs.extract(*);
or
B = FOREACH A GENERATE my.udfs.extract('flag');
So basically extract either has no arguments or takes an argument... 'flag'
On my udf side...
@Override
public DataBag exec(Tuple input) throws IOException {
    // if flag == true
    //     do this
    // else
    //     do that
}
Now how do I implement this in Pig?
The preferred way is to use DEFINE.
"Use DEFINE to specify a UDF function when:
...
The constructor for the function takes string parameters. If you need to use different constructor parameters for different calls to the function you will need to create multiple defines – one for each parameter set"
E.g:
Given the following UDF:
public class Extract extends EvalFunc<String> {
    private boolean flag;

    public Extract(String flag) {
        // Note that a boolean param cannot be passed from script/grunt,
        // therefore pass it as a string
        this.flag = Boolean.valueOf(flag);
    }

    public Extract() {
    }

    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0) {
            return null;
        }
        try {
            if (flag) {
                ...
            }
            else {
                ...
            }
        }
        catch (Exception e) {
            throw new IOException("Caught exception processing input row ", e);
        }
    }
}
Then
define ex_arg my.udfs.Extract('true');
define ex my.udfs.Extract();
...
B = foreach A generate ex_arg(); --calls extract with flag set to true
C = foreach A generate ex(); --calls extract without any flag set
Another option (hack?) :
In this case the UDF gets instantiated with its noarg constructor and you pass the flag you want to evaluate in its exec method. Since this method takes a tuple as a parameter you need to first check whether the first field is the boolean flag.
public class Extract2 extends EvalFunc<String> {
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0) {
            return null;
        }
        try {
            boolean flag = false;
            if (input.getType(0) == DataType.BOOLEAN) {
                flag = (Boolean) input.get(0);
            }
            // process rest of the fields in the tuple
            if (flag) {
                ...
            }
            else {
                ...
            }
        }
        catch (Exception e) {
            throw new IOException("Caught exception processing input row ", e);
        }
    }
}
Then
...
B = foreach A generate Extract2(true,*); --use flag
C = foreach A generate Extract2();
I'd rather stick to the first solution as this smells.
As shown in the code below, I want to get the value from an OracleParameter object. Its data type is datetime.
...
Dim cmd As New OracleCommand("stored_proc_name", cnObject)
cmd.Parameters.Add("tran_date_out", OracleDbType.Date, ParameterDirection.Output)
...
cmd.ExecuteNonQuery()
...
Dim tranDate As Date
tranDate = cmd.Parameters("tran_date_out").Value
When I assign the value to the tranDate variable, I get an error. But if I code it as below, I get only the date.
tranDate = CDate(cmd.Parameters("tran_date_out").Value.ToString)
So how can I get both the date and the time into the tranDate variable?
Off the top of my head, OracleParameter.Value, when it is an out parameter, is assigned a strange Oracle boxed type. It seems like a completely terrible design on Oracle's part... but instead of returning String, you will get OracleString, etc.
Each of the Oracle types has a .Value that has the system type, but of course they don't all implement a common interface to expose this, so what I did was basically write a method to unbox the types:
/// <summary>
/// The need for this method is highly annoying.
/// When Oracle sets its output parameters, the OracleParameter.Value property
/// is set to an internal Oracle type, not its equivalent System type.
/// For example, strings are returned as OracleString, DBNull is returned
/// as OracleNull, blobs are returned as OracleBinary, etc...
/// So these Oracle types need to be unboxed back to their normal system types.
/// </summary>
/// <param name="oracleType">Oracle type to unbox.</param>
/// <returns>The unboxed system value, null for Oracle nulls, or the input itself for unhandled types.</returns>
internal static object UnBoxOracleType(object oracleType)
{
    if (oracleType == null)
        return null;

    Type T = oracleType.GetType();
    if (T == typeof(OracleString))
    {
        if (((OracleString)oracleType).IsNull)
            return null;
        return ((OracleString)oracleType).Value;
    }
    else if (T == typeof(OracleDecimal))
    {
        if (((OracleDecimal)oracleType).IsNull)
            return null;
        return ((OracleDecimal)oracleType).Value;
    }
    else if (T == typeof(OracleBinary))
    {
        if (((OracleBinary)oracleType).IsNull)
            return null;
        return ((OracleBinary)oracleType).Value;
    }
    else if (T == typeof(OracleBlob))
    {
        if (((OracleBlob)oracleType).IsNull)
            return null;
        return ((OracleBlob)oracleType).Value;
    }
    else if (T == typeof(OracleDate))
    {
        if (((OracleDate)oracleType).IsNull)
            return null;
        return ((OracleDate)oracleType).Value;
    }
    else if (T == typeof(OracleTimeStamp))
    {
        if (((OracleTimeStamp)oracleType).IsNull)
            return null;
        return ((OracleTimeStamp)oracleType).Value;
    }
    else // not sure how to handle these.
        return oracleType;
}
This probably isn't the cleanest solution, but... it was quick and dirty, and it does work for me.
Just pass the OracleParameter.Value into this method.
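Actually, I might have only half read your question before answering. Oracle's DATE type does store the time (down to the second) as well as the date; TIMESTAMP adds fractional seconds.
The time is probably being lost in the ToString round trip, which formats the value using a date format that may not include the time, so reading the .Value of the unboxed OracleDate should give you both the date and the time.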
Hope that helps! :)
I have not tested heavily, but here is a less verbose version of CodingWithSpike's answer using reflection...
public static object UnBoxOracleType(object oracleType) {
    if (oracleType == null) {
        return null;
    }
    if ((bool)oracleType.GetType().GetProperty("IsNull").GetValue(oracleType)) {
        return null;
    }
    return oracleType.GetType().GetProperty("Value").GetValue(oracleType);
}
Again you are passing OracleParameter.Value into this method.
Or for a more typesafe version you could do this:
public static object GetValue(OracleParameter param) {
    if (param == null || param.Value == null) {
        return null;
    }
    var oracleType = param.Value;
    if ((bool)oracleType.GetType().GetProperty("IsNull").GetValue(oracleType)) {
        return null;
    }
    return oracleType.GetType().GetProperty("Value").GetValue(oracleType);
}
In this case you would pass the oracle parameter itself in to get the value back. This could also be implemented as an extension method if you so desired...