Issues with the ToDate and MonthsBetween functions in Pig Latin

I am trying to calculate the number of months between two datetime objects with the following code.
abc = load '/tmp/abc_2013_06_29/*' using PigStorage('\u0001') as ( open_dte: datetime, clsd_dte: datetime);
duration_in_months = MonthsBetween(open_dte, clsd_dte);
I am trying to generate the relation duration_in_months from the other relation. However, I am getting the following error:
Could not infer the matching function for org.apache.pig.builtin.GetMonth as multiple or none of them fit. Please use an explicit cast.
I would appreciate any help, and also any pointers to an in-depth guide on casting and functions in Pig.
Thanks,
Murali

Your code is not correct: the second statement applies a function to a relation without a FOREACH ... GENERATE.
Try instead:
duration_in_months = FOREACH abc GENERATE MonthsBetween(open_dte, clsd_dte);
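For reference, here is a minimal end-to-end sketch using the path and schema from the question:
abc = LOAD '/tmp/abc_2013_06_29/*' USING PigStorage('\u0001')
      AS (open_dte: datetime, clsd_dte: datetime);
-- MonthsBetween returns the number of months between the two datetimes as a long
duration_in_months = FOREACH abc GENERATE MonthsBetween(open_dte, clsd_dte) AS months;
DUMP duration_in_months;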

Related

BigQuery : Returning timestamp from JS udf throwing "Failed to coerce output value to type TIMESTAMP"

I have the following BigQuery code:
CREATE TEMP FUNCTION to_struct_attributes(input STRING)
RETURNS STRUCT<status_code STRING, created_time TIMESTAMP>
LANGUAGE js AS """
  let res = JSON.parse(input);
  res['created_time'] = Date(res['created_time'])
  return res;
""";
SELECT
  5 AS ID,
  to_struct_attributes(
    TO_JSON_STRING(
      STRUCT(
        TIMESTAMP(PARSE_TIMESTAMP('%Y%m%d%H%M%S', '20220215175959', 'America/Los_Angeles')) AS created_time
      )
    )
  ) AS ATTRIBUTES;
When I execute this, I'm getting the following error:
Failed to coerce output value "2022-02-16 01:59:59+00" to type TIMESTAMP
I feel this is quite strange, since BigQuery should be able to interpret it correctly and I haven't had this issue with any other datatypes. Also, if I do:
SELECT TIMESTAMP("2022-02-16 01:59:59+00")
It returns:
2022-02-16 01:59:59 UTC
So BigQuery can indeed parse it correctly; I'm not sure why the same doesn't happen for the UDF. On searching the internet, I found this question, and as its answer suggests, if I change the return statement to:
return Date(res.created_time);
It resolves the issue. But for a project of mine, doing it for every timestamp is not feasible due to the high number of struct columns.
So, I wanted to know if someone has a better alternative to it?
PS: I have removed a lot of non-essential parts from the above example, so it might look a bit abstract. Also, the actual use case is a bit different and more complex, which is why I need the JS UDF.
The best way to do what you want is to return a JavaScript Date object:
return Date(res.created_time);
When you pass a TIMESTAMP to a JavaScript UDF, it is represented as a Date object, as stated in the documentation. The same applies in reverse: to return a TIMESTAMP from a JavaScript UDF, you need to construct and return a Date object.
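If many struct fields carry timestamps, one workaround is to convert them all generically inside the UDF before returning. A minimal sketch, assuming your timestamp fields share a recognizable naming convention (the _time suffix here is an assumption):
CREATE TEMP FUNCTION to_struct_attributes(input STRING)
RETURNS STRUCT<status_code STRING, created_time TIMESTAMP>
LANGUAGE js AS """
  let res = JSON.parse(input);
  // Assumption: every field ending in '_time' holds a timestamp string.
  // new Date(...) parses the string into a Date object, which BigQuery
  // can coerce back to TIMESTAMP on output.
  for (let key in res) {
    if (key.endsWith('_time')) {
      res[key] = new Date(res[key]);
    }
  }
  return res;
""";
This keeps the conversion in one place instead of repeating it for every struct column.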

Parsing a SQL spatial column in Python

I am struggling a bit as I am new to programming. I am currently writing a Python script and I am a bit stuck. The goal is to parse some spatial information that gets pulled from SQL into a format that is usable for my py script down the line.
I was able to CAST through a SQL query and fetchall using the ODBC module. However, once I fetch the data, that is where it gets tricky for me. Here is an example of a print from the fetchall:
[(u'POLYGON ((7014.186279296875 6602.99658203125 1612.5, 7015.984375 6600.416015625 1612.5))',), (u'POLYGON ((6730.962646484375 6715.2490234375 1522.5, 6730.0869140625 6714.13916015625 1522.5))',)]
I am not exactly sure what I am getting here; it looks like a list of tuples, which I have tried converting to a list of lists, but there must be something I am missing.
Here is the usable format I am looking for:
[[7014.186279296875, 6602.99658203125, 1612.5], [7015.984375, 6600.416015625, 1612.5]]
[[6730.962646484375, 6715.2490234375, 1522.5], [6730.0869140625, 6714.13916015625, 1522.5]]
Any ideas on how I can accomplish this? Maybe there is a better way to CAST in SQL, or a Python module that would be easier to use instead of just doing a cursor.fetchall() and parsing? Any parsing help would be useful. Thanks.
If you want to do the parsing yourself, that should be straightforward. For the example you've provided, the following code would do it:
result = []
for element in data:
    # strip the leading "POLYGON ((" (10 characters) and the trailing "))"
    single_elements = element[0][10:-2].split(', ')
    for se in single_elements:
        row = str(se).split(' ')
        result.append([float(a) for a in row])
result will contain what you need. If parsing is not an option, then paste some of your code so I can see how you're fetching the data.
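If you instead want one list of points per polygon, as in the desired output above, here is a minimal variant (assuming the WKT strings always have the same POLYGON (( ... )) shape as your sample):
# Sample data in the shape returned by cursor.fetchall()
data = [
    (u'POLYGON ((7014.186279296875 6602.99658203125 1612.5, 7015.984375 6600.416015625 1612.5))',),
    (u'POLYGON ((6730.962646484375 6715.2490234375 1522.5, 6730.0869140625 6714.13916015625 1522.5))',),
]

polygons = []
for element in data:
    # "POLYGON ((" is 10 characters; the trailing "))" closes the ring
    points = element[0][10:-2].split(', ')
    polygons.append([[float(a) for a in p.split(' ')] for p in points])

for poly in polygons:
    print(poly)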

SSRS if field value in list

I've looked through a number of tutorials and asks, and haven't found a working solution to my problem.
Suppose my dataset has two columns: sort_order and field_value. sort_order is an integer and field_value is a numeric(10,2).
I want to format some rows as #,#0 and others as #,#0.00.
Normally I would just do
iif( fields!sort_order.value = 1 or fields!sort_order.value = 23 or .....
unfortunately, the list is fairly long.
I'd like to do the equivalent of if fields!sort_order.value in (1,2,21,63,78,...) then...)
As recommended in another post, I tried the following (if sort_order is in the list, output 0, else 1; this is just to test the functionality of the IN operator):
=iif( fields!sort_order.Value IN split("1,2,3,4,5,6,8,10,11,15,16,17,18,19,20,21,26,30,31,33,34,36,37,38,41,42,44,45,46,49,50,52,53,54,57,58,59,62,63,64,67,68,70,71,75,76,77,80,81,82,92,98,99,113,115,116,120,122,123,127,130,134,136,137,143,144,146,147,148,149,154,155,156,157,162,163,164,165,170,171,172,173,183,184,185,186,192,193,194,195,201,202,203,204,210,211,212,213,263",","),0,1)
However, it doesn't look like the SSRS expression editor wants to accept the IN operator, which is strange, because all the examples I've found that solve this problem use it.
Any advice?
Try using the Array.IndexOf function:
=IIF(Array.IndexOf(split("1,2,3,4,...",","),fields!sort_order.Value)>-1,0,1)
Note that all values must be inside the quoted string.
Considering the recommendation of @Jakub, I recommend this solution if you are feeding your report via a stored procedure and you can't touch it.
Let me know if this helps.
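If the list is reused in several expressions, another option is to keep it in one place via the report's custom code (Report Properties > Code). A sketch, where InList is a hypothetical helper name:
Public Function InList(ByVal value As Integer) As Boolean
    ' Hypothetical helper: keeps the long list in a single place.
    Dim list As String = "1,2,3,4,5,6,8,10,11,15,16,17,18,19,20,21"
    Return Array.IndexOf(Split(list, ","), CStr(value)) > -1
End Function
The expression then becomes:
=IIF(Code.InList(fields!sort_order.Value), 0, 1)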

xslt 1.0 yyyy-MM-ddHH:mm:ss to YYYY-MM-DDTHH:mm:ss.sssZ

I'm using XSLT 1.0 and trying to convert my date from yyyy-MM-ddHH:mm:ss to YYYY-MM-DDTHH:mm:ss.sssZ.
Input sample
<StartDate>2014-07-2217:18:15</StartDate>
The required output format is <startDate>2014-07-22T17:18:15.899+12:00</startDate>, in the NZST time zone.
I tried the sample at http://wiki.apache.org/cocoon/Tips/JavaInXslt without success, as I'm getting the error "The function 'sdf:new' was not defined". I also looked at the EXSLT extensions for processing dates and times, but EXSLT doesn't have a function for time zones.
Kindly advise on how I can convert the date from yyyy-MM-ddHH:mm:ss to YYYY-MM-DDTHH:mm:ss.sssZ, other than concatenating "+12:00" to the end of the startDate value?
Thanks in advance!
how can I convert date from yyyy-MM-ddHH:mm:ss to YYYY-MM-DDTHH:mm:ss.sssZ other than using the concat "+12:00" to the end of startDate value?
There is no other way*. XSLT 1.0 has no concept of dates; your data is a meaningless text string, and needs to be manipulated as such.
--
(*) Unless you create your own way by extending your processor's capabilities with a user-defined function, as you have tried. But that's a different question, and IMHO a solution using string functions is trivial and satisfactory.
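For reference, a minimal sketch of the string-function approach (assuming the input is always exactly yyyy-MM-ddHH:mm:ss, with .000 as placeholder milliseconds and the fixed +12:00 offset):
<xsl:template match="StartDate">
  <startDate>
    <!-- characters 1-10 are the date; characters 11 onward are the time -->
    <xsl:value-of select="concat(substring(., 1, 10), 'T', substring(., 11), '.000+12:00')"/>
  </startDate>
</xsl:template>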

Pig - How to cast datetime to chararray

I'm using CurrentTime(), which returns a datetime. However, I need it as a chararray. I have the following:
A = LOAD ...
B = FOREACH A GENERATE CurrentTime() AS todaysDate;
I've tried various approaches, such as the following:
B = FOREACH A GENERATE (chararray)CurrentTime() AS todaysDate;
However, I always get ERROR 1052: Cannot cast datetime to chararray.
Anyone know how I can do this? By the way, I'm very new to pig. Thanks in advance!
I had a similar issue and I didn't want to use a custom UDF as described in the other answer. I am pretty new to Pig, but this seems too basic an operation to justify a UDF. This command works great for me:
B = FOREACH A GENERATE ToString(yourdatetimeobject, 'yyyy-MM-dd\'T\'HH:mm:ssz') AS yourfieldname;
You can select the format you want by looking at the SimpleDateFormat javadoc.
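As a complete, minimal sketch (the input path and schema are placeholders):
A = LOAD '/tmp/input' USING PigStorage() AS (id: int);
-- ToString renders the datetime as a chararray using a SimpleDateFormat pattern
B = FOREACH A GENERATE id, ToString(CurrentTime(), 'yyyy-MM-dd HH:mm:ss') AS todaysDate;
STORE B INTO '/tmp/output';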
You need to create a custom UDF that does the conversion (e.g. see the CurrentTime() implementation). Alternatively, you may check out my answer on a similar topic for workarounds.
If you are on AWS, then use their DATE_TIME UDF.