Convert yyyymmdd chararray to yyyy/mm/dd date format - apache-pig

I want to filter rows by comparing the DAT_015_X column with a fixed date, '01/01/2019'.
I have loaded DAT_015_X as a chararray column.
Then:
Data_2 = FOREACH Data GENERATE
ToString(DAT_015_X,'yyyy/MM/dd) AS mydate;
Then
Data_3 = FILTER Data_2 BY ((ToDate(mydate,'yyyy/MM/dd) > ToDate('01/01/2019','yyyy/MM/dd));
My original data looks like this:
99991231
20200605
20190605
20200605
And this is part of the full data set:
MGM_COMPTEUR;CIA_CD_CRV_CIA;CIA_DA_EM_CRV;CIA_CD_CTRL_BLCE;CIA_IDC_EXTR_RDJ;CIA_VLR_IDT_CRV_LOQ;CIA_VLR_REF_CRV;CIA_NO_SEQ_CRV;CIA_VLR_LG_ZON_RTG;CIA_HEU_CIA;CIA_TM_STP_CRE;CIA_CD_SI;CIA_VLR_1;CIA_DA_ARR_FIC;CIA_TY_ENR;CIA_CD_BTE;CIA_CD_PER;CIA_CD_EFS;CIA_CD_ETA_VAL_CRV;CIA_CD_EVE_CPR;CIA_CD_APLI_TDU;CIA_CD_STE_RTG;CIA_DA_TT_RTG;CIA_NO_ENR_RTG;CIA_DA_VAL_EVE;PSE_001;STR_002;STR_003;CPR_006_VLR;CPR_006_DCM;CPR_006_DVS;CPR_008_VLR;CPR_008_DCM;CPR_008_DVS;CPR_009_VLR;CPR_009_DCM;CPR_009_DVS;CPR_059_VLR;CPR_059_DCM;CPR_059_DVS;CPR_060_VLR;CPR_060_DCM;CPR_060_DVS;RUB_205;RUB_216;DAT_015_X;NB_005_VLR;NB_005_DCM;NB_007_VLR;NB_007_DCM;NB_012_VLR;NB_012_DCM;EUR_061_VLR;EUR_061_DCM;EUR_061_CD_DVS;EUR_062_VLR;EUR_062_DCM;EUR_062_CD_DVS
00000000000000000000;22002;20190731;9;9; ;22002 0000000001;0000000001;ZZZZZZZZZZZZZZZZZZZZ; ;2019-07-31-18.03.27.880010;002;00000000000000000001;20190731; ;2200;M;02;V;00001; ; ;ZZZZZZZZ;ZZZZZZZZZZZZZZZZZZZZ;20081112;50421451;065000;060100;+00000000000000000;02;EUR ;+00000000000000000;02;EUR ;+00000000000000000;02;EUR ;+00000000000000000;02;EUR ;+00000000000000000;02;EUR ;3 ;40 ;99991231;+00000000000000000;00;+00000000000000000;00;+00000000000000000;00;+00000000000000000;02;EUR ;+00000000000000000;02;EUR ;
00000000000000000001;22002;20190731;9;9; ;22002 0000000002;0000000002;ZZZZZZZZZZZZZZZZZZZZ; ;2019-07-31-18.03.27.880010;002;00000000000000000001;20190731; ;2200;M;02;V;00001; ; ;ZZZZZZZZ;ZZZZZZZZZZZZZZZZZZZZ;20081112;52289527;065000;060100;+00000000000000000;02;EUR ;+00000000003000000;02;EUR ;+00000000000000000;02;EUR ;+00000000000000000;02;EUR ;+00000000000000000;02;EUR ;1 ;40 ;20200605;+00000000000000000;00;+00000000000000000;00;+00000000000000000;00;+00000000003000000;02;EUR ;+00000000000000000;02;EUR ;
00000000000000000002;22002;20190731;9;9; ;22002 0000000003;0000000003;ZZZZZZZZZZZZZZZZZZZZ; ;2019-07-31-18.03.27.880010;002;00000000000000000001;20190731; ;2200;M;02;V;00001; ; ;ZZZZZZZZ;ZZZZZZZZZZZZZZZZZZZZ;20081112;52439938;065000;060100;+00000000000000000;02;EUR ;+00000000001000000;02;EUR ;+00000000000000000;02;EUR ;+00000000000000000;02;EUR ;+00000000000000000;02;EUR ;1 ;40 ;20190605;+00000000000000000;00;+00000000000000000;00;+00000000000000000;00;+00000000001000000;02;EUR ;+00000000000000000;02;EUR ;
But this returns the following error message:
Could not infer the matching function for
org.apache.pig.builtin.ToString as multiple or none of them fit.
Please use an explicit cast.
How can I resolve this problem?

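Note that the closing quote (') is missing from the format string 'yyyy/MM/dd. In addition, ToString expects a datetime argument, not a chararray, which is why Pig cannot find a matching function.
Assuming DAT_015_X is in yyyyMMdd format, parse it directly with ToDate and compare:
Data_2 = FILTER Data BY (ToDate(DAT_015_X,'yyyyMMdd') > ToDate('20190101','yyyyMMdd'));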

The solution to this issue turned out to be very simple.
I declared DAT_015_X as long instead of chararray when loading the data.
Then I wrote the filter as:
Data_2 = FILTER Data BY (DAT_015_X > 20190101);
And it gives the needed result.
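
The numeric filter works because, for fixed-width yyyymmdd values, numeric order and calendar order coincide. The following is a minimal sketch in Python (used here only for illustration, it is not part of the Pig script) that checks this on the sample values from the question. The value 20181231 and the helper names are hypothetical additions, included only so that one row is filtered out.

```python
from datetime import datetime

# DAT_015_X samples from the question, plus one hypothetical value
# (20181231) that falls before the cutoff.
raw_values = ["99991231", "20200605", "20190605", "20200605", "20181231"]
CUTOFF = "20190101"


def after_cutoff_as_date(value, cutoff=CUTOFF):
    """Parse both sides as dates (yyyyMMdd) and compare them."""
    return datetime.strptime(value, "%Y%m%d") > datetime.strptime(cutoff, "%Y%m%d")


def after_cutoff_as_int(value, cutoff=CUTOFF):
    """Compare the same values as plain integers."""
    return int(value) > int(cutoff)


kept_by_date = [v for v in raw_values if after_cutoff_as_date(v)]
kept_by_int = [v for v in raw_values if after_cutoff_as_int(v)]

print(kept_by_date)
print(kept_by_int)
```

Both lists come out identical, which is why comparing the column as a long gives the same rows as parsing it into a date first. This only holds while every value is a valid, zero-padded eight-digit date.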

Related

QlikView Convert 1753-01-01 00:00:00.000 to NULL

I am trying to convert data from SQL as 1753-01-01 00:00:00.000 to be shown as NULL values in QlikView.
I do the following in the QlikView Load statements -
SET NullTimeStamp = if ($1 = '1753-01-01 00:00:00', null(), $1);
Then use it when LOAD:
LOAD
$(NullTimeStamp(YourDateField1)) AS YOURDATEFIELD1,
$(NullTimeStamp(YourDateField2)) AS YOURDATEFIELD2,
$(NullTimeStamp(YourDateField3)) AS YOURDATEFIELD3
However, I have many fields with Time and Dates in my tables so I was wondering if there is a more elegant way of solving this issue?
I've done something similar in the past. The idea is to generate part of the load script in a variable and then use that variable as part of the next load statement:
DummyData:
Load * Inline [
Something1 , Something2, Something3, Something4, Something5
1753-01-01 00:00:00.000, 2, 3, 4, 1753-01-01 00:00:00.000
];
SET NullTimeStamp = if ($1 = '1753-01-01 00:00:00', null(), $1);
// Define a temp table
// that holds list of fields that have to be checked with NullTimeStamp
Fields:
Load * Inline [
FieldNames
Something1
Something2
Something3
Something4
Something5
];
let FieldsConcatenation = '';
// loop through the NullTimeStamp-ed fields
for a = 1 to NoOfRows('Fields')
let f = FieldValue('FieldNames', a);
// concatenate each iteration to form part of the RealLoad table script
let FieldsConcatenation = '$(FieldsConcatenation)' & '$(NullTimeStamp(' & '$(f)' & ')) as ' & Upper('$(f)') & ',' & chr(13);
next
// remove the last comma
let FieldsConcatenation = left('$(FieldsConcatenation)', Index('$(FieldsConcatenation)', ',' , -1) -1);
// we don't need this anymore
Drop Table Fields;
// add FieldsConcatenation variable as part of the load script
RealLoad:
Load
$(FieldsConcatenation),
'a' as LoadTheRestHere
Resident
DummyData;
// we don't need this anymore
Drop Table DummyData;
The FieldsConcatenation variable will have the following content: [screenshot not preserved]
The original table: [screenshot not preserved]
And the final table: [screenshot not preserved]
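
As a rough illustration of what the loop is meant to build, here is a small Python sketch (not QlikView script) that assembles one NullTimeStamp expression per field name. The variable names `fields` and `fragment` are hypothetical, and the exact text QlikView produces after its own dollar-sign expansion may differ.

```python
# Field names from the inline Fields table in the answer.
fields = ["Something1", "Something2", "Something3", "Something4", "Something5"]

# One "$(NullTimeStamp(Field)) as FIELD" expression per field.
expressions = [f"$(NullTimeStamp({name})) as {name.upper()}" for name in fields]

# Joining with a comma and newline avoids having to strip a trailing comma.
fragment = ",\n".join(expressions)

print(fragment)
```

Joining the expressions with a separator plays the same role as the "remove the last comma" step in the script.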

Dynamic table type declaration

I need to write a function module that receives the data type of an element as a string parameter, and I would like to declare a table of that type like this:
DATA: lt_test TYPE TABLE OF (iv_data_type).
where iv_data_type would be the received type.
You should create your internal table dynamically:
DATA lt_test type ref to data.
FIELD-SYMBOLS: <lts_test> type standard table.
CREATE DATA lt_test TYPE TABLE OF (iv_data_type).
ASSIGN lt_test->* to <lts_test>.
CALL FUNCTION 'TEXT_CONVERT_CSV_TO_SAP'
EXPORTING
I_TAB_RAW_DATA = lt_raw_data
TABLES
I_TAB_CONVERTED_DATA = <lts_test>
EXCEPTIONS
CONVERSION_FAILED = 1
OTHERS = 2.
You can try the following, which builds the table type at runtime with the RTTS descriptor classes:
DATA : lo_struct_des TYPE REF TO cl_abap_structdescr,
lo_result_struct TYPE REF TO cl_abap_structdescr.
DATA: lo_new_tab TYPE REF TO cl_abap_tabledescr .
DATA: lt_struct_tab TYPE abap_component_tab.
DATA: tab TYPE REF TO data,
line TYPE REF TO data.
FIELD-SYMBOLS: <fs_data> TYPE ANY TABLE,
<fs_line> TYPE any.
lo_struct_des ?= cl_abap_typedescr=>describe_by_name( 'your_Structure_name_here' ).
lt_struct_tab = lo_struct_des->get_components( ) .
lo_result_struct = cl_abap_structdescr=>create( p_components = lt_struct_tab ) .
lo_new_tab = cl_abap_tabledescr=>create( p_line_type = lo_result_struct
p_table_kind = cl_abap_tabledescr=>tablekind_std
p_unique = abap_false ).
CREATE DATA tab TYPE HANDLE lo_new_tab.
CREATE DATA line TYPE HANDLE lo_result_struct .
ASSIGN tab->* TO <fs_data>.
ASSIGN line->* TO <fs_line> .

Regex to extract first part of string in Apache Pig

I need to extract post code district from the input data below
AB55 4
DD7 6LL
DD5 2HI
My Code
A = load 'data' as postcode:chararray;
B = foreach A {
code_district = REGEX_EXTRACT(postcode,'<SOME EXP>',1);
generate code_district;
};
dump B;
Output should look like
AB55
DD7
DD5
What regular expression extracts the first part of the string?
Can you try the below Regex?
Option1:
A = LOAD 'input' as postcode:chararray;
code_district = FOREACH A GENERATE REGEX_EXTRACT(postcode,'(\\w+).*',1);
DUMP code_district;
Option2:
A = LOAD 'input' as postcode:chararray;
code_district = FOREACH A GENERATE REGEX_EXTRACT(postcode,'([a-zA-Z0-9]+).*',1);
DUMP code_district;
Output:
(AB55)
(DD7)
(DD5)
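
To see why the first option works, here is a quick check of the same pattern in Python (for illustration only, it is not Pig). The greedy `\w+` group stops at the first non-word character, which is the space in these postcodes. The helper name `district` is hypothetical.

```python
import re

# Same pattern as Option1; in Pig the backslash is doubled inside the string literal.
PATTERN = re.compile(r"(\w+).*")

postcodes = ["AB55 4", "DD7 6LL", "DD5 2HI"]


def district(postcode):
    """Return the leading run of word characters, or None if there is none."""
    match = PATTERN.match(postcode)
    return match.group(1) if match else None


districts = [district(p) for p in postcodes]
print(districts)  # ['AB55', 'DD7', 'DD5']
```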

Pig function to read characters after a separator

This is my input file
a1,hello.VDF
a2,rim.VIM
a3.dr.VDD
I need output as below
a1,VDF
a2,VIM
a3,VDD
My script is the following:
myinput = LOAD 'file' USING PigStorage(',') AS (t1:chararray, t2:chararray);
foreached = FOREACH myinput GENERATE t1, SUBSTRING(t2,INDEXOF(t2,'.',1),SIZE(t2));
It's throwing some error. Please help
Try this:
output = foreach myinput generate ((t1 matches '(.*)\\.(.*)'?SUBSTRING(t1, 0, 2):t1), (t1 matches '(.*)\\.(.*)'?SUBSTRING(t1, INDEXOF(t1,'.',0)+1, (int)SIZE(t1)):t2));
SIZE returns a long, but SUBSTRING takes int arguments, so you need an explicit cast. You also need +1 on the start index to skip the dot itself:
foreached =
FOREACH myinput GENERATE t1,SUBSTRING(t2,INDEXOF(t2,'.',1)+1,(int)SIZE(t2));
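
For reference, the intended transformation can be sketched in Python (for illustration only, it is not Pig). The sketch assumes that every line contains either a comma or a dot after the key, as in the sample input, and takes everything after the last dot as the extension. The function name `parse_line` is hypothetical.

```python
def parse_line(line):
    """Split a line into (key, extension after the last dot)."""
    if "," in line:
        # Normal rows: "a1,hello.VDF"
        key, rest = line.split(",", 1)
    else:
        # Rows like "a3.dr.VDD", where the first separator is a dot.
        key, rest = line.split(".", 1)
    # rfind returns -1 when there is no dot, so rest[0:] keeps the whole value.
    extension = rest[rest.rfind(".") + 1:]
    return key, extension


rows = ["a1,hello.VDF", "a2,rim.VIM", "a3.dr.VDD"]
result = [parse_line(r) for r in rows]
print(result)  # [('a1', 'VDF'), ('a2', 'VIM'), ('a3', 'VDD')]
```

Searching for the last dot rather than the first keeps the result correct when the value itself contains more than one dot.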

String to Date with convert

I would like a tip, please.
To convert a String to a Date in Velocity, I use:
$convert.parseDate($currentMessage.date.begin)
I also tried:
#set($str = $!currentMessage.date.begin)
$str
#set($dateTransforme = $date.toDate('yyyy-MM-dd', $date.date))
$dateTransforme
$dateTransforme.parseDate($str) <br />
N.B. $currentMessage.date.begin is a string.
When the template runs, the output is still the literal text:
$dateTransforme.parseDate($str)
Why? My string is in the format '2014-02-26'
Thanks,
Here is working code using DateTool:
#set ($toDateRegDate= $dateTool.toDate('yyyy-MM-dd hh:mm:ss Z', $Video.Date.getData()))
#set ($videoDate = $dateTool.format('MMMM dd,yyyy', $toDateRegDate,$locale))
$videoDate
In the end, this works:
## transformation of the data to dates: date.begin
#set($dateBegin = $date.toDate('yyyy-MM-dd',$!currentMessage.date.begin))
## transformation of the data to dates: date.end
#set($dateEnd = $date.toDate('yyyy-MM-dd',$!currentMessage.date.end))
Convert a string to a date object:
#set($dateObj = $date.toDate("dd/MM/yyyy", "08/09/2015"))
Format the date object as "yyyy-MM-dd":
#set( $dateFormated = $date.format("yyyy-MM-dd", $dateObj))
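
The same two steps, parse with one pattern and format with another, can be checked outside Velocity. The sketch below uses Python's datetime (for illustration only, it is not Velocity), where `%d/%m/%Y` and `%Y-%m-%d` play the role of the "dd/MM/yyyy" and "yyyy-MM-dd" patterns above.

```python
from datetime import datetime

# Step 1: parse the string into a date object (day/month/year).
date_obj = datetime.strptime("08/09/2015", "%d/%m/%Y")

# Step 2: format the date object as year-month-day.
formatted = date_obj.strftime("%Y-%m-%d")

print(formatted)  # 2015-09-08
```

The pattern passed to the parsing step has to describe the input string, not the desired output, so a string such as '2014-02-26' would need the year-month-day pattern when it is parsed.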