PIG filter by string - apache-pig

I'm trying to filter by a string but it fails. How do I filter by String? I tried looking here http://pig.apache.org/docs/r0.10.0/basic.html#comparison but matches only works for regular expressions.
f = FILTER finished_set by item is not matches '.*0000000000000.*';
ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. mismatched input 'matches' expecting NULL

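In Pig, "is not" is only valid in front of null, which is why the parser reports "expecting NULL". To negate a MATCHES test, wrap it in NOT:
f = FILTER finished_set BY NOT(item MATCHES '.*0000000000000.*');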
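For readers who think in Python, the filter above behaves roughly like the sketch below. Pig's MATCHES has to match the whole string (it follows Java regular expression matching), which is why the pattern is wrapped in .* on both sides; re.fullmatch is the closest Python counterpart. The sample items are made up for illustration and are not from the question.

```python
import re

# The literal searched for in the Pig statement.
needle = "0000000000000"

# Hypothetical sample values standing in for the `item` field of finished_set.
finished_set = ["abc" + needle + "xyz", "item-42", needle, "12345"]

# Same pattern shape as in the Pig statement: '.*<needle>.*'
pattern = re.compile(".*" + needle + ".*")

# Rough equivalent of: FILTER finished_set BY NOT(item MATCHES '...');
f = [item for item in finished_set if not pattern.fullmatch(item)]

print(f)
```

Only the items that do not contain the run of zeros survive the filter.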

Related

Datatable.Select() cannot filter a value contains "#"

The database has data like this:
OVS0001#R1
OVS0001#R2
OVS0001#R3
I try to filter using the Select() function:
Datatable.Select(" ITEM_CODE='OVS0001#R1' ")
The error message is:
The expression contains invalid date constant '#R1"'.

Using REPLACE_REGEXPR in BW transformation throws syntax error

I'm trying to implement a routine for replacing some invalid characters in a BW transformation. But I keep getting a syntax error. This is my current code:
METHOD S0001_G01_R40 BY DATABASE PROCEDURE FOR HDB LANGUAGE SQLSCRIPT
  OPTIONS READ-ONLY.
  -- target field: 0POSTXT
  -- Note the _M class are not considered for DTP execution.
  -- AMDP Breakpoints must be set in the _A class instead.
  outTab = SELECT REPLACE_REGEXPR('([^[:print:]|^[\x{00C0}-\x{017F}]|[#])'
                    IN "SGTXT" WITH '' OCCURRENCE ALL ) AS "/BI0/OIPOSTXT"
             FROM :inTab;
  errorTab = SELECT '' AS ERROR_TEXT,
                    '' AS SQL__PROCEDURE__SOURCE__RECORD FROM DUMMY
              WHERE DUMMY <> 'X';
ENDMETHOD.
I keep getting the following error:
SQLSCRIPT message: return type mismatch: Procedure
/BIC/QCW72C4IJDC8JAFRICAU_M=>S0001_G01_R40: OUTTAB[ /BI0/OIPOSTXT:NVARCHAR(5000) ]
!= expected result [ POSTXT:NVARCHAR(60) RECORD:NVARCHAR(56)
SQL__PROCEDURE__SOURCE__RECORD:NVARCHAR(56) ]
Can anyone give me an idea of what I'm doing wrong here?
For those wondering how to correct this problem, here is the solution.
Everything is in the error message:
OUTTAB[ /BI0/OIPOSTXT:NVARCHAR(5000) ]
!= expected result [ POSTXT:NVARCHAR(60) RECORD:NVARCHAR(56)
SQL__PROCEDURE__SOURCE__RECORD:NVARCHAR(56) ]
It means that the result table outTab contains only one field (/BI0/OIPOSTXT), so it differs from the expected outTab, which should contain three fields: POSTXT, RECORD and SQL__PROCEDURE__SOURCE__RECORD.
The expected structure can usually be seen at the top of the public section:
types:
  begin of TN_S_IN_S0001_G01_R1_1,
    POSTXT type C length 60,
    RECORD type C length 56,
    SQL__PROCEDURE__SOURCE__RECORD type C length 56,
  end of TN_S_IN_S0001_G01_R1_1 .
So the correct syntax would be:
outTab =
  SELECT CAST(REPLACE_REGEXPR('([^[:print:]|^[\x{00C0}-\x{017F}]|[#])' IN "SGTXT" WITH '' OCCURRENCE ALL) AS NVARCHAR(60)) AS "POSTXT"
       , "RECORD" AS "RECORD"
       , SQL__PROCEDURE__SOURCE__RECORD AS "SQL__PROCEDURE__SOURCE__RECORD"
    FROM :inTab;
Regards,
Jean-Guillaume
You might want to enclose the regex expression in a CAST( ... AS NVARCHAR(60)) to ensure that the resulting record structure matches the expected return type.

query using string in PyTables 3

I have a table:
from tables import *

h5file = open_file("ex.h5", "w")

class ex(IsDescription):
    A = StringCol(5, pos=0)
    B = StringCol(5, pos=1)
    C = StringCol(5, pos=2)

table = h5file.create_table('/', 'table', ex, "Passing string as column name")
table = h5file.root.table
rows = [
    ('abc', 'bcd', 'dse'),
    ('der', 'fre', 'swr'),
    ('xsd', 'weq', 'rty')
]
table.append(rows)
table.flush()
I am trying to query as per below:
find = 'swr'
creteria = 'B'
if creteria == 'B':
    condition = 'B'
else:
    condition = 'C'
value = [x['A'] for x in table.where("""condition==find""")]
print(value)
It returns:
ValueError: there are no columns taking part in condition condition==find
Is there a way to use condition as a column name in above query?
Thanks in advance.
Yes, you can use PyTables' Table.where() to search based on a condition. The problem is how you constructed the query passed to table.where(condition). See the note about strings under Table.where() in the PyTables User's Guide:
A special care should be taken when the query condition includes string literals. ... Python 3 strings are unicode objects.
in Python 3, “condition” should be defined like this:
condition = 'col1 == b"AAAA"'
The reason is that in Python 3 “condition” implies a comparison between a string of bytes (“col1” contents) and an unicode literal (“AAAA”).
The simplest form of your query is shown below. It returns the subset of rows that match the condition. Note the use of single quotes around the condition and double quotes around the string literal inside it:
query_table = table.where('C=="swr"') # search in column C
I rewrote your example as best I could. See below. It shows several ways to enter the condition. I'm not smart enough to figure out how to combine your creteria and find variables into a single condition variable with string and unicode characters.
from tables import *

class ex(IsDescription):
    A = StringCol(5, pos=0)
    B = StringCol(5, pos=1)
    C = StringCol(5, pos=2)

h5file = open_file("ex.h5", "w")
table = h5file.create_table('/', 'table', ex, "Passing string as column name")
## table=h5file.root.table
rows = [
    ('abc', 'bcd', 'dse'),
    ('der', 'fre', 'swr'),
    ('xsd', 'weq', 'rty')
]
table.append(rows)
table.flush()

find = 'swr'
query_table = table.where('C==find')
for row in query_table:
    print(row)
    print(row['A'], row['B'], row['C'])

value = [x['A'] for x in table.where('C == "swr"')]
print(value)
value = [x['A'] for x in table.where('C == find')]
print(value)
h5file.close()
Output shown below:
/table.row (Row), pointing to row #1
b'der' b'fre' b'swr'
[b'der']
[b'der']
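One possible way to get the column name from a variable is to build the condition text with ordinary string formatting and hand the search value to Table.where() through its condvars argument. The sketch below is only that, a sketch: build_condition and find_a are hypothetical helper names, and encoding the value to bytes is an assumption on my part (StringCol data comes back as bytes, as the output above shows). I have not run it against the table from the question.

```python
def build_condition(criteria, find):
    """Return (condition, condvars) suitable for Table.where()."""
    # Pick the column name from the caller's criteria, as in the question.
    column = 'B' if criteria == 'B' else 'C'
    # The column name is spliced into the condition text; the search value
    # stays a variable named `value` and is supplied through condvars.
    condition = '{} == value'.format(column)
    # Assumption: the column holds ASCII bytes, so encode the Python 3 str.
    condvars = {'value': find.encode('ascii')}
    return condition, condvars


def find_a(table, criteria, find):
    """Collect column A for rows matching `find` in the chosen column."""
    condition, condvars = build_condition(criteria, find)
    return [row['A'] for row in table.where(condition, condvars=condvars)]


condition, condvars = build_condition('C', 'swr')
print(condition)
print(condvars)
```

With an open table, the call would then be find_a(table, 'C', 'swr'). Keeping the value in condvars avoids having to nest byte-string quotes inside the condition text.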

QueryDSL like operation SimplePath

Similarly to this question I would like to perform an SQL "like" operation using my own user defined type called "AccountNumber".
In the QueryDSL entity class, the field that defines the column looks like this:
public final SimplePath<com.myorg.types.AccountNumber> accountNumber;
I have tried the following code to achieve a "like" operation in SQL but get an error when the types are compared before the query is run:
final Path path=QBusinessEvent.businessEvent.accountNumber;
final Expression<AccountNumber> constant = Expressions.constant(AccountNumber.valueOfWithWildcard(pRegion.toString()));
final BooleanExpression booleanOperation = Expressions.booleanOperation(Ops.STARTS_WITH, path, constant);
expressionBuilder.and(booleanOperation);
The error is:
org.springframework.dao.InvalidDataAccessApiUsageException: Parameter value [7!%%] did not match expected type [com.myorg.types.AccountNumber (n/a)]
Has anyone ever been able to achieve this using QueryDSL/JPA combination?
Did you try using a String constant instead?
Path<?> path = QBusinessEvent.businessEvent.accountNumber;
Expression<String> constant = Expressions.constant(pRegion.toString());
Predicate predicate = Expressions.predicate(Ops.STARTS_WITH, path, constant);
In the end, I was given a tip by my colleague to do the following:
if (pRegion != null) {
    expressionBuilder.and(Expressions.booleanTemplate("{0} like concat({1}, '%')", qBusinessEvent.accountNumber, pRegion));
}
This seems to do the trick!
It seems like there is a bug or an ambiguity here. In my case, I need to search by a couple of fields with different types (String, Number), e.g. the SQL looks like:
SELECT * FROM table AS t WHERE t.name = "%some%" OR t.id = "%some%";
My code looks like:
BooleanBuilder where = _getDefaultPredicateBuilder();
BooleanBuilder whereLike = new BooleanBuilder();
for (String likeField : _likeFields) {
    whereLike = whereLike.or(_pathBuilder.getString(likeField).contains(likeValue));
}
where.and(whereLike);
If the first field in _likeFields is of type String, the request works fine; otherwise it throws an exception.

Group by expression in pig

Suppose I have a dataset with tuples (f1, f2). I want to get my data in two bags: one where f1 is null and the other where f1 is not null. I try:
raw = LOAD 'somedata' USING PigStorage() AS (f1:chararray, f2:chararray);
raw_group = GROUP raw BY f1 is null;
raw_count = FOREACH raw_group GENERATE group, COUNT_STAR(raw);
I expect to get two groups with keys true and false. When I run it in grunt I get the following:
2013-12-26 14:56:10,958 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1200: <line 1046, column 25> Syntax error, unexpected symbol at or near 'f1'
I can use a workaround:
raw_group = GROUP raw BY (f1 is null)?0:1;
but I would really like to understand what's going on here, as I have just started to learn Pig. According to the Pig documentation, I can use expressions as a grouping key. Am I missing something here, or are nulls treated differently in Pig?
The boolean datatype was introduced in Pig 0.10. In earlier versions, the expression f1 is null, being a boolean, can't appear as a field in a relation, which it would have to do as the value of group. Before Pig 0.10, booleans could only be used in FILTER statements or in the ternary operator, as in your workaround.
While I haven't tried this out, presumably if you were to attempt the same thing in Pig 0.10 or later, your original attempt would succeed.
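As a rough illustration of what the ternary workaround computes, here is the same grouping sketched in Python. The sample tuples are invented, and None stands in for a Pig null; this only mirrors the logic and is not Pig code.

```python
from collections import Counter

# Hypothetical (f1, f2) tuples; None plays the role of a Pig null.
raw = [("a", "x"), (None, "y"), ("b", "z"), (None, "w"), ("c", "v")]

# Same key as the workaround: (f1 is null) ? 0 : 1
# Counter then plays the role of GROUP ... followed by COUNT_STAR.
raw_count = Counter(0 if f1 is None else 1 for f1, _ in raw)

print(sorted(raw_count.items()))
```

The key 0 collects the tuples whose f1 is null and the key 1 collects the rest, which is the two-group result the question was after.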