PDI Kettle Parameter in RegEx - pentaho

I'm new in pentaho's world and I wanted to use a parameter to use my regex.
It's working when I try my regex with the field Test regEx of a step Regex Evaluation.
But it's not working when I'm trying to run the kettle.
My RegEx is .*(?i)${TOOL_NAME}.*
I'm getting this from logs :
ERROR (version 5.0.6, build 1 from 2014-04-26_17-32-54 by buildguy) : Step [test regex.0] failed to initialize!
I'm already using parameters for other fields and it's working fine.
Anthony

The use of system variables in the regex component requires the setting use variable substitution to be activated. Otherwise the string ${PARAM} will be interpreted literally, most likely resulting in an invalid regular expression which can not be compiled.

Related

Django __iregex crashing for regular expression ^(\\. \\.)$

When I try to make an __iregex call using the regular expression '^(\\. \\.)$' I get:
DataError: invalid regular expression: parentheses () not balanced
I am using PSQL backend so the django documentation states that the equivalent SQL command should be
SELECT ... WHERE title ~* '^(\\. \\.)$';
When I run this query manually through the PSQL command line it works fine. Is there some bug with Django that I don't know about that is causing this to crash?
Edit: Also, it fails for variations of this regular expression, for example
'^(S\\. \\.)$'
'^(\\. S\\.)$'
'^(\\. \\.S)$'
The solution is to replace all " " characters with \s before sending the regexp into __iregex.

BigQuery - Illegal Escape Sequence

I'm having an issue matching regular expression in BigQuery. I have the following line of code that tries to identify user agents:
when regexp_contains((cs_user_agent), '^AppleCoreMedia\/1\.(.*)iPod') then "iOS App - iPod"
However, BigQuery doesn't seem to like escape sequences for some reason and I get this error that I can't figure out:
Syntax error: Illegal escape sequence: \/ at [4:63]
This code works fine in a regex validator I use, but BigQuery is unhappy with it and I can't figure out why. Thanks in advance for the help
Use regexp_contains((cs_user_agent), r'^AppleCoreMedia\/1\.(.*)iPod')

awk pattern to match an XML PI at the start of a line

I have an XML document containing a number of XML Processing Instructions which are of the form:
<?cpdoc something?>
I am trying to match them in awk with the pattern
/^\<\?cpdoc/
but it's not returning anything. If I remove the ^ anchor, it works (but I have other similar PIs which don't start a line which I don't want matched).
It looks as if it's being confused by the \<\? but why is it ignoring the line-start anchor?
Don't parse XML with regex, use a proper XML/HTML parser.
theory :
According to the compiling theory, XML can't be parsed using regex based on finite state machine. Due to hierarchical construction of XML you need to use a pushdown automaton and manipulate LALR grammar using tool like YACC.
realLife©®™ everyday tool in a shell :
You can use one of the following :
xmllint
xmlstarlet
saxon-lint (my own project)
Check: Using regular expressions with HTML tags
Example using xpath :
xmllint --xpath '//processing-instruction()' file.xml
Solution by OP and explanation by Ed Morton.
It works if the less-than is not escaped, as otherwise it's a word boundary. So instead of:
\<\?
I should use literal:
<\?
This is because we can't just go escaping any character and hoping for the best, we have to know which characters are metacharacters and then escape them if we want them treated as literal.

how insert ODI step error message in to Oracle table, if the error message has single quotes and colons

I'm trying to insert ODI step error message into oracle table.
I captured the error message using <%=odiRef.getPrevStepLog("MESSAGE")%>.
ODI-1226: Step PRC_POA_XML_synchronize fails after 1 attempt(s).
ODI-1232: Procedure PRC_POA_XML_synchronize execution fails.
ODI-1227: Task PRC_POA_XML_synchronize (Procedure) fails on the source XML connection XML_PFIZER_LOAD_POA_DB_DEV.
Caused By: java.sql.SQLException: class java.sql.SQLException
oracle.xml.parser.v2.XMLParseException: End tag does not match start tag 'tns3:ContctID'.
at com.sunopsis.jdbc.driver.xml.SnpsXmlFile.readDocument(SnpsXmlFile.java:459)
at com.sunopsis.jdbc.driver.xml.SnpsXmlFile.readDocument(SnpsXmlFile.java:469)
When I try to insert this into a table, I'm getting the following error:
Missing IN or OUT parameter at index:: 1
I tried with substr, replace. Nothing works as in middle of the error message we have a single quotes 'tns3:ContctID'.
Is there any way to insert this into a table?
that's a tough one if you want to use pure java BeanShell and you've given way too little details to get short and straight answer, like
how do you try to insert this (command on source/target, bean shell only, Oracle SQL +jBS, jython, groovy etc...)
The problem here is not only quotes but also newlines.
To replace them is even more difficult as every parsing step <%, <?, <# requires different trick to define those literals
What will work for sure is if you write Jython task for inserting log data (Jython in technology).
There you may use Python ability for multiline string literals
simply:
⋮
err_log = """
<?=odiRef.getPrevStepLog("MESSAGE")?>
"""
⋮
I faced this error few days back . I applied below mentioned solution in ODI ...
Use - q'#<%=odiRef.getPrevStepLog("MESSAGE")%>#'
This will escape inverted comma (') for INSERT statement.
I have used this in my code and it is working fine :)
For example -
select 'testing'abcd' from dual;
this query will give below error
"ORA-01756: quoted string not properly terminated"
select q'#testing'abcd#' from dual;
This query gives no error and we get below response in SQL Developer
testing'abcd

SSIS pkg does not recognize the file path

I'm using the 2012 version of Visual Studio to build an SSIS package. I have a variable var_root which has the string value - 'C:\Projects\OBC\Clients\ABC'. When I try to run the pkg, I get the following error:
Error: The expression contains unrecognized token "C". If "C" is a variable, it should be expressed as "#C". The specified token is not valid. If the token is intended to be a variable name, it should be prefixed with the # symbol.
Error: Attempt to parse the expression "C:\Projects\OBC\Clients\ABC" failed and returned error code 0xC00470A4. The expression cannot be parsed. It might contain invalid elements or it might not be well-formed. There may also be an out-of-memory error.
Now, this runs fine in the 2008 version of the Business Intelligence studio. I don't know how to specify the variable name. Please help me if possible. Thanks
The SSIS expression language is a C based language and the \ is a token, this means you have to escape it with another one. i.e "\" becomes "\", unlike C# you can't prefix the string with a #, you have to use the escaping route.
In summary when ever you want to use \ you need to use two \
Why use the expression though when you can set the value directly in the values column for the variable - without the quotes or double slashes - Just - C:\Projects\OBC\Clients\ABC