Apache Pig, including endpoint for DaysBetween - apache-pig

How can I verify if the DaysBetween() function in pig has considered the ending point or not? For example, if I try
grunt> DaysBetween(ToDate(1994-12-04, 'yyyy-mm-dd'), ToDate(1994-12-04, 'yyyy-mm-dd'));
I want to see if it returns me a 1 or 0.
However, it gives an error message of
mismatched input '-' expecting RIGHT_PAREN
So what does this message mean? Does the function consider ending point or not; or is there an built-in option for that?
The same question also applies to the function of SubtractDuration().

You have a syntax error in your statement - you need to surround both instances of 1994-12-04 in quotes (so '1994-12-04').

Related

Extract characters between a string and the first occurrence of something in BigQuery

I want to extract a set of characters between "u1=" and the first semi-colon using a regex. For instance, given the following string: id=1w54;name=nick;u1=blue;u2=male;u3=ohio;u5=
The desired regex output should be just blue.
I tested (?<=u1=)[^;]* on https://regex101.com and it works. However, when I run this in BigQuery, using regexp_extract(string, '(?<=u1=)[^;]*') , I get an error that reads "Cannot parse regular expression: invalid perl operator: (?<"
I'm confused why this isn't working in BQ. Any help would be appreciated.
You can use regexp_extract() like this:
regexp_extract(string, 'u1=([^;]+)')

Result exceeded maximum length error

The below OREPLACE query is throwing the error.
Select cast( OREPLACE (SimpledefinitionQuery , 'gpi','gpiREPLC') as varchar(40000)) as repl
from SimpleDef0;
The return string in the OREPLACE function is set to max of 64000. When I checked the length of column SimpledefinitionQuery, it does not exceed 16000. So I am unable to find why I am getting the error.
Also when I replace 'gpi' with 'gpiRPLC', the query works perfectly. What is going wrong here?
Thanks
According to this Teradata support page, when using OREPLACE the returned string also depends on the second and the third arguments
OREPLACE (SimpledefinitionQuery , 'gpi','gpiREPLC')
OREPLACE function implicitly converts source string(first argument) to UNICODE when second or third argument is literal(UNICODE) even if the source string is LATIN.
Thus maybe check if the function works if you truncate SimpledefinitionQuery for the first 8000 characters (as suggested in #dnoeth comment it returns Unicode VARCHAR(8000))? Or change the literal type of 2nd and 3rd arguments to Latin as well.

PostgreSQL RETURNING fails with REGEXP_REPLACE

I'm running PostgreSQL 9.4 and are inserting a lot of records into my database. I use the RETURNING clause for further use after an insert.
When I simply run:
... RETURNING my_car, brand, color, contact
everything works, but if I try to use REGEXP_REPLACE it fails:
... RETURNing my_car, brand, color, REGEXP_REPLACE(contact, '^(\+?|00)', '') AS contact
it fails with:
ERROR: invalid regular expression: quantifier operand invalid
If I simply run the query directly in PostgreSQL it does work and return a nice output.
Tried to reproduce and failed:
t=# create table s1(t text);
CREATE TABLE
t=# insert into s1 values ('+4422848566') returning REGEXP_REPLACE(t, '^(\+?|00)', '');
regexp_replace
----------------
4422848566
(1 row)
INSERT 0 1
So elaborated #pozs suggested reason:
set standard_conforming_strings to off;
leads to
WARNING: nonstandard use of escape in a string literal
LINE 1: ...alues ('+4422848566') returning REGEXP_REPLACE(t, '^(\+?|00)...
^
HINT: Use the escape string syntax for escapes, e.g., E'\r\n'.
ERROR: invalid regular expression: quantifier operand invalid
update
As OP author says standard_conforming_strings is on as supposed from 9.1 by default working with psql and is off working with pg-prommise
update from vitaly-t
The issue is simply with the JavaScript literal escaping, not with the
flag.
He elaborates further in his answer
The current value of environment variable standard_conforming_strings is inconsequential here. You can see it if you prefix your query with SET standard_conforming_strings = true;, which will change nothing.
Passing in a regEx string unescaped from the client is the same as using E prefix from the command line: E'^(\+?|00)'.
In JavaScript \ is treated as a special symbol, and you simply always have to provide \\ to indicate the symbol, which is what needed for your regular expressions.
Other than that, pg-promise will escape everything correctly, here's an example:
db.any("INSERT INTO users(name) VALUES('hello') RETURNING REGEXP_REPLACE(name, $1, $2)", ['^(\\+?|00)', 'replaced'])
To understand how the command-line works, prefix the regex string with E:
db.any("INSERT INTO users(name) VALUES('hello') RETURNING REGEXP_REPLACE(name, E$1, $2)", ['^(\\+?|00)', 'replaced'])
And you will get the same error: invalid regular expression: quantifier operand invalid.

how insert ODI step error message in to Oracle table, if the error message has single quotes and colons

I'm trying to insert ODI step error message into oracle table.
I captured the error message using <%=odiRef.getPrevStepLog("MESSAGE")%>.
ODI-1226: Step PRC_POA_XML_synchronize fails after 1 attempt(s).
ODI-1232: Procedure PRC_POA_XML_synchronize execution fails.
ODI-1227: Task PRC_POA_XML_synchronize (Procedure) fails on the source XML connection XML_PFIZER_LOAD_POA_DB_DEV.
Caused By: java.sql.SQLException: class java.sql.SQLException
oracle.xml.parser.v2.XMLParseException: End tag does not match start tag 'tns3:ContctID'.
at com.sunopsis.jdbc.driver.xml.SnpsXmlFile.readDocument(SnpsXmlFile.java:459)
at com.sunopsis.jdbc.driver.xml.SnpsXmlFile.readDocument(SnpsXmlFile.java:469)
When I try to insert this into a table, I'm getting the following error:
Missing IN or OUT parameter at index:: 1
I tried with substr, replace. Nothing works as in middle of the error message we have a single quotes 'tns3:ContctID'.
Is there any way to insert this into a table?
that's a tough one if you want to use pure java BeanShell and you've given way too little details to get short and straight answer, like
how do you try to insert this (command on source/target, bean shell only, Oracle SQL +jBS, jython, groovy etc...)
The problem here is not only quotes but also newlines.
To replace them is even more difficult as every parsing step <%, <?, <# requires different trick to define those literals
What will work for sure is if you write Jython task for inserting log data (Jython in technology).
There you may use Python ability for multiline string literals
simply:
⋮
err_log = """
<?=odiRef.getPrevStepLog("MESSAGE")?>
"""
⋮
I faced this error few days back . I applied below mentioned solution in ODI ...
Use - q'#<%=odiRef.getPrevStepLog("MESSAGE")%>#'
This will escape inverted comma (') for INSERT statement.
I have used this in my code and it is working fine :)
For example -
select 'testing'abcd' from dual;
this query will give below error
"ORA-01756: quoted string not properly terminated"
select q'#testing'abcd#' from dual;
This query gives no error and we get below response in SQL Developer
testing'abcd

regexp_matches() returns two matches for $ (end of string)

Can somebody explain this odd behavior of regexp_matches() in PostgreSQL 9.2.4 (same result in 9.1.9):
db=# SELECT regexp_matches('test string', '$') AS end_of_string;
end_of_string
---------------
{""}
(1 row)
db=# SELECT regexp_matches('test string', '$', 'g') AS end_of_string;
end_of_string
---------------
{""}
{""}
(2 rows)
-> SQLfiddle demo.
The second parameter is a regular expression. $ marks the end of the string.
The third parameter is for flags. g is for "globally", meaning the the function doesn't stop at the first match.
The function seems to report the end of the string twice with the g flag, but that can only exist once per definition. It breaks my query. :(
Am I missing something?
I would need my query to return one more row at the end, for any possible string. I expected this query to do the job, but it adds two rows:
SELECT (regexp_matches('test & foo/bar', '(&|/|$)', 'ig'))[1] AS delim
I know how to manually add a row, but I want to let the function take care of it.
It looks like it was a bug in PostgreSQL. I verified for sure it is fixed in 9.3.8. Looking at the release notes, I see possible references in:
9.3.4
Allow regular-expression operators to be terminated early by query
cancel requests (Tom Lane)
This prevents scenarios wherein a pathological regular expression
could lock up a server process uninterruptably for a long time.
9.3.6
Fix incorrect search for shortest-first regular expression matches
(Tom Lane)
Matching would often fail when the number of allowed iterations is
limited by a ? quantifier or a bound expression.
Thanks to Erwin for narrowing it down to 9.3.x.
I am not sure about what I am going to say because I don't use PostgreSQL so this is just me thinking out loud.
Since you are trying to match the end of string/line $, then in the first situation the outcome is expected, but when you turn on global match modifier g and because matching the end of line character doesn't actually consume or read any characters from the input string then the next match attempt will start where the first one left off, that is at the end of string and this will cause an infinite loop if it kept going like that so PostgreSQL engine might be able to detect this and stop it to prevent a crash or an infinite loop.
I tested the same expression in RegexBuddy with POSIX ERE flavor and it caused the program to become unresponsive and crash and this is the reason for my reasoning.
the same occurs for example in C# in which I had the same problem recently so I think this is a normal behaviour for regexps
this is because $ doesn't stand for a specific sign but a specific position instead
so $ doesn't really match anything and the position of parser stays in the same position
you need to change your convention a little;
to test for an empty string you can use ^$
This was a bug that has been fixed in Postgres 9.3. See accepted answer.
For Postgres 9.2 or older: A halfway decent workaround for my situation would be to use the expression .$ instead - matches for any string once at the last character:
WITH x(id, t) AS (
VALUES
(1, 'test & foo/bar')
,(2, 'test')
,(3, '') -- empty string
,(4, 'test & foo/') -- other branch as last character
)
SELECT id, (regexp_matches(t, '(&|/|.$)', 'ig'))[1] AS delim
FROM x;
But it fails for empty strings.
And it fails if the last character happens to match another branch. Like: 'foo/bar/'.
And it isn't perfect to have the actual final character returned. An empty string would be much preferable.
-> SQLfiddle.