Biq Query regex_replace error (\? vs \\?) - sql

I am having issues understanding what's wrong with this regex: \?.*
select REGEXP_REPLACE(longstringcolumn, '\?.*', '') as newstring from tablename
My example string aka 'longstring' has '?' character, and I am trying to match everything trailing '?' (including '?' itself).
I have checked my regexpr in online tools and my regex seems to be working.
Edit
Thanks guys for being so quick,
Here's a sample string (it's a url):
http://example.com/one/two/three?lang=en&region=CN
I am trying to strip off everything after '?'. So this part:
?lang=en&region=CN
This is the error I am being returned: Failed to parse regular expression "?": no argument for repetition operator: ?
I am really leaning towards this being a simple escape character issue but I can't figure it out somehow.

#standardSQL
SELECT REGEXP_REPLACE(longstringcolumn, '\\?.*', '') AS newstring
FROM tablename
or
#standardSQL
SELECT REGEXP_REPLACE(longstringcolumn, r'\?.*', '') AS newstring
FROM tablename
example below
#standardSQL
WITH tablename AS (
SELECT 'is this a question?abc ' AS longstringcolumn UNION ALL
SELECT 'this is not a question' union all
SELECT'http://example.com/one/two/three?lang=en&region=CN'
)
SELECT REGEXP_REPLACE(longstringcolumn, r'\?.*', '') AS newstring
FROM tablename
with result as (where ? and all trailing chars are removed)
Row newstring
1 is this a question
2 this is not a question
Hope this shows what was wrong with your original query

Related

Regex: how to get the text between a few colons?

So, i have a lot of strings like the ones below in my database:
product1:1stparty:single_aduls:android:
product2:3rdparty:married_adults:ios:
product3:3rdparty:other_adults:android:
I need a regex to get only the text after the product name and before the device category. So, in the first line I'd get 1stparty:single_aduls, in the second 3rdparty:married_adults and in the third 3rdparty:other_adults. I'm stuck and can't find a way to solve that. Could anyone help me please?
As a regular expression, you can use:
select regexp_extract('product1:1stparty:single_aduls:android:', '^[^:]*:(.*):[^:]*:$')
This returns every after the first colon and before the penultimate colon.
We can try using REGEXP_REPLACE here:
SELECT REGEXP_REPLACE(val, r"^.*?:|:[^:]+:$", "") AS output
FROM yourTable;
This approach removes either the leading ...: or trailing :...: from the column, leaving behind the content you want. Here is a demo showing that the regex replacement is working:
Demo
You can also use standard split function and access result array element by index, which is quite clear to read and understand.
with a as (
select split('product1:1stparty:single_aduls:android:', ':') as splitted
)
select splitted[ordinal(2)] || ':' || splitted[ordinal (3)] as subs
from a
Consider below example
with your_table as (
select 'product1:1stparty:single_aduls:android:' txt union all
select 'product2:3rdparty:married_adults:ios:' union all
select 'product3:3rdparty:other_adults:android:'
)
select *,
(
select string_agg(part, ':' order by offset)
from unnest(split(txt, ':')) part with offset
where offset in (1, 2)
) result
from your_table
with output

How to select rows with only Numeric Characters in Oracle SQL

I would like to keep rows only with Numeric Character i.e. 0-9. My source data can have any type of character e.g. 2,%,( .
Input (postcode)
3453gds sdg3
454232
sdg(*d^
452
Expected Output (postcode)
454232
452
I have tried using WHERE REGEXP_LIKE(postcode, '^[[:digit:]]+$');
however in my version of Oracle I get an error saying
function regexp_like(character varying, "unknown") does not exist
You want regexp_like() and your version should work:
select t.*
from t
where regexp_like(t.postcode, '^[0-9]+$');
However, your error looks more like a Postgres error, so perhaps this will work:
t.postcode ~ '^[0-9]+$'
For Oracle 10 or higher you can use regexp functions. In earlier versions translate function will help you :
SELECT postcode
FROM table_name
WHERE length(translate(postcode,'0123456789','1')) is null
AND postcode IS NOT NULL;
OR
SELECT translate(postcode, '0123456789' || translate(postcode,'x123456789','x'),'0123456789') nums
FROM table_name ;
the above answer also works for me
SELECT translate('1234bsdfs3#23##PU', '0123456789' || translate('1234bsdfs3#23##PU','x123456789','x'),'0123456789') nums
FROM dual ;
Nums:
1234323
For an alternative to the Gordon Linoff answer, we can try using REGEXP_REPLACE:
SELECT *
FROM yourTable
WHERE REGEXP_REPLACE(postcode, '[0-9]+', '') IS NULL;
The idea here is to strip away all digit characters, and then assert that nothing were left behind. For a mixed digit-letter value, the regex replacement would result in a non-empty string.

What's the equivalent of Excel's `left(find(), -1)` in BigQuery?

I have names in my dataset and they include parentheses. But, I am trying to clean up the names to exclude those parentheses.
Example: ABC Company (Somewhere, WY)
What I want to turn it into is: ABC Company
I'm using standard SQL with google big query.
I've done some research and I know big query has left(), but I do not know the equivalent of find(). My plan was to do something that finds the ( and then gives me everything to the left of -1 characters from the (.
My plan was to do something that finds the ( and then gives me everything to the left of -1 characters from the (.
Good plan! In BigQuery Standard SQL - equivalent of LEFT is SUBSTR(value, position[, length]) and equivalent of FIND is STRPOS(value1, value2)
With this in mind your query can look like (which is exactly as you planned)
#standardSQL
WITH names AS (
SELECT 'ABC Company (Somewhere, WY)' AS name
)
SELECT SUBSTR(name, 1, STRPOS(name, '(') - 1) AS clean_name
FROM names
Usually, string functions are less expensive than regular expression functions, so if you have pattern as in your example - you should go with above version
But in more generic cases, when pattern to clean is more dynamic like in Graham's answer - you should go with solution in Graham's answer
Just use REGEXP_REPLACE + TRIM. This will work with all variants (just not nested parentheses):
#standardSQL
WITH
names AS (
SELECT
'ABC Company (Somewhere, WY)' AS name
UNION ALL
SELECT
'(Somewhere, WY) ABC Company' AS name
UNION ALL
SELECT
'ABC (Somewhere, WY) Company' AS name)
SELECT
TRIM(REGEXP_REPLACE(name,r'\(.*?\)',''), ' ') AS cleaned
FROM
names
Use REGEXP_EXTRACT:
SELECT
RTRIM(REGEXP_EXTRACT(names, r'([^(]*)')) AS new_name
FROM yourTable
The regex used here will greedily consume and match everything up until hitting an opening parenthesis. I used RTRIM to remove any unwanted whitespace picked up by the regex.
Note that this approach is robust with respect to the edge case of an address record not having any term with parentheses. In this case, the above query would just return the entire original value.
I can't test this solution at the moment, but you can combine SUBSTR and INSTR. Like this:
SELECT CASE WHEN INSTR(name, '(') > 0 THEN SUBSTR( name, 1, INSTR(name, '(') ) ELSE name END as name FROM table;

how to escape brackets in regex replace in oracle sql

My query is not working as expected :
SELECT REGEXP_REPLACE('This is testing data SIL(TM)T for once ',
'SIL(TM)T', 'SIL<REFERENCE ID="8208" TYPE="trademark"/>T')as
Newdescriptiontext from dual
output should be:
This is testing data SIL<REFERENCE ID="8208" TYPE="trademark"/>T for once
which is not the case .Need guidance .
try replace instead of regexp_replace
SELECT REPLACE('This is testing data SIL(TM)T for once ',
'SIL(TM)T', 'SIL<REFERENCE ID="8208" TYPE="trademark"/>T')as
Newdescriptiontext from dual;
You simply have to escape the parentheses:
SELECT REGEXP_REPLACE('This is testing data SIL(TM)T for once ',
'SIL\(TM\)T', 'SIL<REFERENCE ID="8208" TYPE="trademark"/>T')as Newdescriptiontext
from dual
In a regexp, they are used to delimit a "subexpression", so '(TM)' matches 'TM'; if you escape them, they'll be interpreted as plain characters, thus having '\(TM\)' matching '(TM)'

Best way to parse and concatenate a string with SQL

I'm trying to turn object_type,ABC,00,DEF,XY string into ABC-00-DEF-XY-
Here's what I've got, I'm wondering if there is a more efficient way?
CONCAT(
REPLACE(
SUBSTR(a.object_name,
INSTR(a.object_name, ',',1,1)+1,
INSTR(a.object_name, ',',1,2)+1
),',','-'
),'-'
)
Clarification: I need to strip off everything up to and including the first comma, replace all remaining commas with dashes, and then add a dash onto the end.
Try this
replace(substr(a.object_name,instr(a.object_name,',',1,1) + 1),',','-') ||'-'
Rexexp_replace() regular expression function can come in handy in this situation as well:
select ltrim(
regexp_replace( col
, '([^,]+)|,([^,]+)', '\2-'
)
, '-'
) as res
from t1
Result:
RES
--------------
ABC-00-DEF-XY-
SQLFiddle Demo
I suggest using the following code:
REPLACE(SUBSTRING(a.object_name,13,LEN(#object_name)-11),',','-') + '-'