I am trying to remove the last 8 characters from a long string, but only when it ends with a 6-character string in parentheses (the bolded ones). Does anyone know how to do this in BigQuery?
Here are some random data examples:
01/5/2014 - new planted trees - email - juniper
04/22/2021 - fridge remote, I want fresh tea (xgssjj)
re- engagement email
5/20 - example reminder (hfgfgh)
repeat customer example #2 (ttrdgd)
Thanks!
Consider the approach below:
select longString,
trim(regexp_replace(longString, r'\(\w{6}\)$', '')) newString
from your_table
If applied to the sample data in your question, the output is:
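The effect of the pattern on the sample rows can also be checked outside BigQuery. This is a minimal Python sketch, assuming Python's re module behaves like BigQuery's RE2 engine for this simple pattern; the helper name strip_code is hypothetical.

```python
import re

# Hypothetical helper mirroring trim(regexp_replace(s, r'\(\w{6}\)$', '')).
def strip_code(s: str) -> str:
    # Drop a trailing "(xxxxxx)" group of exactly six word characters,
    # then trim the whitespace that was in front of it.
    return re.sub(r'\(\w{6}\)$', '', s).strip()

samples = [
    "01/5/2014 - new planted trees - email - juniper",
    "04/22/2021 - fridge remote, I want fresh tea (xgssjj)",
    "re- engagement email",
    "5/20 - example reminder (hfgfgh)",
    "repeat customer example #2 (ttrdgd)",
]

for s in samples:
    print(repr(strip_code(s)))
```

Rows without a trailing six-character group in parentheses come back unchanged, which is the "only in case it ends with" condition from the question.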
I have two columns with the following data:
Column 1: BIG123 - Telecommunications (John Barrot)
Column 2: 7 Congressional 1 - Toward
The data format is the same, with spaces and "-" as the delimiter for each column, but the organization, names, and beginning code can be longer or shorter than what you see here (instead of Telecommunications it can be CEO, or instead of John Barrot it can be Guy Rodriguez, etc.). I need to extract the following:
(Column names are on the left)
Organization: Telecommunications
Supervisor: John Barrot
Profile: Congressional 1 - Toward
I have been using the following cheat sheet but I am still having issues extracting: https://cheatography.com/davechild/cheat-sheets/regular-expressions/
I have tried regex_extract(column1, [A-Z][a-z]) and I only get the first two letters of column 1 after the "-".
Any help would be great.
Thanks,
DW
With your example, try the following:
with sample_data as (
select 'BIG123 - Telecommunications (John Barrot)' AS COLUMN_1, '7 Congressional 1 - Toward' as COLUMN_2
)
select regexp_extract(COLUMN_1, r'.+-\s(\S+)') as Organization
, regexp_extract(COLUMN_1, r'.+\((.+\w)') as Supervisor
, regexp_extract(COLUMN_2, r'\d+\s(.+)') as Profile
from sample_data
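The three patterns can be sanity-checked outside BigQuery. This is a minimal Python sketch, assuming Python's re module agrees with BigQuery's RE2 engine on these patterns. The row with a two-word organization and the ORG_MULTI pattern are hypothetical additions for illustration, not part of the answer above.

```python
import re

# Patterns copied from the query above.
ORG = r'.+-\s(\S+)'
SUPERVISOR = r'.+\((.+\w)'
PROFILE = r'\d+\s(.+)'

def extract(pattern: str, value: str):
    """Return the first capture group, or None when nothing matches."""
    m = re.search(pattern, value)
    return m.group(1) if m else None

col1 = 'BIG123 - Telecommunications (John Barrot)'
col2 = '7 Congressional 1 - Toward'

print(extract(ORG, col1))
print(extract(SUPERVISOR, col1))
print(extract(PROFILE, col2))

# Hypothetical row with a two-word organization: \S+ stops at the first space.
col1_multi = 'BIG123 - Human Resources (Guy Rodriguez)'
print(extract(ORG, col1_multi))

# Hypothetical alternative: capture everything between "- " and " (".
ORG_MULTI = r'-\s(.+?)\s\('
print(extract(ORG_MULTI, col1_multi))
```

If organization names can contain spaces, the \S+ group only returns the first word, so a pattern along the lines of ORG_MULTI may be a better fit.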
I have a table called Note with a column named Notes.
Notes
------
{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset0 Arial;}}
\viewkind4\uc1\pard\lang1033\fs20 called insurance company they are waiting to hear from the claimant's attorney
It has font info at the beginning, which I don't need. I've created a new column named final_notes and would like to grab everything after the "fs" plus two characters. The final result would be:
final_notes
-----------
called insurance company they are waiting to hear from the claimant's attorney
We use PATINDEX to find the first occurrence of fs followed by two digits.
NULLIF turns a result of 0 (the string was not found) into NULL, so the whole expression returns NULL in that case.
SUBSTRING(Notes, NULLIF(PATINDEX('%fs[0-9][0-9]%', Notes), 0) + 4, LEN(Notes))
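The index arithmetic can be traced with a small Python sketch. It only emulates the logic of the expression above with Python's re module; the function name final_notes is hypothetical and this is not SQL Server code.

```python
import re

def final_notes(notes: str):
    """Emulate the SUBSTRING / NULLIF / PATINDEX expression above."""
    m = re.search(r'fs[0-9][0-9]', notes)
    if m is None:
        return None  # mirrors NULLIF(..., 0) turning "not found" into NULL
    # Skip the 4 characters of "fsNN" and keep the rest of the string.
    return notes[m.start() + 4:]

sample = (
    r"{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset0 Arial;}}"
    "\n"
    r"\viewkind4\uc1\pard\lang1033\fs20 called insurance company they are waiting "
    r"to hear from the claimant's attorney"
)

print(repr(final_notes(sample)))
```

The result keeps the space that follows fs20. If that matters, wrapping the SQL expression in LTRIM should remove it.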
I need to write a regular expression in BigQuery to match the following two values in the title column. I want to get exactly these two. There are some other values containing 3 Percent, but I want to get only these two.
WBC - SAV - 3 Percent Q4 FY20
Canstar - canstar.com.au - AFF: Table Listing - Cost per click - National - 1x1 - 3 percent Savings
My code is:
WHEN REGEXP_CONTAINS(title, '(?i) 3 Percent')
THEN '3% PF'
I am not getting the correct output. Can anyone please assist?
"There are some other values containing 3 Percent, but I want to get only these two."
So in this case you don't need a regular expression; use the following instead:
WHEN title IN (
'WBC - SAV - 3 Percent Q4 FY20',
'Canstar - canstar.com.au - AFF: Table Listing - Cost per click - National - 1x1 - 3 percent Savings'
) THEN '3% PF'
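The difference between the two conditions can be seen in a small Python sketch. The third title is a hypothetical example of an unwanted value; the regex is the one from the question.

```python
import re

titles = [
    'WBC - SAV - 3 Percent Q4 FY20',
    'Canstar - canstar.com.au - AFF: Table Listing - Cost per click - National - 1x1 - 3 percent Savings',
    'Example - 3 Percent Term Deposit',  # hypothetical unwanted title
]

# The two exact values, as in the IN (...) list above.
wanted = set(titles[:2])

# A "contains" regex matches every title that mentions 3 Percent.
regex_hits = [t for t in titles if re.search(r'(?i) 3 Percent', t)]

# An exact membership test matches only the two listed titles.
exact_hits = [t for t in titles if t in wanted]

print(len(regex_hits), len(exact_hits))
```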
I want to get the last sentence that starts with a number in a column.
Example Code:
WITH q AS (SELECT '1.abc def ghi 2.sdadasd. rewtretrtr1 3. hjgjhjhgj, yo whats. 4. gog mi man. Its been' AS sentence FROM DUAL)
SELECT SUBSTR(sentence, INSTR(sentence,'.',-1) + 1)
FROM q;
My Output
Its been
Expected Output
4. gog mi man. Its been
Is this possible in Oracle?
This is a good use case for the handy Oracle regexp function REGEXP_SUBSTR():
SELECT REGEXP_SUBSTR(sentence, '\d\.\D+$') FROM q;
Regexp breakdown:
\d -- a digit
\. -- a dot
\D+ -- as many non-digit characters as possible (at least one)
$ -- end of string
REGEXP_SUBSTR() searches the string for the given regular expression and returns a given occurrence (the first occurrence by default).
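The breakdown can be traced with a minimal Python sketch, assuming Python's re module treats this pattern the same way as Oracle does; the function name last_numbered is hypothetical. The second string has a digit inside the last sentence, which this pattern cannot handle because \D+ refuses digits.

```python
import re

PATTERN = r'\d\.\D+$'  # digit, dot, then only non-digits up to the end

def last_numbered(sentence: str):
    """Return the matched tail, or None when the pattern does not match."""
    m = re.search(PATTERN, sentence)
    return m.group(0) if m else None

s1 = '1.abc def ghi 2.sdadasd. rewtretrtr1 3. hjgjhjhgj, yo whats. 4. gog mi man. Its been'
s2 = '1.abc def ghi 2.sdadasd. rewtretrtr1 3. hjgjhjhgj, yo whats. 4. gog mi 3 men. Its been'

print(last_numbered(s1))
print(last_numbered(s2))
```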
Demo on DB Fiddle:
WITH q AS (SELECT '1.abc def ghi 2.sdadasd. rewtretrtr1 3. hjgjhjhgj, yo whats. 4. gog mi man. Its been' AS sentence FROM DUAL)
SELECT REGEXP_SUBSTR(sentence, '\d\.\D+$') FROM q;
| REGEXP_SUBSTR(SENTENCE,'\D\.\D+$') |
| :--------------------------------- |
| 4. gog mi man. Its been |
EDIT
It turns out that you are dealing with much more complex strings:
the portion to capture might contain numbers
the string may contain new line
I hence would suggest a new approach, that relies on REGEXP_REPLACE() to remove the unwanted part of the string.
Consider:
SELECT REGEXP_REPLACE(sentence, '.*\d+\.', '', 1, 0, 'n') FROM q;
Regexp .*\d+\. will greedily match everything from the beginning of the string to the last occurrence of one or more digits followed by a dot. REGEXP_REPLACE removes that part of the string. The 'n' modifier allows the . character to match the newline character.
With this expression, you get the expected part of the string, only minus the digit(s) and dot at the beginning (that's as good as it gets, since Oracle does not support regex lookaheads... sigh).
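The greedy behaviour can be checked with a minimal Python sketch. It assumes re.DOTALL is an adequate stand-in for Oracle's 'n' modifier; the function name after_last_number is hypothetical, and the input is a shortened multi-line string in the same style as the data discussed here.

```python
import re

def after_last_number(text: str) -> str:
    # re.DOTALL lets "." match newlines, like Oracle's 'n' modifier.
    # The greedy ".*" backtracks to the LAST "digits + dot" in the text.
    return re.sub(r'.*\d+\.', '', text, count=1, flags=re.DOTALL)

text = (
    '10. CHECKING WITH VENDOR ABOUT ECD. 9/13/19\n'
    'MH11. Per Vendor,\n'
    '"Originally I quoted a 3-4 week delivery once approved. This month is shot. W\n'
    'e are booked solid. We estimate a delivery date of 10/11" 9/13/19 MH'
)

print(after_last_number(text))
```

A string with no numbered item is returned unchanged, since there is nothing for the pattern to match.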
Demo on DB Fiddle:
Given this input string:
We have received customer approval on the
warranty nozzle including revised ERO repairs. Please proceed with the repairs.
Please provide photos and damage mapping when complete per customer requests." 9/12/19 MH
10. CHECKING WITH VENDOR ABOUT ECD. 9/13/19
MH11. Per Vendor,
"Originally I quoted a 3-4 week delivery once approved. This month is shot. W
e are booked solid. We estimate a delivery date of 10/11" 9/13/19 MH
The query returns:
Per Vendor,
"Originally I quoted a 3-4 week delivery once approved. This month is shot. W
e are booked solid. We estimate a delivery date of 10/11" 9/13/19 MH
This is quite tricky if your sentences can contain digits, but it can be done in Oracle:
WITH q AS (
SELECT '1.abc def ghi 2.sdadasd. rewtretrtr1 3. hjgjhjhgj, yo whats. 4. gog mi man. Its been' AS sentence FROM DUAL union all
SELECT '1.abc def ghi 2.sdadasd. rewtretrtr1 3. hjgjhjhgj, yo whats. 4. gog mi 3 men. Its been' AS sentence FROM DUAL
)
SELECT regexp_substr(sentence, '\d[.](\D|\d+[^.])*$')
FROM q;
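The pattern reads as: a digit and a dot, then any run of non-digits or of digits that are not followed by a dot, up to the end of the string. This is a minimal Python sketch of that idea, assuming Python's re module matches these two sample strings the same way Oracle does; the function name last_numbered_sentence is hypothetical.

```python
import re

PATTERN = r'\d[.](\D|\d+[^.])*$'

def last_numbered_sentence(sentence: str):
    """Return the matched tail, or None when the pattern does not match."""
    m = re.search(PATTERN, sentence)
    return m.group(0) if m else None

rows = [
    '1.abc def ghi 2.sdadasd. rewtretrtr1 3. hjgjhjhgj, yo whats. 4. gog mi man. Its been',
    '1.abc def ghi 2.sdadasd. rewtretrtr1 3. hjgjhjhgj, yo whats. 4. gog mi 3 men. Its been',
]

for row in rows:
    print(last_numbered_sentence(row))
```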
My current comment reply system:
1
1.1
1.2
1.2.1
10
2
2.1
I can sort the comments from table by their ids (as above) and indent depending on the number of dashes.
The problem is that '10' comes right after '1.2.1'. Is it possible to sort values such as '1.2.1' as a number and not string?
Does any numeric data type accept multiple dots or commas?
Thanks in advance!
The common way in materialized path trees is to zero-pad each id segment to a fixed N-digit width, so that 1 becomes 00001, etc. A plain string sort then matches the numeric order:
00001
00001.00001
00001.00002
00001.00002.00001
00002
00002.00001
00010
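The padding step can be sketched in Python. The helper name pad_path and the width of 5 are assumptions for illustration; any width that covers the largest id works.

```python
def pad_path(path: str, width: int = 5) -> str:
    """Zero-pad every dot-separated segment of a materialized path."""
    return ".".join(part.zfill(width) for part in path.split("."))

ids = ["1", "1.1", "1.2", "1.2.1", "10", "2", "2.1"]

# Plain string sort puts "10" before "2".
print(sorted(ids))

# After padding, string order equals numeric order per segment.
padded = sorted(pad_path(p) for p in ids)
for p in padded:
    print(p)
```

Storing the padded form in the table lets an ordinary ORDER BY on the string column return the comments in thread order.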