Regex to capture pattern with sql

The cell data shown below comes from the column 'Feed':
hdfs//sddad/aa/vv/cc/SR_DC_EF_GF_20181130_20156478907000_484658274168_CO.dat
I am trying to use a regex to display only the value 'SR_DC_EF_GF'. Currently I hard-code the date in the regex, which I don't feel is dynamic enough, e.g.
select `regexp_replace([Feed], '_2018.*', '')` from tablename.
This only works for values whose date starts with _2018, such as _20181130. If I have _2019 or _2020, the pattern does not match and the whole value is displayed. How can we make this regex dynamic so that it handles other dates?

You could look for four digits:
select regexp_replace([Feed], '_[0-9]{4}.*', '')
from tablename;
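However, that only gets rid of the suffix. I think you want something like this to extract just the piece you are looking for. The / after the leading .* anchors the capture at the start of the file name; without it, the greedy .* leaves only the last letter before the date in the capture group:
select regexp_replace([Feed], '^.*/([^/0-9]+)_[0-9]{4}.*$', '\1')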
from tablename;
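To sanity-check patterns like these outside the database, a small Python sketch can replay them against the sample value. Python's re module is only a stand-in here; the SQL engine's regex flavor may differ, and the variable names are made up for illustration. The second pattern anchors on the last slash, which assumes the file name is always the final path segment.

```python
import re

# Sample value from the question.
feed = "hdfs//sddad/aa/vv/cc/SR_DC_EF_GF_20181130_20156478907000_484658274168_CO.dat"

# Remove everything from the first underscore that is followed by four digits.
without_suffix = re.sub(r"_[0-9]{4}.*", "", feed)

# Keep only the non-digit prefix of the last path segment.
name_only = re.sub(r"^.*/([^/0-9]+)_[0-9]{4}.*$", r"\1", feed)

print(without_suffix)  # hdfs//sddad/aa/vv/cc/SR_DC_EF_GF
print(name_only)       # SR_DC_EF_GF
```

The first substitution still carries the directory path, which is why the capture-group form is closer to what the question asks for.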

Related

Regex to split apart text. Special case for parentheses with spaces in them

I am trying to split a field by delimiter in LookML. This field either follows the format of:
Managers (AE)
Managers (AE - MM)
I was able to split the first case using this:
sql: case
when rlike (${user_role_name}, '^.*[\\(\\)].*$') then split_part(${user_role_name}, ' ', -1)
However, I haven't been able to get the second case to work the same way. It's in a case statement, so I am going to add another when clause, but I can't figure out the regex for parentheses that contain spaces.
Thanks in advance for the help!

By "split" the string, I think you mean you want to extract the part in parentheses, right?
I would do this using a regex substring function. You didn't mention which warehouse you're using, and the syntax will vary a little, but on Snowflake that would look like:
regexp_substr(${user_role_name}, '\\([^)]*\\)')
So, for example, with the inputs you gave:
select regexp_substr('Managers (AE)', '\\([^)]*\\)')
union all
select regexp_substr('Managers (AE - MM)', '\\([^)]*\\)')
result
(AE)
(AE - MM)
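The same pattern can be tried locally with a short Python sketch. This is only an illustration: role_suffix is a hypothetical helper name, and Python needs a single backslash where the SQL string literal above needs two.

```python
import re

# "(" followed by any run of characters other than ")" and then ")".
PAREN = re.compile(r"\([^)]*\)")

def role_suffix(role):
    """Return the parenthesized part of a role name, or None if there is none."""
    match = PAREN.search(role)
    return match.group(0) if match else None

print(role_suffix("Managers (AE)"))       # (AE)
print(role_suffix("Managers (AE - MM)"))  # (AE - MM)
```

Because the character class excludes only the closing parenthesis, spaces and hyphens inside the parentheses need no special handling.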

Extract characters in string following keyword and ending right before the other keyword

I have a table that looks like:
id
re|cid|13324242|
wa|cid|13435464|
fs|cid|2343532|
I want to extract information that is contained right after "|cid|" and before the following "|" element. That is:
13324242
13435464
2343532
I thought of substr(), but I don't know how to specify the start and end elements there.

You could use REGEXP_REPLACE here (Standard SQL):
SELECT
id,
CASE WHEN id LIKE '%|cid|%'
THEN REGEXP_REPLACE(id, '^.*\|cid\|(\d+)\|.*$', '\1') END AS cid
FROM yourTable;
The idea is to use a regex replacement to extract the cid value from the id column, should it be present (and if not, we would just return NULL).
Here is a demo showing that the regex logic is correct.

If you want the third element (which appears to be the intention given the sample data), I would recommend split():
select split(id, '|')[ordinal(3)]
from yourTable;
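Both approaches can be compared side by side in a minimal Python sketch. The function names are hypothetical, and Python's re and str.split only approximate what the SQL functions do.

```python
import re

def cid_by_regex(value):
    """Return the digits between "|cid|" and the next "|", or None if absent."""
    match = re.search(r"\|cid\|(\d+)\|", value)
    return match.group(1) if match else None

def third_field(value):
    """Return the third pipe-delimited field, or None if there are fewer than three."""
    parts = value.split("|")
    return parts[2] if len(parts) > 2 else None

for row in ["re|cid|13324242|", "wa|cid|13435464|", "fs|cid|2343532|"]:
    print(cid_by_regex(row), third_field(row))
```

The regex version only returns a value when the cid keyword is present, while the split version returns whatever sits in the third position regardless of the keyword before it.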

How to filter String in where clause

I would like to extract part of a string using a where clause in SAP HANA. For example, these are three strings in the name column:
123._SYS_BIC.meag.app.qthor.cidwh_eingangsschicht.backend.dblayer.l2.checks/MasterData_Holdings.
153._SYS_BIC.meag.app.qthor.centralAdministration.backend.dblayer.l2.checks/AuditAndSecurities.
meag.app.qthor.centralAdministration.backend.dblayer.l2.checks/GeneralLedger
After filtering, the name column should show only the last portion of each string. In other words, whatever the value is, remove everything from the beginning up to and including the '/'. The output would look like this:
"MasterData_Holdings"
"AuditAndSecurities"
"GeneralLedger"

You can try using REPLACE_REGEXPR.
I'm not familiar with HANA myself, but the function is pretty straightforward and it should be:
select REPLACE_REGEXPR('.+/(.+)' IN fieldName WITH '\1' OCCURRENCE ALL) as field
...
where
... -- your filter
Be aware that the regex '.+/(.+)' will eat everything up to the last /, so if you have, for instance, ....checks/MasterData_Holdings/Something, it will return only Something.
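The greedy behavior described above can be seen in a short Python sketch. Python's re.sub stands in for REPLACE_REGEXPR here, last_segment is a hypothetical helper name, and the second input is a shortened, made-up path with two slashes.

```python
import re

def last_segment(name):
    """Replace the whole string with whatever follows its last slash."""
    return re.sub(r".+/(.+)", r"\1", name)

# One slash: everything up to and including it is dropped.
print(last_segment("meag.app.qthor.centralAdministration.backend.dblayer.l2.checks/GeneralLedger"))
# GeneralLedger

# Two slashes: the greedy .+ runs to the last one.
print(last_segment("l2.checks/MasterData_Holdings/Something"))
# Something
```

A value without any slash does not match the pattern at all and is returned unchanged.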

Select query that displays Joined words separately, not using a function

I need a select query that adds a space to the data based on the placement of the capital letters, i.e. 'HelpMe' would be displayed as 'Help Me'. Note that I cannot use a stored function to do this; it must be done in the query itself. The data is of variable length and the query must be in SQL. Any help will be appreciated.
Thanks

You need to use a user-defined function for this until MS gives us support for regular expressions. The solution would be something like:
SELECT col1, dbo.RegExReplace(col1, '([A-Z])',' \1') FROM Table
This would produce a leading space, though, which you can remove with LTRIM.
Regular expression replace function:
http://connect.microsoft.com/SQLServer/feedback/details/378520
You can read about dbo.RegExReplace at:
TSQL Replace all non a-z/A-Z characters with an empty string

Assuming you are using Oracle, you can use the following:
REGEXP_REPLACE
SELECT REGEXP_REPLACE('ILikeToWatchCSIMiami',
'([A-Z.])', ' \1')
AS RX_REPLACE
FROM dual
;
I managed to get the expected output with this (see the SQLFiddle demo).
However, it does not handle acronyms such as CSI well, because every capital letter gets its own space.
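A short Python sketch shows both the basic replacement and the acronym problem. The function names are hypothetical. The second function uses lookaround assertions, a different technique that does not appear in the answers above and that the database's regex engine may not support.

```python
import re

def space_every_capital(text):
    """Prefix each capital letter with a space, then drop the leading space."""
    return re.sub(r"([A-Z])", r" \1", text).lstrip()

def space_keep_acronyms(text):
    """Insert a space only at word boundaries, keeping runs of capitals together."""
    # Boundary 1: a lowercase letter followed by a capital.
    # Boundary 2: a capital followed by a capital that starts a new word.
    boundary = r"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])"
    return re.sub(boundary, " ", text)

print(space_every_capital("HelpMe"))                # Help Me
print(space_every_capital("ILikeToWatchCSIMiami"))  # I Like To Watch C S I Miami
print(space_keep_acronyms("ILikeToWatchCSIMiami"))  # I Like To Watch CSI Miami
```

The lookaround version matches empty positions rather than characters, so nothing is consumed and no leading space needs trimming.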

How to extract group from regular expression in Oracle?

I got this query and want to extract the value between the brackets.
select de_desc, regexp_substr(de_desc, '\[(.+)\]', 1)
from DATABASE
where col_name like '[%]';
It however gives me the value with the brackets such as "[TEST]". I just want "TEST". How do I modify the query to get it?

The third parameter of the REGEXP_SUBSTR function indicates the position in the target string (de_desc in your example) where you want to start searching. Assuming a match is found in the given portion of the string, it doesn't affect what is returned.
In Oracle 11g, there is a sixth parameter to the function, that I think is what you are trying to use, which indicates the capture group that you want returned. An example of proper use would be:
SELECT regexp_substr('abc[def]ghi', '\[(.+)\]', 1,1,NULL,1) from dual;
The last parameter, 1, indicates the number of the capture group you want returned. Here is a link to the documentation that describes the parameter.
10g does not appear to have this option, but in your case you can achieve the same result with:
select substr( match, 2, length(match)-2 ) from (
SELECT regexp_substr('abc[def]ghi', '\[(.+)\]') match FROM dual
);
since you know that a match will have exactly one excess character at the beginning and end. (Alternatively, you could use RTRIM and LTRIM to remove brackets from both ends of the result.)

You need to do a replace and use a regex pattern that matches the whole string.
select regexp_replace(de_desc, '.*\[(.+)\].*', '\1') from DATABASE;
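The three approaches from the answers (return a capture group, trim the matched text, or replace the whole string with the group) can be compared in a small Python sketch. Python's re module is used purely as an illustration, and the variable names are made up.

```python
import re

text = "abc[def]ghi"
match = re.search(r"\[(.+)\]", text)

# 1. Ask for the capture group directly (like the sixth REGEXP_SUBSTR argument).
inner_group = match.group(1)

# 2. Take the full match, brackets included, and drop one character per side.
inner_trimmed = match.group(0)[1:-1]

# 3. Match the whole string and replace it with the capture group.
inner_replace = re.sub(r".*\[(.+)\].*", r"\1", text)

print(inner_group, inner_trimmed, inner_replace)  # def def def
```

All three agree on this input. They can differ when a value contains more than one bracketed part, since .+ is greedy and spans from the first opening bracket to the last closing one.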
