How to filter String in where clause - sql

I would like to extract the string using where clause in SAP HANA.For an example,these are 3 strings for name column.
123._SYS_BIC.meag.app.qthor.cidwh_eingangsschicht.backend.dblayer.l2.checks/MasterData_Holdings.
153._SYS_BIC.meag.app.qthor.centralAdministration.backend.dblayer.l2.checks/AuditAndSecurities.
meag.app.qthor.centralAdministration.backend.dblayer.l2.checks/GeneralLedger
After filter the name column using where clause, output in the name column would be shown only the last portion of the string. So, output will be like this. That means whatever we have, just remove from the beginning till '/'.
"MasterData_Holdings"
"AuditAndSecurities"
"GeneralLedger"

You can try using the REPLACE_REGEXPR
I'm not familiar myself with Hana but the function is pretty straight forward and it should be:
select REPLACE_REGEXPR('.+/(.+)' IN fieldName WITH '\1' OCCURRENCE ALL) as field
...
where
... -- your filter
Be aware that this regex '.+/(.+)' will eat everything until the last / so for instance if you have ....checks/MasterData_Holdings/Something it will return only Something

Related

Regex to capture pattern with sql

Given the cell data shown below is of column 'Feed'
hdfs//sddad/aa/vv/cc/SR_DC_EF_GF_20181130_20156478907000_484658274168_CO.dat
i am trying to use a regex method to only display this value 'SR_DC_EF_GF'. Currently i am manually doing a regex method by date which i dont feel its dynamic enough. e,g
select `regexp_replace([Feed], '_2018.*', '')` from tablename.
this will only display and does regex on table that is _20181130. but if i were to have _2019 and _2020, it wont capture and display the whole value. how we can make this regex method dynamic where it can capture other dates?
You could look for four digits:
select regexp_replace([Feed], '_[0-9]{4}.*', '')
from tablename;
However, you are only getting rid of the suffix. I think you want something like this to extract the piece you are looking for:
select regexp_replace([Feed], '^.*([^/0-9]+)_[0-9]{4}.*$', '\1')
from tablename;

Regular expression to remove element not match specific prefix

I am doing this in Impala or Hive. Basically let say I have a string like this
f-150:aa|f-150:cc|g-210:dd
Each element is separated by the pipe |. Each has prefix f-150 or whatever. I want to be able to remove the prefix and keep only element that matches specific prefix. For example, if the prefix is f-150, I want the final string after regex_replace is
aa|cc
dd is removed because g-210 is different prefix and not match, therefore the whole element is removed.
Any idea how to do this using string expression in one SQL?
Thanks
UPDATE 1
I tried this in Impala:
select regexp_extract('f-150:aa|f-150:cc|g-210:dd','(?:(?:|(\\|))f-150|keep|those):|(?:^|\\|)\\w-\\d{3}:\\w{2}',0);
But got this output:
f-150:aa
In Hive, I got NULL.
The regexyou in question could look like this:
(?:(?:|(\\|))f-150|keep|those):|(?:^|\\|)\\w-\\d{3}:\\w{2}
I have added some pseudo keywords to retain, but I am sure you get the idea:
Wholy match elements that should be dropped but only match the prefix for those that should be retained.
To keep the separator intact, match | at the beginning of an element in group 1 and put it back in the replacement with $1.
Demo
According to the documentation, your query should be written like a Java regex; likewise, this should perform like this code sample in Java.
You could match the values that you want to remove and then replace with an empty string:
f-150:|\|[^:]+:[^|]+$|[^|]+:[^|]+\|
f-150:|\\|[^:]+:[^|]+$|[^|]+:[^|]+\\|
Explanation
f-150: Match literally
| Or
\|[^:]+:[^|]+$ Match a pipe, not a colon one or more times followed by not a pipe one or more times and assert the end of the line
| Or
[^|]+:[^|]+\| Match not a pipe one or more times, a colon followed by matching not a pipe one or more times and then match a pipe
Test with multiple lines and combinations
You may have to loop through the string until the end to get the all the matching sub string. Look ahead syntax is not supported in most sql so above regexp might not be suitable for SQL syntax. For you purpose you can do something like creating a table to loop through just to mimic Oracle's level syntax and join with your table containing the string.
With loop_tab as (
Select 1 loop union all
Select 2 union all
select 3 union all
select 4 union all
select 5),
string_tab as(Select 'f-150:aa|ade|f-150:ce|akg|f-150:bb|'::varchar(40) as str)
Select regexp_substr(str,'(f\\-150\\:\\w+\\|)',1,loop)
from string_tab
join loop_tab on 1=1
Output:
regexp_substr
f-150:aa|
f-150:ce|
f-150:bb|

How to extract group from regular expression in Oracle?

I got this query and want to extract the value between the brackets.
select de_desc, regexp_substr(de_desc, '\[(.+)\]', 1)
from DATABASE
where col_name like '[%]';
It however gives me the value with the brackets such as "[TEST]". I just want "TEST". How do I modify the query to get it?
The third parameter of the REGEXP_SUBSTR function indicates the position in the target string (de_desc in your example) where you want to start searching. Assuming a match is found in the given portion of the string, it doesn't affect what is returned.
In Oracle 11g, there is a sixth parameter to the function, that I think is what you are trying to use, which indicates the capture group that you want returned. An example of proper use would be:
SELECT regexp_substr('abc[def]ghi', '\[(.+)\]', 1,1,NULL,1) from dual;
Where the last parameter 1 indicate the number of the capture group you want returned. Here is a link to the documentation that describes the parameter.
10g does not appear to have this option, but in your case you can achieve the same result with:
select substr( match, 2, length(match)-2 ) from (
SELECT regexp_substr('abc[def]ghi', '\[(.+)\]') match FROM dual
);
since you know that a match will have exactly one excess character at the beginning and end. (Alternatively, you could use RTRIM and LTRIM to remove brackets from both ends of the result.)
You need to do a replace and use a regex pattern that matches the whole string.
select regexp_replace(de_desc, '.*\[(.+)\].*', '\1') from DATABASE;

Postgresql query to update fields using a regular expression

I have the following data in my "Street_Address_1" column:
123 Main Street
Using Postgresql, how would I write a query to update the "Street_Name" column in my Address table? In other words, "Street_Name" is blank and I'd like to populate it with the street name value contained in the "Street_Address_1" column.
From what I can tell, I would want to use the "regexp_matches" string method. Unfortunately, I haven't had much luck.
NOTE: You can assume that all addresses are in a "StreetNumber StreetName StreetType" format.
If you just want to take Street_Address_1 and strip out any leading numbers, you can do this:
UPDATE table
SET street_name = regexp_replace(street_address_1, '^[0-9]* ','','');
This takes the value in street_address_1 and replaces any leading string of numbers (plus a single space) with an empty string (the fourth parameter is for optional regex flags like "g" (global) and "i" (case-insensitive)).
This version allows things like "1212 15th Street" to work properly.
Something like...:
UPDATE table
SET Street_Name = substring(Street_Address_1 FROM '^[0-9]+ ([a-zAZ]+) ')
See relevant section from PGSQL 8.3.7 docs, the substring form is detailed shortly after the start of the section.

Is it possible to get the matching string from an SQL query?

If I have a query to return all matching entries in a DB that have "news" in the searchable column (i.e. SELECT * FROM table WHERE column LIKE %news%), and one particular row has an entry starting with "In recent World news, Somalia was invaded by ...", can I return a specific "chunk" of an SQL entry? Kind of like a teaser, if you will.
select substring(column,
CHARINDEX ('news',lower(column))-10,
20)
FROM table
WHERE column LIKE %news%
basically substring the column starting 10 characters before where the word 'news' is and continuing for 20.
Edit: You'll need to make sure that 'news' isn't in the first 10 characters and adjust the start position accordingly.
You can use substring function in a SELECT part. Something like:
SELECT SUBSTRING(column, 1,20) FROM table WHERE column LIKE %news%
This will return the first 20 characters from column column
I had the same problem, I ended up loading the whole field into C#, then re-searched the text for the search string, then selected x characters either side.
This will work fine for LIKE, but not full text queries which use FORMS OF INFLECTION because that may match "women" when you search for "woman".
If you are using MSSQL you can perform all kinds VB-like of substring functions as part of your query.