Search string with regular expression SQL Server - sql

The regex I want to use is: ^(?=.*[,])(,?)ABC(,?)$
What I want to get out is:
^ // start
(?=.*[,]) // contains at least one comma (,)
(,?)ABC(,?) // The comma is either in the beginning or in the end of the string "ABC"
$ // end
Of course, ABC is meant to be a variable based on my search term.
So if ABC = 'abc', then ",abc", "abc,", and ",abc," will match, but not "abc" or "abcd".
A better way to do this is also welcome.
The value in the record looks like "abc,def,ghi,ab,cde..." and I need to find out if it contains my element (i.e. 'abc'). I cannot change the data structure. We can assume that in no case the record will contain only one sub-value, so it is correct to assume that there always is a comma in the value.

If you want to know whether a comma-delimited string contains abc, then I think LIKE is the easiest method in any database (shown here with +, the SQL Server string concatenation operator):
where ',' + col + ',' like '%,abc,%'
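To see why wrapping both sides in commas works, here is a minimal Python sketch of the same idea. The function name contains_element is made up for illustration, and plain substring search stands in for LIKE, so it ignores collation and any LIKE wildcards inside the search term.

```python
def contains_element(csv_value: str, element: str, sep: str = ",") -> bool:
    """Emulate: sep + col + sep LIKE '%' + sep + element + sep + '%'.

    Wrapping the column value in separators means every element,
    including the first and the last, is surrounded by separators,
    so a single substring test only matches whole elements.
    """
    wrapped = f"{sep}{csv_value}{sep}"
    return f"{sep}{element}{sep}" in wrapped


print(contains_element("abc,def,ghi,ab,cde", "abc"))  # True
print(contains_element("abcd,def", "abc"))            # False
```

A side effect of the wrapping is that a value holding a single element (just "abc") is matched as well, so the "always at least one comma" assumption is not needed.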

Wildcard usage for ^

I want to verify a field:
if the field contains only digits, like 12345, return nothing
if the field is something like 1234-1 or -123331, return nothing
if the field is a12341, 34j123, or 99933hh, return 1
if the field is a1234- or sodf233-, return 1.
Basically, just check whether there is a non-digit character in the field, but allow the dash.
Here are my thoughts:
select 1
from dbo.random.field
where ISNUMERIC(field)=0 and field not like '%-%'
I use this to check whether there is a letter and then whether there is a -, but my test cases always come out like this:
12345 Pass
12345a Failed
12345- Pass
1234a- Pass, but this should fail.
So what am I doing wrong here?
Let me assume you are using SQL Server, based on isnumeric(). You can use like:
where field like 'a%[0-9]%' and
field not like 'a%[^-0-9]%'
The first checks that the column starts with 'a' and has at least one digit. The second checks that there are no characters other than digits and hyphens after the 'a'.
You can generalize the a to any letter using [a-z] (assuming case insensitivity) or to any non-digit using [^0-9].
EDIT:
For your revised question, you just seem to want a letter. You can use:
select *
from (values ('12345'), ('1234-1'), ('1234-1'), ('a12341'), ('34j123'), ('99933hh'), ('a1234-'), ('sodf233-')) v(field)
where field like '%[^-0-9]%';
So this is what I did in the end:
like '%[^-0-9]%'
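For anyone who wants to check the pattern against the test cases outside the database, here is a small Python sketch. It uses the regex character class [^-0-9], which has the same meaning as in the LIKE pattern above (any character that is neither a hyphen nor a digit); the function name has_other_char is just an illustrative choice.

```python
import re

# Any single character that is not a hyphen and not an ASCII digit.
NOT_DIGIT_OR_HYPHEN = re.compile(r"[^-0-9]")


def has_other_char(field: str) -> bool:
    """Rough equivalent of: field LIKE '%[^-0-9]%'."""
    return NOT_DIGIT_OR_HYPHEN.search(field) is not None


for value in ["12345", "1234-1", "-123331", "a12341", "34j123",
              "99933hh", "a1234-", "sodf233-", "1234a-"]:
    print(value, has_other_char(value))
```

The first three values come back False and the rest True, including 1234a-, which was the case the original ISNUMERIC plus NOT LIKE '%-%' combination got wrong.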

PostgreSQL - find matching line in char/string column?

How can I find a matching line in a char/string type column?
For example, let's say I have a column called text and some row has this content:
12345\nabcdf\nXKJKJ
(where \n are real new lines)
Now I want to find the row if any of its lines matches. For example, if I have the value 12345,
it should find a match. But if I have the value 123, it should not.
I tried using like, but it matches in both cases: when I have a fully matching value (like 12345) and when I have a partially matching value (like 123).
For example, something like this, but with a boundary so that the whole line is checked:
SELECT id
FROM my_table
WHERE text like [SOME_VALUE]
Update
Maybe it's not yet clear what I'm asking. Basically, I want something equivalent to what you can do with a regular expression,
like this: https://regexr.com/5akj1
Here the regular expression /^123$/m would not match my string; it would only match with the pattern /^12345$/m (the value is dynamic, so the pattern changes depending on the value I get).
You may use regexp_replace and then check that the replaced string is not equal to the original column value:
select count(*)
from dummy
where regexp_replace(mytext, '(?m)^1234$', '') <> mytext;
Bear in mind that I have used the (?m) modifier, which makes ^ and $ match at the beginning and end of each line instead of only at the beginning and end of the whole string.
You should be able to use ~ for matching:
where mytext ~ '(\n|^)1234(\n|$)'
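Since the value is dynamic, it needs to be escaped before it is pasted into the pattern. Below is a rough Python sketch of the whole-line match. It uses Python's re module, not PostgreSQL's regex engine, so treat it as an illustration of the logic only; the name has_line is hypothetical.

```python
import re


def has_line(text: str, value: str) -> bool:
    """Return True if some complete line of `text` equals `value`.

    re.MULTILINE makes ^ and $ match at line boundaries, and
    re.escape keeps characters such as '.' in the value literal.
    """
    pattern = re.compile(rf"^{re.escape(value)}$", re.MULTILINE)
    return pattern.search(text) is not None


text = "12345\nabcdf\nXKJKJ"
print(has_line(text, "12345"))  # True
print(has_line(text, "123"))    # False
```

In the database itself the same concern applies: a value containing regex metacharacters has to be escaped, or it will be interpreted as part of the pattern.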

sql: regular expression for certain pattern separation

In table "example" I have a column "col1" with the following strings:
some example text here x2.0.3-a abc
some other example text 1.5 abc
another example text 0.1.4 mnp
some other example text abc
another example text mnp
Now I need the following:
Add the part before the properties to another column "col1"
Add the properties part to another column "col2"
So the output should look like this:
col1                       col2
some example text here     x2.0.3-a
some other example text    1.5
another example text       0.1.4
some other example text
another example text
Some characteristics of the strings in col1:
A string in col1 always ends with either abc or mnp.
Values like x2.0.3-a or 0.1.4 are the properties. The properties may not always exist in the col1 string, but if they exist, they always come right before the ending string abc or mnp.
There is always a space before and after the properties, i.e. another space between the properties and the ending string abc/mnp.
So my question is: how can I separate the properties and add them to col2?
One idea that comes to mind is to look for something like *.* abc/mnp or *.*.* abc/mnp, that is, anything.anything space abc/mnp OR anything.anything.anything space abc/mnp. I am not sure if I explained it properly.
As far as I understand, you'd like your column to be split into 3 columns. You should describe the scope and semantics of your 2nd column more precisely, so you can make sure the regular expression matches it reliably.
I built a regex against the data you supplied, so it may not match future incoming lines. The regex is here: https://regex101.com/r/seLgca/2/ It captures three main groups:
(.+?)\s?([a-z]?\d(?:\.\d){1,2}(?:-[a-z])?)?\s(abc|mnp)
Let's break regex into parts:
(.+?)
\s?
([a-z]?\d(?:\.\d){1,2}(?:-[a-z])?)?
\s
(abc|mnp)
Starting in reverse order, the fifth part simply matches abc or mnp. The fourth part expects a space. The third part matches your second column if it exists; note that this part is built for the data you supplied, so you can modify it to suit your data better. The second part matches an optional space, which covers the lines whose second column is empty. The first part is for the rest.
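As a quick way to try the pattern on the sample rows, here is a minimal Python sketch. The name split_row is hypothetical, and Python's re module is used only for illustration; for this particular pattern it behaves the same as the regex101 demo.

```python
import re

# Same pattern as above: text, optional properties, trailing abc/mnp.
PATTERN = re.compile(
    r"(.+?)\s?([a-z]?\d(?:\.\d){1,2}(?:-[a-z])?)?\s(abc|mnp)"
)


def split_row(line: str):
    """Return (text, properties, suffix), or None if nothing matches.

    `properties` is None when the row has no version-like part.
    """
    match = PATTERN.search(line)
    return match.groups() if match else None


rows = [
    "some example text here x2.0.3-a abc",
    "some other example text 1.5 abc",
    "another example text 0.1.4 mnp",
    "some other example text abc",
    "another example text mnp",
]
for row in rows:
    print(split_row(row))
```

The lazy first group (.+?) is what lets the optional properties group claim x2.0.3-a instead of it being swallowed into the leading text.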
In Oracle, we have search and substring functions with regex as far as I know. So, you'll need a programming language to capture those groups.
I wrote a Java method for the purpose:
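import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Returns group 0 (the whole match) followed by every capture group,
// or null when the regex does not match the content.
static List<String> getGroups(String content, String regex) {
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(content);
    List<String> groupsMatched = new ArrayList<String>();
    if (matcher.find()) {
        // groupCount() does not count group 0, so the bound is inclusive.
        for (int i = 0; i <= matcher.groupCount(); i++)
            groupsMatched.add(matcher.group(i));
        return groupsMatched;
    } else
        return null;
}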
So, if I call the method with the lines you supplied like this:
for (String content : listOfContent) {
    List<String> groupsMatched = getGroups(content, regex);
    if (groupsMatched != null)
        System.out.println(groupsMatched.get(1) + "\t" + groupsMatched.get(2) + "\t" + groupsMatched.get(3));
}
Here is what I have:
some example text here x2.0.3-a abc
some other example text 1.5 abc
another example text 0.1.4 mnp
some other example text null abc
another example text null mnp
Hope this helps.
Cheers,

Regular expression to remove element not match specific prefix

I am doing this in Impala or Hive. Basically, let's say I have a string like this:
f-150:aa|f-150:cc|g-210:dd
Each element is separated by the pipe |. Each has a prefix such as f-150. I want to remove the prefix and keep only the elements that match a specific prefix. For example, if the prefix is f-150, I want the final string after regexp_replace to be
aa|cc
dd is removed because g-210 is a different prefix and does not match, so the whole element is removed.
Any idea how to do this with a string expression in one SQL statement?
Thanks
UPDATE 1
I tried this in Impala:
select regexp_extract('f-150:aa|f-150:cc|g-210:dd','(?:(?:|(\\|))f-150|keep|those):|(?:^|\\|)\\w-\\d{3}:\\w{2}',0);
But got this output:
f-150:aa
In Hive, I got NULL.
The regex in question could look like this:
(?:(?:|(\\|))f-150|keep|those):|(?:^|\\|)\\w-\\d{3}:\\w{2}
I have added some pseudo keywords to retain, but I am sure you get the idea:
Wholly match elements that should be dropped, but match only the prefix for those that should be retained.
To keep the separator intact, match | at the beginning of an element in group 1 and put it back in the replacement with $1.
According to the documentation, the pattern in your query should be written as a Java regex, so it should behave the same way as it does in Java.
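You could match the values that you want to remove and then replace them with an empty string. The pattern is shown first as a plain regex and then with the backslashes doubled for use inside a SQL string literal: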
f-150:|\|[^:]+:[^|]+$|[^|]+:[^|]+\|
f-150:|\\|[^:]+:[^|]+$|[^|]+:[^|]+\\|
Explanation
f-150: Match literally
| Or
\|[^:]+:[^|]+$ Match a pipe, one or more characters that are not a colon, a colon, one or more characters that are not a pipe, and assert the end of the line
| Or
[^|]+:[^|]+\| Match one or more characters that are not a pipe, a colon, one or more characters that are not a pipe, and then a pipe
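Here is a minimal Python sketch of that replace-with-empty-string approach, so the alternation can be tried on a few inputs. The name keep_prefix is made up for illustration, and Python's re.sub stands in for regexp_replace; the pattern is the same apart from the prefix being passed in and escaped.

```python
import re


def keep_prefix(value: str, prefix: str) -> str:
    """Strip `prefix:` from wanted elements and drop the other elements.

    The alternatives are tried in order, so a wanted prefix is removed
    before the generic "whole element" alternatives get a chance.
    """
    p = re.escape(prefix)
    pattern = re.compile(rf"{p}:|\|[^:]+:[^|]+$|[^|]+:[^|]+\|")
    return pattern.sub("", value)


print(keep_prefix("f-150:aa|f-150:cc|g-210:dd", "f-150"))  # aa|cc
print(keep_prefix("g-210:dd|f-150:aa", "f-150"))           # aa
```

One limitation of this pattern, at least in this sketch: a string that consists of a single unwanted element (for example only g-210:dd) has no pipe next to it, so none of the alternatives match and it is returned unchanged.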
You may have to loop through the string until the end to get all the matching substrings. Lookahead syntax is not supported in most SQL dialects, so the regexp above might not be suitable for SQL. For your purpose, you can create a table to loop through, just to mimic Oracle's LEVEL syntax, and join it with the table containing the string.
with loop_tab as (
    select 1 loop union all
    select 2 union all
    select 3 union all
    select 4 union all
    select 5),
string_tab as (select 'f-150:aa|ade|f-150:ce|akg|f-150:bb|'::varchar(40) as str)
select regexp_substr(str,'(f\\-150\\:\\w+\\|)',1,loop)
from string_tab
join loop_tab on 1=1
Output:
regexp_substr
f-150:aa|
f-150:ce|
f-150:bb|
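The join against loop_tab asks for the 1st, 2nd, 3rd, ... occurrence one row at a time. For comparison, a rough Python sketch of the same extraction, where re.findall returns every occurrence in one call (extract_elements is a hypothetical name):

```python
import re


def extract_elements(value: str, prefix: str) -> list:
    """Return every `prefix:word|` occurrence, in order of appearance."""
    pattern = re.compile(rf"{re.escape(prefix)}:\w+\|")
    return pattern.findall(value)


print(extract_elements("f-150:aa|ade|f-150:ce|akg|f-150:bb|", "f-150"))
# ['f-150:aa|', 'f-150:ce|', 'f-150:bb|']
```

Like the SQL version, this only finds elements that are followed by a pipe, so the sample string ends with a trailing |.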

Find substring in string

Is it possible to check whether a specific substring, stored in a SQL Server column, is contained in a user-provided string?
Example :
SELECT * FROM Table WHERE 'random words to check, which are in a string' CONTAINS Column
From my understanding, CONTAINS can't do such kind of search.
EDIT :
I have a fully indexed text and would like to search (by the fastest method) if a string provided by me contains words that are present in a column.
You can use LIKE:
SELECT * FROM YourTable t
WHERE 'random words ....' LIKE '%' + t.column + '%'
Or
SELECT * FROM YourTable t
WHERE t.column LIKE '%random words ....%'
It depends on what you mean: the first one selects the rows whose column value is contained in the provided string. The second one is the opposite: it selects the rows whose column contains the provided string.
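The two directions are easy to mix up, so here is a small Python sketch of both, with plain substring tests standing in for LIKE. The sample column values are invented for illustration, and the sketch ignores collation as well as any %, _ or [ characters stored in the column, which LIKE would treat as wildcards in the first query.

```python
provided = "random words to check, which are in a string"
column_values = ["words", "a string", "missing", provided + " and more"]

# Query 1: 'provided' LIKE '%' + column + '%'
# keeps rows whose column value occurs inside the provided string.
column_inside_provided = [v for v in column_values if v in provided]

# Query 2: column LIKE '%provided%'
# keeps rows whose column value contains the whole provided string.
provided_inside_column = [v for v in column_values if provided in v]

print(column_inside_provided)  # ['words', 'a string']
print(provided_inside_column)
```

The question asks for the first direction: the column holds the short text and the user supplies the long one.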
Just use the LIKE syntax together with % around the string you are looking for:
SELECT
*
FROM
table
WHERE
Column LIKE '%some random string%'
This will return all rows in the table table in which the column Column contains the text "some random string".
1) If you want rows where the column starts with the string, put % after it in your where clause:
WHERE
Column LIKE 'some random string%'
2) If you want rows where the column contains the string anywhere, put % on both sides:
WHERE
Column LIKE '%some random string%'
3) If you want rows where the column ends with the string, put % before it:
WHERE
Column LIKE '%some random string'
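If it helps to experiment with where the % goes, here is a minimal Python sketch that translates a LIKE pattern into a regex. It is only an approximation under stated assumptions: it handles just % and _, not SQL Server's [...] classes or ESCAPE, and it compares case-sensitively, whereas SQL Server follows the column's collation. The names like_to_regex and like are hypothetical.

```python
import re


def like_to_regex(pattern: str):
    """Translate a LIKE pattern that uses only % and _ into a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any run of characters, possibly empty
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """LIKE must match the whole value, hence fullmatch."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("some random string at the start", "some random string%"))  # True
print(like("ends with some random string", "%some random string"))     # True
print(like("ends with some random string", "some random string%"))     # False
```

A trailing % means "starts with", a leading % means "ends with", and % on both sides means "contains".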