Parse string to get final end result - VB.NET

I'm trying to parse this string 'Smith, Joe M_16282' to get everything before the comma, combined with everything after the underscore.
The resulting string would be: Smith16282

string longName = "Smith, Joe M_16282";
string shortName = longName.Substring(0, longName.IndexOf(",")) + longName.Substring(longName.LastIndexOf("_") + 1);
Notes:
The second Substring call doesn't need a length parameter, because we want everything after the underscore.
LastIndexOf is used instead of IndexOf in case other underscores appear in the name, such as "Smith_Jones, Joe M_16282".
This code assumes that there is at least one comma and at least one underscore in longName. If there is no comma, IndexOf returns -1 and Substring throws an ArgumentOutOfRangeException; if there is no underscore, LastIndexOf returns -1, so the whole string gets appended. I will leave that checking to you if you need it.
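For comparison, here is a minimal sketch of the same index-based approach in Python (not C# or VB.NET), with the missing-delimiter check added. The function name short_name is hypothetical. Python's str.find and str.rfind return -1 when the character is absent, just like IndexOf and LastIndexOf:

```python
def short_name(long_name: str) -> str:
    """Return the text before the first comma plus the text after the last underscore."""
    comma = long_name.find(",")        # like IndexOf: -1 if there is no comma
    underscore = long_name.rfind("_")  # like LastIndexOf: -1 if there is no underscore
    if comma == -1 or underscore == -1:
        # Reject the input instead of throwing (no comma) or appending the whole string (no underscore).
        raise ValueError(f"expected a comma and an underscore in {long_name!r}")
    return long_name[:comma] + long_name[underscore + 1:]


print(short_name("Smith, Joe M_16282"))        # Smith16282
print(short_name("Smith_Jones, Joe M_16282"))  # Smith_Jones16282
```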

As others have said, the simple approach for parsing a string like that would be to use the String's various parsing methods, such as IndexOf and Substring. If you want something more powerful and flexible, you may also want to consider using a regex replacement. For instance, you could do something like this:
Imports System.Text.RegularExpressions ' at the top of the file, for Regex
Dim input As String = "Smith, Joe M_16282"
Dim pattern As String = "(.*?),.*?_(.*)"
Dim replacement As String = "$1$2"
Dim output As String = Regex.Replace(input, pattern, replacement)
Or, more simply:
Dim output As String = Regex.Replace("Smith, Joe M_16282", "(.*?),.*?_(.*)", "$1$2")
Here's the meaning of the pattern:
(.*?) - The first group capturing all of the characters before the comma
( - Starts the capturing group
. - This is a wildcard which matches any character
* - Specifies that the previous thing (any character) is repeated any number of times
? - Specifies that the * is non-greedy, meaning it won't match everything until the end of the string; it will only match until it finds the following comma
) - Ends the capturing group
, - The comma to look for
.*? - Says that there will be any number of any characters between the comma and the underscore which we don't care about
. - Any character
* - Any number of times
? - Makes the * non-greedy, so it stops at the first underscore
_ - The underscore to look for
(.*) - The second group capturing all of the characters after the underscore
( - Starts the capturing group
. - Any character
* - Any number of times
) - Ends the capturing group
Here's the meaning of the replacement:
$1 - The value of all of the characters found in the first capturing group
$2 - The value of all of the characters found in the second capturing group
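The pattern can be tried out with Python's re.sub, a sketch in Python rather than VB.NET (Python writes the group references as \1\2 instead of $1$2). The second pattern, GREEDY, is an illustrative variant not in the answer: it shows that the lazy .*? stops at the first underscore after the comma, while a greedy .* runs to the last one, matching the LastIndexOf behaviour:

```python
import re

LAZY = r"(.*?),.*?_(.*)"   # the pattern from the answer above
GREEDY = r"(.*?),.*_(.*)"  # greedy middle part: runs to the LAST underscore

print(re.sub(LAZY, r"\1\2", "Smith, Joe M_16282"))    # Smith16282
print(re.sub(LAZY, r"\1\2", "Smith, Joe_M_16282"))    # SmithM_16282 (stopped at the first "_")
print(re.sub(GREEDY, r"\1\2", "Smith, Joe_M_16282"))  # Smith16282 (same as the LastIndexOf version)
```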
Regex may be overkill for your particular situation, but it is a very handy tool to learn. One major advantage is that you could move the pattern and replacement values out into external settings, such as app.config. Then you could modify the replacement rules without recompiling your application.
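A minimal sketch of the "rules as data" idea, in Python rather than .NET: the pattern and replacement are read from a settings string, so changing the rule means editing text, not code. The JSON settings text and the apply_rule name are hypothetical stand-ins for app.config and whatever reads it:

```python
import json
import re

# Hypothetical settings text standing in for app.config; in practice it would be read from a file.
SETTINGS = r'{"pattern": "(.*?),.*?_(.*)", "replacement": "\\1\\2"}'

def apply_rule(value: str, settings_text: str) -> str:
    """Apply a regex rule whose pattern and replacement come from configuration."""
    rule = json.loads(settings_text)  # the JSON "\\1\\2" decodes to the replacement \1\2
    return re.sub(rule["pattern"], rule["replacement"], value)

print(apply_rule("Smith, Joe M_16282", SETTINGS))  # Smith16282
```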

Related

Regex extract in BigQuery with numeric data

How would I be able to grab the number 2627995 from this string?
"hellotest/2627995?hl=en"
I want to grab the number 2627995. Here is my current regex, but it does not work when I use REGEXP_EXTRACT in BigQuery:
(\/)\d{7,7}
SELECT
REGEXP_EXTRACT(DESC, r"(\/)\d{7,7}")
AS number
FROM
`string table`
Here is the output: the query returns only the slash (/), not the number.
Thank you!!
I think you just want to match all digits coming after the last path separator, before either the start of the query parameter, or the end of the URL.
SELECT REGEXP_EXTRACT(DESC, r"/(\d+)(?:\?|$)") AS number
FROM `string table`
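The anchoring in that pattern can be checked with a small Python sketch; Python's re module is used here as a stand-in for BigQuery's RE2 engine, whose syntax agrees for this pattern. The extract_number helper and the third sample string are made up for illustration:

```python
import re
from typing import Optional

# Digits after a "/" that are followed by "?" or by the end of the string.
PATTERN = re.compile(r"/(\d+)(?:\?|$)")

def extract_number(url: str) -> Optional[str]:
    m = PATTERN.search(url)
    return m.group(1) if m else None

print(extract_number("hellotest/2627995?hl=en"))  # 2627995
print(extract_number("hellotest/2627995"))        # 2627995
print(extract_number("hellotest/2627995abc"))     # None (digits not followed by ? or end)
```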
Try this one: r"\/(\d+)"
Your code returns the slash because you captured it (see the parentheses in (\/)\d{7,7}). When the pattern contains a capturing group, REGEXP_EXTRACT returns only the substring matched by that group.
Thus, you could just wrap the digits part of your regex in the parentheses instead:
SELECT
REGEXP_EXTRACT(DESC, r"/(\d{7})")
AS number
FROM
`string table`
NOTE:
In BigQuery, a regex is specified with a string literal, not a regex literal (which is usually delimited with forward slashes), so you do not need to escape the / char (it is not a special regex metacharacter).
{7,7} is equivalent to the limiting quantifier {7}, meaning exactly seven occurrences.
Also, if you are sure the number is at the end of the string or is followed by a query string or fragment, you can enhance it as
REGEXP_EXTRACT(DESC, r"/(\d+)(?:[?#]|$)")
where the regex means
/ - a / char
(\d+) - Group 1 (the actual output): one or more digits
(?:[?#]|$) - either ? or # char, or end of string.
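To see why the original pattern returns only the slash, here is a rough Python emulation of REGEXP_EXTRACT's documented behaviour: the whole match when there is no capturing group, the group's text when there is one, and an error for more than one group. regexp_extract is a hypothetical helper and Python's re stands in for RE2:

```python
import re
from typing import Optional

def regexp_extract(value: str, pattern: str) -> Optional[str]:
    """Hypothetical stand-in for BigQuery's REGEXP_EXTRACT."""
    compiled = re.compile(pattern)
    if compiled.groups > 1:
        raise ValueError("REGEXP_EXTRACT allows at most one capturing group")
    m = compiled.search(value)
    if m is None:
        return None
    # With one capturing group only that group's text is returned.
    return m.group(1) if compiled.groups == 1 else m.group(0)

url = "hellotest/2627995?hl=en"
print(regexp_extract(url, r"(\/)\d{7,7}"))       # /  (the slash was captured)
print(regexp_extract(url, r"/(\d{7})"))          # 2627995
print(regexp_extract(url, r"/(\d+)(?:[?#]|$)"))  # 2627995
```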

Check if a string has a combination of a substring and numbers in SQL

How do I write a SQL WHERE clause that checks if a string contains some substring and a number? For example:
string: macsea01
where string like 'macsea' plus a number
Regex is the most obvious solution to this question. Without more detail about the specific format of the string, I can suggest the following, which matches a letter immediately followed by a digit anywhere in the string (the [ ] character ranges inside LIKE are SQL Server syntax; most other databases do not support them in LIKE):
where column_name like '%[a-zA-Z][0-9]%'
If you're literally looking for macsea at the beginning of the string followed by a digit, it would be:
where column_name like 'macsea[0-9]%'
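For readers more used to regex, here is a sketch of what the two LIKE patterns correspond to, written with Python's re (an illustration, not SQL Server's engine). One assumption to keep in mind: LIKE comparisons follow the column's collation and are often case-insensitive, while these regexes are case-sensitive. The function names are made up:

```python
import re

def like_letter_then_digit(value: str) -> bool:
    # like '%[a-zA-Z][0-9]%': a letter immediately followed by a digit, anywhere
    return re.search(r"[a-zA-Z][0-9]", value) is not None

def like_macsea_then_digit(value: str) -> bool:
    # like 'macsea[0-9]%': starts with "macsea" followed by a digit, then anything
    return re.match(r"macsea[0-9]", value) is not None

print(like_letter_then_digit("macsea01"))   # True
print(like_macsea_then_digit("macsea01"))   # True
print(like_macsea_then_digit("xmacsea01"))  # False
```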
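Regex seems to be a little slippery here. Depending on your needs, you can, for instance, divide the string into parts: compare the text part, then take the rest of the string and try to convert it into a number.
Something like this (a plain CAST raises a conversion error when the rest is not numeric, so this uses TRY_CAST, available from SQL Server 2012, which returns NULL instead; the length check excludes a bare 'macsea', because an empty string converts to 0):
where substring(column_name, 1, 6) = 'macsea' and len(column_name) > 6 and try_cast(substring(column_name, 7, 1000) as int) is not null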
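The split-then-check idea, sketched in Python with a hypothetical macsea_plus_number helper. It checks the remainder character by character instead of converting, because Python's int() also accepts inputs such as " 7" or "1_0":

```python
def macsea_plus_number(value: str, prefix: str = "macsea") -> bool:
    """True if value is the prefix followed by one or more ASCII digits."""
    rest = value[len(prefix):]
    return (
        value.startswith(prefix)
        and rest != ""                             # a bare prefix is not enough
        and all(ch in "0123456789" for ch in rest)
    )

print(macsea_plus_number("macsea01"))  # True
print(macsea_plus_number("macsea"))    # False
print(macsea_plus_number("macsea0x"))  # False
```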

Oracle SQL - find string pattern in string

I need to extract some text from a string, but only where the text matches a string pattern. The string pattern will consist of...
2 numbers, a forward slash and 6 numbers
e.g. 12/123456
or
2 numbers, a forward slash, 6 numbers, a hyphen and 2 numbers
e.g. 12/123456-12
I know how to use INSTR to find a specific string. Is it possible to find a string that matches a specific pattern?
You'll need to use regexp_like to filter the results and regexp_substr to get the substring.
Here is roughly what it should look like:
select id, myValue, regexp_substr(myValue, '[0-9]{2}/[0-9]{6}(-[0-9]{2})?') as myRegExMatch
from Foo
where regexp_like(myValue, '^[a-zA-Z0-9 ]*[0-9]{2}/[0-9]{6}(-[0-9]{2})?[a-zA-Z0-9 ]*$')
The regexp_like pattern above allows letters, digits, and spaces on either side of the number pattern.
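The extraction part, with the optional hyphen suffix included, can be tried in Python; re stands in for Oracle's engine here, and the bracket and {n} syntax is the same for this pattern. find_code and the sample strings are illustrative only:

```python
import re
from typing import Optional

# Two digits, "/", six digits, then an optional "-" plus two digits.
CODE = re.compile(r"[0-9]{2}/[0-9]{6}(?:-[0-9]{2})?")

def find_code(text: str) -> Optional[str]:
    m = CODE.search(text)
    return m.group(0) if m else None

print(find_code("ref 12/123456 received"))     # 12/123456
print(find_code("ref 12/123456-12 received"))  # 12/123456-12
print(find_code("ref 12/1234 received"))       # None
```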
Use regexp_like.
where regexp_like(col_name,'\s[0-9]{2}\/[0-9]{6}(-[0-9]{2})?\s')
\s matches a whitespace character. It is included at the start and end of the pattern, so a code at the very beginning or end of the column will not match.
[0-9]{2}\/[0-9]{6} matches 2 digits, a forward slash, and 6 digits.
(-[0-9]{2})? optionally matches a hyphen and 2 digits following the previous pattern.
regexp_like(col_name,'^\d{2}/\d{6}($|-\d{2}$)')
or
regexp_like(col_name,'^\d{2}/\d{6}(-\d{2})?$')
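The two styles answer different questions: the whitespace-bounded pattern finds a code inside longer text, while the ^...$ version requires the whole value to be the code. A quick Python comparison (re standing in for Oracle; the matches helper and sample values are made up):

```python
import re

BOUNDED = re.compile(r"\s[0-9]{2}/[0-9]{6}(-[0-9]{2})?\s")  # needs whitespace on both sides
WHOLE = re.compile(r"^\d{2}/\d{6}(-\d{2})?$")               # the value must be only the code

def matches(value: str) -> tuple:
    return bool(BOUNDED.search(value)), bool(WHOLE.search(value))

for value in ["12/123456", "ref 12/123456-12 ok", "12/123456-12"]:
    print(repr(value), *matches(value))
# '12/123456' False True
# 'ref 12/123456-12 ok' True False
# '12/123456-12' False True
```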

How can I extract a substring from a character column without using SUBSTR()?

I have a question regarding the data below.
As you can see, each EMP_IDENTIFIER starts with its EMP_ID.
I need to pull out only the 10-character identifier so that it can be inserted into another column.
How would I do that?
I did it the traditional way, using INSTR and SUBSTR.
I just want to know whether there is any other way to do it without using INSTR and SUBSTR.
EMP_ID (VARCHAR2)   EMP_IDENTIFIER (VARCHAR2)
62049               62049-2162400111
6394                6394-1368000222
64473               64473-1814702333
61598               61598-0876000444
57452               57452-0336503555
5842                5842-0000070666
75778               75778-0955501777
76021               76021-0546004888
76274               76274-0000454999
73910               73910-0574500122
I am using Oracle 11g.
If you want the second part of the identifier and it is always 10 characters:
select t.*, substr(emp_identifier, -10) as secondpart
from t;
Here is one way:
REGEXP_SUBSTR (EMP_IDENTIFIER, '-(.{10})',1,1,null,1)
That will give the first 10-character string that follows a dash ("-") in your string. Thanks to mathguy for the improvement.
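On the sample rows, the regex approach and the negative-position SUBSTR approach from the previous answer give the same result. A quick Python check, with re.search and slicing standing in for REGEXP_SUBSTR and SUBSTR (the helper names are hypothetical):

```python
import re

# A few EMP_IDENTIFIER values copied from the question.
ROWS = ["62049-2162400111", "6394-1368000222", "5842-0000070666"]

def via_regex(ident: str) -> str:
    # Like REGEXP_SUBSTR(ident, '-(.{10})', 1, 1, null, 1); assumes a match exists.
    return re.search(r"-(.{10})", ident).group(1)

def via_slice(ident: str) -> str:
    # Like SUBSTR(ident, -10): the last 10 characters.
    return ident[-10:]

for ident in ROWS:
    print(ident, via_regex(ident), via_slice(ident))
```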
Beyond that, you'll have to provide more details on the exact logic for picking out the identifier you want.
Since apparently this is for learning purposes... let's say the assignment was more complicated. Let's say you had a longer input string, and it had several groups separated by -, and the groups could include letters and digits. You know there are at least two groups that are "digits only" and you need to grab the second such "purely numeric" group. Then something like this will work (and there will not be an instr/substr solution):
select regexp_substr(input_str, '(-|^)(\d+)(-|$)', 1, 2, null, 2) from ....
This searches the input string for one or more digits (\d means any digit, + means one or more occurrences) between a - or the beginning of the string (^ means beginning of the string; (a|b) means match a OR b) and a - or the end of the string ($ means end of the string). It starts searching at the first character (the third argument, position, is 1); it looks for the second occurrence (the fourth argument, 2); it doesn't do any special matching such as ignoring case (the fifth argument, null); and when the match is found, it returns the fragment of the match pattern enclosed in the second set of parentheses (the last argument, 2). The second fragment is the \d+, the sequence of digits, without the leading and/or trailing dash.
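This solution will work in your example too; it's just overkill. One caveat: REGEXP_SUBSTR looks for the next occurrence starting after the end of the previous one, so when two numeric groups are adjacent, the dash consumed after the first cannot start the second. In something like AS23302-ATX-20032-33900293-CWV20-3499-RA, occurrence 2 is therefore 3499, not 33900293. Doubling the dashes first, for example regexp_substr(replace(input_str, '-', '--'), '(-|^)(\d+)(-|$)', 1, 2, null, 2), returns the second numeric group, 33900293.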
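Python's re.finditer also returns non-overlapping matches, each search resuming after the previous match, which is the same rule REGEXP_SUBSTR uses for its occurrence argument. This sketch uses it to count occurrences of the pattern; nth_numeric_group and its dash-doubling option are hypothetical illustrations, not part of the answer:

```python
import re
from typing import Optional

# The pattern from the answer above: digits bounded by "-" or the string edges.
NUMERIC_GROUP = re.compile(r"(-|^)(\d+)(-|$)")

def nth_numeric_group(text: str, n: int, double_dashes: bool = False) -> Optional[str]:
    """Digits of the n-th non-overlapping match (like occurrence n, subexpression 2)."""
    if double_dashes:
        # Workaround: give each group its own dash so adjacent groups both match.
        text = text.replace("-", "--")
    matches = list(NUMERIC_GROUP.finditer(text))
    return matches[n - 1].group(2) if len(matches) >= n else None

s = "AS23302-ATX-20032-33900293-CWV20-3499-RA"
print(nth_numeric_group(s, 2))                      # 3499
print(nth_numeric_group(s, 2, double_dashes=True))  # 33900293
```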

Regexp to get the text after a word appears

I'm using a regexp to find the text after a word appears.
The problem is that some addresses use different abbreviations for "big house" (quinta): some are followed by a space, some by a dot:
Quinta
QTA
Qta.
I want all the text after any of those appears, ignoring case.
I tried this one, but I'm not sure how to include multiple starting words:
SELECT
REGEXP_SUBSTR ("Address", '[^QUINTA]+') "REGEXPR_SUBSTR"
FROM Address;
Solution:
I believe this will match the abbreviations you want:
SELECT
REGEXP_REPLACE("Address", '^.*Q(UIN)?TA\.? *|^.*', '', 1, 1, 'i')
"REGEXPR_SUBSTR"
FROM Address;
Explanation:
It tries to match everything from the beginning of the string:
up to the last occurrence of Q + UIN (optional) + TA + . (optional) + any number of spaces (^.* is greedy).
If it doesn't find that, then it matches the whole string with ^.*.
Since I'm using REGEXP_REPLACE, it replaces the match with an empty string, thus removing all characters up to and including "QTA" (or one of its variants), or the whole string.
Notice the last parameter passed to REGEXP_REPLACE: 'i'. That is the match parameter that makes the match case-insensitive.
The part you were interested in making optional uses a ( pattern ) that is a group with the ? quantifier (which makes it optional). Therefore, Q(UIN)?TA matches either "QUINTA" or "QTA".
Alternatively, in the scope of your question, if you wanted to list different options, you would use alternation with a |. For example, (pattern1|pattern2|etc) matches any one of the 3 options. Also, the regex (QUINTA|QTA) matches exactly the same as Q(UIN)?TA.
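A Python sketch of the same replace-to-empty idea, with re.sub(count=1) and re.IGNORECASE standing in for Oracle's occurrence argument and 'i' flag. text_after_quinta and the sample addresses are made up. Because ^.* is greedy, everything up to the last abbreviation in the string is removed:

```python
import re

# Same pattern as the REGEXP_REPLACE call above: first match only, case-insensitive.
QUINTA = re.compile(r"^.*Q(UIN)?TA\.? *|^.*", re.IGNORECASE)

def text_after_quinta(address: str) -> str:
    return QUINTA.sub("", address, count=1)

for a in ["Rua X, Quinta das Flores", "QTA. Bela Vista", "Qta Verde", "Avenida Central"]:
    print(repr(text_after_quinta(a)))
# 'das Flores'
# 'Bela Vista'
# 'Verde'
# ''
```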
What was wrong with your pattern:
The construct you were trying ([^QUINTA]+) uses a character class, and it matches any character except Q, U, I, N, T or A, repeated 1 or more times. But it's applied to characters, not words. For example, [^QUINTA]+ matches the string "BCDEFGHJKLMOPRSVWXYZ" completely, and it fails to match "TIA".
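The difference between a negated character class and an optional group can be checked quickly with Python's re.fullmatch (Python rather than Oracle, but the class and group syntax used here is the same):

```python
import re

# A negated character class excludes individual letters, not the word "QUINTA".
print(bool(re.fullmatch(r"[^QUINTA]+", "BCDEFGHJKLMOPRSVWXYZ")))  # True
print(bool(re.fullmatch(r"[^QUINTA]+", "TIA")))                   # False
# A group with an optional part matches whole words instead.
print(bool(re.fullmatch(r"Q(UIN)?TA", "QUINTA")), bool(re.fullmatch(r"Q(UIN)?TA", "QTA")))  # True True
```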