Find a specific number in a string - vba

I'm using an If...Then statement to determine whether a string contains a specific number. For example, the string is "1, 9, 13". I'm trying to isolate strings that contain the single number "3". However, when I use Like "*3*" or Like "*3", I also get back the results that contain 13. How do I use wildcards to do this?

If your string is just a list of numbers in the form you have shown ...
"n1, n2, n3, n4"
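... then you can use VBA's Like operator as follows (the string is padded with a leading space and a trailing comma so that every number is delimited the same way):
Debug.Print " 1, 7, 5, 13," Like "*[ ]3,*" 'prints False: the pattern matches " 3," but not "13,"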
If your string is arbitrary, then this is actually a regular expression question, and you'll need to decide if this is a route you want to go down. Something like the following works at the Regex Tester Page:
/\b3\b/g
There's a useful VBA Regex Regular Expressions Guide which shows you how to use syntax like this, including how to set a reference to the additional library that you need for it to work.
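As a rough illustration of what the word-boundary pattern does, here is a minimal sketch in Python rather than VBA. The function name contains_number is made up for this example, and it assumes the numbers are plain non-negative integers (a value such as "1.3" or "-3" would also count as containing 3).

```python
import re

def contains_number(text, number):
    """Return True if `number` appears in `text` as a whole number.

    Hypothetical helper: \\b marks a boundary between a digit and a
    non-digit such as a comma, a space, or the end of the string.
    """
    pattern = r"\b" + re.escape(str(number)) + r"\b"
    return re.search(pattern, text) is not None

print(contains_number("1, 9, 13", 3))  # 13 does not count as 3
print(contains_number("1, 3, 9", 3))   # a standalone 3 does
```

The same pattern should carry over to the VBA regex library, since \b is part of most regex dialects, but that is worth checking against the guide mentioned above.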

Related

Regex match first number if it does not appear at the end

I am currently facing a Regex problem which apparently I cannot find an answer to.
My Regex is embedded in a teradata SQL of the form:
REGEXP_SUBSTR(column, 'regex_pattern')
I want to find the first appearance of any number except if it appears at the end of the string.
For Example:
"YEL2X30" -> "2"
"YEL19XYZ05" -> "19"
"YELLOW05" -> ""
I tried '[0-9]+(?!$)/', but this always returns a blank string.
Thanks in Advance!
Shot in the dark here since I'm unfamiliar with Teradata and the SQL functionality it supports. However, reading the docs on the REGEXP_SUBSTR() function, it seems you may want to use the optional 3rd and 4th arguments along with a slightly different regular expression:
[0-9]+(?![0-9]|$)
Meaning: 1+ Digits that are not followed by either the end of the string or another digit.
I'd believe the following syntax may work now to retrieve the 1st appearance of any number from the matching results:
REGEXP_SUBSTR(column, '[0-9]+(?![0-9]|$)', 1, 1)
The 3rd parameter states the position in the source string at which to start searching, whereas the 4th selects which of the possibly multiple matches to return, here the 1st (that is how I read the docs). For example: abc123def456ghi789 would return 123.
Fiddling around in online IDEs gave me this:
CREATE TABLE TBL (TST varchar(100));
INSERT INTO TBL values ('YEL2X30'), ('YEL19XYZ05'), ('YELLOW05'), ('abc123def456ghi789');
SELECT REGEXP_SUBSTR(TST, '[0-9]+(?![0-9]|$)', 1, 1) as 'RESULTS' FROM TBL;
Resulted in:
RESULTS
2
19
NULL
123
NOTE: I also noticed that leaving out the 3rd and 4th parameter made no difference since they will default back to 1 without explicitly mentioning them. I tested this over here.
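To see why the extra [0-9] alternative in the lookahead matters, here is a small sketch using Python's re module instead of Teradata. It assumes Teradata's regex engine treats lookaheads and backtracking the same way, which I have not verified; the names NAIVE, FIXED and first_match are invented for the example.

```python
import re

# Lookahead that only forbids end-of-string (without the stray slash).
NAIVE = re.compile(r"[0-9]+(?!$)")
# Lookahead that also forbids a following digit, so the engine cannot
# backtrack into the middle of a trailing number.
FIXED = re.compile(r"[0-9]+(?![0-9]|$)")

def first_match(pattern, text):
    """Return the first match of `pattern` in `text`, or None."""
    m = pattern.search(text)
    return m.group() if m else None

for sample in ["YEL2X30", "YEL19XYZ05", "YELLOW05", "abc123def456ghi789"]:
    print(sample, first_match(NAIVE, sample), first_match(FIXED, sample))
```

With the naive pattern, "YELLOW05" yields "0": the engine gives back the final 5 so that the remaining 0 is no longer at the end of the string. The fixed pattern rejects that partial match and finds nothing.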
Possibly the simplest way is to look for digits followed by a non-digit. Then keep all the digits:
regexp_substr(regexp_substr(column, '[0-9]+[^0-9]'), '[0-9]+')
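The two-step idea can be sketched in Python as follows. This is only an illustration of the logic, not Teradata code, and the function name first_number_before_non_digit is hypothetical.

```python
import re

def first_number_before_non_digit(text):
    """Mimic the nested regexp_substr calls.

    Step 1 finds digits followed by one non-digit character;
    step 2 keeps only the digits from that piece.
    """
    outer = re.search(r"[0-9]+[^0-9]", text)
    if outer is None:
        return None  # no number is followed by a non-digit
    return re.search(r"[0-9]+", outer.group()).group()

print(first_number_before_non_digit("YEL19XYZ05"))
print(first_number_before_non_digit("YELLOW05"))
```

A trailing number is never followed by a non-digit, so it is skipped without needing a lookahead.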

how to use REGEXP_SUBSTR or INSTR in Oracle SQL to get specific string from a column that varies in character length

I have the values below stored in an Oracle database table (column name: org) and am trying to get just the org the user belongs to, in this case 'abc' and 'xyz', i.e. the first occurrence after DC=. How can I achieve this?
12~OU=Administrators,DC=abc,DC=enter,DC=msft,DC=com
14~OU=Admin,OU=Users,DC=xyz,DC=enter,DC=msft,DC=com
output
abc
xyz
I am pretty new to regex_substr, instr expressions.
Any input will be appreciated.
Thanks in advance.
I would suggest something like regexp_substr(col, '(,|^)DC=([^,]*)(,|$)', 1, 1, '', 2)
The (,|^)DC= bit will match either ,DC= or the start of a line followed by DC= (so it won't match another name like ANOTHERDC=). The ([^,]*) bit will match non-comma characters (depending on whether you need to handle escaped delimiters in your field, it's possible you will need to change this). The (,|$) at the end of the expression matches either a comma or the end of the line (to ensure we've selected the whole segment... but see below). Setting the 4th parameter to 1 ensures we get the first match. The 6th parameter is set to 2 to specify that we only want to return the part of the match within the second ().
Since the matching will be greedy by default, you don't really need to worry about the (,|$) bit and could just use regexp_substr(col, '(,|^)DC=([^,]*)', 1, 1, '', 2). I specified it since I think it's more clear to someone (like me) who doesn't remember whether it defaults to greedy or non-greedy.
Similarly, if you know you don't need to worry about cases where a non-DC name ends with DC, you could just simplify to regexp_substr(col, 'DC=([^,]*)', 1, 1, '', 1).
You could start with something like this and then modify it to suit your requirements.
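For experimenting with the pattern outside the database, here is a minimal Python sketch of the same capture-group idea. It uses Python's re module, whose syntax differs slightly from Oracle's (for instance the non-capturing group (?:...)), and the names DC_PATTERN and first_dc are invented for the example.

```python
import re

# A comma or the start of the string, then DC=, then capture everything
# up to the next comma.
DC_PATTERN = re.compile(r"(?:,|^)DC=([^,]*)")

def first_dc(value):
    """Return the value of the first DC= component, or None if absent."""
    m = DC_PATTERN.search(value)
    return m.group(1) if m else None

rows = [
    "12~OU=Administrators,DC=abc,DC=enter,DC=msft,DC=com",
    "14~OU=Admin,OU=Users,DC=xyz,DC=enter,DC=msft,DC=com",
]
for row in rows:
    print(first_dc(row))
```

Because the prefix must be a comma or the start of the string, a component such as ANOTHERDC=... is skipped.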

Using 'LIKE' and 'REGEXP' in a SQL query

I'm trying to use some regex on an expression where I have two conditions on the WHERE clause. The pattern I want to capture is 106 followed by any digit followed by a digit that must be either 3 or 4, i.e. 106[0-9][3-4]
First, I tried this:
SELECT DISTINCT Loggers
FROM [alo].[Forests] C
WHERE (R.LogSU = 3)
AND (ForestID REGEXP '106[0-9][3-4]')
This produced an error as below and it would be good to know why.
Msg 102, Level 15, State 1, Line 16
Incorrect syntax near 'REGEXP'.
Next, I have tried this, which is now running but I am unsure about whether this is doing what I want it to do.
SELECT DISTINCT Loggers
FROM [alo].[Forests] C
WHERE (R.LogSU = 3)
AND (ForestID LIKE '106[0-9][3-4]')
Would this do as I described above?
You specify this:
The pattern I want to capture is 106 followed by any digit followed by
a digit that must be either 3 or 4, i.e. 106[0-9][3-4]
And then you give an example using a regular expression:
WHERE ForestID REGEXP '106[0-9][3-4]'
Regular expressions match patterns anywhere inside a string. So, this will match '10603'. It will also match 'abc10694 def'. This is true of regular expressions in general, not merely one database's implementation of them.
If this is the behavior you want, then the corresponding LIKE (in SQL Server) is:
WHERE ForestID LIKE '%106[0-9][3-4]%'
If you only want 5-digit values, then the corresponding regular expression is:
WHERE ForestID REGEXP '^106[0-9][3-4]$'
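The difference between matching anywhere and matching the whole value can be checked quickly with Python's re module. This is only a sketch of the general regex behavior described above, not of any particular database; the function names are hypothetical.

```python
import re

PATTERN = r"106[0-9][3-4]"

def matches_anywhere(value):
    """Unanchored search, like REGEXP '106[0-9][3-4]' or LIKE '%...%'."""
    return re.search(PATTERN, value) is not None

def matches_whole(value):
    """Whole-string match, like REGEXP '^106[0-9][3-4]$' or LIKE without %."""
    return re.fullmatch(PATTERN, value) is not None

for value in ["10603", "abc10694 def", "10605"]:
    print(value, matches_anywhere(value), matches_whole(value))
```

Only '10603' passes both checks; 'abc10694 def' passes the unanchored one alone, and '10605' fails both because its last digit is neither 3 nor 4.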
You do not need to interact with managed code, as you can use LIKE:
SELECT DISTINCT Loggers
FROM [alo].[Forests] C
WHERE (R.LogSU = 3)
AND (ForestID LIKE '106[0-9][3-4]')
To be clear: SQL Server doesn't support regular expressions without managed code. Depending on the situation, the LIKE operator can be an option, but it lacks the flexibility that regular expressions provide.
If you would like to have full regular expression functionality, try this.
Try Below
SELECT DISTINCT Loggers
FROM [alo].[Forests] C
WHERE (R.LogSU = 3)
AND ((ForestID LIKE '%106_3%' OR ForestID LIKE '%106_4%'))

Cloudant - Lucene range search using numbers stored as text

I have a number of documents in Cloudant, that have ID field of type string. ID can be a simple string, like "aaa", "bbb" or number stored as text, e.g. "111", "222", etc. I need to be able to full text search using the above field, but I encountered some problems.
Assuming that I have two documents, having ID="aaa" and ID="111", then searching with query:
ID:aaa
ID:"aaa"
ID:[aaa TO zzz]
ID:["aaa" TO "zzz"]
returns first document, as expected
ID:111
returns nothing, but
ID:"111"
returns second document, so at least there is a way to retrieve it.
Unfortunately, when searching for range:
ID:[111 TO 999]
ID:["111" TO "999"]
I get no results, and I have no idea what to do to get around this problem. Is there any special syntax for such case?
UPDATE:
Index function:
function(doc){
if(!doc.ID) return;
index("ID", doc.ID, { index:'not_analyzed_no_norms', store:true });
}
Changing index to analyzed doesn't help. Analyzer itself is keyword, but changing to standard doesn't help either.
UPDATE 2
Just to add some more context, because I think I missed one key point. The field I'm indexing will be searched using ranges, and both the min and max values can be provided by the user. So it is possible that one of them will be a number stored as a string, while the other will be standard non-numeric text. For example, search for all documents where ID >= "11" and ID <= "foo".
Assuming that the database contains documents with ID "1", "5", "alpha", "beta", "gamma", this query should return "5", "alpha", "beta". Please note that "5" should actually be returned, because the string "5" is greater than the string "11".
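The expected result follows from plain string (lexicographic) comparison, which can be checked with a few lines of Python. This only illustrates the ordering; it does not involve Cloudant, and the function name ids_in_string_range is made up.

```python
def ids_in_string_range(ids, low, high):
    """Return the IDs with low <= id <= high, compared as strings."""
    return [i for i in ids if low <= i <= high]

ids = ["1", "5", "alpha", "beta", "gamma"]
print(ids_in_string_range(ids, "11", "foo"))
```

"1" is excluded because it is a prefix of "11" and therefore sorts before it, while "gamma" sorts after "foo".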
Our team just came to a workaround solution. We managed to get proper results by adding some arbitrary character, e.g. 'a' to an upper range value, and by introducing additional search term, to exclude documents having ID between upper range value and upper range value + 'a'.
When searching for a range
ID:[X TO Y]
actual query would be
(ID:[X TO Ya] AND -ID:{Y TO Ya])
For example, to find documents having an ID between 23 and 758, we execute
(ID:[23 TO 758a] AND -ID:{758 TO 758a]).
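A small helper that builds this workaround query string might look like the sketch below. The function name build_range_query and the pad parameter are hypothetical, and the sketch assumes the range values contain no characters that need escaping in Lucene query syntax.

```python
def build_range_query(low, high, field="ID", pad="a"):
    """Build the padded range query described above.

    The upper bound is padded with an arbitrary character so the range is
    treated as a string range, and the extra slice between `high` and
    `high + pad` is excluded again with a negated clause.
    """
    padded = f"{high}{pad}"
    return f"({field}:[{low} TO {padded}] AND -{field}:{{{high} TO {padded}])"

print(build_range_query(23, 758))
```

For the example values this produces the same query as shown above.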
First of all, I would suggest using the keyword analyzer, so you can control tokenization during both indexing and search.
"analyzer": "keyword",
"index": "function(doc){\n if(!doc.ID) return;\n index(\"ID\", doc.ID, {store:true });\n}"
To retrieve your document with ID "111", use the following range query:
curl -X GET "http://.../facetrangetest/_design/ddoc/_search/f?q=ID:\[111%20TO%20A\]"
If you use the query q=ID:\[111%20TO%20999\], Cloudant Search sees numbers on both sides of the range and interprets it as a NumericRangeQuery; since your ID of "111" is a string, it will not be part of the results returned. Including a string in the query, as in [111%20TO%20A], makes Cloudant interpret it as a range query on strings.
You can get both docs returned like this:
q=ID:["111" TO "CCC"]
Here's a working live example:
https://rajsingh.cloudant.com/facetrangetest/_design/ddoc/_search/f?q=ID:[%22111%22%20TO%20%22CCC%22]
I found something quirky. It seems that range queries on strings only work if at least one of the range values is a string. Querying on ID:["111" TO "555"] doesn't return anything either, so maybe this is resolving to a numeric query somehow? Could be a bug.
This could also be achieved using regular expressions in queries. Something like this:
curl -X POST "https://.../facetrangetest/_design/ddoc/_search/f" -d '{"q":"ID:/<23-758>/"}' | jq .
This regular expression retrieves all documents whose ID field is from 23 to 758. Slashes (/ /) enclose the regular expression; the interval is enclosed inside <>.

looking for the first time specific characters appear

In VBA I am opening a table from Access with a column that looks like the following:
1300nm11-53-0202 0302.SOR
I would like to find the first occurrence of "nm" in the string and write everything before it into a variable "strGolfLengte" (so in this case strGolfLengte would be "1300").
NB:
I can't be sure that there won't be several nm's in the string, I just want to look for the first time they are found.
NB2:
The string before nm could be "n" characters long; in all cases, I want the full length (n) of that string written to strGolfLengte.
I would use the InStr() function like this:
strGolfLengte = Left(myLine, InStr(1, myLine, "nm", vbTextCompare) - 1) 'subtract 1 so the "n" of "nm" is not included
I think it is the easiest way to do this:
strGolfLengte = Split(myLine,"nm")(0)
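The same "everything before the first marker" logic can be sketched in Python for comparison. The function name text_before_first is hypothetical, and returning None when the marker is missing is a design choice of this sketch, not what the VBA snippets do (Split(...)(0) would return the whole string in that case).

```python
def text_before_first(line, marker="nm"):
    """Return the text before the first occurrence of `marker`.

    str.partition splits only at the first occurrence, so later markers
    in the string are ignored. Returns None if the marker is absent.
    """
    head, sep, _tail = line.partition(marker)
    return head if sep else None

print(text_before_first("1300nm11-53-0202 0302.SOR"))
```

Checking for a missing marker explicitly is also worth doing in VBA, since InStr returns 0 when nothing is found.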