How do I find specific substring in a string - vb.net

My full question title was too long, but it should be asked here:
How do I find all instances of a specific substring in a string accounting for spaces and special characters potentially being on either side of the substring
What I mean is this. I am writing a SQL code formatting assistance program in VB.Net. This program will help when I am following up on truly poorly written SQL. For instance (and please ignore the syntax failure here, I am not good at writing bad code in SQL):
if exists(
select *
from dbo.table
where field1 = (if exists (select field1
from dbo.table1
where field2 = '123')
select field1 from table2)
My program is still in the early stages. I have already identified most of the keywords, and written the code that will put them in the proper case format. So in the bad code example from above all of the selects will be Select. To do this I have created a list of key words in array form, and use this array in the following function:
Private Function FindAndReplace(ByVal findWhat As String, _
ByVal replaceWith As String, ByVal focusLine As String) As String
focusLine = Microsoft.VisualBasic.Strings.Replace(focusLine, findWhat, _
replaceWith, 1, -1, Constants.vbTextCompare)
Return focusLine
End Function
The good news is this works really well with words like Select. Words like If, Go, On, and End are a bit more challenging. If I have the word Send, it will replace it with the word SEnd because End is a keyword. In many of these instances I can account for this by putting the smaller words before the larger words. I have added Send as a keyword because of the number of times that word appears in user messages on our systems.
I cannot seem to account for words like On, If, or Go. I considered searching for " Go ", " On ", ")Go ", " On(", etc. but there are times when Go is going to be the first word on the line...or the only.
What I need is a VB.Net means of searching a string for all of the instances of a given substring (such as If). I was thinking I would check if it was the first word in the string, or seeing if it is surrounded by any combination of spaces or special characters (or not surrounded by other letters and underscores, etc.). I would update those that met my requirements, and leave the others alone.
I am drawing a blank on how to do this, and I could really use some assistance.

I am writing a SQL code formatting assistance program
I'd recommend starting with an existing SQL parser.
Peter Sestoft's excellent Programming Language Concepts book introduces parsing fundamentals, including writing Lexer and Parser specifications for Micro-SQL, in Chapter 3.
The open source Irony project includes an SQL grammar sample.
Use your favourite search engine to find others.
What I need is a VB.Net means of searching a string for all of the instances of a given substring
There are a number of ways of achieving this:
1. Split the string into words and then search those words for instances.
2. Use a state machine to iterate over the string and check words after white space.
With option 2 you can handle quoted strings and maintain an index for each word. Here's a short example in F#: http://fssnip.net/f6
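Besides the two options above, a regular expression with word boundaries is another common way to match a keyword only when it stands alone. The following is a minimal VB.NET sketch, not the approach from the answer: the name ReplaceWholeWord is hypothetical, and it assumes keywords consist only of letters, digits, and underscores, since \b depends on that. Unlike the state machine in option 2, it does not skip quoted strings or comments.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module KeywordCasing
    ' Hypothetical helper: replaces findWhat with replaceWith only where it
    ' appears as a whole word (not inside Send, Gopher, and so on), ignoring case.
    Function ReplaceWholeWord(ByVal findWhat As String, _
                              ByVal replaceWith As String, _
                              ByVal focusLine As String) As String
        ' \b matches the boundary between a word character and a non-word
        ' character, or the start or end of the line.
        Dim pattern As String = "\b" & Regex.Escape(findWhat) & "\b"
        ' Double any "$" so the replacement text is always taken literally.
        Dim literal As String = replaceWith.Replace("$", "$$")
        Return Regex.Replace(focusLine, pattern, literal, RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Console.WriteLine(ReplaceWholeWord("end", "End", "send the end)"))  ' send the End)
        Console.WriteLine(ReplaceWholeWord("go", "Go", "go"))               ' Go
    End Sub
End Module
```

Because the boundary check is part of the pattern, the order in which keywords are processed no longer matters, so Send would not need to be listed as a keyword just to protect it from End.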

Related

How can I find a word with a new line in the VBA editor using find and replace?

I would like to go through and find all of the "End" statements in my code but skipping all of the "End x" statements like "End If", "End Sub", "End function", etc.--Just the pure "End". My thought was to use pattern matching, but I am unsure of how to do that.
I already tried using "End\n" and "End[\n]".
Does anyone know how to search for words that end in new lines?
The "find" function in the VBA editor does not support this kind of parameter/functionality.
You will have to manually step through the results and skip the ones you don't want, or manually modify the "End" instances you don't want to catch, then search and replace, and finally restore all the modified End instances to what they were.
Apologies for answering so long after the question was asked, but thought this information would help future readers as this question is still being actively found.
@TylerH is right that the specific search requested by the user cannot be performed in the VBE Find tool. For information, when "Use Pattern Matching" is selected the VBE Find tool supports use of:
? - single character
* - zero or more characters (on the same line)
# - single digit (0 to 9)
[charlist] - any single character in charlist
[!charlist] - any single character not in charlist
... where charlist can be a range of characters (e.g. [A-Z]) but must be in ascending order (e.g. [Z-A] is not valid). It can also include multiple ranges of characters (e.g. [A-BD-E] matches A, B, D or E). To match any of ?, * or # literally, enclose them in square brackets (e.g. [*] matches an asterisk).
This means the VBE Find tool performs very similarly (perhaps identically ... but I can't provide assurances, VB and VBA not being the same language) to the VB Like operator, for which documentation is here
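To illustrate the pattern syntax listed above, here is a small VB.NET sketch using the Like operator. It demonstrates Like only; as noted, the VBE Find tool may not behave identically. The sample strings are made up, and the results assume the default Option Compare Binary setting.

```vbnet
Imports System

Module LikeDemo
    Sub Main()
        Console.WriteLine("End" Like "E?d")        ' True: ? matches one character
        Console.WriteLine("End If" Like "End*")    ' True: * matches zero or more characters
        Console.WriteLine("Line7" Like "Line#")    ' True: # matches one digit
        Console.WriteLine("B" Like "[A-BD-E]")     ' True: B is in one of the ranges
        Console.WriteLine("C" Like "[!A-BD-E]")    ' True: C is in neither range
        Console.WriteLine("2*3" Like "2[*]3")      ' True: [*] matches a literal asterisk
    End Sub
End Module
```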
The alternative (which will perform the specific search in the question) is to use the 'Find Text' tool in the VBE Add-In MZ-Tools, though note MZ-Tools is a paid-for tool. Please note I am NOT in any way associated with MZ-Tools or its author. The search text to use in MZ-Tools for the specific search requested in the question is: end\r?$

How to change VBA array decimal separator?

I would like to change VBA array decimal separator to dot. I see it as comma. I tried: Application.DecimalSeparator="."
But when I get the value as MyString = matrix(2, 1), the decimal separator in VBA arrays maliciously persists as comma. I am not able to get rid of the pest.
Is there a way to detect which system separator for VBA arrays is used?
VBA uses quite a few bits drawn from various parts of the platform to work out which decimal and thousands separators to use. Application.DecimalSeparator changes a few instances (mostly on the workbook); you can tweak others at the OS level, but even then there are a couple of cases where you can't change the settings.
Your best bet is to write a simple function to check which separator your platform uses based on a trial conversion of say 1.2 to a string and see what the second character ends up being. Crude but strangely beautiful.
Armed with that you can force an interchange of . and , as appropriate. Naturally, though, you will then have to manage all string-to-number parsing yourself, with some care.
Personally though I think this epitomises an unnecessary fight with your system settings. Therefore I would leave everything as it is; grin and bear it, in other words.
You have to change it in the system settings; Excel takes this kind of setting from the system.
I have ended up with this function, which does exactly what I want. Thank you all for the answers, comments and hints.
Function GetVBAdecimalSep()
Dim a(0) As Variant
a(0) = 1 / 2
GetVBAdecimalSep = Mid(a(0), 2, 1)
End Function

How to categorise a column in excel based on another column containing value in string with delimiter ; where string can have two separate words

This question is a follow up to a previous query: How to categorise a column in excel based on another column that contains value in string separated by semicolons
I have the following spreadsheet (please click on link below for image):
Raw data and expected output
I want to categorise the raw data so that the output is as pictured in B16:B34 and C16:C34. I am trying to categorise people by their interests, where the interests are held in a column of semicolon-separated strings, some of which contain multiple words. A name can appear multiple times according to the person's interests, which in this case are Movie, Action Movie, Music, Rock Music, Jazz Music and Radio.
I have tried the answer provided by @Glitch_doctor:
{=IFERROR(INDEX($B$1:$B$11,SMALL(IF(ISNUMBER(SEARCH(B$15,$C$1:$C$11)),ROW($C$1:$C$11)-ROW(INDEX($C$1:$C$11,1,1))+1),ROW(1:1))),"")}
The issue with the answer is that when a person's interest is actually Action Movie, they also show up in the Movie category. I want a person whose interest is Action Movie to show up only under Action Movie after categorisation. Multi-word interests don't seem to work with the SEARCH function.
I tried to replace the SEARCH function with a VBA code:
Function ProjectSearch(ByVal strProj As String, ByVal strVal As String, _
Optional ByVal delimiter As String = ";") As Boolean
Dim i As Long
Dim strSplit() As String
strSplit = Split(strProj, delimiter)
ProjectSearch = False
For i = LBound(strSplit) To UBound(strSplit)
If strSplit(i) = strVal Then
ProjectSearch = True
Exit Function
End If
Next i
End Function
Since I am a newbie, I can't seem to get it to work. Is there a way to do what I want without VBA? If I do need to use VBA, what do I need to do?
Thanks kindly in advance.
Okay, that was much simpler than I was expecting it to be:
=IFERROR(INDEX($C$1:$C$11,SMALL(IF(ISNUMBER(IF(SEARCH(E$15,$D$1:$D$11)=1,SEARCH(E$15,$D$1:$D$11),SEARCH(CONCATENATE("; ",E$15),$D$1:$D$11))),ROW($D$1:$D$11)-ROW(INDEX($D$1:$D$11,1,1))+1),ROW(1:1))),"")
I only changed the ISNUMBER(SEARCH(B$15,$C$1:$C$11)) part to ISNUMBER(IF(SEARCH(E$15,$D$1:$D$11)=1,SEARCH(E$15,$D$1:$D$11),SEARCH(CONCATENATE("; ",E$15),$D$1:$D$11)))
The additional IF statement says that if the search finds the word at the first character of the cell, that search result is accepted into the array; otherwise it searches for the word preceded by "; " instead.

Search for specific word in VBA string

I have some code that searches for specific words in a VBA string. Currently I'm using InStr:
tag1 = InStr(1, dish, keyword, vbTextCompare)
However the challenge is that the search is comparing strings and not words.
E.g. If I'm searching for "egg" as a keyword in a string - Eggplant Pizza, it's returning a true value.
Ideally I would simply like to search if the word "egg" exists in a string. Is there a better function I can use?
You could also use Regular Expressions to achieve this in VBA.
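Look specifically at the ^ and $ anchors, which force the pattern to match the entire string.
So in your case something like ^Egg$ as the pattern matches only when the whole string is the word Egg; to find the word inside a longer string such as "Egg Pizza", the \b word-boundary token (\bEgg\b) is the usual choice.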
See here for some good help on this:
How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
InStr is the way to go. For finding out how many times a string is present in a text, you can use this one-line code. Here I use your example.
Debug.Print UBound(Split("Eggplant Pizza", "Egg"))
To make the code case insensitive, you can put Option Compare Text on top of your code module.

How to tell if carriage return is in my string?

I have a textbox control that allows the user to press the Enter key while entering details like addresses or other demographic information. The default layout for addresses is as follows:
Address 1
Address 2
City, st
Zip
I am wondering if there is a way to tell whether the Enter key was used to make a new line here. I've looked around, and so far the only approach I've found is to check for vbCrLf in VB; however, I'm not seeing the code pick this up.
Test data for this would be something similar to below
123 N Street
S Test Street
Test City, XX
91883
The code below is my attempt to replace any carriage return with a space:
Text.Replace(vbCrLf, " ")
Will this vbCrLf not pick up a carriage return unless there's an actual space between the above test values?
If you are using a MultiLine textbox (as it seems from your sample) then you don't need to search for the newline characters and replace them with a space.
You could simply use the Lines property, where every line is stored separately, and then use the String.Join method to create a single-line string:
Dim singleLine = string.Join(" ", myTextBox.Lines)
Of course if you are just interested to know if there is a newline character then just check the Length property of the Lines array
vbCrLf actually refers to two characters, a carriage return (13) and a line feed (10).
I would search and replace each separately. It isn't strictly necessary (as replace will work on the two characters at once), but can catch instances in which the user cut and pasted information, instead of typing directly into the text box.
Text = Text.Replace(vbCr, " ")
Text = Text.Replace(vbLf, " ")
or
Text = Text.Replace(vbCr, " ").Replace(vbLf, " ")
https://msdn.microsoft.com/en-us/library/microsoft.visualbasic.constants.vbcrlf(v=vs.110).aspx
The Replace() function will indeed properly detect and replace all occurrences of the target with the replacement - it does not matter whether there are leading or trailing spaces.
However, please consider that String objects are immutable and cannot be changed after they have been instantiated. Therefore, Replace() does not modify the existing object but rather returns a new string as its result.
To actually see the results of the function call, you need to do something along these lines:
newString = Text.Replace(vbCrLf, " ")
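Pulling the points above together, here is a minimal VB.NET sketch that detects a line break and flattens the text in one pass. The names HasLineBreak and FlattenLines are hypothetical, and using a regular expression is my own choice rather than something the answers above prescribe. Matching CRLF before the single characters means a CRLF pair becomes one space, not two.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module NewlineDemo
    ' Hypothetical helper: True if the text contains a carriage return or a line feed.
    Function HasLineBreak(ByVal text As String) As Boolean
        Return text.IndexOfAny(New Char() {ControlChars.Cr, ControlChars.Lf}) >= 0
    End Function

    ' Hypothetical helper: replaces each line break (CRLF, lone CR, or lone LF)
    ' with a single space and returns the result as a new string.
    Function FlattenLines(ByVal text As String) As String
        Return Regex.Replace(text, "\r\n|\r|\n", " ")
    End Function

    Sub Main()
        ' Mixed line endings, as might result from pasted text.
        Dim address As String = "123 N Street" & vbCrLf & "S Test Street" & vbLf & "Test City, XX"
        Console.WriteLine(HasLineBreak(address))   ' True
        Console.WriteLine(FlattenLines(address))   ' 123 N Street S Test Street Test City, XX
    End Sub
End Module
```

As with Replace, the original string is left untouched, so the return value has to be assigned or used.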
I spent quite some time resolving exactly this problem. The solution I came across was:
text = text.Replace(ControlChars.CrLf, " ")
Sorry, I can't remember where I found the solution, but if I do remember I will post the link here.