Case sensitive search from text file fix? - vb.net

A few days ago, I asked a question on Stack Overflow about how to search a text file for strings matching the text in a search box. This has worked great so far, except for the fact that the search is case sensitive. I thought of a way of overcoming this, but it wouldn't work quite the way I wanted it to.
My idea/solution:
If ListBox.Items.Count = 0 Then
    tbx_FindText.CharacterCasing = CharacterCasing.Upper
ElseIf ListBox.Items.Count = 0 Then
    tbx_FindText.CharacterCasing = CharacterCasing.Lower
End If
This would essentially try both fully upper and fully lower case. But what happens if the user types a search such as 'Gsk'? The 'G' is capitalized but the other characters aren't, so the string is mixed case. If the search text is not exactly the same as the string in the text file (whether that is fully upper case, fully lower case, or mixed case), the program reports that there are no search results when there are. The search algorithm is simply case sensitive and doesn't match the text properly.
Search Algorithm Code:
Dim lines1() As String = IO.File.ReadAllLines("C:\ProgramData\WPSECHELPER\.data\Outlook Folder Wizard\outlookfolders.txt")
lbx_OFL_Results.Items.Clear()
lbx_OFL_Results.BeginUpdate()
For i As Integer = 0 To lines1.Length - 1
    If lines1(i).Contains(tbx_FindText.Text) Then lbx_OFL_Results.Items.Add(lines1(i))
Next
lbx_OFL_Results.EndUpdate()
Essentially, the code opens the text file, which contains several Outlook Folder Paths needed by employees to do their jobs. They enter a search for a company name or reference number into a search box, and the list box populates with matching results of paths that contain the keywords that were entered in the search text box.
That part works great, apart from the fact that the list box doesn't populate with results if, for example, my search is capitalized and the string in the text file isn't.
If anyone could help compose (or reconstruct) a piece of code that searches the text file without being case sensitive (keeping the code above if possible), it would be greatly appreciated.

Don't use the ReadAllLines function, since you don't need all the lines from the text file at once. It loads the whole file into memory, which is unnecessary, especially when you are dealing with big files. Use ReadLines instead, with the Where extension method and a case-insensitive IndexOf comparison, to get the matches:
Dim path As String = "C:\ProgramData\WPSECHELPER\.data\Outlook Folder Wizard\outlookfolders.txt"
Dim search As String = tbx_FindText.Text
' IO.File is System.IO.File; ReadLines streams the file line by line.
Dim lines = IO.File.ReadLines(path).Where(
    Function(l) l.IndexOf(search, 0, StringComparison.InvariantCultureIgnoreCase) >= 0
).ToList
' Reset the data source so the list box refreshes with the new matches.
lbx_OFL_Results.DataSource = Nothing
lbx_OFL_Results.DataSource = lines
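Since the question asks to keep the original loop where possible, here is a minimal sketch of the same case-insensitive test inside a plain For Each loop. The FindMatches helper and the sample folder paths are hypothetical and only there for illustration; the form controls are left out so the snippet can run as a console program.

```vbnet
Imports System
Imports System.Collections.Generic

Module CaseInsensitiveSearch
    ' Hypothetical helper (not from the original post): returns every line
    ' that contains the search text, ignoring case.
    Function FindMatches(lines As IEnumerable(Of String), search As String) As List(Of String)
        Dim matches As New List(Of String)
        For Each currentLine As String In lines
            ' IndexOf returns -1 when the text is not found.
            If currentLine.IndexOf(search, StringComparison.OrdinalIgnoreCase) >= 0 Then
                matches.Add(currentLine)
            End If
        Next
        Return matches
    End Function

    Sub Main()
        ' Made-up sample data standing in for the lines of outlookfolders.txt.
        Dim sample As String() = {"Inbox\GSK\Invoices", "Inbox\Acme\Orders", "Archive\gsk\2019"}
        For Each hit As String In FindMatches(sample, "Gsk")
            Console.WriteLine(hit)
        Next
    End Sub
End Module
```

In the loop from the question, the equivalent change would be to replace lines1(i).Contains(tbx_FindText.Text) with lines1(i).IndexOf(tbx_FindText.Text, StringComparison.OrdinalIgnoreCase) >= 0. OrdinalIgnoreCase is used here as an assumption that a culture-independent comparison is good enough for folder paths.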

Related

cannot get value from a cell in libreoffice 6.4.3.2 basic

I am new to LibreOffice Basic. I have experience with VBA, but LibreOffice is different.
I just want to get a cell's value, but it always returns zero, whether the actual cell contains text or a number.
Here is part of my simple code.
Sub test_moved()
    Dim Doc As Object
    'worksheet
    Dim sh_village As Object
    Dim sh_cbc As Object
    sh_village = ThisComponent.CurrentController.getActiveSheet()
    'sh_village = Doc.Sheets.getByName("VillageFinal")
    'sh_village = Doc.Sheets(1)
    Msgbox(sh_village.getCellrangeByName("B2").getValue())
    Msgbox(sh_village.getCellrangeByName("B2").Value)
    Msgbox(sh_village.getCellByPosition(1,1).Value)
    msgbox("The process is completed.")
End Sub
Do we need to do any prior setup before starting to code?
The code works correctly for numeric values. However, for strings, including strings that look like a number, it will display 0 because there is no numeric value.
What you probably want instead is:
MsgBox(sh_village.getCellRangeByName("B2").getString())
Also check out Format -> Cells -> Number to see how the data is displayed in the cell. And be on the lookout for a single quote at the front of the value in the formula bar (for example '42), because that means it is a string. Delete the quote to make it a number.
"I have experience with VBA but this libreoffice is different."
Yes, LibreOffice Basic is a different language from VBA and the LibreOffice API is very different from the MS Office API. Knowing that will help you use it more effectively. If possible, avoid Option Compatible, because it won't fix most problems and will only muddy the waters.

VB.NET - Substring function that stops reading at first integer, possible?

I currently have a text file that has a little over 500 lines of paths.
(i.e., N:\Fork\Cli\Scripts\ABC01.VB)
Some of these file names vary in length (i.e., ABC01.VB, ABCDEF123.VB, etc)
How can I go about using the Substring function to remove the path, the numbers, and the file extension, leaving just the letters?
For example, processing N:\Fork\Cli\Scripts\ABC01.VB, and returning ABC.
Or N:\Fork\Cli\Scripts\ZUBDK22039.VB and returning ZUBDK.
I've only been able to retrieve the first 3 letters, using this code:
Dim comp As String = sLine.Substring(28, 3)
sw.WriteLine(comp)
As Plutonix points out, the best way to isolate the file name from a path is with System.IO.Path.GetFileNameWithoutExtension.
You can extract just the letters (not digits or other characters) from a filename like this:
Dim myPath As String = "N:\Fork\Cli\Scripts\AB42Cde01.VB"
Dim filename As String = System.IO.Path.GetFileNameWithoutExtension(myPath)
Dim letters As String = filename.Where(Function(c) Char.IsLetter(c)).ToArray
The above code sets letters to ABCde.
The code relies on the fact that a String can be enumerated as a sequence of characters. The Where method processes all the characters in the string and selects only the ones that are letters (using the Char.IsLetter method). The selected characters are converted to a Char array, which VB converts back to a String when it is assigned to the letters variable.
I see from your latest comment that it is not possible for numerals to be mixed with the letters (as in my example). However, the code should still work in your case.
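If the goal is literally to stop reading at the first digit, as the title asks, TakeWhile can replace Where. The following is a minimal sketch rather than a definitive implementation; the LeadingLettersOf helper is a hypothetical name, and the sample paths are the ones from the question. It assumes the program runs on Windows, where "\" is treated as a directory separator.

```vbnet
Imports System
Imports System.IO
Imports System.Linq

Module LeadingLetters
    ' Hypothetical helper: keeps the characters of the file name up to,
    ' but not including, the first character that is not a letter.
    Function LeadingLettersOf(filePath As String) As String
        Dim name As String = Path.GetFileNameWithoutExtension(filePath)
        Return New String(name.TakeWhile(Function(c) Char.IsLetter(c)).ToArray())
    End Function

    Sub Main()
        Console.WriteLine(LeadingLettersOf("N:\Fork\Cli\Scripts\ABC01.VB"))
        Console.WriteLine(LeadingLettersOf("N:\Fork\Cli\Scripts\ZUBDK22039.VB"))
    End Sub
End Module
```

On Windows this should print ABC and ZUBDK. Unlike Where, TakeWhile stops at the first non-letter, so a mixed name such as AB42Cde01 would give AB rather than ABCde.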

Word.Range : Move Range index in the formatted text that corresponds to the plain text

I need to analyze the text of my Word document and create bookmarks on the ranges of text that my analyzer has detected (almost like a grammar checker).
I don't want to use the Find() utility, because my needs are too specific.
Explanations
Here is what I do:
1/ Retrieve the document's plain text
I retrieve the plain text of the main story of my document:
String plainText = ActiveDocument.Range().Text;
2/ Analyze the plain text and get results
I send it to my analyzer tool, which returns a collection of markers with positions.
For example, if I wanted to detect the pattern "my pattern" in the document text, the analyzer could return a marker such as { pattern: "my pattern", start: 5, end: 14 }, where "start" and "end" are the character indexes of the pattern in the plain text that was sent.
3/ Display results in the document
I create bookmarks from these markers.
For the previous example, it would be:
// init a new range and collapse it to its start
Word.Range range = activeDocument.Range();
range.Collapse(Word.WdCollapseDirection.wdCollapseStart);
// move character-by-character in the "formatted" text
range.MoveStart(Word.WdUnits.wdCharacter, marker.Start); // marker.Start = 5
// set the length (end)
range.SetRange(range.Start, range.Start + (marker.End - marker.Start)); // marker.End = 14
4/ Results
4.1 Global result
Everything is OK when the document's main story contains text, links, lists, and titles:
the ranges are positioned correctly, and plain text indexes correlate with formatted text indexes.
4.2 Tables issue
When a document contains a table, the ranges are off by a few characters: plain text indexes do not correlate exactly with formatted text indexes.
I found the reason for this issue (it was explained in other forums): it is due to the non-printing character char(7), a cell delimiter that is added to the plain text. We can account for these characters when calculating the range position, and then everything is OK!
4.3 Issue with content controls, tables of contents, sections and others
When a document contains these elements, the ranges are also off by a few characters.
Other non-printing characters appear in the plain text, but I don't understand what they mean or how to deal with them when calculating the range position.
When displaying Word element markers with "Developer ribbon > Design Mode", we see 2 markers per element: shifting the plain text indexes by 2 * (number of elements) resolves the issue. It seems OK.
4.4 Issue with endpaper
I don't know how to say "page de garde" (French) in English; I think it's "endpaper": this is the first page, with a specific header, footer and content controls :)
When a document contains an endpaper, the ranges are also off by a few characters.
But this time, there are no non-printing markers in the plain text.
As additional information, when I display Word element markers with "Developer ribbon > Design Mode", I see endpaper markers.
Questions
How can I detect an endpaper in a Word document range?
How can I understand why plain text indexes don't always correlate with formatted text indexes, depending on which elements the Word document contains?
Would XML node manipulation be a more reliable alternative? If so, could you give me good examples of managing bookmarks or other elements in the current document with the XML API?
Other resources
I found similar issues:
Correlate Range.Text to Range.Start and Range.End
http://www.vbaexpress.com/forum/showthread.php?36710-Strange-character-on-table-range-text
I hope my explanations are clear and that you can help me understand what is wrong or show me a better way to do this.
Thanks, really.
It's not really pretty, but you can try to remove the unwanted characters with a Regex. For example, to remove the \a character (it has code 7):
string j = new string(new char[] { (char)7 });
plainText = Regex.Replace(plainText,string.Format("[{0}]", j), "");
Now you have to identify the other 'evil' characters and add them to the char array. If it works you will get a string whose length corresponds with the number of Characters in your document. Probably you have to adapt this code by experimenting. (I was not sure which language you are using - I supposed C#.)
Update
Another idea (if it is applicable to your analyzer tool):
Break your problem down to single paragraphs:
foreach (Word.Paragraph pg in activeDocument.Paragraphs)
{
    // Paragraph.Range is a property, not a method
    Word.Range range = pg.Range;
    string text = range.Text;
    // your stuff here
}
With these paragraph range objects and the text strings they contain, you do the same as you tried to do with the whole document object and its text, just paragraph by paragraph. All these paragraphs are 'addressable' by ranges and Move operations, as you already do. I suppose that the problematic characters are outside or at the end of the paragraphs, so they don't influence the character counting inside these paragraphs.
As I can't reproduce what you call endpaper I can't validate it. Besides I don't know if special text ranges as page headers and tables of content are covered by paragraphs. But at least you can reduce your problem to smaller ranges. I think it is worth trying.

Trying to scrape item from website

I am trying to create a simple program that pulls a text item from a website and adds it to a textbox. I'm just experimenting and thought I could do it, but it is not that easy for me. I know how to get the entire source code of a website (below). The item has an id that I know, but it does not have a tag name, so I'm not really sure how to read through the text and keep only the part next to the id. Or would it be better to use a WebBrowser control and get the text item that way? I'm just trying to do whatever is faster. I think my first option is better because it would be easier on the computer's RAM. Given the code below, I don't know what to add next.
Dim request As System.Net.HttpWebRequest = System.Net.HttpWebRequest.Create("Website")
Dim response As System.Net.HttpWebResponse = request.GetResponse()
Dim sr As System.IO.StreamReader = New System.IO.StreamReader(response.GetResponseStream())
Dim source As String = sr.ReadToEnd()
Let's say the id is "name", for example. Viewing the source of the page, this is what that part looks like (below). How can I parse the source, which is a string, find this section, get the name Brandon, and add it to the textbox?
<span id="name">Brandon</span>
There are a few ways to go about this. I'm not going to write any source code though since I haven't used Visual Basic in a very long time. But if you Google for how to do any of the following you should find many tutorials and documents on it.
Regular Expressions
Using a Regular Expression on the full source code can help you find the element by searching for the ID attribute, which should be unique. Regular Expressions can sometimes be very slow, so they should be avoided if you have to perform a lot of searches on large sections of text.
/<([a-z0-9]+)\sid="name"(.*?)>(.*?)<\// -> Not Tested, but might help you
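As a minimal VB.NET sketch of this approach, the snippet below applies a slightly adapted pattern with System.Text.RegularExpressions. The pattern and the hard-coded source string are assumptions made for illustration; in the real program, source would be the page text read from the response stream.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexExtract
    Sub Main()
        ' Sample markup standing in for the downloaded page source.
        Dim source As String = "<p>Hello</p><span id=""name"">Brandon</span>"
        ' Capture the text between the tag carrying id="name" and the next closing tag.
        Dim pattern As String = "<[a-z0-9]+\s[^>]*id=""name""[^>]*>(.*?)</"
        Dim m As Match = Regex.Match(source, pattern, RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        If m.Success Then
            ' Group 1 holds the element's inner text.
            Console.WriteLine(m.Groups(1).Value)
        End If
    End Sub
End Module
```

For the sample string this prints Brandon. The pattern stops at the first closing tag it meets, so it is only suitable when the element contains plain text and no nested tags.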
String Position
Using a function that finds the position of a substring in a string would be useful. In C it's strstr and in PHP it's strpos. These types of functions give you the starting position of a string, which in your case means searching for id="name". Once you find that, you find the position of the end of the opening tag and then the position of the closing tag for that element. You then perform a substring operation that gets the text starting at position X for the length that you specify, which would be the closing tag position minus the end of opening tag position.
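In VB.NET the corresponding calls are String.IndexOf and String.Substring. The steps above could be sketched as follows; TextAfterMarker is a hypothetical helper name, and the sample markup is made up for the example.

```vbnet
Imports System

Module SubstringExtract
    ' Hypothetical helper: returns the text of the element whose opening tag
    ' contains the given marker, or Nothing when it cannot be found.
    Function TextAfterMarker(source As String, marker As String) As String
        Dim markerPos As Integer = source.IndexOf(marker, StringComparison.OrdinalIgnoreCase)
        If markerPos < 0 Then Return Nothing
        ' Position of the ">" that ends the opening tag.
        Dim openEnd As Integer = source.IndexOf(">"c, markerPos)
        If openEnd < 0 Then Return Nothing
        ' Position where the next closing tag starts.
        Dim closeStart As Integer = source.IndexOf("</", openEnd + 1, StringComparison.Ordinal)
        If closeStart < 0 Then Return Nothing
        ' Everything between the two positions is the element's text.
        Return source.Substring(openEnd + 1, closeStart - openEnd - 1)
    End Function

    Sub Main()
        Dim source As String = "<p>Hello</p><span id=""name"">Brandon</span>"
        Console.WriteLine(TextAfterMarker(source, "id=""name"""))
    End Sub
End Module
```

For the sample string this prints Brandon. As with any plain string search, nested tags inside the element are not handled, and the marker must match the attribute exactly as it is written in the page source.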
HTML / XML Library
There are probably a ton of HTML / XML libraries that will parse the document into some sort of object or array. You can then loop through the elements until you find the one you are looking for. Some of these libraries may even have functions for searching by element ID, similar to how JavaScript looks up a specific element.
These libraries may be hard to get started with, but they will offer you a lot of options in the future if you need to continue finding more HTML elements.

String Manipulation Inconsistency

This is a more general question. I am reading a document and saving its contents into a string variable. The resulting variable contains approximately 1 million characters (no cleansing). My code would then search the string, and extract key words. However, I am hung-up on an issue:
If I pass the string directly to a message box, it will show me the contents using Mid:
Messagebox.Show(Mid(searchString, startPos, endPos))
However, if I first pass the mid to a string variable, the contents are empty and the messagebox displays nothing:
Dim myString as String
myString = Mid(searchString, startPos, endPos)
Messagebox.Show(myString)
The same effect happens when I use .substring and when I use a stringbuilder.
Does anybody know why this is happening? I assume something is happening during assignment, but I am not sure what is lost?
Here is a snippet of code:
searchPos = textString.IndexOf(searchText, searchPos, StringComparison.OrdinalIgnoreCase)
MessageBox.Show(searchPos)
MessageBox.Show(Mid(textString, searchPos, 100))
So, the inconsistency is as such: the length of textString is around 3,700,000 characters. When I find the indexOf, the value returned in the first Messagebox is 455,225. However, if I try to pull out the characters using Mid, the second messagebox is blank.
Also, although it claims to be 3,700,000 characters, if I do a messagebox on textString, I am only shown around 6 characters of what appears to be XML. The file that is being searched is an old .ppt file, and I know I can just work-around it, but I am confused by how the computer can find the indexof my searchText correctly, but then cannot show me anything.
Thoughts?