I know that the Sentences collection is just a bunch of ranges, but I have not been able to determine exactly what criteria are used to decide where those ranges begin and end. I have determined that a period (.), a question mark (?) or an exclamation point (!) followed by one or more spaces ends a sentence, and that the spaces are included in the sentence range. I have also determined that if there are no spaces between what you and I would consider two sentences, MS-Word treats them as a single sentence.
The problem is that when you start adding things like tabs, page breaks, new line characters etc., things become unclear. Can anyone explain precisely, or point me to some reference material on, what criteria MS-Word uses to decide where one sentence ends and another begins?
It seems to be based on sentence-ending delimiters (e.g. ".", "!", "?"). If you explain what you are trying to do or post some code, more people will be willing to help.
If you are concerned about combined sentences (e.g. This is a single Sentence.Even though it is delimited) you could expand upon this basic methodology. Special characters seem to be much harder to handle, so posting what you are trying to do would be a good idea.
Sub sent_counter()
    Dim s As Integer
    For s = 1 To ActiveDocument.Sentences.Count
        ActiveDocument.Sentences(s) = splitSentences(ActiveDocument.Sentences(s))
    Next s
End Sub

Function splitSentences(s As String) As String
    Dim delims As New Collection
    Dim delim As Variant
    delims.Add "."
    delims.Add "!"
    delims.Add "?"
    Dim ender As String
    Dim sub_s As String
    s = Trim(s)
    ' Keep the final character aside so it does not get a space appended
    ender = Right(s, 1)
    sub_s = Left(s, Len(s) - 1)
    ' Put a space after every delimiter found inside the sentence
    For Each delim In delims
        If InStr(1, sub_s, delim) Then
            sub_s = Replace(sub_s, delim, delim & " ")
        End If
    Next delim
    splitSentences = sub_s & ender
End Function
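As a language-neutral illustration, the rule observed in the question (a ".", "!" or "?" followed by one or more spaces ends a sentence, and the spaces belong to that sentence) can be sketched in Python. This only mirrors the behaviour described above; it is not Word's actual algorithm and makes no attempt to model tabs, page breaks or paragraph marks. The function name `split_sentences` is made up for this example.

```python
import re

# A sentence is the shortest run of text that ends in ".", "!" or "?"
# followed by at least one space (the spaces stay with the sentence),
# or whatever text is left over at the end of the input.
_SENTENCE = re.compile(r".*?[.!?] +|.+$", re.DOTALL)

def split_sentences(text):
    """Split text using the 'terminator plus spaces' rule from the question."""
    return _SENTENCE.findall(text)

if __name__ == "__main__":
    print(split_sentences("Hi there.  How are you? Fine!"))
    print(split_sentences("One.Two. Three"))
```

Note that "One.Two. " comes back as a single sentence because there is no space after the first period, which matches what the question reports for Word.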
Hey, so I have a school project in which I need to split a massive word into smaller words.
This is the massive sequence of letters:
'GLSDGEWQQVLNVWGKVEADIAGHGQEVLIRLFTGHPETLEKFDKFKHLKTEAEMKASEDLKKHGTVVLTALGGILKKKEGH
HEAELKPLAQSHATKHKIPIKYLEFISDAIIHVLHSKHRPGDFGADAQGAMTKALELFRNDIAAKYKELGFQG'
and then I need to split it into smaller separate parts of itself, which would look like this:
'GLSDGEWQQVLNVWGK'
'VEADIAGHGQEVLIR'
'LFTGHPETLEK'
'FDK'
'FK'
'HLK'
'TEAEMK'
'ASEDLK'
'K'
'HGTVVLTALGGILK'
'K'
'K'
'EGHHEAELKPLAQSHATK'
'HK'
'IPIK'
'YLEFISDAIIHVLHSK'
'HRPGDFGADAQGAMTK'
'ALELFR'
'NDIAAK'
'YK'
'ELGFQG'
I have no idea how to start on this. If you could help, please and thanks.
Different digestion enzymes cut at different positions within a protein sequence; the most commonly used one is trypsin. It follows the following rules: 1) Cuts the sequence after an arginine (R) 2) Cuts the sequence after a lysine (K) 3) Does not cut if lysine (K) or arginine (R) is followed by proline (P).
Okay, hooray, rules! Let's turn this into pseudo-code to describe the same algorithm in a half-way state between the original prose and code. (While a Regex.Split approach would still work, this might be a good time to explore some more fundamentals.)
let the list of words be an empty array
let current word be the empty string
for each letter in the input:
    if the letter is R or K and the next letter is NOT P then:
        add the letter to the current word
        save the current word to the list of words
        reset the current word to the empty string
    otherwise:
        add the letter to the current word
if after the loop the current word is not the empty string then:
    add the current word to the list of words
Then let's see how this translates. It is a sketch and quite likely contains minor errors [1] beyond anything called out in the comments.
Dim words As New List(Of String)
Dim word = ""
' A basic loop works suitably for string input and it can also be
' modified for a Stream that supports a Peek of the next character.
For i As Integer = 0 To input.Length - 1
    Dim letter = input(i)
    ' Only look at input(i + 1) when it exists, so the last letter is safe
    Dim nextIsProline = i + 1 < input.Length AndAlso input(i + 1) = "P"c
    ' StringBuilder is more efficient for larger strings
    word &= letter
    If (letter = "R"c OrElse letter = "K"c) AndAlso Not nextIsProline Then
        words.Add(word) ' or perhaps just WriteLine the word somewhere?
        word = ""
    End If
Next
' Add the trailing word when the input does not end at a cut site
If word <> "" Then words.Add(word)
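For comparison, here is the same pseudo-code written out as a small, self-contained Python sketch. It implements only the three trypsin rules quoted above (cut after R, cut after K, no cut when the next letter is P); the function name `trypsin_digest` is hypothetical and nothing here comes from a bioinformatics library.

```python
def trypsin_digest(sequence):
    """Split a protein sequence after K or R unless the next residue is P."""
    fragments = []
    current = ""
    for i, residue in enumerate(sequence):
        current += residue
        # Peek at the next residue only when there is one.
        next_is_proline = i + 1 < len(sequence) and sequence[i + 1] == "P"
        if residue in "KR" and not next_is_proline:
            fragments.append(current)
            current = ""
    # Keep the tail when the sequence does not end at a cut site.
    if current:
        fragments.append(current)
    return fragments

if __name__ == "__main__":
    for fragment in trypsin_digest("FDKFKHLKPARELGFQG"):
        print(fragment)
```

The "KP" in the example input is deliberately left uncut, which is the same reason 'EGHHEAELKPLAQSHATK' stays in one piece in the expected output from the question.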
[1] As I use C# (and not VB.NET), the above code comes warranty-free. Here are some quick reference links I used to 'stitch' this together:
https://www.dotnetperls.com/loop-string-vbnet
https://learn.microsoft.com/en-us/dotnet/visual-basic/programming-guide/language-features/operators-and-expressions/concatenation-operators
https://www.dotnetperls.com/list-vbnet
https://learn.microsoft.com/en-us/dotnet/visual-basic/language-reference/statements/dim-statement
How do you declare a Char literal in Visual Basic .NET?
Purpose: Use a SQL Query in MS Access to locate all records matching specific keywords in a long text field
I am attempting to query for all records in a MS Access DB that have a match on a list of specific keywords within a field. The keywords are as follows:
AIN, ATIN, CKD, AKI, ARF
Issue I'm running into is that the field is a free text entry field, so the formatting of the data is all over the place, and the keywords I'm searching on will often appear in the middle of other full length words (i.e. AIN matches on "pAIN","agAIN", etc), while I only want to include matches on words that are strictly the keywords (i.e. " AIN ", " AKI ").
The idea I'm working with is to simply include matches that will hit on the following format: field_name like '* AIN *'. So basically, only include matches that have a space before and after the keyword, to limit the number of false positives appearing in the result set.
I have tried writing a SQL query that will normalize the data so that all other characters that appear (".","!","?","#", etc...) will be replaced with a space character (i.e. " AIN!" would be replace(field_name,"!"," ") = " AIN ") with the idea that this should only include words containing only the keyword. In attempting to run my very long nested replace statement in the query, I am receiving the "Query Too Complex" message. Nested replace is as follows:
UCASE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(a.REF_CONTENT_NM,chr(13)," "),chr(10)," "),"`"," "),"~"," "),"!"," "),"#"," "),"#"," "),"$"," "),"%"," "),"^"," "),"&"," "),"*"," "),"("," "),")"," "),"-"," "),"_"," "),"="," "),"+"," "),"["," "),"{"," "),"]"," "),"}"," "),";"," "),":"," "),","," "),"<"," "),"."," "),">"," "),"/"," "),"?"," "),"\"," "),"|"," "),""""," ")) like "* AIN *"
I believe that a workaround would be to create a custom function that could be referenced in the SQL statement, but I am not entirely sure how to accomplish this. So essentially, I am looking for guidance on how to normalize the text like the above nested replace statement in Access without running into the "Query Too Complex" message. I feel like there is a simple solution that I am just not seeing here, so guidance would be tremendously appreciated!
The main trick to writing a custom function for this is using a ParamArray properly.
This is a small function that executes multiple replaces:
Public Function ReplaceMultiple(strInput As String, strReplace As String, ParamArray Find() As Variant) As String
Dim i As Long
ReplaceMultiple = strInput
For i = LBound(Find) To UBound(Find)
ReplaceMultiple = Replace(ReplaceMultiple, Find(i), strReplace)
Next
End Function
Implement it:
ReplaceMultiple(a.REF_CONTENT_NM, " ", chr(13), chr(10), "`", "etc....")
You might need to think about altering the logic altogether, though, for example keeping a table of characters that should be replaced. I remember something about the max number of arguments being around 20-30, so you might need to use ReplaceMultiple twice.
If you just want to replace everything that isn't a letter or digit with a space, you can try the following small function:
Public Function ReplaceNonAlphanumeric(str As String) As String
    If str = "" Then Exit Function
    Dim i As Long
    For i = 1 To Len(str)
        ' A single [A-z] range can also match the punctuation between Z and a
        If Mid(str, i, 1) Like "[0-9A-Za-z]" Then
            ReplaceNonAlphanumeric = ReplaceNonAlphanumeric & Mid(str, i, 1)
        Else
            ReplaceNonAlphanumeric = ReplaceNonAlphanumeric & " "
        End If
    Next
End Function
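To make the normalize-then-match idea concrete outside Access, here is a minimal Python sketch. It assumes the keyword list from the question, replaces every non-alphanumeric character with a space, upper-cases the text and pads it with a space on each side so that a keyword at the very start or end of the field still matches. The names `normalize` and `matching_keywords` are invented for this example.

```python
import re

# Keyword list taken from the question.
KEYWORDS = ("AIN", "ATIN", "CKD", "AKI", "ARF")

def normalize(text):
    """Upper-case the text, turn non-alphanumerics into spaces, pad both ends."""
    return " " + re.sub(r"[^0-9A-Za-z]", " ", text).upper() + " "

def matching_keywords(text):
    """Return the keywords that occur as whole words in text."""
    padded = normalize(text)
    return [keyword for keyword in KEYWORDS if f" {keyword} " in padded]

if __name__ == "__main__":
    print(matching_keywords("Pt has pain again; hx of AKI/ckd."))
```

The padding step matters: a pattern such as '* AIN *' on its own would miss a record whose text begins or ends with the keyword.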
First of all, I have little to no knowledge of VBA, probably none at all. However, I was asked to create a VBA program that pastes text from the clipboard into different cells. My text has the following format:
seminar: name of Seminar (in cell(1,1))
first name: participant's first name (in cell(1,2))
last name: participant's last name (in cell(1,3)) etc..
So far I was able to read the text from clipboard. Then I found the position of the ":" in order to paste only what is AFTER it in the cell.
At this point I thought to find the position of the RETURN character in order to know where the first line ends(ex. "name of Seminar") with this line of code which I found online:
end_str = InStr(str, vbCrLf) - 1
and with the Right (string, length) function to get the relative text.
This is not working. I think it is because there is no return character in the string variable that holds the data? I don't know.
My question is: Is it possible to check the RETURN character somehow or Is there a better way to create this program?
Thank you in advance.
An easy way would be to use the split function to get each line separately:
Suppose you have a function called ClipBoard_GetData that returns the text from ClipBoard, you could use something like this:
Dim lines() As String
Dim textLine As Variant ' For Each over an array needs a Variant loop variable
lines = Split(ClipBoard_GetData, vbNewLine)
For Each textLine In lines
    ' Parse each line to get whatever parts you want
Next
This should work fine, and if you don't already have a function that gets what's in the clipboard, you could refer to this link
Hope that helps :)
Most likely the Ascii code you're after is 10 (ie newline). So you could find the position of the newline like so:
i = Instr(str, Chr(10))
However, are you aware that you don't need to parse that clipboard text at all? You can write arrays directly into worksheet cells, so all you need to do is use the Split function. The procedure below will complete everything you need:
Public Sub PasteText(str As String)
Dim arr() As String
Dim cols As Integer
arr = Split(str, Chr(10))
cols = UBound(arr) + 1
Sheet1.Range("A1").Resize(, cols).Value = arr
End Sub
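The split-then-take-what-follows-the-colon step can be sketched as below. This is a Python illustration rather than VBA, and `values_after_colon` is a hypothetical helper; the point is that splitting on line boundaries (whether the clipboard uses CR+LF or a bare LF) and then splitting each line once on ":" avoids searching for a specific return character.

```python
def values_after_colon(text):
    """Return the text after the first ':' on every line that has one."""
    values = []
    # splitlines() accepts "\r\n", "\n" and "\r" as line endings.
    for line in text.splitlines():
        if ":" in line:
            values.append(line.split(":", 1)[1].strip())
    return values

if __name__ == "__main__":
    clipboard = "seminar: Intro to VBA\r\nfirst name: Ada\nlast name: Lovelace"
    print(values_after_colon(clipboard))
```

Each returned value would then go into its own cell, in the same order as the lines.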
Hey guys I'm trying to make a program that helps people encrypt messages and decrypt messages using the Caesar shift cipher, I know it's probably already been done, I want to have a go myself though.
The problem I've been having is when it comes to encrypting the text. The user selects a number (between 1-25) and then the application will change the letters corresponding to the number chosen, e.g. if the user inputs "HI" and selects 2, both characters are moved two places down the alphabet outputting "JK". My main problem is the replacing characters though, mostly because I've set up the program to be able to encrypt large blocks of text, because my code is:
If cmbxKey.Text = "1" Then
If txtOutput.Text.Contains("a") Then
sOutput = txtOutput.Text.Replace("a", "b")
txtOutput.Text = sOutput
End If
If txtOutput.Text.Contains("b") Then
sOutput = txtOutput.Text.Replace("b", "c")
txtOutput.Text = sOutput
End If
End If
This means if the user inputs "HAY" it will change it to "HBY" and then because of the second if statement it will change it to "HCY" but I only want it to be changed once. Any suggestions to avoid this???? Thanks guys
Since you want to shift all characters, start out by looping through the characters using something like ToArray:
For each s as string in txtOutput.Text.ToArray
'This will be here for each character in the string, even spaces
Next
Then, rather than having cases for every letter, look at its ASCII number:
Asc(s)
...and shift it by the number you want. Keep in mind that if the shifted number is greater than 122 (lowercase "z"; I don't know whether you want upper or lower case), you need to subtract 26 to wrap back around to the start of the alphabet.
Then you can convert it back into a character using:
Chr(?)
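So this might look something like this (lowercase letters only):
Dim shift As Integer = CInt(cmbxKey.Text) ' the key chosen by the user
Dim sb As New System.Text.StringBuilder()
For Each c As Char In txtOutput.Text
    If c >= "a"c AndAlso c <= "z"c Then
        Dim code As Integer = Asc(c) + shift
        If code > 122 Then code -= 26 ' wrap past "z" back to "a"
        sb.Append(Chr(code))
    Else
        sb.Append(c) ' leave spaces and all other characters alone
    End If
Next
txtOutput.Text = sb.ToString()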
A very simple method of changing your application while keeping your strategy is to replace the lower case characters with upper case characters. Then they won't be recognized by the Replace method anymore.
Obviously, the problem is that you want to implement an algorithm. In general, an algorithm should be smart in the sense that you don't have to do the grunt work. That's why a method such as the one presented by Steve is smarter; it doesn't require you to map each character separately, which is tedious, and - as most tedious tasks - error prone.
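To show what "letting the algorithm do the grunt work" can look like, here is a short Python sketch of a Caesar shift that works on character codes and wraps with modular arithmetic, so every letter is shifted exactly once. It is an illustration under the assumption that only the unaccented letters A-Z and a-z are shifted and everything else passes through unchanged; `caesar_shift` is a made-up name.

```python
def caesar_shift(text, key):
    """Shift A-Z and a-z by key positions, wrapping around the alphabet."""
    shifted = []
    for ch in text:
        if "a" <= ch <= "z":
            base = ord("a")
        elif "A" <= ch <= "Z":
            base = ord("A")
        else:
            # Spaces, digits, punctuation and accented letters are kept as-is.
            shifted.append(ch)
            continue
        shifted.append(chr((ord(ch) - base + key) % 26 + base))
    return "".join(shifted)

if __name__ == "__main__":
    secret = caesar_shift("HI", 2)
    print(secret)
    print(caesar_shift(secret, -2))
```

Decrypting is the same call with the negated key, because the modulo operation keeps the result inside 0..25 in both directions.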
One big issue arises when you're facing a String that the basic alphanumeric table can't handle, for example a String that contains words like:
"Déja vu" -> The "é" is going to be what ?
And also, how about encoding the string "I'm Aaron Mbilébé" if you use .ToUpper().
.ToUpper returns "I'M AARON MBILÉBÉ".
You've lost the casing, and how do you handle the shifting of "É" ?
Of course, code should be smart as pointed out above, and I used to deal with strings just by using System.Text.ASCIIEncoding to make things easier. But from the moment I started to use large amounts of textual data from the web, files (...), I was forced to dig deeper and seriously consider string encoding (and system endianness, by the way, when converting strings to and from arrays of bytes).
Re-think what you really want in the end. If you're the only one using your code, and you're certain that you'll only use A..Z, 0..9, a..z, space and a fixed set of allowed characters (like punctuation), then just build a table containing each of those chars.
Private _AllowedChars As Char() = { "A"c, "B"c, ... "0"c, "1"c, .. "."c, ","c ... }
or
Private _AllowedChars As Char() = "ABCDEF....012...abcd..xyz.;,?:/".ToCharArray()
Then use
Private Function ShiftChars(ByVal CurrentString As String, ByVal ShiftValue As Integer) As String
Dim AllChars As Char() = CurrentString.ToCharArray()
Dim FinalChars As Char()
Dim i As Integer
FinalChars = New Char(AllChars.Length - 1) {} ' In VB the number is the upper bound (last index),
' so an array of n items is declared with n - 1
For i = 0 To AllChars.Length - 1
FinalChars(i) = _AllowedChars((Array.IndexOf(_AllowedChars, AllChars(i)) + ShiftValue) Mod _AllowedChars.Length)
Next
Return New String(FinalChars)
End Function
And
Private Function UnShiftChars(ByVal CurrentString As String, ByVal ShiftValue As Integer) As String
' ... the same code until :
FinalChars(i) = _AllowedChars((Array.IndexOf(_AllowedChars, AllChars(i)) - ShiftValue + _AllowedChars.Length) Mod _AllowedChars.Length)
' ...
End Function
^^ Assuming ShiftValue is always positive (defined once)
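The table-based shift and its inverse can be sketched compactly as follows. This is a Python illustration of the same idea, not a translation of the VB.NET above: the contents of `ALLOWED` are an assumed example table, and unlike Array.IndexOf (which returns -1 for a missing item) the lookup here raises an error when a character is not in the table, so bad input is reported instead of being shifted incorrectly.

```python
import string

# Hypothetical table of allowed characters; extend it to suit your input.
ALLOWED = string.ascii_uppercase + string.ascii_lowercase + string.digits + " .,;?:/"

def shift_chars(text, shift):
    """Replace each character by the one `shift` places later in ALLOWED."""
    size = len(ALLOWED)
    # str.index raises ValueError for a character that is not in the table.
    return "".join(ALLOWED[(ALLOWED.index(ch) + shift) % size] for ch in text)

def unshift_chars(text, shift):
    """Undo shift_chars by shifting the same distance in the other direction."""
    return shift_chars(text, -shift)

if __name__ == "__main__":
    encoded = shift_chars("Hello, world.", 5)
    print(encoded)
    print(unshift_chars(encoded, 5))
```

Because the index is reduced modulo the table length, the shift value does not have to stay below the size of the table.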
But again, this only works when you have a predefined set of allowed characters. If you want a more flexible tool, you ought to start dealing with encodings, arrays of bytes, BitConverter, and have a look at system endianness. That's why I asked whether someone else is going to use your application. Let's try this string:
"Xin chào thế giới" ' which is Hello World in Vietnamese (Google Translate)
In that case, you may give up..? No ! You ALWAYS have a trick in your cards !
Just create your allowed chars on the fly
Private _AllowedChars As New SortedList(Of Char, Char)
-> get the string to encode (shift)
Private Function ShiftChars(ByVal CurrentString As String, ByVal ShiftValue As Integer) As String
Dim AllChars As Char() = CurrentString.ToCharArray()
Dim FinalChars As Char()
Dim i As Integer
' Build your list of allowed chars...
_AllowedChars.Clear()
For i = 0 To AllChars.Length - 1
If Not _AllowedChars.ContainsKey(AllChars(i)) Then
_AllowedChars.Add(AllChars(i), AllChars(i))
End If
Next
' Then, encode...
FinalChars = New Char(AllChars.Length - 1) {}
For i = 0 To AllChars.Length - 1
FinalChars(i) = _AllowedChars.Keys.Item((_AllowedChars.IndexOfKey(AllChars(i)) + ShiftValue) Mod _AllowedChars.Count)
Next
Return New String(FinalChars)
End Function
The same for Unshift/decode.
Note: for foreign languages, the resulting string is pure garbage and totally unreadable until you unshift the chars again.
However, the main limitation of this workaround is the same as for the fixed chars array above: once you encode your string and then add a char to the encoded string that doesn't exist in the initially generated allowed chars, you've nuked your data and you won't be able to decode your string. All you'll have is pure garbage.
So one day... one day maybe, you'll have to dig deeper at the byte level of the thing, in a defined extended encoding (Unicode/UTF8/16) to secure the integrity of your data.
Hey guys I'm stuck with this question. Please help.
I want to write a program that can extract alphabetical characters and special characters from an input string. An alphabetical character is any character from "a" to "z" (capital letters and numbers not included); a special character is any other character that is not alphanumerical.
Example:
string = hello//this-is-my-string#capetown
alphanumerical characters = hellothisismystringcapetown
special characters = //---#
Now my question is this:
How do I loop through all the characters?
(the for loop I'm using reads like this for x = 0 to strname.length)...is this correct?
How do I extract characters to a string?
How do I determine special characters?
any input is greatly appreciated.
Thank you very much for your time.
You could loop through each character as follows:
For Each _char As Char In strname
'Code here
Next
or
For x as integer = 0 to strname.length - 1
'Code here
Next
or you can use Regex to replace the values you do not need in your string (I think this may be faster, but I am no expert). Take a look at: http://msdn.microsoft.com/en-us/library/xwewhkd1.aspx
Edit
The replacement code will look something as follows although I am not so sure what the regular expression (variable called pattern currently only replacing digits) would be:
Dim pattern As String = "(\d+)?" 'You need to update the regular expression here
Dim input As String = "123//hello//this-is-my-string#capetown"
Dim rgx As New Regex(pattern)
Dim result As String = rgx.Replace(input, "")
Since you need to keep the values, you'll want to loop through your string. Keeping a list of characters as a result will come in handy since you can build a fresh string later. Then take advantage of a simple Regex test to determine where to place things. The pseudo code looks something like this.
Dim alphaChars As New List(Of String)
Dim specialChars As New List(Of String)
For Each _char As Char In testString
    ' Regex.IsMatch expects a String, so convert the Char first
    If Regex.IsMatch(_char.ToString(), "[a-z]") Then
        alphaChars.Add(_char.ToString())
    Else
        specialChars.Add(_char.ToString())
    End If
Next
Then, if you need to dump your results into a full string, you can simply use
String.Join(String.Empty, alphaChars.ToArray())
Note that this code makes the assumption that ANYTHING other than a-z is considered a special character, so if need be you can do a second regular expression in your else clause to test for your special characters in a similar manner. It really depends on how much control you have over the input.
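For readers who want to see the second test mentioned above spelled out, here is a small Python sketch that follows the question's definitions literally: letters are "a" to "z" only, and a special character is anything that is not an ASCII letter or digit, so capitals and digits land in neither group. The function name `split_letters_and_specials` is invented for this example.

```python
def split_letters_and_specials(text):
    """Return (lowercase letters, special characters) found in text, in order."""
    letters = []
    specials = []
    for ch in text:
        if "a" <= ch <= "z":
            letters.append(ch)
        elif not (ch.isascii() and ch.isalnum()):
            # Not an ASCII letter or digit, so it counts as special.
            specials.append(ch)
        # Capital letters and digits are deliberately ignored.
    return "".join(letters), "".join(specials)

if __name__ == "__main__":
    print(split_letters_and_specials("hello//this-is-my-string#capetown"))
```

Run on the example from the question, this yields the two strings the asker listed.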