Hey, so I have a school project in which I need to split a massive word into smaller words.
This is the massive sequence of letters :
'GLSDGEWQQVLNVWGKVEADIAGHGQEVLIRLFTGHPETLEKFDKFKHLKTEAEMKASEDLKKHGTVVLTALGGILKKKEGH
HEAELKPLAQSHATKHKIPIKYLEFISDAIIHVLHSKHRPGDFGADAQGAMTKALELFRNDIAAKYKELGFQG'
and then I need to split it into smaller, separate parts of itself, which would look like this:
'GLSDGEWQQVLNVWGK'
'VEADIAGHGQEVLIR'
'LFTGHPETLEK'
'FDK'
'FK'
'HLK'
'TEAEMK'
'ASEDLK'
'K'
'HGTVVLTALGGILK'
'K'
'K'
'EGHHEAELKPLAQSHATK'
'HK'
'IPIK'
'YLEFISDAIIHVLHSK'
'HRPGDFGADAQGAMTK'
'ALELFR'
'NDIAAK'
'YK'
'ELGFQG'
I have no idea how to start on this. If you could help, please and thanks.
Different digestion enzymes cut at different positions within a protein sequence; the most commonly used one is trypsin. It follows these rules: 1) It cuts the sequence after an arginine (R). 2) It cuts the sequence after a lysine (K). 3) It does not cut if the lysine (K) or arginine (R) is followed by a proline (P).
Okay, hooray, rules! Let's turn this into pseudo-code to describe the same algorithm in a half-way state between the original prose and code. (While a Regex.Split approach would still work, this might be a good time to explore some more fundamentals.)
let the list of words be an empty array
let current word be the empty string
for each letter in the input:
    if the letter is R or K and the next letter is NOT P then:
        add the letter to the current word
        save the current word to the list of words
        reset the current word to the empty string
    otherwise:
        add the letter to the current word
if after the loop the current word is not the empty string then:
    add the current word to the list of words
Then let's see how some of these translate. This is a sketch and may still contain minor errors [1].
Dim words As New List(Of String)
Dim word = ""
' A basic loop works suitably for string input and it can also be
' modified for a Stream that supports a Peek of the next character.
For i As Integer = 0 To input.Length - 1
    Dim letter = input(i)
    ' Guard the look-ahead so input(i + 1) is never read past the end of the string.
    Dim nextIsProline = i + 1 < input.Length AndAlso input(i + 1) = "P"c
    If (letter = "R"c OrElse letter = "K"c) AndAlso Not nextIsProline Then
        ' StringBuilder is more efficient for larger strings
        word = word & letter
        words.Add(word) ' or perhaps just WriteLine the word somewhere?
        word = ""
    Else
        word = word & letter
    End If
Next
' Save the trailing word when the input does not end on a cut site
If word <> "" Then words.Add(word)
[1] As I use C# (and not VB.NET), the above code comes warranty-free. Here are some quick reference links I used to 'stitch' this together:
https://www.dotnetperls.com/loop-string-vbnet
https://learn.microsoft.com/en-us/dotnet/visual-basic/programming-guide/language-features/operators-and-expressions/concatenation-operators
https://www.dotnetperls.com/list-vbnet
https://learn.microsoft.com/en-us/dotnet/visual-basic/language-reference/statements/dim-statement
How do you declare a Char literal in Visual Basic .NET?
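The Regex.Split approach mentioned in passing above can be sketched roughly as follows. This is a minimal sketch rather than the answer's own method: the module name TrypsinRegexSketch and the function Digest are made up for illustration, and the pattern assumes upper-case one-letter amino-acid codes. The lookbehind finds a position just after K or R, and the negative lookahead refuses to cut when the next letter is P.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module TrypsinRegexSketch
    ' Hypothetical helper: split after every K or R that is not followed by P.
    Function Digest(sequence As String) As String()
        ' (?<=[KR]) : the previous letter is K or R
        ' (?!P)     : the next letter is not P
        Dim pieces = Regex.Split(sequence, "(?<=[KR])(?!P)")
        ' A sequence that ends in K or R yields a trailing empty piece; drop it.
        Return pieces.Where(Function(p) p.Length > 0).ToArray()
    End Function

    Sub Main()
        ' Made-up test input, not taken from the question.
        For Each peptide In Digest("AKPRGKKD")
            Console.WriteLine(peptide)
        Next
    End Sub
End Module
```

For the made-up input "AKPRGKKD" this should give AKPR, GK, K and D: the first K is kept because a P follows it.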
I'm trying to create a program which reads text as rich text, and outputs it using Markdown. I've copied the following paragraph into a RichTextBox (emphasis preserved from original)
A necessary component of narratives and story-telling. When an author of a story (be it a writer, speaker, film-maker or otherwise,) conveys a story to their audience, the audience is allowed to construct an internal representation of the world in which the story takes place (the “story world”). How the audience does this is dependent on which aspects of the world the author chooses to explicitly include in the narrative, such as the characters and characterisation, the settings and their descriptions, and information about the story world which the audience might not know.
And when I read the RichTextBox.Rtf property, it looks like this (emphasis added for demonstration):
{\rtf1\fbidis\ansi\ansicpg1252\deff0\deflang2057{\fonttbl{\f0\fswiss\fprq2\fcharset0 Arial;}{\f1\froman\fprq2\fcharset0 Times New Roman;}}
{\colortbl ;\red0\green0\blue0;}
\viewkind4\uc1\pard\ltrpar\cf1\f0\fs22 A necessary component of \b narratives and story-telling\b0 . When an \b author\b0 of a story (be it a writer, speaker, film-maker or otherwise,) conveys a story to their audience, the \b audience \b0 is allowed to construct an internal representation of the world in which the story takes place (the \ldblquote story world\rdblquote ). How the audience does this is dependent on which aspects of the world the author chooses to explicitly include in the narrative, such as the characters and characterisation, the settings and their descriptions, and information about the story world which the audience might not know.\cf0\f1\fs24\par
\pard\ltrpar\sa160\sl252\slmult1\fs22\par
\pard\ltrpar\cf1\f0\par
}
I want to extract the text content from this Rtf string - I'm not interested in the bits of code before and after the Rtf, all I want to know about is bold, italic and other formatting. I'm trying to work out how to determine where the text starts for any such given paragraph, though.
As a human, I obviously know where the text starts - right after the section I've bolded. I don't know how to tell my program what to look for though. I'm pretty sure the rtf code at the start of the paragraph is different for every paragraph, so I can't just tell my program to find this particular code and delete it.
Something else I thought of was searching for the first n characters in the original paragraph within the outputted rtf, like searching for "A necessary component". But if any of those first words is bolded, it won't look the same in the rtf output, so that approach won't work consistently either.
I'm sure I'm missing an obvious solution, but if anyone knows how I can cleverly work out where my text content starts and ends, I'd be glad.
I'm using VB.NET in Winforms, so would prefer an answer in VB.NET or pseudocode.
Well, it's super janky, but I've got the solution to my problems.
I found this article which has a complete function written in VB.NET to convert RTF to HTML.
Then I just did this, which takes the resulting HTML output from that function and converts it to markdown. So far it works perfectly.
If InputRTB.Text <> "" Then
    Dim input As String = InputRTB.Text
    Dim output As String = ""
    output = sRTF_To_HTML(InputRTB.Rtf)
    output = output.Substring(output.IndexOf("<span style"))
    output = output.Substring(output.IndexOf(">") + 1)
    Dim endpos = output.IndexOf("</span>")
    output = output.Remove(endpos, output.Length - endpos)
    Dim foundAllBold As Boolean = False
    Dim boldWords As New List(Of String)
    Do
        If output.Contains("<b>") Then
            Dim startb = output.IndexOf("<b>")
            Dim endb = output.IndexOf("</b>")
            Dim word = Trim(output.Substring(startb + 3, endb - startb - 3))
            If word <> "" Then
                Dim wordArray() As Char = word.ToCharArray
                wordArray(0) = Char.ToUpper(wordArray(0))
                word = New String(wordArray)
            End If
            boldWords.Add(word)
            output = Replace(output, "<b>", "**", , 1)
            output = Replace(output, "</b>", "**", , 1)
        Else
            foundAllBold = True
        End If
    Loop Until foundAllBold = True
    output = output.Replace(vbCrLf, " ")
    OutputRTB.Text = output
    WordListRTB.Clear()
    For Each b As String In boldWords
        WordListRTB.AppendText(b & vbCrLf)
    Next
    Clipboard.SetText(OutputRTB.Text)
    MsgBox("Copied output to clipboard")
End If
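The Do loop above handles one pair of bold tags per pass. As a possible alternative, the same tag-to-Markdown step can be sketched as a single Regex.Replace call with a MatchEvaluator. The names BoldToMarkdownSketch and ConvertBold are hypothetical, and the sketch assumes the HTML contains only simple, non-nested <b>...</b> pairs. Unlike the loop, it skips empty bold runs instead of adding an empty entry to the word list.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module BoldToMarkdownSketch
    ' Hypothetical helper: replaces <b>...</b> with **...** and collects
    ' the bold words (first letter capitalised) in boldWords.
    Function ConvertBold(html As String, boldWords As List(Of String)) As String
        Dim evaluator As MatchEvaluator =
            Function(m As Match) As String
                Dim word = m.Groups(1).Value.Trim()
                If word.Length > 0 Then
                    boldWords.Add(Char.ToUpper(word(0)) & word.Substring(1))
                End If
                ' Keep the original inner text, including any spaces
                Return "**" & m.Groups(1).Value & "**"
            End Function
        ' Singleline lets a bold run span a line break
        Return Regex.Replace(html, "<b>(.*?)</b>", evaluator, RegexOptions.Singleline)
    End Function

    Sub Main()
        Dim found As New List(Of String)
        Console.WriteLine(ConvertBold("When an <b>author</b> of a <b>story</b>", found))
        Console.WriteLine(String.Join(", ", found))
    End Sub
End Module
```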
I know that the Sentences collection is just a bunch of ranges, but I have not been able to determine exactly what criteria are used to decide where those ranges begin and end. I have been able to determine that a period (.), a question mark (?) or an exclamation point (!) followed by one or more spaces is the end of a sentence, and that the spaces are included in the sentence range. I have also determined that if there are no spaces between what you and I would consider two sentences, MS Word considers them to be only one sentence.
The problem is that when you start putting in things like tabs, page breaks, newline characters etc., things become unclear. Can anyone explain precisely, or point me to some reference material on, what criteria MS Word uses to decide where one sentence ends and another begins?
It seems to be based on a sentence-ending delimiter (e.g. ".", "!", "?"). If you explain what you are trying to do or post some code, more people will be willing to help.
If you are concerned about combined sentences (e.g. This is a single Sentence.Even though it is delimited) you could expand upon this basic methodology. Special characters seem to be much harder to handle, so posting what you are trying to do is recommended.
Sub sent_counter()
    Dim s As Integer
    For s = 1 To ActiveDocument.Sentences.Count
        ActiveDocument.Sentences(s) = splitSentences(ActiveDocument.Sentences(s))
    Next s
End Sub
Function splitSentences(s As String) As String
    Dim delims As New Collection
    Dim delim As Variant
    delims.Add "."
    delims.Add "!"
    delims.Add "?"
    Dim ender As String
    Dim sub_s As String
    s = Trim(s)
    ender = Right(s, 1)
    sub_s = Left(s, Len(s) - 1)
    For Each delim In delims
        If InStr(1, sub_s, delim) Then
            sub_s = Replace(sub_s, delim, delim & " ")
        End If
    Next delim
    splitSentences = sub_s & ender
End Function
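The rule the question arrives at by experiment (a ".", "?" or "!" followed by one or more spaces ends a sentence, and the spaces belong to that sentence) can be modelled outside Word in a few lines of VB.NET. This is only a sketch of that observed rule, not of Word's actual algorithm: it says nothing about tabs, page breaks or newline characters, and the names SentenceRuleSketch and SplitSentences are invented for the example.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module SentenceRuleSketch
    ' Hypothetical helper modelling only the rule observed in the question.
    Function SplitSentences(text As String) As String()
        ' (?<=[.!?] +) : just after a terminator plus at least one space
        ' (?! )        : and after the last space of that run, so the
        '                spaces stay with the sentence they follow
        Dim pieces = Regex.Split(text, "(?<=[.!?] +)(?! )")
        Return pieces.Where(Function(p) p.Length > 0).ToArray()
    End Function

    Sub Main()
        For Each sentence In SplitSentences("Hi there.  How are you? Fine.No space here.")
            Console.WriteLine("[" & sentence & "]")
        Next
    End Sub
End Module
```

With the sample text, "Fine.No space here." stays as one piece because no space follows the period, which matches the behaviour described in the question.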
Hey guys I'm stuck with this question. Please help.
I want to write a program that can extract alphabetical characters and special characters from an input string. An alphabetical character is any character from "a" to "z" (capital letters and numbers not included); a special character is any other character that is not alphanumerical.
Example:
string = hello//this-is-my-string#capetown
alphabetical characters = hellothisismystringcapetown
special characters = //---#
Now my question is this:
How do I loop through all the characters?
(the for loop I'm using reads like this for x = 0 to strname.length)...is this correct?
How do I extract characters to a string?
How do I determine special characters?
any input is greatly appreciated.
Thank you very much for your time.
You could loop through each character as follows:
For Each _char As Char In strname
'Code here
Next
or
For x as integer = 0 to strname.length - 1
'Code here
Next
or you can use Regex to replace the values you do not need in your string (I think this may be faster, but I am no expert). Take a look at: http://msdn.microsoft.com/en-us/library/xwewhkd1.aspx
Edit
The replacement code will look something like the following, although I am not sure what the regular expression should be (the variable called pattern currently only removes digits):
Dim pattern As String = "(\d+)?" 'You need to update the regular expression here
Dim input As String = "123//hello//this-is-my-string#capetown"
Dim rgx As New Regex(pattern)
Dim result As String = rgx.Replace(input, "")
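One way the pattern could be filled in, going by the definitions in the question, is to run two replacements: one that deletes everything except "a" to "z", and one that deletes every letter and digit. This is a sketch under that reading of the question. The names RegexExtractSketch, ExtractLetters and ExtractSpecials are hypothetical, and only ASCII letters and digits are treated as alphanumerical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexExtractSketch
    ' Hypothetical helper: keep only the lower-case letters a-z.
    Function ExtractLetters(input As String) As String
        Return Regex.Replace(input, "[^a-z]", "")
    End Function

    ' Hypothetical helper: keep only characters that are not ASCII letters or digits.
    Function ExtractSpecials(input As String) As String
        Return Regex.Replace(input, "[a-zA-Z0-9]", "")
    End Function

    Sub Main()
        Dim input As String = "hello//this-is-my-string#capetown"
        Console.WriteLine(ExtractLetters(input))
        Console.WriteLine(ExtractSpecials(input))
    End Sub
End Module
```

For the example string from the question this should print hellothisismystringcapetown and //---#.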
Since you need to keep the values, you'll want to loop through your string. Keeping a list of characters as a result will come in handy, since you can build a fresh string later. Then take advantage of a simple Regex test to determine where to place things. The pseudo-code looks something like this.
Dim alphaChars As New List(Of String)
Dim specialChars As New List(Of String)
For Each _char As Char In testString
    ' Regex.IsMatch expects a String, so convert the Char first
    Dim current As String = _char.ToString()
    If Regex.IsMatch(current, "[a-z]") Then
        alphaChars.Add(current)
    Else
        specialChars.Add(current)
    End If
Next
Then, if you need to dump your results into a full string, you can simply use
String.Join(String.Empty, alphaChars.ToArray())
Note that this code assumes that anything other than a-z is a special character, so if need be you can run a second regular expression in your Else clause to test for your special characters in a similar manner. It really depends on how much control you have over the input.
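The second test in the Else branch can also be done without Regex. The sketch below sorts each character into letters, specials, or neither (capital letters and digits), which is one reading of the definitions in the question. CharClassifierSketch and Classify are hypothetical names. Char.IsLetterOrDigit is Unicode-aware, so accented letters also count as "neither" here, which may or may not be what you want.

```vbnet
Imports System
Imports System.Text

Module CharClassifierSketch
    ' Hypothetical helper: Item1 holds the a-z letters, Item2 the special characters.
    Function Classify(input As String) As Tuple(Of String, String)
        Dim letters As New StringBuilder()
        Dim specials As New StringBuilder()
        For Each c As Char In input
            If c >= "a"c AndAlso c <= "z"c Then
                letters.Append(c)
            ElseIf Not Char.IsLetterOrDigit(c) Then
                specials.Append(c)
            End If
            ' Capital letters and digits match neither branch and are dropped.
        Next
        Return Tuple.Create(letters.ToString(), specials.ToString())
    End Function

    Sub Main()
        Dim result = Classify("hello//this-is-my-string#capetown")
        Console.WriteLine("letters  = " & result.Item1)
        Console.WriteLine("specials = " & result.Item2)
    End Sub
End Module
```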
I am looping through a list for a spellchecker in vb.net (using vs 2010). I want to go through a wrongly spelled word list. Each time the code picks the index that's one higher than the index of the last checked word.
In my version of notquiteVB/Pythonese I think it would translate something like:
(start loop)
dim i as Integer = 0
dim word as String
word = words_to_check_at_spellcheck.Item(0 + i)
i = i+1
(end loop)
But this doesn't work at all...when it gets to the last item in the list and reaches 'word = ' it throws the error of 'out of range -- must be less than the size of the collection'.
How do you get the last item in a list? Maybe lists aren't what VB uses for this kind of thing?
If your collection of misspelled words is named mispelled:
For Each word As String In mispelled
'Do something
Next
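If an index is still needed, the "out of range" error usually comes from counting one step too far: valid indexes run from 0 to Count - 1, so the last item sits at Count - 1, not at Count. A minimal sketch, assuming the collection is a List(Of String); the module name, the helper LastWord and the sample words are made up for illustration:

```vbnet
Imports System
Imports System.Collections.Generic

Module ListLoopSketch
    ' Hypothetical helper: returns the last item, or Nothing for an empty list.
    Function LastWord(words As List(Of String)) As String
        If words.Count = 0 Then Return Nothing
        Return words(words.Count - 1)
    End Function

    Sub Main()
        Dim mispelled As New List(Of String) From {"teh", "recieve", "adress"}
        ' Stop at Count - 1; index Count itself is past the end of the list.
        For i As Integer = 0 To mispelled.Count - 1
            Console.WriteLine(i & ": " & mispelled(i))
        Next
        Console.WriteLine("Last item: " & LastWord(mispelled))
    End Sub
End Module
```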
I was assigned the following project for my VB.Net programming course:
"Write a program using various procedures to perform the operations listed below. Call these procedures using delegates. Make sure to document your program and have the program print descriptive text along with the numbers in b. and c.
a) Print a text string in reverse word order.
b) Print the number of characters in the string.
c) Print number of words in the string."
Now this raises a couple of questions that I have (some of which are opinion-based) relating to how I should complete the assignment.
First off, what do you guys think my teacher means by "reverse word order"? Do they mean print a text string with the word compositions going backwards (i.e. "siht si a ecnetnes"), do they mean print a text string with the whole words going backwards (i.e. "sentence a is this"), or do they mean both at once (i.e. "ecnetnes a si siht")? This is one of the opinion-based questions, but I just wanted your guys' thoughts.
Secondly, what is the syntax to produce the number of characters in a string? I already know the code necessary to get the number of words, but part b of this assignment is confusing me slightly. Any help would be greatly appreciated.
For your second question, the syntax to produce the number of characters in a string is:
Dim mystring As String = "This is a string"
Console.WriteLine(mystring.Length)
' outputs 16
As I mentioned in my comments, my guess for your first question is that the teacher wants the words reversed, not the characters, so "this is a sentence" would appear in reverse as "sentence a is this"
Had a quick go at this because it sounded interesting.
' Reverse the string and then print it to the output window
Dim ReverseArray As Char() = "Print a text string in reverse word order.".ToCharArray()
' Reverse the array in place (this reverses the characters, not the word order)
Array.Reverse(ReverseArray)
' Convert the array back into a string (should be Reversed...)
' String.Format is used so a String argument is not taken as the "category" parameter of Debug.WriteLine
Debug.WriteLine(String.Format("Reversed string = '{0}'", New String(ReverseArray)))
' Get the number of Characters, remember a "Space" is still a Character
Debug.WriteLine(String.Format("Number of characters in array = {0}", ReverseArray.Length))
' Count the number of spaces in the string, then add an extra one
Debug.WriteLine(String.Format("Number of Words = {0}", (From c In ReverseArray Where c = " "c Select c).Count() + 1))
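Since the assignment asks for the procedures to be called through delegates, here is a rough sketch of how that part might look. It takes the "sentence a is this" reading of "reverse word order" suggested above, which is a guess at what the teacher wants. The procedure names are hypothetical, and Func(Of ...) is used as the delegate type, although a custom Delegate declaration would work too.

```vbnet
Imports System

Module DelegateSketch
    ' a) Whole words in reverse order; the characters inside each word are untouched.
    Function ReverseWordOrder(text As String) As String
        Dim words = text.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
        Array.Reverse(words)
        Return String.Join(" ", words)
    End Function

    ' b) Number of characters, spaces included.
    Function CountCharacters(text As String) As Integer
        Return text.Length
    End Function

    ' c) Number of space-separated words.
    Function CountWords(text As String) As Integer
        Return text.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries).Length
    End Function

    Sub Main()
        ' Each procedure is invoked through a delegate instance.
        Dim reverser As Func(Of String, String) = AddressOf ReverseWordOrder
        Dim charCounter As Func(Of String, Integer) = AddressOf CountCharacters
        Dim wordCounter As Func(Of String, Integer) = AddressOf CountWords

        Dim sample = "this is a sentence"
        Console.WriteLine("Reversed word order: " & reverser(sample))
        Console.WriteLine("Number of characters: " & charCounter(sample))
        Console.WriteLine("Number of words: " & wordCounter(sample))
    End Sub
End Module
```

Splitting with RemoveEmptyEntries means repeated spaces do not inflate the word count, unlike counting spaces and adding one.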