Over the summer I decided to write a program that would solve an anagram by trying every possible arrangement of the letters in a word you enter. I managed to do it, but only in a way that could work out a 3-4 letter word quickly; anything longer would take ages! Anyway, after I asked for help on another site, someone fixed my problem by writing some code for me.
At the time I didn't understand it, even though it had been annotated. I had another look today to see if I could get anything from it but alas, nothing...
I've been researching permutations for about 2 hours now and I'm getting nowhere. The code is on the internet, but no one has explained it well enough (or simply enough) for me to grasp the concept. I will post the code that I have yet to understand below, and if anyone can explain in detail how it works, that would be great!
Public Sub permutations(ByVal WordLength As Integer, ByVal SplitLetters As List(Of Char), ByVal word As String)
    If WordLength = 1 Then
        results.Add(word & SplitLetters(0))
        count += 1
        If count Mod updateCount = 0 Then
            ldBar.Value = ((count / pCombinations) * 100)
        End If
    Else
        For i = 0 To WordLength - 1
            Dim newWord = word & SplitLetters(i)
            Dim newSplitLetters As List(Of Char) = New List(Of Char)(SplitLetters)
            newSplitLetters.RemoveAt(i)
            Call permutations(WordLength - 1, newSplitLetters, newWord)
        Next i
    End If
End Sub
The method gets all permutations by using recursion.
Given a word, for example FUBAR, it uses each character in turn as the first character and gets all possible permutations by combining that character with all permutations of the remaining characters:
F + permutations(UBAR)
U + permutations(FBAR)
B + permutations(FUAR)
A + permutations(FUBR)
R + permutations(FUBA)
The first recursive call, permutations(UBAR), in turn gets all permutations that start with each of its characters:
U + permutations(BAR)
B + permutations(UAR)
A + permutations(UBR)
R + permutations(UBA)
And so on. When it gets down to a single-character string, the only possible permutation is that single character.
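To make the recursion easier to follow, here is a stripped-down sketch of the same idea without the progress bar. The names `Permute`, `GetPermutations` and `PermutationSketch` are made up for this illustration, and it stops when no letters remain rather than when one letter remains, so it is not the original poster's exact routine.

```vbnet
Imports System
Imports System.Collections.Generic

Module PermutationSketch
    ' Appends every ordering of "remaining" to "prefix" and stores each result.
    Private Sub Permute(prefix As String, remaining As List(Of Char), results As List(Of String))
        If remaining.Count = 0 Then
            ' No letters left to place: "prefix" is one complete permutation.
            results.Add(prefix)
            Return
        End If
        For i As Integer = 0 To remaining.Count - 1
            ' Use letter i as the next character and recurse on the other letters.
            Dim rest As New List(Of Char)(remaining)
            rest.RemoveAt(i)
            Permute(prefix & remaining(i), rest, results)
        Next
    End Sub

    Public Function GetPermutations(word As String) As List(Of String)
        Dim results As New List(Of String)
        Permute("", New List(Of Char)(word.ToCharArray()), results)
        Return results
    End Function

    Sub Main()
        ' Prints BAR, BRA, ABR, ARB, RBA, RAB (one per line).
        For Each p As String In GetPermutations("BAR")
            Console.WriteLine(p)
        Next
    End Sub
End Module
```

A word of n distinct letters has n! orderings (6 for BAR, 120 for FUBAR), which is why any approach that lists them all slows down quickly as the word gets longer.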
In VB.net (Visual Studio 2015) how can I get the nth string (or number) in a comma-separated list?
Say I have a comma-separated list of numbers like so:
13,1,6,7,2,12,9,3,5,11,4,8,10
How can I get, say, the value at index 5 (counting from zero) in this string, in this case 12?
I've looked at the Split function, but it converts a string into an array. I guess I could do that and then get the element at index 5 of that array, but that seems like a lot to go through just to get one element. Is there a more direct way to do this, or am I pretty much limited to the Split function?
In case you are looking for an alternative method, which is more basic, you can try this:
Module Module1
    Sub Main()
        Dim a As String = "13,1,6,7,2,12,9,3,5,11,4,8,10"
        Dim counter As Integer = 5 'how many commas to skip; the value after the 5th comma is wanted
        Dim movingcounter As Integer = 0 'how many commas we have passed so far
        Dim startofnumber, endofnumber, i As Integer
        Dim numberthatIwant As String
        Do Until movingcounter = counter
            startofnumber = InStr(i + 1, a, ",")
            i = startofnumber
            movingcounter = movingcounter + 1
        Loop
        endofnumber = InStr(startofnumber + 1, a, ",")
        'no comma follows the last value, so read to the end of the string
        If endofnumber = 0 Then endofnumber = Len(a) + 1
        numberthatIwant = Mid(a, startofnumber + 1, endofnumber - startofnumber - 1)
        Console.WriteLine("The number that I want: " & numberthatIwant)
        Console.ReadLine()
    End Sub
End Module
Edit: You can turn this into a procedure or function if you wish to use it in a larger program. Run as a console application, this code reports 12 as the number wanted.
The solution provided by Plutonix as a comment to my question is straightforward and exactly what I was looking for, to wit:
result = csv.Split(","c)(5)
In my case I was incrementing a variable each time my program ran and needed to get the nth character or string after the incremented value. That is, if my program had incremented the variable 5 times, then I needed the string after the 4th comma, which, of course, is the 5th string. So my solution was something like this:
result = WholeString.Split(","c)(IncrementedVariable)
Note that the index is zero-based.
Thanks, Plutonix.
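Since the zero-based index is easy to get wrong, here is a small sketch that wraps the Split approach in a function with a range check. `GetItemAt` and `NthItemSketch` are hypothetical names for illustration; returning Nothing for an out-of-range index is just one possible choice.

```vbnet
Imports System

Module NthItemSketch
    ' Returns the item at the given zero-based index, or Nothing when out of range.
    Public Function GetItemAt(csv As String, index As Integer) As String
        Dim parts As String() = csv.Split(","c)
        If index < 0 OrElse index >= parts.Length Then Return Nothing
        Return parts(index).Trim()
    End Function

    Sub Main()
        Dim list As String = "13,1,6,7,2,12,9,3,5,11,4,8,10"
        Console.WriteLine(GetItemAt(list, 5))  ' index 5 is the sixth item: 12
        Console.WriteLine(GetItemAt(list, 0))  ' index 0 is the first item: 13
    End Sub
End Module
```

If you count items from one instead, pass `n - 1` as the index.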
I am trying to write a program that will count the number of sentences in a string. I would like to use the framework as much as possible, but I really don't understand this MSDN example. Could someone explain it, or does anyone know how to count the number of sentences in a string accurately? I am open to any and all suggestions.
http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word.sentences.count.ASPX?cs-save-lang=1&cs-lang=vb#code-snippet-1
EDIT:
OK, for anyone who has had a hard time counting sentences (like me), here is what I have been able to come up with, and I think it is about as good as you can get. If anyone else has an idea on how to solve sentence counting, let me know. But this is what I have, and it works.
My idea is that a sentence ends with ".", "!" or "?" followed by at least one whitespace character. This is what seems to work. (Sorry it's not a fully working module; I copied it out of a class I'm using.)
'Period check
For i As Integer = 0 To _runFor
    If str(i) = _Dot AndAlso Char.IsWhiteSpace(str(i + 1)) Then
        _sentence_count = _sentence_count + 1
    End If
Next
'Question check
For i As Integer = 0 To _runFor
    If str(i) = _Question AndAlso Char.IsWhiteSpace(str(i + 1)) Then
        _sentence_count = _sentence_count + 1
    End If
Next
'Exclamation check
For i As Integer = 0 To _runFor
    If str(i) = _Exclamation AndAlso Char.IsWhiteSpace(str(i + 1)) Then
        _sentence_count = _sentence_count + 1
    End If
Next
It's not really possible to accurately count sentences programmatically. For instance, if you wrote a piece of code that just counted the number of occurrences of a period followed by a space followed by a capital letter, then it would incorrectly interpret "I gave my report to Dr. Johnson." as two sentences.
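As a rough illustration of both the heuristic and its limits, here is a self-contained sketch that counts ".", "!" or "?" when followed by whitespace or by the end of the string, in a single pass. `CountSentences` and `SentenceCountSketch` are names invented for this example, and treating a terminator at the very end of the text as a sentence end is my own assumption.

```vbnet
Imports System

Module SentenceCountSketch
    ' Counts sentence terminators that are followed by whitespace or end the text.
    Public Function CountSentences(text As String) As Integer
        If String.IsNullOrEmpty(text) Then Return 0
        Dim count As Integer = 0
        For i As Integer = 0 To text.Length - 1
            Dim c As Char = text(i)
            If c = "."c OrElse c = "?"c OrElse c = "!"c Then
                Dim atEnd As Boolean = (i = text.Length - 1)
                ' AndAlso/OrElse short-circuit, so text(i + 1) is never read past the end.
                If atEnd OrElse Char.IsWhiteSpace(text(i + 1)) Then count += 1
            End If
        Next
        Return count
    End Function

    Sub Main()
        Console.WriteLine(CountSentences("Is it done? Yes! Good."))            ' 3
        Console.WriteLine(CountSentences("I gave my report to Dr. Johnson."))  ' 2, although it is one sentence
    End Sub
End Module
```

The second call shows the abbreviation problem: the period in "Dr." is followed by a space, so a rule this simple counts it as a sentence end.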
I have a text file that reads:
Left Behind,Lahaye,F,7,11.25
A Tale of Two Cities,Dickens,F,100,8.24
Hang a Thousand Trees with Ribbons,Rinaldi,F,30,16.79
Saffy's Angel,McKay,F,20,8.22
Each Little Bird that Sings,Wiles,F,10,7.70
Abiding in Christ,Murray,N,3,12.20
Bible Prophecy,Lahaye and Hindson,N,5,14.95
Captivating,Eldredge,N,12,16
Growing Deep in the Christian Life,Swindoll,N,11,19.95
Prayers that Heal the Heart,Virkler,N,4,12.00
Grow in Grace,Ferguson,N,3,11.95
The Good and Beautiful God,Smith,N,7,11.75
Victory Over the Darkness,Anderson,N,12,16
The last element of each line is a price. I would like to add up all the prices. I've been searching for so many hours now and cannot find a thing to answer my question. This seems soooo easy but I cannot figure it out!!! Please help out. BTW, this list is bound to change (adding of lines, deletion of lines, altering of lines) so if you can, please nothing concrete but instead leave the code open to changes. Thanks!!!
Just so you can see my pooooorrrr work, here is what I have (I have deleted my code and rewritten it a different way for several hours now):
Dim Inv() As String = IO.File.ReadAllLines("Books.txt")
Dim t As Integer = Inv.Count - 1
Dim a As Integer = 0 to t
Dim sumtotal As String = sumtotal + Inv(4)
Also, each line has either an "F" or an "N". How do I add up all the F's and all the N's? Do I do it via If statements?
First, you'll be better off using Double as your type instead of String. Second, observe how I use the Split function on each line, cast its last element as a double, and add it to the total. Yes, using an If Statement is how you can determine whether or not to add to the count of F or the count of N.
Dim lstAllLines As List(Of String) = IO.File.ReadAllLines("Books.txt").ToList()
Dim dblTotal As Double = 0.0
Dim intCountOfF As Integer = 0
Dim intCountOfN As Integer = 0
For Each strLine As String In lstAllLines
    Dim lstCells As List(Of String) = strLine.Split(","c).ToList()
    'The price is the last of the five fields, at index 4 (not 3)
    dblTotal += CDbl(lstCells(4))
    If lstCells(2) = "F" Then
        intCountOfF += 1
    Else
        intCountOfN += 1
    End If
Next
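If "add up all the F's and all the N's" means totalling the prices per category rather than counting lines (that is my assumption, the question does not say), a dictionary keyed by the category letter is one way to sketch it. `SumPricesByCategory` and `BookTotalsSketch` are made-up names, the sample rows are copied from the question, and Decimal with the invariant culture is my own choice so that "11.25" parses the same way on any machine.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization

Module BookTotalsSketch
    ' Sums the last field (the price) of each line, grouped by the third field (F or N).
    Public Function SumPricesByCategory(lines As IEnumerable(Of String)) As Dictionary(Of String, Decimal)
        Dim totals As New Dictionary(Of String, Decimal)
        For Each line As String In lines
            If String.IsNullOrWhiteSpace(line) Then Continue For
            Dim cells As String() = line.Split(","c)
            Dim category As String = cells(2)
            Dim price As Decimal = Decimal.Parse(cells(cells.Length - 1), CultureInfo.InvariantCulture)
            Dim current As Decimal = 0D
            totals.TryGetValue(category, current)  ' leaves current at 0 for a new category
            totals(category) = current + price
        Next
        Return totals
    End Function

    Sub Main()
        ' In the real program these lines would come from IO.File.ReadAllLines("Books.txt").
        Dim lines As String() = {
            "Left Behind,Lahaye,F,7,11.25",
            "A Tale of Two Cities,Dickens,F,100,8.24",
            "Abiding in Christ,Murray,N,3,12.20",
            "Captivating,Eldredge,N,12,16"
        }
        For Each pair As KeyValuePair(Of String, Decimal) In SumPricesByCategory(lines)
            Console.WriteLine(pair.Key & ": " & pair.Value.ToString(CultureInfo.InvariantCulture))
        Next
    End Sub
End Module
```

Because nothing is hard-coded apart from the field positions, lines can be added, removed or changed without touching the code.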
I have the following table with 2 columns, ID and Title, containing over 500,000 records. For example:
ID Title
-- ------------------------
1 Aliens
2 Aliens (1986)
3 Aliens vs Predator
4 Aliens 2
5 The making of "Aliens"
I need to find records that are very similar, and by that I mean they differ by 3-6 letters; usually this difference is at the end of the titles. So I have to design a query that returns records no. 1, 2 and 4. I already looked at Levenshtein distance but I don't know how to apply it. Also, because of the number of records, the query shouldn't take all night long.
Thanks for any idea or suggestion
If you really want to define similarity in the exact way that you have formulated in your question, then you would, as you say, have to implement the Levenshtein distance calculation, either in code that runs on each row retrieved by a DataReader or as a SQL Server function.
The problem stated is actually more tricky than it may appear at first sight, because you cannot assume to know what the mutually shared elements between two strings may be.
So in addition to the Levenshtein distance you probably also want to specify a minimum number of consecutive characters that actually have to match (in order for sufficient similarity to be concluded).
In sum: It sounds like an overly complicated and time consuming/slow approach.
Interestingly, in SQL Server 2008 you have the DIFFERENCE function which may be used for something like this.
It evaluates the phonetic value of two strings and calculates the difference. I'm unsure if you will get it to work properly for multi-word expressions such as movie titles since it doesn't deal well with spaces or numbers and puts too much emphasis on the beginning of the string, but it is still an interesting predicate to be aware of.
If what you are actually trying to describe is some sort of search feature, then you should look into the Full Text Search capabilities of SQL Server 2008. It provides built-in thesaurus support, fancy SQL predicates and a ranking mechanism for "best matches".
EDIT: If you are looking to eliminate duplicates maybe you could look into SSIS Fuzzy Lookup and Fuzzy Group Transformation. I have not tried this myself, but it looks like a promising lead.
EDIT2: If you don't want to dig into SSIS and still struggle with the performance of the Levenshtein distance algorithm, you could perhaps try this algorithm, which appears to be less complex.
For all the Googlers out there who run into this question, though it's already been marked as answered, I figured I'd share some code to help with this. If you're able to use CLR user-defined functions on your SQL Server, you can implement your own Levenshtein distance algorithm and from there create a function that gives you a 'similarity score', called dbo.GetSimilarityScore(). I've based my score on case-insensitivity, without much weight given to jumbled word order and non-alphanumeric characters. You can adjust your scoring algorithm as needed, but this is a good start. Credit to this CodeProject link for getting me started.
Option Explicit On
Option Strict On
Option Compare Binary
Option Infer On
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Data.SqlClient
Imports System.Data.SqlTypes
Imports System.Text
Imports System.Text.RegularExpressions
Imports Microsoft.SqlServer.Server
Partial Public Class UserDefinedFunctions
    Private Const Xms As RegexOptions = RegexOptions.IgnorePatternWhitespace Or RegexOptions.Multiline Or RegexOptions.Singleline
    Private Const Xmsi As RegexOptions = Xms Or RegexOptions.IgnoreCase
    ''' <summary>
    ''' Compute the distance between two strings.
    ''' </summary>
    ''' <param name="string1">The first of the two strings.</param>
    ''' <param name="string2">The second of the two strings.</param>
    ''' <returns>The Levenshtein cost.</returns>
    <Microsoft.SqlServer.Server.SqlFunction()> _
    Public Shared Function ComputeLevenstheinDistance(ByVal string1 As SqlString, ByVal string2 As SqlString) As SqlInt32
        If string1.IsNull OrElse string2.IsNull Then Return SqlInt32.Null
        Dim s1 As String = string1.Value
        Dim s2 As String = string2.Value
        Dim n As Integer = s1.Length
        Dim m As Integer = s2.Length
        Dim d As Integer(,) = New Integer(n, m) {}
        ' Step 1
        If n = 0 Then Return m
        If m = 0 Then Return n
        ' Step 2
        For i As Integer = 0 To n
            d(i, 0) = i
        Next
        For j As Integer = 0 To m
            d(0, j) = j
        Next
        ' Step 3
        For i As Integer = 1 To n
            'Step 4
            For j As Integer = 1 To m
                ' Step 5
                Dim cost As Integer = If((s2(j - 1) = s1(i - 1)), 0, 1)
                ' Step 6
                d(i, j) = Math.Min(Math.Min(d(i - 1, j) + 1, d(i, j - 1) + 1), d(i - 1, j - 1) + cost)
            Next
        Next
        ' Step 7
        Return d(n, m)
    End Function
    ''' <summary>
    ''' Returns a score between 0.0-1.0 indicating how closely two strings match. 1.0 is a 100%
    ''' T-SQL equality match, and the score goes down from there towards 0.0 for less similar strings.
    ''' </summary>
    <Microsoft.SqlServer.Server.SqlFunction()> _
    Public Shared Function GetSimilarityScore(string1 As SqlString, string2 As SqlString) As SqlDouble
        If string1.IsNull OrElse string2.IsNull Then Return SqlDouble.Null
        Dim s1 As String = string1.Value.ToUpper().TrimEnd(" "c)
        Dim s2 As String = string2.Value.ToUpper().TrimEnd(" "c)
        If s1 = s2 Then Return 1.0F ' At this point, T-SQL would consider them the same, so I will too
        Dim score1 As SqlDouble = InternalGetSimilarityScore(s1, s2)
        If score1.IsNull Then Return SqlDouble.Null
        Dim mod1 As String = GetSimilarityString(s1)
        Dim mod2 As String = GetSimilarityString(s2)
        Dim score2 As SqlDouble = InternalGetSimilarityScore(mod1, mod2)
        If score2.IsNull Then Return SqlDouble.Null
        If score1 = 1.0F AndAlso score2 = 1.0F Then Return 1.0F
        If score1 = 0.0F AndAlso score2 = 0.0F Then Return 0.0F
        ' Return weighted result
        Return (score1 * 0.2F) + (score2 * 0.8F)
    End Function
    Private Shared Function InternalGetSimilarityScore(s1 As String, s2 As String) As SqlDouble
        Dim dist As SqlInt32 = ComputeLevenstheinDistance(s1, s2)
        Dim maxLen As Integer = If(s1.Length > s2.Length, s1.Length, s2.Length)
        If maxLen = 0 Then Return 1.0F
        Return 1.0F - Convert.ToDouble(dist.Value) / Convert.ToDouble(maxLen)
    End Function
    ''' <summary>
    ''' Removes all non-alpha numeric characters and then sorts
    ''' the words in alphabetical order.
    ''' </summary>
    Private Shared Function GetSimilarityString(s1 As String) As String
        Dim normString = Regex.Replace(If(s1, ""), "\W|_", " ", Xms)
        normString = Regex.Replace(normString, "\s+", " ", Xms).Trim()
        Dim words As New List(Of String)(normString.Split(" "c))
        words.Sort()
        Return String.Join(" ", words.ToArray())
    End Function
End Class
select id, title
from my_table
where
title like 'Aliens%'
and
len(rtrim(title)) < len('Aliens') + 7
From what you've asked I imagine the differences you're looking for should not be more than a single word at the end of the original title. Is that why 1,2 and 4 are returned?
Anyway, I've made a query that checks that the difference at the end consists of a single word, without spaces.
declare @title varchar(20)
set @title = 'Aliens'
select id, title
from movies with (nolock)
where ltrim(title) like @title + '%'
and Charindex(' ', ltrim(right(title, len(title) - len(@title)))) = 0
and len(ltrim(right(title, len(title) - len(@title)))) < 7
Hope it helps.
If you are using SQL Server 2008 you should be able to use the FULLTEXT functionality.
The basic steps are:
1) Create a full-text index over the column. This will tokenise each string (stemmers, splitters, etc.) and let you search for 'LIKE THIS' strings.
The disclaimer is that I've never had to use it but I think it can do what you want.
Start reading here: http://msdn.microsoft.com/en-us/library/ms142571.aspx
You can try SSIS Fuzzy Grouping, which will give you a score based on string matches.
You can also use utl_match in Oracle.
I have built a blog platform in VB.NET where the audience is very young and, for some reason, likes to express its commitment by repeating sequences of characters in comments.
Examples:
Hi!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3<3
LOLOLOLOLOLOLOLOLOLOLOLOLLOLOLOLOLOLOLOLOLOLOLOLOL
..and so on.
I don't want to filter this out completely, however, I would like to shorten it down to a maximum of 5 repeating characters or sequences in a row.
I have no problem writing a function to handle a single repeating character. But what is the most effective way to filter out a repeating sequence as well?
This is what I used earlier for the single repeating characters
Private Shared Function RemoveSequence(ByVal str As String) As String
    Dim sb As New System.Text.StringBuilder
    sb.Capacity = str.Length
    Dim c As Char
    Dim prev As Char = Nothing 'null character, so the first character never counts as a repeat
    Dim prevCount As Integer = 0 'repeats of prev seen so far, not counting its first occurrence
    For i As Integer = 0 To str.Length - 1
        c = str(i)
        If c = prev Then
            'keep at most 10 repeats after the first occurrence (11 in a row)
            If prevCount < 10 Then
                sb.Append(c)
            End If
            prevCount += 1
        Else
            sb.Append(c)
            prevCount = 0
        End If
        prev = c
    Next
    Return sb.ToString
End Function
Any help would be greatly appreciated
You should be able to apply the 'longest repeated substring problem' recursively to solve this. On the first pass you will get two matching substrings, and will need to check whether they are contiguous. Then repeat the step for one of the substrings. Stop the algorithm if the strings are not contiguous, or if the string size becomes less than a certain number of characters. Finally, you should be able to keep the last match and discard the rest. You will need to dig around for an implementation :(
Also have a look at this previously asked question: finding long repeated substrings in a massive string
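A different route from the longest-repeated-substring idea above, offered only as a sketch: a regular expression with a backreference can find a short unit that is immediately repeated and trim the run down. `LimitRepeats` and `RepeatLimiter` are hypothetical names, and the two limits (at most 5 copies kept, units of at most 10 characters) are assumptions picked to match the question, not tested against real comment data.

```vbnet
Imports System
Imports System.Text
Imports System.Text.RegularExpressions

Module RepeatLimiter
    ' Hypothetical limits, chosen for illustration only.
    Private Const MaxRepeats As Integer = 5
    Private Const MaxUnitLength As Integer = 10

    ' Group 1 is the shortest unit (1 to MaxUnitLength characters) that is
    ' immediately followed by at least MaxRepeats further copies of itself,
    ' i.e. the unit occurs more than MaxRepeats times in a row.
    Private ReadOnly RepeatPattern As New Regex(
        "(.{1," & MaxUnitLength.ToString() & "}?)\1{" & MaxRepeats.ToString() & ",}")

    ' Replaces every over-long run with exactly MaxRepeats copies of its unit.
    Public Function LimitRepeats(input As String) As String
        If String.IsNullOrEmpty(input) Then Return input
        Return RepeatPattern.Replace(input,
            Function(m As Match) New StringBuilder().Insert(0, m.Groups(1).Value, MaxRepeats).ToString())
    End Function

    Sub Main()
        Console.WriteLine(LimitRepeats("Hi!!!!!!!!!!!!"))      ' Hi!!!!!
        Console.WriteLine(LimitRepeats("<3<3<3<3<3<3<3<3"))    ' <3<3<3<3<3
        Console.WriteLine(LimitRepeats("LOLOLOLOLOLOLOLO"))    ' LOLOLOLOLO
    End Sub
End Module
```

Capping the unit length keeps the amount of backtracking bounded, which seems sensible for user-submitted text; runs of 5 or fewer copies are left untouched.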