Link numbers with an equation/algorithm - vb.net

I am making an anagram solver in Visual Basic that gives you every possible combination when you enter a string. I need to work out how many combinations there are depending on the amount of characters in the string and how many different characters there are.
E.G.
Sample string:
abc
Total characters: 3, Different Characters: 3
Possible combinations: 6
abc, acb, bac, bca, cab, cba
I need an equation (using the number of characters and different characters) to link this to a string that contains a different amount of characters.
I've been using trial and error to try and figure is out, but I can't quite get my head around it. So far I have:
((letters - 1) ^ (different letters - 1)) + (letters - 1)
which works for a few different letter counts but now for all.
Help please???

I'll lead you to the answer, but I'll try to explain along the way. Let's say you had 10 different letters. You'd have 10 choices for the first, 9 for the second, 8 for the third, etc. Ultimately, there would be 10*9*8*7*6...*2*1 = 10! possibilities. However, sometimes you'll have multiple instances of the same letter. For example, using that for the string "aaabcd" would overcount possibilities, because it counts each of the a's as distinct letters, even though they're not. To correct for that, you would have to divide by the factorial of the number of repeated letters. A good way to calculate the total number of possibilities would be (total number of letters factorial)/ (product of the factorials of the number of repeated instances of each letter).
For example:
There are 6!/(3!) ways to arrange the letters in "aaabcd"
There are 6! ways to arrange the letters is "abcdef"
There are 6!/(3!*2!) ways to arrange the letters in "aaabbc"
There are 10!/(5!*3!*2!) ways to arrange the letters in "aaaaabbbcc"
I hope this helps.

For the possible counting number, it's exactly the same as computing Multinomial Coefficient
A simple explanation is that, for no repeating characters,
It's simply permutation = n!
(It is easy to understand if you draw a tree diagram, with first character has n choices, second character has n-1choices...etc.)
However as you may have repeating characters, you will double count many of them.
Let's see an simple example: for aaa, how many possible arrangements IF WE COUNT EVEN THE OUTCOME IS THE SAME?
Answer is 3!(aaa,aaa,aaa,aaa,aaa,aaa)
This gives us an idea that, when we have a character appearing for m times, we will count m! instead of 1
So the counting is just n!(all possible arrangements, including same outcome) / m! (a character appear for m times)
Same for more characters repeating: n!/a!b!c!.. (first character appear a times, another appear for b times...)
If you understand the concept behind, then you will find that, actually for those "non-repeating" characters, it's just dividing an 1!. For eg, character (multi)set = {a,a,a,b,b,c}, #a = 3, #b = 2, #c = 1, so the answer (without repeating count) is (3+2+1)!/3!2!1! and fraction of this format is named multinomial coefficient as stated above.
In programming point of view, you can just pre-compute all factorials (with a pretty small n though as n~30 is already too large for a variable to store) with simple for loop
declare frac = array(n);
frac[0] = 1;
FOR i=1; i<=n;i++
frac[i] = i*frac[i-1]
For a larger n, you may just calculate double/float division on the fly in the loop to avoid overflow..you may face precision problem though.
If you further need to output the different strings, you may use DFS to backtrack all the possible outcomes. Or if you could use another language like C++, you can use built-in function like next_permutation() after sort the character set.

Related

Could you explain how to convert from lz77 to huffman?

Could you explain how to convert from lz77 to huffman on the example in the below picture?
Easy:
In the first step your output is essentially 3 numbers:
prev index
number of characters to repeat
next character (be it ascii or unicode)
The algorithm demands that you specify a sliding window up front. That means you know how big (1) and (2) can be at most.
In other words, you know how many bits (1) and (2) will take up.
Since (3) is essentially also a character from a fixed length alphabet, you also know the bit-length of (3)
That means it's safe to simply concatenate them.
So, the output of the first algorithm can be thought of as outputting a bit-sequence, where every item in the sequence has a fixed length.
That's ideal for applying huffman.
Of course the specifics are not mentioned, and you can choose from a lot of options.
normalized huffman table
1 on left-branch vs 0 on left-branch
priorities when merging items of similar count
etc
So I can not readily explain the exact output values you are showing.
But I hope I can at least explain how to get from A to B.
You can't. The coding shown is, well, figurative. Not literal. The symbols A, B, and C are all coded to the single bit 0. Obviously that's not going to be very helpful on the decoding end.

Random string generation using arc4random

I'm trying to create a method that creates a random string consisting of 32 characters. This method will generate a random number using arc4random_uniform(62) to choose a number between 0 and 61 and then chose a character from a string that holds numbers from 0 to 9 and alphabet letters both small and capital letters, respectively. For an instance, if arc4random_uniform(62) returns 10, the chosen character will be a, if it returns 61, the chosen character will be Z). The method will do this for 32 times to create the final generated string.
I was wondering when this approach will fail to generate a unique String and result in a repeated one. I searched about this topic and didn't find a satisfying answer. I hope that you will help with me this since I am trying to use this method to generate unique IDs for use in my app.
This method will generate a random number using arc4random_uniform(62) to choose a number between 0 and 61 and then chose a character from a string that holds numbers from 0 to 9 and alphabet letters both small and capital letters, respectively.
You could create an array with a string for all the characters you want to include, and randomly pick values. Or, alternatively you could take advantage of the ASCII encoding has mostly sequential character positions and you can fairly easily convert an ascii number to an NSString.
An integer between 48 and 57 is the numbers 0-9 in ASCII, 65 to 90 is A-Z and 97 to 122 is a-z: https://en.wikipedia.org/wiki/Ascii_table#ASCII_printable_code_chart
I was wondering when this approach will fail to generate a unique String and result in a repeated one. I searched about this topic and didn't find a satisfying answer.
It's often referred to as the "birthday problem". As long as your value is reasonably long (say, 20 characters), it is effectively impossible to have a collision. The world is more likely to be destroyed in the next 2 seconds than your app ever creating a collision.
I hope that you will help with me this since I am trying to use this method to generate unique IDs for use in my app.
Apple provides an API for generating unique IDs. You should use that instead of inventing your own system:
NSString *id = [NSUUID UUID].UUIDString;
That will give you a value like D19B40AA-322C-4ADF-BEF6-2EC4D4CE7BA8. It conforms to "Version 4" of the UUID standard — according to Wikipedia if you generate 1 billion UUIDs every second for the next 100 years, there is a 50% chance of getting two IDs that are the same.
If the UUID is longer than you want, you could grab a smaller part part of the string. Beware that the 4 at the start of the third block means this is a "version 4" UUID and is not a random value. Also the first character at the start of the 4th block is only has four possible values — so avoid or strip off those two characters if you want to grab a smaller part of the string for use as your random ID. See the wikipedia page on UUIDs for more detail.

Make unique readable string out of a long integer

I have long integers numbers like this: 5291658276538691055
How could I programmatically convert this number to a 4-6 capital letters only that is a unique combination that can also be reversed to get back to the number?
For example using OBJ-C.
There are 26 capital letters;
6 of them could represent 26 ^ 6 numbers (308915776);
So, no. You are trying to map a much larger range of numbers into a much smaller range, it cannot be reversible.
Also, log 5291658276538691055 / log 26 is less than 14, so if 14 letters is good for you, just transform the number into 26-based and map the digits to letters.
And one more thing - if the range of numbers is small enough, you could do some manipulation on the numbers (e.g., just subtract the min) and encode it, which will cost you less digits.
You will need to convert the numbers to Base 26 (Hexavigesimal - snappy name!)
The Wikipedia article on Hexavigesimal gives example code in Java - you should be able to adapt this pretty easily.
NB: You cannot get the long number you mentioned down to 4-6 capital letters only using a conversion algorithm (your example in Base 26 is BCKSATKEBRYBXJ). If you need conversion that short, you only have two options:
Lookup tables (store mappings, e.g. 5291658276538691055 = ABCDEF). Obviously only useful if you have a discrete set of numbers.
Including additional characters (e.g. lower case + numbers).

Problem 98 - Project Euler

The problem is as follows:
By replacing each of the letters in the word CARE with 1, 2, 9, and 6 respectively, we form a square number: 1296 = 36^(2). What is remarkable is that, by using the same digital substitutions, the anagram, RACE, also forms a square number: 9216 = 96^(2). We shall call CARE (and RACE) a square anagram word pair and specify further that leading zeroes are not permitted, neither may a different letter have the same digital value as another letter.
Using words.txt (right click and 'Save Link/Target As...'), a 16K text file containing nearly two-thousand common English words, find all the square anagram word pairs (a palindromic word is NOT considered to be an anagram of itself).
What is the largest square number formed by any member of such a pair?
NOTE: All anagrams formed must be contained in the given text file.
I don't understand the mapping of CARE to 1296? How does that work? or are all permutation mappings meant to be tried i.e. all letters to 1-9?
All assignments of digits to letters are allowed. So C=1, A=2, R=3, E=4 would be a possible assignment ... except that 1234 is not a square, so that would be no good.
Maybe another example would help make it clear? If we assign A=6, E=5, T=2, then TEA = 256 = 16² and EAT = 625 = 25². So (TEA=256, EAT=625) is a square anagram word pair.
(Just because all assignments of digits to letters are allowed, does not mean that actually trying out all such assignments is the best way to solve the problem. There may be some other, cleverer, way to do it.)
In short: yes, all permutations need to be tried.
If you test all substitutions letter for digit, than you are looking for pairs of squares with properties:
have same length
have same digits with number of occurrences as in input string.
It is faster to find all these pairs of squares. There are 68 squares with length 4, 216 squares with length 5, ... Filtering all squares of same length by upper properties will generate 'small' number of pairs, which are solutions you are looking for.
These data is 'static', and doesn't depend on input strings. It can be calculated once and used for all input strings.
Hmm. How to put this. The people who put together Project Euler promise that there is a solution that is under one minute for every problem, and there is only one problem that I think might fail this promise, but this is not it.
Yes, you could permute the digits, and try all permutations against all squares, but that would be a very large search space, not at all likely to be the (TM) right thing. In general, when you see that your "look" at the problem is going to generate a search that will take too long, you need to search something else.
Like, suppose you were asked to determine what numbers would be the result of multiplying two primes between 1 and a zillion. You could factor every number between 1 and a zillion, but it might be faster to take all combinations of two primes and multiply them. Since you are looking at combinations, you can start with two and go until your results are too large, then do the same with three, etc. By comparison, this should be much faster - and you don't have to multiply all the numbers out, you could take logs of all the primes and then just add them and find the limit for every prime, giving you a list of numbers you could add up.
There are a bunch of innovative solutions, but the first one you think of - especially the one you think of when Project Euler describes the problem, is likely to be wrong.
So, how can you approach this problem? There are probably too many permutations to look at, but maybe you can figure out something with mappings and comparing mappings?
(Trying to avoid giving it all away.)

comparing/intersecting compare criteria

If there's any open source code that does this already I'm interested in hearing about it. But I haven't seen it yet so I'm trying to roll my own.
Example:
variable x = compareCriteriaBetween 3 and 6
variable y = compareCriteriaLesserThanOrEqual 5
The difficult part for me is finding an elegant way to compare the compareCriteria and create an intersection. In the example the intersection between the two is 'between 3 and 5'.
How can I implement this in a 'tell don't ask' manner? Note that compareCriteria can be completely unrelated (eg startsWithLetter versus betweenNumber).
If you only have constants in your expressions you should be safe from undecidability (I think!). Problems arise as soon as you can express e.g. general statements about integers with +-*/ (see Peano arithmetic).
Even if you stay within the realm of decidability, there exists no algorithm that can take arbitrary statements P(x) and Q(x) and compute a statement R(x) equivalent to P(x) & Q(x) for all x, where x can range over any domain (integers, strings, matrices, real numbers, complex numbers, logical statements [whoops, back into undecidable territory!], ...). You need domain specific tricks to get there, and strictly delimited languages in which P, Q and R are formulated. There exist software products for certain domains -- one of them is called Mathematica...
Try to get back to basics: what problem are you trying to solve?
If you are just interested in simple criteria like less equal or between on integers/floats, you can rewrite between 3 and 6 as (greater equal 3 and less equal 6). If you combine this with a logical and with less equal 5 you can use Boolean algebra to obtain (greater equal 3 and (less equal 6 and less equal 5)) before simplifying the inner parenthesis to just less equal 5 and rewriting the result as between 3 and 5.