Can anyone tell me the regular expression for the given language? - automation

All strings which contain an even number of 0s or an even number of 1s. Here I am asking about 'or', not 'and'.
I have come up with this so far: (1*01*0)*|(0*10*1)* ...but it seems wrong to me, because when you draw the DFA for the above language it even accepts 111 and 000.

For zeros, allow any number of pairs of zeros, with any number of leading non-zeros and separating non-zeros, and then allow any trailing non-zeros. If the string does not match this pattern, it has an odd number of zeros.
(1*01*0)*1*
And to do it for zeros or ones, just copy with 1s instead and add it as an alternative to the whole thing.
(1*01*0)*1*|(0*10*1)*0*
Also, 111 and 000 both correctly satisfy the condition, because 111 has an even number of 0s and 000 has an even number of 1s. Examples that shouldn't work would be 1101 or 011100.
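The pattern above can be checked mechanically. This is only a sketch: it assumes Python's `re` module and compares the regex against a direct count of the characters. The helper names `matches` and `even_zeros_or_even_ones` are made up for this illustration.

```python
import re
from itertools import product

# The regex from the answer: even number of 0s, or even number of 1s.
PATTERN = re.compile(r"(1*01*0)*1*|(0*10*1)*0*")

def matches(s):
    """True if the whole string is matched by the regex."""
    return PATTERN.fullmatch(s) is not None

def even_zeros_or_even_ones(s):
    """The language definition, checked by counting characters."""
    return s.count("0") % 2 == 0 or s.count("1") % 2 == 0

# Compare the regex with the definition on every binary string up to length 8.
for n in range(9):
    for bits in product("01", repeat=n):
        s = "".join(bits)
        assert matches(s) == even_zeros_or_even_ones(s), s

for sample in ["111", "000", "1101", "011100"]:
    print(sample, matches(sample))
```

The first two samples match and the last two do not, which agrees with the examples given above.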

Related

Explaining why (7.8/39) = .2 does not return results in a where clause

So I know why (7.8/39)=.2 does not return results in a where clause in SQL while ROUND((7.8/39),1)=.2 will return results, but I don't know how to explain it clearly to someone I work with, and I would like to give them something to read other than me telling them that it will not work as part of their where clause.
Thank you,
Jalal
This is due to floating point arithmetic. This can be hard to explain, if you don't just "get" it.
Where do I start? Integers are pretty easy to represent in the bits that a CPU understands. So, 00000101 is interpreted as 5 -- and exactly 5 -- because it is 2^2 + 2^0.
However, this doesn't work for fractional parts of numbers. To solve this, computer scientists invented two forms of numbers. One is what I personally think of as BCD (binary coded decimal) but databases call decimal or numeric. Each digit is represented as a number. You only need 4 bits for a digit, so this looks like:
0011 0001 1111 0000 1001
3 1 . 0 9
This exactly represents 31.09. Just keep adding bits. Note: This is conceptual. Exact implementation may vary.
The second method is exponential notation. That is: xxx * 2^ yyy, where xxx and yyy are integers. For example, 0.25 is 1 * 2^(-2). The "1" and "-2" can be exactly represented.
This works well for approximations of numbers. The problem is that 0.25 can be exactly represented. But 0.24 and 0.26 cannot be. They end up with some complicated numbers being involved. The same is true of 0.2 -- which is the number you are trying to represent.
What happens is that you write 0.2 and it is represented as 0.00110011001 (say). But when you do the calculation it ends up being 0.00110011000. Oooh. That last bit changed, so it is really more like 0.19997 (well, a bit more '9's in practice). The values are not exactly equal.
The moral: don't use equality on floating point numbers. The numbers may look the same but differ in some piddling binary decimal place.
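The same effect can be shown outside SQL. This is a minimal sketch in Python, assuming the usual IEEE 754 double-precision floats; what a particular database does with the literals 7.8, 39 and .2 depends on the types it assigns to them.

```python
from decimal import Decimal
import math

# The exact value that a binary float stores for the literal 0.2.
# It is close to 0.2, but not equal to it.
stored = Decimal(0.2)
print(stored)

# A classic case where two floats that "look" equal are not.
naive_equal = (0.1 + 0.2 == 0.3)
print(naive_equal)  # False

# Decimal arithmetic (like a database's decimal/numeric type) is exact here.
exact_equal = (Decimal("7.8") / Decimal("39") == Decimal("0.2"))
print(exact_equal)  # True

# For floats, compare with a tolerance instead of using equality.
close_enough = math.isclose(7.8 / 39, 0.2)
print(close_enough)  # True
```

Rounding both sides, as ROUND((7.8/39),1)=.2 does, is another way of comparing with a tolerance.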
I tried it on my Vertica database (SELECT VERSION() returns "Vertica Analytic Database v9.3.1-0"):
select (7.8/39)=.2 as is_it_true
returns:
is_it_true
true
To be sure and avoid any floating-point issues, try casting the two equality operands to the same type:
select (7.8/39)::NUMERIC(5,1) = .2::NUMERIC(5,1)

Shorten text codes

My question is more of cryptographic matter than programming.
I have codes of 5 chars each, that can be concatenated in pairs. The result is a code of 10 chars.
The problem is that the database field where I must store these values is only 6 chars wide, and I'd prefer not to resize it.
Is there a known method or algorithm which could shorten the pairs of values, from 10 chars to 6 chars max? The result can be made of any printable chars (preferably ASCII), and must avoid any duplicated values for two distinct pairs of codes.
Another solution may be shortening the 5-char codes to 3 chars, but the remaining problem is again that no duplicates are allowed when they are concatenated in pairs.
Thank you for any ideas. I tried several solutions (including Base64 encoding!) but my results are always too long, or they include duplicated values.
This question has nothing to do with cryptography. It should probably be tagged information-theory.
There are 95 printable ASCII characters (including the space), so the maximum amount of information that you can store in 6 chars is about 39.4 bits (= 6 × log2(95)). If you spread the same amount of information across 10 characters, then you can only carry about 3.94 bits in each character. That means you can use an alphabet of at most 15 characters for your codes (e.g., uppercase letters from A to O).
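If the codes really do use at most 15 distinct characters, the packing can be done as a plain change of number base. The sketch below assumes a code alphabet of A to O and a storage alphabet of the 95 printable ASCII characters from space to tilde. `pack` and `unpack` are names invented for this illustration, not an existing library.

```python
# Assumption: every 5-char code uses only these 15 symbols.
CODE_ALPHABET = "ABCDEFGHIJKLMNO"
# The 95 printable ASCII characters, space (32) through tilde (126).
STORE_ALPHABET = "".join(chr(c) for c in range(32, 127))

def pack(first, second):
    """Turn two 5-char codes into one 6-char string, without collisions."""
    value = 0
    for ch in first + second:          # read the 10 chars as a base-15 number
        value = value * len(CODE_ALPHABET) + CODE_ALPHABET.index(ch)
    digits = []
    for _ in range(6):                 # write it back as 6 base-95 digits
        value, digit = divmod(value, len(STORE_ALPHABET))
        digits.append(STORE_ALPHABET[digit])
    return "".join(reversed(digits))

def unpack(packed):
    """Inverse of pack: recover the two original 5-char codes."""
    value = 0
    for ch in packed:
        value = value * len(STORE_ALPHABET) + STORE_ALPHABET.index(ch)
    chars = []
    for _ in range(10):
        value, digit = divmod(value, len(CODE_ALPHABET))
        chars.append(CODE_ALPHABET[digit])
    text = "".join(reversed(chars))
    return text[:5], text[5:]

# The mapping is collision-free because 15**10 is smaller than 95**6.
assert 15 ** 10 <= 95 ** 6
print(unpack(pack("ABCDE", "ONMLK")))
```

If the database trims spaces, the space can be dropped from the storage alphabet: 94**6 is still larger than 15**10, so 6 characters remain enough.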

Could you explain how to convert from lz77 to huffman?

Could you explain how to convert from lz77 to huffman on the example in the below picture?
Easy:
In the first step your output is essentially 3 numbers:
1. prev index
2. number of characters to repeat
3. next character (be it ASCII or Unicode)
The algorithm demands that you specify a sliding window up front. That means you know how big (1) and (2) can be at most.
In other words, you know how many bits (1) and (2) will take up.
Since (3) is essentially also a character from a fixed length alphabet, you also know the bit-length of (3)
That means it's safe to simply concatenate them.
So, the output of the first algorithm can be thought of as outputting a bit-sequence, where every item in the sequence has a fixed length.
That's ideal for applying huffman.
Of course the specifics are not mentioned, and you can choose from a lot of options.
normalized huffman table
1 on left-branch vs 0 on left-branch
priorities when merging items of similar count
etc
So I can not readily explain the exact output values you are showing.
But I hope I can at least explain how to get from A to B.
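One possible reading of the steps above is to treat each fixed-length LZ77 triple as a single symbol and build a Huffman code over those symbols. The sketch below does that. The list `triples` is made-up sample data, not the output from the picture, and the tie-breaking and the choice of 0 for the left branch are arbitrary assumptions.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate case: one symbol
        return {next(iter(freq)): "0"}
    tiebreak = count()                       # keeps heap entries comparable
    heap = [(n, next(tiebreak), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]

# Hypothetical LZ77 output: (prev index, repeat count, next character).
triples = [(0, 0, "a"), (0, 0, "b"), (2, 1, "c"),
           (0, 0, "a"), (2, 1, "c"), (0, 0, "a")]

codes = huffman_codes(triples)
bits = "".join(codes[t] for t in triples)
print(codes)
print(bits, len(bits))
```

The most frequent triple gets the shortest code. A different tie-breaking rule or branch labelling gives different bit strings of the same total length.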
You can't. The coding shown is, well, figurative. Not literal. The symbols A, B, and C are all coded to the single bit 0. Obviously that's not going to be very helpful on the decoding end.

Link numbers with an equation/algorithm

I am making an anagram solver in Visual Basic that gives you every possible combination when you enter a string. I need to work out how many combinations there are depending on the amount of characters in the string and how many different characters there are.
E.G.
Sample string:
abc
Total characters: 3, Different Characters: 3
Possible combinations: 6
abc, acb, bac, bca, cab, cba
I need an equation (using the number of characters and different characters) to link this to a string that contains a different amount of characters.
I've been using trial and error to try and figure it out, but I can't quite get my head around it. So far I have:
((letters - 1) ^ (different letters - 1)) + (letters - 1)
which works for a few different letter counts but not for all.
Help please???
I'll lead you to the answer, but I'll try to explain along the way. Let's say you had 10 different letters. You'd have 10 choices for the first, 9 for the second, 8 for the third, etc. Ultimately, there would be 10*9*8*7*6...*2*1 = 10! possibilities. However, sometimes you'll have multiple instances of the same letter. For example, using that for the string "aaabcd" would overcount possibilities, because it counts each of the a's as distinct letters, even though they're not. To correct for that, you would have to divide by the factorial of the number of repeated letters. A good way to calculate the total number of possibilities would be (total number of letters factorial)/ (product of the factorials of the number of repeated instances of each letter).
For example:
There are 6!/(3!) ways to arrange the letters in "aaabcd"
There are 6! ways to arrange the letters in "abcdef"
There are 6!/(3!*2!) ways to arrange the letters in "aaabbc"
There are 10!/(5!*3!*2!) ways to arrange the letters in "aaaaabbbcc"
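The formula can be written as a short function. This is a sketch in Python rather than Visual Basic, and `count_arrangements` is a name chosen for this illustration.

```python
from collections import Counter
from math import factorial

def count_arrangements(word):
    """Number of distinct orderings of the letters in word."""
    total = factorial(len(word))                # n! for all letters
    for repeats in Counter(word).values():      # divide by m! per letter
        total //= factorial(repeats)
    return total

for word in ["abc", "aaabcd", "abcdef", "aaabbc", "aaaaabbbcc"]:
    print(word, count_arrangements(word))
```

For "abc" this gives 6, matching the list in the question.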
I hope this helps.
The number of possible arrangements is exactly a multinomial coefficient.
A simple explanation is that, with no repeating characters,
it's simply the number of permutations = n!
(It is easy to understand if you draw a tree diagram: the first character has n choices, the second character has n-1 choices, etc.)
However, as you may have repeating characters, you will count many of the arrangements more than once.
Let's see a simple example: for aaa, how many possible arrangements are there IF WE COUNT THEM EVEN WHEN THE OUTCOME IS THE SAME?
The answer is 3! (aaa,aaa,aaa,aaa,aaa,aaa)
This gives us the idea that, when a character appears m times, we count m! arrangements instead of 1.
So the count is just n! (all possible arrangements, including identical outcomes) / m! (for a character that appears m times)
The same goes for more repeating characters: n!/(a!b!c!...) (the first character appears a times, another appears b times...)
If you understand the concept behind this, you will see that for the "non-repeating" characters it's just dividing by 1!. E.g., for the character (multi)set {a,a,a,b,b,c}, #a = 3, #b = 2, #c = 1, so the answer (without repeated counting) is (3+2+1)!/(3!2!1!), and a fraction of this form is called a multinomial coefficient, as stated above.
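From a programming point of view, you can just pre-compute all the factorials with a simple for loop (only for a pretty small n though, as n~30 is already too large for a fixed-width integer variable to store):
n = 20  # 20! is the largest factorial that fits in a signed 64-bit integer
fact = [1] * (n + 1)  # fact[i] will hold i!, with fact[0] = 1
for i in range(1, n + 1):
    fact[i] = i * fact[i - 1]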
For a larger n, you may just do the double/float division on the fly in the loop to avoid overflow. You may face precision problems though.
If you further need to output the different strings, you may use DFS to backtrack through all the possible outcomes. Or, if you could use another language like C++, you can use a built-in function like next_permutation() after sorting the character set.
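The DFS idea can be sketched as follows, in Python rather than Visual Basic or C++. It picks each distinct remaining character once per position, so identical outcomes are never generated twice. `distinct_permutations` is a name chosen for this illustration.

```python
from collections import Counter

def distinct_permutations(word):
    """Yield every distinct arrangement of the letters in word, in sorted order."""
    remaining = Counter(word)   # how many of each character are still unused
    current = []                # the arrangement being built

    def dfs():
        if len(current) == len(word):
            yield "".join(current)
            return
        for ch in sorted(remaining):
            if remaining[ch] == 0:
                continue
            remaining[ch] -= 1          # choose ch for this position
            current.append(ch)
            yield from dfs()
            current.pop()               # backtrack
            remaining[ch] += 1

    yield from dfs()

print(list(distinct_permutations("aab")))
```

The number of strings produced agrees with the multinomial count, for example 60 for "aaabbc".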

Problem 98 - Project Euler

The problem is as follows:
By replacing each of the letters in the word CARE with 1, 2, 9, and 6 respectively, we form a square number: 1296 = 36². What is remarkable is that, by using the same digital substitutions, the anagram, RACE, also forms a square number: 9216 = 96². We shall call CARE (and RACE) a square anagram word pair and specify further that leading zeroes are not permitted, neither may a different letter have the same digital value as another letter.
Using words.txt (right click and 'Save Link/Target As...'), a 16K text file containing nearly two-thousand common English words, find all the square anagram word pairs (a palindromic word is NOT considered to be an anagram of itself).
What is the largest square number formed by any member of such a pair?
NOTE: All anagrams formed must be contained in the given text file.
I don't understand the mapping of CARE to 1296. How does that work? Or are all permutation mappings meant to be tried, i.e. all letters to 1-9?
All assignments of digits to letters are allowed. So C=1, A=2, R=3, E=4 would be a possible assignment ... except that 1234 is not a square, so that would be no good.
Maybe another example would help make it clear? If we assign A=6, E=5, T=2, then TEA = 256 = 16² and EAT = 625 = 25². So (TEA=256, EAT=625) is a square anagram word pair.
(Just because all assignments of digits to letters are allowed, does not mean that actually trying out all such assignments is the best way to solve the problem. There may be some other, cleverer, way to do it.)
In short: yes, all permutations need to be tried.
If you test all letter-for-digit substitutions, then you are looking for pairs of squares with these properties:
they have the same length
they have the same digits, each occurring as many times as the corresponding letter occurs in the input string.
It is faster to find all these pairs of squares. There are 68 squares with length 4, 217 squares with length 5, ... Filtering all squares of the same length by the properties above will generate a 'small' number of pairs, which are the solutions you are looking for.
This data is 'static' and doesn't depend on the input strings. It can be calculated once and used for all input strings.
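The precomputation can be sketched as grouping the squares of a given length by their sorted digits; any group with more than one member contains digit-anagram squares. This only illustrates the filtering step, not a full solution, and the function names are invented for this sketch.

```python
from collections import defaultdict
from math import isqrt

def squares_with_length(n_digits):
    """All perfect squares that have exactly n_digits decimal digits."""
    lo = isqrt(10 ** (n_digits - 1) - 1) + 1   # smallest root that is large enough
    hi = isqrt(10 ** n_digits - 1)             # largest root that still fits
    return [r * r for r in range(lo, hi + 1)]

def anagram_square_groups(n_digits):
    """Map sorted-digit key -> squares of that length sharing those digits."""
    groups = defaultdict(list)
    for sq in squares_with_length(n_digits):
        groups["".join(sorted(str(sq)))].append(sq)
    # Keep only keys that are shared by at least two squares.
    return {key: sqs for key, sqs in groups.items() if len(sqs) > 1}

groups = anagram_square_groups(4)
print(groups["1269"])   # includes 1296 and 9216, the CARE/RACE example
```

The word pairs then only need to be matched against these groups, checking that the letter pattern of each word agrees with the digit pattern of the square.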
Hmm. How to put this. The people who put together Project Euler promise that there is a solution that is under one minute for every problem, and there is only one problem that I think might fail this promise, but this is not it.
Yes, you could permute the digits, and try all permutations against all squares, but that would be a very large search space, not at all likely to be the Right Thing (TM). In general, when you see that your "look" at the problem is going to generate a search that will take too long, you need to search something else.
Like, suppose you were asked to determine what numbers would be the result of multiplying two primes between 1 and a zillion. You could factor every number between 1 and a zillion, but it might be faster to take all combinations of two primes and multiply them. Since you are looking at combinations, you can start with two and go until your results are too large, then do the same with three, etc. By comparison, this should be much faster - and you don't have to multiply all the numbers out, you could take logs of all the primes and then just add them and find the limit for every prime, giving you a list of numbers you could add up.
There are a bunch of innovative solutions, but the first one you think of - especially the one you think of when Project Euler describes the problem, is likely to be wrong.
So, how can you approach this problem? There are probably too many permutations to look at, but maybe you can figure out something with mappings and comparing mappings?
(Trying to avoid giving it all away.)