Construct an NFA accepting strings over {0,1} such that some two 0's are separated by a string of length 4i, i>=0 - finite-automata

I am trying to solve this problem by first designing an NFA for a string of length 4i, since such a length is congruent to 0 (mod 4).
That part has 4 states. I then added 2 more states, one on each end of this design, each reached by a transition on 0, for a total of 6 states. When I checked my solution, it turned out to be wrong. Can someone please explain where I am going wrong?

The high-level design for this NFA is correct, there are just a few missing details. One strategy I've found helpful when designing NFAs is to first start by coming up with a set of test cases or test strings. That is, if I were writing a program to check whether or not a string met these certain properties, what strings would I test? What would the edge cases be? These can help you spot patterns when you're first designing the NFA and you can use them to check your work afterwards.
For example, here are some of the test cases I would check for this problem:
00 \\ i = 0
010100 \\ i = 1
0101011010 \\ i > 1, handles lengths of larger multiples of 4
011110, 000000 \\ it shouldn't matter what's in between the two 0s
111010100 \\ can have anything before the two 0s
010100111 \\ can have anything after the two 0s
... etc...
You should consider these two in particular:
000000 - in the loop of your NFA that's checking whether or not the length of the string in between the two 0s is a multiple of 4, there is no restriction on the contents of this string. Specifically, there's no reason that the first character of this string cannot be a 0 (the transition from q1 to q5).
010100111 (and/or 0101001110, 0101000) - these are all examples of strings where we have two 0s separated by a string of length 4i, followed by some other characters. These strings should also be accepted by your NFA but currently are not - remember that an NFA accepts if it finishes in an accepting state, and that if an NFA needs to make a transition and no transition exists, it dies and that path rejects.
Do you see what modifications you can make to address these problems?
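One way to use test strings like these is to simulate a candidate NFA and compare it against a brute-force definition of the language. The following is only a sketch: it assumes a six-state design with both issues above already addressed, and the state names (s, c0 to c3, f) are hypothetical and do not correspond to the labels in the original diagram.

```python
from itertools import product

def in_language(w):
    """Brute force: some two 0s are separated by a gap whose length is a multiple of 4."""
    zeros = [i for i, ch in enumerate(w) if ch == "0"]
    return any((j - i - 1) % 4 == 0
               for n, i in enumerate(zeros) for j in zeros[n + 1:])

ANY = "01"
DELTA = {}  # (state, symbol) -> set of next states

def add(src, symbols, dst):
    for a in symbols:
        DELTA.setdefault((src, a), set()).add(dst)

# Hypothetical state names: s = start, c0..c3 = gap length mod 4, f = accept.
add("s", ANY, "s")     # anything may come before the first 0
add("s", "0", "c0")    # guess that this 0 is the first of the pair
add("c0", "0", "f")    # gap length is a multiple of 4 and we read the second 0
add("c0", ANY, "c1")   # the gap may contain 0s as well as 1s
add("c1", ANY, "c2")
add("c2", ANY, "c3")
add("c3", ANY, "c0")
add("f", ANY, "f")     # anything may come after the second 0

def nfa_accepts(w):
    """Track the set of states reachable after each symbol."""
    current = {"s"}
    for ch in w:
        current = {dst for src in current for dst in DELTA.get((src, ch), ())}
    return "f" in current

def mismatches(max_len):
    """All strings up to max_len on which the NFA and the brute force disagree."""
    return [w for n in range(max_len + 1)
            for w in map("".join, product("01", repeat=n))
            if nfa_accepts(w) != in_language(w)]
```

Calling mismatches(10) returns every short string on which the simulated NFA disagrees with the definition, which makes missing transitions easy to spot.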

Related

How to treat numbers inside text strings when vectorizing words?

If I have a text string to be vectorized, how should I handle numbers inside it? Or if I feed a Neural Network with numbers and words, how can I keep the numbers as numbers?
I am planning on making a dictionary of all my words (as suggested here). In this case all strings will become arrays of numbers. How should I handle characters that are numbers? How do I output a vector that does not mix the word index with the number character?
Does converting numbers to strings weaken the information I feed the network?
Expanding on your discussion with @user1735003 - let's consider both ways of representing numbers:
Treating the number as a string, i.e. as just another word, and assigning an ID to it when forming the dictionary. Or
Converting the numbers to actual words: '1' becomes 'one', '2' becomes 'two', and so on.
Does the second one change the context in any way? To verify this, we can measure the similarity of the two representations using word2vec. The scores will be high if they are used in similar contexts.
For example,
1 and one have a similarity score of 0.17, 2 and two have a similarity score of 0.23. They seem to suggest that the context of how they are used is totally different.
By treating the numbers as another word, you are not changing the context, but with any other transformation of those numbers you can't guarantee the change is for the better. So it's better to leave them untouched and treat them as another word.
Note: Both word2vec and GloVe were trained by treating the numbers as strings (case 1).
The link you provide suggests that everything resulting from a .split(' ') is indexed: words, but also numbers, possibly smileys, and so on. (I would still take care of punctuation marks.) Unless you have more prior knowledge about your data or your problem, you could start with that.
EDIT
Example literally using your string and their code:
corpus = {'my car number 3'}
dictionary = {}
i = 1
for tweet in corpus:
    # Split on spaces so that numbers are indexed like any other token
    for word in tweet.split(" "):
        if word not in dictionary: dictionary[word] = i
        i += 1
print(dictionary)
# {'my': 1, 'car': 2, 'number': 3, '3': 4}  (insertion order, Python 3.7+)
The following paper can be helpful: http://people.csail.mit.edu/mcollins/6864/slides/bikel.pdf
Specifically, page 7.
Before falling back to an <unknown> tag, they try to replace alphanumeric symbol combinations with tags named after common patterns, such as:
FourDigits (good for years)
I've tried to implement it and it gave great results.
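A minimal sketch of that idea, assuming a whitespace tokenizer and a known vocabulary. Only the FourDigits tag comes from the text above; the other tag names, the regular expressions, and the normalize_token helper are hypothetical and not taken from the paper.

```python
import re

# Ordered from most to least specific; the first full match wins.
# Only "FourDigits" is named in the text; the other tags are illustrative.
PATTERNS = [
    (re.compile(r"\d{4}"), "FourDigits"),   # e.g. years
    (re.compile(r"\d{2}"), "TwoDigits"),
    (re.compile(r"\d+"), "OtherNumber"),
    (re.compile(r"(?=.*\d)(?=.*[A-Za-z])[A-Za-z\d]+"), "AlphaNumeric"),
]

def normalize_token(token, vocabulary):
    """Keep known words; map unknown tokens to a pattern tag or <unknown>."""
    if token in vocabulary:
        return token
    for pattern, tag in PATTERNS:
        if pattern.fullmatch(token):
            return tag
    return "<unknown>"

def normalize(text, vocabulary):
    return [normalize_token(tok, vocabulary) for tok in text.split(" ")]
```

With a vocabulary of {"my", "car", "number"}, the call normalize("my car number 1999", vocabulary) maps the last token to FourDigits instead of <unknown>, so the model still sees that a year-like number occurred.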

Is format ####0.000000 different to 0.000000?

I am working on some legacy code at the moment and have come across the following:
FooString = String.Format("{0:####0.000000}", FooDouble)
My question is, is the format string here, ####0.000000 any different from simply 0.000000?
I'm trying to generalize the return type of the function that sets FooDouble, and I want to make sure I don't break existing functionality, so I'm trying to work out what the # characters add here.
I've run a couple tests in a toy program and couldn't see how the result was any different but maybe there's something I'm missing?
From MSDN:
The "#" custom format specifier serves as a digit-placeholder symbol. If the value that is being formatted has a digit in the position where the "#" symbol appears in the format string, that digit is copied to the result string. Otherwise, nothing is stored in that position in the result string.
Note that this specifier never displays a zero that is not a significant digit, even if zero is the only digit in the string. It will display zero only if it is a significant digit in the number that is being displayed.
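Because both format strings have exactly one 0 before the decimal separator, and the leading # placeholders never add or remove digits on their own, both formats should return the same result.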

How can I construct finite automata

I have to create a deterministic finite automaton accepting the set of strings that have an even number of 1s and end with 0. Should I include 0 as a string from this set? And how can I do this?
Should I include 0 as a string from this set?
Yes
And how do I do this?
To construct a finite automaton, you need to identify the states and transitions. The Myhill-Nerode theorem allows you to find the necessary (and sufficient!) states for a finite automaton if you are able to identify the equivalence classes of "indistinguishable" strings.
Two strings x and y are indistinguishable, in this sense, if for any other string z, either both xz and yz are in the language, or neither is.
In your case, let's try to identify equivalence classes. The empty string is in some equivalence class. The string 0 is in a different equivalence class, since you can add the empty string to 0 and get a string in the language (whereas you can't add the empty string to the empty string to get a string in the language). We have found two distinct equivalence classes so far - one for the empty string, one for 0. These will need different states in our FA.
What about the string 1? It's distinguishable from both 0 and the empty string, since you can add 10 to 1 to get 110, a string in the language, but you can't add it to 0 or the empty string to get a string in the language. So we have yet another state.
What about the string 00? It has an even number of 1s (zero of them) and ends with 0, so it is in the language, and adding any string z to 00 gives a string in the language exactly when adding z to 0 does. So 00 is in the same class as 0. The next strings, 01 and 10, each contain an odd number of 1s and turn out to be in the same class as 1.
The string 11 ends up being in the same class as the empty string: you can add any string in the language to 11 and get another string in the language. If you try all strings of length 3, you will find that all of those already fall into one of the above classes, and you can stop checking at that point.
So we have three states - let's call them [-], [0], and [1]. Now we figure out transitions.
If you get a 0 in [-], you need to go to [0]... and if you get a 1, you need to go to [1]. For the rest, just figure out what string you'd get by adding to the canonical one, and which class the resulting string would be in... and go to that state.
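As a sketch, the transitions can be written as a table and checked against the definition of the language. The labels [-], [0], and [1] are the classes of the empty string, 0, and 1 from the discussion above; the table itself is my own reading of "even number of 1s and ends with 0", so treat it as an assumption to verify rather than the only possible answer.

```python
from itertools import product

# [-]: even number of 1s, not ending in 0 (includes the empty string)
# [0]: even number of 1s, ending in 0 (the accepting state)
# [1]: odd number of 1s
TRANSITIONS = {
    ("[-]", "0"): "[0]", ("[-]", "1"): "[1]",
    ("[0]", "0"): "[0]", ("[0]", "1"): "[1]",
    ("[1]", "0"): "[1]", ("[1]", "1"): "[-]",
}
START, ACCEPTING = "[-]", {"[0]"}

def dfa_accepts(w):
    state = START
    for ch in w:
        state = TRANSITIONS[(state, ch)]
    return state in ACCEPTING

def in_language(w):
    """Direct definition: even number of 1s and the last symbol is 0."""
    return w.count("1") % 2 == 0 and w.endswith("0")

def disagreements(max_len):
    """Strings up to max_len where the DFA and the definition differ."""
    return [w for n in range(max_len + 1)
            for w in map("".join, product("01", repeat=n))
            if dfa_accepts(w) != in_language(w)]
```

An empty result from disagreements(10) is good evidence, though not a proof, that the table matches the language.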
The given question is to construct a finite automaton accepting strings with an even number of 1's that end with 0.
So the alphabet of the language is {0,1}.
These are the strings that are accepted by the language.
Every string in the language ends with '0', so we reach the final state only when we read the last '0' in the string.
Following the usual construction procedure, we first get an NFA.
Then we need to convert the NFA to a DFA by combining two states into a single state and simplifying.
New transition diagram
Here we have drawn the new transition diagram based on the states reached from each state on a given input. New states are formed by joining two states [here the state {q0,q2} is formed].
This new state {q0,q2} on 0 as input goes to itself (as q0 on 0 goes to q0 and q2 on 0 goes to q2).
So let us consider this new state {q0,q2} as a new state q2'.
Using the transition diagram, we can easily construct the required DFA.
Deterministic Finite Automata
The above diagram is the constructed finite automata accepting the set of strings with an even number of 1's and ending with 0.
q0 - is the Initial state
q2'- is the Final state
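The NFA-to-DFA step can be sketched with the subset construction. The original NFA diagram is not available here, so the NFA below is an assumption chosen to be consistent with the description (q0 initial, q2 final, {q0,q2} looping on 0); the function names are hypothetical.

```python
from collections import deque

# Assumed NFA: q0 = even number of 1s, q1 = odd number of 1s,
# q2 = "the 0 just read is the end of the string" (accepting).
NFA = {
    ("q0", "0"): {"q0", "q2"},
    ("q0", "1"): {"q1"},
    ("q1", "0"): {"q1"},
    ("q1", "1"): {"q0"},
    ("q2", "0"): {"q2"},
}
NFA_START, NFA_ACCEPTING = "q0", {"q2"}

def subset_construction(nfa, start, accepting, alphabet="01"):
    """Build a DFA whose states are sets of NFA states reachable from start."""
    start_set = frozenset({start})
    dfa, seen, queue = {}, {start_set}, deque([start_set])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            target = frozenset(s for q in current
                               for s in nfa.get((q, symbol), ()))
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A DFA state accepts if it contains at least one accepting NFA state.
    dfa_accepting = {s for s in seen if s & accepting}
    return dfa, start_set, dfa_accepting

def dfa_accepts(word, dfa, start, accepting):
    state = start
    for ch in word:
        state = dfa[(state, ch)]
    return state in accepting

DFA, DFA_START, DFA_ACCEPTING = subset_construction(NFA, NFA_START, NFA_ACCEPTING)
```

Under this assumed NFA, the only combined state that appears is {q0,q2}, which plays the role of q2' above.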

Give state diagrams of DFAs recognizing the following languages. In all parts the alphabet is {0,1 }

I'm trying to get the hang of drawing DFAs. I have the following problem to do, along with my attempt below, and was wondering if anyone could tell me whether I'm correct or, if not, what I'm doing wrong. Thanks! Also, if anyone has a good resource to learn more about how to do these, it would be greatly appreciated.
Give state diagrams of DFAs recognizing the following languages. In all parts the alphabet is {0,1 }
{w | the length of w is at most 5}
Here are some clues.
"At most 5": this implies you must do some counting. In state machines, counting is accomplished by the context of each node. In other words, you will require a number of nodes, each with a special meaning, and that meaning will be your "counter value."
"At most 5": This means you must accept words of length 0, 1, 2, 3, 4, and 5. (All of which have unique values, hint hint.)
Your alphabet is {0,1}, but there are no requirements of the language of the frequency, ordering, or anything related to 0 and 1. This means every time there is a transition for 0, the same transition must be available to 1, and vice versa. (Or some equivalent relation that reduces to this rule - but this is in parentheses because it's not something you need to think about.)
Here are your errors:
You have no marked start state.
The strings "0", "" (the empty string), "1" are rejected, but are within the prescribed language. In other words, you are accepting only words that are exactly length 5, not all words that are length 5 and less.
Since the alphabet is {0, 1}, you must specify at EACH state what happens when either a 0 or a 1 is encountered. If you encounter an input character whose edge is NOT specified, by convention you are going to the dead state, a state that always returns to itself and is never accepted, but is left undrawn. This is why your right-most state is unnecessary, but your left states are incomplete.
Final, big hint: You can have more than one "Accept" or "Final" state.
I think the DFA shown above is wrong. It should accept all strings up to length 5, so you should make all of the first six states final states. You are also accepting only '1's, but it should accept '0's as well, so label every 1 transition with 0 too.
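To check a drawing against these hints, the counting idea can be sketched as a small simulation. The state numbering (0 to 5 for the number of symbols read, 6 for the dead state) is an assumption made for illustration.

```python
DEAD = 6  # hypothetical dead state: more than 5 symbols have been read

def step(state, symbol):
    """0 and 1 behave identically: each symbol advances the counter by one."""
    if symbol not in ("0", "1"):
        raise ValueError("alphabet is {0,1}")
    return min(state + 1, DEAD)  # the dead state loops to itself

def accepts(word):
    state = 0  # start state: zero symbols read so far
    for ch in word:
        state = step(state, ch)
    return state <= 5  # states 0..5 are all accepting
```

Here accepts("") and accepts("10101") are True, while accepts("000000") is False because the sixth symbol moves the machine into the dead state.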

What is the technical term for the input used to calculate a checkdigit?

For example:
code = '7777-5';
input = code.substring(0, 4); // Returns '7777'
checkdigit = f(input); // f() produces a checkdigit
assert.areEqual(code, input + "-" + checkdigit)
Is there a technical term for input used above?
Specifically, I'm calculating check digits for ISBNs, but that shouldn't affect the answer.
Is "original number excluding the check digit" technical enough? :)
Actually, it's often the case, as in the link you posted, that the check digit or checksum ensures a property about the full input:
...[the check digit] must be such that the sum of all the ten digits, each multiplied by the integer weight, descending from 10 to 1, is a multiple of the number 11.
Thus, you'd check the full number and see if it meets this property.
It's "backwards" when you're initially generating the check digit. In that case, the function would be named generate_check_digit or similar, and I'd just name its parameter as "input".
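A minimal sketch of both directions for ISBN-10, based on the weighted-sum rule quoted above. The name generate_check_digit follows the suggestion in the text; is_valid_isbn10 and the payload parameter name are my own, and using 'X' for a check value of 10 is the usual ISBN-10 convention rather than something stated in this thread.

```python
DIGITS = set("0123456789")

def generate_check_digit(payload):
    """Return the ISBN-10 check character for the first nine digits."""
    if len(payload) != 9 or not set(payload) <= DIGITS:
        raise ValueError("expected exactly nine decimal digits")
    # Weights 10 down to 2 for the nine payload digits.
    total = sum(w * int(d) for w, d in zip(range(10, 1, -1), payload))
    check = (-total) % 11  # value that makes the full sum a multiple of 11
    return "X" if check == 10 else str(check)

def is_valid_isbn10(code):
    """Check the full number: the weighted sum of all ten digits is 0 mod 11."""
    chars = code.replace("-", "").upper()
    if len(chars) != 10 or not set(chars[:9]) <= DIGITS:
        return False
    if chars[9] not in DIGITS and chars[9] != "X":
        return False
    values = [int(c) for c in chars[:9]]
    values.append(10 if chars[9] == "X" else int(chars[9]))
    return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0
```

Validation never needs to separate the payload from the check digit, which is why a name for that input mostly comes up on the generating side.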
Although I am not sure if there is a well-known specific technical term for the input, what LukeH suggested (message/data) seems common enough.
Wiki for checksum:
With this checksum, any transmission error that flips a single bit of the message, or an odd number of bits, will be detected as an incorrect checksum
Wiki for check digit:
A check digit is a form of redundancy check used for error detection, the decimal equivalent of a binary checksum. It consists of a single digit computed from the other digits in the message.