What is the difference between an Alphabet and an element of a set? - finite-automata

What is the difference between an Alphabet and an element of a set?
Is an Alphabet an element of a set, or is it a set itself?

It might be a little more correct to say that an Alphabet is a domain whose definition is simply "a set." In other words, an Alphabet is the set of all possible letters: any symbol that is not in that set is not "a letter."
Notice that "a word" is not "a set" but rather a finite sequence of "letters," because any word (such as the word "letters") can contain the same letter many times, and the order of its letters matters.
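A small Python sketch of the distinction. Python is used only for illustration, and the names `ALPHABET` and `is_word_over` are hypothetical. The alphabet is modeled as a set, and a word as a string, meaning a sequence in which every position holds a letter.

```python
# The alphabet is a set: only membership matters, not order or repetition.
ALPHABET = frozenset("abcdefghijklmnopqrstuvwxyz")


def is_word_over(word: str, alphabet: frozenset) -> bool:
    """A word over the alphabet is a finite sequence whose every position holds a letter."""
    return all(ch in alphabet for ch in word)


print(is_word_over("letters", ALPHABET))     # True
print(is_word_over("let's", ALPHABET))       # False: "'" is not in this alphabet
print(len("letters"), len(set("letters")))   # 7 5: a set forgets the repeated letters
print(set("listen") == set("silent"), "listen" == "silent")  # True False: order matters for words
```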

If we only talk about automata theory, an Alphabet is a set of elements (letters).
So an Alphabet is an example of a set whose elements are letters.
For example, in the alphabet A = {a, b, c}, 'a' is one of its elements.
I don't know if this is the answer you want. If it isn't, maybe you could make your question more precise? :)
EDIT: But you can also have a set of sets, like:
K = { {a, b, c}, {m, n, o} }, which contains two Alphabets. Here K is no longer an Alphabet; it is a set of Alphabets.
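In Python terms (a sketch; the variable names are illustrative), the inner alphabets have to be `frozenset`s so that they can themselves be elements of a set. Membership then shows the two levels clearly:

```python
# An alphabet is a set of letters; K is a set whose elements are alphabets.
# frozenset is needed because elements of a Python set must be hashable.
A = frozenset({"a", "b", "c"})
B = frozenset({"m", "n", "o"})
K = {A, B}

print("a" in A)  # True: 'a' is an element (a letter) of the alphabet A
print(A in K)    # True: the alphabet A is an element of K
print("a" in K)  # False: the letters of A are not elements of K
```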
Bye.

Related

What is ambiguity in an alphabet in automata theory?

I am new to the automata field. I have read many articles and watched many videos, but I am stuck on one of the first topics. It may be easy for others, but after spending a lot of time I am still unable to understand it.
The topic is: ambiguity in an alphabet.
The alphabet is {A, Aa, bab, d}
and the string is s = AababA.
The author says that this is an ambiguous alphabet because a computer reads the string from left to right. After reading the capital A, it cannot tell whether that A is a letter on its own or the start of Aa, because A is a prefix of Aa; this creates ambiguity. A letter (symbol) should not be a prefix of another letter.
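The author's rule, that no symbol may be a proper prefix of another, can be checked mechanically. A minimal Python sketch (the function name `prefix_conflicts` is hypothetical): if it returns an empty list, the alphabet is prefix-free, so at any position of the input at most one symbol can match, and a left-to-right reader never has to choose between a symbol and a longer symbol that begins with it.

```python
from itertools import permutations


def prefix_conflicts(alphabet):
    """Return every (shorter, longer) pair where one symbol is a proper prefix of another."""
    # permutations(..., 2) never pairs a symbol with itself, so every hit is a proper prefix.
    return sorted((p, q) for p, q in permutations(alphabet, 2) if q.startswith(p))


print(prefix_conflicts({"A", "Aa", "bab", "d"}))  # [('A', 'Aa')]
print(prefix_conflicts({"A", "b", "cd"}))         # []: prefix-free
```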
Moreover, the author says that
we can tokenize it (AababA) in two ways:
(Aa) (bab) (A)
(A) (abab) (A)
After that, the author says the first one is OK, but the second is not, due to the ambiguity in the alphabet defined above.
What is the procedure for tokenizing the above string in two ways? Is there a specific rule?
How does the second group make the alphabet ambiguous?
If it is invalid because A is a prefix, then how? What is the role of prefixes in the ambiguity of an alphabet?
If we ignore prefixes and simply match both groups against the alphabet above, we can easily see that the second one does not match (abab is not in the alphabet), so why do we need to discuss prefixes at all?
I hope this question will be considered important, so that an answer will help me out of this confusion. I will be very thankful.
The author chose a confusing example. If you share the source where you got this example, I could give a better answer, but I would argue that in this case, there is no practical ambiguity. If you see Aa, you can know that the first lexeme must be "Aa", because nothing in the alphabet starts with "a".
For an easier example, consider the alphabet {A, a, Aa} and the string "AAaAaaA".
You could tokenize this in the following ways:
(A) (A) (a) (A) (a) (a) (A)
(A) (Aa) (A) (a) (a) (A)
(A) (A) (a) (Aa) (a) (A)
(A) (Aa) (Aa) (a) (A)
This is most often resolved by choosing the longest lexeme that matches at each position (often called "maximal munch"), which yields the last tokenization.
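The four tokenizations above, and the longest-match choice among them, can be reproduced with a short Python sketch. `all_tokenizations` does an exhaustive search and `longest_match` is a greedy maximal-munch reader; both names are hypothetical, and the exhaustive search is exponential in the worst case, so it is only meant for small examples like this one.

```python
def all_tokenizations(s, alphabet):
    """Every way to split s into symbols of the alphabet (exhaustive search)."""
    if not s:
        return [[]]
    results = []
    for sym in sorted(alphabet):
        if s.startswith(sym):
            for rest in all_tokenizations(s[len(sym):], alphabet):
                results.append([sym] + rest)
    return results


def longest_match(s, alphabet):
    """Greedy maximal munch: at each position take the longest symbol that matches."""
    tokens, i = [], 0
    while i < len(s):
        candidates = [sym for sym in alphabet if s.startswith(sym, i)]
        if not candidates:
            raise ValueError(f"no symbol matches at position {i}: {s[i:]!r}")
        best = max(candidates, key=len)
        tokens.append(best)
        i += len(best)
    return tokens


ALPHABET = {"A", "a", "Aa"}
for t in all_tokenizations("AAaAaaA", ALPHABET):
    print(t)
print(longest_match("AAaAaaA", ALPHABET))  # ['A', 'Aa', 'Aa', 'a', 'A']
```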
Now let us return to your example, but let's make the string a little bit different: "AababAe".
You could tokenize the string in the following ways:
(Aa) (bab) (A) <error>
(A) <error>
Both branches end in an error, but not at the same place: the first consumes "AababA" and only fails at "e", while the second fails right after the first "A", because nothing in the alphabet starts with "a". As you noted, the tokenizer should choose the first. The point is that there is an explicit choice here to prefer the longest valid tokenization. Nothing in the alphabet forces you to make this choice; it is just as valid to choose the shortest matching option. That would be massively impractical, but it is a valid choice.
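The same idea applied to "AababAe", as a Python sketch (the `tokenize` function and its `prefer` parameter are illustrative, not from any library). The reader commits to one symbol per step and reports where it gets stuck, which makes it visible that "prefer the longest match" and "prefer the shortest match" are two different, equally well-defined policies:

```python
def tokenize(s, alphabet, prefer="longest"):
    """Read s left to right, committing to one symbol per step.

    Returns (tokens, error_position); error_position is None if all of s was consumed.
    """
    pick = max if prefer == "longest" else min
    tokens, i = [], 0
    while i < len(s):
        candidates = [sym for sym in alphabet if s.startswith(sym, i)]
        if not candidates:
            return tokens, i  # nothing in the alphabet matches at position i
        sym = pick(candidates, key=len)
        tokens.append(sym)
        i += len(sym)
    return tokens, None


ALPHABET = {"A", "Aa", "bab", "d"}
print(tokenize("AababAe", ALPHABET, "longest"))   # (['Aa', 'bab', 'A'], 6)
print(tokenize("AababAe", ALPHABET, "shortest"))  # (['A'], 1)
```

With `prefer="longest"` the reader stops at index 6 (the "e"); with `prefer="shortest"` it stops at index 1, because no symbol begins with "a".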

What does command LIKE '[atoz:a]%' mean in SQL Server?

I've inherited a database and application from a project and I'm trying to debug a problem with a database query.
There's a line in the query that reads:
WHERE property_title LIKE '[atoz:a]%'
I'm struggling to understand what the atoz command is doing, as I've never come across it before. I assumed it meant that only letters were allowed in the title, but some of the titles contain symbols such as ( ) or -.
I've tried searching for it on Google but I must be using the wrong terminology as nothing is appearing. If someone could explain it to me, or point me to a resource that would be great!
Thanks
This is looking for property_title values that start with one of the characters "a", "t", "o", "z" or ":". The second "a" is redundant.
I would guess the intention is actually:
WHERE property_title LIKE '[a-z]%'
which would specify that the property title starts with a letter (or a lower case letter, depending on the collation being used).
This is just part of the LIKE operator of T-SQL:
[ ]
Any single character within the specified range ([a-f]) or set ([abcdef]).
WHERE au_lname LIKE '[C-P]arsen' finds author last names ending with arsen and starting with any single character between C and P, for example Carsen, Larsen, Karsen, and so on. In range searches, the characters included in the range may vary depending on the sorting rules of the collation.
The exact expression you're seeing:
'[atoz:a]%'
basically means this:
First any single character that can be one of the following:
a, t, o, z, or :
Then followed by anything (even nothing)
Note that atoz does not mean any character from a to z, it literally means the 4 characters a, t, o and z. To get any character from a to z you would use [a-z]. The second a in the expression is redundant, as [aa] means the same as [a].
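To see how the two patterns differ, here is a rough Python analogy using regular expressions, which use the same bracket syntax for single-character sets and ranges. This is not T-SQL: the matching below is case-sensitive, whereas in SQL Server the result depends on the column's collation, and the sample titles are made up.

```python
import re

# '[atoz:a]%' : one character from {a, t, o, z, :}, then anything (% ~ .*)
AS_WRITTEN = re.compile(r"[atoz:a].*", re.DOTALL)
# '[a-z]%'    : one character in the range a..z, then anything
PROBABLY_MEANT = re.compile(r"[a-z].*", re.DOTALL)

for title in ["apartment", "tower", "bungalow", ":loft", "(new) flat"]:
    print(f"{title!r:14} as written: {bool(AS_WRITTEN.fullmatch(title))!s:5} "
          f"a-z: {bool(PROBABLY_MEANT.fullmatch(title))}")
```

"bungalow" is rejected by the pattern as written but accepted by the range, while ":loft" is the other way round.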

How do I identify which letter of the alphabet a word starts with in Objective-C?

Given a string, I'm trying to determine which letter of the alphabet it belongs to. For example, "apple" goes into the "A" section. "Banana" goes into the "B" section. I'm using this to identify the section:
NSRange range = [string rangeOfString:letter
options:NSAnchoredSearch |
NSCaseInsensitiveSearch |
NSDiacriticInsensitiveSearch |
NSWidthInsensitiveSearch
range:NSMakeRange(0, string.length)
locale:locale];
Where string is the string I'm trying to bucket and letter is a letter of the alphabet. I do this in a loop for each letter of the alphabet.
It works great, except for words like "æquo", which should be bucketed into the letter "A", but aren't. What to do?
Edit: The plot thickens. I'm looking at Korean now. The word "것" should be bucketed into the letter "ㄱ". There's got to be some way to do this other than maintaining a huge mapping table.
I think I've figured it out: I was thinking about it wrong. The question isn't, does a given word begin with a certain letter of the alphabet. Rather, the question is, does a given word fall within the sorting range of a certain letter of the alphabet.
For example, in the case of "æquo", I can check if it falls within the sorting range of the "A" section by checking if it is or comes after "A", and comes before "B".
Apple's compare:options:range:locale: method knows the answer to those two questions for any given locale. In this particular example, for French it would say yes. For some other language, like Danish, it should say no.
I've tested this on English, Spanish, Portuguese, French, German, and Korean, and it appears to be giving the expected results.
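The "sorting range" idea can be sketched in Python as a binary search over the section headers: a word belongs to section H if H <= word < next header. Python's standard library has no direct equivalent of `compare:options:range:locale:`, so `toy_key` below is a hypothetical, deliberately crude stand-in for a locale-aware collation key (it case-folds, expands æ and œ, and strips accents). It does not model Danish ordering or Korean at all.

```python
import bisect
import unicodedata
from typing import Optional

SECTIONS = [chr(c) for c in range(ord("A"), ord("Z") + 1)]


def toy_key(s: str) -> str:
    """Hypothetical, very rough collation key; NOT locale-aware."""
    s = s.casefold().replace("æ", "ae").replace("œ", "oe")
    decomposed = unicodedata.normalize("NFKD", s)
    # Drop combining marks, so "é" (e + U+0301) sorts like "e".
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


SECTION_KEYS = [toy_key(h) for h in SECTIONS]


def section_for(word: str) -> Optional[str]:
    """Return the last header that sorts at or before the word, or None if none does."""
    i = bisect.bisect_right(SECTION_KEYS, toy_key(word)) - 1
    return SECTIONS[i] if i >= 0 else None


for w in ["apple", "Banana", "æquo", "Édith", "42"]:
    print(w, section_for(w))
```

The design point is the same as in the answer: the code never asks which letter a word starts with, only where the word falls relative to the section boundaries under some comparison function, so swapping in a real locale-aware comparison is what makes it correct for a given language.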

How can I recognize whether a word has no meaning, e.g. because it contains impossible syllables?

Initially, I have m arrays of n characters, where each array contains an unknown (to me) character of the needed word (condition: the word has a meaning).
For example, m = 4, n = 3: array0 = {'t', 'e', 'c'}, array1 = {'g', 'o', 'a'}, array2 = {'w', 'd', 'y'}, array3 = {'e', 'o', 's'}. Each array contains exactly one correct letter: array0 holds the first letter, array1 the second, and so on. So the probable secret word is 'code': array0[2] = 'c', array1[1] = 'o', array2[1] = 'd', array3[0] = 'e'.
I need to find all existing letter combinations, i.e. exclude the generated meaningless words.
Are there any rules/regularities for 'impossible' syllables/letter combinations in English?
I'm attacking a Vigenere cipher, so I know the length of the key and its probable characters. I'm shuffling my arrays and getting many meaningless words; the problem is filtering them out. As I understand it, some conditions can help to recognize incorrect words. For example, if a word is longer than 4 letters, then an all-vowel or all-consonant word is wrong. Some letter pairs, such as kk, hh, ww, are in general impossible too. Where can I find such rules?
I'm supposing that by "word has meaning" you mean that it is an English dictionary word.
I believe that you should approach the problem from the other direction, as GregS suggests, and go through a dictionary. English has many exceptions when it comes to letters and spelling, and the number of words that look English is much greater than the actual number of English words. You won't be able to cut down your search very much that way.
But because you know the length and probable characters you are able to quickly throw out many dictionary words. Also, if the message isn't too short, it would also be very fast to attempt a decoding of the message with possible words, and throw out unlikely decodings by letter, digram or trigram frequencies.
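Working from the dictionary side might look like this Python sketch. The candidate sets are the ones from the question; `WORDS` is a hypothetical mini word list standing in for a real dictionary file, and `matching_words` is an illustrative name.

```python
# Candidate letters per position, taken from the question's example (m = 4, n = 3).
CANDIDATES = [{"t", "e", "c"}, {"g", "o", "a"}, {"w", "d", "y"}, {"e", "o", "s"}]


def matching_words(dictionary, candidates):
    """Keep words of the right length whose i-th letter is a candidate for position i."""
    n = len(candidates)
    return [
        w for w in dictionary
        if len(w) == n and all(ch in options for ch, options in zip(w, candidates))
    ]


# Hypothetical mini dictionary; in practice, load a full English word list.
WORDS = ["code", "cows", "cake", "toys", "door", "coded"]
print(matching_words(WORDS, CANDIDATES))  # ['code', 'cows', 'toys']
```

Even with these four small candidate sets, more than one real word survives, so a later step such as trial decoding scored by letter or n-gram frequencies is still needed to choose among them.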
I'm not sure I follow your strategy for attacking a Vigenere cipher. However, in response to:
I need to find all of existing letter-combinations, i.e. exclude generated meaningless words. Are there any rules/regularities of 'impossible' syllables/letter-combinations in English?
Yes, indeed there is a plethora of such rules. There are two ways of learning and implementing them:
1. Carefully study the morphology of English, and meticulously implement the rules.
2. Train a Markov model on a corpus of English text.
Option 1 will be substantially more work for little additional benefit.
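Option 2 can start very small. The Python sketch below trains a letter-bigram Markov model with add-one smoothing and scores candidates by average log-probability per bigram. The function names and the tiny corpus are hypothetical; a useful model needs a large English corpus, and the cutoff for rejecting a candidate would have to be tuned on real data.

```python
import math
from collections import Counter


def train_bigrams(words):
    """Count letter bigrams; '^' and '$' mark the start and end of each word."""
    pair_counts, context_counts = Counter(), Counter()
    for w in words:
        padded = "^" + w.lower() + "$"
        for a, b in zip(padded, padded[1:]):
            pair_counts[a, b] += 1
            context_counts[a] += 1
    return pair_counts, context_counts


def avg_log_prob(word, pair_counts, context_counts, outcomes=27):
    """Mean log P(next | current) per bigram, add-one smoothed.

    outcomes=27 assumes the next symbol is one of 26 letters or the end marker '$'.
    """
    padded = "^" + word.lower() + "$"
    pairs = list(zip(padded, padded[1:]))
    total = sum(
        math.log((pair_counts[a, b] + 1) / (context_counts[a] + outcomes))
        for a, b in pairs
    )
    return total / len(pairs)


# Hypothetical toy corpus; a real model needs much more English text.
CORPUS = "the quick brown fox jumps over the lazy dog while the code runs to the door".split()
bigrams, contexts = train_bigrams(CORPUS)
for candidate in ["code", "kkhw"]:
    print(candidate, round(avg_log_prob(candidate, bigrams, contexts), 2))
```

Higher (less negative) scores mean more English-like letter sequences; "code" scores well above "kkhw" even with this toy corpus.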

Get words corresponding to a match from SpanNearQuery in Lucene

I would need to retrieve the words in my text that correspond to a match of Spans returned by SpanNearQuery.getSpans(). For instance, if my text is [a b c d e f] and I use SpanNearQueries with queries 'b' and 'e' (and sufficient slop), then I get a match 'b c d e' in my text. Now, how can I most efficiently retrieve the words as they appear in the match, that is, the sequence of words 'b c d e' itself?
Here is an example code of what I would need:
SpanNearQuery allNear = new SpanNearQuery(spansTermQueries, numWordsInBetween, true);
Spans allSpans = allNear.getSpans(reader);
Now I would like to iterate over all the matches in allSpans and, for each match, retrieve the exact words between the query terms in the text that correspond to that match.
One indirect way is to get the start and end positions of that match, read through the text document using a file reader, and find the text between positions 'start' and 'end'. But that does not seem very efficient. It seems that this information should already be stored in the Lucene index.
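The indirect approach boils down to mapping span positions back to tokens. A Python sketch, assuming (as for Lucene's span queries) that positions count tokens rather than characters, with the start inclusive and the end exclusive, and assuming a simple word tokenizer that behaves like the analyzer used at index time (`words_for_span` is a hypothetical helper):

```python
import re


def words_for_span(text, start, end):
    """Return the tokens at positions start (inclusive) to end (exclusive).

    Assumes one position per token and a lowercase \\w+ tokenizer; the result is
    only correct if this matches how the index analyzer tokenized the text.
    """
    tokens = re.findall(r"\w+", text.lower())
    return tokens[start:end]


print(words_for_span("a b c d e f", 1, 5))  # ['b', 'c', 'd', 'e']
```

The fragile part is that assumption about tokenization: if the analyzer drops stop words or splits differently, the positions no longer line up, which is one reason to let Lucene recover the matched text itself.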
Would anyone know of a more direct way of retrieving the words between the queries in a match?
Thanks.
What you want to do is highlighting. You can use either the plain highlighter or, if you store term vectors, the fast vector highlighter.