String - Matching Automaton - automation

I want to find the occurrences of s in d using a finite automaton, where s = "infinite" and d = "ininfinintefinfinite ". The first step is to construct a state diagram for s, but I am having trouble identifying where the pattern occurs. I am quite confused about this. If someone could explain this topic a little, it would be really helpful.

You should be able to use regular expressions to accomplish your goal. Every programming language I've seen supports regular expressions, which should make your task much easier. You can test your regexes here: https://regexr.com/ and find a reference sheet there as well. For your example, the regex /(infinite)/ should do the trick.

Related

Clean unstructured place name to a structured format

I have around 300k unstructured records, as in the screenshot below. I'm trying to use Google Refine or OpenRefine to clean them up, but I'm unable to find a proper way to do this. I'm new to this tool, and anyone's help would be greatly appreciated. Also, the tool is quite slow at processing 300k records: whenever I try something, it takes a long time to process and produce output.
Alternatively, please suggest any other open-source tools and techniques to do this.
As Owen said in the comments, your question is probably too broad to receive an acceptable answer. We can only provide you with a general procedure to follow.
In Open Refine, you'll need to create a column based on the messy column and apply transformations to delete unwanted characters. You'll have to use regular expressions, but for that it's necessary to be able to identify patterns. It's not clear to me why the "ST" of "Nat.secu ST." is important but not the "US" in "Massy Intertech US", nor even the "36" in "Plowk 36" (Google doesn't know this word, so I'm not sure it is an organisation name).
On the basis of your fifteen lines, however, some clear patterns can be distinguished. For example, it looks like you'll have to remove the tokens (character sequences without spaces) at the end of the string that contain a #. For that, the GREL formula in Open Refine could look like this:
value.trim().replace(/\b\w+#\w+\b$/,'')
Here is a screencast if it's not clear to you.
But sometimes a company name may contain a #, in which case you will need to create more complex rules. For example, remove the token only if the string contains more than two words.
if(value.split(' ').length() > 2, value.replace(/\b\w+#\w+\b$/, ''), value)
And so on for the other patterns that you'll find (for example, any number sequence at the end that contains more than 4 numbers and one - between them)
Feel free to check out the Open Refine documentation in case of doubt.
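If Open Refine turns out to be too slow for 300k rows, the same rule can be tried outside the tool. Below is a rough Python sketch of the second GREL formula above; the function name `strip_trailing_hash_token` and the sample strings are hypothetical, and the extra `strip()` on the result (to drop the space left behind) is my addition, not part of the GREL version.

```python
import re

# Same pattern as in the GREL formulas: a token containing '#' at the very end.
TRAILING_HASH_TOKEN = re.compile(r"\b\w+#\w+\b$")


def strip_trailing_hash_token(value: str) -> str:
    """Remove a trailing 'xxx#yyy' token, but only if more than two words are present."""
    value = value.strip()
    if len(value.split(" ")) > 2:
        # Assumption: trailing whitespace left by the removal is not wanted.
        return TRAILING_HASH_TOKEN.sub("", value).strip()
    return value


cleaned = strip_trailing_hash_token("Massy Intertech US AB#12")
```

Each further pattern you identify can become another small function like this one, applied in sequence to the column.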

Is it, and if so why, wrong that these two regular grammars are different?

I'm tasked with writing a regular grammar based on a regular expression.
Given that the regular expression a*b can be written as S -> b | aS,
is it incorrect to write ba* as the regular grammar S -> b | Sa?
I'm told the correct answer is in fact S -> bA, A -> ^ | aA (where ^ is the empty string), but I don't see the difference myself.
An explanation would be greatly appreciated!
IIRC, both your answer and the one being called "correct" are correct. See this. What you have constructed is a "left regular grammar", while the proponent(s) of the "correct" answer obviously prefer a "right regular grammar". There are other arbitrary rules that may be held more or less pedantically, like the "no empty productions" rule, but they don't really affect the class of regular languages, just the compactness of the grammar you use for a particular language, as your example highlights - a single production with two alternatives vs. two productions, one with a single clause, and one with two alternatives, one of which is empty.
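To check the equivalence on small cases, here is a quick Python sketch that enumerates every string of bounded length derivable from each grammar and compares the two sets. The helper `generate` and the dictionary encoding of the rules are my own illustration (uppercase letters are nonterminals, the empty string stands for the empty production); this is a sanity check, not a proof.

```python
import re


def generate(rules, start, max_len):
    """Enumerate all terminal strings of length <= max_len derivable from start."""
    results = set()
    seen = {start}
    frontier = [start]
    while frontier:
        form = frontier.pop()
        # Position of the first nonterminal (uppercase letter), if any.
        idx = next((i for i, c in enumerate(form) if c.isupper()), None)
        if idx is None:
            results.add(form)
            continue
        for rhs in rules[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(1 for c in new_form if not c.isupper())
            # Prune forms that already contain too many terminals.
            if terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                frontier.append(new_form)
    return results


left_regular = {"S": ["b", "Sa"]}               # S -> b | Sa
right_regular = {"S": ["bA"], "A": ["", "aA"]}  # S -> bA, A -> ^ | aA

left_strings = generate(left_regular, "S", 5)
right_strings = generate(right_regular, "S", 5)
same_language = left_strings == right_strings
all_match_regex = all(re.fullmatch(r"ba*", s) for s in left_strings)
```

Up to length 5, both grammars produce exactly b, ba, baa, baaa and baaaa, which is what ba* describes.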

Am I correct? (Finite Automata)

I was given a regular expression, and I am supposed to convert it to an NFA and then a DFA. Here's the regular expression:
a ( b | c )* a | a a c* b
Then I converted this to an NFA using Thompson's algorithm:
and here's the DFA:
Can someone please take a quick look and let me know whether I am right or wrong?
Since this is very likely homework, I'm hesitant to just give you the complete correct solution.
Your NFA appears correct, but it has a lot of superfluous states; they do not adversely affect its correctness. (At first glance it looks like you could remove 11 states.)
Your DFA is incorrect, though. This is because when you branch off to begin handling one condition of the string or the other, you later rejoin them together. This allows it to take the path from an accepted string matching a(b|c)*a and take in another b or c by travelling to nodes 15,17 or 11. It then accepts this string even though it doesn't match your expression.
What you need to do is basically stop this from happening. If you have additional questions feel free to ask.
I highly recommend making a list of test strings that you know should and shouldn't be accepted, and then tracing them through, making sure your automaton ends in the correct (accept or reject) state.
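One way to build such a list is to let a regex engine act as the reference while you trace the same strings through your DFA by hand. The sketch below uses Python's `re` module; the test strings are ones I picked for illustration, and `accepts` is a made-up helper name.

```python
import re

# The regular expression from the question, without the spaces.
PATTERN = re.compile(r"a(b|c)*a|aac*b")


def accepts(s: str) -> bool:
    """True if the whole string matches the expression."""
    return PATTERN.fullmatch(s) is not None


should_accept = ["aa", "aba", "aca", "abcba", "aab", "aacb", "aaccb"]
should_reject = ["", "a", "ab", "aac", "abab", "aabb", "aaab", "aaa"]

accept_results = [accepts(s) for s in should_accept]
reject_results = [accepts(s) for s in should_reject]
```

Strings such as "abab" and "aabb" are the useful ones here: each is an accepted string followed by one extra b, which is the kind of input a DFA with wrongly rejoined branches would accept.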

common meanings of punctuation characters

I'm writing my own syntax and want characters that do not have obvious common meanings in that syntax [1]. Is there a list of the common meanings of punctuation characters (e.g. '?' could be part of a ternary operator, or part of a regex) so I can try to pick those which may not have 'obvious' syntax (I can be the judge of that :-).
[1] It's actually an extended Fortran FORMAT, but the details are irrelevant here
Here is an exhaustive survey of syntax across languages.
I am loath to be so defeatist, but it does sound as though this doesn't exist (a list of all the symbols and operators across languages). A quick look around should give a good idea of what is commonplace.
Assuming that you will restrict yourself to ASCII, the short-list is more or less what you can see on your keyboard, and I can think of a few uses for most of them. So maybe avoiding conflicts is a bit ambitious. Of course it depends on who is to be the user of this syntax; if, for example, symbols that are relatively unused in Fortran would be suitable, then that is more realistic.
This link: Fortran 95 Spec gives a list of Fortran operators, which it might help to avoid.
I'm sorry if any of this is a statement of the obvious or missing the point, or just not very helpful :)
I would say [a-z][A-Z] all do not have an obvious syntax, for instance if you used upper case T as an operator.
x T v
The downfall is people like to use letters for variables.
Other than that, you might want to investigate multicharacter operators. The downfall of these, however, is that they quickly grow wearisome to type, for things like
scalar = vec4i *+ vec4j
if you perhaps had a fused multiply-add operator. Well, that one isn't so bad, but I'm sure you can find more cumbersome ones.

Why should I capitalize my SQL keywords? Is there a good reason? [duplicate]

I personally find a string of lowercase characters to be more readable than a string of uppercase characters. Is some old/popular flavor of SQL case-sensitive or something?
For reference:
select
    this.Column1,
    case when this.Column2 is null then 0 else this.Column2 end
from dbo.SomeTable this
inner join dbo.AnotherTable another on this.id = another.id
where
    this.Price > 100
vs.
SELECT
    this.Column1,
    CASE WHEN this.Column2 IS NULL THEN 0 ELSE this.Column2 END
FROM dbo.SomeTable this
INNER JOIN dbo.AnotherTable another ON this.id = another.id
WHERE
    this.Price > 100
The former just seems so much more readable to me, but I see the latter way more often.
I agree with you - to me, uppercase is just SHOUTING.
I let my IDE handle making keywords stand out, via syntax highlighting.
I don't know of a historical reason for it, but by now it's just a subjective preference.
Edit to further make clear my reasoning:
Would you uppercase your keywords in any other modern language? Made up example:
USING (EditForm form = NEW EditForm()) {
    IF (form.ShowDialog() == DialogResult.OK) {
        IF ( form.EditedThing == null ) {
            THROW NEW Exception("No thing!");
        }
        RETURN form.EditedThing;
    } ELSE {
        RETURN null;
    }
}
Ugh!
Anyway, it's pretty clear from the votes which style is more popular, but I think we all agree that it's just a personal preference.
I think the latter is more readable. You can easily separate the keywords from table and column names, etc.
One thing I'll add to this which I haven't seen anyone bring up yet:
If you're using ad hoc SQL from within a programming language you'll have a lot of SQL inside strings. For example:
insertStatement = "INSERT INTO Customers (FirstName, LastName) VALUES ('Jane','Smith')"
In this case syntax coloring probably won't work so the uppercasing could be helping readability.
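As a small illustration of SQL living inside host-language strings, here is a Python sketch using the standard `sqlite3` module with an in-memory database. The table mirrors the `Customers` example above; using `?` placeholders instead of inlined literals is my choice, and the point that both casings behave the same is shown for SQLite only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (FirstName TEXT, LastName TEXT)")

# SQL embedded in a plain string: the editor sees a string literal, not SQL.
insert_statement = "INSERT INTO Customers (FirstName, LastName) VALUES (?, ?)"
conn.execute(insert_statement, ("Jane", "Smith"))

# The same query with upper-case and lower-case keywords.
upper_rows = conn.execute(
    "SELECT FirstName, LastName FROM Customers WHERE LastName = ?", ("Smith",)
).fetchall()
lower_rows = conn.execute(
    "select FirstName, LastName from Customers where LastName = ?", ("Smith",)
).fetchall()

conn.close()
```

Both queries return the same rows, so the casing of keywords here is purely about how the string reads to a person.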
From Joe Celko's "SQL Programming Style" (ISBN 978-0120887972):
Rule:
Uppercase the Reserved Words.
Rationale:
Uppercase words are seen as a unit, rather than being read as a series of syllables or letters. The eye is drawn to them, and they act to announce a statement or clause. That is why headlines and warning signs work. Typographers use the term bouma for the shape of a word. The term appears in Paul Saenger's book (1975). Imagine each letter on a rectangular card that just fits it, so that you see the ascenders, descenders, and baseline letters as various "Lego blocks" that are snapped together to make a word. The bouma of an uppercase word is always a simple, dense rectangle, and it is easy to pick out of a field of lowercase words.
What I find compelling is that this is the only book about SQL heuristics, written by a well-known author of SQL works. So is this the absolute truth? Who knows. It sounds reasonable enough and I can at least point out the rule to a team member and tell them to follow it (and if they want to blame anyone I give them Celko's email address :)
Code has punctuation which SQL statements lack. There are dots and parentheses and semicolons to help you keep things separate. Code also has lines. Despite the fact that you can write a SQL statement on multiple physical lines, it is a single statement, a single "line of code."
IF I were to write English text without any of the normal punctuation IT might be easier if I uppercased the start of new clauses THAT way it'd be easier to tell where one ended and the next began OTHERWISE a block of text this long would probably be very difficult to read NOT that I'd suggest it's easy to read now BUT at least you can follow it I think
Mostly it's tradition. We like to keep keywords and our namespace names separate for readability, and since in many DBMSes table and column names are case sensitive, we can't upper case them, so we upper case the keywords.
I prefer lower case keywords. SQL Server Management Studio color codes the keywords, so there is no problem distinguishing them from the identifiers.
And upper case keywords feel so... well... BASIC... ;)
-"BASIC, COBOL and FORTRAN called from the eighties, and they wanted their UPPERCASE KEYWORDS back." ;)
I like to use upper case for SQL keywords. I think my mind skips over them as they are really blocky and concentrates on what's important. The blocky words split up the important bits when you lay it out like this:
SELECT
    s.name,
    m.eyes,
    m.foo
FROM
    muppets m,
    muppet_shows ms,
    shows s
WHERE
    m.name = 'Gonzo' AND
    m.muppetId = ms.muppetId AND
    ms.showId = s.showId
(The lack of ANSI joins is an issue for another question.)
There is a psychology study that shows lowercase was quicker to read than uppercase due to the outlines of the words being more distinctive. However, this effect can disappear with lots of practice reading uppercase.
What's worse is that the majority of developers at my office believe in capitals for SQL keywords, so I have had to change to uppercase. Majority rules.
I believe lowercase is easier to read, given that SQL keywords are highlighted in blue anyway.
In the glory days, keywords were in capitals because we were developing on green screens!
The question is: if we don't write C# keywords in uppercase then why do I have to write SQL keywords in uppercase?
Like someone else has said - capitals are SHOUTING!
Back in the 1980s, I used to capitalize database names, and leave SQL keywords in lower case. Most writers did the opposite, capitalizing the SQL keywords. Eventually, I started going along with the crowd.
Just in passing, I'll mention that, in most published code snippets in C, C++, or Java the language keywords are always in lower case, and upper case keywords may not even be recognized as such by some parsers. I don't see a good reason for using the opposite convention in SQL that you use in the programming language, even when the SQL is embedded in source code.
And I'm not defending the use of all caps for database names. It actually looks a little like "shouting". And there are better conventions, like using a few upper case letters in database names. (By "database names" I mean the names of schemas, schema objects like tables, and maybe a few other things.) Just because I did it in the 80s doesn't mean I have to defend it today.
Finally, "De gustibus non disputandum est".
It's just a matter of readability and helps you quickly distinguish SQL keywords.
Btw, that question was already answered:
Is SQL syntax case sensitive?
I prefer using upper case as well for keywords in SQL.
Yes, lower case is more readable, but to me, having to take an extra second to scan through the query will do you good most of the time. Once it's done and tested you should rarely ever see it again anyway (a DAL, stored procedure or whatever will hide it from you).
If you are reading it for the first time, capitalized WHERE and JOIN will jump right out at you, as they should.
It’s just a question of readability. Using UPPERCASE for the SQL keywords helps make the script more understandable.
I capitalize SQL to make it more "contrasty" to the host language (mostly C# these days).
It's just a matter of preference and/or tradition really...
Apropos of nothing perhaps, but I prefer typesetting SQL keywords in small caps. That way they look capitalized to most readers, but they aren't the same as the ugly ALL CAPS style.
A further advantage is that I can leave the code as is and print it in the traditional style. (I use the listings package in LaTeX for pretty-printing code.)
Some SQL developers here like to lay it out like this:
SELECT s.name, m.eyes, m.foo
FROM muppets m, muppet_shows ms, shows s
WHERE m.name = 'Gonzo' AND m.muppetId = ms.muppetId AND ms.showId = s.showId
They claim this is easier to read, unlike your one-field-per-line approach, which I use myself.