Implementing jump statements in an AST-walking interpreter

In an AST-walking interpreter, the code is executed node by node. How can I implement features like goto, break or continue? Do I stop the current execution and jump to another node? Are there any best practices?

Best practice is "don't interpret ASTs for languages with gotos".
Fundamentally, any kind of discontinuity in the tree walk causes a serious slowdown if the language mostly processes scalars. (If your language mostly processes complex values, like the array language APL, it won't matter much.)
The best you can hope for is to pre-walk the tree and determine where the gotos actually go in the AST, and record that in an associative cache off to the side. Then when you encounter a goto, simply consult the cache rather than searching the tree.
But this is the first step down the road toward compiling, i.e., precomputing what you can before you execute.
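To make the cache idea concrete, here is a minimal Python sketch (the node shapes, field names and GotoSignal class are all hypothetical, not from any particular interpreter): pre-walk the tree once to map labels to nodes, then let the goto handler consult that map and unwind the walk with an exception.

# Hypothetical node shapes: every node has a .kind and a .children list,
# and label nodes additionally have a .name.
def build_label_cache(node, cache=None):
    """Pre-walk the AST once, mapping each label name to its node."""
    cache = {} if cache is None else cache
    if node.kind == "label":
        cache[node.name] = node
    for child in node.children:
        build_label_cache(child, cache)
    return cache

class GotoSignal(Exception):
    """Unwinds the tree walk until a block that contains the target can resume there."""
    def __init__(self, target):
        self.target = target

def exec_node(node, env, labels):
    if node.kind == "goto":
        raise GotoSignal(labels[node.name])       # O(1) lookup instead of searching the tree
    elif node.kind == "label":
        for child in node.children:
            exec_node(child, env, labels)
    elif node.kind == "block":
        i = 0
        while i < len(node.children):
            try:
                exec_node(node.children[i], env, labels)
                i += 1
            except GotoSignal as jump:
                if jump.target in node.children:   # target lives in this block: resume there
                    i = node.children.index(jump.target)
                else:
                    raise                          # let an enclosing block handle it
    # ... assignment, expression, loop nodes, etc. elided ...

Break and continue can be handled the same way with even simpler signal exceptions, since their targets are just the enclosing loop node rather than an arbitrary label.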

Related

Compiler code optimization: AST vs. IR

Here I define IR as a three-address-code style representation (I realize that the term can also refer to an AST-level representation).
It is my understanding that, when writing a best-practice compiler for an imperative language, code optimization happens both on the AST (probably best using a Visitor Pattern), and on the IR produced from the AST. 
(a) Is that correct?
(b) Which type of optimization steps are best handled on the AST before even producing an IR? (reference to an article/a list online welcome too as long as it deals with an imperative language) 
The compiler I'm working on is for Decaf (which some might know) which has a fairly deep CFG up to (single) class inheritance; I'll add features not part of it such as type coercion. It will be completely hand-coded (using no tools whatsoever). This is not homework; writing it for fun. 
(a) Yes.
(b) Constant folding is one example; common subexpression elimination (CSE) is another; in fact, almost anything to do with expression evaluation. IR-phase optimizations are more about what results from flow analysis.
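As an illustration of the first of those, here is a minimal sketch of constant folding as a bottom-up AST rewrite (the Num/BinOp classes are invented for the example):

import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

class Num:
    def __init__(self, value):
        self.value = value

class BinOp:
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

def fold(node):
    """Bottom-up constant folding: fold the children first, then this node."""
    if isinstance(node, BinOp):
        node.left, node.right = fold(node.left), fold(node.right)
        if isinstance(node.left, Num) and isinstance(node.right, Num):
            return Num(OPS[node.op](node.left.value, node.right.value))
    return node

# fold(BinOp("+", BinOp("*", Num(2), Num(3)), Num(4))) collapses to Num(10)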
An IR is a form of AST (often it is "flattened", but there are deep tree IRs as well), so it may not be easy to distinguish one from the other, especially if the compiler is implemented as a sequence of very small rewrites from the original AST all the way down to a final IR suitable for instruction selection.
Optimisations may happen anywhere along this chain, but some representations are more suitable for a wide range of optimisations; most notably SSA form, which most modern compilers use for nearly all of their optimisations.
It's never too early to optimise (to coin a phrase). So there are optimisations performed before and during AST creation, on the AST itself, on the IR (if you have one) and on the code as it is generated. In C-like languages and those that compile to machine code, the effort goes into the later stages. In compilers targeting a VM I think there is less room for improvements at that stage.
Some early optimisations obviously work better than others. I don't know much about Decaf, but there are the obvious things like constant folding and constant expression evaluation. If you get the whole program in tree form before you have to generate any code you can find common subexpressions, do code migration, eliminate dead code/dead stores, hoist invariants, eliminate tail recursion and some kinds of strength reduction.
A lot of it depends on how hard you want to work and what your target is. You didn't say much about that.

How to get better at optimization?

Apologies in advance if the question seems somewhat broad or strange; I don't mean to offend anyone, but maybe someone can actually make a recommendation. I tried looking for similar questions, but could not find any.
What are the best resources (books, blogs, etc.) that can teach about optimizing code?
There are quite a few resources on making code more human-readable (Code Complete probably being the number one choice). But what about making it run faster, or more memory-efficient?
Of course there are lots of books on each particular language, but I wonder if there are some that cover the problems of memory / speed of operations and are somewhat language-independent?
Here are some links that might be helpful in general on the subject of memory optimizations
What Every Programmer Should Know About Memory by Ulrich Drepper
Herb Sutter: The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software
Slides: Herb Sutter: Machine Architecture (Things Your Programming Language Never Told You)
Video: Herb Sutter @ NWCPP: Machine Architecture: Things Your Programming Language Never Told You
The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers, by Agner Fog
Read Structured Programming with go to Statements. While it's the source of the "premature optimisation is the root of all evil" quote that comes up the moment somebody wants to make anything faster or smaller - no matter how desperately important or late in the process they are - it's actually about the importance of making things efficient when you can.
Learn about time complexity, space complexity and the analysis of algorithms.
Come up with examples where you would accept worse space complexity in exchange for better time complexity, and vice versa.
Know the time and space complexities of the algorithms and data structures your languages and frameworks of choice offer, especially those you use most often.
Read the answers on this site on questions about creating a good hash code.
Study the approach HTTP took to having the advantage of caching, without the disadvantage of using stale data inappropriately. Consider how easy or difficult that is to apply to in-memory caches. Consider when you would say "screw it, I can live with being stale for the speed boost it gives me". Consider when you would say "screw it, I can live with being slow for the guarantee of freshness it gives me".
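As a toy illustration of that trade-off (class and parameter names invented), a tiny in-memory cache with a maximum age is only a few lines; choosing max_age is exactly the stale-versus-slow decision:

import time

class TtlCache:
    """Serve cached entries until they are older than max_age seconds, then refetch."""
    def __init__(self, max_age_seconds):
        self.max_age = max_age_seconds
        self.entries = {}                      # key -> (value, time stored)

    def get(self, key, fetch):
        value, stored_at = self.entries.get(key, (None, float("-inf")))
        if time.monotonic() - stored_at < self.max_age:
            return value                       # possibly stale, but fast
        value = fetch(key)                     # slow, but guaranteed fresh
        self.entries[key] = (value, time.monotonic())
        return value

# cache = TtlCache(max_age_seconds=30)
# user = cache.get("user:42", fetch=load_user)   # load_user is a hypothetical slow lookup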
Learn how to multithread. Learn when it improves performance. Learn why it often doesn't or even makes things worse.
Look at a lot of Joe Duffy's blog where performance is a regular concern of his writing.
Learn how to process items as streams or iterations rather than building and rebuilding data-structures full of each item, each time. Learn when you're actually better off not doing that.
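In Python, for instance, generators make the streaming version about as short as the build-a-list version (the file name and handle() are made up for the example):

# Builds (and holds) the whole result in memory before anything downstream can start:
def long_lines_list(path):
    with open(path) as f:
        return [line for line in f if len(line) > 80]

# Streams items one at a time; nothing is materialised unless a consumer asks for it:
def long_lines_stream(path):
    with open(path) as f:
        for line in f:
            if len(line) > 80:
                yield line

# for line in long_lines_stream("access.log"):   # "access.log" is just an example name
#     handle(line)                               # handle() is hypothetical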
Know what things cost. You can't reasonably decide "I'll work so this is in the CPU cache rather than main-memory/main-memory rather than disk/disk rather than over a network" unless you've a good idea what actually causes each to be hit, and what the cost differences are. Worse, you can't dismiss something as premature optimisation if you don't know what they cost - not bothering to optimise something is often the best choice, but if you don't even consider it in passing you aren't "avoiding premature optimisation", you're muddling through and hoping it works.
Learn a bit about what optimisations are done for you by the script engine/jitter/compiler/etc you use. Learn how to work with them rather than against them. Learn not to re-do work it'll do for you anyway. In one or two cases, you may also be able to apply the same general principle to your work.
Search for cases on this site where something is dismissed as an implementation detail - yes, all of those are cases where the detail in question isn't the most important thing at the time, but all of those implementation details were chosen for a reason. Learn what they were. Learn the counter-arguments.
Edit (I'll keep adding a few more to this as I go):
Different books of course differ in the emphasis they put on efficiency concerns, but I remember Stroustrup's The C++ Programming Language as one where, a good few times, he explains a choice between different options in terms of efficiency, and also how to keep decisions made for efficiency's sake from hurting the usability of the classes "from the outside".
Which brings me to another point: concentrate on the efficiency of the library code you reuse in different projects. You don't ever want to be thinking "maybe I should hand-roll a new one here to be more efficient" unless it's a very specialised case; you want to be confident that a lot of work went into making that heavily used class efficient over a lot of cases, and then concentrate on identifying hot-spots.
As for specialised cases, some of the more obscure data structures are worth knowing for the cases they serve. For example, a DAWG is a very compact structure for storing strings with a lot of common prefixes and suffixes (which would be most words in most natural languages) where you just want to find those in the list that match a pattern. If you need a "payload" then a tree where each letter has a list of nodes for each subsequent letter (a generalisation of a DAWG but ending in that "payload" rather than the terminal node) has some but not all of the advantages. They also find the result in O(n) time where n is the length of the string sought.
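A minimal sketch of that letter-tree ("payload" trie) idea, just to make the shape concrete (this is not a DAWG, which additionally shares common suffixes):

class TrieNode:
    def __init__(self):
        self.children = {}        # letter -> TrieNode
        self.payload = None       # only meaningful on terminal nodes
        self.is_terminal = False

def insert(root, word, payload):
    node = root
    for letter in word:
        node = node.children.setdefault(letter, TrieNode())
    node.is_terminal = True
    node.payload = payload

def lookup(root, word):
    """O(n) in the length of the word, regardless of how many words are stored."""
    node = root
    for letter in word:
        node = node.children.get(letter)
        if node is None:
            return None
    return node.payload if node.is_terminal else None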
How often will that come up? Not often. It came up for me once (a few times really, but they were variants of the same case), and as such it would not have been worth learning everything there was to know about DAWGs before then. But I knew enough to know it was what I needed to research, and it saved me gigabytes (really: from way too much for a machine with 16GB RAM to cope with, down to less than 1.5GB). Going straight for a hand-rolled DAWG rather than just putting the strings in a hashset would totally have been premature optimisation, but flicking through the NIST data structure site meant I knew what to reach for when the need did come up.
Consider: "Finding a string in a DAWG is O(n)" "Finding a string in a Hashset is O(1)" Both of these statements is true, but the speed of the two tends to be comparable. Why? Because the DAWG is O(n) in terms of the length of the string, and effectively O(1) in terms of the size of the DAWG. The Hashset is O(1) in terms of the size of the hashset, but working out the hash is typically O(n) in terms of the length of the string, and equality checks are also O(n) in terms of that length. Both statements were correct, but they were thinking about a different n! You always need to know what n means in any discussion of time and space complexity - most often it'll be the size of the structure, but not always.
Don't forget constant effects: O(n²) is the same as O(1) for sufficiently low values of n! Remember that something described as O(n²) really behaves like k·n² + k₁·n + k₂, with the assumption that k₁ and k₂ are low enough, and that k and the corresponding constant of whatever algorithm or structure we are comparing against are close enough, that they don't really matter and it's only the n² term we care about. That isn't true all the time, and we can sometimes find that k, k₁ or k₂ are high enough to get us into trouble. It's also not true when n is going to be so small that the difference in the constant costs of different approaches matters. Of course, when n is small we normally don't have a big efficiency concern - but what if we are doing m operations on structures averaging n in size, and m is large? If we are choosing between an O(1) and an O(n²) approach per operation, we are choosing between an O(m) and an O(n²m) approach overall. That still seems like a no-brainer in favour of the former, but with a low n it essentially becomes a choice between two different O(m) approaches, and the constant factors are much more important.
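A back-of-the-envelope way to see that, with constants invented purely for illustration: model one approach as k·n² per operation and the other as a flat cost, and look for the crossover.

K_QUAD = 0.001     # made-up per-step cost of the "O(n^2)" approach
K_FLAT = 0.5       # made-up fixed cost of the "O(1)" approach

def cost_quadratic(n):
    return K_QUAD * n * n

def cost_flat(n):
    return K_FLAT

for n in (5, 10, 22, 23, 100):
    winner = "quadratic" if cost_quadratic(n) < cost_flat(n) else "flat"
    print(n, cost_quadratic(n), cost_flat(n), winner)

# The crossover is where K_QUAD * n^2 = K_FLAT, i.e. n = sqrt(K_FLAT / K_QUAD), about 22 here;
# below that, the "worse" complexity class is the faster choice for these made-up constants.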
Learn about lock-free multi-threading. Or perhaps don't. Personally, I've two pieces of my own code I use professionally that use all but the simplest lock-free techniques. One is based on well-known approaches and I wouldn't bother now (it's .NET code first written for .NET2.0 and the .NET4.0 library supplies a class that does the same thing). The other I first wrote for fun, and only used after that just-for-fun period had given me something reliable (and it still gets beaten by something in the 4.0 library for a lot of cases, but not for some others that I care about). I would hate to have to write something like it with a deadline and a client in mind.
All that said, if you're coding out of interest, the challenges involved are interesting and it's an enjoyable thing to work on when you have the freedom to give up on a failed plan - a freedom you don't get when you're doing something for a paying client - and you'll certainly learn a lot about efficiency concerns generally. (Take a look at https://github.com/hackcraft/Ariadne if you want to see some of what I've done with this.)
A Case Study
Actually, that contains a relatively good example of some of the above principles. Take a look at the method that's currently at line 511 of https://github.com/hackcraft/Ariadne/blob/master/Collections/ThreadSafeDictionary.cs (where I joke in the comments about it being flame-bait for people quoting Dijkstra). Let's use it as a case study:
This method was first written to use recursion, because it's a naturally recursive problem - after doing the operation on the current table, if there's a "next" table we want to do the exact same operation on that, and so on until there's no further table.
Recursion is almost always slower than iteration, for a few different reasons. Should we make all recursive calls iterative? No, it's often not worth it, and recursion is a wonderful way to write code that is clear about what it's doing. Here, though, I apply the principle above: since this is library code that might be called where performance is crucial, particular effort should be expended on it.
The decision to try to improve its speed being made, the next thing I did was make measurements. I don't depend on "I know that iteration is faster than recursion, so it must be faster when changed to avoid recursion". That's just not true - a poorly written iterative version may not be as good as a well-written recursive version.
The next question is, just how to re-write it. I've a tested method that I know works and I'm going to replace it with a different version. I don't want to replace it with a version that doesn't work, obviously, so how to re-write while taking the most advantage out of what's already there?
Well, I know about tail-call elimination: an optimisation normally done by compilers that changes the way the stack is managed so that recursive functions end up with properties closer to those of iteration (it's still recursive from the perspective of the source code, but it's iterative in terms of how the compiled code actually uses the stack).
This gives me two things to think about: 1. Maybe the compiler is already doing this, in which case my extra work isn't going to do anything to help. 2. If the compiler isn't already doing this, I can take the same basic approach manually.
That decision made, I replaced every point where the method called itself with a change to the one parameter that would be different for the next call, followed by a jump back to the beginning. I.e. instead of having:
CurrentMethod(param0.next, param1, param2, /*...*/);
We have:
param0 = param0.next;
goto startOfMethod;
That being done, I measure again. Running the entire unit test suite for the class is now consistently 13% faster than before. If it were closer I'd have tried more detailed measurements, but a consistent 13% on runs that include code that doesn't even call this method is something I'm pretty happy with. (It also tells me that the compiler wasn't doing the same optimisation, or I wouldn't have gained anything.)
Then I clean up the method to make more changes that make sense with the new code. Most of them let me take out the goto, because goto is indeed nasty (and there are other places where the same optimisation was done that aren't as obvious, because the goto was refactored away entirely). In some places I left it in, because 13% is worth breaking the no-goto rule to my mind!
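For illustration only, here is a hedged Python sketch of the same recursion-to-loop rewrite on a hypothetical chain of tables (this is not the actual Ariadne code, and do_work() is a stand-in for the real per-table body):

# Recursive form: naturally expresses "do this table, then the next, and so on".
def resize_recursive(table, key, value):
    do_work(table, key, value)             # do_work() is a hypothetical helper
    if table.next is not None:
        resize_recursive(table.next, key, value)

# Manual tail-call elimination: same behaviour, no stack frame per table.
def resize_iterative(table, key, value):
    while table is not None:
        do_work(table, key, value)
        table = table.next                 # the "param0 = param0.next" step from the snippet above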
So the above gives an example of:
Deciding where to concentrate optimisation effort (based on how often it might be hit and my inability to predict all uses of the library)
Using knowledge of general costs (recursion costs more than iteration, most of the time).
Measuring rather than depending on assuming the above always applies.
Learning from what compilers do.
Understanding that because of that I may not gain anything - maybe the compiler already did it for me.
Avoiding optimisations leading to unreadable code (refactoring out most of the gotos the first pass introduced).
Some of these are matters of opinion and style (the decision to leave in some goto would not be without controversy), and it's certainly okay to disagree with my decisions, but knowledge of the points raised so far in this post would make it an informed disagreement, rather than a knee-jerk one.
In addition to the resources mentioned in other answers, Michael Abrash's Graphics Programming Black Book is a great read for learning about optimization. While the specifics are a bit dated in places, it is still a great resource for learning about how to approach optimization.
Any time you want to optimize code it is absolutely essential to measure, measure, measure. One of the best ways to learn about optimization is by doing - take some code you want to optimize, learn how to use a profiler to measure its performance and then make changes and measure the results.

What are motivations behind compiling to byte-code?

I'm working on my own toy programming language. For now I'm interpreting the source language from AST and I'm wondering what advantages compiling to a byte-code and then interpreting it could provide me.
For now I have three things in mind:
Traversing the syntax tree hundreds of times may be slower than running instructions in an array, especially if the array supports O(1) random access (i.e. jumping 10 instructions up or down).
In a typed execution environment, I have some run-time costs because my AST is typed and I'm constantly traversing it (i.e. I have 10 kinds of node and I need to check which kind I'm on before I can execute it). Maybe compiling to an untyped byte-code could help, since after type-checking and compiling I would have untyped values and code.
Compiling to byte-code may provide better portability.
Are my points correct? What are some other motivations behind compiling to bytecode?
Speed is the main reason; interpreting ASTs is just too slow in practice.
Another reason to use bytecode is that it can be trivially serialized (stored on disk), so that you can distribute it. This is what Java does.
The point of generating byte code (or any other "easily interpreted" form such as threaded code) is essentially performance.
For an AST interpreter to decide what to do next, it needs to traverse the tree, inspect nodes, determine the type of each node, check the types of any operands, verify legality, and decide which special case of the AST-designated operator applies (it says "+", but does it mean a 16-bit add or string concatenation?), before it finally performs some action.
If one takes the final action and generates some kind of easily interpreted structure, then at "execution" time the interpreter can focus simply on performing actions without all that checking/special-case determination.
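To give a feel for what that buys you, here is a hedged sketch of such an "easily interpreted" form: a flat list of pre-checked instructions with an invented opcode set (not any particular VM), dispatched in a loop where jumps are just assignments to the program counter.

# Invented instruction set: ("push", n), ("add",), ("jump", target), ("jump_if_false", target), ("halt",)
def run(code):
    stack, pc = [], 0
    while True:
        instr = code[pc]
        op = instr[0]
        if op == "push":
            stack.append(instr[1]); pc += 1
        elif op == "add":
            b, a = stack.pop(), stack.pop(); stack.append(a + b); pc += 1
        elif op == "jump":
            pc = instr[1]                            # jumps are just assignments to pc
        elif op == "jump_if_false":
            pc = instr[1] if not stack.pop() else pc + 1
        elif op == "halt":
            return stack.pop() if stack else None

# run([("push", 2), ("push", 3), ("add",), ("halt",)]) returns 5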
Another recent excuse is that if you generate byte code for any of a number of well-known virtual machines (JVM, MSIL, Parrot, etc.) you don't even have to code the interpreter. For the JVM and MSIL, you also get the benefit of the JIT compilers associated with them, and with careful design of your language, compatibility with huge libraries, which are the real attraction of Java and C#.

Is it a good practice to do optimization during initial coding?

Is it a good practice to follow optimization techniques during initial coding itself or should one concentrate purely on realization of functionality first?
If one concentrates purely on functionality during initial coding, then how easy or difficult is it to take care of optimization later on?
Optimise your design and architecture - don't lock yourself into a design which will never scale - but don't micro-optimise your implementation. In particular, don't sacrifice simplicity and readability for micro-optimised implementation... at least not without benchmarking your code (ideally your whole system) first.
Measurement really is the key point when it comes to performance. Bottlenecks are almost never where you expect them to be. There are loads of different ways of measuring; optimisation without any measurement is futile IMO.
Donald Knuth said:
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil
It depends what you see as "optimization". Micro-optimization should not be done in early stages, and afterwards only if you have a valid reason to do so (e.g. profiler results or similar).
However, writing well-structured, clean code following best practices and common coding guidelines is a good habit, and once you're used to it, it doesn't take much more time than writing sloppy code. This kind of "optimization" (not the correct word for it, but some see it as such) should be done from the beginning.
See http://en.wikipedia.org/wiki/Program_optimization for quotes by Knuth.
If you believe that optimization might make your code harder to (a) get right in the first place, or (b) maintain in the long run, then it's probably best to get it right first. Having good development processes, such as Test Driven Development, can help you make optimisations later.
It's always better to have it work right and slow, than wrong and fast.
As Donald Knuth rightly said, "Premature optimization is the root of all evil", and it also slows your coding down. The best way to optimize is by visiting the codebase again and refactoring: that way you know which parts of the code are used often or are a bottleneck and should be fine-tuned.
Premature optimization is not a good thing.
And that goes especially for low level optimization. But at a higher level your design shouldn't lock out any future optimization.
For example.
The retrieval of collections should be hidden behind method calls; that way you can always decide later whether or not to cache the retrieval.
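A minimal sketch of that idea (the repository class and db.query() call are hypothetical): callers only ever see get_orders(), so adding or removing a cache later is a change in one place.

class OrderRepository:
    """Callers only ever see get_orders(); adding a cache later is a local change here."""
    def __init__(self, db):
        self.db = db                  # db.query() is a hypothetical data-access call
        self._cache = None

    def get_orders(self):
        if self._cache is None:       # on day one this method can simply return self.db.query("orders")
            self._cache = self.db.query("orders")
        return self._cache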
After you have a stable application and (!) you have developed regression unit tests, you can profile the application and optimize the hotspots. And remember: after every optimization step you should run your complete unit test suite.
Is it a good practice to follow optimization techniques during initial coding itself or should one concentrate purely on realization of functionality first?
If you know performance is critical (or important), consider it in your design and write it correctly the first time. If you don't also consider this in your design and it is important, you are wasting time or "developing a proof of concept".
Part of this comes down to experience; If you know optimizations and your program's problem areas or have already implemented similar functionalities in the past, your experience will certainly help you create an implementation closer to the end result the first time. If you still need a proof of concept, you should not be writing the actual program until that's completed -- kick out some tests to determine what solution is appropriate for the problem, then implement it properly.
If one concentrates purely on functionality during initial coding, then how easy or difficult is it to take care of optimization later on?
Some fixes are quick; others deserve complete rewrites. The more that needs to change and adapt after the fact, the more time you waste re-testing and maintaining a poorly implemented program. The libraries that are easiest to maintain, and that keep up with the demands placed on them, are typically the ones whose engineers understood what the ideal design was and strove to meet that ideal during the initial implementation.
Of course, that also assumes you favor a long-lived program!
Premature optimization is the root of all evil
To elaborate on this famous quote: doing optimization early has the disadvantage of distracting you from doing a good design. Also, programmers are notoriously bad at finding which parts of the code cause the most trouble, and so try hard to optimize things that aren't that important. You should always measure first to find out what needs to be optimized, and that can only happen in later phases.

What are Finite State Automata and why should a programmer know about them?

Erm - what the question said. It's something I keep hearing about, but I've not got round to looking into it yet.
(updated) I could look up the definition... but why not (as pointed out by @erikson) get insight into your real experiences and anecdotes? Community Wiki'd in case that helps folks vote up the most insightful answer. Interesting reading so far, thanks!
Short answer, it is a technique that you can use to express systems with concrete states (as opposed to quantum states / probability distributions).
Quoting the Wikipedia article:
A finite state machine (FSM) or finite state automaton (plural: automata), or simply a state machine, is a model of behavior composed of a finite number of states, transitions between those states, and actions. A finite state machine is an abstract model of a machine with a primitive internal memory.
So, what does that mean to you? Put simply, it is an effective way to represent the path(s) from a starting state to the end state(s) of the system that you care about. Using regular expressions as a fairly easy-to-understand example, let's look at the pattern AB+C (imagine that the plus is a superscript). I would expect this pattern to accept strings such as "ABC", "ABBC", "ABBBC", etc.: A at the start, C at the end, and some number of B's in the middle (greater than or equal to one).
If you think about it, it's almost easier to think about this in terms of a picture. Faking it with text (and pretending the parentheses are a loop-back arc on B), you can see that A (on the left) is the starting state and C (on the right) is the end state.
      _
     ( )
A --> B --> C
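The same machine written out as a transition table, as a hedged sketch (the state names are invented for the example):

# (state, input) -> next state; any missing pair means "reject".
TRANSITIONS = {
    ("start",  "A"): "seen_a",
    ("seen_a", "B"): "in_bs",
    ("in_bs",  "B"): "in_bs",     # the loop-back arc on B
    ("in_bs",  "C"): "accept",
}

def matches(text):
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False              # no rule for this input in this state
    return state == "accept"

# matches("ABC") and matches("ABBBC") are True; matches("AC") and matches("ABCB") are False.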
From FSAs, you can continue your journey into computational complexity by heading over to the land of Turing Machines.
However, you can also use state machines to represent real behaviors and systems. In my world, we use them to model certain workflows of actual people working with components that are extremely intolerant of mistakes in state order. As in, "A had better happen before C or there will be a very serious problem. Make that not possible right now."
You could look it up, but what the hell. Intuitively, a finite state automaton is an abstraction of something that has some finite number of states, and rules by which you can go from state to state. A state is something for which a true or false statement can be made, and a rule is a way that you change from one state to another. So, you could have, say, two states: "I'm at home" and "I'm at work" and two rules, "go to work" and "go home."
It turns out that you can look at machines like this mathematically, and find there are things they can and cannot do. Regular expressions are basically a way of describing a finite state machine in which the states are a set of different strings, and the rules move you from state to state based on the next character read. You can prove that. But you can also prove that no finite state machine can tell whether or not the parentheses in an expression are matched (via the pumping lemma for FSAs.)
The reason you should learn about FSAs is that they can be used to solve many problems: string matching, control of systems, business process descriptions, digital circuit design. They're also inherently pretty.
Formally, an FSA is an algebraic structure A = 〈Σ, S, s₀, F, δ〉 where Σ is the input alphabet, S is a set of states, s₀ ∈ S is a particular start state, F ⊆ S is a set of accepting states, and δ : S × Σ → S is the state transition function.
in OOP terms: if you have an object with methods that you call on certain events, and some (other) methods that have different behaviour depending on the previous calls.... surprise! you have a state machine!
now, if you know the theory, you don't have to rethink it all. you simply say: "piece of cake, it's just a state machine" and go on to implement it.
if you don't know the theory you'll think about it for a while, write some clever hacks, and get something that's difficult to explain and document... because you don't have the words to describe it
Good answers above. I would only add that FSAs are primarily a thinking tool, not a programming technique. What makes them useful is that they have nice properties, and anything that acts like one has those properties. If you can think of something as an FSA, there are many ways you can build it:
as a regular expression
as a state-transition table
as a while-switch-on-state loop
as a goto-net (horrors!)
as simple structured program code
etc. etc.
If somebody says something is a FSA, you can immediately know what they are talking about, no matter how it is built.
You need state machines whenever you have to release your thread before you have completed your operation.
Since web services are often not stateful, you don't usually see this in web services -- you arrange your URLs so that each URL corresponds to a single path through the code.
I guess another way to think about it could be that every web server is a FSM where the state information is kept in the URL.
You often see it when processing input. You have to release your thread before the input has all been completed, so you set a flag saying "input in progress" or something like that. When done you set the flag to "awaiting input". That flag is your state monitor.
More often than not, a FSM is implemented as a switch statement that switches on a variable. Each case is a different state. At the end of the case, you may set the state to a new value. You've almost certainly seen this somewhere.
The nice thing about a FSM is that you can make the state a part of your data rather than your code. Imagine that you need to fill out 1000 items in the database. The incoming data will address one of the 1000 items, but you generally don't have enough data to complete the operation.
Without an FSM you might have hundreds of threads waiting around for the rest of the data so they can complete processing and write the results to the DB. With an FSM, you write the state to the DB, then exit your thread. Next time, you can check the incoming data, read the state back from the DB, and that should give you enough info to determine what code to run.
Nearly every FSM operation COULD be done by dedicating a thread to it, but probably not as well (the complexity multiplies with the number of states, whereas with a state machine the rise in complexity is more linear). Also, there are some conceptual design issues -- examining your code at the state level is in some cases much easier than examining it at the line-of-code level.
Every programmer should know about them because they are an excellent tool for certain kinds of problems, where the usual 'iterative-thinking' approach would yield nasty, complex code.
A typical example is game AI, where NPCs have different states that change according to where the player is, something like:
NPC_STATE_IDLE
NPC_STATE_ALERT (player at less than 100 meters)
NPC_STATE_ENGAGE (player attacked NPC)
NPC_STATE_FLEE (low on health)
where an FSM can easily describe the transitions and help you perform complex reasoning about the system it is describing.
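A hedged sketch of that NPC example as an event-driven transition table (the events and their names are invented):

# (current state, event) -> next state
NPC_TRANSITIONS = {
    ("NPC_STATE_IDLE",   "player_within_100m"): "NPC_STATE_ALERT",
    ("NPC_STATE_ALERT",  "player_left"):        "NPC_STATE_IDLE",
    ("NPC_STATE_ALERT",  "player_attacked"):    "NPC_STATE_ENGAGE",
    ("NPC_STATE_ENGAGE", "low_health"):         "NPC_STATE_FLEE",
    ("NPC_STATE_FLEE",   "healed"):             "NPC_STATE_IDLE",
}

class Npc:
    def __init__(self):
        self.state = "NPC_STATE_IDLE"

    def handle(self, event):
        # Unknown (state, event) pairs simply leave the state unchanged.
        self.state = NPC_TRANSITIONS.get((self.state, event), self.state)

# npc = Npc(); npc.handle("player_within_100m"); npc.handle("player_attacked")
# npc.state is now "NPC_STATE_ENGAGE"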
Important: if you are a "visual" style learner, stop everything you are doing and follow the link below right now; it gives a very accessible introduction.
Reanimator by Oliver Steele
It looks like you've already accepted an answer, but if you appreciate a "visual" introduction to new concepts, as is common, you really should check out the link. It is simply outstanding.
(Note: the link points to a discussion of DFA and NDFA in the context of regular expressions -- with animated interactive diagrams)
Yes! You could look it up!
http://en.wikipedia.org/wiki/Finite_state_machine
What it is is better answered on other sites (such as Wikipedia), because there are pretty extensive answers out there already.
Why you should know them: Because you probably implemented them already.
Any time your code has a limited number of possible states (that's the "finite state" part) and switches to another one once some input/event happens (that's the "machine" part), you've written a finite state machine.
It is a very common tool and knowing the theoretical basics for that, being able to reason about it and knowing how to combine two FSMs into a single one that does the same work can be a great help.
FSAs are great data structures to understand because whenever you have a chance to implement them, you're working at the lowest level of computational complexity on the Chomsky hierarchy. A great example is word morphology (how parts of words come together). A lot of work has been done to show that even the most severe cases can be analyzed in this extremely fast analytical framework. Take a look at Karttunen and Beesley's work out of PARC.
FSAs are also a great place to start learning about machine learning concepts like hidden Markov models, because in many ways the problem can be broken down using the same ideas and vocabulary.
One item that hasn't been mentioned so far is the semantic equivalence of finite state automata and regular expressions. A regular expression can be compiled to a finite state automaton (this is how regex libraries work) and vice-versa.
FSAs (including DFAs and NFAs) are very important in computer science, and they are used in many fields: for instance, hidden Markov models for speech recognition; regular expressions, which are converted to FSAs before being interpreted by the software; NLP (Natural Language Processing); AI (game programming); robot programming; etc.
One of the disadvantages of FSAs is that they can be slow, hard to implement, and hard to understand or visualize while reading the code; but they are good because they usually provide generic solutions to problems and they are well known, with a lot of existing studies behind them.