How Do I Design Abstract Semantic Graphs? - language-design

Can someone direct me to online resources for designing and implementing abstract semantic graphs (ASG)? I want to create an ASG editor for my language. Being able to edit the ASG directly has a number of advantages:
Only identifiers and literals need to be typed in, and identifiers are written only once, when they're defined. Everything else is selected via the mouse. (A data-structure sketch of this idea follows the list below.)
Since the editor knows the language's grammar, there are no more syntax errors. The editor prevents them from being created in the first place.
Since the editor knows the language's semantics, there are no more semantic errors.
There are some secondary advantages:
Since all the reserved words are easily separable, a program can be written in one locale and viewed in another. On-the-fly changes of locale are possible.
All the text literals are easily separable as well, so changes of locale, including on-the-fly changes, are easily made.
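For a sense of what editing an ASG directly might mean as a data structure, here is a minimal, hypothetical C++ sketch (all type names invented for illustration). Each identifier is a single definition node, and every use is an edge pointing back to it, so the name is typed exactly once and a rename touches one node:

    #include <string>

    // Hypothetical ASG nodes: a use site points at its definition,
    // so an identifier's name is stored exactly once.
    struct Definition {
        std::string name;  // typed in once, when defined
    };

    struct Expression {
        virtual ~Expression() = default;
    };

    struct VariableUse : Expression {
        Definition* target;  // semantic edge; never a dangling name lookup
        explicit VariableUse(Definition* d) : target(d) {}
    };

    int main() {
        Definition counter{"counter"};
        VariableUse use1(&counter), use2(&counter);
        counter.name = "total";  // every use now displays the new name
    }

An ASG editor would only ever create nodes like these through grammar- and type-aware commands, which is what rules out the syntax and semantic errors mentioned above.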

I'm not aware of a book on the matter, but you'll find the topic discussed in portions of various books on programming languages. You'll also find discussions of it around various projects that implement what you describe. For instance, there is quite a bit of discussion regarding the design of Scratch. Most workflow engines are also based on scripting in semantic graphs.
Allow me to opine... We've had the technology to manipulate language structurally for basically as long as we've had programming languages. I believe the reason we still use textual languages is a combination of two facts: text is more natural for us as humans, who communicate in natural language, to wield, and it is sometimes difficult to compose and refactor code when proper structure has to be maintained at every step. If you're not sure what I mean, try building complex expressions in Scratch. Text is easier, and a decent IDE gives virtually as much verification of correct structure.*
*I don't mean to take anything away from Scratch; it's a thing of beauty and perfect for its intended purpose.

Related

Are there any interpreted languages in which you can dynamically modify the interpreter?

I've been thinking about a piece of writing (apparently by Mark Twain) in which he starts off writing in English but, throughout the text, changes the rules of spelling so that by the end he arrives at something probably best described as pseudo-German.
This made me wonder whether there is an interpreter for some established language that gives you access to the interpreter itself, so that you can change the syntax and structure of the language as you go along. For example, an if clause is usually introduced by a keyword; is there a language that would let you change or redefine it on the fly? Imagine beginning a console session in one language and, by the end, working in another.
Clearly one could write an interpreter and run it, and perhaps there is no concrete distinction between doing this and modifying the interpreter. I'm not sure about this. Perhaps there are limits to the modifications you can make dynamically to any given interpreter?
These more open questions aside, I would simply like to know whether there are any known interpreters that allow this at all. Or perhaps this ability is just a matter of degree and my question is badly posed.
There are certainly languages in which this kind of self-modifying behavior at the level of the language syntax itself is possible. Lisp programs can contain macros, which allow among other things the creation of new control constructs on the fly, to the extent that two Lisp programs that depend on extensive macro programming can look almost as if they are written in two different languages. Forth is somewhat similar in that a Forth interpreter provides a core set of just a dozen or so primitive operations on which a program must be built in the language of the problem domain (frequently some kind of real-world interaction that must be done precisely and programmatically, such as industrial robotics). A Forth programmer creates an interpreter that understands a language specific to the problem he or she is trying to solve, then writes higher-level programs in that language.
In general the common idea here is that of languages or systems that treat code and data as equivalent and give the user just as much power to modify one as the other. Every Lisp program is a Lisp data structure, for example. This is in contrast to a language such as Java, in which a sharp distinction is made between the program code and the data that it manipulates.
A related subject is that of self-modifying low-level code, which was a fairly common technique among assembly-language programmers in the days of minicomputers with complex instruction sets, and which spilled over somewhat into the early 8-bit and 16-bit microcomputer worlds. In this programming idiom, for purposes of speed or memory savings, a program would be written with the "awareness" of the location where its compiled or interpreted instructions would be stored in memory, and could alter in place the actual machine-level instructions byte by byte to affect its behavior on the fly.
Forth is the most obvious thing I can think of. It's concatenative and stack based, with the fundamental atom being a word. So you write a stream of words and they are performed in the order in which they're written with the stack being manipulated explicitly to effect parameter passing, results, etc. So a simple Forth program might look like:
6 3 + .
That is four words: 6, 3, + and . (dot). The two numbers push their values onto the stack. The plus word pops the top two items from the stack, adds them, and pushes the result. The dot word outputs whatever is at the top of the stack.
A fundamental part of Forth is that you define your own words. Since all words are first-class members of the runtime, in effect you build an application-specific grammar. Having defined the relevant words you might end up with code like:
red circle draw
That would draw a red circle.
Forth interprets each sequence of words as it encounters them. However, it distinguishes between compile-time and ordinary words. Compile-time words do things like compile a sequence of words and store it as a new word; that's the equivalent of defining subroutines in a classic procedural language. They're also the means by which control structures are implemented. But you can also define your own compile-time words.
As a net result a Forth program usually defines its entire grammar, including relevant control words.
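To make the execution model concrete, here is a toy sketch of that interpretation loop, written in C++ rather than Forth (a deliberate simplification: no compile-time words, no error handling):

    #include <functional>
    #include <iostream>
    #include <map>
    #include <sstream>
    #include <string>
    #include <vector>

    int main() {
        std::vector<double> stack;                          // the data stack
        std::map<std::string, std::function<void()>> dict;  // dictionary of words

        dict["+"] = [&] {  // pop two values, push their sum
            double b = stack.back(); stack.pop_back();
            double a = stack.back(); stack.pop_back();
            stack.push_back(a + b);
        };
        dict["."] = [&] {  // print and drop the top of the stack
            std::cout << stack.back() << "\n";
            stack.pop_back();
        };

        std::istringstream program("6 3 + .");
        std::string word;
        while (program >> word) {                        // words run in written order
            auto it = dict.find(word);
            if (it != dict.end()) it->second();          // known word: execute it
            else stack.push_back(std::stod(word));       // otherwise, a numeric literal
        }
    }

A real Forth's colon definitions would extend the dictionary at run time with new entries built from existing ones, which is how a program grows its own grammar.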
You can read a basic introduction here.
Prolog is a homoiconic language, allowing meta-interpreters (MIs) to be written in a variety of ways. A meta-interpreter - an interpreter for the language it is itself written in - is a common and useful native construct in Prolog.
See this page for an introduction to the topic. An interesting and practical technique illustrated there is partial evaluation:
The overhead incurred by implementing these things using MIs can be compiled away using partial evaluation techniques.

Why embed Lua into a game engine?

I've been looking into building a basic game engine from the ground up, and after making a list of features that are common to other engines, one of the bigger things is the fact that they have an embedded scripting language like Lua or Python.
My question is: how is an embedded scripting language superior to just making a header file (or something similar) that the user can include in a C++ file, giving them access to many of the engine's functions and states? I'm sure there's a very good answer out there; I just haven't stumbled on it yet.
Also, beyond why it's needed, what are languages like Lua used for in things like game engines?
Lua is a far simpler language than C++, and all you need to edit it is a text editor. This puts the ability to script events and/or high-level game logic in the hands of your designers and end users. Dynamic typing and garbage collection allow them to write very succinct code that focuses on game logic rather than all the systems-level housekeeping chores you get in a language like C++. It's also far easier to sandbox.
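To show what embedding means mechanically, here is a hedged sketch using the standard Lua C API (spawn_enemy is a hypothetical engine hook, not part of any real engine): the engine registers C++ functions with the interpreter, and scripts call them without the engine being recompiled.

    #include <iostream>
    extern "C" {
    #include "lua.h"
    #include "lualib.h"
    #include "lauxlib.h"
    }

    // Hypothetical engine function exposed to scripts.
    static int l_spawn_enemy(lua_State* L) {
        const char* kind = luaL_checkstring(L, 1);
        double x = luaL_checknumber(L, 2);
        double y = luaL_checknumber(L, 3);
        std::cout << "spawning " << kind << " at " << x << "," << y << "\n";
        return 0;  // number of values returned to Lua
    }

    int main() {
        lua_State* L = luaL_newstate();  // a fresh interpreter instance
        luaL_openlibs(L);                // load the standard Lua libraries
        lua_register(L, "spawn_enemy", l_spawn_enemy);
        // In a real engine this string would come from a script file that
        // designers can edit and re-run without touching the C++ build.
        if (luaL_dostring(L, "spawn_enemy('goblin', 10, 20)"))
            std::cerr << lua_tostring(L, -1) << "\n";  // report script errors
        lua_close(L);
    }

The same mechanism scales up: the engine exposes a vocabulary of functions, and all the event and gameplay logic lives in scripts.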
Lua is a popular choice because of its small, portable, hackable ANSI C code base; it's easy to embed and extend, and -- most importantly for game developers -- it has a minimal runtime footprint (it's one of the fastest interpreted languages). It also combines an easy-to-learn/read/write syntax with powerful features like coroutines, which can be very useful in games.
The reason for including a scripting language is to allow users to customize the behavior without having to recompile the code.
I'm not sure what you are asking in the second part of the question. Are you asking what other languages are used, or in what ways languages like Lua are used?
If you're asking what other languages are good for this, one such language is Tcl. Tcl was designed from the ground up to be an embedded scripting language; it is very mature and robust, and easily learned by non-technical people.
As for what scripting languages are good for... configuration files are one use. By using a programming language rather than a text file with name/value pairs, you let users add logic to their start-up files. For example, maybe you allow users to assign different functions to keys on the keyboard; with a programming language they can assign different functions on different computers. Or, if you're creating a game like an RPG, perhaps you can assign different keys for different character classes: if playing as a mage, F12 might cast a spell, but if playing as a warrior, F12 might perform a finishing blow.
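Continuing that key-binding idea as a hedged sketch (the bind function and the class global are hypothetical conventions, again using the standard Lua C API): the "config file" is just a Lua chunk the engine runs at startup, so it can branch on context.

    #include <iostream>
    #include <string>
    #include <unordered_map>
    extern "C" {
    #include "lua.h"
    #include "lualib.h"
    #include "lauxlib.h"
    }

    static std::unordered_map<std::string, std::string> bindings;

    // Hypothetical hook the engine gives to startup scripts.
    static int l_bind(lua_State* L) {
        bindings[luaL_checkstring(L, 1)] = luaL_checkstring(L, 2);
        return 0;
    }

    int main() {
        lua_State* L = luaL_newstate();
        luaL_openlibs(L);
        lua_register(L, "bind", l_bind);
        lua_pushstring(L, "mage");       // the engine tells the script
        lua_setglobal(L, "class");       // which character class is active

        // The user's start-up "file": data plus logic, no recompile needed.
        const char* config =
            "if class == 'mage' then bind('F12', 'cast_spell') "
            "else bind('F12', 'finishing_blow') end";
        if (luaL_dostring(L, config))
            std::cerr << lua_tostring(L, -1) << "\n";

        std::cout << "F12 -> " << bindings["F12"] << "\n";  // F12 -> cast_spell
        lua_close(L);
    }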
There are many ways to use scripting languages, and many different languages to choose from. It all boils down to allowing your users to customize the behavior of the game without having to recompile the code.
You might find this article by a game developer useful in understanding why embedded languages are used.
http://www.grimrock.net/2012/07/25/making-of-grimrock-rapid-programming/
Another good reason: unless you are sharing your source code with your game's users and they are all C programmers, languages such as Lua make it possible for users to extend the game. For example, look at World of Warcraft.

What tool to use for finding duplicated Ada code due to copy&paste

I'm looking for a tool for finding duplicated code due to copy&paste programming, to be run over a large Ada codebase. I suppose that Ada support in the tool is important for detecting more than trivial text similarities, that is, for ignoring layout or identifier differences, etc.
The tools that I have found with Ada support are the following:
Clone Doctor, a commercial product with support for several languages, including Ada. http://www.semdesigns.com/Products/Clone/index.html
ConQAT: a commercially supported open-source product that includes a CloneDetection tool with Ada support since September 2011. http://conqat.cs.tum.edu/index.php/CloneDetectionTutorial
Have you tried these tools? Am I missing any others of interest? Is language support really significant, or would a general text tool be enough? What is your experience with code-duplication detection?
Thanks in advance.
I'm the author of CloneDR. Read the following with my bias in mind.
It is important to understand the differences in the detection methods of clone detection tools, and the quality of the results as a consequence.
ConQAT is representative of what are called "token-based" detectors. They match sequences of language tokens (operators, identifiers, brackets, keywords, etc.). The good news is that they are pretty fast (not that speed is a big issue; you don't run clone detection every 30 seconds, once a week is enough). They will find some clones that are near-misses, in the sense that one identifier or constant is substituted for another in a clone. The bad news is that they don't understand the structure of your code and consequently want to report things like
} void ID ( ID
as clones. This is countered by making the detectors hunt only for very long sequences of tokens (typically 30 or more), which means token-based detectors cannot find small but interesting clones without also drowning you in false positives like the one above.
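For intuition only, here is a deliberately naive sketch of the token-based idea (not ConQAT's actual algorithm): group every window of k consecutive tokens and report windows that occur more than once. With a small k it happily reports structure-crossing junk like the fragment above, which is why real detectors demand long windows.

    #include <cstddef>
    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    int main() {
        // Stand-in for a lexed source file; a real detector would use the
        // compiler's own token stream.
        std::vector<std::string> tokens = {
            "if", "(", "x", ")", "{", "y", "=", "0", ";", "}",
            "if", "(", "x", ")", "{", "y", "=", "0", ";", "}"};
        const std::size_t k = 5;  // minimum clone length, in tokens

        std::map<std::string, std::vector<std::size_t>> windows;
        for (std::size_t i = 0; i + k <= tokens.size(); ++i) {
            std::string key;
            for (std::size_t j = i; j < i + k; ++j) key += tokens[j] + " ";
            windows[key].push_back(i);  // group identical token windows
        }
        for (const auto& entry : windows)
            if (entry.second.size() > 1) {  // seen twice or more: a clone
                std::cout << "clone [" << entry.first << "] at:";
                for (std::size_t s : entry.second) std::cout << " " << s;
                std::cout << "\n";
            }
    }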
CloneDR operates by parsing the code (even for Ada) just like a compiler, building abstract syntax trees, and matching the trees up to a point of difference. It cannot propose a clone that crosses structure boundaries in silly ways. It will find the same kind of near-misses as the token-based detectors, but it goes beyond this. CloneDR will find consistent substitutions ("anti-unifiers"), which means clones can be explained by a small number of parameters used in many places in the clone, and it will find variations in which the mismatches are larger than a single token, e.g., expressions, statements, declarations, even blocks. So it produces fewer false positives and better answers. Independent research reports that compare types of clone detectors, specifically including CloneDR, agree with this analysis.
There is more detailed discussion at the Clone Doctor link you listed above. You can see examples of detected clones for many languages (but we don't have an Ada report on the web site).
EDIT March 19, 2012:
Now you can download an eval copy of an Ada95 CloneDR.
Ira Baxter has a good description.
Token-based clone detection tools tend to be good enough for our purpose, which is usually to get a quick overview of how bad code duplication is in a body of source code we haven't seen before, and how duplication is distributed across that code.
In particular, we are happy with CCFinderX, because it has a nice visualization frontend.
However, it's buggy and unmaintained, and the code has been released without any license statement.
It has language specific preprocessors for some languages, but we often just disable them (they are buggy as well).
If you need better accuracy, know exactly the language you need to parse (with C or C++, for example, this is not always the case), and can find a tool that parses exactly that language (also an issue with C and C++), then a parsing-based approach may be better, as Ira writes.

Creating a simple Domain Specific Language

I am curious to learn about creating a domain-specific language. For now the domain is quite basic: just some variables, a few loops, and if statements.
Edit: The language will be non-English-based, with a very simple syntax.
I am thinking of targeting the Java Virtual Machine, i.e., compiling to Java bytecode.
Currently I know how to write some simple grammars using ANTLR.
I know that ANTLR creates a lexer and parser, but how do I go forward from here?
About semantic analysis: does it have to be written manually, or are there tools to generate it?
How can the output from the lexer and parser be converted to Java bytecode?
I know that there are libraries like ASM or BCEL, but what is the exact procedure?
Are there any frameworks for doing this? If so, what is the simplest one?
You should try Xtext, an Eclipse-based DSL toolkit. Version 2 is quite powerful and stable. From its home page you have plenty of resources to get started, including some video tutorials. Because the Eclipse ecosystem revolves around Java, it seems the best choice for you.
You can also try MPS, but it is a projectional editor, and beginners may find it more difficult. It is nevertheless no less powerful than Xtext.
If your goal is to learn as much as possible about compilers, then indeed you have to go the hard way: write an ad hoc parser (no ANTLR or the like), and write your own semantic passes and your own code generation.
Otherwise, you'd be better off extending an existing extensible language with your DSL, reusing its parser, its semantics, and its code-generation functionality. For example, you can easily implement an almost arbitrarily complex DSL on top of Clojure macros (and since Clojure itself compiles to the JVM, you get that part for free).
A DSL with simple syntax may or may not mean simple semantics.
Simple semantics may or may not mean easy translation to a target language; such translations are "technically easy" only if the DSL and the target language share a lot of common data types and execution models. (Constraint systems have simple semantics, but translating them to Fortran is really hard!) (You gotta wonder: if translating your DSL is easy, why do you have it?)
If you want to build a DSL (in your case a simple one, because you are learning), you want DSL compiler infrastructure that has whatever you need in it, including support for difficult translations. "What is needed" to handle translating all DSLs to all possible target languages is clearly an impossibly large set of machinery.
However, there is a lot that is clearly helpful:
Strong parsing machinery (who wants to diddle with grammars whose structure is forced by the weakness of the parsing machinery? If you don't know what this means, read about LL(1) grammars as an example).
Automatic construction of a representation (e.g., an abstract syntax tree) of the parsed DSL
Ability to access/modify/build new ASTs (see the sketch after this list)
Ability to capture information about symbols and their meaning (symbol tables)
Ability to build analyses of the AST for the DSL, to support translations that require information from "far away" in the tree to influence the translation at a particular point in the tree
Ability to reorganize the AST easily to achieve local optimizations
Ability to construct/analyze control- and data-flow information if the DSL has procedural aspects and the code generation requires deep reasoning or optimization
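As a sketch of what those AST bullets buy you (hypothetical node types; a real JVM back end would emit class files through a library such as ASM or BCEL rather than printing text), here is the tree-walk core in C++:

    #include <iostream>
    #include <memory>

    // A tiny expression AST: a constant or a binary operation.
    struct Expr {
        virtual ~Expr() = default;
        virtual void emit() const = 0;  // walk the tree, print stack-machine ops
    };

    struct Const : Expr {
        int value;
        explicit Const(int v) : value(v) {}
        void emit() const override { std::cout << "push " << value << "\n"; }
    };

    struct BinOp : Expr {
        char op;
        std::unique_ptr<Expr> lhs, rhs;
        BinOp(char o, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
            : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
        void emit() const override {
            lhs->emit();  // operands first (postorder) -- exactly the
            rhs->emit();  // evaluation order of a stack machine like the JVM
            std::cout << (op == '+' ? "add" : "mul") << "\n";
        }
    };

    int main() {
        // What a parser might hand back for "(2 + 3) * 4".
        BinOp root('*',
                   std::make_unique<BinOp>('+', std::make_unique<Const>(2),
                                           std::make_unique<Const>(3)),
                   std::make_unique<Const>(4));
        root.emit();  // push 2 / push 3 / add / push 4 / mul
    }

Symbol tables, analyses, and rewrites from the list above all operate on this same tree before the emission pass runs.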
Most of the tools available for "building DSL generators" provide some kind of parsing, perhaps tree building, and then leave you to fill in all the rest. This puts you in the position of having a small, clean DSL but taking forever to implement it. That's not good. You really want all that infrastructure.
Our DMS Software Reengineering Toolkit has all the infrastructure sketched above and more. (It clearly doesn't, and can't, have the moon.) You can see a complete, all-in-one-"page", simple DSL example that exercises some interesting parts of this machinery.

How do you write your QTP Tests?

I am experimenting with using QTP for some web-app UI automation testing, and I was wondering how people usually write their QTP tests. Do you use the object map, descriptive programming, a combination, or some other way altogether? Any little code example would be appreciated. Thank you!
Here's my suggestion.
1) Build your test automation requirements matrix.
You can use samples from my blog
http://automation-beyond.com/2009/06/06/qa-test-automation-requirements-usability/
http://automation-beyond.com/2009/06/07/qa-test-automation-requirements-usability-2/
http://automation-beyond.com/2009/06/10/qa-test-automation-requirements-5-maintainability/
http://automation-beyond.com/2009/06/08/qa-test-automation-requirements-robustness/
http://automation-beyond.com/2009/06/09/qa-test-automation-requirements-scalability/
2) Choose your automation approach
3) Write your testing scripts according to the approach you chose
Note: the QTP Object Repository approach and Descriptive Programming both belong to the GUI-recognition part of front-end functional test automation. They matter in terms of robustness and maintenance.
Technically, the two are nearly the same. In both cases you should understand the GUI-recognition concept well, or you will have problems with either approach.
You can store GUI-object recognition properties in an XML-like data structure and map each record to an English-like name. Whenever the original object's properties change, you update the record in the repository, while the code still refers to the mapped name.
Or you can address GUI objects by putting the same recognition properties directly into a function call. Whenever the original object's properties change, you have to change the code, but you don't have to maintain extra files alongside your scripts.
A good framework should support both GUI-mapped and Descriptive Programming notations by operating at the object-reference level; that is, you should keep object-recognition and object-interaction tasks separate.
Note that, depending on context, Descriptive Programming notation may slow down your scripts, and it always demands extra maintenance effort; on the other hand, using Object Repositories alone may lead to unwanted duplication of objects' descriptions, or may limit recognition of dynamically changing GUIs.
I illustrate some points made above in the following article:
A little QTP performance test: Object Repository vs. Descriptive Programming
Straight code examples follow (for practical automation I recommend GUI Function Wrapping).
Descriptive programming - addressing objects by physical description properties.
Dim sProfile
sProfile = "Guest"
' Locate parent objects by their physical description properties
Set objWebParent = Browser("title:=Select Profile").Page("title:=Select Profile")
Set objWebObject = objWebParent.Link("text:=" & sProfile)
' Verify the link exists before acting on it
boolRC = objWebObject.Exist(0)
If Not boolRC Then
    ' error-handling
End If
objWebObject.Click
Addressing objects by mapped GUI names
Browser("Select Profile").Page("Select Profile").Link("Guest").Click
Thank you,
Albert Gareev
http://automation-beyond.com/
I know I am late here, and you must already have what you are looking for, but I wanted to provide my inputs as well for anyone visiting this topic.
I generally never use the OR unless I encounter an environment where Descriptive Programming is a no-go. Just recently, I worked with a mainframe front-end GUI application that had absolutely no naming convention for objects. If you choose to use Descriptive Programming with such an application, the only way to work with its objects is through Index or Location Ordinal Identifiers, which is not the best course of action considering the hundreds of objects in each pane.
So, the answer to your question really depends upon the environment and your experience with OR and DP. Most people I have worked with at my job and in online communities prefer Descriptive Programming whenever it's feasible. However, I have also seen people work wonders with OR.
I have a few code samples, but, unfortunately, they all deal with Descriptive Programming. For instance, the following article talks about creating modular VBScript classes to divide an application's functionality into small, manageable components:
http://relevantcodes.com/qtp-using-classes-as-test-modules-i/
Similarly, this article shows how Descriptive Programming can be used to verify multiple properties of target objects through a single block of code:
http://relevantcodes.com/qtp-verify-multiple-object-properties-an-elegant-approach/
Also, a demo framework is available for you to view here:
http://relevantcodes.com/relevantcodes1one-qtp-automation-framework/
The framework is built completely on the principles of Descriptive Programming, but in the next release, some functionality will be added that will enable users to work with ORs as well.
Thank you,
Anshoo Arora
(Thanks for linking to the original articles, Motti)