Related
Can someone direct me to online resources for designing and implementing abstract semantic graphs (ASG)? I want to create an ASG editor for my language. Being able to edit the ASG directly has a number of advantages:
Only identifiers and literals need to be typed in and identifiers are written only once, when they're defined. Everything else is selected via the mouse.
Since the editor knows the language's grammar, there are no more syntax errors. The editor prevents them from being created in the first place.
Since the editor knows the language's semantics, there are no more semantic errors.
There are some secondary advantages:
Since all the reserved words are easily separable, a program can be written in one locale and viewed in other. On-the-fly changes of locale are possible.
All the text literals are easily separable, so changes of locale are easily made, including on-the-fly changes.
I'm not aware of a book on the matter, but you'll find the topic discussed in portions of various books on computer language. You'll also find discussions of this surrounding various projects which implement what you describe. For instance, you'll find quite a bit of discussion regarding the design of Scratch. Most workflow engines are also based on scripting in semantic graphs.
Allow me to opine... We've had the technology to manipulate language structurally for basically as long as we've had programming languages. I believe that the reason we still use textual language is a combination of the fact that it is more natural for us as humans, who communicate in natural language, to wield, and the fact that it is sometimes difficult to compose and refactor code when proper structure has to be maintained. If you're not sure what I mean, try building complex expressions in Scratch. Text is easier and a decent IDE gives virtually as much verification of correct structure.*
*I don't mean to take anything away from Scratch, it's a thing of beauty and is perfect for its intended purpose.
I'm looking for a tool for finding duplicated code due to copy&paste programming to be run over a large Ada codebase. I suppose that Ada support in the tool is important for detecting more than the trivial text similarities, that is, ignore layout or identifier difference, etc.
The tools that I have found with Ada support are the following:
Clone Doctor, commercial product with support for several languages, including Ada. http://www.semdesigns.com/Products/Clone/index.html
ConQAT: commercially supported open source product that includes a CloneDetection tool with Ada support since September 2011 http://conqat.cs.tum.edu/index.php/CloneDetectionTutorial
Have you tried these tools? Am I missing any other one of interest? Is the language support really significant or a general text tool would be enough? What is your experience with code duplication detection?
Thanks in advance.
I'm the author of CloneDR. Read the following understanding my bias.
It is important to understand the differences in the detection methods of clone detection tools, and the quality of the results as a consequence.
ConQAT is a representative of what are called "token based" detectors. They match sequences of language tokens (operators, identifiers, brackets, keywords etc.) The good news is they are pretty fast (that isn't a big issue; you don't run clone detection every 30 seconds, once a week is enough). They will find some clones that are near-misses, in the sense that another identifier or constant is substituted for an identifier in a clone. The bad news is that they don't understand the structure of your code and consequently want to report things like
} void ID ( ID
as clones. This is defeated by making the detectors only hunt for very long sequences of tokens (typically 30 or more), which means token-based detectors cannot find small but interesting clones without also drowning you in false positives like the above.
CloneDR operates by parsing the code (even for Ada) just like a compiler, building abstract syntax trees, and matching the trees up to a point of difference. It cannot propose a clone that crosses structure boundaries in silly ways. It will find near misses of the same kind as the token based detectors, but it goes beyond this. CloneDR will find consistent substitutions ("anti unifiers") which means clones can be explained by a small number of parameters that have been used in many places in the clone, and it will find variations in the code in which the mismatches are larger than a single token, e.g., expressions, statements, declarations, even blocks. So it produces fewer false positives and better answers. Independent research reports that compare types of clone detectors, specifically including CloneDR, agree with this analysis.
There is more detailed discussion at the Clone Doctor link you listed above. You can see examples of detected clones for many languages (but we don't have an Ada report on the web site).
EDIT March 19, 2012:
Now you can download an eval copy of an Ada95 CloneDR.
Ira Baxter has a good description.
Token-based clone detection tools tend to be good enough for our purpose, which is usually to get a quick overview of how bad code duplication is in a body of source code we haven't seen before, and how duplication is distributed across that code.
In particular, we are happy with CCFinderX, because it has a nice visualization frontend.
However, it's buggy, unmaintained, and the code has been released but without any license statement.
It has language specific preprocessors for some languages, but we often just disable them (they are buggy as well).
If you need better accuracy, you know exactly the language you need to parse (e.g. with C or C++, this is not always the case), and you can find a tool that parses exactly that language (which is also an issue with C and C++), a parsing-based approach may be better, as Ira writes.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
I do know some advantages to classes such as variable and function scopes, but other than that is just seems easier to me to have groups of functions rather than to have many instances and abstractions of classes. So why is the "norm" to group similar functions in a class?
Simple, non-OOP programs may be one
long list of commands. More complex
programs will group lists of commands
into functions or subroutines each of
which might perform a particular task.
With designs of this sort, it is
common for the program's data to be
accessible from any part of the
program. As programs grow in size,
allowing any function to modify any
piece of data means that bugs can have
wide-reaching effects.
In contrast, the object-oriented
approach encourages the programmer to
place data where it is not directly
accessible by the rest of the program.
Instead the data is accessed by
calling specially written functions,
commonly called methods, which are
either bundled in with the data or
inherited from "class objects" and act
as the intermediaries for retrieving
or modifying those data. The
programming construct that combines
data with a set of methods for
accessing and managing those data is
called an object.
Advantages of OOP programming:
MAINTAINABILITY Object-oriented programming methods make code more maintainable. Identifying the source of errors is easier because objects are self-contained.
REUSABILITY Because objects contain both data and methods that act on data, objects can be thought of as self-contained black boxes. This feature makes it easy to reuse code in new systems.Messages provide a predefined interface to an object's data and functionality. With this interface, an object can be used in any context.
SCALABILITY Object-oriented programs are also scalable. As an object's interface provides a road map for reusing the object in new software, and provides all the information needed to replace the object without affecting other code. This way aging code can be replaced with faster algorithms and newer technology.
The point of OOP is not to 'group similar functions in a class'. If this is all you're doing then you're not doing OOP (despite using an OO language). Having classes instead of just a bunch of functions has a side effect of 'variable and function scopes' that you mention, but I see it just as a side effect.
OOP is about such concepts as encapsulation, inheritance, polymorphism, abstraction and many others. It is a specific way of software design, a specific way of mapping a problem to a software solution.
Grouping functions in a class is by no means the norm. Let me share some of the things I have learned through experimentation with different languages and paradigms.
I think the core concept here is that of a namespace. Namespaces are so useful that they are present in almost on any programming language.
Namespaces can help you overcome some common problems and model various patterns that appear in many domains, e.g., avoiding name collisions, hiding details, representing hierarchies, define access control, grouping related symbols (functions or data), define a context or scope ... and I'm sure there are more applications.
Classes are a type of namespace, and the specific properties of classes vary from language to language and sometimes from version to version of the same language, e.g., some provide access modifiers, some do not; some allow inheritance from multiple classes, others do not. People have been trying to find the magic mix of features that will be the most useful and that in part explains the plethora of available options in different programming languages.
So, why use classes, because they help solve certain kinds of problems in a way that seems natural or maybe intuitive. Every time we write a computer program we're trying to capture the essence of the problem and if the problem can be modeled by using some of the patterns mentioned above then it makes perfect sense to use those features of a language that help you do that.
As the problem becomes better understood you might realize that certain parts of the program could be better implemented by using a different paradigm/feature/pattern and then it's time for refactoring. Most programs I have had the chance to work on keep evolving until either the money/resources run out or when we arrive at the point of diminishing returns, many times you have something that's good enough for now and there's little incentive to keep working on it.
It's not the norm, it's just one way of doing it. Classes group methods (functions) AND data together, based on the concept of encapsulation.
For lager projects it often becomes easier to group things this way. Many people find it easier to conceptualizes the problem with objects.
There are many reasons to use classes, not the least of which is encapsulation of logic. Objects more closely match the world we live in, and are thus often more intuitive than other methodologies. Consider a car, a car has properties like body color, interior color, engine horsepower, features, current mileage, etc.. It also has methods, like Start (), TurnRight(.30), ApplyBrakes(.50). It has events like the ding when you open your car door with the keys in the ignition.
Probably the biggest reason is that most applications seem to have a graphical component these days and most of the libraries for graphical user interface are implemented with object models.
Polymorphism is probably a big reason, too. The ability to treat multiple types of objects generically is quite helpful.
If you are a mathematician, a functional style may be more intuitive, ML, F#. If you’re interacting with data in a predictable format, a declarative style would be better like SQL or LINQ.
In simple words, it seems to me that (apart from everything everyone is saying) that classes are best suited for large projects, especially those implemented by more than one programmer to facilitate keeping things tidy; as using functions in such situations can become rather cumbersome.
Otherwise, for simple programs/projects that you would implement yourself to do one thing or another, then functions would do nicely.
I am currently trying to draw a set of UML diagrams to represent products, offers, orders, deliveries and payments. These diagrams have probably been invented by a million developers before me.
Are there any efforts to standardize the modeling of such common things? Or even the modeling of specific domains (for example car-manufacturing).
Do you know if there is some sort of repository containing UML diagrams (class diagrams, sequence diagrams, state diagrams...)?
There is a movement for documenting (as opposed to standardizing) models for certain domains. These are called analysis patterns and is a term Martin Fowler came up with. He actually wrote a book called Analysis patterns. Also, he has a dedicated section on his website where he presents some of these patterns accompanied by UML diagrams.
Maybe you'll find some inspiration that will help you in modeling your domain. I've stressed the word inspiration as I think different businesses have different requirements although they operate the same domain so the solutions you might read about may not be appropriate for your problem.
There are many tools out there that do both - but they're generally not free!
Microsoft Visio does both and is extensible. For UML artefacts they come with auto generators into VB/Java template code - but you can modify them to auto-generate any code. There are many users of Visio that have created models from which to use as templates.
Artisan Enterprize is by far the most powerful UML tool (but it's not cheap).
Some would argue that Rational Rose or RUP is the better tool
But for Car-Manufacturing and other similar real world modelling, by far the best tool is Mathworks Simulink (not because it's one of the most expensive). It is by far the best tool beccause you can animate the model - you can prove the model working before generating the slik code (in whatever grammar/language/other Models you care to push it)!
You can obtain a student license for around £180; with the 'real thing' pushing £4000 (for car-related artefacts). The full product with all the trimmings is about £15k. Simulink is also extensible with a C like language though there is a .Net addin and APIs to use a plethora of other langhuages. And, just like Visio there is a world-wide forum creating saleable, shareware & freeware real world model templates. Many world-wide Auto-Manufacturers are already using Simulink.
I think that MiniQuark question is really good and will sooner or later be provided by vendors such as Omondo, Rational IBM etc... Users doesn't just need tools, they need models out of the box and just add their business rules inside an existing well defined architecture. Why to develop from scratch a new architecture if the job has already be done ? In Java we use plenty of frameworks, existing methods etc...so why not to go one level higher and reuse architecture ? It is today impossible to guess how a project will evole and new demands are coming every day. We therefore need a stable architecture which has been tested previously and is extensible. I have seen so many projects starting with a nice architecture then realizing in the middle of the project that this is not what is the best and then changing their architecture. Renaming classes, splitting classes, creating packages etc...after the first iteration it is getting a real mess. Could you imagine what we found after 10 iterations !! a total mess !!
This mess would had been avoided if using a predefined model which has been tested previously because the missing class, or package etc..would have already been created and only a class rename would be sufficient for architecture purposes. Adding business rules methods will end the codding stage before deployment test.
I think there is a confusion between patterns and the initial question which is related to UML model re usability.
There is no today any reusable model out of the box which has been developped. This is really strange but the job has never been done or never been shared.
Omondo has tried to launch an initiative without real success. I have heard that they are working on hundred of out of box models which will be open source and given for free to the community. I hope this will be done because this is really important for me and would save me a lot of time at the beginning of a project.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
What are the differences between these programming paradigms, and are they better suited to particular problems or do any use-cases favour one over the others?
Architecture examples appreciated!
All of them are good in their own ways - They're simply different approaches to the same problems.
In a purely procedural style, data tends to be highly decoupled from the functions that operate on it.
In an object oriented style, data tends to carry with it a collection of functions.
In a functional style, data and functions tend toward having more in common with each other (as in Lisp and Scheme) while offering more flexibility in terms of how functions are actually used. Algorithms tend also to be defined in terms of recursion and composition rather than loops and iteration.
Of course, the language itself only influences which style is preferred. Even in a pure-functional language like Haskell, you can write in a procedural style (though that is highly discouraged), and even in a procedural language like C, you can program in an object-oriented style (such as in the GTK+ and EFL APIs).
To be clear, the "advantage" of each paradigm is simply in the modeling of your algorithms and data structures. If, for example, your algorithm involves lists and trees, a functional algorithm may be the most sensible. Or, if, for example, your data is highly structured, it may make more sense to compose it as objects if that is the native paradigm of your language - or, it could just as easily be written as a functional abstraction of monads, which is the native paradigm of languages like Haskell or ML.
The choice of which you use is simply what makes more sense for your project and the abstractions your language supports.
I think the available libraries, tools, examples, and communities completely trumps the paradigm these days. For example, ML (or whatever) might be the ultimate all-purpose programming language but if you can't get any good libraries for what you are doing you're screwed.
For example, if you're making a video game, there are more good code examples and SDKs in C++, so you're probably better off with that. For a small web application, there are some great Python, PHP, and Ruby frameworks that'll get you off and running very quickly. Java is a great choice for larger projects because of the compile-time checking and enterprise libraries and platforms.
It used to be the case that the standard libraries for different languages were pretty small and easily replicated - C, C++, Assembler, ML, LISP, etc.. came with the basics, but tended to chicken out when it came to standardizing on things like network communications, encryption, graphics, data file formats (including XML), even basic data structures like balanced trees and hashtables were left out!
Modern languages like Python, PHP, Ruby, and Java now come with a far more decent standard library and have many good third party libraries you can easily use, thanks in great part to their adoption of namespaces to keep libraries from colliding with one another, and garbage collection to standardize the memory management schemes of the libraries.
These paradigms don't have to be mutually exclusive. If you look at python, it supports functions and classes, but at the same time, everything is an object, including functions. You can mix and match functional/oop/procedural style all in one piece of code.
What I mean is, in functional languages (at least in Haskell, the only one I studied) there are no statements! functions are only allowed one expression inside them!! BUT, functions are first-class citizens, you can pass them around as parameters, along with a bunch of other abilities. They can do powerful things with few lines of code.
While in a procedural language like C, the only way you can pass functions around is by using function pointers, and that alone doesn't enable many powerful tasks.
In python, a function is a first-class citizen, but it can contain arbitrary number of statements. So you can have a function that contains procedural code, but you can pass it around just like functional languages.
Same goes for OOP. A language like Java doesn't allow you to write procedures/functions outside of a class. The only way to pass a function around is to wrap it in an object that implements that function, and then pass that object around.
In Python, you don't have this restriction.
For GUI I'd say that the Object-Oriented Paradigma is very well suited. The Window is an Object, the Textboxes are Objects, and the Okay-Button is one too. On the other Hand stuff like String Processing can be done with much less overhead and therefore more straightforward with simple procedural paradigma.
I don't think it is a question of the language neither. You can write functional, procedural or object-oriented in almost any popular language, although it might be some additional effort in some.
In order to answer your question, we need two elements:
Understanding of the characteristics of different architecture styles/patterns.
Understanding of the characteristics of different programming paradigms.
A list of software architecture styles/pattern is shown on the software architecture article on Wikipeida. And you can research on them easily on the web.
In short and general, Procedural is good for a model that follows a procedure, OOP is good for design, and Functional is good for high level programming.
I think you should try reading the history on each paradigm and see why people create it and you can understand them easily.
After understanding them both, you can link the items of architecture styles/patterns to programming paradigms.
I think that they are often not "versus", but you can combine them. I also think that oftentimes, the words you mention are just buzzwords. There are few people who actually know what "object-oriented" means, even if they are the fiercest evangelists of it.
One of my friends is writing a graphics app using NVIDIA CUDA. Application fits in very nicely with OOP paradigm and the problem can be decomposed into modules neatly. However, to use CUDA you need to use C, which doesn't support inheritance. Therefore, you need to be clever.
a) You devise a clever system which will emulate inheritance to a certain extent. It can be done!
i) You can use a hook system, which expects every child C of parent P to have a certain override for function F. You can make children register their overrides, which will be stored and called when required.
ii) You can use struct memory alignment feature to cast children into parents.
This can be neat but it's not easy to come up with future-proof, reliable solution. You will spend lots of time designing the system and there is no guarantee that you won't run into problems half-way through the project. Implementing multiple inheritance is even harder, if not almost impossible.
b) You can use consistent naming policy and use divide and conquer approach to create a program. It won't have any inheritance but because your functions are small, easy-to-understand and consistently formatted you don't need it. The amount of code you need to write goes up, it's very hard to stay focused and not succumb to easy solutions (hacks). However, this ninja way of coding is the C way of coding. Staying in balance between low-level freedom and writing good code. Good way to achieve this is to write prototypes using a functional language. For example, Haskell is extremely good for prototyping algorithms.
I tend towards approach b. I wrote a possible solution using approach a, and I will be honest, it felt very unnatural using that code.