I am wondering if there is a tool or technique which, given a BNF grammar, adjusts it randomly(but intelligently) and generates a stream of output for use in detecting cases that slip past the BNF (but shouldn't).
edit: Fuzz testing a parser, in other words.
Thanks
Spending some tender time with Google, I found that automated grammar-based fuzz testing is hard, and a subject of current research. In particular, P. Godefroid at Microsoft Research is working on a piece of software called SAGE.
I dug up a research paper by him.
Automated Whitebox Fuzz Testing (joint work with Michael Y. Levin and David Molnar) Proceedings of NDSS'2008 (Network and Distributed Systems Security), pages 151-166, San Diego, February 2008.
I also found the XML-based Peach software, but it is unclear to me on a casual reading how I might leverage it in an afternoon of work for a non-security application.
So my conclusion is: "It's a subject of current (Apr '10) research and there's no quick-use tool out there".
Not strictly a BNF fuzzing tool, but american fuzzy lop employs artificial intelligence methods and can walk around the lack of BNF knowledge quite well. It already found bugs in many open source parsers, so it might be the right tool for yours as well.
Related
This answer shows a pretty example of using a parser generator to look through text for some patterns of interest. In that example, it's product prices.
Does anyone know of tools to generate the grammars given training examples (document + info I want from it)? I found a couple papers, but no tools. I looked through ANTLR docs a bit, but it deals with grammars; a "recognizer" takes as input a grammar, not training examples.
This is a machine learning problem. You can at best get an approximation. But I don't think anybody has done this well, let alone released a tool. (I actively track what people do to build grammars for computer languages, and this idea has been proposed many times, but I have yet to see a useful implementation).
The problem is that for any fixed set of examples, there's a huge number of possible grammars. It is easy to construct a naive one: for the fixed set of examples, simply propose a grammar that has one rule to recognize each example. That works, but is hardly helpful. Now the question is, how many ways can you generalize this, and which one is the best? In fact you can't know, because your next new example may be a total surprise in terms of structure. (Theory definition: A language is the set of sentences that comprise it).
We haven't even talked about the simpler problem of learning the lexemes of the language. How would you propose to learn what legal strings for floating point numbers are?
One tool that does this is NLTK. I Highly recommend it, and the O'Reilly book that covers it is available free online. There are tools for parsing, learning grammars, etc... The only downside is that it is mainly a research rather than production tool, so the emphasis isn't on performance.
NLTK is able to construct grammar from labeled training samples, which is exactly what you are asking. Have a look at the great docs and the book. (My last experience with it also had it working on the JVM through Jython without any issues.)
I'm thoroughly intrigued by Scheme, and have started with some toy programming examples, and am reading through Paul Graham's On Lisp.
One thing I haven't been able to find is a book or website intended to teach Scheme to "OO people", i.e. people like myself who've done 99 % of their coding in c++/Java/Python.
I see that closures are sort of object-y, in the sense that they have local state, and offer one or more functions that have access to that state. But I don't want to learn Scheme only to port my existing habits on to it. This is why I'm learning Scheme rather than Common Lisp at the moment; I fear that CLOS might just serve as a crutch to my existing OO habits.
What would be ideal is a book or website that offers case studies of problems solved in both an OO language, and also in Scheme in a Schemey way. I suppose I would most appreciate scientific computing and/or computer graphics problems, but anything would do.
Any pedagogical leads would be much appreciated.
I doubt CLOS would serve as a crutch for old habits, I found it to be pretty different from the OO style in C++/Java/Python, and very interesting. I don't understand all the details, but I would recommend Peter Seibel's Practical Common Lisp. If you are reading On Lisp without much trouble, you should be able to dive into the chapters introducing CLOS in PCL. Also, I'd recommend his Google Tech Talk comparing Java and Common Lisp.
Here's a few more recommendations to make this a more full-fledged answer:
The classic text Structure and Interpretation of Computer Programs covers quite a few examples in chapter 3 of building modular systems using closures (and addresses issues with introducing state and mutability). Chapter 2 includes some generic and data/type-directed programming which could be helpful for motivating study of CLOS. This book really needs no introduction though, it's a towering work, and I've only been reading it slowly since the spring. Highly recommended if you are interested in Scheme.
While SICP is a great book, it's not without its flaws: A really interesting look at these is the essay "The Structure and Interpretation of the Computer Science Curriculum" which elaborates on a few criticism of SICP, and is written by the authors of How to Design Programs (I haven't read HTDP but I hear it's very good). While this essay won't teach you specifically what you are looking for - comparing functional and OO programming - it is really interesting anyway. Their freshman undergraduate course starts with a first semester introduction to functional programming using Scheme (I think, PLT/Racket) and is followed by a semester of OO programming with C++ or Java... at least that's the course they describe in the essay.
These slides from Peter Norvig address some of the design patterns common in OO programming and show why they are missing or unnecessary in dynamic, functional languages like Scheme and Lisp: http://norvig.com/design-patterns/
I cautiously recommend the book by the same authors as the Little Schemer books: A Little Java, A Few Patterns. I can't say for sure if this is a really a good book or not, it was incredibly strange and there are some really bad typesetting decisions (italic, serif, variable-width, superscript doesn't belong in a text on programming), but it might be interesting to take a look at. You can probably find it cheap, anyway. Don't take this recommendation that seriously. I think it would be better to stick to the Scheme texts.
p.s. I have to disagree with one comment stating that functional programming is not as complicated at OO programming, I think that's grossly misstating it. Functional programming in all its breadth is truly mind-boggling. When you go beyond map/filter/reduce and first-class functions, and take a look at other things in the functional realm like lazy evaluation, avoiding side effects and mutation, and the strong, static-typed languages, it gets pretty interesting, and is certainly just as complicated as traditional OO programming. I've only just scratched the surface myself but have discovered a great deal of new ideas. Programming is complicated business, whether OO or functional.
Congrat you, my friend ! Love cs, love functional programming.
If you are python developer it takes 3-4 days to think in scheme
Here is the best simple tutorial I have ever met http://www.shido.info/lisp/idx_scm_e.html
I found this course http://cs.gettysburg.edu/~tneller/cs341/scheme-intro/index.html and it may be useful for you
One beginner's resource that is very helpful and geared very much toward the casual reader is "The Adventures of a Pythonista in Schemeland". It's written (obviously) from the point of view of a Python programmer taking first steps with Scheme. One especially nice thing about it is that it includes an overview of the current implementations and compatibility issues between each scheme implementation, which, unfortunately, can cause some headaches when you're just starting out.
With regards to object systems, these two documents (linked from here) give nice examples of very simple toy implementations using closures that I found helpful in understanding their use in capturing state.
If you are starting off with Scheme, have a look at How to Design Programs. This book presents the "Schemey" approach to problem solving. I don't think there is a book that compares OO and functional solutions to the same programming problems. But there is a nice presentation that shows how dynamic languages like Scheme could provide simple solutions to problems that demand complex design patterns in statically typed OOP languages.
I want to know in which applications/programming domain are most suitable for Smalltalk. Could anyone please provide me some useful links that could answer my query?
Through googling I learned that some companies use it for:
logistics and foreign trade application
desktop, server and script development
data processing and logistics, scripts and presentations
but I cant find documents/research papers that can tell me which programming domain Smalltalk-80 (or Smalltalk) is best suited.
Some of the programming domains are:
- Artificial intelligence reasoning
- General purpose applications
- Financial time series analysis
- Natural language processing
- Relational database querying
- Application scripting
- Internet
- Symbolic mathematics
- Numerical mathematics
- Statistical applications
- Text processing
- Matrix algorithms
I hope you guys can help me. I am doing this for my case study. Thanks in advance.
It's a general purpose programming language. To paraphrase Kent Pitman on the question of what Common Lisp is useful for:
...Please don't assume [Smalltalk] is only
useful for Animation and Graphics, AI,
Bioinformatics, B2B and E-Commerce,
Data Mining, EDA/Semiconductor
applications, Expert Systems, Finance,
Intelligent Agents, Knowledge
Management, Mechanical CAD, Modeling
and Simulation, Natural Language,
Optimization, Research, Risk Analysis,
Scheduling, Telecom, and Web Authoring
just because these are the only things
they happened to list.
It's particularly suited for applications that cannot have downtime - it's quite normal to patch a running server in deep ways (say, by changing the shape of your class) without taking the server down - or systems that are very complex or have rapidly changing requirements.
Smalltalk has quite substantial growth recently in web based applications, thanks to innovations and fresh approaches in Aida/Web, Iliad and Seaside Smalltalk web frameworks.
In general Smalltalk is used for most complex information systems, let me mention just two:
Finance: Kapital, a risk management in JP Morgan
Manufacturing: ControlWorks, for chip manufacturing in AMD
My goal has been to do a brain dump into software. And I have found Smalltalk to be very well suited for that. Smalltalk makes it easy to put my ideas down in code. And it provides feedback to my thinking. The ability to debug infinitely deep at any point in the execution just enhances my understand of the problem to be solved. Then it allows me to carry out my solution most naturally.
Aik-Siong Koh
I'm afraid you will get as many answers as users of Smalltalk. For some it's a "way of life" for others it's a learning process and in the end they "strand" at granddaddy of the OO languages. Some are using their smalltalk as a kind of shell to "IT-problems".
For me the answer is for application development. Now this is definitive a wide field. As you figured out it is used quite "much" in the software for economic stuff. And that is where I'm using it. I've decided to use it for my Web-Development projects which are related to "business".
The domains you named are all suitable for Smalltalk. Smalltalk shows its strengths in development for systems that are engineering-time limited, instead of hardware-limited.
The Seaside web framework allows us to create complex web applications in a fraction of the time needed in other technologies. The Gemstone object-oriented database allows us to nearly ignore persistence issues.
Smalltalk is generally a very expressive, readable, and understandable language. Whenever a large codebase is to be maintained or code needs to be understandable to non-professionals, Smalltalk shines.
»Smalltalk is a vision of the computer as a medium of self expression. … A humanistic vision of the computer as something everyone could use and benefit from. If you are going to have a medium for self expression, programability is key because unless you can actually make the system behave as you want you are a slave to what’s on the machine. So it’s really vital, and so language comes to the for because it’s through language that you express yourself to the machine.« – Elliot Miranda
You can check this link: http://www.clubsmalltalk.org/web/index.php?option=com_content&view=article&id=183&Itemid=117 this is a compilation of uses of smalltalk in latam.
perhaps another way of answering the question would be by stating what it might not be suitable for. One domain would be where you have "real" real time constraints i.e. you would need to control the garbage collector from kicking off. If I recall IBM's (OTI) Smalltalk embedded had a mechanism for turning off the gc, but IBM dropped that a while ago. The other domain I have not seen much of is cell phone apps. As far as I know none of the viable Smalltalk's can run on Android but that may change. One hears of folks in Squeak/Pharo working on that. I would love to see ST running well on Android. I think that the Android tablet market will be a hot one.
I should conclude by saying that in all the years I have been coding in ST i.e. since 94, I have seen Smalltalk in just about everything else.
I cant find documents/research papers that can tell me which programming domain Smalltalk-80 (or Smalltalk) is best suited.
This is because Smalltalk is not a domain-specific language, but a general purpose language.
Things it has been used for in the past:
- as the operating system system language for personal computers
- writing rich multimedia and near real-time applications, such as sound synthesisers
- very large corporate and government data processing systems, such as the UK's Home Office Large Matter Enquiry System, or many of JPMorgan Chase's financial trading systems
- web applications, such as DabbleDB
- creating complicated development tools, such as IBM's VisualAge IDE
- experimenting and prototyping applications in early-stage development
Generally speaking Smalltalk shines where the systems are complex, development speed is a key factor, and maintainability is going to be a key factor.
I use Smalltalk to create applications to control, manage and distribute multi-platform JavaScript webapps.
I'm looking for code or a product or a service to do semantic analysis of text (sentences and or paragraphs) to categorize the text by general topic, e.g.
Finance
Entertainment
Technology
Business
Art
etc...
If you have a bunch of examples that have already been categorised, you can use these to train a classifier.
This is a very simple document classfication problem, and any suite of machine learning tools will have the algorithms and tutorials for this. For instance, check out weka: http://www.cs.waikato.ac.nz/ml/weka/
or rapidminer: http://rapid-i.com/content/blogcategory/38/69/
If your needs are limited, and you just want a simple API, you cannot go wrong with this Naive Bayes library: https://ci-bayes.dev.java.net/
Good luck!
If you want to evaluate a commercial service API, check out the VIKI engine APIs:
http://www.softwareevolution.it/en/products/viki-core-api.html
It is an easy to use Json service api with specific semantic features.
Would this be of any help to you?
http://en.wikipedia.org/wiki/Document_classification
It's not a finished product or service, neither code, but it describes the various algorithms that can be used for semantic analysis. Googling on a bit further, I believe that it's not really out of the laboratory yet. People are experimenting with KNN algorithms mostly, resulting in cool stuff, but not really what you need:
http://www.ebi.ac.uk/webservices/whatizit/info.jsf
But if there is some software that will do what you ask, it would be in this list:
http://www.kdnuggets.com/software/text.html
For example the LPU program, it seems to be able to learn if you feed it enough teaching documents.
http://www.cs.uic.edu/~liub/LPU/LPU-download.html
If you're into Python/interpreted languages, check out the excellent NLTK framework at nltk.org. It has an excellent how to page and a recently published O'Reilly book.
If you're into Java and/or require a more mature but harder to grasp framework, try GATE instead.
Does anyone know where to find good online resources with examples of how to make grammars and parse trees? Preferably introductory materials.
Info that is n00b friendly, haven't found anything good with Google myself.
Edit: I'm thinking about theory, not a specific parser software.
Not online, but maybe you should take a look at Compilers: Principles, Techniques, and Tools (2nd Edition) by Aho et al. This is a standard text that has been evolving for 30 years (if you count the 1st Dragon Book, published in 1977
Well, here's where I learned it...
http://www.cs.uiuc.edu/class/sp08/cs273/
Click on the lectures tag, scroll through till you find the lectures on the material you are talking about.
Love my alma mater. God bless them, they never take down their lectures in any class and you can go and read any of them anytime you want.
edit: Looks like you want lecture11
Antlr?
http://www.antlr.org/
Has a quite good IDE for designing a grammar, and a lot of generators for different languages.
www.goldparser.com
The tools are free and good to work on. It has technical and theoretical tutorials, lots of info, tools and code generators for many langs.
in C,C++ use lex and bison
in java use ANTLR
this is a beautiful antlr video tutorial