need some help!
I'm trying to write some code in objective-c that requires part-of-speech tagging, and ideally also named entity recognition. I don't have much interest in "rolling my own", so I'm looking for a decent library to use for this purpose. Obviously the more accurate the better, but we're not talking anything critical here -- so as long as it's generally pretty accurate that's good enough.
It's going to be English-only, at least for the time being, but I don't want to have to do any training of models myself. So whatever the solution, it has to have an English language model already built.
And finally, it has to be available via a commercial-friendly license (e.g. BSD/Berkeley, LGPL). Can't do GPL or anything restrictive like that, though I'm open to paying a small amount for a commercial license if that's the only option.
C, C++ or Obj-C code is all fine.
So: Anyone familiar with something that'd do the trick here? Thanks!!
I suggest you check out the iOS 5 beta release notes.
As you've probably figured out most of the NLP code that's freely available is in python, perl or java. However, a quick look at Stanford's NLP tools page shows a few things in C/C++ that are available. Another list of tools can be found at a blog post.
Of the POS taggers, YamCha is well-known, though I have not used it myself (being a java/python/perl guy).
Unfortunately, I cannot suggest any NER nlp tools. However, I bet there's a maxent or svm implentation in C/C++ that you can work with:
1) create your training data and annotate it
2) define your features
3) use the ml library
Sorry I can't be of more help, but if anything else comes to mind I'll add it.
Maybe once I figure out objective-c to a respectable degree I'll write an NLP library for it!
This answer shows a pretty example of using a parser generator to look through text for some patterns of interest. In that example, it's product prices.
Does anyone know of tools to generate the grammars given training examples (document + info I want from it)? I found a couple papers, but no tools. I looked through ANTLR docs a bit, but it deals with grammars; a "recognizer" takes as input a grammar, not training examples.
This is a machine learning problem. You can at best get an approximation. But I don't think anybody has done this well, let alone released a tool. (I actively track what people do to build grammars for computer languages, and this idea has been proposed many times, but I have yet to see a useful implementation).
The problem is that for any fixed set of examples, there's a huge number of possible grammars. It is easy to construct a naive one: for the fixed set of examples, simply propose a grammar that has one rule to recognize each example. That works, but is hardly helpful. Now the question is, how many ways can you generalize this, and which one is the best? In fact you can't know, because your next new example may be a total surprise in terms of structure. (Theory definition: A language is the set of sentences that comprise it).
We haven't even talked about the simpler problem of learning the lexemes of the language. How would you propose to learn what legal strings for floating point numbers are?
One tool that does this is NLTK. I Highly recommend it, and the O'Reilly book that covers it is available free online. There are tools for parsing, learning grammars, etc... The only downside is that it is mainly a research rather than production tool, so the emphasis isn't on performance.
NLTK is able to construct grammar from labeled training samples, which is exactly what you are asking. Have a look at the great docs and the book. (My last experience with it also had it working on the JVM through Jython without any issues.)
I have a game I wrote in Actionscript 3 I'm looking to port to iOS. The game has about 9k LOC spread across 150 classes, most of the classes are for data models, state handling and level generation all of which should be easy to port.
However, the thought of rejiggering the syntax by hand across all these files is none too appealing. Are there tools that can help me speed up this process?
I'm not looking for a magical tool here, nor am I looking for a cross compiler, I just want some help converting my source files.
I don't know of a tool, but this is the way I'd try and attack your problem if there really is a lot of (simple) code to convert. I'm sure my suggestion is not that useful on parts of the code that are very flash-specific (all the DisplayObject stuff?) and also not that useful on lots of your logic. But it would be fun to build! :-)
Partial automatic conversion should be possible, especially if the objects are just 'data containers', watch out for bringing too much as3-idiom over to objective-c though, it might not always be a good fit.
Unless you want to create your own (semi) parser for as3 you'd need some sort of a parser, apparently FlexPMD has one (never used it), and there probably are others.
After getting your hands on a parser you have to find some way of suggesting to the system what parts could be converted automatically. You could try and add rules to the parser/generator script for the general case. For more specific cases I'd use custom metadata on the actual class/property/method, assuming a real as3 parser would correctly parse those.
Now part of your work will shift from hand-converting files to hand-annotating files, but that might be ok for you.
Have the parser parse your classes and define actions based on your metadata that will determine what kind of objective-c class to generate. If you get this working it could at least get you all your classes, their simple properties and method signatures (getting the body of the methods converted might be a bit too much to ask but you could include it as a comment so you'd have a nice reference while hand-translating).
PS: if you make this into a one way process be very sure you don't need to re-generate it later - it would be bad if you find out that you have been modifying the generated code and somehow need to re-generate all those classes -- that would mean you'll have to redo all your hard work!
I've started putting a tool together to take the edge off the menial aspects of this process.
I'm trying to figure out if there's enough interest to make it clean and stable enough to release for others to use. I may just do it anyway.
http://meanwhileatthelab.blogspot.com.au/2012/08/automating-process-of-converting-as3-to.html
It's so far saving me a lot of time while porting one of my fairly large games from AS3 to objc.
Check out the Sparrow Framework. It's purported to be designed with Actionscript developers in mind, recreating classes that sort of emulate display list and things like that. You'll have to dive into some "rejiggering" for sure no matter what you do if you don't want to use the CS5 packager.
http://www.sparrow-framework.org/
even if some solution exists, note that architectural logic is DIFFERENT, and many more other details.
Anyway even if posible, You will have a strange hybrid.
I am coming back from WWDC2012, and the message is (as always..) performance anf great user experience.
So You should rewrite using a different programming model.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
I have always liked the documentation on Java APIs, generally speaking, but I know some people consider them lacking. So I'm wondering, what do you consider a good example of API documentation?
Please, include a link or an actual example in any answer. I want to have references that I (and others, of course) can use to improve our own documents.
A good documentation MUST have:
datatypes specs - often more essential than actual functions. Do NOT treat this lightly.
function specs (this is obvious). Including What given function does, why it does it (if not obvious), and caveats if any.
an introduction document that binds the whole into a logical entity, explaining the intentions, correct usage patterns and ideas beyond the scope of actual API code. Normally you are given 50 different functions and you don't know which must be used, which shouldn't be used outside of specific cases, which are recommended to more obscure alternatives and why must they be used that way.
examples. Sometimes they are more important than all the rest
I know how to draw an arbitrary shape of arbitrary color in GTK+. I still have no clue why a change of drawing color requires three quite long lines of very obscure, quite unintuitive lines of code. Remembering SVGAlib's setcolorRGB(r,g,b); draw(x1,y1,x2,y2); I find it really hard to comprehend what possessed the authors of GTK+ to complicate things so much. Maybe if they explained the underlying concepts instead of just documenting functions that use them, I'd understand...
Another example: yesterday I got an answer that allowed me to understand SQLite. I understood a function extracting data from a column returns signed long long. I understood the integer columns could be 1,2,4,6 and 8 bytes long. I understood I can define a column as "UNSIGNED INT8", or "TINYINT". I didn't quite get what "affinity" meant, I just knew both had "INTEGER" affinity. I spent hours seeking whether timestamps should be UNSIGNED INTEGER or INT8, whether INT8 is 8-digits or 8-bytes, and what is the name of that esoteric 6-byte int?
What I missed was that "UNSIGNED INT8", "TINYINT" and the like are all a syntactic sugar synonyms for "INTEGER" type (which is always signed long long), and the lengths given are for internal disk storage only, are adjusted automatically and transparently to fit any value on least number of bits and are totally invisible and inaccessible from the API side.
Actually the iPhone (really Mac Cocoa/framework) documentation has gotten pretty good. The features I like are:
Very easy jump to docs from the API.
Well formatted and the code snippets
you would want to copy and paste
(like method signatures) stand out.
Links to projects with sample code
right from the docs.
Automated document refresh mechanism,
but by default docs are all local to
start (so you can live with a flaky
internet connection).
Easy way to switch between variants
of documentation (to see different
versions of the OS), and also select
which sets of documentation to run
searches against.
An overview section explains what the
class is for, followed by a section
breaking out methods grouped by
purpose (methods to create and
object, methods to query for data,
methods to work with type
conversions, etc), followed by the
detailed method explanations.
I also personally really liked Javadoc and the Java system documentation (I used that for many years), I found a benefit there was it was a little easier to make your own custom docs for your own classes that flowed well with the system docs. XCode lets you also use Doxygen to generate documentation for your own classes, but it would take a but more work to format it as well as the system class docs, in part because the system framework documents have more formatting applied.
A good API will have the following characteristics:
Easy to learn
Easy to use, even without documentation
Hard to misuse
Easy to read and maintain code that uses it
Sufficiently powerful to satisfy requirements
Easy to extend
Appropriate to audience
The most common mistake I see in API design is when developers feel auto-generated XML commenting is sufficient, and then precede to auto-generate their API based off of the XML comments. Here's what I'm talking about:
///<summary>
/// Performs ObscureFunction to ObscureClass using ObscureArgument
///</summary>
void ObscureClass.ObscureFunction(ObscureArgument) { ... }
API's like the one above are only counter-productive and frustrate the developer using the API. Good API documentation should give developers hints as to how to use API and give them insight into certain facets of the API they otherwise would not notice.
I personally believe a perfect example of good documentation is PHP's documentation:
For an example:
http://www.php.net/manual/en/function.fopen.php
I think effective documentation includes:
Parameter listing
(Useful) description of the parameter
If they parameters are a string, list
out and EXPLAIN every possible
possible parameter
Return values on both successful
execution and non-successful
execution
Any exceptions/errors it can raise
Examples (THE MOST IMPORTANT imo)
Optionally:
Changelog
Notes/Examples from other users
Whenever I look up something in the PHP documentation I almost know exactly how to use it without having to scour the internet to find "better" examples. Usually the only time which I need to search the internet is when I need to find how to use a set of functions for a specific purpose. Otherwise, I think the PHP documentation is the greatest example of excellent documentation.
What is think is an example of a alright documentation is Python's:
http://docs.python.org/py3k/library/array.html
It lists out the methods but it doesn't do a good job of actually explaining in depth what it is, and how to use it. Especially when you compare it to the PHP docs.
Here is some really bad documentation: Databinder Dispatch. Dispatch is a Scala library for HTTP that abstracts away the (Java) Apache Commons HTTP library.
It uses a lot of functional-syntax magic which not everyone is going to be very clear on, but provides no clear explanation of it, nor the design decisions behind it. The Scaladocs aren't useful because it isn't a traditional Java-style library. To really understand what is going on, you basically have to read the source code and you have to read a load of blog posts with examples.
The documentation succeeds in making me feel stupid and inferior and it certainly doesn't succeed in helping me do what I need to do. The flipside is most of the documentation I see in the Ruby community - both RDoc and in FAQs/websites/etc. Don't just do the Javadoc - you need to provide more comprehensive documentation.
Answer the question: "how do I do X with Y?" You may know the answer. I don't.
My main criteria is - tell me everything I need to know and everything I'll ever want to know.
QT has pretty decent docs:
http://doc.qt.digia.com/4.5/index.html
Win32 MSDN is also pretty good although it didn't age well.
The java docs are horrible to me. They constantly tell me everything I don't want to know and nothing of what I do want to know. The .NET docs has a similar tendency although the problem there is mostly the extreme wordyness, overflow of so much superfluous details and so much god damn pages. Why can't I see both the summary and the methods of a class in the same page?
I like Twitter's documentation. To me a good API is up to date, easy to read and contains examples.
I think that a good API document needs to clearly explain:
What problem this API solves
When you should use it
When you shouldn't use it
Actual code showing "best practice" usage of the API
Not quite API documentation but nevertheless quite useful is the Oracle database documentation, e.g. for the SELECT statement. I like the inclusion of diagrams which helps to clarify the usage for example.
Just a few thoughts...
Examples - win32 API documentation is better than iPhone's because of:
(short) code examples
I vote for any API doc with small and make-sense examples
Don't ever never show "Form1", "asdf", "testing users" in screen shots or sample codes
good API is solving real world problems and there should be some meaningful examples
Don't auto-gen doc
documentation should not be done during writing code (or by the same guy)
doc is for a stranger, whom the programmers usually don't care of
Avoid ___V2 version of API
but it's not a doc issue
Basically, tell the story of the class at the class level. Why is this here? What should it do? What should be in here? Who wrote it?
Tell the story of methods at the method level. What does this do? No matter how accurate your methods names are, 20-30 characters just won't always cut it for descriptiveness.
#author:
Who wrote this? Who's proud of it? Who should be ashamed of their work?
Interface level documentation tells me:
what should this do?
what will it return?
Implementation level documentation tells me:
how does it do it? what kind of algorithm? what sort of system load?
what conditions might cause a problem? will null input cause an issue? are negative numbers okay?
Class level documentation tells me:
what goes here? what kind of methods should I expect to find?
what does this class represent?
#Deprecated tells me:
why is this planned for removal?
when is it expected to be removed?
what is the suggested replacement?
If something is final:
why didn't you want me to extend this?
If something is static:
remind me in the class level doc, at least implicitly.
In general: you're writing these for the next developer to use if and when you hit the lottery. You don't want to feel guilty about quitting and buying a yacht, so pay a bit of attention to clarity, and don't assume you're writing for yourself.
As the side benefit, when someone asks you to work with the same code two years from now and you've forgotten all about it, you're going to benefit massively from good in-code documentation.
First point for a great API-documentation is a good naming of the API itself. The names of methods and parameters should be say all. If the language in question is statically typed, use enums instead of String- or int-constants as parameters, to select between a limited set of choices. Which options are possible can now be seen in the type of the parameter.
The 'soft-part' of documentation (text, not code) should cover border-cases (what happens if I give null as parameter) and the documentation of the class should contain a usage-example.
Good documentation should have at least the following:
When an argument has additional limitations beyond its type, they need to be fully specified.
Description of the [required] state of an object before calling the method.
Description of the state of an object after calling the method.
Full description of error information provided by the method (return values, possible exceptions). Simply naming them is unacceptable.
Good example: Throws ArgumentOutOfRangeException if index is less than 0 -or- index is greater than or equal to Count.
Bad example: Returns 0 for success or one of the following E_INVALIDARG, etc... (without specifying what makes an argument invalid). This is standard "FU developer" approach taken in the PS3 SDK.
In addition, the following are useful:
Description of the state of an object if an exception is thrown by the method.
Best practices regarding classes and groups of classes (say for exceptions in .NET) in the API.
Example usage.
Based on this:
An example of great documentation is the MSDN library.
To be fair, the online version of this does suffer from difficulty of navigation in cases.
An example of terrible documentation is the PS3 SDK. Learning an API requires extensive testing of method arguments for guessing what may or may not be the actual requirements and behavior of any given method.
IMO examples are the best documentation.
I really like the Qt4 Documentation, it first confronts you only with the essential information you need to get things working, and if you want to dig deeper, it reveals all the gory details in subsections.
What I really love, is the fact that they built the whole documentation into Qt Creator, which provides context sensitive help and short examples whenever you need them.
One thing I've always wanted to see in documentation: A "rationale" paragraph for each function or class. Why is this function there? What was it built for? What does it provide that cannot be achieved in any other way? If the answer is "nothing" (and surprisingly frequently it is), what is it a shorthand for, and why is that thing important enough to have its own function?
This paragraph should be easy to write - if it's not, it's probably a sign of a dubious interface.
I have recently come across this documentation (Lift JSON's library), which seems to be a good example of what many people have asked for: nice overview, good example, use cases, intent, etc.
i like my documentation to have a brief overview at the top, with fully featured examples below, and discussions under these! I'm surprised that few include simple function arguments with their required variable types and default values, especially in php!
I'm afraid i can't really give an example because i havent trawled through to find which ones my favourite, however i know this probably doesn't count because its unofficial but Kohana 3.0's Unofficial Wiki By Kerkness is just brilliant! and the Kohana 2.34 documentation is pretty well laid out too, well at least for me. What do you guys think?
Most people have listed the points making up good API documentation, so I am not going to repeat those (data type specs, examples, etc.). I'm just going to provide an example which I think illustrates how it should be done:
Unity Application Block (Go to the Download section for the CHM)
All the people involved in this project have done a great job of documenting it and how it should be used. Apart from the API reference and detailed method description, there are a lot of articles and samples which give you the big picture, the why and how. The projects with such good documentation are rare, at least the ones I use and know about.
The only criteria for documentation quality is that it speeds up development. If you need to know how something works, you go and read docs. One doc is better than another if you've understood everything from first doc faster than from from second.
Any other qualities are subjective. Styles, cross-references, descriptions… I know people who likes to read books. Book-styled doc (with contents/index/etc.) will be good for him. Another my friend likes to doc everything inside code. When he downloads new library, he gets sources and "reads" them instead of docs.
I, personally, like JavaDocs. Like Apple dev docs with the exception of lower-level parts, for example, Obj-C runtime (reference part) is described awfully. Several website APIs have docs I like also.
Don't like MSDN (it's good in general but there are too many variants of the same document, I get lost often).
Documentation is only a part of the big picture, API design. And one could argue the latter is much more important than just the naming. Think of meaningful non-duplicating method names, etc.
I would definitely recommend watching Josh Bloch's presentation about this:
http://www.infoq.com/presentations/effective-api-design OR http://www.youtube.com/watch?v=aAb7hSCtvGw
This covers not only what you're looking for but much more.
Lots of practical, real-world examples are a must. The recent rewrite of jQuery's API documentation is a good example, as well as Django's legendary docs.
The best documentation I've found is Python. You can use sphinx to generate the source documentation into HTML, LaTeX and others, and also generate docs from source files; the API doc you are looking for.
API docs is not only the quality of the final documentation, but also how easy is for the developers and/or technical writers to actually write it, so pick a tool that make the work easier.
Most things about good documentation have already been mentioned, but I think there is one aspect about the JavaDoc way of API documentation that is lacking: making it easy to distinguish between the usage scenarios of all the different classes and interfaces, especially distinguishing between classes that should be used by a library client and those that should not.
Often, JavaDoc is pretty much all you get and usually there is no package documentation page. One is then confronted with a list of hundreds or even more of classes: where and how to start? What are typical ways of using the library?
It would be good if there were conventions of how to make it easy to provide this information as part of JavaDoc. Then the generated API documentation could allow for different views for different groups of people -- at a minimum two groups: those who implement the library and those who use it.
I find Google APIs a beautiful example of Good documentation API.
They have:
Bird's eyes view of the entire APIs structure
Overviews of the main features of the single API
Nice and colored examples for a quick feedback
Detailed references
A blog that keep you updated
A google groups that documents problems and solutions
Videos
FAQ
Articles
Presentations
Code Playground
A search engine to crawl inside a pile of documentation
That's it!
When I play with google APIs documentation site, I feel at home.
Go to the Doxygen site and look at the examples of the HTML that it generates. Those are good:
http://www.doxygen.nl/results.html
I'm looking for a D template library to take an arbitrary variable and marshal it into a transportable bundle. The variable might be a basic value type (int, char[], real) or might be a struct or class and even might contain or be a reference type. A system that can do this without any per type help would be nice but I suspect that it's to much to ask so I'd be happy with something that uses light weight annotations.
If nothing like that exists suggestions on how to structure it would be nice. I can think of a few ways to do the sterilization but I'm not sure how to specify the annotations.
Background: After trying to use ASMX and WCF web services and not likening them I'm felling like I want to try my hand at the RPC problem.
edit: BTW I don't care to much what the format in the middle is (XML, JASON, YAML, binary) as long as it's portable.
Have a look at Google Protocol Buffers. Maybe you can use the C++ or C bindings directly, or write D bindings yourself.
Here's a basic one I wrote for D 1.x. It was written quite some time ago, so it might be possible to improve it, but it does work. The actual format is basically network byte-order binary, so it should be safe to store and transfer the bytes.
http://gist.github.com/100885
It doesn't support classes or arbitrary pointers. To do that properly, you'd want something which memorised what references it had already serialised. If you restrict yourself to value types, arrays and AAs, it'll do the job.
If you do want to extend it to support classes, my advice would be to require toStream and fromStream methods to be defined.
I recommend you write your own, as it's a useful exercise in templating and helps you adapt the serialization format to your specific requirements.
You might want to take a look at tools.serialize (http://dsource.org/projects/scrapple/browser/trunk/tools/tools/serialize.d) as a starting point.
[edit] SORRY! Didn't realize it was you! :D