Open Source Authentication Classifier

I understand that authentication is essentially a classification problem with hundreds or thousands of classes (one per individual). Is there an open source classifier that can be used for this task?

I'm not sure about the authentication part, but I'll answer anyway.
Weka is popular and I have used it before. I've never used it for such a large number of classes, but that doesn't mean it can't be done!
Weka: http://www.cs.waikato.ac.nz/ml/weka/
It has a GUI and a really easy-to-use Java library if you'd rather call it from code. I don't know exactly what you're looking for, as you haven't really said, but this is what I suggest!
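For what it's worth, the basic Java usage looks roughly like this. This is only a sketch: the file name, the choice of SMO (an SVM) as the classifier, and the assumption that the class attribute is the last one are placeholders for whatever your data actually looks like.

```java
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AuthClassifierSketch {
    public static void main(String[] args) throws Exception {
        // Load a dataset in ARFF format; "users.arff" is a placeholder file name.
        Instances data = DataSource.read("users.arff");
        // Assume the last attribute is the class label (the user's identity).
        data.setClassIndex(data.numAttributes() - 1);

        // Train an SVM; any Weka classifier can be dropped in here.
        Classifier clf = new SMO();
        clf.buildClassifier(data);

        // 10-fold cross-validation to estimate accuracy across the many classes.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new SMO(), data, 10, new java.util.Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```

With that many classes you'll mostly be fighting data volume and feature quality rather than the library itself.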
Good luck!

Related

Part-Of-Speech tagging and Named Entity Recognition for C/C++/Obj-C

Need some help!
I'm trying to write some code in objective-c that requires part-of-speech tagging, and ideally also named entity recognition. I don't have much interest in "rolling my own", so I'm looking for a decent library to use for this purpose. Obviously the more accurate the better, but we're not talking anything critical here -- so as long as it's generally pretty accurate that's good enough.
It's going to be English-only, at least for the time being, but I don't want to have to do any training of models myself. So whatever the solution, it has to have an English language model already built.
And finally, it has to be available via a commercial-friendly license (e.g. BSD/Berkeley, LGPL). Can't do GPL or anything restrictive like that, though I'm open to paying a small amount for a commercial license if that's the only option.
C, C++ or Obj-C code is all fine.
So: Anyone familiar with something that'd do the trick here? Thanks!!
I suggest you check out the iOS 5 beta release notes.
As you've probably figured out, most of the NLP code that's freely available is in Python, Perl, or Java. However, a quick look at Stanford's NLP tools page shows a few things available in C/C++. Another list of tools can be found in a blog post.
Of the POS taggers, YamCha is well known, though I have not used it myself (being a Java/Python/Perl guy).
Unfortunately, I cannot suggest any NER tools for C/C++ either. However, I bet there's a maxent or SVM implementation in C/C++ that you can work with (a rough sketch of where this ends up is shown below the list):
1) create your training data and annotate it
2) define your features
3) use the ml library
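For comparison (and I realize this is Java, not C/Obj-C), here is roughly what the end of that process looks like when you use an existing toolkit such as OpenNLP with one of its pre-trained English models. Treat it purely as an illustration of the shape of API you'd be building towards; the model file name below is just OpenNLP's standard person-name model.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.Span;

public class NerSketch {
    public static void main(String[] args) throws Exception {
        // "en-ner-person.bin" is a pre-trained English model distributed by the OpenNLP project.
        try (InputStream in = new FileInputStream("en-ner-person.bin")) {
            TokenNameFinderModel model = new TokenNameFinderModel(in);
            NameFinderME finder = new NameFinderME(model);

            String[] tokens = {"Barack", "Obama", "visited", "Paris", "."};
            // Returns spans over the token array recognized as person names.
            for (Span span : finder.find(tokens)) {
                System.out.println(span + " -> " + tokens[span.getStart()]);
            }
        }
    }
}
```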
Sorry I can't be of more help, but if anything else comes to mind I'll add it.
Maybe once I figure out objective-c to a respectable degree I'll write an NLP library for it!

Experiences with using Alloy in real-world projects

I have been interested in formal methods for some time. I have used formal methods to reason about some very specific sub-areas of a few projects I have been working on, but I was never able to convince other team members to try the same, let alone specify an entire domain with a formal method.
One method I have found particularly interesting is Alloy. I think that it may "scale" better as a foundation for an entire project because it is conceptually and notationally very close to actual programming languages. Furthermore, the tools are quite solid, so the benefits of model verification are readily available.
I'd be very much interested to hear about any real-world experiences you folks might have had with using Alloy in your projects. Do you feel that it has helped you design a better domain model? Did you find errors in your domain model during verification? Would you use it again?
I've used Alloy on a few projects and have found it helpful; on some but not all of those projects I have been able to persuade others involved to use Alloy as well, or at least to work with the Alloy models I wrote. These projects may or may not be what you have in mind in asking for 'real-world' projects, but they certainly took place in the part of the real world I work in.
In 2006 and 2007 I created a partial Alloy model for the then-current draft of the W3C XProc specification; as far as I could tell, most members of the working group never read the paper I wrote (at http://www.w3.org/XML/XProc/2006/12/alloy-models/models.html); they said "Oh, we changed that part of the spec last week, so what the model says is no longer relevant". But the paper did manage to persuade the editor of the spec that the abstract 'component' level described in the first draft of the spec was woefully underspecified and needed to be either fully specified or dropped. He dropped it, with (I think) good results for the readability and usability of the spec.
In 2010 I made an Alloy model of the XPath 1.0 data model, which uncovered some glitches in the specification. The reaction of most interested parties (including the W3C working group responsible for maintaining the XPath 1.0 spec) has, unfortunately, not been encouraging.
A research project I'm involved with has used Alloy to model the MLCD Overlap Corpus, a collection of sample documents and related information we are creating (hyperlinks suppressed at SO's insistence); the Alloy model found a couple of errors in our initial design for the corpus catalog, so it was well worth the effort.
And we have also used Alloy to formalize some modeling work we have done on the nature of transcription and on the extension of the type/token distinction to document structure (for our paper, look for the 2010 proceedings of Balisage: The Markup Conference). This lies a little bit outside Alloy's usual area of application, as it has nothing to do with software design, but Alloy's ability to check models for consistency and generate instances has been invaluable in showing us some of the logical consequences of this or that possible axiom for our model.
To answer your specific questions: yes, Alloy has helped me specify cleaner domain models, and yes, it has found errors and glitches. They have often been small, for the reasons Daniel Jackson explains in his book Software Abstractions: first, if you use models during design, you catch errors early, when everything is still small. And, second (in Jackson's words), "In hindsight, most software design issues are trivial."
He continues: "But if you don't address them head-on, trivial issues have a nasty habit of becoming nontrivial." My experience amply confirms this. Much better to head off such problems early. So yes, I will use Alloy again.
Yes, I've used Alloy and its cousins industrially. Alloy has been most helpful in convincing me that my models weren't wildly wrong, or rather, in showing me where they were wrong and gave rise to silly results. Other, more specific tools, like Song's Athena and Guttman and Ramsdell's CPSA, have been more useful in their narrower domains. What more would you like to hear about?
Belatedly adding to this thread... Eunsuk Kang has recently applied Alloy to perform security analyses of web APIs for some start-ups (following many applications of Alloy in security, such as Apurva's analysis of OAuth and Barth et al.'s analysis of browser-based security mechanisms for CSRF, etc.); Pamela Zave has been working on an impressive analysis of Chord, a peer-to-peer storage system, and has recently written up a fix to the original algorithm.

How do you write good highly useful general purpose libraries?

I asked this question about Microsoft .NET Libraries and the complexity of its source code. From what I'm reading, writing general purpose libraries and writing applications can be two different things. When writing libraries, you have to think about the client who could literally be everyone (supposing I release the library for use in the general public).
What kind of practices, theories, or techniques are useful when learning to write libraries? Where do you learn to write code like that in the .NET libraries? This looks like a "black art" which I don't know much about.
That's a pretty subjective question, but here's one objective answer. The Framework Design Guidelines book (be sure to get the 2nd edition) is a very good book about how to write effective class libraries. The content is very good, and the often dissenting annotations are thought-provoking. Every shop should have a copy of this book available.
You definitely need to watch Josh Bloch in his presentation How to Design a Good API & Why it Matters (1h 9m long). He is a Java guru, but library design and object orientation are universal.
One piece of advice often ignored by library authors is to internalize costs. If something is hard to do, the library should do it. Too often I've seen the authors of a library push something hard onto the consumers of the API rather than solving it themselves. Instead, look for the hardest things and make sure the library does them or at least makes them very easy.
I will be paraphrasing from Effective C++ by Scott Meyers, which contains some of the best advice I have found on this:
Adhere to the principle of least astonishment: strive to provide classes whose operators and functions have a natural syntax and an intuitive semantics. Preserve consistency with the behavior of the built-in types: when in doubt, do as the ints do.
Recognize that anything somebody can do, they will do. They'll throw exceptions, they'll assign objects to themselves, they'll use objects before giving them values, they'll give objects values and never use them, they'll give them huge values, they'll give them tiny values, they'll give them null values. In general, if it will compile, somebody will do it. As a result, make your classes easy to use correctly and hard to use incorrectly. Accept that clients will make mistakes, and design your classes so you can prevent, detect, or correct such errors.
Strive for portable code. It's not much harder to write portable programs than to write unportable ones, and only rarely will the difference in performance be significant enough to justify unportable constructs.
Even programs designed for custom hardware often end up being ported, because stock hardware generally achieves an equivalent level of performance within a few years. Writing portable code allows you to switch platforms easily, to enlarge your client base, and to brag about supporting open systems. It also makes it easier to recover if you bet wrong in the operating system sweepstakes.
Design your code so that when changes are necessary, the impact is localized. Encapsulate as much as you can; make implementation details private.
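To make "easy to use correctly and hard to use incorrectly" concrete, here is a tiny, hypothetical Java illustration; the class and its API are invented for the example rather than taken from any real library.

```java
/**
 * Hypothetical value type illustrating Meyers's advice: bad values are rejected
 * at construction, the object is immutable afterwards, and the representation
 * stays private so it can change without breaking callers.
 */
public final class Percentage {
    private final double value; // implementation detail, kept private

    private Percentage(double value) {
        this.value = value;
    }

    public static Percentage of(double value) {
        // Reject invalid input at the boundary instead of letting it propagate.
        if (value < 0.0 || value > 100.0) {
            throw new IllegalArgumentException("percentage must be in [0, 100]: " + value);
        }
        return new Percentage(value);
    }

    public double asFraction() {
        return value / 100.0; // intuitive semantics: 50% -> 0.5
    }

    @Override
    public String toString() {
        return value + "%";
    }
}
```

Once an instance exists, there is no way to put it into an invalid state, which is exactly the property you want a library type to have.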
Edit: I just noticed I very nearly duplicated what cherouvim had posted; sorry about that! But it turns out we're linking to different speeches by Bloch, even if the subject is exactly the same. (cherouvim linked to a December 2005 talk, I to a January 2007 one.) Well, I'll leave this answer here; you're probably best off watching both and seeing how his message and way of presenting it have evolved :)
FWIW, I'd like to point to this Google Tech Talk by Joshua Bloch, who is a greatly respected guy in the Java world, and someone who has given speeches and written extensively on API design. (Oh, and designed some exceptionally good general purpose libraries, like the Java Collections Framework!)
Joshua Bloch, Google Tech Talks, January 24, 2007: "How To Design A Good API and Why it Matters" (the video is about 1 hour long)
You can also read many of the same ideas in his article Bumper-Sticker API Design (but I still recommend watching the presentation!)
(Seeing you come from the .NET side, I hope you don't let his Java background get in the way too much :-) This really is not Java-specific for the most part.)
Edit: Here's another 1½-minute bit of wisdom by Josh Bloch on why writing libraries is hard, and why it's still worth putting effort into them (economies of scale), in response to a question wondering, basically, "how hard can it be". (It's part of a presentation about the Google Collections library, which is also totally worth watching, but more Java-centric.)
Krzysztof Cwalina's blog is a good starting place. His book, Framework Design Guidelines: Conventions, Idioms, and Patterns for Reusable .NET Libraries, is probably the definitive work for .NET library design best practices.
http://blogs.msdn.com/kcwalina/
The number one rule is to treat API design just like UI design: gather information about how your users really use your UI/API, what they find helpful and what gets in their way. Use that information to improve the design. Start with users who can put up with API churn and gradually stabilize the API as it matures.
I wrote a few notes about what I've learned about API design here: http://www.natpryce.com/articles/000732.html
I'd start looking more into design patterns. You're probably not going to find much use for some of them, but as you get deeper into your library design the patterns will become more applicable. I'd also pick up a copy of NDepend, a great code-measuring utility which may help you decouple things better. You can use the .NET libraries as an example, but personally I don't find them to be great design examples, mostly due to their complexity. Also, start looking at some open source projects to see how they're layered and structured.
A couple of separate points:
The .NET Framework isn't a class library. It's a Framework: a set of types meant not only to provide functionality, but to be extended by your own code. For instance, it provides you with the Stream abstract class and with concrete implementations like the NetworkStream class, but it also provides the WebRequest class and the means to extend it. That way WebRequest.Create("myschema://host/more") can produce an instance of your own class deriving from WebRequest, which can have its own GetResponse method returning your own class derived from WebResponse, whose GetResponseStream in turn returns your own class derived from Stream!
And your callers will not need to know this is going on behind the scenes!
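Here is a stripped-down Java sketch of that same pattern, just to show the moving parts; the Request/Response names and the registration mechanism are invented for illustration, and the real WebRequest machinery is of course much richer.

```java
import java.io.InputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// The framework owns the abstract types and the factory; third parties plug in
// their own scheme, and calling code never names the concrete classes.
abstract class Request {
    private static final Map<String, Function<String, Request>> factories =
            new ConcurrentHashMap<>();

    // Extension point: a third party registers a creator for "myschema".
    public static void register(String scheme, Function<String, Request> factory) {
        factories.put(scheme, factory);
    }

    // The analogue of WebRequest.Create(uri): dispatch on the URI scheme.
    public static Request create(String uri) {
        Function<String, Request> factory = factories.get(uri.substring(0, uri.indexOf(':')));
        if (factory == null) {
            throw new IllegalArgumentException("no handler registered for: " + uri);
        }
        return factory.apply(uri);
    }

    public abstract Response getResponse();
}

abstract class Response {
    public abstract InputStream getResponseStream();
}

// Caller side, once some third party has called
// Request.register("myschema", MyRequest::new) with a hypothetical subclass:
//   Request r = Request.create("myschema://host/more");
//   InputStream s = r.getResponse().getResponseStream();
```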
A separate point is that for most developers, creating a reusable library is not, and should not be, the goal. The goal should be to write the code necessary to meet requirements. In the process, reusable code may be found. In that case, it should be refactored out into a separate library, where it can be reused in the future.
I go further than that (when permitted). I will usually wait until I find two pieces of code that actually do the same thing, or which overlap. Presumably both pieces of code have passed all their unit tests. I will then factor out the common code into a separate class library and run all the unit tests again. Assuming that they still pass, I've begun the creation of some reusable code that works (since the unit tests still pass).
This is in contrast to a lesson I learned in school, when the result of an entire project was a beautiful reusable library - with no code to reuse it.
(Of course, I'm sure it would have worked if any code had used it...)

Registration for Cocoa shareware

What is the best way to protect a Cocoa shareware application from software piracy? Are there developer libraries/tools out there for this task?
Allan Odgaard's approach of using OpenSSL for license keys is one way to do it.
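The idea behind that approach is a plain public-key signature: the vendor signs the licensee's details with a private key, and the application verifies the signature with an embedded public key. A minimal sketch of the verification side, written in Java only for brevity (a Cocoa app would do the same thing with OpenSSL or Security.framework):

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyFactory;
import java.security.PublicKey;
import java.security.Signature;
import java.security.spec.X509EncodedKeySpec;
import java.util.Base64;

public class LicenseCheck {
    /**
     * Returns true if base64Signature is a valid RSA signature of licenseeData
     * under the public key shipped inside the application.
     */
    public static boolean isValid(String licenseeData,
                                  String base64Signature,
                                  byte[] derEncodedPublicKey) throws Exception {
        PublicKey pub = KeyFactory.getInstance("RSA")
                .generatePublic(new X509EncodedKeySpec(derEncodedPublicKey));

        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(pub);
        verifier.update(licenseeData.getBytes(StandardCharsets.UTF_8));
        return verifier.verify(Base64.getDecoder().decode(base64Signature));
    }
}
```

Only the public key ships with the app, so a user can't mint keys without the vendor's private key (though, as noted further down, a determined cracker can always patch the check itself).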
AquaticPrime is a simple, easy Cocoa licensing framework. It uses securely signed plists as its "license keys", which makes it simple to embed arbitrary information into the license.
With AquaticPrime one would generally distribute the license as a small file, rather than as a text string, which may be an advantage or disadvantage for your application.
The framework includes code to generate licenses in C#, Carbon, Cocoa, PHP, Python, Ruby and STL. It also includes a GUI one-off license generator. My experience with it has been great.
Speaking of Wil Shipley, he has made his in-application payment and registration framework available for licensing under the name of Golden % Braeburn. I believe that Delicious Library and SousChef both use this framework.
Let's see here.
Home-brewed approaches: common counter-measures, serializing.
Recommended service solutions: Aquaticmac, eSellerate (which also handles transaction processing).
This list is by no means comprehensive, just a brief mention of some of the popular choices. Obviously, they can only deter piracy, so there is clearly a trade-off in how much time should be invested. I'd also suggest googling Wil Shipley's (of Delicious Library) thoughts on why developers shouldn't go to great lengths to prevent piracy.
Don't forget to do some reading on Common methods of cracking Cocoa apps. Be wary of method swizzling and Input Managers. You don't have to go out of your way to protect your app (a cracker will always outdo you), but don't be naive either!
A nice solution that I can recommend is the Potion Store (from Potion Factory) together with the CocoaFob classes (from Gleb Dolgich). They provide license code generation and verification, plus an actual store for selling your app(s).
Both are open source.

Where can I find UML diagrams (instead of reinventing the wheel)?

I am currently trying to draw a set of UML diagrams to represent products, offers, orders, deliveries and payments. These diagrams have probably been invented by a million developers before me.
Are there any efforts to standardize the modeling of such common things? Or even the modeling of specific domains (for example car-manufacturing).
Do you know if there is some sort of repository containing UML diagrams (class diagrams, sequence diagrams, state diagrams...)?
There is a movement toward documenting (as opposed to standardizing) models for certain domains. These are called analysis patterns, a term Martin Fowler came up with. He actually wrote a book called Analysis Patterns, and he has a dedicated section on his website where he presents some of these patterns accompanied by UML diagrams.
Maybe you'll find some inspiration there that will help you in modeling your domain. I've stressed the word inspiration because I think different businesses have different requirements even when they operate in the same domain, so the solutions you read about may not be appropriate for your problem.
There are many tools out there that do both - but they're generally not free!
Microsoft Visio does both and is extensible. For UML artefacts it comes with generators that produce VB/Java template code, but you can modify them to auto-generate any code you like. Many Visio users have created models that you can use as templates.
Artisan Enterprise is by far the most powerful UML tool (but it's not cheap).
Some would argue that Rational Rose or RUP is the better tool.
But for car manufacturing and other similar real-world modelling, by far the best tool is MathWorks Simulink (not because it's one of the most expensive). It is the best because you can animate the model: you can prove the model works before generating the code (in whatever grammar/language/other models you care to push it to)!
You can obtain a student license for around £180, with the "real thing" pushing £4,000 (for car-related artefacts); the full product with all the trimmings is about £15k. Simulink is also extensible with a C-like language, and there is a .NET add-in and APIs for a plethora of other languages. And, just like Visio, there is a worldwide forum creating saleable, shareware and freeware real-world model templates. Many automobile manufacturers worldwide are already using Simulink.
I think that MiniQuark's question is really good, and it will sooner or later be addressed by vendors such as Omondo, IBM Rational, etc. Users don't just need tools; they need out-of-the-box models to which they can simply add their business rules inside an existing, well-defined architecture. Why develop a new architecture from scratch if the job has already been done? In Java we use plenty of frameworks, existing methods, etc., so why not go one level higher and reuse architectures? It is impossible today to guess how a project will evolve, and new demands arrive every day. We therefore need a stable architecture which has been tested previously and is extensible. I have seen so many projects start with a nice architecture, realize in the middle of the project that it is not the best, and then change their architecture: renaming classes, splitting classes, creating packages, etc. After the first iteration it becomes a real mess. Can you imagine what we found after 10 iterations? A total mess!
This mess would have been avoided by using a predefined, previously tested model, because the missing class or package would already have been created, and only a class rename would be needed for architectural purposes. Adding the business-rule methods would then finish the coding stage before deployment testing.
I think there is some confusion between patterns and the initial question, which is about UML model reusability.
Today there is no out-of-the-box reusable model that has been developed. This is really strange, but the job has never been done, or at least never been shared.
Omondo has tried to launch an initiative, without real success. I have heard that they are working on hundreds of out-of-the-box models which will be open source and given to the community for free. I hope this happens, because it is really important to me and would save me a lot of time at the beginning of a project.