Word semantic-similarity (distance measures) web services?

Is there any web service that provides word semantic-similarity measurements?
I'm aware of DISCO, but I'd prefer a service backed by an ongoing, growing corpus (and it would be most helpful if you have tried it in your own projects).
I'm also aware of WordNet-based algorithms, but installing and managing WordNet as part of the project's resources adds considerable weight.

You could consider using the Google index API. There is a sample application at http://www.mechanicalcinderella.com/ that uses it and links to the original paper describing the technique. Source code is at http://complearn.org/index.html.
I have not used it myself, other than for fun, but it seems pretty decent.
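If I remember right, the technique in question is the normalized Google distance (Cilibrasi and Vitányi): two terms count as semantically close when they co-occur on many of the pages a search engine indexes, so the only inputs you need are hit counts. A minimal sketch of the formula in Python; the hit counts below are made up, and a real implementation would fetch them from a search engine's API:

    import math

    def ngd(f_x, f_y, f_xy, n):
        # Normalized Google Distance: 0.0 means the terms always co-occur,
        # larger values mean the terms are less related.
        #   f_x, f_y -- hit counts for each term on its own
        #   f_xy     -- hit count for a query containing both terms
        #   n        -- (estimated) total number of pages indexed
        lx, ly, lxy = math.log(f_x), math.log(f_y), math.log(f_xy)
        return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

    # Made-up counts for two words, purely for illustration.
    print(ngd(f_x=1500000, f_y=2300000, f_xy=850000, n=50000000000))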

Functional Specification Process Management

Developing functional specifications is never a pleasurable experience, but I kind of find a sick pleasure in planning a project well. I think I have some father issues.
Regardless of my own issues, I can find any number of articles on how to create a single functional spec, of varying degrees of usefulness. There are templates and examples aplenty, and I've got a good library of my own. However, I am finding it difficult to find anyone who discusses how to produce multiple functional specs with any efficiency.
Does anyone know of a source discussing how to manage the process of quickly generating disparate types of functional specs? Say, a company that delivers web apps, perhaps using a rapid development tool like ColdFusion or PhoneGap, where the expertise lies in the use of the tool rather than the end result, so the functional specs can vary enormously from one another.
Can anyone point me towards a way of managing this process to ease the burden of building each of these from scratch?
EDIT - I really like OmniGraffle, but I'm not trying to maintain a look and feel or do anything visual (saving past screenshots might be useful if they can be indexed). Code snippets seem closer to what I wanted, but in actuality I think I am looking for a method to archive/index past blocks of text.
So if I described a purchase order system a year ago and I am building something similar today, I want to find that functional spec from a year ago to have some example text to start from.
In my head this is like some novel-writing software where, like code snippets, a block of text (a scene, a chapter, a blurb, or whatever) can be written and then moved around in the body of the whole; yWriter does this. However, I need a way to index and search through these large chunks of text for relevance. I am hoping to learn more about that kind of system.
Fleshing out the ambiguity
If you are asking about templates that are primarily textual, then your best bet is probably a 'stationery' file: open a copy, add pieces copied from the template structures you've saved into it, and then save out the draft spec.
If you are referring to diagrams and other visual schematics that follow a 'spec language' unique to your development framework, then I would suggest a tool like OmniGraffle, Visio, or Lucidchart, which have active communities that develop 'stencil libraries' (e.g., Graffletopia).
I think you mean the first case, in which case you might look to examples like OmniOutliner templates, which can contain sophisticated stylization of fonts and formats, akin to 'type styles' in Word documents.
Code snippets are one mechanism for solving this, but you will only get snippet libraries for programming IDEs, which generally lack text-style features. Code snippet libraries are like text macros: short strings that expand into large blocks of text. You could create your own snippets for the different structures of project spec that relate to each kind of framework.
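To make the text-macro idea concrete, here is a minimal sketch in Python; the snippet keys and bodies are invented examples, and a real snippet tool would add placeholders, tab stops, and so on:

    # Short keys expand into boilerplate blocks of spec text.
    SNIPPETS = {
        "po-overview": "Purchase Order Overview\nThe system shall let buyers "
                       "create, approve, and track orders...",
        "cf-platform": "Platform\nThe application is built with a rapid "
                       "development tool (e.g. ColdFusion)...",
    }

    def expand(template):
        # Replace each {{key}} placeholder with its snippet body.
        for key, body in SNIPPETS.items():
            template = template.replace("{{" + key + "}}", body)
        return template

    print(expand("{{po-overview}}\n\n{{cf-platform}}"))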
Another solution is to leverage the file interoperability of tools like OmniGraffle and OmniOutliner (or other pairings). When OmniGraffle opens an Outliner file, it displays the list structure as a tree of objects/nodes. After adding more nodes, the OmniGraffle file can be re-opened in OmniOutliner and viewed as a list, with all the attached Outliner styles.
This is a nice multi-modal approach, but locks you into a toolset. Probably unavoidable until more people demand tooling to do this kind of thing.
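For the indexing/search need described in the edit, a plain full-text index over your past spec text may be all that's required. A minimal sketch using SQLite's FTS5 full-text module (bundled with Python's standard sqlite3 in most builds; the schema and sample text are invented for illustration):

    import sqlite3

    db = sqlite3.connect("spec_archive.db")
    # A virtual FTS5 table indexes every word of every stored block of text.
    db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS snippets USING fts5(title, body)")
    db.execute("INSERT INTO snippets VALUES (?, ?)",
               ("Purchase order system",
                "The purchase order module lets buyers create, approve, "
                "and track orders against supplier catalogs..."))
    db.commit()

    # Find last year's spec by relevance; bm25() ranks better matches first.
    for title, body in db.execute(
            "SELECT title, body FROM snippets WHERE snippets MATCH ? "
            "ORDER BY bm25(snippets)", ("purchase order",)):
        print(title, "->", body[:60])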

Should I choose Hiberlite for integrating SQLite into my Win/iOS application?

I am a composer by profession, and my computer science skills are limited, though I program quite a bit of the software that I use.
What are the most reasonable ways to approach SQLite integration as a file format and database in an iOS app (it also needs to run on Windows, but that is a secondary concern)?
I have been researching Hiberlite, which looks fantastic, but it seems to be little used, apparently doesn't run well on embedded systems (iOS?), and chokes up when thousands of objects are in play. I haven't been able to get a sense of how severe those bottlenecks are under those conditions.
The settings of thousands of objects (~50,000, though that number could expand) would be read every 1-10 seconds and written periodically. Read performance is more critical, as write operations can stutter without affecting the core operation of the app.
Given those conditions, how should I approach SQLite? My understanding is that without something like Hiberlite, the entire database (many millions of entries) must be read and rewritten for every entry; isn't that less efficient? If that is the best approach, is there a good resource to follow for implementing it?
Any advice would be greatly appreciated. My current software that I rely on is beyond buggy and needs refactoring, but due to my inexperience I am having a difficult time finding information about a reasonable approach.
I'm guessing you've probably found a solution for this by now, but I've been interested myself in embedding SQLite on Android and iOS, and I came across many C++-based ORM solutions.
Hiberlite looked possibly not fully mature (I didn't readily see a method of returning subsets of data, which is fairly standard). A framework that did draw my attention was the POCO::Data ORM library. It's based on the stream-oriented mechanism used in the SOCI ORM. The POCO library is modular and optimised for embedded environments (I believe it also has minimal external dependencies). Wikipedia has an article on POCO that outlines some of its users, of which openFrameworks is one.
The Wt Dbo ORM also looked pretty interesting.
I'm listing some of the other C++ ORM frameworks I found here, in no particular order:
http://soci.sourceforge.net
Wt Dbo ORM (webtoolkit.eu)
http://debea.net
http://www.qxorm.com
http://sourceforge.net/apps/trac/litesql
http://otl.sourceforge.net
http://cppcms.com/sql/cppdb
http://dtemplatelib.sourceforge.net
http://code.google.com/p/qdjango
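One correction to a premise in the question: with or without an ORM, SQLite never needs to read and rewrite the whole database per entry; given an indexed key it reads and updates individual rows (a few B-tree pages) in place. A minimal sketch of that access pattern, shown with Python's standard sqlite3 for brevity (the same statements apply through the SQLite C/C++ API that the libraries above wrap; the schema is invented):

    import sqlite3

    db = sqlite3.connect("settings.db")
    db.execute("""CREATE TABLE IF NOT EXISTS object_settings (
                      object_id INTEGER PRIMARY KEY,  -- indexed lookups
                      volume REAL,
                      pan REAL)""")
    # Bulk-load ~50,000 objects inside a single transaction.
    db.executemany("INSERT OR REPLACE INTO object_settings VALUES (?, ?, ?)",
                   ((i, 0.8, 0.0) for i in range(50000)))
    db.commit()

    # Point read: touches a handful of pages, not the whole file.
    row = db.execute("SELECT volume, pan FROM object_settings "
                     "WHERE object_id = ?", (12345,)).fetchone()

    # Periodic write: updates a single row in place.
    db.execute("UPDATE object_settings SET volume = ? WHERE object_id = ?",
               (0.5, 12345))
    db.commit()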

Experiences with using Alloy in real-world projects

I have been interested in formal methods for some time. I have used formal methods to reason about some very specific sub-areas of a few projects I have been working on. I was never able to convince other team members to try the same, let alone specify an entire domain with a formal method.
One method I have found particularly interesting is Alloy. I think it may "scale" better as a foundation for an entire project because it is conceptually and notationally very close to actual programming languages. Furthermore, the tools are quite solid, so the benefits of model verification are readily available.
I'd be very much interested to hear about any real-world experiences you folks might have had with using Alloy in your projects. Do you feel that it has helped you design a better domain model? Did you find errors in your domain model during verification? Would you use it again?
I've used Alloy on a few projects and have found it helpful; on some but not all of those projects I have been able to persuade others involved to use Alloy as well, or at least to work with the Alloy models I wrote. These projects may or may not be what you have in mind in asking for 'real-world' projects, but they certainly took place in the part of the real world I work in.
In 2006 and 2007 I created a partial Alloy model for the then-current draft of the W3C XProc specification; as far as I could tell, most members of the working group never read the paper I wrote (at http://www.w3.org/XML/XProc/2006/12/alloy-models/models.html); they said "Oh, we changed that part of the spec last week, so what the model says is no longer relevant". But the paper did manage to persuade the editor of the spec that the abstract 'component' level described in the first draft of the spec was woefully underspecified and needed to be either fully specified or dropped. He dropped it, with (I think) good results for the readability and usability of the spec.
In 2010 I made an Alloy model of the XPath 1.0 data model, which uncovered some glitches in the specification. The reaction of most interested parties (including the W3C working group responsible for maintaining the XPath 1.0 spec) has, unfortunately, not been encouraging.
A research project I'm involved with has used Alloy to model the MLCD Overlap Corpus, a collection of sample documents and related information we are creating (hyperlinks suppressed at SO's insistence); the Alloy model found a couple of errors in our initial design for the corpus catalog, so it was well worth the effort.
And we have also used Alloy to formalize some modeling work we have done on the nature of transcription and on the extension of the type/token distinction to document structure (for our paper, look for the 2010 proceedings of Balisage: The Markup Conference). This lies a little bit outside Alloy's usual area of application, as it has nothing to do with software design, but Alloy's ability to check models for consistency and generate instances has been invaluable in showing us some of the logical consequences of this or that possible axiom for our model.
To answer your specific questions: yes, Alloy has helped me specify cleaner domain models, and yes, it has found errors and glitches. They have often been small, for the reasons Daniel Jackson explains in his book Software Abstractions: first, if you use models during design, you catch errors early, when everything is still small. And, second (in Jackson's words), "In hindsight, most software design issues are trivial."
He continues: "But if you don't address them head-on, trivial issues have a nasty habit of becoming nontrivial." My experience amply confirms this. Much better to head off such problems early. So yes, I will use Alloy again.
Yes, I've used Alloy and its cousins industrially. Alloy has been most helpful in convincing me that my models weren't wildly wrong; or rather, in showing me where they were wrong and gave rise to silly results. Other, more specific tools, like Song's Athena and Guttman and Ramsdell's CPSA, have been more useful in their narrower domains. What more would you like to hear about?
Belatedly adding to this thread... Eunsuk Kang has recently applied Alloy to perform security analyses of web APIs for some start-ups (following many applications of Alloy in security, such as Apurva's analysis of OAuth and Barth et al.'s analysis of browser-based security mechanisms for CSRF, etc.); Pamela Zave has been working on an impressive analysis of Chord, a peer-to-peer lookup protocol, and has recently written up a fix to the original algorithm.

Jumping into N-Tier architecture with WCF?

I work for a large state government agency that is a tad behind the times. Our skill sets are outdated and budgetary freezes prevent any training or hiring of new employees/consultants (firing people is also impossible). Designing business objects, implementing design patterns, establishing code libraries and services, unit testing, source control, etc. are all things that you will not find being done here. We are as much of a 0 on the Joel Test as you can possibly get. The good news is that we can only go up from here!
We develop desktop CRUD applications (in C++, C#, or Java) that hit the Oracle database directly through an ODBC connection. We basically have GUIs littered with SQL statements and patchwork code. We have been told to move towards a service-oriented n-tier architecture to prevent direct access to the database and to remove the need for the Oracle client on user machines.
Is WCF the path we should be headed down? We've done a few of the n-tier application walkthroughs (like this one) and they seem easy to implement, but we just don't know enough to understand whether we are even considering the right technologies. Utilizing the .NET-generated typed DataSets seems like a nice stopgap to save us months/years of work (as opposed to creating new business objects from the ground up for numerous projects). Is this canned approach viable for a first step?
I recently started using WCF services for my Data Layer in some web applications and I must say, it's frustrating at the beginning (the first week or so), but it is totally worth it once the code is deployed.
You should first try it out with a small existing app, or maybe a proof of concept to make sure it will fit your needs.
From the description of the environment you are in, I'm sure you'll realize the benefit almost immediately.
The last company I worked for chose WCF for almost exactly the reason you describe above. There is a lot of good documentation and plenty of books on WCF, it's relatively easy to get working, and WCF supports a lot of configuration options.
There can be some headaches when you start trying to bend WCF to work in ways it wasn't specifically designed for out of the box. These are generally configuration issues, but sites like this one or IDesign can help you through them.
First of all, I would definitely not (sorry for the emphasis) worry about the time you'll save using typed DataSets versus creating your own business objects. That is usually not where you will spend most of your development time. I prefer using business objects myself.
In your situation I would want to implement a proof of concept first, one that addresses all the issues you may encounter. This proof of concept should implement an entire use case, starting on the client, retrieving data from the database, and returning it to the client. You should feel confident about your implementation before continuing.
Then, about the choice of technology: WCF is definitely a good choice for communication between your client applications and the service layer. I suppose that both your clients and your service layer will be C# applications? That makes things a lot easier, since interoperability between different platforms (Java/C#, for example) is still not trivial, although it should work in most cases.
Take a look at Entity Framework (there are already a couple of Oracle providers available for it) in conjunction with .NET 3.5 SP1, which enables built-in WCF serialization of your EF-generated classes.
Here is a good blog to get started: http://blogs.msdn.com/dsimmons
CSLA might be a good fit for your N-Tier desktop apps. It supports WCF, has a large dev community, and is well documented. It is very object-oriented.

Where can I find UML diagrams (instead of reinventing the wheel)?

I am currently trying to draw a set of UML diagrams to represent products, offers, orders, deliveries and payments. These diagrams have probably been invented by a million developers before me.
Are there any efforts to standardize the modeling of such common things? Or even the modeling of specific domains (for example car-manufacturing).
Do you know if there is some sort of repository containing UML diagrams (class diagrams, sequence diagrams, state diagrams...)?
There is a movement toward documenting (as opposed to standardizing) models for certain domains. These are called analysis patterns, a term coined by Martin Fowler; he actually wrote a book called Analysis Patterns. He also has a dedicated section on his website where he presents some of these patterns accompanied by UML diagrams.
Maybe you'll find some inspiration there that will help you in modeling your domain. I've stressed the word inspiration because I think different businesses have different requirements even when they operate in the same domain, so the solutions you might read about may not be appropriate for your problem.
There are many tools out there that do both - but they're generally not free!
Microsoft Visio does both and is extensible. For UML artefacts it comes with generators that produce VB/Java template code, but you can modify them to auto-generate any code. There are many Visio users who have created models to use as templates.
Artisan Enterprise is by far the most powerful UML tool (but it's not cheap).
Some would argue that Rational Rose, or the broader Rational (RUP) tooling, is the better choice.
But for car manufacturing and other similar real-world modelling, by far the best tool is MathWorks Simulink (and not because it's one of the most expensive). It is the best because you can animate the model: you can prove the model works before generating the slick code (in whatever grammar/language/other models you care to push it to)!
You can obtain a student licence for around £180, with the 'real thing' pushing £4,000 (for car-related artefacts); the full product with all the trimmings is about £15k. Simulink is also extensible with a C-like language, though there is a .NET add-in and APIs for a plethora of other languages. And, just like Visio, there is a worldwide community creating saleable, shareware, and freeware real-world model templates. Many of the world's auto manufacturers already use Simulink.
I think that MiniQuark's question is really good, and what it asks for will sooner or later be provided by vendors such as Omondo, Rational/IBM, etc. Users don't just need tools; they need models out of the box, so they can simply add their business rules inside an existing, well-defined architecture. Why develop a new architecture from scratch if the job has already been done? In Java we use plenty of frameworks, existing methods, etc., so why not go one level higher and reuse architectures? It is impossible today to guess how a project will evolve, and new demands come in every day. We therefore need a stable architecture that has been tested previously and is extensible. I have seen so many projects start with a nice architecture, realize in the middle of the project that it is not the best fit, and then change their architecture. Renaming classes, splitting classes, creating packages, etc.: after the first iteration it becomes a real mess. Imagine what we found after ten iterations: a total mess!
This mess would have been avoided by using a predefined, previously tested model, because the missing class or package would already have been created, and a simple class rename would be sufficient for architectural purposes. Adding business-rule methods would then end the coding stage before deployment testing.
I think there is some confusion between patterns and the initial question, which is about UML model reusability.
No reusable out-of-the-box model exists today. This is really strange, but the job has never been done, or at least never been shared.
Omondo has tried to launch an initiative, without real success. I have heard that they are working on hundreds of out-of-the-box models which will be open source and given to the community for free. I hope this gets done, because it is really important to me and would save me a lot of time at the beginning of a project.