Lucene has quite poor support for the Russian language.
RussianAnalyzer (part of lucene-contrib) is of very low quality.
RussianStemmer module for Snowball is even worse. It does not recognize Russian text in Unicode strings, apparently assuming that some bizarre mix of Unicode and KOI8-R must be used instead.
Do you know any better solutions?
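To illustrate the encoding mismatch described above, here is a minimal Python sketch (not Lucene code; the helper name `is_cyrillic` is made up for illustration). It shows that the same Russian word is a sequence of Cyrillic code points as a Unicode string but single bytes in KOI8-R, and that code expecting one form will not recognize the other.

```python
def is_cyrillic(text: str) -> bool:
    """Return True if every letter in text lies in the basic Cyrillic block (U+0400-U+04FF)."""
    letters = [ch for ch in text if ch.isalpha()]
    return bool(letters) and all("\u0400" <= ch <= "\u04ff" for ch in letters)


word = "привет"                           # an ordinary Unicode string
koi8_bytes = word.encode("koi8_r")        # the same word as KOI8-R bytes, one byte per letter
misread = koi8_bytes.decode("latin-1")    # what you get if those bytes are treated as Latin-1 text

print(is_cyrillic(word))                  # the Unicode string is recognized as Cyrillic
print(is_cyrillic(misread))               # the misread bytes are not
print(koi8_bytes.decode("koi8_r") == word)  # decoding with the right codec restores the word
```

A stemmer that works on real Unicode strings only needs the first form; the byte-level form should be decoded once at the input boundary.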
My answer is probably too late, but for the record, I've found the analyzers from the AOT project much better than those shipped with Lucene.
I used http://code.google.com/p/russianmorphology/
If all else fails, use Sphinx
Project http://code.google.com/p/russianmorphology/ moved to https://github.com/AKuznetsov/russianmorphology. Please take into account the new hosting of this project.
That's the beauty of open source. You have the source code, so if the current implementations don't work for you, you can always create your own or, even better, extend the existing ones.
A good start would be the "Lucene in Action" book.
Related
I know there is the CLucene project, which is a port of Lucene from Java to C++. But is there a Lucene wrapper in C/Objective-C similar to PyLucene that uses JNI and embeds a JavaVM with Lucene in a separate process?
I explored this in some depth after asking this similar question a while back. The answer appears to be "no." I found CLucene as you did. It's got problems. I also found something called LuceneKit which was also mostly a mess. For my project I just mangled up the code from PyLucene just enough to get it working, and then moved on to another project. Unless something else has come along since then, I feel confident saying that No, there's not a pre-existing, serviceable Lucene wrapper out there. Sorry.
You can also look at Ferret: it is a Ruby wrapper and a pure C library for full-text searching. Ferret is similar to Lucene, but it uses its own file format for indexing.
Since the answer seems to be 'No' I have been looking for different options.
There seems to be an alternative (free for non-commercial apps; $1000k per app for commercial projects).
http://www.locayta.com/iOS-search-engine/locayta-search-mobile/
I have not used it, but I just ran into it and saw some comments praising this solution.
You can take a look at Lucy, which seems to be exactly what you are looking for:
http://lucy.apache.org/
Is there a good editor/ide/add-on for sproutcore out there?
I use Jetbrains WebStorm.
It has good support for JS. It is not SC-specific, but somehow it does very good highlighting for SC as well, and it flags syntax errors.
Also not heavyweight (read slow).
I like it very much; I tried a couple of others before.
Intype is also nice, less syntax help and other features.
E-text editor is a Windows version of TextMate with fewer features; it is more mature than Intype.
TextMate has a SproutCore bundle which is helpful - at the very least it runs JSLint on your .js files at save time, which stops a lot of basic syntax errors.
I use IntelliJ. You can ctrl-click into most methods, and the warnings it provides are close to what JSLint will give you. You can also autocomplete.
I'm using JetBrains RubyMine. It's pretty smart: it has code hinting and built-in GitHub integration, and from what I can tell it is pretty lightweight. Refactoring is pretty awesome too, one of the best I've seen.
I'm planning to do a Cocoa app that requires code syntax to be colored (in all common languages). Instead of writing my own code highlighter/parser, are there any pre-made solutions available?
Thanks
You might be able to use something like Geshi, but there're also the resources listed here: http://www.cocoadev.com/index.pl?SyntaxHighlighting
Edit
More links:
Syntax Highlighting in Cocoa TextView? Experiences? Suggestions? Ideas?
http://parsekit.com/okudakit/
An excellent solution is Uli Kusterer's UKSyntaxColoredTextDocument. It is fast and has several built-in syntax parsers. It's easy to add new languages.
It's free for non-commercial use and very cheap if you want it for a commercial app.
You can also use the JavaScript library SyntaxHighlighter and embed it in a WebView in your app.
After quite a bit of research trying to solve a similar problem, the simplest approach I found by far is to use a JavaScript library for syntax highlighting combined with a WebView. Spending time writing a syntax highlighter, a fairly complex task, is probably not what you'd want to spend time on.
I settled on using the popular CodeMirror and wrote an open source wrapper for Cocoa: https://github.com/swisspol/CodeMirrorView. You can use similar approaches to wrap other JavaScript based code editors in Cocoa apps.
You can use highlight, which is what QLColorCode uses :) (however, it's not a framework that you include in your code, but a command-line utility)
EDIT: Ah yeah, use Geshi, it's probably better :D
If I were looking to create my own language are there any tools that would help me along? I have heard of yacc but I'm wondering how I would implement features that I want in the language.
Closely related questions (all taken by searching on [compiler] on stackoverflow):
Learning Resources on Parsers, Interpreters, and Compilers
Learning to write a compiler
Constructing a simple interpreter
...
And similar topics (from the same search):
Bootstrapping a language
How much of the compiler should we know?
Writing a compiler in its own language
...
Edit: I know the stackoverflow related question search isn't what we'd like it to be, but
did we really need the nth iteration of this topic? Meh!
The first tool I would recommend is the Dragon Book. It is the standard reference for building compilers. Designing a language is no easy task, and implementing it is even more difficult; the Dragon Book helps there. The book also refers to the standard Unix tools lex and yacc; the GNU equivalents are flex and bison. lex and flex generate lexers, while yacc and bison generate parsers. There are also more modern tools for generating lexers and parsers; for Java, for example, there is ANTLR (I also remember JavaCC and CUP, but I have only used ANTLR myself). ANTLR combines lexer and parser generation, and an Eclipse plugin is available, which makes it very comfortable to use. But to compare these tools, to decide what type of parser you need, and to know what you need them for, you should read the Dragon Book. There are also other things you have to consider, such as the runtime environment, the programming paradigm, and so on.
If you already have certain design ideas and need help with a particular step or detail, the answers could be more helpful.
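To make the lexer half of that split concrete, here is a minimal hand-written sketch in Python of the job a lex/flex-generated scanner does: turning raw characters into a stream of tokens for the parser. The token names and rules are hypothetical and chosen only for illustration; a real generator would build a faster automaton from similar rules.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical token rules for a toy language; the first alternative that matches wins.
TOKEN_RULES = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES))


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens from source, skipping whitespace and rejecting unknown characters."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        pos = match.end()
        if match.lastgroup != "SKIP":
            yield Token(match.lastgroup, match.group())


for token in tokenize("total = 3 + 4.5"):
    print(token.kind, token.text)
```

The parser then works on `Token` objects instead of characters, which is the same division of labour that lex/yacc and flex/bison use.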
ANTLR is a very nice parser generator written in Java. There's a terrific book available, too.
I like Flex (Fast Lex) [Lexical scanner]
and Bison (A Hairy Yacc) [Yet another compiler compiler]
Both are free and available on all *NIX installations. For Windows just install cygwin.
But I'm old school.
With these tools you can also find lex rules and yacc grammars for a lot of popular languages on the internet. That gives you a quick way to get up and running, and you can then customize the grammars as you go.
Example: arithmetic expression handling (order of precedence etc. is a done-to-death problem); you can quickly get the grammar for this from the web.
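As an illustration of the arithmetic example, here is a minimal sketch in Python. It is a hand-written recursive-descent evaluator, not output from yacc or bison, but each function corresponds to one rule of the kind of grammar you would find for this problem, and the nesting of the rules is what encodes operator precedence. The function name `evaluate` is made up for the sketch.

```python
import re

# Grammar, lowest precedence first (roughly what a yacc file for this would express):
#   expr   : term (('+' | '-') term)*
#   term   : factor (('*' | '/') factor)*
#   factor : NUMBER | '(' expr ')' | '-' factor


def evaluate(source: str) -> float:
    """Evaluate an arithmetic expression with the usual precedence and left associativity."""
    # Simplification: characters that are not numbers, operators or parentheses are ignored.
    tokens = re.findall(r"\d+(?:\.\d+)?|[-+*/()]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():
        value = term()
        while peek() in ("+", "-"):
            if advance() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():
        value = factor()
        while peek() in ("*", "/"):
            if advance() == "*":
                value *= factor()
            else:
                value /= factor()
        return value

    def factor():
        token = advance()
        if token == "(":
            value = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token == "-":
            return -factor()
        if token is None or not token[0].isdigit():
            raise SyntaxError(f"unexpected token {token!r}")
        return float(token)

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result
```

For example, `evaluate("2 + 3 * 4")` returns `14.0`, while `evaluate("(2 + 3) * 4")` returns `20.0`. A parser generator derives equivalent code from the grammar, so you only maintain the rules.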
An alternative to think about is to write a front-end extension to GCC.
This is non-trivial, but if you want a compiled language it saves a lot of work in the code generation stage (you will still need to know, love, and understand flex/bison).
I never finished a complete language, but I used rply and llvmlite to implement a simple FoxBase language: https://github.com/acekingke/foxbase_compiler
So if you want to use Python, rply and llvmlite are helpful.
If you want to use Go, goyacc may be useful, but you have to write the lexical analyzer by hand. Alternatively, you can use https://github.com/acekingke/lexergo to simplify that.
I've been making my way through The Little Schemer and I was wondering what environment, IDE or interpreter would be best to use in order to test any of the Scheme code I jot down for myself.
Racket (formerly PLT Scheme, whose editor was called DrScheme) has a nice editor, several different Scheme dialects, an attempt at visual debugging, lots of libraries, and can run on most platforms. It even has some modes specifically geared around learning the language.
I would highly recommend both Chicken and Gauche for scheme.
PLT Scheme (DrScheme) is one of the best IDEs out there, especially for Scheme. The package you get when downloading it contains all you need for developing Scheme code - libraries, documentation, examples, and so on. Highly recommended.
If you just want to test your scheme code, I would recommend PLT Scheme. It offers a very complete environment, with debugger, help, etc., and works on most platforms.
But if you also want to get an idea of how the interpreter behind the scenes works, and have Visual Studio, I would recommend Tachy. It is a very lightweight scheme interpreter written in c#. It allows you to debug just your scheme code, or also step through the c# interpreter behind the scenes to see what is going on.
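To give a rough idea of what such an interpreter does behind the scenes, here is a minimal sketch of a Scheme-like evaluator in Python. It is not how Tachy or any of the implementations mentioned here are built; it supports only a handful of forms (`quote`, `if`, `define`, `lambda`) and a few primitives, and all function names are made up for illustration.

```python
import operator as op


def tokenize(source):
    """Split source text into parentheses and atoms."""
    return source.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Read one expression from the token list, consuming the tokens it uses."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(parse(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)  # drop the closing ')'
        return items
    if token == ")":
        raise SyntaxError("unexpected ')'")
    try:
        return int(token)
    except ValueError:
        return token  # anything that is not an integer is treated as a symbol


def standard_env():
    """A small global environment with a few primitives."""
    return {
        "+": op.add, "-": op.sub, "*": op.mul,
        "<": op.lt, ">": op.gt, "=": op.eq,
        "car": lambda xs: xs[0],
        "cdr": lambda xs: xs[1:],
        "cons": lambda x, xs: [x] + xs,
        "null?": lambda xs: xs == [],
    }


def evaluate(expr, env):
    """Evaluate a parsed expression in the environment env (a plain dict)."""
    if isinstance(expr, str):       # symbol: look it up
        return env[expr]
    if not isinstance(expr, list):  # number: evaluates to itself
        return expr
    head = expr[0]
    if head == "quote":
        return expr[1]
    if head == "if":
        _, test, then, alt = expr
        return evaluate(then if evaluate(test, env) else alt, env)
    if head == "define":
        _, name, value = expr
        env[name] = evaluate(value, env)
        return None
    if head == "lambda":
        _, params, body = expr
        # Simplification: each call copies the defining environment and adds the arguments.
        return lambda *args: evaluate(body, {**env, **dict(zip(params, args))})
    func = evaluate(head, env)
    args = [evaluate(arg, env) for arg in expr[1:]]
    return func(*args)


def run(source, env):
    return evaluate(parse(tokenize(source)), env)
```

With `env = standard_env()`, `run("(+ 1 (* 2 3))", env)` returns `7`, and after `run("(define fact (lambda (n) (if (< n 2) 1 (* n (fact (- n 1))))))", env)`, `run("(fact 5)", env)` returns `120`. Stepping through `evaluate` in a debugger shows the same read-then-evaluate cycle that a real interpreter performs.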
Just for the record I have to mention IronScheme.
IronScheme aims to be an R6RS-conforming Scheme implementation based on the Microsoft DLR.
Version 1.0 Beta 1 was just released. I think this should be a good implementation for someone who is already using the .NET Framework.
EDIT
Current version is 1.0 RC 1 from Oct 23 2009
Google for the book's authors (Daniel Friedman and Matthias Felleisen). See whether either of them is involved with a popular, free, existing Scheme implementation.
It doesn't matter, as long as you subscribe to the mailing list (or wiki, IRC channel, or online community site) for the associated community. It's probably worth taking a look at the list description and archives to be sure you are in the right one.
Most of these are friendly and welcoming to newcomers, so don't be afraid to ask.
It's also worth searching the archives of their mailing list (or FAQ, or whatever they use) when you have a question, just in case it is a frequent one.
Good Luck!
Guile running under Geiser within Emacs provides a nice, lightweight implementation for doing the exercises. Racket will also run under Geiser and Emacs, though I personally prefer Guile and Chez Scheme a bit more.
Obviously, installation of each will depend on your OS. I would recommend using Emacs version 24 or later, since this allows you to use Melpa or Marmalade to install Geiser and other Emacs extensions.
The current version of Geiser also works quite nicely with Chicken Scheme, Chez Scheme, MIT Scheme and Chibi Scheme.
LispMe runs on a Palm Pilot, so you can take it anywhere and do Scheme on the go. GREAT way to learn Scheme.
I've used PLT as mentioned in some of the other posts and it works quite nicely. One that I have read about but have not used is Allegro Common LISP Express. I read a stellar review about their database app called Allegro Cache and found that they are heavy into LISP. Like I said, I don't know if it's any good, but it might be worth a try.
I am currently working through The Little Schemer as well and use Emacs as my environment, along with Quack, which adds additional support and utilities for scheme-mode within Emacs.
If you are planning on experimenting with other Lisps (e.g. Common Lisp), Emacs has excellent support for those dialects as well (Emacs itself can be customized with its own dialect of Lisp, appropriately named Emacs Lisp).
As far as Scheme implementations go, I am currently using Petit Chez Scheme, which is an interpreted, freely distributable version of Chez Scheme (which uses a compiler and costs money to obtain a license).