Indexing for later full text search in Rails 3 - ruby-on-rails-3

I am working on an application where the need is to index data without storing it in the database.
When I initialize an object, it should be indexed. Consider a pages table with the fields page_title, tags, and content.
The last field, content, may hold a large amount of text (sometimes several MB), which is not going to be used for processing at all.
My objective is to index that data without saving it to the database. That is, for pages, only page_title and tags will be saved into the DB and indexed as well, while content will be indexed only.
I am open to using any full-text search plugin/gem.

I implemented this using Ultrasphinx. I index the data by manually generating XML documents for Sphinx.
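The XML that was actually generated is not shown above, so the following is only a minimal sketch of what such a document stream could look like, assuming Sphinx's xmlpipe2 format. The field names come from the question; the function name build_xmlpipe2 and the shape of the input dicts are hypothetical. It is written in Python for illustration, but the same structure can be produced from Ruby.

```python
from xml.sax.saxutils import escape, quoteattr


def build_xmlpipe2(pages):
    """Return an xmlpipe2 document stream for Sphinx as a string.

    `pages` is an iterable of dicts with the (assumed) keys
    id, page_title, tags and content.
    """
    lines = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
        # Each sphinx:field is a full-text field that Sphinx will index.
        '<sphinx:field name="page_title"/>',
        '<sphinx:field name="tags"/>',
        '<sphinx:field name="content"/>',
        "</sphinx:schema>",
    ]
    for page in pages:
        lines.append("<sphinx:document id=%s>" % quoteattr(str(page["id"])))
        for field in ("page_title", "tags", "content"):
            # escape() protects &, < and > inside the text.
            lines.append("<%s>%s</%s>" % (field, escape(page[field]), field))
        lines.append("</sphinx:document>")
    lines.append("</sphinx:docset>")
    return "\n".join(lines)


sample = [{"id": 1, "page_title": "A & B", "tags": "demo", "content": "big <text>"}]
print(build_xmlpipe2(sample))
```

The content value only has to exist in memory while the XML is written, so it never needs a column in the pages table.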

Related

Adding an extra field to already indexed data Solr

I have indexed approximately 1000 documents in Solr. But all of them are missing a field. I need to add a field to all these documents, and this field will have the same value for all of them. I do not have access to these documents to index them again. Is there any way to do this without re-indexing all the data again?
Unless you've configured your schema to store all values, no, there is no usable way to add a field to the documents without reindexing. If all fields are stored, you can use atomic updates to add a new field to a document, so you could query Solr for the ids of all existing documents and perform an update that way.
Otherwise you're going to have to go with the suggestion from @michielvoo and return a static value from the query string. But then you could also just append it in your application before returning it to the user (or you could add the field as a default value for the request handler in solrconfig.xml, so that you can edit and change it server side).
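For the atomic update route, here is a minimal sketch of the JSON payload, assuming the unique key field is called id. The field name source, the value legacy and the helper build_atomic_updates are made up for illustration. The resulting body would be POSTed to the core's update handler as application/json, which is not done here.

```python
import json


def build_atomic_updates(doc_ids, field, value):
    """Build a Solr atomic-update payload that sets `field` on each document."""
    # {"set": value} asks Solr to set the field and leave the other fields alone.
    return [{"id": doc_id, field: {"set": value}} for doc_id in doc_ids]


# The ids would normally come from a query such as q=*:*&fl=id.
payload = build_atomic_updates(["doc1", "doc2"], "source", "legacy")
body = json.dumps(payload)
print(body)
```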

How to avoid retrieve entire stored field from solr

I'm using Sunspot and Solr in a Rails app to search ebook contents. For the highlight feature I have to set ebook_content as a stored field, so every time I query Solr it sends back the entire content of the book, which makes the query very slow.
How could I only get the result without the stored field?
The fl parameter of Solr allows you to specify which fields you want returned in the result. If you had fields id, title, ebook_content, then you could use fl=id,title to omit the ebook_content field. I don't think there's support in Solr for getting all fields except one (e.g. -ebook_content).
Update
If you don't want to return the field in the normal results, but still want highlighting on that field, exclude the field as I described above, then turn on the highlighter:
hl=true
set the field(s) which should be highlighted:
hl.fl=ebook_content
and set the size of the highlighting fragment (in characters):
hl.fragsize=50
your finished query looks something like this:
?q=search term&fl=id,title&hl=true&hl.fl=ebook_content&hl.fragsize=50
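If you build the query string by hand, remember that the search term needs URL encoding. A small sketch of assembling the same parameters (the helper name build_highlight_query is hypothetical; the parameter names are the ones used above):

```python
from urllib.parse import urlencode


def build_highlight_query(term, fields=("id", "title"),
                          hl_field="ebook_content", fragsize=50):
    """Return a URL-encoded Solr query string with highlighting enabled."""
    params = {
        "q": term,
        "fl": ",".join(fields),   # only these stored fields are returned
        "hl": "true",             # turn the highlighter on
        "hl.fl": hl_field,        # field to build snippets from
        "hl.fragsize": fragsize,  # snippet size in characters
    }
    return urlencode(params)


print("?" + build_highlight_query("search term"))
```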

Extract MS Word document chapters to SQL database records?

I have a 300+ page Word document containing hundreds of "chapters" (as defined by heading formats) and currently indexed by Word. Each chapter contains a medium amount of text (typically less than a page) and perhaps an associated graphic or two. I would like to split the document up into database records for use in an iPhone program - each chapter would be a record consisting of title, id #, and content fields. I haven't decided yet whether I want the pictures to be a separate field (probably just containing a file name), or HTML or similar style links in the content text. In any case, the end result would be that I could display a searchable table of titles that the user could click on to pull up any given entry.
The difficulty I am having at the moment is getting from the Word document to the database. How can I most easily split the document up into records by chapter, while keeping the image associations? I thought of inserting some unique character between each chapter, saving to text format, and then writing a script to parse the document into a database based on that character, but I'm not sure that I can handle the graphics in this scenario. Other options?
To answer my own question:
Given a fairly simply formatted Word document:
Convert it to an Open Office XML document.
Write a Python script to parse the document into a database using the xml.sax Python module.
Images are inserted into the record as HTML, to be displayed using a web interface.
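The script itself is not included above, so here is only a rough sketch of the idea. It assumes headings arrive as text:h elements and paragraphs as text:p elements (the names used in OpenDocument content.xml), uses SQLite as the target database, and ignores images. The class and function names and the tiny SAMPLE document are made up for illustration.

```python
import sqlite3
import xml.sax


class ChapterHandler(xml.sax.ContentHandler):
    """Collect [title, content] pairs, starting a new chapter at each heading."""

    def __init__(self):
        super().__init__()
        self.chapters = []
        self._target = None  # 0 = inside a heading, 1 = inside a paragraph

    def startElement(self, name, attrs):
        if name == "text:h":
            self.chapters.append(["", ""])
            self._target = 0
        elif name == "text:p" and self.chapters:
            self._target = 1

    def endElement(self, name):
        if name == "text:p" and self._target == 1:
            self.chapters[-1][1] += "\n"  # keep paragraph breaks
        if name in ("text:h", "text:p"):
            self._target = None

    def characters(self, data):
        if self._target is not None:
            self.chapters[-1][self._target] += data


def store_chapters(xml_bytes, conn):
    """Parse the XML and insert one row per chapter."""
    handler = ChapterHandler()
    xml.sax.parseString(xml_bytes, handler)
    conn.execute("CREATE TABLE IF NOT EXISTS chapters "
                 "(id INTEGER PRIMARY KEY, title TEXT, content TEXT)")
    conn.executemany("INSERT INTO chapters (title, content) VALUES (?, ?)",
                     [(title, body.strip()) for title, body in handler.chapters])
    conn.commit()


SAMPLE = b"""<doc xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
<text:h>Chapter One</text:h>
<text:p>First paragraph.</text:p>
<text:p>Second paragraph.</text:p>
<text:h>Chapter Two</text:h>
<text:p>Only paragraph.</text:p>
</doc>"""

conn = sqlite3.connect(":memory:")
store_chapters(SAMPLE, conn)
for row in conn.execute("SELECT id, title FROM chapters ORDER BY id"):
    print(row)
```

Text that appears before the first heading is dropped in this sketch; a real document may need a default chapter for it.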

How to display search results in a new form

I've created a system, and within that system I have a find/search page and a find/search results page. Basically, the find/search page consists of a number of text fields, and the more fields the user completes, the more efficient the search will be.
I'm using SQL Server 2005 to store the data, and I can easily update/insert/save new data, but I don't know how to search for the data.
I want the user to fill out the fields in the find/search form and for the results to appear in the find/search results page. Can this be done?
It depends on what kind of data you need to search.
If it's generic text data, the best way is to use Full-Text Search.
Yes. There are a number of ways you could achieve this. One possible way would be to pass the search criteria to the search results page via query string. Another way which is very similar is to store the search criteria in a session and redirect to the search results page. In either case on the search results page you'd want to take the data and build your SQL query. Depending on what you need you could utilize a full-text search like Kesty has suggested or you could simply use FIELD like '%user entered data%' in your queries. It really depends on your needs.
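To illustrate the LIKE approach, here is a minimal sketch of building the query from whichever fields the user filled in. The people table, its column names and the build_search helper are hypothetical, and SQLite stands in for SQL Server so the example can run on its own. The important part is that user input goes in as parameters rather than being concatenated into the SQL string.

```python
import sqlite3

# Hypothetical whitelist of columns the search form exposes.
SEARCHABLE = ("first_name", "last_name", "city")


def build_search(criteria):
    """Return (sql, params) matching every non-empty field with LIKE."""
    clauses, params = [], []
    for column in SEARCHABLE:
        value = (criteria.get(column) or "").strip()
        if value:
            clauses.append(column + " LIKE ?")
            params.append("%" + value + "%")
    sql = "SELECT id, first_name, last_name, city FROM people"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, "
             "first_name TEXT, last_name TEXT, city TEXT)")
conn.executemany("INSERT INTO people (first_name, last_name, city) VALUES (?, ?, ?)",
                 [("Ann", "Lee", "Leeds"), ("Bob", "Lee", "York")])

# The criteria would come from the query string or the session.
sql, params = build_search({"last_name": "Lee", "city": "Yor", "first_name": ""})
print(conn.execute(sql, params).fetchall())
```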

@DBColumn in Lotus Notes

I've been tasked with learning Lotus Domino Designer - not sure what I did in a previous life, but it must have been pretty bad... - and was wondering how to do a lookup on a database to get some values for selections. As this information could potentially be used in a lot of the applications, I'd prefer it only to be in the one place.
I gather I can use @DBColumn, but what happens if an entry in that lookup changes? If the unique value of the lookup is the text, then the relationship would be broken, wouldn't it? Is there any way of mimicking the idea of relational lookups?
I'm assuming I'm looking at Lotus development from the wrong angle, as this seems to be a real limitation of lookups.
I haven't found any decent learning material on the interwebs, so would appreciate any help.
Ta
You would want to store a unique ID along with the textual value in the source database (not unlike what you would do in an RDBMS). Then, only store that ID in any referencing documents, and use a computed-for-display field to look up the display value. (There is a performance consideration here - you could "de-normalize" the data, store both the ID and the text value in the referencing documents, and do some asynchronous work to keep the values in sync, e.g. using a scheduled agent that runs every night or every week.)
If DB1 has the key values and DB2 has the documents which will reference these values, then in the form in DB2, you would still do a @DbColumn to look up your value list. In the lookup view in DB1, concatenate the text value and ID with a pipe separator (textField + "|" + ID) in the first column. That will tell Notes to store only the ID value (what follows the pipe is the "alias" and is what will be stored).
Note: I would avoid using @DocumentUniqueID as the unique ID for these values, as the Document Unique ID will change if the documents are copied and pasted, or the entire database is copied, etc. You can use the @Unique formula function in a computed-when-composed field to generate something close to a unique ID (almost like an identity column in SQL).
If you need relational properties, look for non-Notes solutions. It is possible to get some relational behavior using document UNIDs and update agents, but it will be harder than with a proper relational backend.
Your specific problem with referencing a piece of text that might change can to some extent be resolved by using aliases in the choice fields. If a dialog list contains values of the form...
Foo|id1
Bar|id2
...the form will display Foo but the back-end document will store the value id1 (and this is what you will be able to show in standard views, although XPages could solve that). Using the @DocumentUniqueID for the alias can be a good idea under some circumstances.
It depends on where you're using the data. The @DBLookup or @DBColumn will work in Lotus Notes fields if the fields are set to be computed for display. That way they always get the most up-to-date information when you open the form etc.
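To make the alias idea concrete outside of Notes, here is a small Python sketch (not formula language) of how Label|alias entries separate what is displayed from what is stored. The parse_choices helper is purely illustrative.

```python
def parse_choices(lines):
    """Split 'Label|alias' entries into an alias -> label mapping."""
    choices = {}
    for line in lines:
        label, _, alias = line.partition("|")
        # Without a pipe there is no alias, so the label itself is the stored value.
        choices[alias or label] = label
    return choices


stored_value = "id1"  # what the back-end document keeps

before = parse_choices(["Foo|id1", "Bar|id2"])
after = parse_choices(["Foo (renamed)|id1", "Bar|id2"])

# The stored alias still resolves after the display text changes.
print(before[stored_value], "->", after[stored_value])
```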
If you make it so the data is saved on to the document then you will have to write some update code when you need to refresh the values.
The Lotus Notes help files for designer are pretty good, have a look at that.
SM
You could use a key or alias to store the relationship to your lookup value so if the value itself changes, the connection remains because the alias is intact. For example, if your lookup values were being stored as a collection of documents, I'd have the @DBColumn retrieve Document UNID|lookup value pairs. When in display mode, you could then retrieve the value using @GetDocField. If the lookup values are in a different database, then you'd have to retrieve them for display using @DBLookup and construct a view that is keyed off of the UNID or whatever key you decide to use. The only drawback to this technique is that you wouldn't be able to display the field value in views, as the actual value isn't stored in the document, just a reference to it. Using XPages, though, you COULD map the relationship into a dynamic datatable just like you would in a truly relational system.
It's tricky, but using LEI, you could also use Notes to front-end a relational backend system, also giving you the dynamic relationship you desire in your lookups.
Hope this helps!
The content of the lookup can change freely. A problem only arises (as it would on any other platform in the same circumstances) if the lookup key changes. You need to use a key that won't change. Human-readable text is an advantage, but if you want to be able to change your key description from, say, "Divisions" to "Business Units" and still have lookups work, you need to use an alias of some kind, which will presumably be mapped to your text description and only used internally. @Unique is pretty good for this, and gives a shortish key, if that is important to you. @DocumentUniqueID is most reliable, but as Ed pointed out, will change (must change - it's a new document) if you copy/paste or make a non-replica copy. This is easy to get around, though. Create a Computed-when-composed field (called, say, "LookupRef") on the form you are using for your reference document with the formula "@DocumentUniqueID". That will capture the ID at the time of creation, and it will not change on copy/paste etc. Use that as your key.