Grails app: own query language for data stored in DB and files + full-text search (Hibernate Search, Compass, etc.) - lucene

I have an application which stores short descriptive data in a DB and lots of related textual data in text files.
I would like to add "advanced search" for the DB. I was thinking about adding my own query language, like JIRA does (Jira Query Language). Then I thought about adding full-text search across those text files (lower priority).
I'm wondering which tool would suit me better to implement this faster and more simply.
Most of all, I want to give users the ability to write their own queries instead of using UI elements to specify search filters.
Thanks
UPD. I save dates in the DB, and most varchar fields contain one-word strings.
UPD2. Apache Derby is used right now.

Take a look at the Searchable plugin for Grails.
http://www.grails.org/plugin/searchable

Related

Form a database out of a .txt file

I have a .txt file with rows of the following format
SI1334596|MRKU3|High Cube|1|EGST|First Line|Vehicle one|25|13|
How do I form a database from the above .txt entries in order to perform SQL queries on it? I also want to assign a name to each column. I have little or no knowledge of importing .txt entries into a database. I am looking for software that can be installed on my Windows computer, import the .txt file, convert it into a database, and let me run queries thereafter.
If you are asking for recommendations on specific tools, then your Question is off-topic for StackOverflow.com. See the Software Recommendations Stack Exchange.
Here are some possible approaches, with and without programming.
Database Import
Databases often have a built-in command or facility for importing data straight from a text file. When directly importing text with little or no processing, the import is often very fast.
For example, Postgres has the COPY command for importing. This command includes a DELIMITER parameter where you can tell it to expect the vertical bar | as the separator between fields.
You would define your table structure ahead of time, before the import, defining a name and data type for each expected column/field.
Custom App
You can write an app to read the text file, process the incoming data, and feed the prepared data to the database. For example, write a Java app that reads the text file, uses JDBC to connect to the server, and SQL written as text strings to instruct the database server on what to do.
You can do this row by row. Or, for increased speed, you can write a batch statement telling the database server to create multiple rows at the same time.
This is the way to go if the data requires complicated processing or there are other related chores such as keeping a history of many such imports, logging other information, reporting duplicate data, and so on.
For Java, the Apache Commons CSV library helps with reading/writing plain text files.
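The row-by-row and batch approach described above can be sketched with the standard library alone. This is a minimal sketch, not a complete importer: the table name is illustrative, and dropping the trailing empty field after the final `|` is an assumption about the data.

```java
import java.util.Arrays;
import java.util.List;

public class PipeDelimitedParser {
    // Split one pipe-delimited row into fields. Java's default split()
    // drops trailing empty strings, so the trailing '|' produces no field.
    static List<String> parseRow(String row) {
        return Arrays.asList(row.split("\\|"));
    }

    // Build a parameterized INSERT with one '?' placeholder per column,
    // suitable for a JDBC PreparedStatement with addBatch()/executeBatch().
    static String insertSql(String table, int columns) {
        StringBuilder sb = new StringBuilder("INSERT INTO " + table + " VALUES (");
        for (int i = 0; i < columns; i++) {
            sb.append(i == 0 ? "?" : ", ?");
        }
        return sb.append(")").toString();
    }

    public static void main(String[] args) {
        List<String> fields =
                parseRow("SI1334596|MRKU3|High Cube|1|EGST|First Line|Vehicle one|25|13|");
        System.out.println(fields.size());      // 9
        System.out.println(fields.get(2));      // High Cube
        // The hypothetical table name "shipments" would be created beforehand:
        System.out.println(insertSql("shipments", fields.size()));
    }
}
```

From there, a real importer would bind each field with `PreparedStatement.setString` and call `addBatch()` per row for speed.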
Spreadsheet
Many spreadsheets, such as LibreOffice Calc, can parse the data, take the column headers as titles, and populate a spreadsheet. You can run queries within the spreadsheet. This works well for smaller amounts of data that fit comfortably in memory; you may not need a database at all.
Database Tool
SQL database engines such as Postgres, H2, SQLite, and MySQL/MariaDB are just black-box engines, not full-blown interactive data tools. You can obtain such tools that connect to these engines. These tools can import/export text files, display lists of data, let you enter/modify data, create forms for better access to the data, and generate reports.
But there are some such data tools that have a database engine built-in. Examples include:
FileMaker
4D
LibreOffice Base

SQL query for a search engine

My project is based on Questions and Answers (stackoverflow style).
I need to allow users to search for previously asked questions.
The Questions table would be like this:
Questions
-------------------------------------------------
id | question
-------------------------------------------------
1  | How to cook pasta?
2  | How to Drive a car?
3  | When did Napoleon die?
Now when I'm going to write something to search for, I would write something like this:
When did Brazil win the world cup?
Let's say I'm gonna split this string on spaces into an array of strings.
What is the best SELECT SQL query to fetch all questions containing those strings, ignoring upper and lower case for each word, and sorting the results by the least-mentioned word? Why that order?
Because there will be many questions containing when, and, will, how, etc., but not many questions containing Brazil, so Brazil would effectively be the key word.
I'm using SQL Server 2008.
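The "rarest word first" ordering described above is essentially a document-frequency heuristic. A minimal application-side sketch (not a SQL solution; the class and method names are illustrative) that ranks search terms by how few stored questions contain them:

```java
import java.util.*;

public class RareWordRanker {
    // Order search terms by how few stored questions contain them,
    // so the rarest (most selective) term comes first.
    static List<String> rankByRarity(List<String> terms, List<String> questions) {
        Map<String, Long> freq = new HashMap<>();
        for (String term : terms) {
            String t = term.toLowerCase();
            long count = questions.stream()
                    .filter(q -> q.toLowerCase().contains(t))
                    .count();
            freq.put(term, count);
        }
        List<String> ranked = new ArrayList<>(terms);
        ranked.sort(Comparator.comparingLong(freq::get)); // stable sort, rarest first
        return ranked;
    }

    public static void main(String[] args) {
        List<String> questions = Arrays.asList(
                "How to cook pasta?",
                "How to Drive a car?",
                "When did Napoleon die?");
        List<String> terms = Arrays.asList("When", "did", "Brazil", "how");
        System.out.println(rankByRarity(terms, questions)); // [Brazil, When, did, how]
    }
}
```

A production search engine would precompute these frequencies in an index rather than scanning every question per query, which is what the full-text options below do for you.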
You really don't want to be doing this in raw SQL.
I suggest you look into the full-text search options for your database, this might be a good place to start.
In MySQL you have full-text indexes and the MATCH() function, which allow exactly this;
in SQL Server you should use the CONTAINS() function.
Find more info at
http://msdn.microsoft.com/en-us/library/ms142571.aspx
Your option is not the best one. Take a look at open source Apache Solr project. http://lucene.apache.org/solr/
Solr is a standalone enterprise search server with a REST-like API. You put documents in it (called "indexing") via XML, JSON, CSV or binary over HTTP. You query it via HTTP GET and receive XML, JSON, CSV or binary results.
Advanced full-text search capabilities
Optimized for high-volume web traffic
Standards-based open interfaces: XML, JSON and HTTP
Comprehensive HTML administration interfaces
Server statistics exposed over JMX for monitoring
Linearly scalable, auto index replication, auto failover and recovery
Near real-time indexing
Flexible and adaptable with XML configuration
Extensible plugin architecture
Take a look at the Detailed Features page, and especially the Query section. There's all you need for your app.
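Since Solr is queried over plain HTTP, the client side can be as simple as building a GET URL. A minimal sketch; the host, the core name "questions", and the parameter choices are illustrative:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SolrQueryUrl {
    // Build a Solr select URL for a simple keyword query.
    static String buildUrl(String baseUrl, String query, int rows) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "/select?q=" + q + "&rows=" + rows + "&wt=json";
    }

    public static void main(String[] args) {
        // Hypothetical local Solr instance with a core named "questions".
        System.out.println(buildUrl("http://localhost:8983/solr/questions",
                "When did Brazil win the world cup?", 10));
    }
}
```

Fetching that URL (e.g. with java.net.http.HttpClient) returns ranked results as JSON, with stop words and scoring handled by the index configuration.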

How to integrate database search with pdf search in a web app?

I have a JSP web application with a custom search engine.
The search engine is basically built on top of a 'documents' table in a SQL Server database.
To exemplify, each document record has three fields:
document id
'description' (text field)
'attachment', a path of a pdf file in the filesystem.
The search engine currently searches for keywords in the description field and returns a result list in an HTML page. Now I want to search for keywords in the pdf file content as well.
I'm investigating about Lucene, Tika, Solr, but I don't understand how I can use these frameworks for my goal.
One possible solution: use Tika to extract the pdf content and store it in a new field of the documents table, so I can write SQL queries on this field.
Are there better alternatives?
Can I use Solr/Lucene indexing features as a complement to the SQL-based search engine rather than as a total substitute for it?
Thanks
I would consider Lucene to be completely independent of an SQL Database, i.e. you will not use SQL/jdbc/whatever DB to query Lucene, but its own API and its own data store.
You could of course use Tika to extract the full text of a pdf, store it, and use whatever full-text search capability your SQL DB provides.
If you are using Hibernate, Hibernate Search is a fantastic product which integrates both an SQL store and Lucene. But you would have to go the Hibernate/JPA way, which might be overkill for your project.
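One way to use Lucene as a complement rather than a substitute: run the SQL search over the description field and the full-text search over the extracted pdf text separately, then merge the matching document ids in the application. A minimal sketch; the union merge policy is an assumption (retainAll would give the intersection instead):

```java
import java.util.*;

public class CombinedSearch {
    // Combine document ids matched by the SQL description search with
    // ids matched by a separate full-text index over the pdf contents.
    // TreeSet keeps the merged result sorted and de-duplicated.
    static Set<Integer> combine(Collection<Integer> sqlHits,
                                Collection<Integer> indexHits) {
        Set<Integer> result = new TreeSet<>(sqlHits);
        result.addAll(indexHits);  // union: a hit from either engine survives
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> hits = combine(Arrays.asList(3, 1, 7), Arrays.asList(7, 9));
        System.out.println(hits); // [1, 3, 7, 9]
    }
}
```

The merged id set can then drive one final SQL query to fetch the rows for the result page, so the existing HTML rendering stays unchanged.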

Multi database schema export

I'm writing a project that will use MySQL or Derby (it must work with both). To solve this I'm using Hibernate, and it works nicely, except for one situation: I have a couple of tables that contain cities and towns, with related data, so that if I know the town I can join and get the county, state and zip code. These tables hold thousands of rows. I'm not using Hibernate to handle them, but plain JDBC. These tables are not going to change over time; they're just for reference and for autocompletion. So what is the best way to reproduce these tables in both MySQL and JavaDB? Specifically, they must be generated on the first start of the app. I thought of creating a special format, saving everything to text files, and then inserting them into the db on first run... but is there a way to save some coding and use a tool that is already there?
I found many suggestions to use CSV, but that doesn't work here, as it doesn't keep information like column types or lengths. The same goes for the XML that my MySQL tool (SQLyog) produces. Do you have other suggestions or tools?
I would use Hibernate and map the entities as mutable=false; Hibernate is then not permitted to change anything. Create the schema using the standard Hibernate means, then use a StatelessSession to insert the records, ensuring you've enabled batching.
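If you'd rather keep the reference tables out of Hibernate entirely, a plain-SQL seed script is another option that both MySQL and Derby accept, as long as you stick to portable syntax. A minimal sketch of generating such inserts from your text-file data; the table and column names are illustrative:

```java
import java.util.*;

public class SeedScriptGenerator {
    // Turn one row of reference data (e.g. a town) into a plain INSERT
    // statement using only syntax common to MySQL and Derby.
    static String toInsert(String table, List<String> columns, List<String> values) {
        StringJoiner cols = new StringJoiner(", ", "(", ")");
        columns.forEach(cols::add);
        StringJoiner vals = new StringJoiner(", ", "(", ")");
        for (String v : values) {
            vals.add("'" + v.replace("'", "''") + "'"); // escape embedded quotes
        }
        return "INSERT INTO " + table + " " + cols + " VALUES " + vals + ";";
    }

    public static void main(String[] args) {
        System.out.println(toInsert("towns",
                Arrays.asList("name", "county", "zip"),
                Arrays.asList("Springfield", "O'Brien", "12345")));
    }
}
```

Running the generated script on first start through plain JDBC works against either database; the CREATE TABLE statements would be kept portable in the same way.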

Sql Server Full Text Search - Getting word occurrences/locations in text?

Suppose I have SQL Server (2005/2008) create an index from one of my tables.
I wish to use my own custom search engine (a little more tuned to my needs than Full Text Search).
In order to use it, however, I need SQL Server to provide me with the word positions and other data required by the search engine.
Is there any way to query the "index" for this data instead of just getting search results?
Thanks
Roey
No. And if you could, what would happen if Microsoft decided to change their internal data structures? Your code would break.
What are you trying to achieve?
You shouldn't rely on SQL Server's internal data structures - they are tailored specifically for SQL Server's use and aren't accessible for querying anyway.
If you want a fast indexer then you will probably have more success using a pre-written one rather than trying to write your own. Give Lucene.Net a try.
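If word positions are the main thing the engine needs to supply, it can be simpler to compute them yourself while feeding documents into your own indexer, rather than trying to read them back out of SQL Server. A minimal sketch using character offsets:

```java
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordPositions {
    // Map each lower-cased word to the character offsets where it occurs.
    // A real indexer might record token ordinals instead of offsets.
    static Map<String, List<Integer>> positions(String text) {
        Map<String, List<Integer>> result = new HashMap<>();
        Matcher m = Pattern.compile("\\w+").matcher(text.toLowerCase());
        while (m.find()) {
            result.computeIfAbsent(m.group(), k -> new ArrayList<>()).add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(positions("to be or not to be").get("to")); // [0, 13]
    }
}
```

This per-document position map is essentially what Lucene (or Lucene.Net) builds and stores for you, which is why a pre-written indexer is usually the better path.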