Case-insensitive search using Hibernate - sql

I'm using Hibernate for ORM of my Java app to an Oracle database (not that the database vendor matters, we may switch to another database one day), and I want to retrieve objects from the database according to user-provided strings. For example, when searching for people, if the user is looking for people who live in 'fran', I want to be able to give her people in San Francisco.
SQL is not my strong suit, and I prefer Hibernate's Criteria building code to hard-coded strings as it is. Can anyone point me in the right direction about how to do this in code, and if impossible, how the hard-coded SQL should look like?
Thanks,
Yuval =8-)

For the simple case you describe, look at Restrictions.ilike(), which does a case-insensitive search.
Criteria crit = session.createCriteria(Person.class);
crit.add(Restrictions.ilike('town', '%fran%');
List results = crit.list();

Criteria crit = session.createCriteria(Person.class);
crit.add(Restrictions.ilike('town', 'fran', MatchMode.ANYWHERE);
List results = crit.list();

If you use Spring's HibernateTemplate to interact with Hibernate, here is how you would do a case insensitive search on a user's email address:
getHibernateTemplate().find("from User where upper(email)=?", emailAddr.toUpperCase());

You also do not have to put in the '%' wildcards. You can pass MatchMode (docs for previous releases here) in to tell the search how to behave. START, ANYWHERE, EXACT, and END matches are the options.

The usual approach to ignoring case is to convert both the database values and the input value to upper or lower case - the resultant sql would have something like
select f.name from f where TO_UPPER(f.name) like '%FRAN%'
In hibernate criteria restrictions.like(...).ignoreCase()
I'm more familiar with Nhibernate so the syntax might not be 100% accurate
for some more info see pro hibernate 3 extract and hibernate docs 15.2. Narrowing the result set

This can also be done using the criterion Example, in the org.hibernate.criterion package.
public List findLike(Object entity, MatchMode matchMode) {
Example example = Example.create(entity);
example.enableLike(matchMode);
example.ignoreCase();
return getSession().createCriteria(entity.getClass()).add(
example).list();
}
Just another way that I find useful to accomplish the above.

Since Hibernate 5.2 session.createCriteria is deprecated. Below is solution using JPA 2 CriteriaBuilder. It uses like and upper:
CriteriaBuilder builder = session.getCriteriaBuilder();
CriteriaQuery<Person> criteria = builder.createQuery(Person.class);
Root<Person> root = criteria.from(Person.class);
Expression<String> upper = builder.upper(root.get("town"));
criteria.where(builder.like(upper, "%FRAN%"));
session.createQuery(criteria.select(root)).getResultList();

Most default database collations are not case-sensitive, but in the SQL Server world it can be set at the instance, the database, and the column level.

You could look at using Compass a wrapper above lucene.
http://www.compass-project.org/
By adding a few annotations to your domain objects you get achieve this kind of thing.
Compass provides a simple API for working with Lucene. If you know how to use an ORM, then you will feel right at home with Compass with simple operations for save, and delete & query.
From the site itself.
"Building on top of Lucene, Compass simplifies common usage patterns of Lucene such as google-style search, index updates as well as more advanced concepts such as caching and index sharding (sub indexes). Compass also uses built in optimizations for concurrent commits and merges."
I have used this in the past and I find it great.

Related

Pure SQL queries in Rails view?

I am asked to display some sort of data in my Rails App view with pure SQL query without help of ActiveRecord. This is done for the Application owner to be able to implement some third-party reporting tool (Pentaho or something).
I am bad in SQL and I am not really sure if that is even possible to do something similar. Does anyone have any suggestions?
If you must drop down to pure SQL, you can make life more pleasant by using 'find by sql':
bundy = MyModel.find_by_sql("SELECT my_models.id as id, my_articles.title as title from my_models, my_articles WHERE foo = 3 AND ... ...")
or similar. This will give you familiar objects which you can access using dot notation as you'd expect. The elements in the SELECT clause are available, as long as you use 'as' with compound parameters to give them a handle:
puts bundy.id
puts bundy.title
To find out what all the results are for a row, you can use 'attributes':
bundy.first.attributes
Also you can construct your queries using ActiveRecord, then call 'to_sql' to give you the raw SQL you're using.
sql = MyModel.joins(:my_article).where(id: [1,2,3,4,5]).to_sql
for example.
Then you could call:
MyModel.find_by_sql(sql)
Better, though, may be to just use ActiveRecord, then pass the result of 'to_sql' into whatever the reporting tool is that you need to use with it. Then your code maintains its portability and maintainability.

How to query documents in MarkLogic and process results

I've been working off of the tutorial pages but seem to have a fundamental disconnect in my thinking transitioning off of RDBMS systems. I'm using MarkLogic and handling this database interaction through the Java API focusing on the search access via POJO method outlines in the tutorial documentation.
My reference up to this point has come from here principally: http://developer.marklogic.com/learn/java/processing-search-results
My scenario is this:
I have a series of documents. We'll call them 'books' for simplicity. I'm writing these books into my DB like this:
jsonDocMgr.write("/" + book.getID() + "/",
new StringHandle(
"{name: \""+book.getID()+"\","+
"chaps: "+ book.getNumChaps()+","+
"pages: "+ book.getNumPages()+","+
"}"));
What I want is to execute the following type of operation:
-Query all documents with the name "book*" (as ID is represented by book0, book1, book2, etc)
where chaps > 3. For these documents only, I want to modify the number of pages by reducing by half.
In an RDBMS, I'd use something like jdbcTemplate and get a result set for me to iterate through. For each iteration I'd know I was working with a single record (aka a book), parse the field values from the result set, make a note of the ID, then update the DB accordingly.
With MarkLogic, I'm awash in a sea of different handlers and managers...none of which seems to follow the pattern of the ResultSet with a cursor abstraction. Ultimately I want to do a two-step operation of check the chapter count then update the page field for that specific URI.
What's the most common approach to this? It seems like the most basic of operations...
Try the high-level Java API and see if it works for you. Create a multi-statement transaction with a query by example, then use document operations.
At a lower level, the closest match to a ResultSet is the ResultSequence class. The examples at http://docs.marklogic.com/javadoc/xcc/overview-summary.html are pretty good. For updates the interaction model between Java and MarkLogic is a bit different from JDBC and SQL. There is no SELECT... FOR UPDATE syntax.
The most efficient low-level technique is to select and update in one XQuery transaction, something like a stored procedure. However this requires good knowledge of XQuery. The other low-level approach is to use an XCC multi-statement transaction, which requires a little less knowledge of XQuery.
A minor issue in your code ... you definately do NOT want to end your JSON docuement URIs with "/" as you do in your sample code. You should end them with the ".json" or some other extension or no extension but definately not "/" as that is treated specially in the server.

Hibernate and dynamic SQL

do you know anything about Hibernate and its ability to generate dynamic sql queries with HQL?
I you have any links I would appreciate posting, I can't find nothing about it in Hibernate`s documentation.
Best regards
Gabe
//edit
so maybe I will precise what I mean. I am wondering if some HQL code generates SQL queries which uses something like EXECUTE (for postgres)
Not sure what you're aiming for here? HQL in Hibernate will always generate SQL, and you can put your HQL together differently based on input. It will always generate new SQL for each permutation and run. You can make Hibernate precompile/cache queries but that's just a performance optimization and shouldn't be your first concern.
I would also consider looking into the Criteria API which lets you stay a lot more object oriented instead of working with tons of strings.
If you're talking about static queries using dynamic arguments, the syntax is
select f from Foo f where f.bar = ? and f.zim = ?
or, with named parameters
select f from Foo f where f.bar = :bar and f.zim = :zim
If you're talking about completely dynamically created queries based on a set of criteria, then the API to use is... the Criteria API.
Both are largely covered in the Hibernate reference documentation.

SQL - searching database with the LIKE operator

Given your data stored somewhere in a database:
Hello my name is Tom I like dinosaurs to talk about SQL.
SQL is amazing. I really like SQL.
We want to implement a site search, allowing visitors to enter terms and return relating records. A user might search for:
Dinosaurs
And the SQL:
WHERE articleBody LIKE '%Dinosaurs%'
Copes fine with returning the correct set of records.
How would we cope however, if a user mispells dinosaurs? IE:
Dinosores
(Poor sore dino). How can we search allowing for error in spelling? We can associate common misspellings we see in search with the correct spelling, and then search on the original terms + corrected term, but this is time consuming to maintain.
Any way programatically?
Edit
Appears SOUNDEX could help, but can anyone give me an example using soundex where entering the search term:
Dinosores wrocks
returns records instead of doing:
WHERE articleBody LIKE '%Dinosaurs%' OR articleBody LIKE '%Wrocks%'
which would return squadoosh?
If you're using SQL Server, have a look at SOUNDEX.
For your example:
select SOUNDEX('Dinosaurs'), SOUNDEX('Dinosores')
Returns identical values (D526) .
You can also use DIFFERENCE function (on same link as soundex) that will compare levels of similarity (4 being the most similar, 0 being the least).
SELECT DIFFERENCE('Dinosaurs', 'Dinosores'); --returns 4
Edit:
After hunting around a bit for a multi-text option, it seems that this isn't all that easy. I would refer you to the link on the Fuzzt Logic answer provided by #Neil Knight (+1 to that, for me!).
This stackoverflow article also details possible sources for implentations for Fuzzy Logic in TSQL. Once respondant also outlined Full text Indexing as a potential that you might want to investigate.
Perhaps your RDBMS has a SOUNDEX function? You didn't mention which one was involved here.
SQL Server's SOUNDEX
Just to throw an alternative out there. If SSIS is an option, then you can use Fuzzy Lookup.
SSIS Fuzzy Lookup
I'm not sure if introducing a separate "search engine" is possible, but if you look at products like the Google search appliance or Autonomy, these products can index a SQL database and provide more searching options - for example, handling misspellings as well as synonyms, search results weighting, alternative search recommendations, etc.
Also, SQL Server's full-text search feature can be configured to use a thesaurus, which might help:
http://msdn.microsoft.com/en-us/library/ms142491.aspx
Here is another SO question from someone setting up a thesaurus to handle common misspellings:
FORMSOF Thesaurus in SQL Server
Short answer, there is nothing built in to most SQL engines that can do dictionary-based correction of "fat fingers". SoundEx does work as a tool to find words that would sound alike and thus correct for phonetic misspellings, but if the user typed in "Dinosars" missing the final U, or truly "fat-fingered" it and entered "Dinosayrs", SoundEx would not return an exact match.
Sounds like you want something on the level of Google Search's "Did you mean __?" feature. I can tell you that is not as simple as it looks. At a 10,000-foot level, the search engine would look at each of those keywords and see if it's in a "dictionary" of known "good" search terms. If it isn't, it uses an algorithm much like a spell-checker suggestion to find the dictionary word that is the closest match (requires the fewest letter substitutions, additions, deletions and transpositions to turn the given word into the dictionary word). This will require some heavy procedural code, either in a stored proc or CLR Db function in your database, or in your business logic layer.
You can also try the SubString(), to eliminate the first 3 or so characters . Below is an example of how that can be achieved
SELECT Fname, Lname
FROM Table1 ,Table2
WHERE substr(Table1.Fname, 1,3) || substr(Table1.Lname,1 ,3) = substr(Table2.Fname, 1,3) || substr(Table2.Lname, 1 , 3))
ORDER BY Table1.Fname;

Openbase SQL case-sensitivity oddities ('=' vs. LIKE) - porting to MySQL

We are porting an app which formerly used Openbase 7 to now use MySQL 5.0.
OB 7 did have quite badly defined (i.e. undocumented) behavior regarding case-sensitivity. We only found this out now when trying the same queries with MySQL.
It appears that OB 7 treats lookups using "=" differently from those using "LIKE": If you have two values "a" and "A", and make a query with WHERE f="a", then it finds only the "a" field, not the "A" field. However, if you use LIKE instead of "=", then it finds both.
Our tests with MySQL showed that if we're using a non-binary collation (e.g. latin1), then both "=" and "LIKE" compare case-insensitively. However, to simulate OB's behavior, we need to get only "=" to be case-sensitive.
We're now trying to figure out how to deal with this in MySQL without having to add a lot of LOWER() function calls to all our queries (there are a lot!).
We have full control over the MySQL DB, meaning we can choose its collation mode as we like (our table names and unique indexes are not affected by the case sensitivity issues, fortunately).
Any suggestions how to simulate the OpenBase behaviour on MySQL with the least amount of code changes?
(I realize that a few smart regex replacements in our source code to add the LOWER calls might do the trick, but we'd rather find a different way)
Another idea .. does MySQL offer something like User Defined Functions? You could then write a UDF-version of like that is case insesitive (ci_like or so) and change all like's to ci_like. Probably easier to do than regexing a call to lower in ..
These two articles talk about case sensitivity in mysql:
Case Sensitive mysql
mySql docs "Case Sensitivity"
Both were early hits in this Google search:
case sensitive mysql
I know that this is not the answer you are looking for .. but given that you want to keep this behaviour, shouldn't you explicitly code it (rather than changing some magic 'config' somewhere)?
It's probably quite some work, but at least you'd know which areas of your code are affected.
A quick look at the MySQL docs seems to indicate that this is exactly how MySQL does it:
This means that if you search with col_name LIKE 'a%', you get all column values that start with A or a.