I am new to search engines and information retrieval. Can someone explain how the Lucene search engine differs from Azure Search?
I read the Azure Search documentation and see that Azure Search supports Lucene queries as well. Is Azure Search built on top of Lucene, or does it only inherit certain features from it?
I could not find documentation that explains this clearly. Can someone point me in the right direction?
Thanks in advance.
According to this Microsoft page, the full-text search engine is built on Lucene:
https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search
"The full text search engine in Azure Search is built on Apache Lucene, an industry standard in information retrieval."
Azure Search is not built on top of Apache Lucene as such, but it does support Lucene Query syntax.
https://learn.microsoft.com/en-us/rest/api/searchservice/lucene-query-syntax-in-azure-search
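To make the Lucene query syntax support concrete, here is a minimal sketch of a request body for the Search Documents REST call. It only builds the JSON and sends nothing. The `title` field and the query string are made up for illustration; `queryType` set to `full` is what switches the service from its simple syntax to the Lucene syntax.

```python
import json


def build_search_body(query, top=10):
    """Build a JSON body for a Search Documents POST that uses Lucene syntax."""
    return {
        "search": query,
        # "simple" is the default parser; "full" enables Lucene query syntax.
        "queryType": "full",
        "top": top,
    }


# Hypothetical fielded query with a phrase and a prefix wildcard.
body = build_search_body('title:"information retrieval" AND lucen*')
print(json.dumps(body, indent=2))
```

The body would be posted to the index's `docs/search` endpoint together with an `api-version` parameter and an API key header, both of which depend on your service.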
I have gone through the Azure Data Lake documentation on MSDN, as well as a couple of slide decks on SlideShare, to figure out an answer. From what I gathered, Azure Data Catalog is used for discoverability based on metadata and a few annotations that users can provide. Wouldn't a content-based search add more value to the lake?
Content search and full-text search on data in the Data Lake can be very useful indeed.
I would expect that you could use either HDInsight or U-SQL's extensibility mechanism to add content search (and indexing) with something like Lucene or Solr.
If you would like to see something out of the box, please file a feature request at http://aka.ms/adlfeedback. Thanks!
I am building an enterprise search application that uses LucidWorks as the search engine and EMC Documentum as the back end for storing documents and metadata. Currently I use DQL to run queries and fetch data as an interim solution, but I am looking for other ways to connect the two, such as third-party connectors. Please suggest possible ways to connect LucidWorks with Documentum.
LucidWorks offers a connector that extracts content from Documentum, so you can use LucidWorks Search to search Documentum content as well.
http://www.lucidworks.com/market_app/documentum-connector-for-lucidworks/
regards
Markus
My project is based on Questions and Answers (stackoverflow style).
I need to allow users to search for previously asked questions.
The Questions table would be like this:
Questions
-------------------------------------------------
id    questions
-------------------------------------------------
1     How to cook pasta?
2     How to Drive a car?
3     When did Napoleon die?
Now, when I search for something, I would type something like this:
When did Brazil win the world cup?
Let's say I split this string on spaces into an array of strings.
What is the best SQL SELECT query to fetch all questions containing those strings, ignoring case for each word, and sorting the results by the least frequently mentioned word? Why that ordering?
Because many questions will contain words like When, and, will, how, etc., but not many will contain Brazil, so Brazil would act as the keyword.
I'm using SQL Server 2008.
You really don't want to be doing this in raw SQL.
I suggest you look into the full-text search options for your database; that is a good place to start.
In MySQL you have full-text indexes and the MATCH() ... AGAINST() construct, which allow just this.
In SQL Server you should use the CONTAINS() predicate.
Find more info at
http://msdn.microsoft.com/en-us/library/ms142571.aspx
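The "rare words matter more" idea from the question is roughly what full-text engines do when they weight terms by inverse document frequency. The sketch below is not how SQL Server ranks results; it is a small in-memory illustration of the principle. Rows 4 and 5 are hypothetical additions to the question's table so that some query words appear in more than one row.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_questions(questions, query):
    """Return question ids ordered by the summed rarity of the query words they contain."""
    doc_tokens = {qid: set(tokenize(text)) for qid, text in questions.items()}
    n_docs = len(doc_tokens)
    # Document frequency: in how many questions each word appears.
    df = Counter(tok for toks in doc_tokens.values() for tok in toks)
    query_words = set(tokenize(query))
    scores = {}
    for qid, toks in doc_tokens.items():
        # Rare words (low df) contribute a larger weight than common ones.
        score = sum(math.log(1 + n_docs / df[w]) for w in query_words if w in toks)
        if score > 0:
            scores[qid] = score
    return sorted(scores, key=scores.get, reverse=True)


questions = {
    1: "How to cook pasta?",
    2: "How to Drive a car?",
    3: "When did Napoleon die?",
    4: "Has Brazil hosted the world cup?",  # hypothetical row
    5: "When did the war end?",             # hypothetical row
}
ranking = rank_questions(questions, "When did Brazil win the world cup?")
print(ranking)
```

Questions that share only common words such as "when" and "did" end up below the one that mentions "Brazil", and questions with no matching word are left out entirely.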
Your option is not the best one. Take a look at open source Apache Solr project. http://lucene.apache.org/solr/
Solr is a standalone enterprise search server with a REST-like API. You put documents in it (called "indexing") via XML, JSON, CSV or binary over HTTP. You query it via HTTP GET and receive XML, JSON, CSV or binary results.
Advanced Full-Text Search Capabilities
Optimized for High Volume Web Traffic
Standards Based Open Interfaces - XML, JSON and HTTP
Comprehensive HTML Administration Interfaces
Server statistics exposed over JMX for monitoring
Linearly scalable, auto index replication, auto failover and recovery
Near Real-time indexing
Flexible and Adaptable with XML configuration
Extensible Plugin Architecture
Take a look at the Detailed Features page, and especially the Query section. It has everything you need for your app.
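As a rough illustration of querying Solr over HTTP GET, the sketch below builds a request URL for Solr's `select` handler. The host, the core name `questions` and the field name `question` are assumptions for this example; the code only constructs the URL and does not contact a server.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, core, query, rows=10):
    """Build a GET URL for a Solr select request that returns JSON."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


# Hypothetical local Solr instance, core and field.
url = solr_select_url("http://localhost:8983", "questions", "question:brazil")
print(url)
```

Fetching that URL with any HTTP client would return the matching documents, ranked by Solr's relevance score.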
I want to use Lucene to make searching my database table fast.
The table query is select x,y,z from tablexyz. The searchable field is x, and the table has 2 million rows. I want to use it in a web application and show the data on a search page. Has anyone used Lucene to store an entire table in a text file?
I think that Apache Solr is what you are looking for.
To get started:
first read the tutorial to understand the basics,
then have a look at DataImportHandler which would probably be the easiest way to index your content.
No matter what technology you are using for your web application, Solr has a lot of connectors.
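If DataImportHandler is not an option in your setup, a different route is to read the rows yourself and post them to Solr's JSON update handler. The sketch below covers only the batching and serialization step, not the database read or the HTTP call. The field names `x`, `y`, `z` come from the question; the generated `id` values and the batch size are assumptions.

```python
import json


def rows_to_solr_batches(rows, batch_size=1000):
    """Yield JSON strings, each holding a list of Solr documents built from (x, y, z) rows."""
    batch = []
    for row_id, (x, y, z) in enumerate(rows, start=1):
        # The running row number is used as the document id (an assumption).
        batch.append({"id": str(row_id), "x": x, "y": y, "z": z})
        if len(batch) == batch_size:
            yield json.dumps(batch)
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield json.dumps(batch)


sample_rows = [("alpha", 1, 2), ("beta", 3, 4), ("gamma", 5, 6)]
batches = list(rows_to_solr_batches(sample_rows, batch_size=2))
print(len(batches))
```

Each batch would then be sent as the body of a POST to the core's `update` endpoint with a JSON content type, followed by a commit. Batching keeps memory use flat for a table with millions of rows.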
I have a JSP web application with a custom search engine.
The search engine is basically built on top of a 'documents' table in a SQL Server database.
To exemplify, each document record has three fields:
document id
'description' (text field)
'attachment', a path of a pdf file in the filesystem.
The search engine currently searches for keywords in the description field and returns a result list in an HTML page. Now I also want to search for keywords in the PDF file content.
I'm looking into Lucene, Tika and Solr, but I don't understand how I can use these frameworks for my goal.
One possible solution: use Tika to extract the PDF content and store it in a new field of the documents table, so that I can write SQL queries against this field.
Are there better alternatives?
Can I use the Solr/Lucene indexing features as a complement to the SQL-based search engine rather than as a total replacement for it?
Thanks
I would consider Lucene to be completely independent of an SQL database, i.e. you will not use SQL/JDBC or any other database interface to query Lucene, but its own API and its own data store.
You could of course use Tika to extract the full text of a PDF, store it, and use whatever full-text search capability your SQL database provides.
If you are using Hibernate, Hibernate Search is a fantastic product which integrates both an SQL store and Lucene. But you would have to go the Hibernate/JPA way, which might be overkill for your project.
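To picture how a separate index can sit next to the SQL table instead of replacing it, here is a small sketch. It is a toy stand-in, not Lucene: an in-memory inverted index plays the role of the Lucene index, SQLite plays the role of SQL Server, and the `extracted` dictionary stands in for text that Tika would pull out of each PDF. The sample rows and file paths are made up.

```python
import re
import sqlite3
from collections import defaultdict


def build_index(texts):
    """Map each word to the set of document ids whose extracted text contains it."""
    index = defaultdict(set)
    for doc_id, text in texts.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, db, keywords):
    """Return (id, description) rows whose indexed text contains every keyword."""
    id_sets = [index.get(k.lower(), set()) for k in keywords]
    ids = sorted(set.intersection(*id_sets)) if id_sets else []
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    # The index supplies the ids; the SQL table still supplies the record data.
    sql = f"SELECT id, description FROM documents WHERE id IN ({placeholders}) ORDER BY id"
    return db.execute(sql, ids).fetchall()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, description TEXT, attachment TEXT)")
db.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [(1, "Invoice 2013", "/files/invoice.pdf"), (2, "User manual", "/files/manual.pdf")],
)
# Hypothetical text, standing in for what Tika would extract from each attachment.
extracted = {1: "Total amount due in thirty days", 2: "Press the power button to start"}
index = build_index(extracted)
results = search(index, db, ["power", "button"])
print(results)
```

The design point is that the index only has to return document ids; the existing table and result page can stay as they are and simply accept ids from a second source.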