What is the Solr plugin for Liferay meant for?

I am using Liferay 6.1 and I am trying to learn how to incorporate search functionality into Liferay Portal. I was able to run Apache Solr inside Liferay's Tomcat container, but I don't understand what the Solr plugin for Liferay is meant for.
Here is the link
Can someone please explain what the benefits of using the plugin (for Liferay) are, and what it accomplishes on top of using Solr?
Thanks

Per this link it is to externalize the search function from the portal.
Using Solr instead of the embedded Lucene index gives you additional Solr capabilities such as replication, sharding, result clustering through Carrot2, and the use of custom analyzers/stemmers.
It can also offload search processing to a separate server cluster.
It opens up the possibility of a search-driven UI (faceted classification etc.) separate from your portal UI.

Related

How to auto-index data using solr and nutch?

I want to automatically index a document or a website when it is fed to Apache Solr. How can we achieve this? I have seen examples of using a cron job that calls a PHP script, but they are not explained very clearly. Using the Java API SolrJ, is there any way that we can index data automatically, without having to do it manually?
You can write a scheduler and have it call the SolrJ code that does the indexing/re-indexing; a minimal sketch is shown after the links below.
For writing the scheduler, please refer to the links below:
http://www.mkyong.com/java/how-to-run-a-task-periodically-in-java/
http://archive.oreilly.com/pub/a/java/archive/quartz.html
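For example, here is a minimal sketch of such a scheduler using java.util.concurrent together with SolrJ. The Solr URL, core name, schedule, and document fields are placeholders, and the Builder API shown assumes SolrJ 6+:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class PeriodicIndexer {

        public static void main(String[] args) {
            // Placeholder URL and core name - adjust to your Solr installation.
            SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();

            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

            // Run the indexing task every 30 minutes.
            scheduler.scheduleAtFixedRate(() -> {
                try {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", "doc-1");               // placeholder fields
                    doc.addField("title", "Example document");
                    solr.add(doc);
                    solr.commit();                             // make the document searchable
                } catch (Exception e) {
                    e.printStackTrace();                       // log and keep the schedule alive
                }
            }, 0, 30, TimeUnit.MINUTES);
        }
    }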
If you are using Apache Nutch, you have to use the Nutch solr-index plugin. With this plugin you can index web documents as soon as they are crawled by Nutch. But the main question would be how you can schedule Nutch to start periodically.
As far as I know, you have to use a scheduler for this purpose (see the sketch at the end of this answer). I know of an old Nutch project called Nutch-base which uses Apache Quartz for the purpose of scheduling Nutch jobs. You can find the source code of Nutch-base at the following link:
https://github.com/mathieuravaux/nutchbase
If you look at this project, there is a plugin called admin-scheduling. Although it is implemented for an old version of Nutch, it could be a nice starting point for developing a scheduler plugin for Nutch.
It is worth noting that if you are going to crawl websites periodically and fetch newly arrived links, you can use this tutorial.
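For illustration, here is a rough Quartz-based sketch that launches a Nutch crawl on a schedule. The crawl command, paths, and cron expression are placeholders and will differ per Nutch version:

    import org.quartz.CronScheduleBuilder;
    import org.quartz.Job;
    import org.quartz.JobBuilder;
    import org.quartz.JobDetail;
    import org.quartz.JobExecutionContext;
    import org.quartz.Scheduler;
    import org.quartz.Trigger;
    import org.quartz.TriggerBuilder;
    import org.quartz.impl.StdSchedulerFactory;

    public class NutchCrawlScheduler {

        // Quartz job that shells out to the Nutch crawl script.
        public static class CrawlJob implements Job {
            @Override
            public void execute(JobExecutionContext context) {
                try {
                    // Placeholder command - point it at your own Nutch installation and seed/crawl dirs.
                    Process p = new ProcessBuilder("/opt/nutch/bin/crawl", "urls", "crawl", "2")
                            .inheritIO()
                            .start();
                    p.waitFor();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();

            JobDetail job = JobBuilder.newJob(CrawlJob.class)
                    .withIdentity("nutchCrawl")
                    .build();

            // Run every night at 2 AM (cron expression is a placeholder).
            Trigger trigger = TriggerBuilder.newTrigger()
                    .withSchedule(CronScheduleBuilder.cronSchedule("0 0 2 * * ?"))
                    .build();

            scheduler.scheduleJob(job, trigger);
            scheduler.start();
        }
    }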

Viewing Apache solr logs on windows

I have a Drupal-based site with Solr integration. My localhost is on Windows and the live site is on Linux.
How do I enable and view Solr logging for both setups? I can see a log folder on my localhost but it's empty.
Just to elaborate, Solr search etc. works great in both setups. However, I built a Solr view that works perfectly locally but gives less accurate results on the live site, so I wanted to see the final Solr queries being built to find the source of the difference.
While starting the Solr instance, pass the following parameter to enable Solr logging to a file:
-Djava.util.logging.config.file=etc/logging.properties
Then modify /example/etc/logging.properties inside your Solr instance to customize your logging pattern.
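As an illustration, here is a minimal logging.properties sketch that sends Solr's java.util.logging output to a file. The file path, rotation settings, and levels are placeholders; adjust them to your setup:

    # Send log records to a file handler
    handlers = java.util.logging.FileHandler

    # Root logging level
    .level = INFO

    # FileHandler configuration (path and rotation are placeholders)
    java.util.logging.FileHandler.pattern = logs/solr.log
    java.util.logging.FileHandler.limit = 10000000
    java.util.logging.FileHandler.count = 5
    java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter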
Using Solr Version: Apache Solr 8.9.0
You could use the Solr Administration User Interface
Go to the Solr Admin UI and click the "Logging" link.
There you will see the log output.
Selecting the Level link on the left shows the hierarchy of classpaths and classnames for your instance.

ElasticSearch Indexing Confluence pages

Can ElasticSearch index Confluence pages?
There are a lot of river plugins, but none for Confluence: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html
There is a GitHub project, https://github.com/obazoud/elasticsearch-river-confluence, but the last commit was a year ago, so I guess it's not up to date.
Elasticsearch has deprecated rivers.
Elasticsearch has a solution built on top of it called Workplace Search, which can connect to Confluence for ingesting data.
Otherwise, you would likely need to do it via the Confluence API, with a script that pushes the content to Elasticsearch. You might also need the "ingest-attachment" plugin if you need to parse PDF content.
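As a rough sketch of that script-based approach, the following fetches a page from the Confluence REST API and indexes it into Elasticsearch over plain HTTP. The Confluence base URL, page ID, credentials, index name, and Elasticsearch host are all placeholders:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class ConfluenceToElasticsearch {

        public static void main(String[] args) throws Exception {
            HttpClient http = HttpClient.newHttpClient();

            // 1. Fetch a page body from the Confluence REST API (URL and page ID are placeholders).
            HttpRequest fetch = HttpRequest.newBuilder()
                    .uri(URI.create("https://confluence.example.com/rest/api/content/12345?expand=body.storage"))
                    .header("Authorization", "Bearer <api-token>")   // placeholder credentials
                    .GET()
                    .build();
            String pageJson = http.send(fetch, HttpResponse.BodyHandlers.ofString()).body();

            // 2. Index the JSON into Elasticsearch (index name and host are placeholders).
            //    In a real script you would extract title/body fields from pageJson first.
            HttpRequest index = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:9200/confluence/_doc/12345"))
                    .header("Content-Type", "application/json")
                    .PUT(HttpRequest.BodyPublishers.ofString(pageJson))
                    .build();
            HttpResponse<String> response = http.send(index, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }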

How can I integrate the Apache Solr Search with my Drupal 7 Site?

Can anyone give me good tutorial links that show how to integrate Solr search with my Drupal site to get good performance?
What modules are available for Apache Solr search in Drupal 7.x?
Which version of Solr supports Drupal 7.x?
What configuration is required in Apache Solr / Drupal 7.x for search?
There are two modules that support Solr with Drupal that are widely used:
Search API Solr
ApacheSolr search
Both have their various configuration 'quirks'; I'd say you need to try both to see how they fit in with your site and which suits you best.
Make sure you have Java 5 or higher installed already on your server.
Tutorial on setting up site with Search API for Solr
Tutorial on setting up site with ApacheSolr
Look at and use Search API; the Apache Solr Search module will not be the way forward for the future of Drupal.
The maintainers of Search API and Apache Solr Search have met in person and determined a way forward for advanced search in Drupal, and they both agreed that Search API is it.

How To implement LuceneNet using Amazon S3

I'm trying to implement Lucene in my app, using Amazon S3 to store the indexes that I generate, but I can't find any code examples or a clear article. If anyone has some kind of experience with this, please give me a guide or something that can help me get started.
There's a similar question here.
Here's an interesting article on how the biggest Solr service provider, Lucid Imagination, proposes to deploy their Solr implementation on EC2.
And here's their Search-as-a-Service solution.
If you're not bound to S3, you can use a dedicated Solr cloud service called WebSolr.
Also, if you need a complete ALM/CI solution for your development project, there's a WebSolr module included in CloudBees.
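If you do end up rolling your own, one common workaround is to build the index on local disk and then copy the index files to S3. The sketch below is in Java (Lucene.NET mirrors the Java Lucene API) and assumes the AWS SDK for Java v1; the bucket name and paths are placeholders:

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;

    public class LuceneToS3 {

        public static void main(String[] args) throws Exception {
            // 1. Build the index on local disk (path is a placeholder).
            File indexDir = new File("/tmp/lucene-index");
            try (IndexWriter writer = new IndexWriter(
                    FSDirectory.open(indexDir.toPath()),
                    new IndexWriterConfig(new StandardAnalyzer()))) {
                Document doc = new Document();
                doc.add(new TextField("content", "hello lucene", Field.Store.YES));
                writer.addDocument(doc);
                writer.commit();
            }

            // 2. Copy every index file to S3 (bucket name is a placeholder).
            AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
            for (File f : indexDir.listFiles()) {
                s3.putObject("my-lucene-index-bucket", "index/" + f.getName(), f);
            }
        }
    }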