Related keyword recommendation using Solr and MongoDB - Apache

I'm new to Solr.
I have been looking into related-content recommendation engines to implement on my core PHP and MongoDB website. It's a music-listening website, so I have added keywords such as singer, music and lyrics to every track in MongoDB.
My question is about related-music recommendation using keywords: can Solr (the MoreLikeThis handler) recommend it?
Example keywords: bobby-singer, a-r-rehaman, shreya goshal
It should look for related keywords in this order:
bobby-singer,a-r-rehaman,shreya goshal
bobby-singer,a-r-rehaman
bobby-singer,shreya goshal
a-r-rehaman,shreya goshal
bobby-singer
a-r-rehaman
shreya goshal
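The order above is "every group of keywords, largest groups first". A minimal sketch of generating it, assuming one Solr query per group; the field name `keywords` and the helper names are hypothetical:

```python
from itertools import combinations


def keyword_groups(keywords):
    """Return every non-empty group of keywords, largest groups first.

    itertools.combinations preserves input order, so within each size the
    groups come out in the same order as the original keyword list.
    """
    groups = []
    for size in range(len(keywords), 0, -1):
        groups.extend(combinations(keywords, size))
    return groups


def to_solr_query(group, field="keywords"):
    """Build an AND query for one group (the field name is hypothetical)."""
    return " AND ".join(f'{field}:"{kw}"' for kw in group)


groups = keyword_groups(["bobby-singer", "a-r-rehaman", "shreya goshal"])
for group in groups:
    print(to_solr_query(group))
```

For three keywords this is 7 queries, but the count grows as 2^n - 1, so with many keywords per track a single scored query is usually cheaper than running every group.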
My keywords are already in MongoDB. I'm planning to work with the Apache Solr MoreLikeThis handler, or please recommend another good recommendation engine.
Thanks

There are a couple of different things here.
First of all, you can use MLT (MoreLikeThis) to have Solr return related documents, but...
I wonder whether you could also benefit from synonyms, so that certain searches return similar results that may still satisfy the user.
Also, if you already have the list of relationships, you can build a small index on which you run an OR of your query terms, either to get potential related searches or to execute those related searches and get related results.
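The OR idea amounts to ranking other tracks by how many keywords they share with the current one. A toy in-memory sketch of that ordering (the track ids and catalog are made up); note that Solr's own scoring of an OR query also weighs term rarity, so its order will not always match a plain overlap count:

```python
def rank_by_shared_keywords(seed, catalog):
    """Return other track ids ordered by how many keywords they share with seed.

    Tracks sharing no keyword are dropped; ties are broken by id for a stable order.
    """
    seed_keywords = set(catalog[seed])
    scored = []
    for track_id, keywords in catalog.items():
        if track_id == seed:
            continue
        shared = len(seed_keywords & set(keywords))
        if shared:
            scored.append((shared, track_id))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [track_id for _, track_id in scored]


# Hypothetical catalog: track id -> keywords, as they might be copied from MongoDB.
catalog = {
    "s1": ["bobby-singer", "a-r-rehaman", "shreya goshal"],
    "s2": ["bobby-singer", "a-r-rehaman"],
    "s3": ["shreya goshal"],
    "s4": ["a-r-rehaman", "shreya goshal", "bobby-singer"],
    "s5": ["someone-else"],
}
related = rank_by_shared_keywords("s1", catalog)
```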
Hope this helps
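For the MLT route, a request to a MoreLikeThisHandler can be built as below. This is a sketch assuming the handler is registered at `/mlt` in the core's solrconfig.xml, a core named `music` and a field named `keywords` (all hypothetical). `mlt.mintf` and `mlt.mindf` are lowered to 1 because their defaults (2 and 5) would ignore keywords that appear only once per track or in only a few tracks:

```python
from urllib.parse import urlencode


def mlt_url(base, core, track_id, field="keywords", rows=10):
    """Build a MoreLikeThisHandler URL for tracks similar to track_id."""
    params = {
        "q": f"id:{track_id}",  # the seed document
        "mlt.fl": field,        # field(s) to mine for "interesting" terms
        "mlt.mintf": 1,         # keep terms that occur once in the seed
        "mlt.mindf": 1,         # keep terms that occur in few documents
        "rows": rows,
        "wt": "json",
    }
    return f"{base.rstrip('/')}/{core}/mlt?{urlencode(params)}"


example_url = mlt_url("http://localhost:8983/solr", "music", "song42")
```

The URL can then be fetched from PHP (or anything else) and the returned document ids used to look the tracks up in MongoDB.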

Related

Disabling built-in indexes in Google Cloud Datastore

I'm currently benchmarking Google Cloud Datastore to see if it could suit our needs, but I've run into a problem with how indexes are handled.
I know that I will never have to filter on anything except the key field, so I would like to disable the built-in indexing of all the other fields. I just want to use it as a key/value store.
I'm looking at potentially multiple TB of indexes if I cannot disable them (~50 fields, billions of rows), and that would kill our budget.
Is there any way to remove these indexes? It seems the index.yaml file this link talks about only covers composite indexes.
Thanks for your help!
Found it! You can explicitly tell Datastore not to index a property by marking it as excluded from indexes (excluded properties).
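With the google-cloud-datastore Python client, excluded properties are passed through the `exclude_from_indexes` argument of `datastore.Entity`. A minimal sketch of a helper that excludes every property, since only key lookups are needed here; the helper, the kind `Row` and the sample record are hypothetical:

```python
def unindexed_properties(record, keep_indexed=()):
    """Names to pass as exclude_from_indexes for one entity.

    Every property not listed in keep_indexed is excluded, so Datastore
    writes no built-in single-property index entries for it. Lookups by
    key do not depend on property indexes, so they keep working.
    """
    keep = set(keep_indexed)
    return tuple(sorted(name for name in record if name not in keep))


record = {"payload": "abc", "country": "FR", "score": 17}
excluded = unindexed_properties(record)

# Hypothetical usage with the google-cloud-datastore client (not imported here):
#   from google.cloud import datastore
#   client = datastore.Client()
#   entity = datastore.Entity(key=client.key("Row", "row-123"),
#                             exclude_from_indexes=unindexed_properties(record))
#   entity.update(record)
#   client.put(entity)
```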
I searched the Datastore GitHub issues for this same question: it was raised around 2015 and the last follow-up was in 2019, but there has been no response. You can ask there whether there has been any update.
I also searched the Google Cloud Platform Public Issue Tracker (PIT) for an existing Feature Request (FR) or issue related to this, but did not find any.
I think the best way to proceed is to file an FR with the proper components, so that the engineering team has visibility. The PIT uses the number of "stars" (people who have indicated interest in an issue) to prioritize work on the platform. Since no FR is open yet, you should open a new one.

How to store different files that need to be searched in an ASP.NET MVC 4 website

My requirement is similar to job sites, where a user can upload a document (PDF, text or Word) such as a resume/CV. All these documents must then be searchable by a specific keyword or a combination of keywords, and the results must be ranked by those keywords. I need to know which technology performs well when the number of files is huge and there are many search and indexing requests.
The website is built on SQL Server. Can I store those files in SQL Server, and would that perform well?
Or can it be done with Lucene.NET alone, storing the files in a single folder?
I think the best suggestion is to use Lucene.
You can save your documents as they are under a unique path/file name, and use that as the identifier when you index the documents. You will find many similar examples if you search for Lucene.
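To show the idea of "file path as document identifier" plus keyword ranking, here is a toy inverted index. It is not Lucene: it assumes the text has already been extracted from the PDF/Word files, uses made-up paths, and ranks by raw term counts, whereas Lucene adds analyzers and relevance scoring with length normalization:

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case word tokens; real engines add stemming, stop words, etc."""
    return re.findall(r"\w+", text.lower())


class TinyIndex:
    """Toy inverted index: term -> {document path: term count}."""

    def __init__(self):
        self.postings = defaultdict(Counter)

    def add(self, doc_path, text):
        for term in tokenize(text):
            self.postings[term][doc_path] += 1

    def search(self, query):
        """Rank documents by the total count of query terms they contain."""
        scores = Counter()
        for term in tokenize(query):
            for doc_path, count in self.postings.get(term, {}).items():
                scores[doc_path] += count
        return [doc for doc, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


index = TinyIndex()
# Text is assumed to be already extracted from the uploaded files.
index.add("resumes/3f2a.pdf", "Senior C# developer, ASP.NET MVC and SQL Server")
index.add("resumes/9b1c.docx", "Java developer with SQL experience and SQL tuning")
index.add("resumes/7d4e.txt", "Graphic designer")
ranking = index.search("SQL developer")
```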

User Specific Lucene Search

I don't think this is a very obscure Lucene problem, but somehow I just don't seem to be able to find a good solution to it. I will use an example.
Let's say I am building a news articles website. Registered users can bookmark articles that they are interested in. I want to allow each user to search only the articles he or she has bookmarked. For the sake of example, let's also assume that a user can potentially bookmark thousands of articles, and we have hundreds of thousands of users in our database. How do I build a scalable solution for this problem?
Thanks a lot!
This is a very typical Lucene problem, as Lucene does not support joins. More specifically, there is no first-class support, and you have to find your way around it. I can suggest a couple of options:
You could have a database, which has users, articles and bookmarks tables (the latter would have foreign keys pointing to the first two). You would also have articles indexed in Lucene. When running a search against articles, you could write a Lucene Filter which would exclude all articles not bookmarked by the current user.
You could index all articles and bookmarks in Lucene - probably best if you do this using separate indices. Then you could run a query for bookmarks (to retrieve which articles current user has bookmarked) and then run another separate query for articles. Like in the previous example, you could use the results of the first query to exclude all other articles which are not bookmarked by the current user.
I personally prefer option #1, as this is a classical relational structure and databases are designed for exactly this purpose. With option #2 you would have to modify both the user storage and the Lucene index when a user gets deleted.
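Option #1 can be sketched with SQLite standing in for the real database and a hard-coded list standing in for the ranked Lucene hits (both assumptions). The bookmark lookup is served by the composite primary key, and `ON DELETE CASCADE` removes a user's bookmarks when the user is deleted. This sketch filters an already ranked hit list; in Lucene the same id set would instead be applied as a filter during the search, so that paging over the top N results stays correct:

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE bookmarks (
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    article_id INTEGER NOT NULL REFERENCES articles(id) ON DELETE CASCADE,
    PRIMARY KEY (user_id, article_id)
);
"""


def bookmarked_ids(conn, user_id):
    """Set of article ids the user has bookmarked."""
    rows = conn.execute(
        "SELECT article_id FROM bookmarks WHERE user_id = ?", (user_id,)
    )
    return {article_id for (article_id,) in rows}


def search_bookmarked(conn, ranked_hits, user_id):
    """Keep only hits the user bookmarked, preserving the search ranking."""
    allowed = bookmarked_ids(conn, user_id)
    return [article_id for article_id in ranked_hits if article_id in allowed]


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [(10, "Elections"), (11, "Weather"), (12, "Football")],
)
conn.executemany("INSERT INTO bookmarks VALUES (?, ?)", [(1, 10), (1, 12), (2, 11)])

hits = [12, 11, 10]  # pretend full-text results, best match first
alice_results = search_bookmarked(conn, hits, user_id=1)

conn.execute("DELETE FROM users WHERE id = 1")  # cascades to her bookmarks
alice_after_delete = bookmarked_ids(conn, 1)
```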

Drupal 6: How to sort/filter search results by date

How can I customize the standard search behavior in Drupal 6? I need search results to be sorted by date. For example, people want to show only items from the past 2 weeks, or something like that.
I've tried a lot of things on this reference without luck. Have you ever encountered such a problem? Any help will be appreciated. Thanks!
You can sort by date using search solutions like Apache Solr. But I understand you want to use standard Drupal search.
In that situation I would recommend using the Faceted Search module: http://drupal.org/project/faceted_search
The Faceted Search module does not require installing a separate search engine. It also has Views integration, which will allow you to do things like show results from the last 2 weeks.
Please see:
http://drupalcode.org/viewvc/drupal/contributions/modules/faceted_search/README.txt?view=co
You can search for "views" in the above document for information.
You can also choose not to show any facets if you don't want your users to see them. In that case you would be installing the module only for the benefits of its Views integration.
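Whatever module ends up doing it, the behaviour being asked for is "keep results from the last N days, newest first". A language-neutral sketch of that logic (shown in Python, not Drupal's API; the `nid`/`created` keys and sample data are hypothetical):

```python
from datetime import datetime, timedelta


def recent_first(results, now, days=14):
    """Drop results older than `days` and sort the rest newest first."""
    cutoff = now - timedelta(days=days)
    recent = [r for r in results if r["created"] >= cutoff]
    return sorted(recent, key=lambda r: r["created"], reverse=True)


now = datetime(2024, 5, 31, 12, 0)
results = [
    {"nid": 1, "created": datetime(2024, 5, 20, 9, 0)},
    {"nid": 2, "created": datetime(2024, 5, 10, 9, 0)},
    {"nid": 3, "created": datetime(2024, 5, 30, 9, 0)},
]
latest = [r["nid"] for r in recent_first(results, now)]
```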

Automating WebTrends analysis

Every week I access server logs processed by WebTrends (for about 7 profiles) and copy ad clickthrough and visitor information into Excel spreadsheets. A lot of it is just accessing certain sections and finding the right title and then copying the unique visitor information.
I tried using WebTrends' built-in query tool, but it is poorly designed (it only offers a drag-and-drop interface instead of text-based queries) and it caps both the number of parameters and the length of a query. As far as I know, the tools in WebTrends are not suited to my goal of automating the entire web metrics gathering process.
I've gotten access to the raw server logs, but it seems redundant to parse that given that they are already being processed by WebTrends.
To me it seems very scriptable, but how would I go about doing that? Is screen-scraping an option?
I use ODBC to query metrics and numbers out of WebTrends. We even fill a scorecard with all the key performance metrics.
It's in German, but maybe the idea helps you: http://www.web-scorecard.net/
Michael
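A sketch of the ODBC approach feeding a spreadsheet: it works with any DB-API 2.0 cursor, so a driver such as pyodbc connected to the WebTrends ODBC data source should fit. That source, its tables and its columns are assumptions; the demo below uses an in-memory SQLite table as a stand-in:

```python
import csv
import io
import sqlite3


def export_query_to_csv(cursor, sql, out, params=()):
    """Run `sql` on a DB-API 2.0 cursor and write header + rows as CSV.

    Returns the number of data rows written.
    """
    cursor.execute(sql, params)
    writer = csv.writer(out)
    writer.writerow([column[0] for column in cursor.description])
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count


# Stand-in data source with hypothetical table and column names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weekly (profile TEXT, unique_visitors INTEGER)")
conn.executemany("INSERT INTO weekly VALUES (?, ?)", [("news", 120), ("blog", 45)])
buffer = io.StringIO()
rows_written = export_query_to_csv(
    conn.cursor(),
    "SELECT profile, unique_visitors FROM weekly ORDER BY unique_visitors DESC",
    buffer,
)
```

Scheduled weekly, with `out` being a file opened with `newline=""` as the csv module recommends, this replaces the manual copy into Excel, which opens CSV files directly.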
Which version of WebTrends are you using? Unless this is a very old install, there should be options to schedule these reports to be emailed to you, and also to bookmark queries. Let me know which version it is and I can make some recommendations.